These examples are shown using VBVoice 7.1 on a Windows 7 environment in Visual Studio 2010.
Developers choosing an IVR product face a number of hardware and software choices. Chief among these decisions is the choice of development environment – it can determine success or failure of a development initiative and make the difference between a profitable project rollout or a costly failure. This document highlights some of the critical issues that should be considered:
The ideal programming environment should accomplish a number of goals, the most important being ease of use.
To maximize ease of use, most telephony and speech toolkits offer a degree of visual programming through a “drag-and-drop” style interface.
However, while visual programming has the potential to greatly enhance ease of use, it can also become a limiting factor. A short learning curve sometimes comes at the expense of other developer productivity aspects that may only become obvious after the tool is in use (extensibility is key among these). The developer's ability to extend the visual programming environment by using modern programming languages to incorporate other components not provided by the toolkit or to take full advantage of new hardware and software is critical for most real-world telephony applications. For example, the ability to integrate with .NET environments and applications is an increasingly significant decision criterion.
To further shorten the learning curve for developers new to telephony and speech, the programming language should, ideally, be industry-standard (such as C#, .NET) and not proprietary to the selected tool.
While most development environments for computer telephony applications allow for the addition of external functionality, some require knowledge of proprietary scripting languages or handling of complex, low-level programming. Preferably the application development environment will enable developers to leverage their programming knowledge and expand on the telephony controls included in the toolkit through custom code, as well as make use of third-party components.
The right development and debugging tools can save your project. Attempting to uncover call flow or recognition problems in an application without a rich set of tools sharply reduces developer productivity and increases the error rate of the final application. Ideally, the chosen programming environment should:
Together, these characteristics allow developers to create sophisticated and powerful applications without a long learning curve or ongoing trial and error during application development. If you're building an application on Windows®, don't miss out on the benefits of the next-generation technologies from Microsoft® (your tool must support .NET).
The ability to break your speech application into co-operating modules is a must. Not only does it improve scalability, reliability and performance of your system, but it also saves you money in both development and production.
A modular system is cheaper to build and maintain. In development, programmers benefit from working in parallel on well-defined modules. In production, independent module provisioning and software hot swaps eliminate costly system downtimes. At the same time, separating application logic from telephony and speech processing allows resource sharing, which in turn leads to more efficient utilization. Finally, distributing your modules across a local area network (LAN) enables load balancing and effortless scalability - again resulting in savings on system maintenance.
The biggest benefit, however, comes from the increased reliability of a modular system. Nothing is more frustrating to callers than a system that crashes into "dead silence" in the middle of a transaction. An unreliable system will soon be pulled out of production, which always means significant financial losses.
A monolithic executable is only as reliable as its weakest component, while a modular system can stay operational even after losing one of its modules. Therefore, it is very important that application modules execute properly separated from each other and from the system processes, so that a fatal error in one doesn't bring down the whole system. The modules should run out of process or, even better, be distributed across a LAN. Ideally, modules should be compiled directly into standalone executables, not into intermediate scripts or p-code. Not only does this speed up program execution, it also removes the dependency on a shared runtime engine as a single point of failure.
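As a rough illustration of out-of-process isolation, the sketch below launches two hypothetical modules as separate operating-system processes using Python's standard subprocess module. The module names and their bodies are invented for the example and are not tied to any particular toolkit; the point is only that a fatal error in one process leaves the other untouched.

```python
import subprocess
import sys

# Hypothetical modules, given as Python source so the example is self-contained.
# "billing" dies with an unhandled error; "call_flow" finishes normally.
MODULES = {
    "billing": "raise RuntimeError('simulated fatal error')",
    "call_flow": "print('call handled')",
}


def run_modules(modules):
    """Start every module in its own OS process and collect the exit codes."""
    procs = {
        name: subprocess.Popen(
            [sys.executable, "-c", source],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        for name, source in modules.items()
    }
    # wait() blocks until the child exits and returns its exit code.
    return {name: proc.wait() for name, proc in procs.items()}


if __name__ == "__main__":
    for name, code in run_modules(MODULES).items():
        status = "ok" if code == 0 else "failed (exit code %d)" % code
        print(name, status)
```

Here the crash in the billing process is reported as a non-zero exit code while the call-flow process completes normally. A production supervisor would typically also restart the failed module; that is left out to keep the sketch short.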
The architecture of your platform should offer proven scalability and the ability to hot swap applications. It is important that applications execute independently of each other and of the system processes, and that they can be hot-swapped, provisioned and configured easily.
In hosting environments or in situations where multiple applications are being deployed, the platform should enable the sharing of telephony resources (e.g., speech licenses and telephony hardware) across multiple independent applications. Such a distributed architecture also allows individual applications to be interrupted for upgrades or other maintenance without interrupting other applications on the same server.
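One way to picture this kind of resource sharing is a common pool of licenses that independent applications borrow from and return to. The sketch below is a minimal in-process model under that assumption; the class name, its methods and the application names are all hypothetical, and a real platform would manage the pool across processes or machines.

```python
import threading


class LicensePool:
    """Hypothetical pool of interchangeable licenses shared by several applications."""

    def __init__(self, total):
        self._total = total
        self._in_use = {}  # application name -> number of licenses held
        self._lock = threading.Lock()

    def try_acquire(self, app):
        """Give one license to `app`; return False if the pool is exhausted."""
        with self._lock:
            if sum(self._in_use.values()) >= self._total:
                return False
            self._in_use[app] = self._in_use.get(app, 0) + 1
            return True

    def release_all(self, app):
        """Return every license held by `app`, e.g. when it is stopped for an upgrade."""
        with self._lock:
            return self._in_use.pop(app, 0)

    def available(self):
        """Number of licenses currently free."""
        with self._lock:
            return self._total - sum(self._in_use.values())


if __name__ == "__main__":
    pool = LicensePool(total=2)
    pool.try_acquire("banking_ivr")
    pool.try_acquire("order_status")
    # Taking one application down frees its license without touching the other.
    pool.release_all("banking_ivr")
    print(pool.available())
```

Because licenses are tracked per application, stopping one application returns only its own licenses to the pool, which mirrors the upgrade scenario described above.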
While some environments allow the sharing of telephony hardware and hot swapping of applications, developers should ensure that the platform does not create a monolithic executable for multiple applications. In such a system, the failure of one module could stop the entire system, as the monolithic executable is only as reliable as its weakest module.
While your code may be bulletproof, can you guarantee the same for all the components and libraries you have to use?
The world of telephony and speech applications is a complex one, with multiple hardware and programming interfaces and a plethora of standards. The resulting learning curve for new developers tends to be very steep. This is where rapid application development tools really shine.
Their controls encapsulate and abstract common call processes to simplify application development and shield the developer from hardware-specific programming. In evaluating your application development tool, you should look not only at the raw number of these controls, but also at their depth, customizability and extensibility. Otherwise, you may find that the feature you are looking for is simply not doable in the environment of your choice.
The tool should also support advanced protocols such as Integrated Services Digital Network (ISDN) and voice over internet protocol (VoIP) and offer a comprehensive lineup of call control features for your particular application. For example, call queuing, agent monitoring, recording and conferencing capabilities are significant in call center applications, while other solutions may require broad fax support, web integration, switch integration via TAPI, etc.
The implementation of an effective automatic speech recognition (ASR) solution can reduce the number of agents, supervisors, trainers and quality assurance specialists that are needed by your business. If callers are given the option of gathering the information they need without reaching an agent, more agents are free to handle calls that cannot be resolved with self-service.
The market for speech recognition engines is continually evolving. Consequently, the development platform should support multiple ASR engines, letting you select the appropriate engine for each individual application development effort.
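Selecting an engine per application is easiest when the application code talks to a common interface rather than to one vendor's engine. The sketch below shows that idea with a small registry; the engine names and their canned results are hypothetical placeholders, not real products or APIs.

```python
from typing import Callable, Dict

# A recognizer takes raw audio bytes and returns the recognized text.
Recognizer = Callable[[bytes], str]

# Registry of available engines, keyed by a configuration name.
ENGINES: Dict[str, Recognizer] = {}


def register(name: str):
    """Decorator that adds a recognizer to the registry under `name`."""
    def decorator(fn: Recognizer) -> Recognizer:
        ENGINES[name] = fn
        return fn
    return decorator


@register("engine_a")
def engine_a(audio: bytes) -> str:
    # Hypothetical stand-in: a real adapter would call a vendor's engine here.
    return "yes"


@register("engine_b")
def engine_b(audio: bytes) -> str:
    # Second hypothetical stand-in with a different result format.
    return "YES"


def recognize(engine_name: str, audio: bytes) -> str:
    """Run recognition with whichever engine the application is configured to use."""
    if engine_name not in ENGINES:
        raise ValueError("unknown ASR engine: %s" % engine_name)
    return ENGINES[engine_name](audio)


if __name__ == "__main__":
    # Each application names its engine in configuration; the calling code is identical.
    print(recognize("engine_a", b""))
    print(recognize("engine_b", b""))
```

Switching an application to a different engine then becomes a configuration change rather than a code change.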
Additional speech capabilities to look out for:
Most of the top ASR vendors now expose their ASR engines through the Media Resource Control Protocol (MRCP), a standard that allows the same client to be used with different engines.
Text to speech (TTS) is a technology that allows you to create a real-time link between text-based content in your database and a customer awaiting an immediate reply. TTS can read any text out loud without using prerecorded prompts. This technology is mature; it has been validated by market deployments and is already widely used in telephony services provided by carriers and enterprises alike.
The development platform should support multiple TTS engines.
Most of the top TTS vendors now expose their TTS engines through the Media Resource Control Protocol (MRCP), a standard that allows the same client to be used with different engines.
The flexibility to mix and match TTS engines through an MRCP Connector is important because it lets you choose the engines and voices you want.
Another important consideration for developers is the ease of product licensing and deployment. This is particularly true for commercial application developers, such as system integrators and independent software vendors (ISVs). Ideally, deployment of a finished application should be as painless as possible, yet assure the developer of the licensing integrity of the finished product.
Consider whether the platform requires the use of "dongles" for commercial application deployment, or whether the process is only software enabled. The latter approach allows for easy upgrades and also has the potential for the relicensing of developed applications to other customers.
Depending on your organization, standards such as .NET or VoiceXML can also play an important role in your tool selection, but other considerations will likely be more important in the day-to-day development work. In addition, standards may impose restrictions on your development effort, since new developments or specific capabilities that you are looking for may not yet be reflected in a public standard, which generally evolves more slowly.
VoiceXML, for example, has an acknowledged weakness in the area of call control. For the foreseeable future, other platforms will continue to exist, and some of them are beginning to extend to support standards environments.
Programming in industry-standard environments such as VB, .NET and C# eliminates the need to learn proprietary languages and shortens the learning curve for developers new to the IVR and telephony landscape, who no longer need to worry about the telephony hardware application programming interfaces (APIs) or media processing layers. This allows you to rapidly create powerful IVR and voice-enabled communication solutions while significantly reducing your time to market.