Paul Donnelly, Chief Technology Officer, BlueChip Technologies

Q Tell us about BlueChip Technologies. How did you get started? What technologies do you develop?

A BlueChip Technologies is a software company based in Ireland, specializing in speech recognition and wireless applications. We have very close ties with Queen's University Belfast, which has an active speech technologies research group. In 2000, the company began cooperating with the university to develop some of its research, with a particular focus on the robust recognition work being carried out for the wireless world. I strongly feel that achieving higher levels of recognition accuracy in mobile applications is one of the key challenges facing the speech technology industry. We saw great potential in the research being done on an innovative new technique aimed at increasing recognition accuracy in noisy environments, and last year we negotiated an exclusive technology license with the university to commercialize it. Since then we have been further refining the technique, which we have named "Clear" and which we hope to license to other recognition engine companies. In addition, we intend to develop vertical applications that maximize the benefits of this technique.

We were particularly excited to exhibit at SpeechTEK 2002 this year, where we demonstrated the potential of this technique in one of the toughest environments of all: the exhibition hall.

Q What is BlueChip's basis of funding? What type of company do you expect to be in 12 months? Three years?

A The company was started with a small pool of capital injected by the founders, which funded early product development. We have raised approximately $3m in venture capital and development funding over the last eighteen months from Northern Ireland-based venture capital organizations.

For the next twelve months we expect to remain primarily a products and services organization, built around our "Guardian" application. Sales so far have come mainly from our direct efforts, and we have a strong pipeline of channel partners who have already delivered significant new revenues in the last few months. In three years' time we expect the organization to derive a substantial portion of its income from licensing intellectual property based on the "Clear" product referred to in the first question. This has been created primarily around noisy speech recognition: the kind of real-world situations in which we use our mobile phones. The intervening period will also see the company's traditional speech application products expand out of the UK and Ireland into continental Europe and North America, primarily through our channel partners.

Q What do you believe will be key market drivers for this technology in the short-term? Long-term?

A The speech technology industry has a great story to tell, but probably the biggest driver in the short to medium term will be having more end users actually encounter an application in action. That exposure will grow acceptance that the technology actually works and is not just a promise. In today's economy this represents a straightforward trade-off between volume and revenue for the industry, which in revenue terms is still small.

In the long term, things like "hands-free" legislation will have an impact, provided recognition accuracy is good enough to meet these needs. I am not convinced that speech technology will take over a massive slice of the call center industry while low-cost human options, such as those found in Asia, offer a strong alternative.

Q What should the speech technology industry as a whole be doing to increase the growth rate of speech technology deployments?

A Beyond voice dialers, only a very small percentage of consumers have used telephone-based speech recognition applications. It is imperative that we make the user experience as rewarding and commonplace as possible. Many developers simply translate IVR applications into voice applications. In the main these offer little additional user benefit, as spoken menu commands simply replace DTMF tones.

Many inexperienced organizations attempt to develop complex applications without really understanding the strengths and limitations of current speech technologies. This is most prevalent in VoiceXML, where it is assumed that a team that produces XML-based Web applications can automatically develop great speech applications. This is simply not the case. Many organizations have pursued this approach to the detriment of the speech industry's profile.

For mass acceptance of speech technologies, I feel it is important to stimulate all segments of our market, not just the Fortune-listed companies. Many worthwhile "cookie-cutter" application opportunities exist within small and medium enterprises. These are the organizations where reductions in cost-to-serve can have real "bottom line" impact.

Most senior executives we encounter are simply unaware of what speech technologies can do for their business and, most importantly, that the technology now has serious credibility. Typically their exposure to speech recognition came several years ago from a desktop dictation product bundled on a CD. Additionally, they expect TTS to sound like a Texas Instruments Speak & Spell from the 1980s. As an industry, therefore, we need to re-educate the market and correct these preconceptions.

Q What are the limiting factors in the technology's acceptance? What can be done about these factors?

A There are many factors that may limit the acceptance of our industry's technology. However, I shall dwell on two main facets: poor interface design and poor recognition accuracy.

Many speech application development organizations have failed to see the importance of user interface design. Prompts are static, dialogs are inflexible, and consequently transaction completion rates are low. In the main these problems can be avoided by applying good design principles along with a "Wizard of Oz" phase of application development, backed up with ongoing reviews after the application is released.

My second factor relates to recognition accuracy. We all use our mobile telephones in many different environments, including those with significant levels of background noise, and we must assume that many speech application users will be calling from mobiles as well. Consequently, recognition accuracy in even moderately noisy environments is a big issue. A telephone number consists of a string of twelve digits (ten in the US). If a recognizer gets just one digit wrong, the whole string is void. Putting this into perspective, one error in twelve digits is a per-digit error rate of just over 8%, yet it voids the entire string. With significant levels of background noise these error rates can increase tenfold, rendering the speech application useless. The problem of "noisy speech recognition" is the current hot research topic amongst speech academics. Our own "Clear" technology has been developed as a bolt-on technique to help increase noise-robust recognition levels. Speech application developers must endeavor to create applications sensitive to the limitations of current recognizers.
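To see how quickly per-digit errors compound into whole-string failures, here is a minimal sketch. It assumes each digit is recognized independently, which is a simplification; the function name and figures are illustrative, not taken from the interview.

```python
def string_accuracy(per_digit_accuracy: float, n_digits: int) -> float:
    """Probability that an n-digit string is recognized with no errors,
    assuming each digit is recognized independently."""
    return per_digit_accuracy ** n_digits

# A 92% per-digit accuracy (8% error rate) already loses most strings:
print(round(string_accuracy(0.92, 12), 3))  # 0.368 -> under 37% of 12-digit strings survive

# If background noise pushes the per-digit error rate tenfold (to 80%),
# virtually no complete 12-digit string is ever recognized:
print(string_accuracy(0.20, 12) < 1e-8)  # True
```

The point of the sketch is that string-level accuracy decays exponentially with length, so even modest per-digit gains from noise-robust front ends pay off disproportionately on long utterances such as phone numbers.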

Q Describe a successful speech technology implementation and why you thought it was successful. Please include any benchmark statistics that support your thoughts.

A The simplest illustration of a successful "killer application" is the auto attendant. The business rationale is clear, and it's a great way to get the technology into everyday commercial use. We recently completed the deployment of an automated attendant for a large organization with thousands of employees. After consultation it was apparent that the organization's telephone agents were under extreme pressure. Call queues were long, drop-out rates were high, and the number of incorrect call transfers was unacceptable. Callers were often not getting the information they required. Call volumes varied greatly across the day, and indeed there were large seasonal variances.

The application we developed has eradicated many of these issues. Call agents now have more "value add" time to spend with individual customers, better meeting caller needs and building better relationships. There is simply no more queuing, and agent resource has been redeployed to other tasks. Human agents now interact with only around 10% of callers, those whose requirements cannot be served directly by the system. Transaction completion rates are up, especially at peak times, where the eradication of queuing has had the greatest impact. The application's overall return-on-investment period was just a few weeks.
