February 1, 2003
Q & A

Lin Chase, CEO, and Yoon Kim, Chief Technology Officer, NeoSpeech

Q Tell us a little about NeoSpeech. How did you get started? What are your objectives for this business? Who are your customers? Who are some of your competitors?
A NeoSpeech is a provider of speech-enabled solutions configurable for handheld devices, desktop and network/server applications. The company offers core products and services in TTS, ASR, speaker verification and voice morphing. We are a partially owned subsidiary of Voiceware, one of Korea's leading speech technology providers. When Voiceware decided to expand into the U.S., NeoSpeech was born and the company officially launched at SpeechTEK 2002. NeoSpeech's objective is to provide best-of-breed speech technology - such as high quality, natural sounding TTS - for a wide range of devices and platforms, including small handhelds and the desktop in addition to the more typical network servers. We are also committed to the so-called "underserved" markets - small and medium-sized businesses - and to identifying, building and delivering applications to meet the needs of those markets. Our long-term objective is to be a leading U.S. technology and application provider, and to deliver compelling solutions for target markets such as interactive language skills evaluation, multimodal computer assisted language learning and pre-packaged applications for enterprises. We have announced PhoneTree and Conversay as early NeoSpeech OEM customers, and we will soon be announcing customers in the education, telephony and enterprise spaces. As for competitors, at first glance it would appear that we are competing with large speech vendors such as AT&T, Nuance, ScanSoft and SpeechWorks. In reality Nuance and SpeechWorks are focused on delivering telephony-based technology and applications to large enterprises and telcos, while NeoSpeech is more focused on packaged applications for small and medium-sized companies. Additionally, NeoSpeech's TTS engine is one of the crispest, most life-like and most accurate TTS technologies on the market, and it runs not only on network servers, but also on handhelds and the desktop.

Q Tell us about Voiceware and what they are doing in Korea.
A As I mentioned, Voiceware is a leading provider of speech technology and solutions in Korea and is quickly expanding into other Asian markets including China and Japan. Founded in 1999, it's one of the few speech technology providers that have been profitable over the past three years. Voiceware is the Korean equivalent of Nuance or SpeechWorks in the U.S. However, Voiceware also offers embedded speech technology for use in mobile phones, PDAs, toys, appliances and integrated circuits. Depending on the market in question, Voiceware has between 85 and 92 percent market share in Korea, with major customers in the airline, finance, telecom and government sectors among others.

Q Why is TTS such an attractive option for companies?
A Our TTS, because of its intelligent and life-like sound, enables companies to automate much of what used to be considered unsuitable for TTS and therefore had to be handled by live operators. When TTS is very high quality it offers the high level of customer service necessary for commercial IVRs. Outdialing applications, which previously had to be quite general in the pre-recorded messages they played, can now manage very detailed and completely dynamic information. For example, an outdialer can now inform a parent of the name of the child who missed school or can remind a library borrower of the title and due date of an overdue book. With our personalized TTS, contact centers can use corporate voice talent to quickly create a new TTS voice that matches pre-recorded prompts. This allows seamless blending of all the existing prompts with dynamic information when people access account balances, mileage credits and order status.

Q Your XVoice product offers the ability to animate voices. Why do you think callers prefer this animation? STM's user acceptance study seems to suggest they do not prefer animation when calling into a voice system for routine services.
A NeoSpeech's XVoice product is actually entertainment software for the PC and Internet that uses voice morphing to create personalities for animated characters. We do get some interesting inquiries on this product, however, including those from private investigators! In Korea, carriers use this technology to enable cell phone users to take on different identities - anything from a mobster to a school-aged girl - during live phone calls. The future of this product is still uncertain, but the most likely use will be in the avatar, or virtual assistant, market.

Q What do you believe will be key market drivers for this technology (speech) in the short-term? Long-term?
A In the short term, we see several drivers for speech technology, especially for high quality TTS systems. These drivers include the eventual wide acceptance of unified messaging, or the ability for a person to get all of his or her messages in one place and to access these messages from a PC or mobile device. Another driver is the increasing demand for remote access to corporate and customer data such as shipping and payment information and sales status. We also see great interest from many market sectors in customized outbound messages. Also in the short term we see the education market, especially the language learning and language skills evaluation areas, as key areas of growth for speech and interactive speech technology. We have witnessed great demand for both quality English learning programs and language learning tools in Asian countries. These countries need both qualified language teachers and software-driven solutions for learning a new language in a mobile community. Interest from the blind and low-vision markets for higher quality TTS has been strong as well. Longer term, we envision more action in the handheld and telematics markets. We're ready for this and have small footprint engines ready to go.

Q What is holding these solutions (speech technology) back from being mass deployed globally?
A There is both a "push" and a "pull" aspect to this. On the push side, many of the people developing speech applications have not had the speech technology background and expertise to quickly bring solutions to market. There needs to be tighter collaboration between application developers and speech technology companies so that they can deploy better solutions more quickly and can help ensure that speech applications evolve to a standard of high quality. From the pull perspective, many customers are not adequately driving these services - informing end-users about their availability, their ease of use and how consumers can benefit from using these services.

Q How important are human-factors-design issues compared to the underlying technology such as accuracy performance for a successful speech solution? What are ways customers can improve the design process?
A Human factors and the accuracy of the underlying technology are equally important in designing an A+ system. It is critically important to have an accurate engine as well as to have an overall system that is specifically designed for human use. People will hang up in frustration if your engine is not accurate, and the same result will occur with a system that has not been tuned and designed specifically for a particular use.

Q What are your thoughts concerning standards and their impact on speech technology?
A Current and emerging standards such as VoiceXML and SALT are critical to advancing the development and deployment of high quality voice applications. It is also important that these standards be flexible, enabling integration with non-speech standards, as is the case with Microsoft .NET. .NET is a Java-like platform for software development that integrates components such as client-server transfer protocols, content authoring and delivery, and database access into a unified framework, enabling companies to leverage their existing Web infrastructure to deploy voice applications accessible from the telephone or a (Windows-enabled) multimodal device. Standard application development environments which have these kinds of features are critically important in the effort to get speech technologies into mainstream use. This is where we see real value for companies, and real help for developers in designing best-of-breed solutions.
