Speech Technology Magazine

 

Speech Solutions in China, Part II: Network Solutions

China is fertile ground for speech, but it needs proper cultivation.
By Judith Markowitz - Posted May 1, 2008

China and Taiwan are potentially huge markets for speech. Their economic booms are rapidly ripening the market. The virtual absence of commercial Chinese speech technology vendors has opened the door to foreign companies. Nuance Communications, IBM, Intervoice, and Genesys Telecommunications Laboratories have already established Chinese operations.

These companies understand that Chinese markets differ from North American and European markets in more than language and writing systems. For example, unlike in the United States, where cost-cutting is king, Chinese businesses are more likely to turn to speech to increase revenue.

To reach Chinese markets, foreign companies are establishing partnerships, often with Chinese companies. These partners benefit by expanding their offerings and know-how. For example, iFLYTEK added speech recognition (SR) to its successful text-to-speech business as a result of its affiliation with Nuance; eSoon has become a leading supplier of VoiceXML platforms through its relationship with Genesys.

The following three areas are where most of the development has taken place:
Call Centers

Virtually all Chinese businesses have call centers, and management is always looking to improve operations. Here are the major factors driving automation:
1. Revenue enhancement. Companies often use call centers to attract new customers, making customer satisfaction extremely important. Human agents provide personal attention, but the agent-to-caller ratio is so low that waiting times are long.
Each company has a single telephone/service number for all call center operations. Consequently, touch-tone automation produces long, complex menus and poor customer satisfaction. As a result, automation rates are low, even for after-hours usage when automation is most attractive. SR and natural language processing could provide better service, but few local integrators have good SR application and human-factors design skills.
2. Cost. Historically, the cost of hiring, training, and paying call center agents has been low. By contrast, most automation has been proprietary, which can be expensive, especially when professional services are involved.
The economic boom is changing everything. Pay scales are skyrocketing, and turnover is following suit. Business growth is increasing demand for call center services, forcing businesses to hire more agents. Property costs are rising 100 percent to 200 percent annually, making expansion of call center facilities costly. Companies are being forced to find other ways to handle higher call volumes. As a result, SR in a standards-based IVR is a cost-effective alternative, provided professional services can be kept to a minimum.
3. Competition. In response to global competition, companies are seeking streamlined operations that provide good customer service 24/7. This is an ideal fit for standards-based SR IVR solutions.

Enhanced Services

The global mobility explosion has made China and Taiwan among the largest markets for wireless telecommunications devices and value-added services like downloadable ringtones. SR is making inroads in music search and voice-activated dialing.

Technology

The performance of SR engines for Chinese languages (primarily Mandarin) is comparable to that for North American English. Mainland China, however, has five regional dialects that are really different languages. Furthermore, each dialect has its own regional and social subdialects. Even standard Mandarin is spoken differently in different regions of China.

The economic boom is bringing speakers of those dialects to Beijing and other cities, and their non-native accents are coming with them. Mandarin recognizers trained on speakers from Beijing tend to work poorly for them. Consequently, a major challenge facing SR in China is overcoming this linguistic diversity. It is a problem that researchers in China and elsewhere are trying to solve.

I want to thank the following people for their help with this series of articles: Ding Dawei of iFLYTEK; John Poon, Chris Chan, and James Brooks of Nuance; Rob Hilsen of Genesys; Todd Mozer of Sensory; and Edgar Chau of Cyberworkshop.


Judith Markowitz, Ph.D., is technology editor of Speech Technology magazine and a leading independent analyst in the speech and voice biometric fields. She can be reached at judith@jmarkowitz.com.


