Magic Data Launches Conversational AI Datasets
Magic Data, an artificial intelligence data service provider, has launched more than 200,000 hours of training datasets, including 140,000 hours of conversational AI training data and 60,000 hours of read speech data. The datasets cover Asian languages, English dialects, and European languages, and are intended to help advance human-computer interaction in artificial intelligence.
Magic Data's R&D Center compared conversational speech data with read speech data, using 3,000 hours of each to train automatic speech recognition (ASR) models for customer service, broadcasting, and navigation command scenarios. It found that, compared with read speech data, word accuracy with conversational speech data improved up to 84 percent. Testing also found that word accuracy rose as more conversational data was used.
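For readers unfamiliar with the metric, word accuracy is commonly computed as one minus the word error rate, where errors are the substitutions, deletions, and insertions needed to turn the ASR output into the reference transcript. The announcement does not say how Magic Data measured it, so the sketch below shows only that common definition; the function names and sample sentences are illustrative assumptions, not part of the company's evaluation.

```python
def word_edit_distance(reference: str, hypothesis: str) -> int:
    """Word-level Levenshtein distance between two transcripts."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] holds the distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word accuracy as 1 - WER (assumed definition, not from the source)."""
    n_ref = len(reference.split())
    if n_ref == 0:
        raise ValueError("reference transcript must contain at least one word")
    return 1.0 - word_edit_distance(reference, hypothesis) / n_ref


if __name__ == "__main__":
    # Hypothetical navigation command and a hypothetical ASR output.
    ref = "turn left at the next light"
    hyp = "turn left at next light"
    print(f"word accuracy: {word_accuracy(ref, hyp):.3f}")
```

Under this definition the score can fall below zero when the output contains many inserted words, which is one reason evaluations sometimes report word error rate instead.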