Speech Technology’s Next Frontier: Developing Countries

Pioneers have spent more than 20 years trying to nurture speech technology in developing countries, and they kept hitting the same wall: The technology simply was not ready for the demands of low-resource languages. The tide is turning now, though, as investment begins to flow in to support the creation of automatic speech recognition (ASR), machine translation (MT), and text-to-speech (TTS) across Africa.

Such development has been made feasible by technology advances such as deep learning and GPU-powered computing, along with a growing number of highly skilled university graduates and postgraduates specializing in computer engineering and computational linguistics. These new technologists, together with a cadre of established professionals, need outlets for their talents, and African governments are taking notice, providing funding to raise communication on the continent to a new level. This is particularly true in South Africa, which has 11 official languages.

The seeds of this effort were planted years ago. Roger Tucker, a multinational speech technology visionary, has been at the forefront of efforts to bring speech technologies to Africa. He was so early, in fact, that he might have been too early. In the early years of the 21st century, he founded the Local Language Speech Technology Initiative (LLSTI), dedicated to incubating speech technology centers in the developing world. He noted that the funding fundamentals in this part of the world differ from those of the Western corporate world: These technologies need to be created for the benefit of the people, because citizens are a top priority (and resource) in each developing country. In fact, South African developers have no problem turning down your money if investor goals are not in sync with theirs.

Said Tucker more than a decade ago: “Speech tech companies … did not transfer ownership of that tech to any of the speakers of that language. Minor languages were off the radar for these companies. There were in any case some deep problems involved in transferring technology to the developing world, resulting in too many rooms full of PCs that were left gathering dust, as puzzled community leaders or head teachers didn’t really know what to do with them.” Today, the landscape has changed, and it is time to take another look.

Among the companies at the forefront of the continent’s speech technology efforts is Saigen, an ASR developer in South Africa. The company creates speech recognition for South African English, Zulu, Sesotho, and Afrikaans and is now beginning development on Swahili and Xhosa. “Thousands of hours of transcribed audio data is required for training acoustic models, and millions to billions of words for language modeling,” says Charl van Heerden, Saigen’s CEO. “Corpora of these sizes simply do not exist for most African languages, but these languages are starting to receive the much needed international attention they deserve, with several African languages featuring on popular ASR resource collection platforms such as Mozilla’s Common Voice, and recent funding by organizations such as the Lacuna Fund developing machine learning datasets for African languages.”

But obstacles persist, Van Heerden points out. “The two most prominent challenges [are] a lack of text for language modeling and how to deal with a phenomenon common in African conversations that we call ‘code switching.’ Code switching occurs when a speaker switches from one language to another in a conversation.” On the code-switching front, solutions are beginning to emerge as more and more technologists devote effort to cracking this nut.

And help is on the way from the public sector. South Africa’s Council for Scientific and Industrial Research (CSIR) develops solutions with the support of government grants and also attracts commercial ventures and R&D funding. The government has a Human Language Technology Expert Panel that helps set the country’s direction for language technology, and funding is also distributed by the Department of Sport, Arts and Culture and the Department of Science and Innovation. Aby Louw is principal engineer of the Voice Computing Research Group at CSIR, which develops ASR, MT, and TTS in the 11 official South African languages. The group’s products and services are available directly or through resellers such as Inclusive Solutions and Edit Micro. Says Louw, “We actively publish and participate in the scientific community and support ourselves through commercial ventures and royalties. CSIR is a multidisciplinary scientific and technology research organization that researches, develops, localizes, and diffuses technologies to accelerate socioeconomic prosperity in South Africa.”

That last concept, “socioeconomic prosperity,” is especially important across Africa. These technologies are being created continent-wide for the people and need to be accessible to all, not hoarded by the few. “The Intellectual Property Act stipulates that government-funded research and development should be used for the betterment of the people of South Africa. This means that the funding must be used in order to deploy it to improve the lives of the citizens,” Louw says.

What has changed in Africa over the past decade is that governments and the companies they fund are now ready to take responsibility for housing and deploying these technologies for the long haul. That, in turn, allows corporate investors to reap the benefits of access while earning valuable recognition as supporters of development in these countries.

“With low-resource languages,” Louw explains, “you can do a lot of data sharing within language families and even across language families. So we use a higher-resource language that has related elements, like English or Dutch, to shore up the initial model, and then refine to the lower resource. This enabled us, for example, to quickly create a COVID application for communication between healthcare providers and patients without a common language, such that within the specific domain of the application they could communicate with each other using an Android phone.”
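The bootstrapping Louw describes is generally known as cross-lingual transfer learning: A model is first trained on plentiful data from a related, higher-resource language, and its learned parameters then serve as the starting point for training on the scarce target-language data. The toy Python sketch below illustrates only that warm-start idea, under clearly stated assumptions. It is not CSIR’s pipeline, and the tiny logistic-regression “model,” the helper names (`make_language`, `train`, `accuracy`), and the synthetic feature data are all hypothetical stand-ins for a real acoustic or translation model and a real speech corpus.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_language(true_w, n, seed):
    # HYPOTHETICAL synthetic "language": random feature vectors labeled
    # by a hidden linear rule. Stands in for real transcribed speech.
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in true_w]
        y = 1 if sum(a * b for a, b in zip(true_w, x)) > 0 else 0
        data.append((x, y))
    return data


def train(data, weights=None, epochs=20, lr=0.1, seed=0):
    # Logistic regression trained with plain SGD.
    # If `weights` is given, training starts from a copy of them (warm start);
    # otherwise it starts from zeros. The last weight is the bias term.
    n_features = len(data[0][0])
    w = list(weights) if weights is not None else [0.0] * (n_features + 1)
    rng = random.Random(seed)
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = data[i]
            z = sum(wj * xj for wj, xj in zip(w, x)) + w[-1]
            err = sigmoid(z) - y  # gradient of log loss w.r.t. z
            for j, xj in enumerate(x):
                w[j] -= lr * err * xj
            w[-1] -= lr * err
    return w


def accuracy(w, data):
    # Fraction of examples where the predicted class matches the label.
    correct = 0
    for x, y in data:
        z = sum(wj * xj for wj, xj in zip(w, x)) + w[-1]
        correct += int((z >= 0) == (y == 1))
    return correct / len(data)


def main():
    high_w = [1.0, -2.0, 0.5]  # hypothetical higher-resource language
    low_w = [1.2, -1.8, 0.3]   # hypothetical related low-resource language
    high = make_language(high_w, 2000, seed=1)    # plentiful data
    low_train = make_language(low_w, 10, seed=2)  # scarce target data
    low_test = make_language(low_w, 500, seed=3)  # held-out target data

    pretrained = train(high)                      # step 1: shore up the model
    scratch = train(low_train)                    # baseline: target data only
    transferred = train(low_train, weights=pretrained, epochs=5, lr=0.05)  # step 2: refine

    print(f"trained from scratch:       {accuracy(scratch, low_test):.2f}")
    print(f"pretrained, then fine-tuned: {accuracy(transferred, low_test):.2f}")


if __name__ == "__main__":
    main()
```

The design choice mirrors the quote: The refinement step uses a smaller learning rate and fewer epochs so that the scarce target data adjusts, rather than overwrites, what was learned from the related language. Exact numbers depend on the random seeds; real systems apply the same warm-start principle to much larger neural models.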

As noted, advances in processing power have helped enable the development of speech technology in South Africa and elsewhere. “In the past, to train your models just required RAM and CPUs. Now if you try to train our current models on CPUs, even if you have access to hundreds of CPUs, it will take months,” Louw says. “We now use GPUs available in our own company, and an organization based in Cape Town called High Performance Computing makes GPUs available to the community. Yet whatever we do, and however we do it, our goal is to develop speech technology that can be used to improve the lives of the citizens of South Africa.”

Sue Reager specializes in across-language speech communication, applications and context engines. Her innovations are licensed by Cisco Systems, Intel, and telecoms worldwide.