Google Announces General Availability of Cloud Speech-to-Text, Support for New Languages, and More

In a blog post, Google says, “we're making our Cloud Speech-to-Text and Text-to-Speech products more accessible to companies around the world, with more features, more voices (roughly doubled), more languages in more countries (up 50+%), and at lower prices (by up to 50% in some cases).” Last year, Google announced premium models for video and enhanced phone in beta. Those models are now generally available.

Additionally, Google is announcing the general availability of multi-channel recognition, which helps the Cloud Speech-to-Text API distinguish between multiple audio channels (e.g., different people in a conversation). That capability is useful for call or meeting analytics and other use cases involving multiple participants.
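As a rough illustration of what a multi-channel request might look like, the sketch below builds a JSON body for the v1 `speech:recognize` REST method with per-channel recognition turned on. The helper name, the bucket path, the audio encoding, and the sample rate are assumptions made for this example, not details from Google's announcement, and no request is actually sent.

```python
import json


def build_multichannel_request(gcs_uri, channels=2, language_code="en-US"):
    """Build a request body that asks for a separate transcript per channel.

    Hypothetical helper for illustration; the field names follow the v1 REST
    RecognitionConfig, while the encoding and sample rate are assumed values.
    """
    return {
        "config": {
            "encoding": "LINEAR16",        # assumed: uncompressed 16-bit PCM
            "sampleRateHertz": 16000,      # assumed sample rate
            "languageCode": language_code,
            "audioChannelCount": channels,
            # Recognize each channel on its own instead of only the first one.
            "enableSeparateRecognitionPerChannel": True,
        },
        "audio": {"uri": gcs_uri},
    }


# Placeholder path: a two-channel call recording with one speaker per channel.
body = build_multichannel_request("gs://example-bucket/call.wav")
print(json.dumps(body, indent=2))
```

With per-channel recognition enabled, each result in the response carries a `channelTag` identifying the channel it came from, which is what lets an analytics pipeline attribute text to the agent or the caller.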

Availability isn’t the only way Google is hoping to make Cloud Speech-to-Text more accessible to potential customers. It’s also offering discounts. According to the announcement, “For standard models and the premium video model, customers that opt-in to our data logging program that was introduced last year, will now pay 33% less for all usage that goes through the program.” It continues, “We’ve cut pricing for the premium video model by 25%, for a total savings of 50% for current video model customers who opt-in to data [logging].”
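The 50% figure follows from stacking the two discounts rather than adding them: the 33% opt-in discount applies to a price that has already been cut by 25%. A minimal sketch of that arithmetic, using a hypothetical helper name and assuming the discounts compound multiplicatively:

```python
def combined_discount(*cuts):
    """Return the total fractional saving when discounts are applied in sequence."""
    remaining = 1.0
    for cut in cuts:
        remaining *= 1.0 - cut  # each cut applies to the already-reduced price
    return 1.0 - remaining


# 25% price cut on the premium video model, then 33% off for data logging opt-in.
total = combined_discount(0.25, 0.33)
print(total)  # about 0.4975, i.e. roughly half off
```

Simply adding the two percentages would suggest 58%, which overstates the saving.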

If you’re interested in giving the service a whirl, you can use the $300 GCP credit to start testing. The first 60 minutes of audio you process every month with Cloud Speech-to-Text are always free.

Google also announced support for seven new languages or variants, and 31 new WaveNet voices (and 24 new standard voices) across those new languages.
