In a blog post, Google announced the general availability of Cloud Text-to-Speech. The speech synthesis service, offered over the Internet, now covers 14 languages, with Google counting American, British and Australian English as separate languages. The choice of voices has been expanded to 24 using the WaveNet neural network. The technology, developed by London-based DeepMind, analyzes audio recordings of real human speakers to make the synthesized speech sound more natural.
Google is also expanding its Cloud Speech-to-Text offering. To transcribe a recording of two people talking to each other by phone, the service simply uses the separate audio channels to assign the text to the respective speaker. For recordings of conferences, for example, users can use the programming interface (API) to tell the system how many participants there are. Cloud Speech-to-Text can then distinguish the voices more and more reliably over the course of the conversation and update the assignments. Google has also added automatic detection of the spoken language to the feature set.
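To illustrate how these three options might be expressed in a request, here is a minimal sketch in Python that only assembles a configuration dictionary and does not call the service. The helper name `build_recognition_config` is hypothetical, and the field names (`audioChannelCount`, `enableSeparateRecognitionPerChannel`, `enableSpeakerDiarization`, `diarizationSpeakerCount`, `alternativeLanguageCodes`) are assumptions modeled on the beta REST interface of Cloud Speech-to-Text; they do not come from the announcement and should be checked against the current API reference.

```python
import json


def build_recognition_config(language_code, channels=None,
                             speaker_count=None, alternative_languages=()):
    """Assemble a recognition config as a plain dict (hypothetical helper).

    Field names are assumed from the beta REST interface and may differ
    in the current API version.
    """
    config = {"languageCode": language_code}

    # Phone call: one speaker per audio channel, transcribed separately.
    if channels is not None and channels > 1:
        config["audioChannelCount"] = channels
        config["enableSeparateRecognitionPerChannel"] = True

    # Conference: tell the service how many participants to expect.
    if speaker_count is not None:
        config["enableSpeakerDiarization"] = True
        config["diarizationSpeakerCount"] = speaker_count

    # Language detection: candidate languages besides the primary one.
    if alternative_languages:
        config["alternativeLanguageCodes"] = list(alternative_languages)

    return config


phone_call = build_recognition_config("en-US", channels=2)
conference = build_recognition_config(
    "en-US", speaker_count=4, alternative_languages=["de-DE", "fr-FR"]
)

print(json.dumps(phone_call, indent=2))
print(json.dumps(conference, indent=2))
```

The two example configurations correspond to the phone call and conference scenarios described above; the speaker count of four and the language codes are made-up values for illustration.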