“When people talk to each other, there are breaks, stutters, hesitations such as ‘uh’ or ‘hmm’, laughs and coughs. Words are often not clearly pronounced,” explains Alex Waibel, professor of computer science at KIT. This makes speech hard for humans to transcribe accurately, and until now it has been even harder for AI. KIT's newly developed computer system now does this better than humans and faster than other systems.

“The recognition of spontaneous language is the most important component in this system, because errors and delays make the translation incomprehensible,” explains Waibel. The new system achieves an error rate of 5.0 percent; humans attain 5.5 percent. Beyond accuracy, speed matters as well, so that students can follow the lecture live. The system delivers its output with a delay of only one second – the lowest latency ever achieved by a speech recognition system of this quality.
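The error rates cited here are presumably word error rates (WER), the standard metric for speech recognition: the number of word substitutions, insertions, and deletions needed to turn the system's transcript into the reference transcript, divided by the reference length. As a rough illustration (not KIT's evaluation code), a minimal WER computation via word-level edit distance might look like this:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + insertions + deletions)
    divided by the number of words in the reference transcript."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # deleting i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # inserting j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,        # deletion
                dp[i][j - 1] + 1,        # insertion
                dp[i - 1][j - 1] + cost, # substitution or match
            )
    return dp[len(ref)][len(hyp)] / len(ref)

# Hypothetical example: the system drops the hesitation "uh",
# one deletion against a six-word reference -> WER of 1/6
print(wer("the model is uh quite fast", "the model is quite fast"))
```

A 5.0 percent WER thus means roughly one word in twenty is transcribed incorrectly, which is why spontaneous, disfluent lecture speech is such a demanding benchmark.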

According to Waibel, a recognition system cannot yet understand content or context on its own. “Here it is exclusively about acoustic recognition under scientifically comparable conditions.” However, dialog, translation, and other AI modules built on this recognition can now enable linguistic interaction faster and with greater accuracy.