We will Do and We will Hear

About 30 years ago, the supposedly simple function of voice recognition required IBM to develop a dedicated computer, one that could cope with the required vocabulary and respond quickly enough.
Today, every cell phone with an internet connection can use voice to control the device, to carry out a Google search, or to ask Siri to add an appointment to the calendar.
Over the past 30 years, voice recognition has come a long way: the available vocabulary has grown immensely, and the ability to recognize the human voice has improved remarkably. The technology is still not perfect, but what IBM developed at the time as a product for businesses is now within reach of every one of us.
Microsoft first displayed voice recognition as part of its Vista operating system. The demonstration went smoothly when it came to commands but, as many will remember, everything went wrong during the dictation demo.
In the cellular sphere, it was Apple that decided to use voice to create a “digital assistant,” one that interfaces with dozens of varied services; apart from recognizing human speech, it also understands “intent” and responds accordingly.
Google’s range of services (dozens of them) allows the company to build an extensive database in multiple languages. This is reflected in the Google Now service which, in its new Android version, offers information at the flick of a finger and, on its own initiative, pushes data and action options in keeping with the user’s location and the time.
Microsoft appears confident in the capabilities of its natural interface, thanks to the success of Kinect, and so it has included a full voice interface in its new generation of Xbox.
In the area of text-to-speech, in which the computer reads text aloud, significant progress has also been made.
Nuance, which develops voice recognition applications and is behind Siri’s voice, is also investigating the various characteristics of voice, among them pronunciation, speed, and intonation. The knowledge accumulated from analyzing these characteristics allows the creation of an almost natural-sounding voice, certainly more natural than the monotone, metallic-sounding voices of the not-so-distant past.
Microsoft, Google, IBM, Nuance and others are trying to integrate artificial intelligence into these services, mainly through algorithms that allow the computer to “think deeply” and understand context and fine distinctions of meaning. One thing is clear about this field: the last word has not yet been spoken.