In the last few years, the research community has taken another look at using neural nets (a form of machine learning) in speech recognition, with impressive results. The application of neural nets to speech recognition was pioneered in the 1990s, but a lack of speed, memory and storage prevented commercial application. A couple of decades later, the availability of all three has increased beyond recognition. A modestly priced server with a few nVidia GPU cards can pack 30 teraflops of computing capacity (a teraflop is one trillion floating point operations per second), which would have put it among the fastest supercomputers of 2002, machines that cost many millions of dollars.
The leading commercially available speech recognition technology of a few years ago required intricate hand-crafted algorithms and additional per-user training. Today, it’s possible to train a neural net to recognize speech from a wide range of speakers, including ones who are barely understandable to human transcriptionists, and return very accurate transcriptions of what those users said, or intended to say. In the past, these speakers would have been considered poor candidates for real-time speech recognition and would likely have relied on a human transcriptionist to decipher their audio.
The benefit of neural nets to end users is that the accuracy of transcription continues to improve, while the cost for developers to implement speech recognition, and for IT organizations to deploy it, is declining markedly. In other words, speech recognition is rapidly being commoditized. You can see this with consumer-oriented services such as Google Voice Mail, whose accuracy has been steadily improving as Google’s neural nets learn from their users.
At nVoq, we make it our mission to bring you the benefits of this cutting-edge research as soon as we can and to make it very affordable. Since we deploy our speech recognition in the cloud, we are also able to deliver improvements in recognition as continual updates. This creates a virtuous circle: as people use the system, it learns their individual patterns of usage, including new terms and expressions, which in turn helps all users.
An added benefit from neural nets: we at nVoq can now focus more effort on providing truly unique and value-added functionality through our platform in support of workflow productivity, above and beyond the speech recognition element. Capabilities such as voice-less automations bring a whole new set of possibilities to end users that shorten the “yes, I’ll buy” cycle – but more importantly, make “yes, I’ll actually use this” a reality.
Interested in learning more? Contact us!