The Smart Techie was renamed Siliconindia India Edition starting Feb 2012 to continue the nearly two decade track record of excellence of our US edition.

Open Sesame

Sankaran Namboothiri
Thursday, September 4, 2008
Most of us cannot forget the wide-eyed feeling we had as children when we heard the story of ‘Ali Baba and his treasure cave’, which opened at a spoken command: “Open Sesame”. Speech recognition technologies open up vistas of exciting enterprise mobility applications. This article briefly examines how far the technology has come, its strengths and limitations, and the possibilities within its current constraints.

Ali Baba is confined to the realm of fairy tales, but a new breed of technologists has been working to make the spoken command work in seemingly magical ways. Today, voice recognition, speech recognition, and semantics are driving forces in several industries. They have opened up a treasure trove of exciting prospects. Shorter response times and round-the-clock availability are redefining how we work and how productive we are. Enterprise mobility has moved from being a social statement to a useful productivity technology. The day is not far off when ROI-driven applications built on voice, speech, and semantics will define the functional contours of enterprises.

The challenges of adaptation are daunting but surmountable. We have made a successful transition from passive IVR applications driven by DTMF (touch-tone) menu structures to dialogue-driven applications that offer a more interactive, personality-injected, and near-natural experience.

Rather than pressing buttons or interacting with a computer screen, users speak to the computer. Because automatic speech recognition returns probabilities, not certainties, handling the uncertainty associated with users’ speech input can be daunting.

The most palpable Achilles’ heel is the potential for misrecognition. No matter how much effort and care is put into developing a piece of speech recognition software, there will always be times when the application misrecognizes user input. For this reason, speech applications need more extensive error handling than other applications. If the confidence score on a specific recognition is low, the system should confirm what the user said, and it may have to ask users to repeat themselves. Sometimes a given user will just not be understood, perhaps because he or she is in a noisy environment. If a speech engine returns low confidence values for the same user several times, it may be imperative to transfer that user to a human agent so the user can complete the transaction.

Speech recognition is also affected by the quality of the input. If a user is calling a system, a bad cell phone connection or overly compressed Internet audio may throw off recognition. Providing for these situations is critical when designing speech recognition applications.
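
The confirm, re-prompt, and escalate policy described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the threshold values, the limit of three low-confidence attempts, and the names `next_action`, `DialogState`, and `Action` are hypothetical and do not come from any particular speech engine, and the confidence score is assumed to be a number between 0 and 1.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ACCEPT = "accept"              # act on the recognized input
    CONFIRM = "confirm"            # ask "Did you say ...?"
    REPROMPT = "reprompt"          # ask the user to repeat
    TRANSFER = "transfer_to_agent" # hand the call to a human agent


@dataclass
class DialogState:
    # Number of consecutive low-confidence recognitions for this caller.
    low_confidence_count: int = 0


# Hypothetical values; a real deployment would tune these per application.
ACCEPT_THRESHOLD = 0.80
CONFIRM_THRESHOLD = 0.50
MAX_LOW_CONFIDENCE = 3


def next_action(confidence: float, state: DialogState) -> Action:
    """Decide how to respond to one recognition result."""
    if confidence >= ACCEPT_THRESHOLD:
        # A confident result resets the caller's failure streak.
        state.low_confidence_count = 0
        return Action.ACCEPT

    # Anything below the accept threshold counts as a low-confidence turn.
    state.low_confidence_count += 1
    if state.low_confidence_count >= MAX_LOW_CONFIDENCE:
        # Repeated trouble: stop retrying and escalate to a person.
        return Action.TRANSFER
    if confidence >= CONFIRM_THRESHOLD:
        # Plausible but uncertain: read the result back for confirmation.
        return Action.CONFIRM
    # Too uncertain to read back: ask the user to say it again.
    return Action.REPROMPT


if __name__ == "__main__":
    state = DialogState()
    for score in (0.9, 0.6, 0.3, 0.4):
        print(score, next_action(score, state).value)
```

Keeping the failure count per caller, rather than per prompt, is one possible design choice; it lets a caller on a noisy line or a poor connection reach a human agent after a few failed attempts instead of looping through the same prompt.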
