October - 2015 - issue > CEO Insight

Speech Technology is the topic of Conversation

Nagendra Kumar Goel

CEO-GoVivace

Tuesday, October 20, 2015


Being able to talk to your computer as if it were human has always been considered the ideal computer user experience. Many saw the Star Trek computer as that ultimate computer experience. You talk to the computer; it knows who you are and understands the topic of what you are asking it. It responds with personalized information, asks additional questions to yield optimal information, or performs a specific task. This type of user experience can speed search and input processes in a natural, humanized way. To further engage the user, the text-to-speech voice has a personality that makes the experience fun. This is the next generation of user interface, commonly referred to as a Conversational User Interface (CUI).

While speech technology has been around for over half a century, in the last decade the accuracies of these systems have inched up to provide an enhanced user experience. Speech recognition is regularly used for transcription, interactive voice response, auto command and control, and information searching on mobile devices. Increases in processing power, storage, cloud computing, and recent developments in deep learning have all contributed to this.

Today's technology can be easily added to web and mobile applications to input information and control the application. On smaller devices, audio is streamed over the Internet to servers that process the audio and convert it into text. Yet the process is so seamless that the user doesn't notice the role of the cloud technology. When combined with natural language processing, text can be converted to structured data. For example, patients are interviewed by virtual physicians and nurses about their medical problems. The problems can be spoken by patients or physicians and codified into clinical concepts for diagnosis and analytics.
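As a rough illustration of turning recognized text into structured data, the toy sketch below looks up known phrases in a transcript and emits one record per match. The phrase table, the concept labels, and the `codify` function are all hypothetical; a production system would rely on a clinical vocabulary and trained natural language processing rather than a hand-written dictionary.

```python
import re

# Hypothetical phrase-to-concept table, for illustration only.
CONCEPTS = {
    "headache": "SYMPTOM_HEADACHE",
    "fever": "SYMPTOM_FEVER",
    "chest pain": "SYMPTOM_CHEST_PAIN",
}


def codify(transcript):
    """Return one structured record per known phrase found in the transcript."""
    text = transcript.lower()
    records = []
    for phrase, concept in CONCEPTS.items():
        # Match whole words only, so "fever" does not match "feverish".
        match = re.search(r"\b" + re.escape(phrase) + r"\b", text)
        if match:
            records.append(
                {"concept": concept, "phrase": phrase, "offset": match.start()}
            )
    # Order the records by where each phrase occurs in the transcript.
    return sorted(records, key=lambda r: r["offset"])


if __name__ == "__main__":
    for record in codify("I have had a fever and chest pain since Monday"):
        print(record)
```

Keeping the character offset with each record lets a downstream tool trace a coded concept back to the words the patient actually spoke.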

Smartphones and tablets have stimulated new demand for next-generation speech recognition in the form of CUIs. These devices are designed for mobile speech input and benefit tremendously from speech technology, because typing on phones and tablets is difficult. Apple's Siri, Google's speech recognition and Microsoft's Cortana are the first-generation CUIs. These applications have been embraced by the average consumer as a new way to interact with their devices. The next phase will extend the same concept to industry-specific mobile applications.

As speech evolves to a conversational interface, the system becomes more intelligent and humanized. Natural language processing technology and unstructured information management software are combined with speech recognition and text to speech to produce a conversational interface. Unstructured information management is used to store personal and topic-specific information, which is the core knowledge base used to build the intelligent dialog. Mobile and Internet applications are capturing more personalized information about the user, her location, the context, and subject matter. This information helps build a profile of the user that can be accessed by the CUI to deliver the information that makes sense given the context, despite an ambiguous question. The CUI knows when more information is needed, so that an intelligent question is asked to capture complete information to deliver the best answer or to perform the proper task. Clearly, the user interface of speech-enabled applications needs to be built from a user-centric perspective rather than from that of the technology, data structure or algorithms.
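The dialog behavior described above, where stored profile information resolves an ambiguous request and the system asks a follow-up question only when something is still missing, can be sketched in a few lines. This is a minimal illustration under assumed names: the `REQUIRED_SLOTS` tuple, the dictionary layout, and the `next_action` function are hypothetical and not drawn from any particular CUI product.

```python
# Hypothetical pieces of information the system needs before it can answer.
REQUIRED_SLOTS = ("topic", "location")


def next_action(request, profile):
    """Decide whether to answer now or ask the user a follow-up question."""
    filled = dict(request)
    # Fill any slot the user left out from the stored user profile.
    for slot in REQUIRED_SLOTS:
        if filled.get(slot) is None and profile.get(slot) is not None:
            filled[slot] = profile[slot]
    # Anything still missing must be requested from the user.
    missing = [slot for slot in REQUIRED_SLOTS if filled.get(slot) is None]
    if missing:
        return {"action": "ask", "slot": missing[0]}
    return {"action": "answer", "slots": filled}


if __name__ == "__main__":
    # The profile supplies the location, so no question is needed.
    print(next_action({"topic": "weather"}, {"location": "Boston"}))
    # Nothing is known about the location, so the system asks for it.
    print(next_action({"topic": "weather"}, {}))
```

The design choice worth noting is that the profile is consulted before the user is asked anything, which keeps the conversation short when the context already makes the request unambiguous.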