Speech Technology is the topic of Conversation
Nagendra Kumar Goel
Tuesday, October 20, 2015
Being able to talk to your computer as if it were human has always been considered the ideal computer user experience. Many saw the Star Trek computer as that ultimate computer experience. You talk to the computer; it knows who you are and understands the topic of what you are asking it. It responds with personalized information, asks additional questions to yield optimal information, or performs a specific task. This type of user experience can speed search and input processes in a natural, humanized way. To further engage the user, the text-to-speech voice has a personality that makes the experience fun. This is the next generation of user interface, commonly referred to as a Conversational User Interface (CUI).

While speech technology has been around for over half a century, in the last decade the accuracy of these systems has inched up enough to provide an enhanced user experience. Speech recognition is regularly used for transcription, interactive voice response, auto command and control, and information searching on mobile devices. Increases in processing power, storage, cloud computing, and recent developments in deep learning have all contributed to these gains.

Today's technology can easily be added to web and mobile applications to input information and control the application. On smaller devices, audio is streamed over the Internet to servers that process the audio and convert it into text. Yet the process is so seamless that the user doesn't notice the role of the cloud technology. When combined with natural language processing, text can be converted to structured data. For example, patients can be interviewed by virtual physicians and nurses about their medical problems. The problems can be spoken by patients or physicians and codified into clinical concepts for diagnosis and analytics.
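
To make the idea of codifying spoken problems concrete, here is a minimal sketch in Python. It assumes the audio has already been transcribed to text, and it uses a tiny hand-written lexicon in place of real natural language processing. The names LEXICON and codify and the concept labels are hypothetical illustrations, not real clinical codes or any product's API.

```python
import re
from typing import Dict, List

# Hypothetical phrase-to-concept lexicon. The labels are illustrative only
# and are not taken from any real clinical coding system.
LEXICON: Dict[str, str] = {
    "headache": "SYMPTOM_HEADACHE",
    "chest pain": "SYMPTOM_CHEST_PAIN",
    "short of breath": "SYMPTOM_DYSPNEA",
    "fever": "SYMPTOM_FEVER",
}


def codify(transcript: str) -> List[Dict[str, str]]:
    """Turn a transcript into structured records, in order of first mention."""
    text = transcript.lower()
    found = []
    for phrase, concept in LEXICON.items():
        match = re.search(rf"\b{re.escape(phrase)}\b", text)
        if match:
            found.append((match.start(), {"phrase": phrase, "concept": concept}))
    # Sort by position in the transcript so the output follows the speaker.
    return [record for _, record in sorted(found, key=lambda item: item[0])]


records = codify("I have had a fever and chest pain since Monday")
```

A real system would also need to handle negation ("no fever"), synonyms, and recognition errors, none of which this sketch attempts.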

Smartphones and tablets have stimulated new demand for next-generation speech recognition in the form of CUIs. These devices are designed for mobile speech input and benefit tremendously from speech technology, because typing on phones and tablets is difficult. Apple's Siri, Google's speech recognition and Microsoft's Cortana are the first-generation CUIs. These applications have been embraced by the average consumer as a new way to interact with their devices. The next phase will extend the same concept to industry-specific mobile applications.

As speech evolves into a conversational interface, the system becomes more intelligent and humanized. Natural language processing technology and unstructured information management software are combined with speech recognition and text to speech to produce a conversational interface. Unstructured information management is used to store personal and topic-specific information, which is the core knowledge base used to build the intelligent dialog. Mobile and Internet applications are capturing more personalized information about the user, her location, the context, and the subject matter. This information helps build a profile of the user that the CUI can access to deliver information that makes sense in context, even when the question is ambiguous. The CUI knows when more information is needed, so it asks an intelligent question to capture the missing details before delivering the best answer or performing the proper task. Clearly, the user interface of speech-enabled applications needs to be built from a user-centric perspective rather than from that of the technology, data structures or algorithms.

Domain and context knowledge play a critical role in the performance of these systems. Knowledge of the user's location helps prune unlikely but similar-sounding destinations from a map search. Knowledge about the popularity of web searches plays an important role in understanding spoken web queries. Many of the CUIs commonly seen today lack conversational context. Consider the example of making an appointment. The user may say "I want an appointment with Dr. Dillon tomorrow". Through NLP analysis, the doctor and date are now known. The system now needs to find out how busy the doctor is. If the doctor is very busy, it should respond with a negative or with a limited number of choices; if the doctor is not busy at all, it should narrow down the patient's preferences first. This is where a good CUI starts deviating from the traditional click-and-type interface. Eventually this CUI can be mapped to a form in which certain fields need to be filled in to narrow down to a particular appointment that is available. The technology for such applications is not new, but the advancements that make usable implementations possible are recent, which leaves the playing field wide open for a plethora of domain-specific CUI applications that will become a reality in the coming years.
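
The appointment example can be sketched as a small form-filling dialog in Python. This is an illustration under stated assumptions, not a description of any shipping system: AppointmentForm, fill_slots, respond, PROMPTS and the threshold few are hypothetical names, a pair of regular expressions stands in for real NLP analysis, and the list of open slots is assumed to come from the doctor's calendar.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AppointmentForm:
    """Hypothetical form whose fields (slots) the dialog must fill."""
    doctor: Optional[str] = None
    date: Optional[str] = None
    time_of_day: Optional[str] = None


# Question to ask for each slot that is still empty.
PROMPTS = {
    "doctor": "Which doctor would you like to see?",
    "date": "Which day works for you?",
    "time_of_day": "Do you prefer morning or afternoon?",
}


def fill_slots(text: str, form: AppointmentForm) -> AppointmentForm:
    """Toy stand-in for NLP analysis: copy recognized values into the form."""
    match = re.search(r"\bDr\.?\s+([A-Z][a-z]+)", text)
    if match:
        form.doctor = match.group(1)
    lowered = text.lower()
    for day in ("today", "tomorrow"):
        if re.search(rf"\b{day}\b", lowered):
            form.date = day
    for part in ("morning", "afternoon"):
        if re.search(rf"\b{part}\b", lowered):
            form.time_of_day = part
    return form


def respond(form: AppointmentForm, open_slots: List[str], few: int = 3) -> str:
    """Choose the next system turn from the form state and the open slots."""
    for slot in ("doctor", "date"):
        if getattr(form, slot) is None:
            return PROMPTS[slot]
    if not open_slots:
        return f"Dr. {form.doctor} has no openings {form.date}."
    if len(open_slots) <= few:
        # Busy doctor: offer the few remaining choices directly.
        return f"Dr. {form.doctor} is available {form.date} at: " + ", ".join(open_slots)
    if form.time_of_day is None:
        # Many openings: narrow down the patient's preferences first.
        return PROMPTS["time_of_day"]
    return f"Looking for a {form.time_of_day} opening with Dr. {form.doctor} {form.date}."


form = fill_slots("I want an appointment with Dr. Dillon tomorrow", AppointmentForm())
reply_busy = respond(form, ["9:00", "15:30"])
reply_free = respond(form, ["9:00", "10:00", "11:00", "14:00"])
```

With only two openings the sketch offers them immediately, while with four it first asks for a time-of-day preference. That branch is the point at which the conversation departs from a fixed click-and-type form.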

A mechanic would find it very useful to have an application that makes repair manuals and datasheets easily accessible in a hands-busy, eyes-busy scenario. A doctor's office would love to have a conversational assistant that collects patients' problems prior to the visit, or educates patients and helps them stay compliant with the treatment plan after the visit. This type of solution can reduce the cost of care by reducing the time it takes a human to interview the patient and to write up the problem and the history of present illness. The diagnostic process may have fewer errors and perhaps be more comprehensive. Patients may reveal more information than they would when speaking with a real person, because they are less inhibited in the process. Best of all, this process can be performed wherever the patient happens to be at the time, as long as they have their mobile phone with them.

Several other new applications of speech technology have also emerged. Conversion to text allows for large-scale data mining of audio archives. In this era of identity theft, voice biometrics is starting to play a role of growing importance. Language, accent, gender and emotion detection are finding wide-ranging applications, from connecting a caller to the best-suited agent in a call center to identifying a national security threat. Students learn new languages and correct pronunciation of words using speech-enabled software. Elderly people with hearing disabilities can communicate better with simple mobile speech to text. Next time you develop a mobile application, evaluate how a CUI can make your application stand out as the best user experience.