The keyboard and mouse have dominated computer interaction for decades. Nothing brought home the profound effect these input methods have had on the new generation quite like hearing a friend’s daughter ask, matter-of-factly, “Why do we have five fingers, when the mouse has only two buttons?”
With more than three billion mobile phones in the world today, analysts predict that the number of mobile devices accessing the Internet will cross the one billion mark over the next four years.
With the wealth of information and services available from almost everywhere, Internet-connected mobile devices are reshaping the way we go about our personal and professional lives. According to John Gantz, Chief Research Officer at IDC, “With an explosion in applications for mobile devices underway, the next several years will witness another sea change in the way users interact with the Internet and further blur the lines between personal and professional.”
During the early years of the mobile Internet, the industry focused primarily on fitting existing websites onto small mobile screens through a process called transcoding (translation + encoding). This essentially involved translating the existing HTML pages of websites into Wireless Markup Language (WML) pages, with some image-resizing capability. But the market quickly realized this did little to drive adoption of the mobile Internet, because the phone keypad offered no convenient way to interact with Internet applications; in those days, users had to press each key up to three times to enter the desired letter.
Then came Research In Motion’s (RIM) BlackBerry, a mobile email appliance with a QWERTY keypad that quickly became popular because users could read and reply to email without having to carry bulky laptop computers outside their homes and offices. While many loyal BlackBerry users swear by the convenience of its track-wheel and keypad, that interface could not push mobile Internet use beyond email.
Clearly, the keyboard and mouse have outlived their purpose on the move. After experimenting with foldable keyboards and other accessories, the mobile phone industry had, until recently, settled on sliding keypads for user interaction.
The launch of Apple’s iPhone, with its innovative touch interface and on-screen keypad, revolutionized mobile Internet access, taking it beyond simple email. With over 100,000 applications, the iPhone quickly became more than just a phone: an information-access device, a GPS navigator, a mobile TV, a gaming console and a social-networking tool.
Given how addictive the convenience of anytime, anywhere information access has proved, research efforts around the world have focused on new and innovative interfaces suited to interaction on the move, including projection keyboards, speech recognition, multi-touch and projection screens.
A recent research project at MIT has popularized the idea of combining the mobile phone camera with a mobile projector, letting users project the phone’s screen onto any surface and making that surface touch-sensitive for input.
Research has shown that combining speech and gesture (touch and tap) reduces the time we take to communicate our intent, be it with other humans or with systems. It is less ambiguous and quicker to say, “I want that” (gesturing at the item) than to speak the item’s name and location when you are in a shop. The geek-speak for such natural interaction is multimodality.
Imagine the convenience of listening to your emails, text messages or personalized news stories while driving to work or stuck in traffic, without having to type or take your eyes off the road. Or sharing with your family a picture of the dress you want to buy for your daughter, complete with quick voice notes and annotations, all from your mobile Internet device.
The power of voice search on the Internet has been demonstrated by Google and Vlingo, among others. While combining speech and gesture interfaces makes it very easy for users to interact with mobile applications and peripherals, such interfaces are so complex to implement commercially that only a handful of companies, such as Microsoft, AT&T, Nuance, Openstream, IBM and Kirusa, provide such solutions.
The World Wide Web Consortium (W3C), the apex body that develops web standards, has several groups focused on mobile and voice web interaction. The W3C Multimodal Interaction (MMI) Working Group focuses on standards for combining speech, touch (ink), text and emotion into the user interface (imagine if your phone could detect when you are tired or angry and adjust the presentation and intonation of its output accordingly!).
The W3C MMI Architecture lets application developers combine various modalities of interaction through an Interaction Manager (IM), using asynchronous events among the constituents of the application and separating the application logic from the user interface.
For example, an application can have a speech interface (voice modality) developed in VoiceXML markup, gesture annotations in InkML and the visual application in HTML, all exchanging user input in EMMA (Extensible MultiModal Annotation) markup with the business-logic layer of the application. Markup-based development of interfaces allows the developer community to fully leverage the portability, extensibility and interoperability of the web-application paradigm across multiple mobile platforms and devices, as opposed to developing such interfaces natively on each of those devices and platforms.
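To give a flavour of what such an exchange looks like, here is a minimal, illustrative EMMA fragment representing the spoken request “I want that” from the shopping example; the `emma:*` attributes come from the EMMA standard, while the application-specific elements inside the interpretation (`command`, `action`) are hypothetical names chosen for illustration:

```xml
<emma:emma version="1.0"
    xmlns:emma="http://www.w3.org/2003/04/emma">
  <!-- One interpretation of the user's input, with the recognizer's
       confidence and the raw tokens it heard -->
  <emma:interpretation id="interp1"
      emma:medium="acoustic"
      emma:mode="voice"
      emma:confidence="0.85"
      emma:tokens="I want that">
    <!-- Application-specific payload (hypothetical vocabulary) -->
    <command>
      <action>select</action>
    </command>
  </emma:interpretation>
</emma:emma>
```

A pointing gesture captured in InkML could be wrapped in a second interpretation the same way, letting the Interaction Manager fuse the two inputs into a single intent before handing it to the business logic.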
With mobile phones packing more functionality, such as GPS, cameras, accelerometers, multimedia players, RFID and other sensory capabilities, the applications on these mobile devices will become richer in features than the traditional web applications designed for desktop computers.
The appeal and applicability of multimodal interfaces is not limited to consumer and personal applications. Their use and convenience in the insurance, healthcare, realty, media and entertainment industries is readily apparent: several enterprises have already started incorporating the rich features of multimodality into their field-data-collection applications, using voice form-filling, image and signature capture, map integration with visual annotations, and spoken driving directions on mobile devices.
In the coming months, mobile applications will become increasingly multimodal, allowing users to choose their modes of interaction based on situational needs, with intelligent detection of ambient conditions.
The author is CEO, Openstream