Robot Reality
Wednesday, May 1, 2002
Deb Roy maintains that his work as director of the Cognitive Machines Group at the Massachusetts Institute of Technology’s Media Lab is just more of the same thing he started doing as a four-year-old in Winnipeg, Canada.

“My childhood was filled with building robots and taking apart anything which moved or needed batteries,” Roy says. “It’s fortunate that I was blessed with parents who never got tired of my asking ‘why?’ all the time.”

Nowadays, of course, the robots that Roy and his MIT research group are creating have grown much more sophisticated than those he fashioned as an inquisitive boy. And the questions that Roy is asking today are about as big as any, considering their implications.

That’s because Roy wants to create machines that will speak and understand natural spoken language, and be genuinely able to “communicate on human terms,” he says. For this, Roy believes that what’s required are not machines that mindlessly function as assemblages of voice-recognition algorithms, but robots that have actually acquired their use of language in much the same way human beings do: by learning words’ meanings through interaction with the physical world around them.

Isn’t the creation of such machines a highly ambitious goal — one that many smart people have attempted and failed to achieve?

“If you want to create a robot that uses language as humans do, I see no way but to endow it with the same sorts of sensory-motor grounding and goal-pursuit processes that we ourselves possess,” Roy argues. “For instance, you and I can comprehend words’ dictionary definitions only because we already know a basic set of words from our experience. If you look up words like ‘not’ or ‘heavy’ or ‘I,’ you’ll find definitions using more complicated words than the words they’re trying to define. Then, if you look up what each of those more complicated words means, and then go on to look up the definitions of those words, you’ll soon find yourself caught in infinite, meaningless loops. Because all any dictionary can do is use words to describe words. On the other hand, for a word like ‘blue,’ the definition might say, ‘The color of the sky on a clear day.’ So there we do have an explicit pointer beyond the world of words and into the non-symbolic physical world.”
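Roy’s dictionary argument can be made concrete with a toy example. The sketch below (a hypothetical mini-dictionary and cycle-tracer, purely illustrative and not anything built by Roy’s group) follows definitions word by word and shows how a purely symbolic dictionary circles back on itself:

```python
# A toy dictionary in which every word is defined only by other words.
# Entries and code are illustrative assumptions, not Roy's system.
toy_dictionary = {
    "heavy": ["having", "great", "weight"],
    "weight": ["the", "heaviness", "of", "a", "thing"],
    "heaviness": ["the", "quality", "of", "being", "heavy"],
}

def trace_definition(word, seen=None):
    """Follow definitions word by word, reporting when a word leads back to itself."""
    seen = set() if seen is None else seen
    if word in seen:
        print(f"Loop: '{word}' is ultimately defined in terms of itself.")
        return
    seen.add(word)
    for w in toy_dictionary.get(word, []):
        if w in toy_dictionary:  # only recurse into words the dictionary itself defines
            trace_definition(w, seen)

trace_definition("heavy")
# Prints: Loop: 'heavy' is ultimately defined in terms of itself.
# Breaking the loop requires a pointer out of the symbol system, as in
# "blue: the color of the sky on a clear day," which is grounded in vision.
```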

This question of how the meanings of basic words like “blue” are established is one that Roy’s research group at MIT is concentrating on. Roy believes that linking any purportedly cognitive machine to the physical world via sensors and actuators is indispensable.

“No human being born with all his or her perceptual systems damaged would be likely to develop concepts and language,” Roy says. “Why should we expect computers to? The higher levels of knowledge which language makes use of are grounded in these physical conceptions of the world that arise out of having direct sensory feedback. Furthermore, this sensory-motor grounding isn’t the whole story. People also need to have drives, goals, desires and self-reflective mechanisms in order to function. So, as I say, I see no way around endowing a machine with these same sorts of mechanisms if we are to enable machines that we can talk to on human terms. Imagine trying to talk to a person who has no goals. How would they understand words like ‘want’ or ‘good’ or ‘bad?’”

In keeping with all this, one robotic manipulator built by Roy and his group has been given an ability to experiment with the meaning of verbs, along with stereo color vision, an artificial vestibular system to supply a sense of gravity, a sense of touch, and a set of proprioceptive sensors allowing the robot to feel strain in each of its joints, as human beings do.

“This sort of thing is very important for our goal,” Roy says, “since we want to be able to physically play and interact with the robot as if it were a natural creature. Also, we’ve chosen to develop — entirely by ourselves — both the robot’s physical components and all the machine learning algorithms that drive the robot’s automatic language acquisition from its human trainers. Typically, when research requires multiple complex components, a research group will tend to have expertise in one area and will take the other components off the shelf. This usually ends up constraining the work. We wanted to avoid that.”

Already, Roy and his research group have created robots that can learn the meanings of visually grounded words (words describing shapes, colors, and spatial relationships) via “show-and-tell” training. They can also understand reasonably complex grammars. Indeed, in one experiment a robot called Toco learned directly from mothers’ spontaneous speech directed at their infants.

“This is significant,” Roy says. “It demonstrates the ability of the algorithms we’ve developed to segment speech, and to link words to visual categories given the sort of data that an infant would see and hear. If we can do that, then there’s much more to come.”
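The flavor of linking words to visual categories can be suggested with a much-simplified cross-situational learning sketch. The episodes, category labels, and scoring below are hypothetical stand-ins; the group’s actual systems work from raw audio and camera input with far more sophisticated models:

```python
from collections import defaultdict

# Minimal sketch of cross-situational word learning: count how often each word
# co-occurs with each visual category across "show-and-tell" episodes, then
# keep the pairing with the strongest association. Data and scoring are
# invented for illustration only.
episodes = [
    ("look at the red ball", {"red", "round"}),
    ("a red cup", {"red", "cup-shaped"}),
    ("the blue ball", {"blue", "round"}),
    ("see the blue cup", {"blue", "cup-shaped"}),
]

pair_counts = defaultdict(int)   # (word, visual category) -> co-occurrence count
word_counts = defaultdict(int)   # word -> number of episodes it appears in

for utterance, scene in episodes:
    for word in set(utterance.split()):
        word_counts[word] += 1
        for category in scene:
            pair_counts[(word, category)] += 1

# A word is "grounded" in the category it co-occurs with most reliably.
for word in ("red", "blue", "ball", "cup", "the"):
    best = max(
        ((cat, pair_counts[(word, cat)] / word_counts[word])
         for cat in {"red", "blue", "round", "cup-shaped"}),
        key=lambda x: x[1],
    )
    print(word, "->", best)
```

In this toy run, content words like “red” and “ball” lock onto a single category, while a function word like “the” co-occurs weakly with everything, which hints at why real systems need more than raw co-occurrence counting.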

Like what? Well, one application Roy sees is the creation of computers for the billion or so illiterate people worldwide.

“If we can create a machine able to learn language in human ways — by having somebody interact with it as they would with a child — we should then be able to package that automatic language-learner in a piece of software and float it out on the Internet,” Roy says. “After that, any linguistic community that is willing to teach the machine would end up with local-language versions of its speech recognizers, synthesizers and the rest of it.”

He adds, “With these sorts of speech technologies we’re working on here at MIT, all the functionally illiterate, preliterate and visually impaired people all over the planet might ultimately gain access to textual information.”
