Software to generate description of videos

By siliconindia   |   Monday, 27 July 2009, 20:45 IST
Maryland: Researchers from the University of Maryland and the University of Pennsylvania have developed a computer program that understands video footage and describes it in words. The software reports the action rather than the dialogue and functions as a sort of video-to-text system, reports the Ethiopian Review.

At the IEEE Computer Society's Conference on Computer Vision and Pattern Recognition in Miami Beach, the researchers described the software's ability to draw links among human actions and to understand causal relationships. The software essentially produces a flowchart of AND and OR junctions that accounts for all the possibilities, or story lines, it has encountered. Once the system settles on the most likely story line for a play, it gives a linguistic description of the actions in the video.

Abhinav Gupta, a doctoral candidate at the University of Maryland who was a member of the research team, notes that the system's phrasings sound like those of Frankenstein's monster: it uses only nouns, verbs and prepositions.

To illustrate how the system works, the researchers showed how it analyzed footage of Major League Baseball games. During a learning period, the system watched games that had already been tagged with human-generated captions describing who the players were and what they were doing (pitcher: pitch; batter: no swing; batter: swing-miss; batter: swing-hit-run; fielder: run-catch-throw). Like the average human fan, the system learned the mechanics of the game by watching. Using these narratives, it created a set of hypotheses about the relationships among the actions and measured the correctness of each against the training videos.

Gupta admits that the system would quite likely foul up its description of a play that had not previously appeared in the videos it had analyzed. But he notes that such an instance would immediately push the system back into learning mode. It would add that new wrinkle - say, a wild pitch or a balk - to its flowchart and instantly recognize it the next time it happens.

Describing the significance of the system, Gupta says it would also be useful for video surveillance of places like airports and traffic intersections. Incidents could be tagged as "unusual/suspicious" for later review by law enforcement and brought to the attention of security personnel, all without relying on humans to keep their eyes on video screens and remain attentive.
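To make the approach concrete, the sketch below is a hypothetical, simplified illustration of the idea described above: learning story lines from captioned training clips, selecting the most likely story line for new footage, and emitting a terse noun-and-verb description. It is not the researchers' system, and all data, scoring choices and function names are invented for illustration.

```python
# Hypothetical sketch, not the researchers' code: learn story lines (sequences of
# actor/action tags) from captioned clips, pick the most likely story line for a
# new clip, and emit a terse noun-verb description, Frankenstein-monster style.

from collections import Counter

# Invented training captions, in the spirit of the article's baseball tags
# ("pitcher: pitch; batter: swing-hit-run; fielder: run-catch-throw").
training_clips = [
    [("pitcher", "pitch"), ("batter", "no-swing")],
    [("pitcher", "pitch"), ("batter", "swing-miss")],
    [("pitcher", "pitch"), ("batter", "swing-hit-run"), ("fielder", "run-catch-throw")],
    [("pitcher", "pitch"), ("batter", "swing-hit-run"), ("fielder", "run-catch-throw")],
]

# "Learning": count how often each complete story line appears. The real system
# builds an AND/OR graph over actions and estimates likelihoods from video features.
story_counts = Counter(tuple(clip) for clip in training_clips)


def most_likely_story(observed):
    """Return the stored story line that best matches the observed action tags,
    scored by tag overlap with a small prior toward frequently seen story lines."""
    observed_set = set(observed)
    best, best_score = None, float("-inf")
    for story, count in story_counts.items():
        score = len(observed_set & set(story)) + 0.1 * count
        if score > best_score:
            best, best_score = story, score
    return best


def describe(story):
    """Generate a bare-bones description using only nouns and verbs."""
    if story is None:
        return "unknown activity"  # an unseen play would push the system back into learning mode
    return ". ".join(f"{actor} {action.replace('-', ' ')}" for actor, action in story) + "."


# Example: a new clip in which only two of the actions were recognized.
observed_actions = [("pitcher", "pitch"), ("batter", "swing-hit-run")]
print(describe(most_likely_story(observed_actions)))
# -> "pitcher pitch. batter swing hit run. fielder run catch throw."
```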