Researchers create an algorithm that makes highlight reels from long videos

Researchers from Carnegie Mellon University have developed an algorithm that can watch videos and condense them into shorter highlight reels, or trailers, containing only the good parts. Called LiveLight, the algorithm watches a long video, catalogs what happens in it, and ignores the parts that repeat what it has already seen. The resulting synopsis features only the segments that are visually novel compared with the rest of the video.
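
To make the idea of "visual novelty" concrete, here is a minimal Python sketch. It is not LiveLight's published algorithm. It swaps in a simple nearest-neighbor test, assumes each video segment has already been reduced to a numeric feature vector, and uses a hypothetical `novel_segments` function with a hand-picked `threshold`. A segment makes the cut only if it is far from everything cataloged before it.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def novel_segments(features: List[Sequence[float]], threshold: float) -> List[int]:
    """Return indices of segments that look new compared with earlier ones.

    Assumption: each segment is already summarized as a feature vector
    (computed elsewhere, e.g. from color or motion statistics). A segment is
    kept when its distance to every previously cataloged segment exceeds
    `threshold`; otherwise it is treated as repetitive and skipped.
    """
    catalog: List[Sequence[float]] = []  # every segment "seen" so far
    keep: List[int] = []
    for i, f in enumerate(features):
        # Distance to the closest thing already seen (infinite if nothing yet).
        nearest = min((euclidean(f, c) for c in catalog), default=math.inf)
        if nearest > threshold:
            keep.append(i)  # visually novel: goes into the highlight reel
        catalog.append(f)  # catalog every segment, novel or not
    return keep


segments = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [0.0, 0.1], [9.0, 0.0]]
print(novel_segments(segments, threshold=1.0))  # [0, 2, 5]
```

With `threshold=1.0`, the first segment is always kept, near-duplicates of earlier segments are dropped, and only segments that differ sharply from everything before them survive. A real system would also need a feature extractor and some way to tune the threshold for each video.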

One practical application would be automatically cutting long GoPro, personal or surveillance camera footage down to watchable trailers. Instead of sending family members the entire video of a child's birthday party, new parents, even those without editing skills, could send a highlight reel of when the cake caught fire, the cat attacked a toddler and everyone finally got around to singing "Happy Birthday." Likewise, people who watch NASCAR races for the crashes could skip all that driving in circles.

LiveLight could save viewers time, and if the videos are ultimately hosted somewhere on the web, the smaller files would also reduce data consumption.

LiveLight's creators have founded a startup called PanOptus to commercialize their method and, presumably, to speed it up. LiveLight can take a long time to run on a personal computer (1 to 2 hours to cut down an hour of video), but as part of a cloud-based service it could be sped up with parallel processing and could possibly score individual videos against a larger crowdsourced library of events.

LiveLight isn't the first attempt to deal with the extraneous footage common in non-professional videos, however. Last year, a group of researchers from the University of Texas developed a method for automatically summarizing the themes of videos based on an analysis of the objects that appear in them. That team suggested its work could improve video search by showing searchers a snippet that captures the gist of the whole (for example, a person making a cup of tea, or someone demonstrating proper weightlifting form).

Dropcam, the connected-camera startup recently acquired by Google, and other companies (especially in the home-security space) are also working on techniques to make their cameras more useful to customers. They want their systems to learn what's normal and what's anomalous, such as the difference between a person approaching the front door and a stray dog approaching it, so that users see only the alerts that might actually be significant.
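
One common way to "learn what's normal" is to keep running statistics for a camera and flag only readings far outside them. The following Python sketch is a hypothetical illustration, not Dropcam's system. It assumes each frame has already been reduced to a single activity score (for example, the fraction of pixels that changed), and the names `alert_frames`, `k` and `warmup` are invented for this example. It uses Welford's online algorithm to maintain the running mean and variance.

```python
import math
from typing import Iterable, List


def alert_frames(scores: Iterable[float], k: float = 3.0, warmup: int = 5) -> List[int]:
    """Flag frames whose activity score is unusually high for this camera.

    Hypothetical sketch: keeps a running mean and sample variance (Welford's
    method) of a per-frame activity score and raises an alert only when a
    score sits more than `k` standard deviations above the running mean.
    The first `warmup` ordinary frames are used only to learn the baseline.
    """
    n = 0        # number of frames folded into the baseline
    mean = 0.0   # running mean of ordinary frames
    m2 = 0.0     # running sum of squared deviations from the mean
    alerts: List[int] = []
    for i, x in enumerate(scores):
        if n >= warmup:
            std = math.sqrt(m2 / (n - 1)) if n > 1 else 0.0
            if x > mean + k * std:
                alerts.append(i)
                continue  # keep anomalies out of the "normal" baseline
        # Welford update: fold this ordinary frame into the baseline.
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return alerts


scores = [1.0, 1.2, 0.9, 1.1, 1.0, 1.05, 8.0, 0.95, 1.1]
print(alert_frames(scores))  # [6]
```

Anomalous frames are deliberately left out of the baseline, so a burst of activity doesn't teach the camera that bursts are normal. Telling a person from a stray dog would require an object classifier on top of this; a score threshold alone can't make that distinction.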

A sample of Activity Recognition working in the Dropcam office.

The LiveLight research paper is being presented this week at the Conference on Computer Vision and Pattern Recognition, which also features numerous other papers on new methods for analyzing video. (As the name implies, the conference covers research across all aspects of computer vision, including work on artificial intelligence and deep learning.)

All this research into video analysis will only become more important as the amount of video we shoot, upload and view continues to skyrocket. Unlike text, or even photos, videos tend to be long and contain a lot of information that isn't easily labeled or discernible from a glance at a thumbnail or a few random samples. Companies, cops and consumers awash in video but short on people to watch every second will need all the help they can get to figure out which videos (or which parts of them) they really need to see.