Detailed Video Understanding

Monday 13 February 2023
Niels Bohrweg 1
2333 CA Leiden

Detailed, human-level understanding of motions, state changes and actions in videos is essential for many human-centric applications of intelligent agents. Examples include reminding an early-stage Alzheimer’s patient how to make their coffee, identifying when you’re building your flat-pack furniture incorrectly, or giving a trainee surgeon automated feedback that their suturing should be gentler. To achieve such detailed video understanding, it is crucial to learn from limited labelled data, because the more fine-grained a task is, the more difficult and time-consuming it is to obtain comprehensively labelled data. In this talk, I will discuss my recent research towards understanding subtle details in videos and reducing the labelled data requirement for video understanding. This includes work on recognizing adverbs, learning self-supervised video representations and identifying actions across different domains.

