Game and Media Technology Master's Programme, Department of Information and Computing Sciences
Contact email: e.ursu [at] students [dot] uu [dot] nl
This is a thesis project carried out by master's student Elena Ursu under the supervision of dr. Robby T. Tan and dr. ir. Nico van der Aa. The topic of the project is automatically tracking a person through a sequence of video frames. We developed an improved system built upon the method of Ramanan et al., which models a person's body configuration as a pictorial structure (Felzenszwalb et al.). The system first analyses all the frames of a video to find a specific pose, from which it learns the appearance of the person to be tracked. It then processes the video to detect the person in any possible pose. We analysed the robustness of the original method by comparing its pose estimates with labelled ground truth. We then extended the original method with temporal information, using two different types of motion models, which improved the tracking results. According to our qualitative evaluation of side-by-side tracking sequences, the extensions produce more stable and accurate detections over time and resolve some challenging situations that arise when the motion is fast or when body parts resemble each other. The main remaining problem is improving the detection of the arms.
Pose Estimation in Video
The system consists of two modules: a model building module (walking pose detector) and a detection module (general pose detector).
The detection module takes a sequence of frames and the learnt appearance models from the model building module and looks for a general human pose in each frame. As our contribution, we implemented two inference algorithms on graphical models that also include temporal information, through connections (motion models) to nodes representing the previous detections. Our addition to the system is shown in green in the figure below.
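To make the idea of a motion model concrete, the following is a minimal sketch of how a previous detection could influence the choice of a body part location in the current frame. It is not the inference algorithm used in the thesis: it treats a single part in isolation and assumes a simple quadratic penalty on the distance to the previous detection. The names `Candidate`, `Point`, `temporalScore`, `selectCandidate` and the weight `lambda` are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical candidate location of one body part in the current frame.
struct Candidate {
    double x;
    double y;
    double appearanceScore;  // higher means a better match to the learnt appearance
};

// Hypothetical location of the same part detected in the previous frame.
struct Point {
    double x;
    double y;
};

// Assumed scoring rule: appearance score minus a quadratic motion penalty.
// lambda controls how strongly large displacements are discouraged.
double temporalScore(const Candidate& c, const Point& prev, double lambda) {
    const double dx = c.x - prev.x;
    const double dy = c.y - prev.y;
    return c.appearanceScore - lambda * (dx * dx + dy * dy);
}

// Returns the index of the candidate with the highest combined score.
std::size_t selectCandidate(const std::vector<Candidate>& candidates,
                            const Point& prev, double lambda) {
    assert(!candidates.empty());
    std::size_t best = 0;
    double bestScore = temporalScore(candidates[0], prev, lambda);
    for (std::size_t i = 1; i < candidates.size(); ++i) {
        const double score = temporalScore(candidates[i], prev, lambda);
        if (score > bestScore) {
            bestScore = score;
            best = i;
        }
    }
    return best;
}
```

With `lambda` set to zero the choice depends on appearance alone, so a distant look-alike part can win. A positive `lambda` favours candidates near the previous detection, which is the kind of ambiguity that temporal connections are meant to reduce when body parts resemble each other.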
We show a demo video obtained through our implementation of the tracking system, as well as examples of sequences where our system outperforms the basic method. The implementation was done in C++, using OpenCV.
Original video taken from https://www.youtube.com/watch?v=bbYQsp9BFZ0.