Pose Estimation in Video

Elena Ursu

Game and Media Technology Master's Programme, Department of Information and Computing Sciences
Utrecht University

Contact email: e.ursu [at] students [dot] uu [dot] nl


Abstract

This is a thesis project carried out by Master's student Elena Ursu under the supervision of dr. Robby T. Tan and dr. ir. Nico van der Aa. The topic of this project is automatically tracking a person through a sequence of video frames. We developed an improved system built upon the method of Ramanan et al., which models a person's body configuration as a pictorial structure (Felzenszwalb and Huttenlocher). The system first analyses all the frames of a video to find a specific pose, from which it learns the appearance of the person to be tracked. It then processes the video to detect the person in any pose. We analysed the robustness of the original method by comparing its pose estimates with labelled ground truth. We then extended the original method with temporal information, using two different types of motion models, which improved the tracking results. According to our qualitative evaluation of side-by-side tracking sequences, the extensions give more stable and accurate detections over time and resolve some challenging situations that arise when the motion is fast or when body parts resemble each other. The main remaining problem is improving arm detection.
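For reference, the pictorial-structure model of Felzenszwalb and Huttenlocher chooses the part locations L = (l_1, ..., l_n) that minimise a sum of per-part appearance costs m_i and pairwise deformation costs d_ij over the edges E of a tree. One way to add temporal information is an extra term that ties each part to a location predicted from earlier frames. In the sketch below, the weight \lambda, the squared-distance penalty and the two predictors (constant position and constant velocity) are illustrative assumptions, not necessarily the two motion models used in this thesis.

```latex
L^{*} = \arg\min_{L} \Big( \sum_{i=1}^{n} m_i(l_i) + \sum_{(v_i, v_j) \in E} d_{ij}(l_i, l_j) \Big)

C_t(L^t) = \sum_{i=1}^{n} m_i(l_i^t) + \sum_{(v_i, v_j) \in E} d_{ij}(l_i^t, l_j^t) + \lambda \sum_{i=1}^{n} \big\lVert l_i^t - \hat{l}_i^t \big\rVert^2

\hat{l}_i^t = l_i^{t-1} \quad \text{(constant position)}, \qquad \hat{l}_i^t = 2\, l_i^{t-1} - l_i^{t-2} \quad \text{(constant velocity)}
```

Because each temporal term involves only one part (the earlier detections are fixed), it simply adds to that part's unary cost, so the tree can still be solved exactly by dynamic programming.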


Thesis

Pose Estimation in Video [PDF]
Elena Ursu

System Pipeline

The system consists of two modules: a model building module (walking pose detector) and a detection module (general pose detector).
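The model building module learns what the tracked person looks like from the frames in which the walking pose is found. As a rough illustration only, the C++ sketch below learns a coarse RGB histogram for one body part and another for the background, then scores a pixel by the log-likelihood ratio between them. ColourModel and part_score are hypothetical names, and this histogram classifier is a simple stand-in, not necessarily the appearance classifier used by Ramanan et al. or in this thesis.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// An 8-bit RGB pixel.
struct RGB { std::uint8_t r, g, b; };

// Hypothetical colour model: a coarse joint RGB histogram with 8 bins per channel.
class ColourModel {
public:
    static constexpr int kBins = 8;
    static constexpr int kCells = kBins * kBins * kBins;  // 512 joint bins

    // Accumulate one training pixel (e.g. a pixel inside a detected limb).
    void add(const RGB& p) {
        counts_[index(p)] += 1.0;
        total_ += 1.0;
    }

    // Laplace-smoothed probability of the pixel's bin (never zero).
    double prob(const RGB& p) const {
        return (counts_[index(p)] + 1.0) / (total_ + kCells);
    }

private:
    static std::size_t index(const RGB& p) {
        const int width = 256 / kBins;  // 32 intensity levels per bin
        const int i = (p.r / width) * kBins * kBins + (p.g / width) * kBins + (p.b / width);
        assert(i >= 0 && i < kCells);
        return static_cast<std::size_t>(i);
    }

    std::array<double, kCells> counts_{};
    double total_ = 0.0;
};

// Log-likelihood ratio of "part" versus "background" for one pixel:
// positive values mean the colour is more typical of the part.
double part_score(const ColourModel& part, const ColourModel& background, const RGB& p) {
    return std::log(part.prob(p) / background.prob(p));
}
```

Adding one count per bin keeps the ratio finite for colours never seen during training. In a detector, such per-pixel scores could be summed over a candidate limb region to give that limb's appearance cost.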

Detection module
This module takes a sequence of frames, together with the appearance models learnt by the model building module, and searches each frame for the person in a general pose. As our contribution, we implemented two inference algorithms on graphical models that also include temporal information, through connections (motion models) to nodes representing the detections in previous frames. Our addition to the system is shown in green in the figure below.
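The following C++ sketch illustrates one way temporal connections can enter tree-structured inference: exact min-sum dynamic programming over a small pictorial structure, in which each part's cost in the current frame includes a constant-position motion penalty towards its previous detection. The types Loc and Part, the function infer, the quadratic costs and the weight lambda are hypothetical illustrations rather than the thesis code, and each part has a short list of candidate locations instead of a dense image grid.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// A candidate location on a coarse grid (hypothetical discretisation).
struct Loc { int x; int y; };

// One body part of a tree-structured pictorial structure (hypothetical layout).
struct Part {
    int parent;                   // index of the parent part, -1 for the root (torso)
    Loc ideal_offset;             // preferred offset from the parent's location
    std::vector<Loc> candidates;  // candidate locations in the current frame
    std::vector<double> unary;    // appearance cost per candidate (lower is better)
    Loc previous;                 // this part's detection in the previous frame
};

double sq(double v) { return v * v; }

// Quadratic deformation cost between a child at c and its parent at p.
double spatial_cost(const Part& child, Loc c, Loc p) {
    return sq(c.x - p.x - child.ideal_offset.x) + sq(c.y - p.y - child.ideal_offset.y);
}

// Constant-position motion model (assumption): penalise the squared
// displacement from the previous detection, weighted by lambda.
double temporal_cost(const Part& part, Loc c, double lambda) {
    return lambda * (sq(c.x - part.previous.x) + sq(c.y - part.previous.y));
}

// Exact min-sum inference on the tree. Parts must be ordered so that each
// parent index is smaller than its children's (part 0 is the root).
// Returns the chosen candidate index for every part.
std::vector<std::size_t> infer(const std::vector<Part>& parts, double lambda) {
    const std::size_t n = parts.size();
    if (n == 0) return {};
    // cost[i][k]: best cost of the subtree rooted at part i with part i at candidate k.
    std::vector<std::vector<double>> cost(n);
    // arg[i][k]: best candidate of part i when its parent is at candidate k.
    std::vector<std::vector<std::size_t>> arg(n);
    for (std::size_t i = 0; i < n; ++i) {
        const Part& part = parts[i];
        assert(part.unary.size() == part.candidates.size());
        cost[i].resize(part.candidates.size());
        for (std::size_t k = 0; k < part.candidates.size(); ++k)
            cost[i][k] = part.unary[k] + temporal_cost(part, part.candidates[k], lambda);
    }
    // Leaves-to-root pass: children have larger indices, so walk backwards.
    for (std::size_t i = n; i-- > 1;) {
        assert(parts[i].parent >= 0 && static_cast<std::size_t>(parts[i].parent) < i);
        const std::size_t p = static_cast<std::size_t>(parts[i].parent);
        const Part& child = parts[i];
        arg[i].assign(parts[p].candidates.size(), std::size_t{0});
        for (std::size_t pk = 0; pk < parts[p].candidates.size(); ++pk) {
            double best = std::numeric_limits<double>::infinity();
            for (std::size_t ck = 0; ck < child.candidates.size(); ++ck) {
                const double c = cost[i][ck] +
                    spatial_cost(child, child.candidates[ck], parts[p].candidates[pk]);
                if (c < best) { best = c; arg[i][pk] = ck; }
            }
            cost[p][pk] += best;  // fold the child's best subtree into the parent
        }
    }
    // Root-to-leaves pass: pick the best root candidate, then read off the children.
    std::vector<std::size_t> choice(n, 0);
    for (std::size_t k = 1; k < cost[0].size(); ++k)
        if (cost[0][k] < cost[0][choice[0]]) choice[0] = k;
    for (std::size_t i = 1; i < n; ++i)
        choice[i] = arg[i][choice[static_cast<std::size_t>(parts[i].parent)]];
    return choice;
}
```

As a small check, take a torso with candidates (0, 0) and (10, 0), and an arm with ideal offset (1, 0), candidates (1, 0) and (11, 0) and appearance costs 5 and 0, with previous detections at (0, 0) and (1, 0). With lambda = 0 the appearance term wins and infer returns {1, 1}; with lambda = 1 the motion term keeps both parts near their previous positions and it returns {0, 0}. The inner loop costs O(k^2) per edge for k candidates; for quadratic deformation costs, Felzenszwalb and Huttenlocher reduce this to linear time with generalised distance transforms.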


Results

We show a demo video obtained with our implementation of the tracking system, as well as examples of sequences where our system outperforms the original method. The system was implemented in C++ using OpenCV.

Demo video

Original video taken from https://www.youtube.com/watch?v=bbYQsp9BFZ0.


Comparative results

Main References:

  1. Ramanan, D., Forsyth, D. A., Zisserman, A. "Tracking People by Learning Their Appearance." IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), January 2007.
  2. Ramanan, D., Forsyth, D. A., Zisserman, A. "Strike a Pose: Tracking People by Finding Stylized Poses." Computer Vision and Pattern Recognition (CVPR), San Diego, CA, June 2005.
  3. Felzenszwalb, P. F., Huttenlocher, D. P. "Pictorial Structures for Object Recognition." International Journal of Computer Vision, Vol. 61, No. 1, January 2005.

Last Update: August 2013