Human Tracking and Orientation Estimation

Manuela Ichim

MSc Programme in Computer Graphics and Virtual Reality, Faculty of Automatic Control and Computer Science, University Politehnica of Bucharest, Romania

Performed at the Department of Information and Computing Sciences, Game and Media Technology Master's Programme, Utrecht University, through the Erasmus Programme


This thesis presents research on human head and body orientation estimation. The problem can be subdivided into two tasks: human tracking and orientation estimation. The first is accomplished using the framework described by Choi et al. [1], which estimates and tracks the positions of human targets in real-world coordinates, starting from a video stream captured with a single monoscopic moving camera. In the first stage of the research, the approach of Chen et al. [2] was implemented to solve the second task, head and body orientation estimation. My approach to this task starts from the main ideas outlined in the original method, such as using HOG descriptors to describe the visual appearance of the targets and incorporating additional cues such as the velocity direction and head-body coupling. To address some of the limitations of the original method, and to incorporate new elements, a different framework was conceived. Under this new framework, the responses of three different classifiers (Gaussian Mixture Model, Neural Network and Support Vector Machine) are combined with information from additional cues. These include the original cues, velocity direction and magnitude and head-body coupling, as well as new ones, face detections and temporal smoothness. The performance of the method was evaluated, and the contribution of each classifier and additional cue to the final prediction was assessed. Overall, the proposed approach performed well, both in terms of estimation accuracy and computation time.
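The fusion step described above can be sketched as follows. This is a minimal illustrative sketch, not the thesis implementation: the number of orientation bins, the von Mises shape used for the velocity prior, the simple averaging of classifier posteriors, and all function names (`velocity_prior`, `fuse`, `estimate_angle`) are assumptions made for the example.

```python
import math

NUM_BINS = 8               # orientation discretized into 45-degree bins (an assumption)
BIN_WIDTH = 360 / NUM_BINS

def velocity_prior(speed, direction_deg, kappa=2.0):
    """Von Mises-shaped prior centred on the walking direction.
    The prior flattens toward uniform when the target is nearly static,
    since velocity direction is unreliable at low speed."""
    weight = min(speed, 1.0)  # trust the velocity cue more at higher speeds
    prior = []
    for b in range(NUM_BINS):
        centre = b * BIN_WIDTH
        diff = math.radians(centre - direction_deg)
        prior.append(math.exp(weight * kappa * math.cos(diff)))
    total = sum(prior)
    return [p / total for p in prior]

def fuse(classifier_dists, prior):
    """Average the per-classifier posteriors (GMM, NN, SVM),
    then reweight the result by the cue-based prior."""
    n = len(classifier_dists)
    avg = [sum(d[b] for d in classifier_dists) / n for b in range(NUM_BINS)]
    fused = [a * p for a, p in zip(avg, prior)]
    total = sum(fused)
    return [f / total for f in fused]

def estimate_angle(dist):
    """Return the centre of the most likely orientation bin, in degrees."""
    best = max(range(NUM_BINS), key=lambda b: dist[b])
    return best * BIN_WIDTH
```

For instance, a classifier posterior peaked at the 90-degree bin, fused with a velocity prior for a target walking toward 90 degrees, yields an estimate of 90 degrees; further cues (head-body coupling, face detections, temporal smoothness) could be folded in as additional multiplicative priors in the same way.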

Download: pdf

Approach description

Figure 1. Pipeline for body angle estimation using method 1.


Figure 2. Screenshot example of the resulting output. The estimated angle values, expressed in degrees, are attached to the bounding box. The body orientation angle is written in green, followed by the ground truth value, while the head angle is written in red. A visual representation of the orientation is given by the attached circle: the green line represents the estimated angle, while the yellow line represents the computed velocity estimate.


Figure 3. Average angle estimation error while using only HOG based descriptors.

Figure 4. Average angle estimation error while considering additional cues.

Main References:

[1] Choi, Wongun, Caroline Pantofaru, and Silvio Savarese. "A general framework for tracking multiple people from a moving camera." 2012.
[2] Chen, Cheng, and J. Odobez. "We are not contortionists: Coupled adaptive learning for head and body orientation estimation in surveillance video." Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012.