Object Recognition with Videos

Overview

Object recognition refers to the task of classifying objects in images or videos, and it has been an active research topic in recent years. Deep convolutional neural networks (CNNs) have brought great progress in object recognition on still images, but object recognition in videos remains underexplored. Because of complexities in videos such as pose and scale changes, directly applying still-image frameworks to videos does not achieve satisfactory performance; instead, the special characteristics of videos should be exploited. For example, videos contain temporal information (in other words, a video can be treated as a sequence of images) and richer contextual information than still images. Taking advantage of this information should therefore lead to better results.

In this project, we choose T-CNN as the baseline framework. T-CNN is a deep learning framework that combines object detection and object tracking and incorporates the temporal and contextual information in videos. The objective of this project is to improve performance over the baseline by exploring different modifications to it. Specifically, the enhanced framework will be evaluated on the ImageNet ILSVRC2015 VID dataset, using mean Average Precision (mAP) as the evaluation metric.
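As a rough illustration of the evaluation metric, the sketch below computes a non-interpolated average precision for one class and then averages over classes. The input layout, the function names, and the choice of non-interpolated AP are assumptions made for illustration; the official ILSVRC evaluation protocol (matching rule, IoU threshold, interpolation) is not described here and may differ.

```python
def average_precision(detections, num_ground_truth):
    """Area under the precision-recall curve for one class.

    detections: list of (confidence, is_true_positive) pairs (assumed layout).
    num_ground_truth: number of annotated objects of this class.
    """
    if num_ground_truth == 0:
        return 0.0
    # Rank detections from most to least confident.
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    true_pos = 0
    ap = 0.0
    for rank, (_, is_tp) in enumerate(ranked, start=1):
        if is_tp:
            true_pos += 1
            # Each true positive raises recall by 1/num_ground_truth;
            # accumulate the precision observed at that rank.
            ap += (true_pos / rank) / num_ground_truth
    return ap


def mean_average_precision(per_class):
    """per_class: dict mapping class name -> (detections, num_ground_truth)."""
    aps = [average_precision(d, n) for d, n in per_class.values()]
    return sum(aps) / len(aps)
```

For example, a class with two annotated objects and three ranked detections (hit, miss, hit) scores 1/2 * (1/1 + 2/3), and the per-class values are then averaged to obtain the final number.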

Methodology

Tentative modifications to the baseline framework

Selected MGP

It was shown that the original MGP (motion-guided propagation) produces too many duplicates, which increases the computation cost in later stages. To reduce duplicates, this modification

Enhanced Feature Maps

The quality of single-frame results is essential to the overall performance. This modification therefore attempts to incorporate contextual information from neighboring frames, with the expectation that the enhanced feature maps will yield better proposals and more accurate classification.
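One simple way to picture the idea of borrowing context from neighboring frames is a fixed weighted average along the time axis, sketched below. This is only a stand-in under stated assumptions: the function name, the kernel weights, and the border handling are hypothetical, and the project's own approaches (3D convolution, or merging with optical flow) are learned or motion-aware rather than a fixed average.

```python
def enhance_feature_maps(frames, weights=(0.25, 0.5, 0.25)):
    """Blend each frame's feature map with its temporal neighbors.

    frames: list of 2-D feature maps (lists of rows), all the same shape.
    weights: hypothetical temporal kernel centered on the current frame.
    """
    half = len(weights) // 2
    last = len(frames) - 1
    enhanced = []
    for t in range(len(frames)):
        rows, cols = len(frames[t]), len(frames[t][0])
        out = [[0.0] * cols for _ in range(rows)]
        for k, w in enumerate(weights):
            # Clamp the index so border frames reuse the nearest valid frame.
            src = frames[min(max(t + k - half, 0), last)]
            for r in range(rows):
                for c in range(cols):
                    out[r][c] += w * src[r][c]
        enhanced.append(out)
    return enhanced
```

The output has one enhanced map per input frame, so it could in principle be fed to the same proposal and classification stages as the original single-frame maps.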

Temporal Loss

To enforce temporal consistency on detections from adjacent frames, a temporal loss will be considered in addition to the original loss function.
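The exact form of the temporal loss is not specified here. As one possible reading, the sketch below penalizes the squared difference between the class scores of the same object in two adjacent frames and adds that penalty to the original loss. The function names, the pairing of detections across frames, and the `weight` hyperparameter are all assumptions for illustration.

```python
def temporal_loss(scores_prev, scores_next):
    """Mean squared difference between class scores of linked detections.

    scores_prev, scores_next: equal-length lists of per-class score vectors,
    where entry i in both lists is assumed to belong to the same object.
    """
    if not scores_prev:
        return 0.0
    total = 0.0
    for prev, nxt in zip(scores_prev, scores_next):
        total += sum((p - n) ** 2 for p, n in zip(prev, nxt)) / len(prev)
    return total / len(scores_prev)


def total_loss(detection_loss, scores_prev, scores_next, weight=0.1):
    # weight is a hypothetical hyperparameter balancing the two terms.
    return detection_loss + weight * temporal_loss(scores_prev, scores_next)
```

With this form, detections whose scores stay stable from one frame to the next contribute nothing extra, while flickering scores increase the total loss.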

Schedule

Task	Status	Start Date	End Date
Literature Review	Completed	1/Sept/2016	30/Sept/2016
Programming environment setup	Completed	25/Sept/2016	10/Oct/2016
Coarse baseline on VOC	Completed	10/Oct/2016	3/Jan/2017
Baseline on VID	Completed	3/Jan/2017	15/Jan/2017
Selected MGP - 1	Completed	15/Jan/2017	16/Jan/2017
Selected MGP - 2	Completed	19/Jan/2017	19/Jan/2017
NMS Variants	Completed	17/Jan/2017	10/Feb/2017
3D Convolution for feature maps	Completed	4/Feb/2017	10/April/2017
Propagated Proposals	Completed	20/Mar/2017	10/April/2017
Documentation and Wrap-up	Completed	10/April/2017	16/April/2017
Presentation	Completed	21/April/2017	21/April/2017
Merging feature maps using optical flow	Future work	--	--
Temporal loss	Future work	--	--
