Video Annotation and Tracking
with Active Learning

Carl Vondrick and Deva Ramanan. NIPS 2011.


Download Paper
Download Slides
Download Poster

We introduce a novel active learning framework for video annotation. By judiciously choosing which frames a user should annotate, we can obtain highly accurate tracks with minimal user effort. We cast this problem as one of active learning and show that we can obtain excellent performance by querying frames that, if annotated, would produce a large expected change in the estimated object track. We implement a constrained tracker and compute the expected change for putative annotations with efficient dynamic programming algorithms. We demonstrate our framework on four datasets, including two benchmark datasets constructed with key frame annotations obtained from Amazon Mechanical Turk. Our results indicate that we can obtain equivalent labels for a small fraction of the original cost.
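To make the query-selection criterion concrete, the sketch below shows the idea in a few lines of Python. It is only a conceptual illustration: the helper names (track_with_constraints, candidate_boxes, posterior) are hypothetical placeholders, not the pyvision API, and the paper computes the expected change efficiently with dynamic programming rather than re-solving the tracker for every putative annotation as this naive version does.

import numpy as np

def box_distance(track_a, track_b):
    # Average per-frame displacement between two tracks (lists of (x, y) centers).
    return float(np.mean([np.hypot(ax - bx, ay - by)
                          for (ax, ay), (bx, by) in zip(track_a, track_b)]))

def select_query_frame(num_frames, annotations, track_with_constraints,
                       candidate_boxes, posterior):
    # Return the index of the unannotated frame whose label would produce the
    # largest expected change in the estimated track.
    #
    #   annotations                      dict mapping frame index -> user-labeled box
    #   track_with_constraints(ann)      current best track given the constraints (hypothetical)
    #   candidate_boxes(t)               putative boxes for frame t (hypothetical)
    #   posterior(t, box)                probability the object is at `box` in frame t (hypothetical)
    current_track = track_with_constraints(annotations)
    best_frame, best_score = None, -np.inf

    for t in range(num_frames):
        if t in annotations:
            continue  # already labeled by the user
        # Expected change: average over putative annotations for frame t of how
        # much the constrained track would move if that annotation were added,
        # weighted by the tracker's posterior for that box.
        score = 0.0
        for box in candidate_boxes(t):
            new_track = track_with_constraints({**annotations, t: box})
            score += posterior(t, box) * box_distance(current_track, new_track)
        if score > best_score:
            best_frame, best_score = t, score

    return best_frame

In the actual system, the per-frame expected changes are computed in a single pass with dynamic programming over the tracker's cost function, which is what makes querying on long videos practical.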

Code

The code for our active learning simulations is available for download:

$ wget https://github.com/cvondrick/pyvision/tarball/master

or

$ git clone https://github.com/cvondrick/pyvision.git

Our algorithm is implemented in vision/alearn/marginals.pyx. Our benchmark datasets are available here.

If you only wish to annotate videos, consider using our Video Annotation Tool, an interactive tool that crowdsources video annotation.

Bibtex

@conference{vondrick2011,
  title={{Video Annotation and Tracking with Active Learning}},
  author={Carl Vondrick and Deva Ramanan},
  booktitle={Neural Information Processing Systems (NIPS)},
  year={2011},
}