A number of projects are available for graduate or advanced
undergraduate credit. These involve work in the segmentation,
annotation, analysis, classification, indexing, and display of
extended videos, and are carried out under the direction of
Prof. Kender and his Ph.D. students. Students work in the High Level
Vision Lab, whose facilities include Sun workstations and Pentium
PCs equipped with MPEG digitization and compression hardware and
software.
Currently available projects include:
- The summarization and indexing of videos. One of our recent Ph.D.
graduates automatically analyzed several episodes of the sitcom Friends
and cross-indexed them by their narrative structure and physical
location. Another recent Ph.D. graduate applied similar methods to the
relatively unstructured domain of educational videos. We would like
to extend this work so that an entire semester's worth of educational
videos, like those produced by CVN, can be summarized and
cross-indexed for easy retrieval.
- Learning of good video features. Selecting fast-to-compute features
that classify individual video frames into different categories is
difficult, but we have found a novel machine learning approach that
does so. We would like to extend it to real-time learning, so that a
"Video Google" can be built: the system learns, from video clips the
user selects, exactly what makes those clips interesting, and does so
without preprogrammed notions of which features are best to index on.
- Deriving semantic cues from educational videos. Ever
try to find what you wanted in a CVN recording of a lecture? If only
a computer could organize the lecture and label related segments with
meaningful words. We are attempting to extract handwriting from
blackboard and overhead slide frames of educational videos, based on a
restricted vocabulary that comes from the keywords found in a given
subject area. The current method is accurate but very slow, so we would
like to find faster algorithms for this task.
- Personalization of video summaries. Given some
representation of a user's personal "style", how does a system decide
how to summarize videos of news, or MTV music, or talk shows: use
text? or imagery? or key clips? or a combination? Relating user
behavior and preferences to efficient summarization and structuring
methods is challenging, and we would welcome help in exploring these
connections.