Two video representations are derived from this framework. The first stresses the combination of efficient coding, indexing, and retrieval necessary to make content-based access viable over large-scale, distributed, and unconstrained environments such as the Internet. The resulting interfaces rely mostly on queries that are visual in nature (based on images or objects). The second explores the structured nature of specific content domains to support interaction at a semantic level, leading to more meaningful characterization and summarization of the video and to appealing procedures for classification, browsing, and retrieval. In both cases, the analysis relies heavily on probabilistic modeling through procedures such as the expectation-maximization (EM) algorithm. Bayesian belief propagation is used to construct interfaces whose behavior adapts to the specifications of the user. The procedures presented here are generic and applicable to a wide variety of problems involving human-machine interaction.