John R. Kender
Columbia University


We investigate the new task of determining which textual tags different affinity groups prefer for news and related videos. We use this knowledge to assign new group-specific tags to other videos of the same event that originate elsewhere. We map visual and multilingual textual features into a joint latent space using reliable visual cues, and determine tag relationships through several variants of canonical correlation analysis (CCA). For human-interest international events such as epidemics and transportation disasters, we detect country-specific tags in news coverage from the US, China, Europe, South America, and elsewhere.
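As a rough illustration of mapping two feature views into a joint latent space, the following is a minimal sketch of plain linear two-view CCA in NumPy. It is not the project's implementation: the function name `linear_cca`, the regularization constant, and the synthetic "visual" and "tag" feature matrices are all assumptions made for the example, and the project itself explores other CCA variants.

```python
import numpy as np

def linear_cca(X, Y, k, reg=1e-6):
    """Plain linear CCA: return projections A, B and the top-k canonical correlations.

    X: (n, dx) features of view 1 (e.g. visual), Y: (n, dy) features of view 2 (e.g. tags).
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    # Regularized covariance and cross-covariance estimates.
    Sxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = X.T @ Y / (n - 1)

    def inv_sqrt(S):
        # Inverse matrix square root of a symmetric positive-definite matrix.
        w, V = np.linalg.eigh(S)
        return V @ np.diag(w ** -0.5) @ V.T

    Kx, Ky = inv_sqrt(Sxx), inv_sqrt(Syy)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, s, Vt = np.linalg.svd(Kx @ Sxy @ Ky)
    return Kx @ U[:, :k], Ky @ Vt.T[:, :k], s[:k]

# Synthetic demo (hypothetical data): two noisy views of a shared 2-D latent signal.
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 2))
X_demo = z @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(500, 5))
Y_demo = z @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(500, 4))
A, B, corrs = linear_cca(X_demo, Y_demo, k=2)
print(corrs)
```

Projecting with `X @ A` and `Y @ B` (after centering) places both views in the same low-dimensional space, where nearby points can be compared across views.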

We catalog statistically significant cross-group differences in multimedia creation and tagging, and explore variants of Deep CCA, finding them better suited to capturing those preferences in a three-view space (one common video dimension and two culturally determined tag dimensions). We investigate how these non-linear methods can be extended to the videos of multiple affinity groups, including subtler distinctions such as the US compared to the UK or even Canada. Because different groups are differentially sensitive to particular images, we investigate how the influence of visual memes spreads from day to day across countries through a novel application of the PageRank algorithm.
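For readers unfamiliar with PageRank, the following is a minimal sketch of the standard power-iteration form on a small weighted graph. The graph construction is an assumption made for illustration, not the project's formulation: each edge `(adopter, originator, weight)` is taken to mean that the adopter reused an image first seen at the originator, so rank accumulates at influential sources. The node names and weights are hypothetical.

```python
def pagerank(edges, damping=0.85, iters=100, tol=1e-10):
    """Weighted PageRank by power iteration over (src, dst, weight) edges."""
    nodes = sorted({n for src, dst, _ in edges for n in (src, dst)})
    out_weight = {n: 0.0 for n in nodes}
    for src, _, w in edges:
        out_weight[src] += w
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        # Rank held by nodes with no outgoing edges is spread uniformly.
        dangling = sum(rank[n] for n in nodes if out_weight[n] == 0.0)
        base = (1.0 - damping + damping * dangling) / len(nodes)
        new_rank = {n: base for n in nodes}
        for src, dst, w in edges:
            new_rank[dst] += damping * rank[src] * w / out_weight[src]
        delta = sum(abs(new_rank[n] - rank[n]) for n in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank

# Hypothetical toy graph: "A", "B", "C" stand in for three sources of coverage.
toy_edges = [("B", "A", 3.0), ("C", "A", 2.0), ("C", "B", 1.0), ("A", "B", 1.0)]
ranks = pagerank(toy_edges)
print(sorted(ranks.items(), key=lambda kv: -kv[1]))
```

In this toy graph, "A" receives the most reuse and therefore ends up with the highest rank, while "C", which nobody reuses, keeps only the baseline share.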

We demonstrate and evaluate a novel cross-group multimedia browser that accesses online webpage archives of international events from two different countries. It visualizes these results with country-specific information on separate timelines, but with cross-country images and tags straddling both. This system provides an exploratory, zoomable differential view of clips and text, and graphs their development over time. We demonstrate that this browser expands and improves the effectiveness of video retrieval.

Some examples of different cultural viewpoints detected:

In the above, near-duplicate keyframes appear across cultures with different accompanying text. The upper pair shows near-duplicate keyframes from news about the AirAsia flight. The lower pair shows near-duplicate keyframes from the AlphaGo vs. Human event. An English version of the Chinese text, produced by an automatic translator, is included for comparison. The translation has some slight issues, such as rendering the Chinese word "diliutian" (day 6) as "6th", omitting the "tian" character.

Some examples of differential use of named entities:

The plot above shows the frequency counts of named entities in Chinese (CGTN) versus U.S. (YouTube) sources for the Chinese Lunar Rover event. The outliers in the top-left corner (gray) are "Yutu" and "Chang'e-4", the Chinese names of the rover and the mission, respectively, which are generally unmentioned in the U.S. The outlier in the bottom-right corner (green) is "NASA", generally unmentioned in China. The outliers in the top-right corner (red), reflecting equal frequency of use, are "first" and "moon", the common subjects of the event.
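One simple way to quantify this kind of differential use is a smoothed log-ratio of relative frequencies between two sources. The sketch below is only an illustration of that idea, not the analysis behind the plot: the function name `usage_skew`, the add-one smoothing, and the toy counts are all invented for the example.

```python
import math
from collections import Counter

def usage_skew(counts_a, counts_b, smoothing=1.0):
    """Smoothed log-ratio of each entity's relative frequency in source A vs. source B.

    Positive values mean the entity is favored by A, negative by B, near zero by both.
    """
    vocab = set(counts_a) | set(counts_b)
    total_a = sum(counts_a.values()) + smoothing * len(vocab)
    total_b = sum(counts_b.values()) + smoothing * len(vocab)
    skew = {}
    for entity in vocab:
        freq_a = (counts_a.get(entity, 0) + smoothing) / total_a
        freq_b = (counts_b.get(entity, 0) + smoothing) / total_b
        skew[entity] = math.log(freq_a / freq_b)
    return skew

# Toy counts, invented for illustration only (not measured values).
source_a = Counter({"Yutu": 40, "moon": 50})
source_b = Counter({"NASA": 40, "moon": 50})
skew = usage_skew(source_a, source_b)
for entity, value in sorted(skew.items(), key=lambda kv: kv[1]):
    print(f"{entity}: {value:+.2f}")
```

Entities at either extreme of the sorted output correspond to the corner outliers in such a plot, and values near zero correspond to entities used about equally by both sources.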

STUDENTS (alphabetical order):



NSF IIS project award number: 1841670
Expected duration: 18 to 24 months
Award title: "Tagging and Browsing Videos According to the Preferences of Differing Affinity Groups"
Principal investigator: John R. Kender
Acknowledgment: This material is based upon work supported by the National Science Foundation under Grant No. 1841670.
Disclaimer: Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

For further information: jrk atsign cs dot columbia dot edu
Last update: Mar. 30, 2022