CS researchers had a strong showing at the ACM CHI Conference on Human Factors in Computing Systems (CHI 2023), with seven papers and two posters accepted. The premier international conference on Human-Computer Interaction (HCI), CHI brings together researchers and practitioners who share the overarching goal of making the world a better place with interactive digital technologies.
Memento Player: Shared Multi-Perspective Playback of Volumetrically-Captured Moments in Augmented Reality
Yimeng Liu UC Santa Barbara, Jacob Ritchie Stanford University, Sven Kratz Snap Inc., Misha Sra UC Santa Barbara, Brian A. Smith Columbia University, Andrés Monroy-Hernández Princeton University, Rajan Vaish Snap Inc.
Capturing and reliving memories allow us to record, understand and share our past experiences. Currently, the most common approach to revisiting past moments is viewing photos and videos. These 2D media capture past events that reflect a recorder’s first-person perspective. The development of technology for accurately capturing 3D content presents an opportunity for new types of memory reliving, allowing greater immersion without perspective limitations. In this work, we adopt 2D and 3D moment-recording techniques and build a moment-reliving experience in AR that combines both display methods. Specifically, we use AR glasses to record 2D point-of-view (POV) videos, and volumetric capture to reconstruct 3D moments in AR. We allow seamless switching between AR and POV videos to enable immersive moment reliving and viewing of high-resolution details. Users can also navigate to a specific point in time using playback controls. Control is synchronized between multiple users for shared viewing.
Towards Accessible Sports Broadcasts for Blind and Low-Vision Viewers
Gaurav Jain Columbia University, Basel Hindi Columbia University, Connor Courtien Hunter College, Xin Yi Therese Xu Pomona College, Conrad Wyrick University of Florida, Michael Malcolm SUNY at Albany, Brian A. Smith Columbia University
Blind and low-vision (BLV) people watch sports through radio broadcasts that offer a play-by-play description of the game. However, recent trends show a decline in the availability and quality of radio broadcasts due to the rise of video streaming platforms on the internet and the cost of hiring professional announcers. As a result, sports broadcasts have now become even more inaccessible to BLV people. In this work, we present Immersive A/V, a technique for making sports broadcasts (in our case, tennis broadcasts) accessible and immersive to BLV viewers by automatically extracting gameplay information and conveying it through an added layer of spatialized audio cues. Immersive A/V conveys players’ positions and actions as detected by computer vision-based video analysis, allowing BLV viewers to visualize the action. We designed Immersive A/V based on results from a formative study with BLV participants. We conclude by outlining our plans for evaluating Immersive A/V and the future implications of this research.
Supporting Piggybacked Co-Located Leisure Activities via Augmented Reality
Samantha Reig Carnegie Mellon University, Erica Principe Cruz Carnegie Mellon University, Melissa M. Powers New York University, Jennifer He Stanford University, Timothy Chong University of Washington, Yu Jiang Tham Snap Inc., Sven Kratz Independent, Ava Robinson Snap Inc., Brian A. Smith Columbia University, Rajan Vaish Snap Inc., Andrés Monroy-Hernández Princeton University
Technology, especially the smartphone, is villainized for taking meaning and time away from in-person interactions and secluding people into “digital bubbles”. We believe this is not an intrinsic property of digital gadgets, but evidence of a lack of imagination in technology design. To address this, we leverage augmented reality (AR), which allows us to create experiences for multiple people, their pets, and their environments. In this work, we explore the design of AR technology that “piggybacks” on everyday leisure to foster co-located interactions among close ties (with other people and pets). We designed, developed, and deployed three such AR applications, and evaluated them through a 41-participant and 19-pet user study. We gained key insights about the ability of AR to spur and enrich interaction in new channels, the importance of customization, and the challenges of designing for the physical aspects of AR devices (e.g., holding smartphones). These insights inform design implications for the novel research space of co-located AR.
Towards Inclusive Avatars: Disability Representation in Avatar Platforms
Kelly Mack University of Washington, Rai Ching Ling Hsu Snap Inc., Andrés Monroy-Hernández Princeton University, Brian A. Smith Columbia University, Fannie Liu JPMorgan Chase
Digital avatars are an important part of identity representation, but there is little work on understanding how to represent disability. We interviewed 18 people with disabilities and related identities about their experiences and preferences in representing their identities with avatars. Participants generally preferred to represent their disability identity if the context felt safe and platforms supported their expression, as it was important for feeling authentically represented. They also used avatars in strategic ways: to signal and disclose current abilities and access needs, and to raise awareness. Some participants even found avatars to be a more accessible way to communicate than alternatives. We discuss how avatars can support disability identity representation because of their easily customizable format that is not strictly tied to reality. We conclude with design recommendations for creating platforms that better support people in representing their disability and other minoritized identities.
ImageAssist: Tools for Enhancing Touchscreen-Based Image Exploration Systems for Blind and Low-Vision Users
Vishnu Nair Columbia University, Hanxiu ‘Hazel’ Zhu Columbia University, Brian A. Smith Columbia University
Blind and low-vision (BLV) users often rely on alt text to understand what a digital image is showing. However, recent research has investigated how touch-based image exploration on touchscreens can supplement alt text. Touchscreen-based image exploration systems allow BLV users to deeply understand images while granting a strong sense of agency. Yet, prior work has found that these systems require a lot of effort to use, and little work has been done to explore these systems’ bottlenecks on a deeper level and propose solutions to these issues. To address this, we present ImageAssist, a set of three tools that assist BLV users through the process of exploring images by touch, scaffolding the exploration process. We perform a series of studies with BLV users to design and evaluate ImageAssist, and our findings reveal several implications for image exploration tools for BLV users.
Improving Automatic Summarization for Browsing Longform Spoken Dialog
Daniel Li Columbia University, Thomas Chen Microsoft, Alec Zadikian Google, Albert Tung Stanford University, Lydia B. Chilton Columbia University
Longform spoken dialog delivers rich streams of informative content through podcasts, interviews, debates, and meetings. While production of this medium has grown tremendously, spoken dialog remains challenging to consume, as listening is slower than reading and difficult to skim or navigate relative to text. Recent systems leveraging automatic speech recognition (ASR) and automatic summarization allow users to better browse speech data and forage for information of interest. However, these systems ingest disfluent speech, which causes automatic summarization to yield readability, adequacy, and accuracy problems. To improve the navigability and browsability of speech, we present three training-agnostic post-processing techniques that address dialog concerns of readability, coherence, and adequacy. We integrate these improvements with user interfaces that communicate estimated summary metrics to aid users’ browsing heuristics. Quantitative evaluation metrics show a 19% improvement in summary quality. We discuss how summarization technologies can help people browse longform audio in trustworthy and readable ways.
Social Dynamics of AI Support in Creative Writing
Katy Ilonka Gero Columbia University, Tao Long Columbia University, Lydia Chilton Columbia University
Recently, large language models have made huge advances in generating coherent, creative text. While much research focuses on how users can interact with language models, less work considers the sociotechnical gap that this technology poses. What are the social nuances that underlie receiving support from a generative AI? In this work, we ask when and why a creative writer might turn to a computer versus a peer or mentor for support. We interview 20 creative writers about their writing practice and their attitudes towards both human and computer support. We discover three elements that govern a writer’s interaction with support actors: 1) what writers desire help with, 2) how writers perceive potential support actors, and 3) the values writers hold. We align our results with existing frameworks of writing cognition and creativity support, uncovering the social dynamics that modulate user responses to generative technologies.
AngleKindling: Supporting Journalistic Angle Ideation with Large Language Models
Savvas Petridis Columbia University, Nicholas Diakopoulos Northwestern University, Kevin Crowston Syracuse University, Mark Hansen Columbia University, Keren Henderson Syracuse University, Stan Jastrzebski Syracuse University, Jeffrey V. Nickerson Stevens Institute of Technology, Lydia B. Chilton Columbia University
News media often leverage documents to find ideas for stories, while being critical of the frames and narratives present. Developing angles from a document such as a press release is a cognitively taxing process, in which journalists critically examine the implicit meaning of its claims. Informed by interviews with journalists, we developed AngleKindling, an interactive tool that employs the commonsense reasoning of large language models to help journalists explore angles for reporting on a press release. In a study with 12 professional journalists, we show that participants found AngleKindling significantly more helpful and less mentally demanding to use for brainstorming ideas, compared to a prior journalistic angle ideation tool. AngleKindling helped journalists deeply engage with the press release and recognize angles that were useful for multiple types of stories. From our findings, we discuss how to help journalists customize and identify promising angles, and how to extend AngleKindling to other knowledge-work domains.
PopBlends: Strategies for Conceptual Blending with Large Language Models
Sitong Wang Columbia University, Savvas Petridis Columbia University, Taeahn Kwon Columbia University, Xiaojuan Ma Hong Kong University of Science and Technology, Lydia B. Chilton Columbia University
Pop culture is an important aspect of communication. On social media people often post pop culture reference images that connect an event, product, or other entity to a pop culture domain. Creating these images is a creative challenge that requires finding a conceptual connection between the users’ topic and a pop culture domain. In cognitive theory, this task is called conceptual blending. We present a system called PopBlends that automatically suggests conceptual blends. The system explores three approaches that involve both traditional knowledge extraction methods and large language models. Our annotation study shows that all three methods provide connections with similar accuracy, but with very different characteristics. Our user study shows that people found twice as many blend suggestions as they did without the system, and with half the mental demand. We discuss the advantages of combining large language models with knowledge bases for supporting divergent and convergent thinking.