Auditorily Embodied Conversational Agents: Effects of Spatialization and Situated Audio Cues on Presence and Social Perception
Embodiment can enhance conversational agents, for example by increasing their perceived presence. This is typically achieved through visual representations of a virtual body; however, visual modalities are not always available, for example when users interact with agents through headphones or display-less glasses. In this work, we explore auditory embodiment. By introducing auditory cues of bodily presence, namely a spatially localized voice and situated Foley audio from environmental interactions, we investigate how audio alone can convey embodiment and influence perceptions of a conversational agent. We conducted a 2 (spatialization: monaural vs. spatialized) × 2 (Foley: none vs. Foley) within-subjects study in which participants (n=24) engaged in conversations with agents. Our results show that spatialization and Foley increase co-presence but reduce users’ perceptions of the agent’s attention and other social attributes.
Yi Fei Cheng et al., Carnegie Mellon University. CHI, 2026.
Topics: Affective Human-Computer Dialogue; Spatial Audio & 3D Sound; Affective Feedback & Emotion Regulation Interfaces
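A 2 × 2 within-subjects design means every participant experiences all four spatialization/Foley combinations, so the order of conditions has to be counterbalanced. The abstract does not say how the study ordered conditions; the sketch below shows one common technique, a balanced Latin square (Williams design), purely as an illustration. The condition labels and the helper names `balanced_latin_square` and `order_for` are hypothetical.

```python
from itertools import product

# The four cells of the 2 (spatialization) x 2 (Foley) design named in the abstract.
CONDITIONS = [f"{s}+{f}" for s, f in product(("monaural", "spatialized"), ("none", "Foley"))]


def balanced_latin_square(n):
    """Williams design for an even number n of conditions.

    Each condition appears once per row and once per column, and every
    ordered pair of conditions is adjacent exactly once across the rows,
    which balances first-order carry-over effects.
    """
    if n % 2:
        raise ValueError("this simple construction requires an even n")
    # First row: 0, 1, n-1, 2, n-2, 3, ...
    first = [0] + [(j + 1) // 2 if j % 2 else n - j // 2 for j in range(1, n)]
    # Each later row shifts every entry by the row index (mod n).
    return [[(c + i) % n for c in first] for i in range(n)]


def order_for(participant, conditions=CONDITIONS):
    """Hypothetical assignment: participant p receives row p mod n."""
    rows = balanced_latin_square(len(conditions))
    return [conditions[c] for c in rows[participant % len(rows)]]


if __name__ == "__main__":
    for p in range(4):
        print(p, order_for(p))
```

With n=24 and four conditions, each of the four orders would be used by exactly six participants under this assignment.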
A11y-CUA Dataset: Characterizing the Accessibility Gap in Computer Use Agents
Computer Use Agents (CUAs) operate interfaces by pointing, clicking, and typing, mirroring the interactions of sighted users (SUs), who can therefore monitor CUAs and share control with them. CUAs do not reflect the interactions of blind and low-vision users (BLVUs) who use assistive technology (AT), so BLVUs cannot easily collaborate with CUAs. To characterize this accessibility gap, we present A11y-CUA, a dataset of BLVUs and SUs performing 60 everyday tasks, comprising 40.4 hours and 158,325 events. Our analysis of the collected interaction traces quantitatively confirms distinct interaction styles between the SU and BLVU groups (mouse- vs. keyboard-dominant) and demonstrates interaction diversity within each group (sequential vs. shortcut navigation for BLVUs). We then compare the collected traces to state-of-the-art CUAs under default and AT conditions (keyboard-only, magnifier). The default CUA completed 78.3% of tasks successfully. Under the AT conditions, however, its success rate dropped to 41.67% (keyboard-only) and 28.3% (magnifier), and its behavior did not reflect the nuances of real AT use. With our open A11y-CUA dataset, we aim to promote collaborative and accessible CUAs for everyone.
Ananya Gubbi Mohanbabu et al., The University of Texas at Austin. CHI, 2026.
Topics: Visual Impairment Technologies (Screen Readers, Tactile Graphics, Braille); Motor Impairment Assistive Input Technologies; Universal & Inclusive Design
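The three success rates are reported at different precisions (78.3%, 41.67%, 28.3%). As a quick consistency check, and assuming (the abstract does not state this) that each rate is computed over all 60 tasks, the rates correspond to whole task counts. The constant `TOTAL_TASKS` and the helper `implied_successes` are illustrative names, and the implied counts are derived under that assumption, not reported figures.

```python
# Assumption: each condition was scored on all 60 tasks in the dataset.
TOTAL_TASKS = 60

# Success rates (%) as reported in the abstract.
REPORTED = {"default": 78.3, "keyboard-only": 41.67, "magnifier": 28.3}


def implied_successes(rate_pct, total=TOTAL_TASKS):
    """Nearest whole number of successful tasks for a reported rate."""
    return round(rate_pct / 100 * total)


for condition, pct in REPORTED.items():
    k = implied_successes(pct)
    # Recompute the rate from the whole count to show it matches the report.
    print(f"{condition}: {k}/{TOTAL_TASKS} = {100 * k / TOTAL_TASKS:.2f}%")
```

Under that assumption the rates correspond to 47, 25, and 17 of 60 tasks, so the differing precision is a rounding artifact rather than a different denominator.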
A11yExtensions: Accessibility Extensions to Augment Mobile AI Assistive Technology In-Situ
Existing visual AI assistive technologies have usability gaps and may need additional adaptations and features to serve users' needs. We propose A11yExtensions, in-situ interventions that augment existing mobile AI assistive technology with add-on services. Add-ons include features that have been researched but are not yet deployed (e.g., cross-checking AI results), or that are only available in certain applications (e.g., camera aiming assistance). Through co-design sessions with two blind accessibility professionals, we designed and implemented three exemplar extensions that use mobile automation tools to invoke add-ons, enabling just-in-time interventions for adaptability. We found that A11yExtensions provide opportunities to test new features and offer a new degree of flexibility and customization, though they introduce additional onboarding and communication challenges. We also derived a design space of accessibility extensions as a basis for future extension designs. Overall, A11yExtensions demonstrates the effectiveness of deploying new features in-situ via automation, within the technologies people actually use in their day-to-day lives.
Jaylin Herskovitz et al., University of Michigan. CHI, 2026.
Topics: Voice Accessibility; Health Self-Tracking; Behavior Change & Reflection Technology
TouchScribe: Augmenting Non-Visual Hand-Object Interactions with Automated Live Visual Descriptions
People who are blind or have low vision regularly use their hands to interact with the physical world and to gain access to objects' shape, size, weight, and texture. However, many rich visual features remain inaccessible through touch alone, making it difficult to distinguish similar objects, interpret visual affordances, and form a complete understanding of objects. In this work, we present TouchScribe, a system that augments hand-object interactions with automated live visual descriptions. We trained a custom egocentric hand interaction model to recognize both common gestures (e.g., grab to inspect, hold side-by-side to compare) and gestures unique to blind people (e.g., point to explore color, or swipe to read available text). Furthermore, TouchScribe provides real-time, adaptive feedback based on hand movement, ranging from hand interaction states to object labels and visual details. Our user study and technical evaluations demonstrate that TouchScribe can provide rich and useful descriptions to support object understanding. Finally, we discuss the implications of making live visual descriptions responsive to users' physical reach.
Ruei-Che Chang et al., University of Michigan. CHI, 2026.
Topics: Visual Impairment Technologies (Screen Readers, Tactile Graphics, Braille); Vibrotactile Feedback & Skin Stimulation; Behavior Change & Reflection Technology
ProgramAlly: Creating Custom Visual Access Programs via Multi-Modal End-User Programming
Existing visual assistive technologies are built for simple and common use cases, and have few avenues for blind people to customize their functionalities. Drawing from prior work on DIY assistive technology, this paper investigates end-user programming as a means for users to create and customize visual access programs to meet their unique needs. We introduce ProgramAlly, a system for creating custom filters for visual information, e.g., 'find NUMBER on BUS', leveraging three end-user programming approaches: block programming, natural language, and programming by example. To implement ProgramAlly, we designed a representation of visual filtering tasks based on scenarios encountered by blind people, and integrated a set of on-device and cloud models for generating and running these programs. In user studies with 12 blind adults, we found that participants preferred different programming modalities depending on the task, and envisioned using visual access programs to address unique accessibility challenges that are otherwise difficult with existing applications. Through ProgramAlly, we present an exploration of how blind end-users can create visual access programs to customize and control their experiences.
Jaylin Herskovitz et al. UIST, 2024.
Topics: Visual Impairment Technologies (Screen Readers, Tactile Graphics, Braille); Programming Education & Computational Thinking
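To make a filter such as 'find NUMBER on BUS' concrete, here is a minimal sketch of one way such a program could be represented and run over model outputs. It assumes an object detector yields labeled boxes and an OCR step yields text boxes; the classes `Box`, `Detection`, `TextItem`, `FindOnFilter` and the `TEXT_PATTERNS` table are hypothetical and are not ProgramAlly's actual representation.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in image coordinates."""
    x0: float
    y0: float
    x1: float
    y1: float

    def center(self):
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)

    def contains_point(self, x, y):
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1


@dataclass(frozen=True)
class Detection:
    label: str  # object class from a (hypothetical) detector, e.g. "bus"
    box: Box


@dataclass(frozen=True)
class TextItem:
    text: str  # recognized string from a (hypothetical) OCR step
    box: Box


# Hypothetical text categories a filter's target slot can refer to.
TEXT_PATTERNS = {"NUMBER": re.compile(r"\d+")}


@dataclass(frozen=True)
class FindOnFilter:
    """'find <TARGET> on <CONTAINER>' as a two-slot filter (illustrative)."""
    target: str     # text category, e.g. "NUMBER"
    container: str  # object label, e.g. "BUS"

    def run(self, detections, texts):
        pattern = TEXT_PATTERNS[self.target]
        containers = [d.box for d in detections
                      if d.label.lower() == self.container.lower()]
        # Keep text that matches the category and whose center lies on a container object.
        return [t.text for t in texts
                if pattern.fullmatch(t.text.strip())
                and any(b.contains_point(*t.box.center()) for b in containers)]
```

For example, with a bus box spanning (0, 0) to (100, 50), a text item "42" inside it, "EXPRESS" inside it, and "7" on a separate car, `FindOnFilter("NUMBER", "BUS").run(...)` keeps only "42". Splitting the filter into a target slot and a container slot is what would let block, natural-language, and by-example front ends all produce the same program.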
WorldScribe: Towards Context-Aware Live Visual Descriptions
Automated live visual descriptions can aid blind people in understanding their surroundings with autonomy and independence. However, providing descriptions that are rich, contextual, and just-in-time has been a long-standing challenge in accessibility. In this work, we develop WorldScribe, a system that generates automated live real-world visual descriptions that are customizable and adaptive to users' contexts: (i) WorldScribe's descriptions are tailored to users' intents and prioritized based on semantic relevance. (ii) WorldScribe is adaptive to visual contexts, e.g., providing consecutive succinct descriptions for dynamic scenes while presenting longer, detailed ones for stable settings. (iii) WorldScribe is adaptive to sound contexts, e.g., increasing volume in noisy environments, or pausing when conversations start. Powered by a suite of vision, language, and sound recognition models, WorldScribe introduces a description generation pipeline that balances the trade-off between description richness and latency to support real-time use. The design of WorldScribe is informed by prior work on providing visual descriptions and a formative study with blind participants. Our user study and subsequent pipeline evaluation show that WorldScribe can provide real-time and fairly accurate visual descriptions to facilitate environment understanding that is adaptive and customized to users' contexts. Finally, we discuss the implications and further steps toward making live visual descriptions more context-aware and humanized.
Ruei-Che Chang et al. UIST, 2024.
Topics: Intelligent Voice Assistants (Alexa, Siri, etc.); Visual Impairment Technologies (Screen Readers, Tactile Graphics, Braille)
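The adaptations listed in (ii) and (iii) can be read as a small policy that maps the current context to how a description is delivered. The sketch below encodes only the three behaviors the abstract names (succinct for dynamic scenes, louder in noise, pausing for conversation). The inputs `scene_change`, `noise_db`, `speech_detected` and all thresholds are hypothetical stand-ins, not WorldScribe's actual signals or values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    scene_change: float    # hypothetical 0..1 score of frame-to-frame visual change
    noise_db: float        # hypothetical ambient sound level estimate
    speech_detected: bool  # whether a conversation appears to have started


@dataclass(frozen=True)
class Plan:
    paused: bool
    detail: str    # "succinct" or "detailed"
    volume: float  # 0..1 playback volume


# Hypothetical tuning constants, chosen only for illustration.
DYNAMIC_SCENE = 0.3
QUIET_DB, LOUD_DB = 40.0, 80.0
BASE_VOLUME, MAX_VOLUME = 0.5, 1.0


def plan_description(ctx: Context) -> Plan:
    # Visual context: short descriptions while the scene changes quickly,
    # longer ones when it is stable.
    detail = "succinct" if ctx.scene_change >= DYNAMIC_SCENE else "detailed"
    # Sound context: scale volume linearly from BASE_VOLUME at QUIET_DB
    # to MAX_VOLUME at LOUD_DB, clamped at both ends.
    t = min(max((ctx.noise_db - QUIET_DB) / (LOUD_DB - QUIET_DB), 0.0), 1.0)
    volume = BASE_VOLUME + t * (MAX_VOLUME - BASE_VOLUME)
    # Sound context: hold descriptions while people are talking.
    return Plan(paused=ctx.speech_detected, detail=detail, volume=volume)
```

A real pipeline would also have to trade description richness against latency, which this delivery-side policy does not model.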
VRCopilot: Authoring 3D Layouts with Generative AI Models in VR
Immersive authoring provides an intuitive medium for users to create 3D scenes via direct manipulation in Virtual Reality (VR). Recent advances in generative AI have enabled the automatic creation of realistic 3D layouts. However, it is unclear how the capabilities of generative AI can be used in immersive authoring to support fluid interactions, user agency, and creativity. We introduce VRCopilot, a mixed-initiative system that integrates pre-trained generative AI models into immersive authoring to facilitate human-AI co-creation in VR. VRCopilot presents multimodal interactions to support rapid prototyping and iterations with AI, and intermediate representations such as wireframes to augment user controllability over the created content. Through a series of user studies, we evaluated the potential and challenges of manual, scaffolded, and automatic creation in immersive authoring. We found that scaffolded creation using wireframes enhanced user agency compared to automatic creation. We also found that manual creation via multimodal specification offers the highest sense of creativity and agency.
Lei Zhang et al. UIST, 2024.
Topics: Mixed Reality Workspaces; Generative AI (Text, Image, Music, Video); Creative Collaboration & Feedback Systems
BrushLens: Hardware Interaction Proxies for Accessible Touchscreen Interface Actuation
Touchscreen devices, designed with an assumed range of user abilities and interaction patterns, often make it difficult for individuals with diverse abilities to operate them independently. Prior efforts to improve accessibility through tools or algorithms required alterations to touchscreen hardware or software, making them inapplicable to the large number of existing legacy devices. In this paper, we introduce BrushLens, a hardware interaction proxy that performs physical interactions on behalf of users while allowing them to continue using accessible interfaces, such as screen readers and assistive touch on smartphones, for interface exploration and command input. BrushLens maintains an interface model for accurate target localization and uses exchangeable actuators for physical actuation across a variety of device types, reducing user workload and minimizing the risk of mistouch. Our evaluations reveal that BrushLens lowers the mistouch rate and empowers visually and motor impaired users to interact more effectively with otherwise inaccessible physical touchscreens.
Chen Liang et al. UIST, 2023.
Topics: Visual Impairment Technologies (Screen Readers, Tactile Graphics, Braille); Motor Impairment Assistive Input Technologies
VRGit: A Version Control System for Collaborative Content Creation in Virtual Reality
Immersive authoring tools allow users to intuitively create and manipulate 3D scenes while immersed in Virtual Reality (VR). Collaboratively designing these scenes is a creative process that involves numerous edits, explorations of design alternatives, and frequent communication with collaborators. Version Control Systems (VCSs) help users achieve this by keeping track of the version history and creating a shared hub for communication. However, most VCSs are unsuitable for managing the version history of VR content because their underlying line differencing mechanism is designed for text and lacks the semantic information of 3D content, and the widely adopted commit model is designed for asynchronous collaboration rather than real-time awareness and communication in VR. We introduce VRGit, a new collaborative VCS that visualizes version history as a directed graph composed of 3D miniatures, and enables users to easily navigate versions, create branches, and preview and reuse versions directly in VR. Beyond individual use, VRGit also facilitates synchronous collaboration in VR by providing awareness of users’ activities and version history through portals and shared history visualizations. In a lab study with 14 participants (seven groups), we demonstrate that VRGit enables users to easily manage version history both individually and collaboratively in VR.
Lei Zhang et al., University of Michigan. CHI, 2023.
Topics: Mixed Reality Workspaces; Creative Collaboration & Feedback Systems
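The abstract contrasts text-oriented line differencing with the semantic needs of 3D content, and describes history as a directed graph with branches. A minimal sketch of both ideas follows, assuming (hypothetically) that a scene snapshot is a dictionary from object IDs to object properties. The names `VersionGraph`, `commit`, `branch`, `history`, and `semantic_diff` are illustrative and not VRGit's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Version:
    vid: int
    scene: dict          # hypothetical snapshot: object id -> (label, position)
    parents: tuple = ()  # more than one parent would represent a merge


@dataclass
class VersionGraph:
    versions: dict = field(default_factory=dict)  # version id -> Version
    branches: dict = field(default_factory=dict)  # branch name -> head version id

    def commit(self, branch, scene):
        """Add a snapshot as a new node whose parent is the branch head."""
        parent = self.branches.get(branch)
        vid = len(self.versions)
        parents = (parent,) if parent is not None else ()
        self.versions[vid] = Version(vid, dict(scene), parents)
        self.branches[branch] = vid
        return vid

    def branch(self, name, from_vid):
        """Start a new branch at an existing version (an edge in the DAG)."""
        self.branches[name] = from_vid

    def history(self, vid):
        """Follow first parents back to the root."""
        path = [vid]
        while self.versions[vid].parents:
            vid = self.versions[vid].parents[0]
            path.append(vid)
        return path


def semantic_diff(old, new):
    """Compare snapshots per object instead of per line of serialized text."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }
```

Because the diff is keyed by object, moving a chair shows up as one "changed" object rather than as scattered edits to a serialized scene file, which is the kind of semantic information a line-based diff lacks.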
Hacking, Switching, Combining: Understanding and Supporting DIY Assistive Technology Design by Blind People
Existing assistive technologies (AT) often fail to support the unique needs of blind and visually impaired (BVI) people. Thus, BVI people have become domain experts in customizing and 'hacking' AT, creatively adapting it to their needs. We aim to understand this behavior in depth, and how BVI people envision creating future DIY personalized AT. We conducted a multi-part qualitative study with 12 blind participants: an interview on unique uses of AT, a two-week diary study to log use cases, and a scenario-based design session to imagine creating future technologies. We found that participants work to design new AT both implicitly, through creative use cases, and explicitly, through regular ideation and development. Participants envisioned creating a variety of new technologies, and we summarize the expected benefits and concerns of a DIY technology approach. From our results, we present design considerations for future DIY technology systems to support existing customization and 'hacking' behaviors.
Jaylin Herskovitz et al., University of Michigan. CHI, 2023.
Topics: Haptic Wearables; Aging-Friendly Technology Design; Universal & Inclusive Design
OmniScribe: Authoring Immersive Audio Descriptions for 360° Videos
Blind people typically access videos via audio descriptions (AD) crafted by sighted describers who comprehend, select, and describe crucial visual content in the videos. 360° video is an emerging storytelling medium that enables immersive experiences people might not otherwise have in everyday life. However, the omnidirectional nature of 360° videos makes it challenging for describers to perceive the holistic visual content and interpret the spatial information that is essential for creating immersive ADs for blind people. Through a formative study with a professional describer, we identified key challenges in describing 360° videos and iteratively designed OmniScribe, a system that supports the authoring of immersive ADs for 360° videos. OmniScribe uses AI-generated content-awareness overlays to help describers better grasp 360° video content. Furthermore, OmniScribe enables describers to author spatial AD and immersive labels so that blind users can consume the videos immersively with our mobile prototype. In a study with 11 professional and novice describers, we demonstrated the value of OmniScribe in the authoring workflow, and a study with 8 blind participants revealed the promise of immersive AD over standard AD for 360° videos. Finally, we discuss the implications of promoting 360° video accessibility.
Ruei-Che Chang et al. UIST, 2022.
Topics: Visual Impairment Technologies (Screen Readers, Tactile Graphics, Braille)
CollabAlly: Accessible Collaboration Awareness in Document Editing
Collaborative document editing tools are widely used in professional and academic workplaces. While these tools provide basic accessibility support, it is challenging for blind users to gain collaboration awareness that sighted people can easily obtain using visual cues (e.g., who is editing where and what). Through a series of co-design sessions with a blind coauthor, we identified the current practices and challenges in collaborative editing, and iteratively designed CollabAlly, a system that makes collaboration awareness in document editing accessible to blind users. CollabAlly extracts collaborator, comment, and text-change information and their context from a document and presents them in a dialog box to provide easy access and navigation. CollabAlly uses earcons to communicate background events unobtrusively, voice fonts to differentiate collaborators, and spatial audio to convey the location of document activity. In a study with 11 blind participants, we demonstrate that CollabAlly provides improved access to collaboration awareness by centralizing scattered information, sonifying visual information, and simplifying complex operations.
Cheuk Yin Phipson Lee et al., Carnegie Mellon University. CHI, 2022.
Topics: Visual Impairment Technologies (Screen Readers, Tactile Graphics, Braille); Prototyping & User Testing
ImageExplorer: Multi-Layered Touch Exploration to Encourage Skepticism Towards Imperfect AI-Generated Image Captions
Blind users rely on alternative text (alt-text) to understand an image; however, alt-text is often missing. AI-generated captions are a more scalable alternative, but they often miss crucial details or are completely incorrect, which users may still falsely trust. In this work, we sought to determine how additional information could help users better judge the correctness of AI-generated captions. We developed ImageExplorer, a touch-based multi-layered image exploration system that allows users to explore the spatial layout and information hierarchies of images, and compared it with popular text-based (Facebook) and touch-based (Seeing AI) image exploration systems in a study with 12 blind participants. We found that exploration was generally successful in encouraging skepticism towards imperfect captions. Moreover, many participants preferred ImageExplorer for its multi-layered and spatial information presentation, and Facebook for its summary and ease of use. Finally, we identify design improvements for effective and explainable image exploration systems for blind users.
Jaewook Lee et al., University of Illinois at Urbana-Champaign. CHI, 2022.
Topics: Explainable AI (XAI); Visual Impairment Technologies (Screen Readers, Tactile Graphics, Braille)
"It's Complicated": Negotiating Accessibility and (Mis)Representation in Image Descriptions of Race, Gender, and Disability
Content creators are instructed to write textual descriptions of visual content to make it accessible; yet existing guidelines lack specifics on how to write about people's appearance, particularly while remaining mindful of the consequences of (mis)representation. In this paper, we report on interviews with screen reader users who were also Black, Indigenous, People of Color, Non-binary, and/or Transgender on their current image description practices and preferences, and their experiences negotiating their own and others' appearances non-visually. We discuss these perspectives, and the ethics of humans and AI describing appearance characteristics that may convey the race, gender, and disabilities of those photographed. In turn, we share considerations for more carefully describing appearance, and the contexts in which such information is perceived as salient. Finally, we offer tensions and questions for accessibility research to equitably consider the politics and ecosystems in which technologies will embed, such as potential risks of human and AI biases amplifying through image descriptions.
Cynthia L. Bennett et al., Carnegie Mellon University. CHI, 2021.
Topics: Voice Accessibility; AI Ethics, Fairness & Accountability; Universal & Inclusive Design