ToPSen: Task-Oriented Priming and Sensory Alignment for Comparing Coding Strategies Between Sighted and Blind Programmers

This paper examines how the coding strategies of sighted and blind programmers differ when working with audio feedback alone. The goal is to identify challenges in mixed-ability collaboration, particularly when sighted programmers work with blind peers or teach programming to blind students. To overcome the limitations of traditional blindness simulation studies, we propose Task-Oriented Priming and Sensory Alignment (ToPSen), a design framework that reframes sensory constraints as technical requirements rather than as a disability. Through a study of 12 blind and 12 sighted participants coding non-visually, we found that expert blind programmers maintain more accurate mental models and process more information in working memory than sighted programmers using ToPSen. Our analysis revealed that blind and sighted programmers process structural information differently, exposing gaps in current IDE designs. These insights inform our guidelines for improving the accessibility of programming tools and fostering effective mixed-ability collaboration.

Md Ehtesham-Ul-Haque et al. DIS 2025. Topics: Visual Impairment Technologies (Screen Readers, Tactile Graphics, Braille); Motor Impairment Assistive Input Technologies.
IKIWISI: An Interactive Visual Pattern Generator for Evaluating the Reliability of Vision-Language Models Without Ground Truth

We present IKIWISI, an interactive visual pattern generator for assessing the reliability of vision-language models in multi-object recognition tasks with arbitrary, user-defined objects in video data, where ground truth is often unavailable. The name IKIWISI is an acronym for the phrase "I know it when I see it" and reflects the tool's grounding in human visual perception research, leveraging intuitive pattern recognition capabilities to facilitate trust and interpretability. IKIWISI employs an easily interpretable binary heatmap, where columns represent video frames and rows represent user-defined objects. Cells are color-coded green or red to indicate an object's presence or absence. Using a research-through-design approach, we refined IKIWISI through several iterations. A final study with 15 participants demonstrated that IKIWISI is easy to use and enables reliable model performance assessments that correlate with true performance, when available. Furthermore, users only need to inspect a tiny fraction of heatmap cells to reach conclusions. IKIWISI promotes transparency through visual interpretation, allowing users to quickly detect anomalies and focus on critical areas for further analysis. This makes it a valuable tool for evaluating vision-language models on user-defined objects in real-world scenarios.

Md Touhidul Islam et al. DIS 2025. Topics: Generative AI (Text, Image, Music, Video); Explainable AI (XAI); Interactive Data Visualization.
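To make the heatmap encoding concrete, here is a minimal sketch of an IKIWISI-style binary heatmap in Python with matplotlib. This is an illustration, not the authors' implementation; the object names and presence data are hypothetical stand-ins for model output.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

objects = ["person", "dog", "bicycle"]   # hypothetical user-defined objects
n_frames = 60
rng = np.random.default_rng(0)
# Stand-in for model predictions: 1 = object present in frame, 0 = absent.
presence = rng.integers(0, 2, size=(len(objects), n_frames))

fig, ax = plt.subplots(figsize=(10, 2))
# Rows = objects, columns = frames; 0 maps to red (absent), 1 to green (present).
ax.imshow(presence, aspect="auto", cmap=ListedColormap(["red", "green"]))
ax.set_yticks(range(len(objects)), labels=objects)
ax.set_xlabel("video frame")
plt.show()
```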
Beyond Visual Perception: Insights from Smartphone Interaction of Visually Impaired Users with Large Multimodal Models

Large multimodal models (LMMs) have enabled new AI-powered applications that help people with visual impairments (PVI) receive natural language descriptions of their surroundings through audible text. We investigated how this emerging paradigm of visual assistance transforms how PVI perform and manage their daily tasks. Moving beyond usability assessments, we examined both the capabilities and limitations of LMM-based tools in personal and social contexts, while exploring design implications for their future development. Through interviews with 14 visually impaired users of Be My AI (an LMM-based application) and analysis of its image descriptions from both study participants and social media platforms, we identified two key limitations. First, these systems' context awareness suffers from hallucinations and misinterpretations of social contexts, styles, and human identities. Second, their intent-oriented capabilities often fail to grasp and act on users' intentions. Based on these findings, we propose design strategies for improving both human-AI and AI-AI interactions, contributing to the development of more effective, interactive, and personalized assistive technologies.

Jingyi Xie et al. CHI 2025. Pennsylvania State University, College of Information Sciences and Technology. Topics: Human-LLM Collaboration; Visual Impairment Technologies (Screen Readers, Tactile Graphics, Braille).
Wheeler: A Three-Wheeled Input Device for Usable, Efficient, and Versatile Non-Visual Interaction

Blind users rely on keyboards and assistive technologies like screen readers to interact with user interface (UI) elements. In modern applications with complex UI hierarchies, navigating to different UI elements poses a significant accessibility challenge. Users must listen to screen reader audio descriptions and press relevant keyboard keys one at a time. This paper introduces Wheeler, a novel three-wheeled, mouse-shaped stationary input device, to address this issue. Informed by participatory sessions, Wheeler enables blind users to navigate up to three hierarchical levels in an app independently using three wheels instead of navigating just one level at a time using a keyboard. The three wheels also offer versatility, allowing users to repurpose them for other tasks, such as 2D cursor manipulation. A study with 12 blind users indicates a significant reduction (40%) in navigation time compared to using a keyboard. Further, a diary study with our blind co-author highlights Wheeler's additional benefits, such as accessing UI elements with partial metadata and facilitating mixed-ability collaboration.

Md Touhidul Islam et al. UIST 2024. Topics: Vibrotactile Feedback & Skin Stimulation; Haptic Wearables; Visual Impairment Technologies (Screen Readers, Tactile Graphics, Braille).
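One way to picture the wheel-to-hierarchy mapping is as three cursors, one per level of the UI tree. The sketch below is purely illustrative (class names and the mapping policy are my assumptions, not Wheeler's firmware): each wheel cycles among the siblings at its level, with deeper levels resolved relative to the current selection above.

```python
class UINode:
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []

class WheelerCursor:
    def __init__(self, root):
        self.root = root
        self.index = [0, 0, 0]   # cursor position at hierarchy levels 1-3

    def rotate(self, wheel, steps):
        """wheel in {0, 1, 2}; positive steps = clockwise rotation."""
        siblings = self._siblings(wheel)
        if siblings:
            self.index[wheel] = (self.index[wheel] + steps) % len(siblings)
            return siblings[self.index[wheel]]

    def _siblings(self, level):
        node = self.root
        for l in range(level):   # descend along the current selection path
            if not node.children:
                return []
            node = node.children[self.index[l] % len(node.children)]
        return node.children

app = UINode("app", [UINode("File", [UINode("Open"), UINode("Save")]),
                     UINode("Edit", [UINode("Copy")])])
cursor = WheelerCursor(app)
print(cursor.rotate(0, 1).name)  # top wheel: File -> Edit
print(cursor.rotate(1, 1).name)  # middle wheel: cycles Edit's children -> Copy
```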
BubbleCam: Engaging Privacy in Remote Sighted Assistance

Remote sighted assistance (RSA) offers prosthetic support to people with visual impairments (PVI) through image- or video-based conversations with remote sighted assistants. While useful, RSA services introduce privacy concerns, as PVI may reveal private visual content inadvertently. Solutions have emerged to address these concerns on image-based asynchronous RSA, but exploration into solutions for video-based synchronous RSA remains limited. In this study, we developed BubbleCam, a high-fidelity prototype allowing PVI to conceal objects beyond a certain distance during RSA, granting them privacy control. Through an exploratory field study with 24 participants, we found that 22 appreciated the privacy enhancements offered by BubbleCam. The users gained autonomy, reducing embarrassment by concealing private items, messy areas, or bystanders, while assistants could avoid irrelevant content. Importantly, BubbleCam maintained RSA's primary function without compromising privacy. Our study highlighted a cooperative approach to privacy preservation, transitioning the traditionally individual task of maintaining privacy into an interactive, engaging privacy-preserving experience.

Jingyi Xie et al. CHI 2024. Pennsylvania State University. Topics: Visual Impairment Technologies (Screen Readers, Tactile Graphics, Braille); Telemedicine & Remote Patient Monitoring.
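The core "bubble" idea, concealing everything beyond a chosen distance, can be sketched as a per-pixel depth threshold. This is a minimal illustration assuming a depth map is available (e.g., from a phone's depth-sensing API); it is not the study prototype, and the function name and parameters are hypothetical.

```python
import numpy as np

def conceal_beyond(frame_rgb: np.ndarray, depth_m: np.ndarray,
                   bubble_radius_m: float = 2.0) -> np.ndarray:
    """Blank out every pixel farther than bubble_radius_m from the camera.

    frame_rgb: (H, W, 3) video frame; depth_m: (H, W) per-pixel depth in meters.
    """
    out = frame_rgb.copy()
    out[depth_m > bubble_radius_m] = 0   # blanking shown here; a blur also works
    return out
```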
Uncovering Human Traits in Determining Real and Spoofed Audio: Insights from Blind and Sighted Individuals

This paper explores how blind and sighted individuals perceive real and spoofed audio, highlighting differences and similarities between the groups. Through two studies, we find that both groups focus on specific human traits in audio--such as accents, vocal inflections, breathing patterns, and emotions--to assess audio authenticity. We further reveal that humans, irrespective of visual ability, can still outperform current state-of-the-art machine learning models in discerning audio authenticity; however, the task proves psychologically demanding. Moreover, detection accuracy scores between blind and sighted individuals are comparable, but each group exhibits unique strengths: the sighted group excels at detecting deepfake-generated audio, while the blind group excels at detecting text-to-speech (TTS) generated audio. These findings not only deepen our understanding of machine-manipulated and neural-rendered audio but also have implications for developing countermeasures, such as perceptible watermarks and human-AI collaboration strategies for spoofing detection.

Chaeeun Han et al. CHI 2024. Pennsylvania State University. Topics: Explainable AI (XAI); Deaf & Hard-of-Hearing Support (Captions, Sign Language, Vibration); Deepfake & Synthetic Media Detection.
Abacus Gestures: A Large Set of Math-Based Usable Finger-Counting Gestures for Mid-Air Interactions

Designing an extensive set of mid-air gestures that are both easy to learn and perform quickly presents a significant challenge. Further complicating this challenge is achieving high-accuracy detection of such gestures using commonly available hardware, like a 2D commodity camera. Previous work often proposed smaller, application-specific gesture sets, requiring specialized hardware and struggling with adaptability across diverse environments. Addressing these limitations, this paper introduces Abacus Gestures, a comprehensive collection of 100 mid-air gestures. Drawing on the metaphor of finger abacus counting, gestures are formed from various combinations of open and closed fingers, each assigned different values. We developed an algorithm using an off-the-shelf computer vision library capable of detecting these gestures from a 2D commodity camera feed with an accuracy exceeding 98% for palms facing the camera and 95% for palms facing the body. We assessed the detection accuracy, ease of learning, and usability of these gestures in a user study involving 20 participants. The study found that participants could learn Abacus Gestures within five minutes after executing just 15 gestures and could recall them after a four-month interval. Additionally, most participants developed motor memory for these gestures after performing 100 gestures. Most of the gestures were easy to execute with the designated finger combinations, and the flexibility of executing the gestures using multiple finger combinations further enhanced the usability. Based on these findings, we created a taxonomy that categorizes Abacus Gestures into five groups based on motor memory development and three difficulty levels according to their ease of execution. Finally, we provided design guidelines and proposed potential use cases for Abacus Gestures in the realm of mid-air interaction.

Md Ehtesham-Ul-Haque et al. UbiComp 2023. Topics: Hand Gesture Recognition; Computational Methods in HCI. DOI: https://doi.org/10.1145/3610898
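The arithmetic behind the 100-gesture count is worth making explicit: in finger abacus counting, one hand encodes a digit 0-9, so two hands give 10 x 10 = 100 combinations. The sketch below assumes the common convention (thumb = 5, each other finger = 1, one hand for ones and one for tens); the paper's exact value assignment may differ.

```python
# Value of each open finger under the assumed finger-abacus convention.
FINGER_VALUES = {"thumb": 5, "index": 1, "middle": 1, "ring": 1, "pinky": 1}

def hand_digit(open_fingers: set[str]) -> int:
    """Digit 0-9 encoded by which fingers are open on one hand."""
    return sum(FINGER_VALUES[f] for f in open_fingers)

def gesture_value(ones_hand: set[str], tens_hand: set[str]) -> int:
    """Two hands yield 10 x 10 = 100 distinct gesture values (0-99)."""
    return 10 * hand_digit(tens_hand) + hand_digit(ones_hand)

# Ones hand: thumb + index = 6; tens hand: index + middle = 2 -> gesture 26.
assert gesture_value({"thumb", "index"}, {"index", "middle"}) == 26
```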
SpaceXMag: An Automatic, Scalable, and Rapid Space Compactor for Optimizing Smartphone App Interfaces for Low-Vision Users

Low-vision users interact with smartphones via screen magnifiers, which uniformly magnify raw screen pixels, including whitespace and user interface (UI) elements. Screen magnifiers thus occlude important contextual information, such as visual cues, from the user's viewport. This requires low-vision users to pan over the occluded portions and mentally reconstruct the context, which is cumbersome, tiring, and mentally demanding. Prior work aimed to address these usability issues with screen magnifiers by optimizing the representation of UI elements suitable for low-vision users or by magnifying whitespace and non-whitespace content (e.g., text, graphics, borders) differently. This paper combines both techniques and presents SpaceXMag, an optimization framework that automatically reduces whitespace within a smartphone app, thereby packing more information within the current magnification viewport. A study with 11 low-vision users indicates that, with a traditional screen magnifier, the space-optimized UI is more usable and saves at least 28.13% time for overview tasks and 42.89% time for target acquisition tasks, compared to the original, unoptimized UI of the same app. Furthermore, our framework is scalable, fast, and automatable. For example, on a public dataset containing 16,566 screenshots of different Android apps, it saves approximately 47.17% of the space (area) on average, with a mean runtime of around 1.44 seconds, without requiring any human input. All are indicative of the promise and potential of SpaceXMag for low-vision screen magnifier users.

Md Touhidul Islam et al. UbiComp 2023. Topics: Visual Impairment Technologies (Screen Readers, Tactile Graphics, Braille); Visualization Perception & Cognition. DOI: https://doi.org/10.1145/3596253
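To illustrate the general idea of whitespace compaction (this is a toy pass under simplifying assumptions, not the SpaceXMag optimization framework): clamp the vertical gap between successive UI elements' bounding boxes to a maximum, shifting everything below upward so more content fits in the magnified viewport.

```python
def compact_vertical(boxes, max_gap=8):
    """Shrink vertical whitespace between boxes.

    boxes: list of (x, y, w, h) tuples, assumed non-overlapping and in one
    column; max_gap: largest allowed whitespace (in pixels) between boxes.
    """
    boxes = sorted(boxes, key=lambda b: b[1])      # top-to-bottom order
    compacted, prev_bottom = [], 0
    for x, y, w, h in boxes:
        gap = min(y - prev_bottom, max_gap)        # clamp the gap above
        new_y = prev_bottom + max(gap, 0)
        compacted.append((x, new_y, w, h))
        prev_bottom = new_y + h
    return compacted

# A 50 px top margin and a 40 px inter-element gap both shrink to 8 px.
print(compact_vertical([(0, 50, 100, 20), (0, 110, 100, 20)]))
```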
Are Two Heads Better than One? Investigating Remote Sighted Assistance with Paired Volunteers

Remote Sighted Assistance (RSA) is a popular smartphone-mediated aid for people with blindness, where a sighted individual converses with a blind individual in a one-on-one (1:1) session. Since sighted assistants outnumber blind individuals (13:1), this paper investigates what happens when more than one sighted individual assists a single blind individual in a session. Specifically, we propose paired-volunteer RSA, a new paradigm where two sighted volunteers assist a single user with blindness. We investigate the feasibility, desirability, and challenges of this paradigm and explore its opportunities. Our study with 8 sighted volunteers and 9 blind users reveals that the proposed paradigm extends the one-on-one RSA to cover a broader range of more intellectual and experiential tasks, providing new and distinctive opportunities in supporting complex, open-ended tasks (e.g., pursuing hobbies, appreciating arts, and seeking entertainment). These opportunities can not only enrich the blind users' quality of life and independence but also offer a fun and engaging experience for the sighted volunteers. The study also reveals the costs of extended collaboration in this paradigm. Finally, we synthesize a taxonomy of tasks where the proposed RSA paradigm can succeed and outline how HCI researchers and system designers can realize this paradigm.

Jingyi Xie et al. DIS 2023. Topics: Visual Impairment Technologies (Screen Readers, Tactile Graphics, Braille); Collaborative Learning & Peer Teaching; Empowerment of Marginalized Groups.
A Probabilistic Model and Metrics for Estimating Perceived Accessibility of Desktop Applications in Keystroke-Based Non-Visual Interactions

Perceived accessibility of an application is a subjective measure of how well an individual with a particular disability, skills, and goals experiences the application via assistive technology. This paper first presents a study with 11 blind users to report how they perceive the accessibility of desktop applications while interacting via assistive technology such as screen readers and a keyboard. The study identifies the low navigational complexity of the user interface (UI) elements as the primary contributor to higher perceived accessibility of different applications. Informed by this study, we develop a probabilistic model that accounts for the number of user actions needed to navigate between any two arbitrary UI elements within an application. This model contributes to the area of computational interaction for non-visual interaction. Next, we derive three metrics from this model: complexity, coverage, and reachability, which reveal important statistical characteristics of an application indicative of its perceived accessibility. The proposed metrics are appropriate for comparing similar applications and can be fine-tuned for individual users to cater to their skills and goals. Finally, we present five use cases, demonstrating how blind users, application developers, and accessibility practitioners can benefit from our model and metrics.

Md Touhidul Islam et al. CHI 2023. Pennsylvania State University. Topics: Visual Impairment Technologies (Screen Readers, Tactile Graphics, Braille); Motor Impairment Assistive Input Technologies; Universal & Inclusive Design.
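A simplified way to see how such metrics can fall out of a navigation model (my formulation for illustration, not the paper's probabilistic model): treat the UI as a directed graph whose edges are single keystroke actions, then take complexity as the mean shortest-path length over connected pairs, coverage as the fraction of pairs connected at all, and reachability as the fraction of pairs connected within a keystroke budget.

```python
from collections import deque
from itertools import permutations

def shortest_actions(ui_graph, src):
    """BFS: fewest single-key actions from src to every reachable element."""
    dist, queue = {src: 0}, deque([src])
    while queue:
        u = queue.popleft()
        for v in ui_graph.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def metrics(ui_graph, budget=5):
    nodes = list(ui_graph)
    dist = {u: shortest_actions(ui_graph, u) for u in nodes}
    pairs = list(permutations(nodes, 2))
    connected = [dist[u][v] for u, v in pairs if v in dist[u]]
    complexity = sum(connected) / len(connected) if connected else float("inf")
    coverage = len(connected) / len(pairs)                 # pairs with any path
    reachability = sum(d <= budget for d in connected) / len(pairs)
    return complexity, coverage, reachability

# Toy app graph: keys move focus between UI elements; leaves are dead ends.
ui = {"File": ["Edit", "Open"], "Edit": ["File", "Copy"], "Open": [], "Copy": []}
print(metrics(ui))
```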
Accessible Data Representation with Natural Sound

Sonification translates data into non-speech audio. Such auditory representations can make data visualization accessible to people who are blind or have low vision (BLV). This paper presents a sonification method for translating common data visualizations into a blend of natural sounds. We hypothesize that people's familiarity with sounds drawn from nature, such as birds singing in a forest, and their ability to listen to these sounds in parallel, will enable BLV users to perceive multiple data points being sonified at the same time. Informed by an extensive literature review and a preliminary study with 5 BLV participants, we designed an accessible data representation tool, Susurrus, that combines our sonification method with other accessibility features, such as keyboard interaction and text-to-speech feedback. Finally, we conducted a user study with 12 BLV participants and report on the potential and applications of natural sounds for sonification compared to existing sonification tools.

Md Naimul Hoque et al. CHI 2023. University of Maryland. Topics: Visual Impairment Technologies (Screen Readers, Tactile Graphics, Braille).
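The "blend" hypothesis can be illustrated with a toy mixing step (my illustration, not Susurrus itself): each data series is assigned a looped nature sample (e.g., birdsong, rain), a data point's value controls that sample's loudness, and the weighted layers are summed so several series remain audible in parallel.

```python
import numpy as np

def mix_sonification(samples, values):
    """Blend nature samples weighted by data values.

    samples: list of 1-D float audio arrays of equal length, one per series;
    values: weights in 0..1, one per series (e.g., normalized data points).
    """
    mix = sum(v * s for v, s in zip(values, samples))
    peak = np.max(np.abs(mix))
    return mix / peak if peak > 0 else mix   # normalize to avoid clipping
```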
Grid-Coding: An Accessible, Efficient, and Structured Coding Paradigm for Blind and Low-Vision Programmers

Sighted programmers often rely on visual cues (e.g., syntax coloring, keyword highlighting, code formatting) to perform common coding activities in text-based languages (e.g., Python). Unfortunately, blind and low-vision (BLV) programmers hardly benefit from these visual cues because they interact with computers via assistive technologies (e.g., screen readers), which fail to communicate visual semantics meaningfully. Prior work on making text-based programming languages and environments accessible mostly focused on code navigation and, to some extent, code debugging, but not much toward code editing, which is an essential coding activity. We present Grid-Coding to fill this gap. Grid-Coding renders source code in a structured 2D grid, where each row, column, and cell has consistent, meaningful semantics. Its design is grounded in prior work and refined by 28 BLV programmers through online participatory sessions over 2 months. We implemented the Grid-Coding prototype as a spreadsheet-like web application for Python and evaluated it in a study with 12 BLV programmers. This study revealed that, compared to a text editor (i.e., the go-to editor for BLV programmers), our prototype enabled BLV programmers to navigate source code quickly, find the context of a statement easily, detect syntax errors in existing code effectively, and write new code with fewer syntax errors. The study also revealed how BLV programmers adopted Grid-Coding and demonstrated novel interaction patterns conducive to increased programming productivity.

Md Ehtesham-Ul-Haque et al. UIST 2022. Topics: Visual Impairment Technologies (Screen Readers, Tactile Graphics, Braille); Motor Impairment Assistive Input Technologies; Universal & Inclusive Design.
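A rough sketch of the grid idea, approximating Grid-Coding's layout rather than reproducing the authors' implementation: parse Python source and emit one grid row per statement, with the nesting depth and the statement text in separate cells, so a screen reader can announce row, column, and cell consistently.

```python
import ast

def to_grid(source: str):
    """Map each statement to a grid row: (line, indent-depth column, cell text)."""
    rows = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.stmt):
            rows.append({
                "row": node.lineno,
                "depth": node.col_offset // 4,   # assumes 4-space indentation
                "cell": ast.unparse(node).splitlines()[0],
            })
    return sorted(rows, key=lambda r: r["row"])

for r in to_grid("def f(x):\n    if x > 0:\n        return x\n    return -x"):
    print(r)
```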
Opportunities for Human-AI Collaboration in Remote Sighted Assistance

Remote sighted assistance (RSA) has emerged as a conversational assistive technology for people with vision impairments, where trained sighted workers (agents) provide real-time navigational assistance to users with visual impairments via video-chat-like communication. In this paper, we first identify the key challenges that adversely affect the agent-user interaction in RSA services through a literature review and an interview study with 12 RSA users. These challenges can be partly addressed by prior work that uses computer vision (CV) technologies, especially augmented reality-based 3D map construction and real-time localization, in RSA services. We argue that addressing the full spectrum of these challenges warrants new development in Human-CV collaboration, which we formalize as five emerging problems: making object recognition and obstacle avoidance algorithms blind-aware; localizing users under poor networks; recognizing digital content on LCD screens; recognizing texts on irregular surfaces; and predicting the trajectory of out-of-frame pedestrians or objects. Addressing these problems can usher in the next generation of RSA services.

Sooyeon Lee et al. IUI 2022. Topics: Voice Accessibility; AR Navigation & Context Awareness; Deaf & Hard-of-Hearing Support (Captions, Sign Language, Vibration).
Tilt-Explore: Making Tilt Gestures Usable for Low-Vision Smartphone Users

People with low vision interact with smartphones using assistive technologies like screen magnifiers, which provide built-in touch gestures to pan and zoom onscreen content. These gestures are often cumbersome and require bimanual interaction. Of particular interest are panning gestures, which are issued frequently and involve two- or three-finger dragging. This paper aims to utilize tilt-based interaction as a single-handed alternative to built-in panning gestures. To that end, we first identified our design space from the literature and conducted an exploratory user study with 12 low-vision participants to understand key challenges. Among many findings, the study revealed that built-in panning gestures are error-prone, and most tilt-based interaction techniques are designed for sighted users, which low-vision users struggle to use as-is. We addressed these challenges by adapting tilt interaction to low-vision users' behavior and proposed Tilt-Explore, a new screen magnifier mode that enables tilt-to-pan. A second study with 16 low-vision participants revealed that, compared to built-in gestures, the participants were significantly less error-prone; and for lower magnification scales (e.g., <4x), they were significantly more efficient with Tilt-Explore. These findings indicate Tilt-Explore is a promising alternative to built-in panning gestures.

Farhani Momotaz et al. UIST 2021. Topics: Visual Impairment Technologies (Screen Readers, Tactile Graphics, Braille); Motor Impairment Assistive Input Technologies.
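A typical tilt-to-pan transfer function looks like the sketch below (an assumed design for illustration, not the paper's exact mapping): tilts inside a dead zone are ignored so a resting hand does not pan, and pan speed grows with tilt beyond that threshold.

```python
def tilt_to_pan_velocity(pitch_deg: float, roll_deg: float,
                         dead_zone_deg: float = 5.0,
                         gain_px_per_deg: float = 12.0):
    """Map device tilt to a pan velocity (vx, vy) in px/s.

    dead_zone_deg suppresses small unintentional tilts; gain_px_per_deg
    scales pan speed with tilt magnitude beyond the dead zone.
    """
    def axis(angle):
        magnitude = abs(angle) - dead_zone_deg
        if magnitude <= 0:
            return 0.0
        return gain_px_per_deg * magnitude * (1 if angle > 0 else -1)
    return axis(roll_deg), axis(pitch_deg)

print(tilt_to_pan_velocity(pitch_deg=2.0, roll_deg=10.0))  # (60.0, 0.0)
```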
Toward Interactively Balancing the Screen Time of Actors Based on Observable Phenotypic Traits in Live Telecast

Several prominent studies have shown that the imbalanced on-screen exposure of observable phenotypic traits like gender and skin-tone in movies, TV shows, live telecasts, and other visual media can reinforce gender and racial stereotypes in society. Researchers and human rights organizations alike have long been calling to make media producers more aware of such stereotypes. While awareness among media producers is growing, balancing the presence of different phenotypes in a video requires substantial manual effort and can typically only be done in the post-production phase. The task becomes even more challenging in the case of a live telecast, where video producers must make instantaneous decisions with no post-production phase to refine or revert a decision. In this paper, we propose Screen-Balancer, an interactive tool that assists media producers in balancing the presence of different phenotypes in a live telecast. The design of Screen-Balancer is informed by a field study conducted in a professional live studio. Screen-Balancer analyzes the facial features of the actors to determine phenotypic traits using facial detection packages; it then facilitates real-time visual feedback for interactive moderation of gender and skin-tone distributions. To demonstrate the effectiveness of our approach, we conducted a user study with 20 participants and asked them to compose live telecasts from a set of video streams simulating different camera angles, and featuring several male and female actors with different skin-tones. The study revealed that the participants were able to reduce the difference in screen time between male and female actors by 43%, and between light-skinned and dark-skinned actors by 44%, thus showing the promise and potential of using such a tool in commercial production systems.

Md Naimul Hoque et al. CSCW 2020. Topics: Videos, Live Streaming, and VR.
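The bookkeeping behind such live feedback can be sketched as a running per-group screen-time tally (my illustration; the class, labels, and upstream face detector are assumptions, not Screen-Balancer's implementation): each broadcast frame, the detected groups' counters advance, and the producer sees the current imbalance.

```python
from collections import Counter

class ScreenTimeBalancer:
    def __init__(self, fps: int = 30):
        self.fps = fps
        self.frames = Counter()   # frames in which each group was on screen

    def update(self, detected_groups):
        """detected_groups: labels visible this frame, e.g., {'female', 'dark-skinned'},
        as produced by an upstream face-detection/classification step."""
        self.frames.update(set(detected_groups))

    def imbalance_seconds(self, group_a: str, group_b: str) -> float:
        """Signed screen-time difference between two groups, in seconds."""
        return (self.frames[group_a] - self.frames[group_b]) / self.fps

balancer = ScreenTimeBalancer(fps=30)
for _ in range(90):                      # 3 seconds of frames
    balancer.update({"male", "light-skinned"})
print(balancer.imbalance_seconds("male", "female"))  # 3.0
```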