Touchscreens in Motion: Quantifying the Impact of Cognitive Load on Distracted Drivers
This study investigates the interplay between a driver's cognitive load, touchscreen interactions, and driving performance. Using an N-back task to induce four levels of cognitive load, we measured physiological responses (pupil diameter, electrodermal activity), subjective workload (NASA-TLX), touchscreen performance (Fitts' law), and driving metrics (lateral deviation, throttle control). Our results reveal significant mutual performance degradation, with touchscreen pointing throughput decreasing by over 58.1% during driving conditions and lateral driving deviation increasing by 41.9% when touchscreen interactions were introduced. Under high cognitive load, participants demonstrated a 20.2% increase in pointing movement time, a 16.6% decrease in pointing throughput, and a 26.3% reduction in off-road glance durations. We identified a prevalent "hand-before-eye" phenomenon in which ballistic hand movements frequently preceded visual attention shifts. These findings quantify the impact of cognitive load on multitasking performance and demonstrate how drivers adapt their visual attention and motor-visual coordination when cognitive resources are constrained.
2025 · Xiyuan Shen et al. · UIST · Topics: Head-Up Display (HUD) & Advanced Driver Assistance Systems (ADAS); In-Vehicle Haptic, Audio & Multimodal Feedback

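The pointing throughput figures above follow the Fitts' law tradition. As an illustration only (the abstract does not state which formulation the authors used), here is a common ISO 9241-9-style effective-throughput computation; all names and the example numbers are hypothetical:

```python
import math
import statistics

def effective_throughput(endpoint_devs, nominal_distance, movement_times_s):
    """Effective Fitts' law throughput (bits/s), ISO 9241-9 style.

    endpoint_devs: signed selection-endpoint deviations from target
        center along the task axis (same units as nominal_distance)
    nominal_distance: nominal movement amplitude
    movement_times_s: per-trial movement times in seconds
    """
    # Effective width: spread of endpoints (4.133 covers ~96% of a normal)
    we = 4.133 * statistics.stdev(endpoint_devs)
    # Effective amplitude: mean actual movement distance
    ae = nominal_distance + statistics.mean(endpoint_devs)
    ide = math.log2(ae / we + 1)                 # effective index of difficulty (bits)
    mt = statistics.mean(movement_times_s)        # mean movement time (s)
    return ide / mt                               # throughput in bits/s

# Toy example: 5 trials at a 100-unit amplitude, 0.5 s each
tp = effective_throughput([-2, 1, 0, 2, -1], 100, [0.5] * 5)
```

A drop in throughput like the 58.1% reported above can come from slower movements, wider endpoint scatter, or both, since both terms enter the ratio.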
SlideAudit: A Dataset and Taxonomy for Automated Evaluation of Presentation Slides
Automated evaluation of specific graphic designs like presentation slides is an open problem. We present SlideAudit, a dataset for automated slide evaluation. We collaborated with design experts to develop a thorough taxonomy of slide design flaws. Our dataset comprises 2400 slides collected and synthesized from multiple sources, including a subset intentionally modified with specific design problems. We then fully annotated them using our taxonomy through rigorously trained crowdworkers recruited on Prolific. To evaluate whether AI is capable of identifying design flaws, we compared multiple large language models under different prompting strategies, and with an existing design critique pipeline. We show that AI models struggle to accurately identify slide design flaws, with F1 scores ranging from 0.331 to 0.655. Notably, prompting techniques leveraging our taxonomy achieved the highest performance. We further conducted a remediation study to assess AI's potential for improving slides. Of the 82.0% of slides that showed significant improvement, 87.8% improved more when our taxonomy was used, further demonstrating its utility.
2025 · Mingrui Ray Zhang et al. · UIST · Topics: Explainable AI (XAI); Recommender System UX; Prototyping & User Testing

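For readers unfamiliar with the F1 figures above: if flaw identification is scored as set matching per slide (a plausible scheme; the abstract does not specify the exact scoring protocol), F1 could be computed as in this sketch, where the flaw labels are invented examples:

```python
def f1_score(true_flaws, predicted_flaws):
    """Set-based F1 for flaw identification on one slide.

    true_flaws / predicted_flaws: sets of flaw labels
    (e.g., taxonomy categories).
    """
    tp = len(true_flaws & predicted_flaws)   # correctly identified flaws
    if tp == 0:
        return 0.0
    precision = tp / len(predicted_flaws)    # fraction of predictions that are right
    recall = tp / len(true_flaws)            # fraction of real flaws found
    return 2 * precision * recall / (precision + recall)

# Hypothetical labels: 2 of 3 real flaws found, 1 false alarm -> F1 = 2/3
score = f1_score({"low-contrast", "overlap", "tiny-font"},
                 {"overlap", "tiny-font", "crowded"})
```

An F1 of 0.655 in this framing would mean roughly a third of the harmonic-mean mass is lost to missed flaws and false alarms combined.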
ArtInsight: Enabling AI-Powered Artwork Engagement for Mixed Visual-Ability Families
We introduce ArtInsight, a novel AI-powered system to facilitate deeper engagement with child-created artwork in mixed visual-ability families. ArtInsight leverages large language models (LLMs) to craft a respectful and thorough initial description of a child's artwork, and provides: creative AI-generated descriptions for a vivid overview, audio recording to capture the child's own description of their artwork, and a set of AI-generated questions to facilitate discussion between blind or low-vision (BLV) family members and their children. Alongside ArtInsight, we also contribute a new rubric to score AI-generated descriptions of child-created artwork and an assessment of state-of-the-art LLMs. We evaluated ArtInsight with five groups of BLV family members and their children, and as a case study with one BLV child therapist. Our findings highlight a preference for ArtInsight's longer, artistically-tailored descriptions over those generated by existing BLV AI tools. Participants highlighted the creative description and audio recording components as most beneficial, with the former helping "bring a picture to life" and the latter centering the child's narrative to generate context-aware AI responses. Our findings reveal different ways that AI can be used to support art engagement, including before, during, and after interaction with the child artist, as well as expectations that BLV adults and their sighted children have about AI-powered tools.
2025 · Arnavi Chheda-Kothary et al. · IUI · Topics: Generative AI (Text, Image, Music, Video); AI-Assisted Creative Writing; Empowerment of Marginalized Groups

ScreenAudit: Detecting Screen Reader Accessibility Errors in Mobile Apps Using Large Language Models
Many mobile apps are inaccessible, thereby excluding people from their potential benefits. Existing rule-based accessibility checkers aim to mitigate these failures by identifying errors early during development but are constrained in the types of errors they can detect. We present ScreenAudit, an LLM-powered system designed to traverse mobile app screens, extract metadata and transcripts, and identify screen reader accessibility errors overlooked by existing checkers. We recruited six accessibility experts, including one screen reader user, to evaluate ScreenAudit's reports across 14 unique app screens. Our findings indicate that ScreenAudit achieves an average coverage of 69.2%, compared to only 31.3% with a widely-used accessibility checker. Expert feedback indicated that ScreenAudit delivered higher-quality feedback and addressed more aspects of screen reader accessibility compared to existing checkers, and that ScreenAudit would benefit app developers in real-world settings.
2025 · Mingyuan Zhong et al. (University of Washington, Computer Science & Engineering) · CHI · Topics: Generative AI (Text, Image, Music, Video); Explainable AI (XAI); Visual Impairment Technologies (Screen Readers, Tactile Graphics, Braille)

The Ability-Based Design Mobile Toolkit (ABD-MT): Developer Support for Runtime Interface Adaptation Based on Users' Abilities
Despite significant progress in the capabilities of mobile devices and applications, most apps remain oblivious to their users' abilities. To enable apps to respond to users' situated abilities, we created the Ability-Based Design Mobile Toolkit (ABD-MT). ABD-MT integrates with an app's user input and sensors to observe a user's touches, gestures, physical activities, and attention at runtime, to measure and model these abilities, and to adapt interfaces accordingly. Conceptually, ABD-MT enables developers to engage with a user's "ability profile," which is built up over time and inspectable through our API. As validation, we created example apps to demonstrate ABD-MT, enabling ability-aware functionality in 91.5% fewer lines of code compared to not using our toolkit. Further, in a study with 11 Android developers, we showed that ABD-MT is easy to learn and use, is welcomed for future use, and is applicable to a variety of end-user scenarios.
2024 · Junhan Kong et al. · MobileHCI · Topics: Motor Impairment Assistive Input Technologies; Cognitive Impairment & Neurodiversity (Autism, ADHD, Dyslexia); Universal & Inclusive Design

Ga11y: An Automated GIF Annotation System for Visually Impaired Users
Animated GIF images have become prevalent in internet culture, often used to express richer and more nuanced meanings than static images. But animated GIFs often lack adequate alternative text descriptions, and it is challenging to generate such descriptions automatically, resulting in inaccessible GIFs for blind or low-vision (BLV) users. To improve the accessibility of animated GIFs for BLV users, we provide a system called Ga11y (pronounced "galley") for creating GIF annotations. Ga11y combines the power of machine intelligence and crowdsourcing and has three components: an Android client for submitting annotation requests, a backend server and database, and a web interface where volunteers can respond to annotation requests. We evaluated three human annotation interfaces and employed the one that yielded the best annotation quality. We also conducted a multi-stage evaluation with 12 BLV participants from the United States and China, receiving positive feedback.
2022 · Mingrui Ray Zhang et al. (University of Washington) · CHI · Topics: Visual Impairment Technologies (Screen Readers, Tactile Graphics, Braille)

An Aligned Rank Transform Procedure for Multifactor Contrast Tests
Data from multifactor HCI experiments often violates the assumptions of parametric tests (i.e., nonconforming data). The Aligned Rank Transform (ART) has become a popular nonparametric analysis in HCI that can find main and interaction effects in nonconforming data, but leads to incorrect results when used to conduct post hoc contrast tests. We created a new algorithm called ART-C for conducting contrast tests within the ART paradigm and validated it on 72,000 synthetic data sets. Our results indicate that ART-C does not inflate Type I error rates, unlike contrasts based on ART, and that ART-C has more statistical power than a t-test, Mann-Whitney U test, Wilcoxon signed-rank test, and ART. We also extended an open-source tool called ARTool with our ART-C algorithm for both Windows and R. Our validation had some limitations (e.g., only six distribution types, no mixed factorial designs, no random slopes), and data drawn from Cauchy distributions should not be analyzed with ART-C.
2021 · Lisa A. Elkin et al. · UIST · Topics: User Research Methods (Interviews, Surveys, Observation); Computational Methods in HCI

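As background on the ART procedure this paper extends, here is a minimal sketch of the align-then-rank step for a two-factor A×B interaction, following the published ART recipe (strip all effects, add back only the estimated effect of interest, then rank). The function names and data are illustrative; this is the base ART alignment, not the ART-C contrast algorithm itself:

```python
import numpy as np

def average_ranks(x):
    """Average ranks (1-based), with ties sharing their mean rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="stable")
    ranks = np.empty(len(x))
    sx = x[order]
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sx[j + 1] == sx[i]:
            j += 1                          # extend over the tie group
        ranks[order[i:j + 1]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def art_interaction(y, a, b):
    """Align-and-rank responses for the A×B interaction (two-factor ART).

    y: responses; a, b: factor levels per observation.
    Returns ranks of the aligned responses, to be submitted to a
    standard ANOVA whose only interpreted effect is A×B.
    """
    y, a, b = map(np.asarray, (y, a, b))
    grand = y.mean()
    cell = np.array([y[(a == ai) & (b == bi)].mean() for ai, bi in zip(a, b)])
    ma = np.array([y[a == ai].mean() for ai in a])   # marginal A means
    mb = np.array([y[b == bi].mean() for bi in b])   # marginal B means
    residual = y - cell                              # strip all effects
    aligned = residual + (cell - ma - mb + grand)    # add back interaction estimate only
    return average_ranks(aligned)
```

The paper's point is that running contrasts directly on such aligned-and-ranked data is invalid; ART-C realigns the data specifically for the contrast being tested.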
Understanding Blind Screen-Reader Users' Experiences of Digital Artboards
Two-dimensional canvases are the core components of many digital productivity and creativity tools, with "artboards" containing objects rather than pixels. Unfortunately, the contents of artboards remain largely inaccessible to blind users relying on screen-readers, but the precise problems are not well understood. This study sought to understand how blind screen-reader users interact with artboards. Specifically, we conducted contextual interviews, observations, and task-based usability studies with 15 blind participants to understand their experiences of artboards found in Microsoft PowerPoint, Apple Keynote, and Google Slides. Participants expressed that the inaccessibility of these artboards contributes to significant educational and professional barriers. We found that the key problems faced were: (1) high cognitive loads from a lack of feedback about artboard contents and object state; (2) difficulty determining relationships among artboard objects; and (3) constant uncertainty about whether object manipulations were successful. We offer design remedies that improve feedback for object state, relationships, and manipulations.
2021 · Anastasia Schaadhardt et al. (University of Washington) · CHI · Topics: Visual Impairment Technologies (Screen Readers, Tactile Graphics, Braille)

Voicemoji: Emoji Entry Using Voice for Visually Impaired People
Keyboard-based emoji entry can be challenging for people with visual impairments: users have to sequentially navigate emoji lists using screen readers to find their desired emojis, which is a slow and tedious process. In this work, we explore the design and benefits of emoji entry with speech input, a popular text entry method among people with visual impairments. After conducting interviews to understand blind or low vision (BLV) users' current emoji input experiences, we developed Voicemoji, which (1) outputs relevant emojis in response to voice commands, and (2) provides context-sensitive emoji suggestions through speech output. We also conducted a multi-stage evaluation study with six BLV participants from the United States and six BLV participants from China, finding that Voicemoji significantly reduced entry time by 91.2% and was preferred by all participants over the Apple iOS keyboard. Based on our findings, we present Voicemoji as a feasible solution for voice-based emoji entry.
2021 · Mingrui Ray Zhang et al. (University of Washington) · CHI · Topics: Intelligent Voice Assistants (Alexa, Siri, etc.); Visual Impairment Technologies (Screen Readers, Tactile Graphics, Braille)

Crowdlicit: A System for Conducting Distributed End-User Elicitation and Identification Studies
End-user elicitation studies are a popular design method. Currently, such studies are usually confined to a lab, limiting the number and diversity of participants, and therefore the representativeness of their results. Furthermore, the quality of the results from such studies generally lacks any formal means of evaluation. In this paper, we address some of the limitations of elicitation studies through the creation of the Crowdlicit system along with the introduction of end-user identification studies, which are the reverse of elicitation studies. Crowdlicit is a new web-based system that enables researchers to conduct online and in-lab elicitation and identification studies. We used Crowdlicit to run a crowd-powered elicitation study based on Morris's "Web on the Wall" study (2012) with 78 participants, arriving at a set of symbols that included six new symbols different from Morris's. We evaluated the effectiveness of 49 symbols (43 from Morris and six from Crowdlicit) by conducting a crowd-powered identification study. We show that the Crowdlicit elicitation study resulted in a set of symbols that was significantly more identifiable than Morris's.
2019 · Abdullah X. Ali et al. (University of Washington) · CHI · Topics: Crowdsourcing Task Design & Quality Control; User Research Methods (Interviews, Surveys, Observation)

Text Entry Throughput: Towards Unifying Speed and Accuracy in a Single Performance Metric
Human-computer input performance inherently involves speed-accuracy tradeoffs: the faster users act, the more inaccurate those actions are. Therefore, comparing speeds and accuracies separately can result in ambiguous outcomes: does a fast but inaccurate technique perform better or worse overall than a slow but accurate one? For pointing, speed and accuracy have been unified for over 60 years as throughput (bits/s) (Crossman 1957, Welford 1968), but to date, no similar metric has been established for text entry. In this paper, we introduce a text entry method-independent throughput metric based on Shannon information theory (1948). To explore the practical usability of the metric, we conducted an experiment in which 16 participants typed with a laptop keyboard using different cognitive sets, i.e., speed-accuracy biases. Our results show that as a performance metric, text entry throughput remains relatively stable under different speed-accuracy conditions. We also evaluated a smartphone keyboard with 12 participants, finding that throughput varied least compared to other text entry metrics. This work allows researchers to characterize text entry performance with a single unified measure of input efficiency.
2019 · Mingrui Ray Zhang et al. (University of Washington) · CHI · Topics: User Research Methods (Interviews, Surveys, Observation); Computational Methods in HCI

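The paper's Shannon-based throughput metric is not spelled out in the abstract. For contrast, the conventional separate speed and accuracy metrics it aims to unify, words per minute and minimum-string-distance (MSD) error rate, are commonly computed as follows. This is a sketch of the standard formulas, not the paper's metric:

```python
def words_per_minute(transcribed, seconds):
    """Standard WPM: (|T| - 1) characters per second, 5 chars per 'word'."""
    return ((len(transcribed) - 1) / seconds) * 60 / 5

def msd_error_rate(presented, transcribed):
    """Levenshtein (minimum string distance) error rate between
    the presented and transcribed strings."""
    m, n = len(presented), len(transcribed)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                      # deletions from presented
    for j in range(n + 1):
        d[0][j] = j                      # insertions into transcribed
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if presented[i - 1] == transcribed[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[m][n] / max(m, n)
```

The ambiguity the abstract describes falls out of these two numbers being incommensurable: a method can win on WPM and lose on MSD error rate, with no principled way to combine them without a unified, bits-per-second-style metric.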
Beyond the Input Stream: Making Text Entry Evaluations More Flexible with Transcription Sequences
Text entry method-independent evaluation tools are often used to conduct text entry experiments and compute performance metrics, like words per minute and error rates. The input stream paradigm of Soukoreff & MacKenzie (2001, 2003) still remains prevalent, which presents a string for transcription and uses a serial character representation for encoding the text entry process. Although an advance over prior paradigms, the input stream paradigm is unable to support many modern text entry features. To address these limitations, we present "transcription sequences": for each new input, a snapshot of the entire transcribed string up to that point is captured. By assembling transcription sequences and comparing adjacent strings, we can compute all prior metrics, reduce artificial constraints on text entry evaluations, and introduce new metrics. We conducted a study with 18 participants who typed 1620 phrases using a laptop keyboard, on-screen keyboard, and smartphone keyboard using features such as auto-correction, word prediction, and copy-and-paste. We also evaluated non-keyboard methods Dasher, gesture typing, and T9. Our results show that modern features and methods can be accommodated, prior metrics can be correctly computed, and new metrics can reveal insights. We validated our algorithms using ground truth based on cursor positioning, confirming 100% accuracy. We also provide a new tool, TextTest++, to facilitate web-based evaluations.
2019 · Mingrui Ray Zhang et al. · UIST · Topics: Prototyping & User Testing

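A minimal sketch of the snapshot-diffing idea (hypothetical code, not the paper's algorithm): comparing adjacent snapshots classifies each input event by how the transcribed string changed, which is what lets features like auto-correction and paste be observed rather than forbidden:

```python
def classify_events(snapshots):
    """Classify each input event from adjacent transcription-sequence snapshots.

    snapshots: list of full transcribed strings, one per input event,
    starting with the initial (usually empty) string.
    Returns (kind, delta) per event, where delta is the length change.
    """
    events = []
    for prev, cur in zip(snapshots, snapshots[1:]):
        delta = len(cur) - len(prev)
        if delta > 0:
            kind = "insert"    # typed chars, word prediction, paste
        elif delta < 0:
            kind = "delete"    # backspace, selection delete
        else:
            # same length: an in-place change (e.g., auto-correction)
            kind = "replace" if cur != prev else "none"
        events.append((kind, delta))
    return events

# Typing "teh", backspacing twice, then accepting a 2-char completion
events = classify_events(["", "t", "th", "teh", "te", "t", "the"])
```

Because each snapshot is the whole string, multi-character events (a pasted phrase, an auto-corrected word) appear as single events with delta > 1, something a serial per-character input stream cannot represent.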
Type, Then Correct: Intelligent Text Correction Techniques for Mobile Text Entry Using Neural Networks
Current text correction processes on mobile touch devices are laborious: users either extensively use backspace, or navigate the cursor to the error position, make a correction, and navigate back, usually by employing multiple taps or drags over small targets. In this paper, we present three novel text correction techniques to improve the efficiency of the correction process: Drag-n-Drop, Drag-n-Throw, and Magic Key. All of the techniques skip error-deletion and cursor-positioning procedures, and instead allow the user to type the correction first, and then apply that correction to a previously committed error. Specifically, Drag-n-Drop allows a user to drag a correction and drop it on the error position. Drag-n-Throw lets a user drag a correction from the keyboard suggestion list and "throw" it to the approximate area of the error text. Our deep learning algorithm determines the most likely error in the targeted area and applies the correction. Magic Key allows a user to type a correction and tap a designated key to highlight possible error candidates. The user can navigate among these candidates by dragging atop the key, and can apply a correction by tapping the key. We evaluated these techniques in both text correction and transcription tasks. Our experiment results show that correction with the new techniques was significantly faster than de facto cursor- and backspace-based correction. Our techniques apply to any touch-based text entry method.
2019 · Mingrui Ray Zhang et al. · UIST · Topics: Human-LLM Collaboration; AI-Assisted Decision-Making & Automation

"Suddenly, we got to become therapists for each other": Designing Peer Support Chats for Mental Health
Talk therapy is a common, effective, and desirable form of mental health treatment. Yet, it is inaccessible to many people. Enabling peers to chat online using effective principles of talk therapy could help scale this form of mental health care. To understand how such chats could be designed, we conducted a two-week field experiment with 40 people experiencing mental illnesses comparing two types of online chats: chats guided by prompts, and unguided chats. Results show that anxiety was significantly reduced from pre-test to post-test. User feedback revealed that guided chats provided solutions to problems and new perspectives, and were perceived as "deep," while unguided chats offered personal connection on shared experiences and were experienced as "smooth." We contribute the design of an online guided chat tool and insights into the design of peer support chat systems that guide users to initiate, maintain, and reciprocate emotional support.
2018 · Jasper O'Leary et al. (University of Washington) · CHI · Topics: Conversational Chatbots; Agent Personality & Anthropomorphism; Mental Health Apps & Online Support Communities

Drunk User Interfaces: Determining Blood Alcohol Level through Everyday Smartphone Tasks
Breathalyzers, the standard quantitative method for assessing inebriation, are primarily owned by law enforcement and used only after a potentially inebriated individual is caught driving. However, not everyone has access to such specialized hardware. We present drunk user interfaces: smartphone user interfaces that measure how alcohol affects a person's motor coordination and cognition using performance metrics and sensor data. We examine five drunk user interfaces and combine them to form the "DUI app". DUI uses machine learning models trained on human performance metrics and sensor data to estimate a person's blood alcohol level (BAL). We evaluated DUI on 14 individuals in a week-long longitudinal study wherein each participant used DUI at various BALs. We found that with a global model that accounts for user-specific learning, DUI can estimate a person's BAL with an absolute mean error of 0.005% ± 0.007% and a Pearson's correlation coefficient of 0.96 with breathalyzer measurements.
2018 · Alex Mariakakis et al. (University of Washington) · CHI · Topics: Biosensors & Physiological Monitoring; Context-Aware Computing