Enabling Auto-Correction on Soft Braille Keyboard
Dan Zhang et al. UIST 2025. Tags: Voice Accessibility; Visual Impairment Technologies (Screen Readers, Tactile Graphics, Braille); Motor Impairment Assistive Input Technologies.
A soft Braille keyboard is a graphical representation of the Braille writing system on smartphones. It provides an essential text input method for visually impaired individuals, but accuracy and efficiency remain significant challenges. We present an intelligent Braille keyboard with auto-correction ability, which uses optimal transportation theory to estimate the distances between touch input and Braille patterns, and combines them with a language model to estimate the probability of entering words. The proposed system was evaluated through both simulations and user studies. In a touch interaction simulation on an Android phone and an iPhone, our intelligent Braille keyboard demonstrated superior error correction performance compared to the Android Braille keyboard with proofreading suggestions and the iPhone Braille keyboard with spelling suggestions. It reduced the error rate from 55.81% on Android and 57.13% on iPhone to 19.80% under high typing noise. Furthermore, in a user study with 12 participants who are legally blind, the intelligent Braille keyboard reduced word error rate (WER) by 59.5% (42.53% to 17.28%) with a slight drop of 0.74 words per minute (WPM), compared to a conventional Braille keyboard without auto-correction. These findings suggest that our approach has the potential to greatly improve the typing experience for Braille users on touchscreen devices.

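As a rough illustration of the distance computation described above (not the authors' implementation; the dot coordinates, candidate patterns, prior, and noise scale below are all hypothetical), the optimal-transport distance between a handful of touch points and a Braille cell's dot positions reduces to a minimum-cost matching, which a brute-force search can compute for a 6-dot cell:

```python
import itertools
import math

# Hypothetical canonical positions of the 6 Braille dots (normalized
# screen coordinates); a real keyboard would calibrate these per user.
BRAILLE_DOTS = {
    1: (0.2, 0.2), 2: (0.2, 0.5), 3: (0.2, 0.8),
    4: (0.8, 0.2), 5: (0.8, 0.5), 6: (0.8, 0.8),
}

def transport_cost(touches, dots):
    """Minimal total Euclidean cost of matching touch points to a
    pattern's dot positions (brute force is fine for <= 6 dots)."""
    if len(touches) != len(dots):
        return float("inf")  # sketch: only consider same-size matchings
    positions = [BRAILLE_DOTS[d] for d in dots]
    return min(
        sum(math.dist(t, p) for t, p in zip(touches, perm))
        for perm in itertools.permutations(positions)
    )

def best_pattern(touches, patterns, prior, noise=0.1):
    """Score candidate dot patterns by transport distance combined with
    a (hypothetical) language-model prior, and return the best one."""
    def score(dots):
        return prior.get(dots, 1e-6) * math.exp(-transport_cost(touches, dots) / noise)
    return max(patterns, key=score)

# In Braille, 'a' is dot 1 and 'b' is dots 1-2.
patterns = {(1,): "a", (1, 2): "b"}
prior = {(1,): 0.5, (1, 2): 0.5}
touch = [(0.22, 0.18)]
print(patterns[best_pattern(touch, list(patterns), prior)])  # -> a
```

A noisy tap near dot 1 still decodes to "a" because the matching cost, not an exact key hit, drives the decision.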
Tap&Say: Touch Location-Informed Large Language Model for Multimodal Text Correction on Smartphones
Maozheng Zhao et al., Stony Brook University, Department of Computer Science. CHI 2025. Tags: Human-LLM Collaboration.
While voice input offers a convenient alternative to traditional text editing on mobile devices, practical implementations face two key challenges: 1) reliably distinguishing between editing commands and content dictation, and 2) effortlessly pinpointing the intended edit location. We propose Tap&Say, a novel multimodal system that combines touch interactions with Large Language Models (LLMs) for accurate text correction. By tapping near an error, users signal their edit intent and location, addressing both challenges. Then, the user speaks the correction text. Tap&Say utilizes the touch location, voice input, and existing text to generate contextually relevant correction suggestions. We propose a novel touch location-informed attention layer that integrates the tap location into the LLM's attention mechanism, enabling it to utilize the tap location for text correction. We fine-tuned the touch location-informed LLM on synthetic touch locations and correction commands, achieving significantly higher correction accuracy than the state-of-the-art method VT. A 16-person user study demonstrated that Tap&Say outperforms VT with 16.4% shorter task completion time and 47.5% fewer keyboard clicks, and is preferred by users.

SpellRing: Recognizing Continuous Fingerspelling in American Sign Language using a Ring
Hyunchul Lim et al., Cornell, Computing and Information Science. CHI 2025. Tags: Foot & Wrist Interaction; Voice Accessibility; Motor Impairment Assistive Input Technologies.
Fingerspelling is a critical part of American Sign Language (ASL) recognition and has become an accessible optional text entry method for Deaf and Hard of Hearing (DHH) individuals. In this paper, we introduce SpellRing, a single smart ring worn on the thumb that recognizes words continuously fingerspelled in ASL. SpellRing uses active acoustic sensing (via a microphone and speaker) and an inertial measurement unit (IMU) to track handshape and movement, which are processed through a deep learning algorithm using Connectionist Temporal Classification (CTC) loss. We evaluated the system with 20 ASL signers (13 fluent and 7 learners), using the MacKenzie-Soukoreff Phrase Set of 1,164 words and 100 phrases. Offline evaluation yielded top-1 and top-5 word recognition accuracies of 82.45% (±9.67%) and 92.42% (±5.70%), respectively. In real-time, the system achieved a word error rate (WER) of 0.099 (±0.039) on the phrases. Based on these results, we discuss key lessons and design implications for future minimally obtrusive ASL recognition wearables.

LLM Powered Text Entry Decoding and Flexible Typing on Smartphones
Yan Ma et al., Stony Brook University, Computer Science Department. CHI 2025. Tags: EV Charging & Eco-Driving Interfaces; Human-LLM Collaboration.
Large language models (LLMs) have shown exceptional performance in various language-related tasks. However, their application in keyboard decoding, which involves converting input signals (e.g., taps and gestures) into text, remains underexplored. This paper presents a fine-tuned FLAN-T5 model for decoding. It achieves 93.1% top-1 accuracy on user-drawn gestures, outperforming the widely adopted SHARK2 decoder, and 95.4% on real-word tap typing data. In particular, our decoder supports Flexible Typing, allowing users to enter a word with taps, gestures, multi-stroke gestures, and tap-gesture combinations. User study results show that Flexible Typing is beneficial and well-received by participants, where 35.9% of words were entered using word gestures, 29.0% with taps, 6.1% with multi-stroke gestures, and the remaining 29.0% using tap-gestures. Our investigation suggests that the LLM-based decoder improves decoding accuracy over existing word gesture decoders while enabling the Flexible Typing method, which enhances the overall typing experience and accommodates diverse user preferences.

Model Touch Pointing and Detect Parkinson's Disease via a Mobile Game
Kaiyan Ling et al. UbiComp 2024. Tags: Motor Impairment Assistive Input Technologies; Serious & Functional Games.
Ling et al. develop a mobile game-based method for modeling touch pointing: by analyzing characteristics of players' touch behavior during gameplay, the approach enables early auxiliary detection of Parkinson's disease, providing a new avenue for disease screening.

Accessible Gesture Typing on Smartphones for People with Low Vision
Dan Zhang et al. UIST 2024. Tags: Visual Impairment Technologies (Screen Readers, Tactile Graphics, Braille); Motor Impairment Assistive Input Technologies.
While gesture typing is widely adopted on touchscreen keyboards, its support for low vision users is limited. We have designed and implemented two keyboard prototypes, layout-magnified and key-magnified keyboards, to enable gesture typing for people with low vision. Both keyboards facilitate uninterrupted access to all keys while the screen magnifier is active, allowing people with low vision to input text with one continuous stroke. Furthermore, we have created a kinematics-based decoding algorithm to accommodate the typing behavior of people with low vision. This algorithm can decode the gesture input even if the gesture trace deviates from a pre-defined word template, and the starting position of the gesture is far from the starting letter of the target word. Our user study showed that the key-magnified keyboard achieved 5.28 words per minute, 27.5% faster than a conventional gesture typing keyboard with voice feedback.

Hand Gesture Recognition for Blind Users by Tracking 3D Gesture Trajectory
Prerna Khanna et al., Stony Brook University. CHI 2024. Tags: Hand Gesture Recognition; Visual Impairment Technologies (Screen Readers, Tactile Graphics, Braille).
Hand gestures provide an alternate interaction modality for blind users and can be supported using commodity smartwatches without requiring specialized sensors. The enabling technology is an accurate gesture recognition algorithm, but almost all algorithms are designed for sighted users. Our study shows that blind user gestures are considerably different from sighted users, rendering current recognition algorithms unsuitable. Blind user gestures have high inter-user variance, making learning gesture patterns difficult without large-scale training data. Instead, we design a gesture recognition algorithm that works on a 3D representation of the gesture trajectory, capturing motion in free space. Our insight is to extract a micro-movement in the gesture that is user-invariant and use this micro-movement for gesture classification. To this end, we develop an ensemble classifier that combines image classification with geometric properties of the gesture. Our evaluation demonstrates a 92% classification accuracy, surpassing the next best state-of-the-art, which has an accuracy of 82%.

TouchType-GAN: Modeling Touch Typing with Generative Adversarial Network
Jeremy Chu et al. UIST 2023. Tags: Force Feedback & Pseudo-Haptic Weight; Human-LLM Collaboration.
Models that can generate touch typing tasks are important to the development of touch typing keyboards. We propose TouchType-GAN, a Conditional Generative Adversarial Network that can simulate locations and time stamps of touch points in touch typing. TouchType-GAN takes arbitrary text as input to generate realistic touch typing both spatially (i.e., (x, y) coordinates of touch points) and temporally (i.e., timestamps of touch points). TouchType-GAN introduces a variational generator that estimates Gaussian distributions for every target letter to prevent mode collapse. Our experiments on a dataset with 3k typed sentences show that TouchType-GAN outperforms existing touch typing models, including the Rotational Dual Gaussian model for simulating the distribution of touch points, and the Finger-Fitts Euclidean Model for simulating typing time. Overall, our research demonstrates that the proposed GAN structure can learn the distribution of user typed touch points, and the resulting TouchType-GAN can also estimate typing movements. TouchType-GAN can serve as a valuable tool for designing and evaluating touch typing input systems.

Modeling Touch-based Menu Selection Performance of Blind Users via Reinforcement Learning
Zhi Li et al., Stony Brook University. CHI 2023. Tags: Visual Impairment Technologies (Screen Readers, Tactile Graphics, Braille).
Although menu selection has been extensively studied in HCI, most existing studies have focused on sighted users, leaving blind users' menu selection under-studied. In this paper, we propose a computational model that can simulate blind users' menu selection performance and strategies, including the way they use techniques like swiping, gliding, and direct touch. We assume that selection behavior emerges as an adaptation to the user's memory of item positions based on experience and feedback from the screen reader. A key aspect of our model is a model of long-term memory, predicting how a user recalls and forgets item position based on previous menu selections. We compare simulation results predicted by our model against data obtained in an empirical study with ten blind users. The model correctly simulated the effect of the menu length and menu arrangement on selection time, the action composition, and the menu selection strategy of the users.

WordGesture-GAN: Modeling Word-Gesture Movement with Generative Adversarial Network
Jeremy Chu et al., Stony Brook University. CHI 2023. Tags: Hand Gesture Recognition; Human-LLM Collaboration; Creative Coding & Computational Art.
Word-gesture production models that can synthesize word-gestures are critical to the training and evaluation of word-gesture keyboard decoders. We propose WordGesture-GAN, a conditional generative adversarial network that takes arbitrary text as input to generate realistic word-gesture movements in both spatial (i.e., (x, y) coordinates of touch points) and temporal (i.e., timestamps of touch points) dimensions. WordGesture-GAN introduces a Variational Auto-Encoder to extract and embed variations of user-drawn gestures into a Gaussian distribution which can be sampled to control variation in generated gestures. Our experiments on a dataset with 38k gesture samples show that WordGesture-GAN outperforms existing gesture production models including the minimum jerk model [37] and the style-transfer GAN [31, 32] in generating realistic gestures. Overall, our research demonstrates that the proposed GAN structure can learn variations in user-drawn gestures, and the resulting WordGesture-GAN can generate word-gesture movement and predict the distribution of gestures. WordGesture-GAN can serve as a valuable tool for designing and evaluating gestural input systems.

GlanceWriter: Writing Text by Glancing Over Letters with Gaze
Wenzhe Cui et al., Stony Brook University. CHI 2023. Tags: Eye Tracking & Gaze Interaction.
Writing text with eye gaze only is an appealing hands-free text entry method. However, existing gaze-based text entry methods introduce eye fatigue and are slow in typing speed because they often require users to dwell on letters of a word, or mark the starting and ending positions of a gaze path with extra operations for entering a word. In this paper, we propose GlanceWriter, a text entry method that allows users to enter text by glancing over keys one by one without any need to dwell on any keys or specify the starting and ending positions of a gaze path when typing a word. To achieve this, GlanceWriter probabilistically determines the letters to be typed based on the dynamics of gaze movements and gaze locations. Our user studies demonstrate that GlanceWriter significantly improves the text entry performance over EyeSwipe, a dwell-free input method using "reverse crossing" to identify the starting and ending keys. GlanceWriter also outperforms the dwell-free gaze input method of Tobii's Communicator 5, a commercial eye gaze-based communication system. Overall, GlanceWriter achieves dwell-free and crossing-free text entry by probabilistically decoding gaze paths, offering a promising gaze-based text entry method.

Phrase-Gesture Typing on Smartphones
Zheer Xu et al. UIST 2022. Tags: Voice User Interface (VUI) Design; Generative AI (Text, Image, Music, Video).
We study phrase-gesture typing, a gesture typing method that allows users to type short phrases by swiping through all the letters of the words in a phrase using a single, continuous gesture. Unlike word-gesture typing, where text needs to be entered word by word, phrase-gesture typing enters text phrase by phrase. To demonstrate the usability of phrase-gesture typing, we implemented a prototype called PhraseSwipe. Our system is composed of a frontend interface designed specifically for typing through phrases and a backend phrase-level gesture decoder developed based on a transformer-based neural language model. Our decoder was trained using five million phrases of varying lengths of up to five words, chosen randomly from the Yelp Review Dataset. Through a user study with 12 participants, we demonstrate that participants could type using PhraseSwipe at an average speed of 34.5 WPM with a Word Error Rate of 1.1%.

Bayesian Hierarchical Pointing Models
Hang Zhao et al. UIST 2022. Tags: Visualization Perception & Cognition; Computational Methods in HCI.
Bayesian hierarchical models are probabilistic models that have hierarchical structures and use Bayesian methods for inference. In this paper, we extend Fitts' law to be a Bayesian hierarchical pointing model and compare it with the typical pooled pointing models (i.e., treating all observations as the same pool) and the individual pointing models (i.e., building an individual model for each user separately). The Bayesian hierarchical pointing models outperform pooled and individual pointing models in predicting the distribution and the mean of pointing movement time, especially when the training data are sparse. Our investigation also shows that both noninformative and weakly informative priors are adequate for modeling pointing actions, although the weakly informative prior performs slightly better than the noninformative prior when the training data size is small. Overall, we conclude that the expected advantages of Bayesian hierarchical models hold for pointing tasks. Bayesian hierarchical modeling should be adopted as a more principled and effective approach to building pointing models than the current common practices in HCI, which use pooled or individual models.

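The partial-pooling behavior that makes hierarchical models robust on sparse data can be illustrated with a minimal empirical-Bayes-style sketch (this is not the paper's full Bayesian inference; the shrinkage weight below is a hypothetical stand-in for what posterior inference would estimate):

```python
def fit_fitts(trials):
    """Ordinary least squares for Fitts' law MT = a + b * ID
    on one user's (ID, MT) trials; returns (a, b)."""
    n = len(trials)
    mean_id = sum(i for i, _ in trials) / n
    mean_mt = sum(t for _, t in trials) / n
    cov = sum((i - mean_id) * (t - mean_mt) for i, t in trials)
    var = sum((i - mean_id) ** 2 for i, _ in trials)
    b = cov / var
    return mean_mt - b * mean_id, b

def partial_pool(per_user_fits, counts, shrink=20):
    """Shrink each user's (a, b) toward the group mean, more strongly
    when that user contributed few trials -- the core intuition behind
    hierarchical pointing models on sparse data."""
    ga = sum(a for a, _ in per_user_fits) / len(per_user_fits)
    gb = sum(b for _, b in per_user_fits) / len(per_user_fits)
    pooled = []
    for (a, b), n in zip(per_user_fits, counts):
        w = n / (n + shrink)  # data-rich users keep their own estimates
        pooled.append((w * a + (1 - w) * ga, w * b + (1 - w) * gb))
    return pooled
```

A user with 5 trials is pulled most of the way toward the group mean, while one with hundreds of trials keeps an almost individual model, mirroring how the hierarchical posterior interpolates between the pooled and individual extremes.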
EyeSayCorrect: Eye Gaze and Voice Based Hands-free Text Correction for Mobile Devices
Maozheng Zhao et al. IUI 2022. Tags: Eye Tracking & Gaze Interaction; Voice User Interface (VUI) Design.
Text correction on mobile devices usually requires precise and repetitive manual control. In this paper, we present EyeSayCorrect, an eye gaze and voice based hands-free text correction method for mobile devices. To correct text with EyeSayCorrect, the user first utilizes the gaze point on the screen to select a word, then speaks the new phrase. EyeSayCorrect would then infer the user's correction intention based on the inputs and the text context. EyeSayCorrect can accommodate ambiguities and noisy input signals. We used a Bayesian approach for determining the selected word given an eye-gaze trajectory. Given each sampling point in an eye-gaze trajectory, the posterior probability of selecting a word is calculated and accumulated. The target word would be selected when its accumulated interest is larger than a threshold. The misspelled words have higher priors. Our evaluation showed that EyeSayCorrect can correct text with promising performance. The mean +/- 95% CI of the task completion time (in seconds) with priors is 11.63 +/- 1.07 for large font size (28 pt) and 11.57 +/- 1.14 for small font size (14 pt). Using priors for misspelled words reduced the task-completion time of large text by 9.26% and small text by 23.79%, and it reduced the text-selecting time of large text by 23.49% and small text by 40.35%. The subjective ratings are also in favor of the method with priors for misspelled words. Overall, EyeSayCorrect utilizes the advantages of eye gaze and voice input, making hands-free text correction available and efficient on mobile devices.

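The accumulation scheme described in the abstract can be sketched as follows (a simplified illustration, not the paper's exact model; the word centers, priors, Gaussian noise scale, and threshold are all hypothetical):

```python
import math

def gaze_select(gaze_samples, word_centers, priors, sigma=40.0, threshold=3.0):
    """Accumulate, per word, the normalized posterior of being the target
    as each gaze sample arrives; return the first word whose accumulated
    score crosses the threshold (None if no word ever does). Misspelled
    words can be given higher priors, as in the abstract."""
    acc = {w: 0.0 for w in word_centers}
    for gx, gy in gaze_samples:
        # Gaussian likelihood of the gaze point given each word's center,
        # weighted by that word's prior.
        likes = {
            w: priors[w] * math.exp(-((gx - x) ** 2 + (gy - y) ** 2) / (2 * sigma ** 2))
            for w, (x, y) in word_centers.items()
        }
        z = sum(likes.values()) or 1.0
        for w in likes:
            acc[w] += likes[w] / z
            if acc[w] >= threshold:
                return w
    return None

word_centers = {"teh": (0.0, 0.0), "cat": (200.0, 0.0)}
priors = {"teh": 2.0, "cat": 1.0}  # the misspelled word gets a higher prior
samples = [(5.0, 0.0)] * 10        # gaze hovers near "teh"
print(gaze_select(samples, word_centers, priors))  # -> teh
```

Because evidence accumulates across samples rather than depending on any single fixation, brief gaze jitter toward a neighboring word does not immediately flip the selection.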
Automatically Generating and Improving Voice Command Interface from Operation Sequences on Smartphones
Lihang Pan et al., Tsinghua University. CHI 2022. Tags: Voice User Interface (VUI) Design; Human-LLM Collaboration.
Using voice commands to automate smartphone tasks (e.g., making a video call) can effectively augment the interactivity of numerous mobile apps. However, creating voice command interfaces requires a tremendous amount of effort in labeling and compiling the graphical user interface (GUI) and the utterance data. In this paper, we propose AutoVCI, a novel approach to automatically generate a voice command interface (VCI) from smartphone operation sequences. The generated voice command interface has two distinct features. First, it automatically maps a voice command to GUI operations and fills in parameters accordingly, leveraging the GUI data instead of corpus or hand-written rules. Second, it launches a complementary Q&A dialogue to confirm the intention in case of ambiguity. In addition, the generated voice command interface can learn and evolve from user interactions. It accumulates the history of command understanding results to annotate the user's input and improve its semantic understanding ability. We implemented this approach on Android devices and conducted a two-phase user study with 16 and 67 participants in each phase. Experimental results of the study demonstrated the practical feasibility of AutoVCI.

Select or Suggest? Reinforcement Learning-based Method for High-Accuracy Target Selection on Touchscreens
Zhi Li et al., Stony Brook University. CHI 2022. Tags: Hand Gesture Recognition; Human-LLM Collaboration.
Suggesting multiple target candidates based on touch input is a possible option for high-accuracy target selection on small touchscreen devices. But it can become overwhelming if suggestions are triggered too often. To address this, we propose SATS, a Suggestion-based Accurate Target Selection method, where target selection is formulated as a sequential decision problem. The objective is to maximize the utility: the negative time cost for the entire target selection procedure. The SATS decision process is dictated by a policy generated using reinforcement learning. It automatically decides when to provide suggestions and when to directly select the target. Our user studies show that SATS reduced error rate and selection time over Shift, a magnification-based method, and MUCS, a suggestion-based alternative that optimizes the utility for the current selection. SATS also significantly reduced error rate over BayesianCommand, which directly selects targets based on posteriors, with only a minor increase in selection time.

Modeling Touch Point Distribution with Rotational Dual Gaussian Model
Yan Ma et al. UIST 2021. Tags: Hand Gesture Recognition; Eye Tracking & Gaze Interaction.
Touch point distribution models are important tools for designing touchscreen interfaces. In this paper, we investigate how the finger movement direction affects the touch point distribution, and how to account for it in modeling. We propose the Rotational Dual Gaussian model, a refinement and generalization of the Dual Gaussian model, to account for the finger movement direction in predicting touch point distribution. In this model, the major axis of the prediction ellipse of the touch point distribution is along the finger movement direction, and the minor axis is perpendicular to the finger movement direction. We also propose using projected target width and height, in lieu of nominal target width and height, to model touch point distribution. Evaluation on three empirical datasets shows that the new model reflects the observation that the touch point distribution is elongated along the finger movement direction, and outperforms the original Dual Gaussian Model in all prediction tests. Compared with the original Dual Gaussian model, the Rotational Dual Gaussian model reduces the RMSE of touch error rate prediction from 8.49% to 4.95%, and more accurately predicts the touch point distribution in target acquisition. Using the Rotational Dual Gaussian model can also improve the soft keyboard decoding accuracy on smartwatches.

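The model's core geometric idea, an error ellipse whose major axis follows the finger movement direction, amounts to rotating a diagonal covariance matrix. A sketch with hypothetical variance parameters (the paper derives the actual variances from the Dual Gaussian framework and target dimensions):

```python
import math

def touch_covariance(direction_deg, sigma_major, sigma_minor):
    """2x2 covariance of predicted touch points: major axis aligned with
    the finger movement direction, minor axis perpendicular to it.
    Computes R * diag(sigma_major^2, sigma_minor^2) * R^T for the
    rotation matrix R of the movement direction."""
    th = math.radians(direction_deg)
    c, s = math.cos(th), math.sin(th)
    a, b = sigma_major ** 2, sigma_minor ** 2
    return [
        [c * c * a + s * s * b, c * s * (a - b)],
        [c * s * (a - b), s * s * a + c * c * b],
    ]

# Horizontal movement: spread is widest along x.
print(touch_covariance(0, 3.0, 1.0))   # diag(9, 1)
# Vertical movement: the same ellipse, rotated 90 degrees.
print(touch_covariance(90, 3.0, 1.0))  # approximately diag(1, 9)
```

At 0 degrees the matrix is diagonal with the larger variance on x; at 90 degrees the axes swap, which is exactly the elongation-along-movement effect the evaluation reports.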
Voice and Touch Based Error-tolerant Multimodal Text Editing and Correction for Smartphones
Maozheng Zhao et al. UIST 2021. Tags: Voice User Interface (VUI) Design; Explainable AI (XAI).
Editing operations such as cut, copy, paste, and correcting errors in typed text are often tedious and challenging to perform on smartphones. In this paper, we present VT, a voice and touch-based multimodal text editing and correction method for smartphones. To edit text with VT, the user glides over a text fragment with a finger and dictates a command, such as "bold" to change the format of the fragment, or the user can tap inside a text area and speak a command such as "highlight this paragraph" to edit the text. For text correction, the user taps approximately at the area of the erroneous text fragment and dictates the new content for substitution or insertion. VT combines touch and voice inputs with language context such as a language model and phrase similarity to infer a user's editing intention, which can handle ambiguities and noisy input signals. This is a great advantage over existing error correction methods (e.g., iOS's Voice Control), which require precise cursor control or text selection. Our evaluation shows that VT significantly improves the efficiency of text editing and text correction on smartphones over the touch-only method and iOS's Voice Control. Our user studies showed that VT reduced text editing time by 30.80% and text correcting time by 29.97% over the touch-only method, and reduced text editing time by 30.81% and text correcting time by 47.96% over iOS's Voice Control.

Variance and Distribution Models for Steering Tasks
Michael Wang et al. UIST 2021. Tags: Force Feedback & Pseudo-Haptic Weight; Full-Body Interaction & Embodied Input.
Steering law reveals a linear relationship between the movement time MT and the index of difficulty ID in trajectory-based steering tasks. However, it does not relate the variance or distribution of MT to ID. In this paper, we propose and evaluate models that predict the variance and distribution of MT based on ID for steering tasks. We first propose a quadratic variance model, which reveals that the variance of MT is quadratically related to ID with the linear coefficient being 0. Empirical evaluation on a new and a previously collected dataset shows that the quadratic variance model accounts for between 78% and 97% of the variance of observed MT variances; it outperforms other model candidates such as linear and constant models; adding the linear coefficient leads to no improvement in model fitness. The variance model enables predicting the distribution of MT given ID: we can use the variance model to predict the variance (or scale) parameter and Steering law to predict the mean (or location) parameter of a distribution. We evaluated six types of distributions for predicting the distribution of MT. Our investigation also shows that positively skewed distributions such as the Gamma, Lognormal, Exponentially Modified Gaussian (ExGaussian), and Extreme value distributions outperformed symmetric distributions such as the Gaussian and truncated Gaussian in predicting the MT distribution, and the Gamma distribution performed slightly better than the other positively skewed distributions. Overall, our research advances the MT prediction of steering tasks from a point estimate to variance and distribution estimates, which provides a more complete understanding of steering behavior and quantifies the uncertainty of MT prediction.

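Combining the two models as the abstract describes, Steering law for the mean and the quadratic model for the variance, pins down a two-parameter distribution by moment matching. A sketch for the Gamma case (the coefficients a, b, c here are hypothetical; the paper fits them from data):

```python
def steering_mt_distribution(ID, a, b, c):
    """Return the (shape, scale) of a Gamma distribution for MT at a
    given ID, using Steering law for the mean and the quadratic variance
    model for the variance, matched via the method of moments."""
    mean = a + b * ID   # Steering law: MT = a + b * ID
    var = c * ID ** 2   # quadratic variance model (linear coefficient 0)
    shape = mean ** 2 / var
    scale = var / mean
    return shape, scale

# Example with hypothetical coefficients a=0.2 s, b=0.5 s/bit, c=0.01 s^2.
shape, scale = steering_mt_distribution(4.0, 0.2, 0.5, 0.01)
# Gamma(shape, scale) then has mean shape*scale = 2.2 s
# and variance shape*scale^2 = 0.16 s^2, by construction.
```

The same moment-matching step works for the other positively skewed candidates the paper evaluates (e.g., Lognormal), with only the parameterization changing.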
Modeling Two Dimensional Touch Pointing
Yu-Jung Ko et al. UIST 2020. Tags: Force Feedback & Pseudo-Haptic Weight; Hand Gesture Recognition; Prototyping & User Testing.
Modeling touch pointing is essential to touchscreen interface development and research, as pointing is one of the most basic and common touch actions users perform on touchscreen devices. Finger-Fitts Law [4] revised the conventional Fitts' law into a 1D (one-dimensional) pointing model for finger touch by explicitly accounting for the fat finger ambiguity (absolute error) problem which was unaccounted for in the original Fitts' law. We generalize Finger-Fitts law to 2D touch pointing by solving two critical problems. First, we extend two of the most successful 2D Fitts law forms to accommodate finger ambiguity. Second, we discovered that using nominal target width and height is a conceptually simple yet effective approach for defining amplitude and directional constraints for 2D touch pointing across different movement directions. The evaluation shows our derived 2D Finger-Fitts law models can be both principled and powerful. Specifically, they outperformed the existing 2D Fitts' laws, as measured by the regression coefficient and model selection information criteria (e.g., Akaike Information Criterion) considering the number of parameters. Finally, 2D Finger-Fitts laws also advance our understanding of touch pointing and thereby serve as the basis for touch interface designs.

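For context, one common statement of the 1D Finger-Fitts index of difficulty that this work generalizes subtracts the absolute (fat-finger) error variance from the observed touch-point spread before forming an effective width. A hedged sketch of that 1D formulation (the 2D models in the paper additionally handle width and height across movement directions, which is not shown here):

```python
import math

def ffitts_id(A, sigma_obs, sigma_a):
    """1D Finger-Fitts index of difficulty: the observed touch spread
    sigma_obs is cleaned of the absolute-error component sigma_a before
    being converted to an effective width We (one common formulation;
    parameter values below are illustrative only)."""
    we = math.sqrt(2 * math.pi * math.e * (sigma_obs ** 2 - sigma_a ** 2))
    return math.log2(A / we + 1)

def predict_mt(A, sigma_obs, sigma_a, a, b):
    """Fitts-style linear prediction MT = a + b * ID with the
    finger-corrected index of difficulty."""
    return a + b * ffitts_id(A, sigma_obs, sigma_a)
```

Removing the absolute-error component shrinks the effective width, so for the same observed spread the corrected ID is larger than the uncorrected one, reflecting that the task is harder than raw touch scatter alone suggests.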