StegoType: Surface Typing from Egocentric Cameras
Mark Richardson et al. UIST 2024. Topics: Hand Gesture Recognition; Eye Tracking & Gaze Interaction; Immersion & Presence Research.
Text input is a critical component of any general purpose computing system, yet efficient and natural text input remains a challenge in AR and VR. Headset-based hand-tracking has recently become pervasive among consumer VR devices and affords the opportunity to enable touch typing on virtual keyboards. We present an approach for decoding touch typing on uninstrumented flat surfaces using only egocentric camera-based hand-tracking as input. While egocentric hand-tracking accuracy is limited by issues like self-occlusion and image fidelity, we show that a sufficiently diverse training set of hand motions paired with typed text can enable a deep learning model to extract signal from this noisy input. Furthermore, by carefully designing a closed-loop data collection process, we can train an end-to-end text decoder that accounts for natural sloppy typing on virtual keyboards. We evaluate our work with a user study (n=18) showing a mean online throughput of 42.4 WPM with an uncorrected error rate (UER) of 7% with our method, compared to a physical keyboard baseline of 74.5 WPM at 0.8% UER, showing progress towards unlocking productivity and high throughput use cases in AR/VR.

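A minimal sketch of how the entry rate and uncorrected error rate figures quoted above are conventionally computed in text entry studies, assuming the usual definitions (five characters per word for WPM, and a character-level minimum string distance against the intended text for UER); the function names and example strings are illustrative, not the paper's evaluation code:

```python
# Toy illustration of standard text entry metrics; not the paper's evaluation code.

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def wpm(transcribed: str, seconds: float) -> float:
    """Entry rate in words per minute, using the 5-characters-per-word convention."""
    return (len(transcribed) / 5.0) * (60.0 / seconds)

def uncorrected_error_rate(transcribed: str, intended: str) -> float:
    """Fraction of the intended text left in error, via minimum string distance."""
    return edit_distance(transcribed, intended) / max(len(intended), 1)

intended = "the quick brown fox"
typed = "the quikc brown fox"   # one transposition left uncorrected
print(f"{wpm(typed, 6.0):.1f} WPM, UER {uncorrected_error_rate(typed, intended):.1%}")
```
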
Programming by Voice: Exploring User Preferences and Speaking Styles
Sadia Nowrin et al. CUI 2023. Topics: Voice User Interface (VUI) Design; Motor Impairment Assistive Input Technologies.
Programming by voice is a potentially useful method for individuals with motor impairments. Spoken programs can be challenging for a standard speech recognizer with a language model trained on written text mined from sources such as web pages. Having an effective language model that captures the variability in spoken programs may be necessary for accurate recognition. In this work, we explore how novice and expert programmers speak code without requiring them to adhere to strict grammar rules. We investigate two approaches to collecting data by having programmers speak either highlighted or missing lines of code. We observed that expert programmers spoke more naturally, while novice programmers spoke more syntactically. A commercial speech recognizer had a high error rate on our spoken programs. However, by adapting the recognizer's language model with our spoken code transcripts, we were able to substantially reduce the error rate by 27% relative to the baseline on unseen spoken code.

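One common way to perform the kind of language model adaptation described above is to interpolate an in-domain model estimated from the collected spoken-code transcripts with the recognizer's general model. A minimal sketch under that assumption; the interpolation weight, probabilities, and relative error reduction arithmetic below are illustrative, not taken from the paper:

```python
# Toy sketch: interpolate an in-domain (spoken code) language model with a
# general model, then the relative-reduction arithmetic for a 27% WER drop.

def interpolate(p_domain: float, p_general: float, lam: float = 0.5) -> float:
    """P_adapted(w | h) = lam * P_domain(w | h) + (1 - lam) * P_general(w | h)."""
    return lam * p_domain + (1.0 - lam) * p_general

# A token like "close paren" is common in spoken code but rare in web text.
print(f"adapted probability: {interpolate(p_domain=0.08, p_general=0.0005):.4f}")

baseline_wer = 0.40                        # illustrative baseline error rate
adapted_wer = baseline_wer * (1 - 0.27)    # a 27% relative reduction
print(f"WER: {baseline_wer:.1%} -> {adapted_wer:.1%}")
```
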
LGBTQ Futures and Participatory Design: Investigating visibility, community, and the future of future workshops
Jean Hardy et al. CSCW 2022. Topics: Maker Culture, Workshops, and Emerging Practices.
This paper presents the findings from a series of participatory design workshops with LGBTQ people living in the rural Midwestern United States. Using future workshops as a method, we seek to understand contemporary problems facing rural LGBTQ people and leverage design exercises to facilitate community members to come up with creative solutions. What we find are people grappling with the complexities of visibility, safety, and resource access in their rural communities; people who wanted to be able to use and create sociotechnical solutions that could help them navigate these complexities. Drawing on these findings, we argue for further exploration of design that experiments with the tension between visibility and safety for LGBTQ people. Further, we argue that future workshops and participatory design are well-positioned for continued work with marginalized communities, but that we need to maintain political orientations towards liberation and justice.

A Performance Evaluation of Nomon: A Flexible Interface for Noisy Single-Switch Users
Nicholas Bonaker et al., Massachusetts Institute of Technology. CHI 2022. Topics: Motor Impairment Assistive Input Technologies.
Some individuals with motor impairments communicate using a single switch, such as a button click, air puff, or blink. Row-column scanning provides a method for choosing items arranged in a grid using a single switch. An alternative, Nomon, allows potential selections to be arranged arbitrarily rather than requiring a grid (as desired for gaming, drawing, etc.) and provides an alternative probabilistic selection method. While past results suggest that Nomon may be faster and easier to use than row-column scanning, no work has yet quantified performance of the two methods over longer time periods or in tasks beyond writing. In this paper, we also develop and validate a webcam-based switch that allows a user without a motor impairment to approximate the response times of a motor-impaired single-switch user; although the approximation is not a replacement for testing with single-switch users, it allows us to better initialize, calibrate, and evaluate our method. Over 10 sessions with the webcam switch, we found users typed faster and more easily with Nomon than with row-column scanning. The benefits of Nomon were even more pronounced in a picture-selection task. Evaluation and feedback from a motor-impaired switch user further supports the promise of Nomon.

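For intuition about the grid constraint, the sketch below models the cost of selecting an item with basic row-column scanning: the highlight steps through rows until a click, then through that row's columns until a second click. Nomon instead lets targets sit anywhere and updates a probability distribution over them from click timing; that part is not modeled here. The step time and grid indices are illustrative:

```python
# Toy model of single-switch row-column scanning over a grid: the highlight
# advances one row (then one column) per step until the user clicks twice.

def scan_cost(row: int, col: int, step_seconds: float = 0.6) -> tuple[int, float]:
    """Scan steps and time to select the item at (row, col), 0-indexed."""
    steps = (row + 1) + (col + 1)      # highlights shown before each of the 2 clicks
    return steps, steps * step_seconds

for target in [(0, 0), (3, 5)]:
    steps, seconds = scan_cost(*target)
    print(f"item {target}: {steps} scan steps, about {seconds:.1f}s, 2 switch clicks")
```
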
Enhancing the Composition Task in Text Entry Studies: Eliciting Difficult Text and Improving Error Rate Calculation
Dylan Gaines et al., Michigan Technological University. CHI 2021. Topics: Human-LLM Collaboration; AI-Assisted Creative Writing.
Participants in text entry studies usually copy phrases or compose novel messages. A composition task mimics actual user behavior and can allow researchers to better understand how a system might perform in reality. A problem with composition is that participants may gravitate towards writing simple text, that is, text containing only common words. Such simple text is insufficient to explore all factors governing a text entry method, such as its error correction features. We contribute to enhancing composition tasks in two ways. First, we show participants can modulate the difficulty of their compositions based on simple instructions. While it took more time to compose difficult messages, they were longer, had more difficult words, and resulted in more use of error correction features. Second, we compare two methods for obtaining a participant's intended text, evaluating both against a previously proposed crowdsourced judging procedure. We found participant-supplied references were more accurate.

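When participants compose their own messages, the intended text is not known in advance, so the computed error rate depends on which reference is adopted afterwards (participant-supplied or crowdsourced judging). A rough sketch of that dependence, using difflib's matching blocks as an approximate character error rate; the strings, references, and metric are illustrative rather than the paper's procedure:

```python
# Rough sketch: the error rate of a composed message depends on the reference
# adopted for the intended text. Uses difflib matching blocks as an approximate
# character error rate; not the paper's calculation.
from difflib import SequenceMatcher

def approx_char_error_rate(transcribed: str, reference: str) -> float:
    """Fraction of reference characters not matched in the transcription."""
    matched = sum(block.size for block in
                  SequenceMatcher(None, transcribed, reference).get_matching_blocks())
    return 1.0 - matched / max(len(reference), 1)

transcribed = "meet me at teh cafe"                 # what the participant typed
participant_reference = "meet me at the cafe"       # what they say they meant
judged_reference = "meet me at the café"            # what crowdsourced judges inferred

for name, ref in [("participant", participant_reference), ("judged", judged_reference)]:
    print(f"{name} reference: error rate {approx_char_error_rate(transcribed, ref):.1%}")
```
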
VelociWatch: Designing and Evaluating a Virtual Keyboard for the Input of Challenging Text
Keith Vertanen et al., Michigan Technological University. CHI 2019. Topics: Voice User Interface (VUI) Design; Notification & Interruption Management.
Virtual keyboard typing is typically aided by an auto-correct method that decodes a user's noisy taps into their intended text. This decoding process can reduce error rates and possibly increase entry rates by allowing users to type faster but less precisely. However, virtual keyboard decoders sometimes make mistakes that change a user's desired word into another. This is particularly problematic for challenging text such as proper names. We investigate whether users can guess words that are likely to cause auto-correct problems and whether users can adjust their behavior to assist the decoder. We conduct computational experiments to decide what predictions to offer in a virtual keyboard and design a smartwatch keyboard named VelociWatch. Novice users were able to use the features of VelociWatch to enter challenging text at 17 words-per-minute with a corrected error rate of 3%. Interestingly, they wrote slightly faster and just as accurately on a simpler keyboard with limited correction options. Our findings suggest users may be able to type difficult words on a smartwatch simply by tapping precisely without the use of auto-correct.

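A toy sketch of the kind of auto-correct decoding described above: each candidate word is scored by a spatial likelihood of the tap points (Gaussian noise around key centers) combined with a word-frequency prior, so a rare proper name can lose to a frequent word, which is the failure mode the paper targets. The layout, lexicon, and noise level are assumptions for illustration, not the VelociWatch decoder:

```python
# Toy sketch of virtual keyboard auto-correction: each candidate word is scored
# by a spatial log-likelihood of the tap points (Gaussian noise around key
# centers) plus a log word-frequency prior. Layout, lexicon, and noise level
# are illustrative; this is not the VelociWatch decoder.
import math

ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
KEY_POS = {c: (float(x), float(y)) for y, row in enumerate(ROWS) for x, c in enumerate(row)}

def score(taps, word, lexicon, sigma=1.0):
    """Spatial log-likelihood of the taps under `word`, plus its log prior."""
    if len(taps) != len(word):
        return float("-inf")
    spatial = sum(-((x - KEY_POS[c][0]) ** 2 + (y - KEY_POS[c][1]) ** 2) / (2 * sigma ** 2)
                  for (x, y), c in zip(taps, word))
    return spatial + math.log(lexicon[word])

lexicon = {"rent": 2e-3, "reno": 2e-5}                    # frequent word vs. proper name
taps = [(3.0, 0.0), (2.0, 0.0), (5.2, 1.9), (6.8, 0.0)]   # a user aiming for r-e-n-o

best = max(lexicon, key=lambda w: score(taps, w, lexicon))
print("decoded:", best)   # the frequent word wins even though the taps sit nearer the name
```
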
The Impact of Word, Multiple Word, and Sentence Input on Virtual Keyboard Decoding Performance
Keith Vertanen et al., Michigan Technological University. CHI 2018. Topics: Voice User Interface (VUI) Design; Smartwatches & Fitness Bands.
Entering text on non-desktop computing devices is often done via an onscreen virtual keyboard. Input on such keyboards normally consists of a sequence of noisy tap events that specify some amount of text, most commonly a single word. But is single word-at-a-time entry the best choice? This paper compares user performance and recognition accuracy of word-at-a-time, phrase-at-a-time, and sentence-at-a-time text entry on a smartwatch keyboard. We evaluate the impact of differing amounts of input in both text copy and free composition tasks. We found providing input of an entire sentence significantly improved entry rates from 26 wpm to 32 wpm while keeping character error rates below 4%. In offline experiments with more processing power and memory, sentence input was recognized with a much lower 2.0% error rate. Our findings suggest virtual keyboards can enhance performance by encouraging users to provide more input per recognition event.

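A toy illustration of why providing a whole sentence per recognition event can help: the decoder can apply cross-word context that independent word-at-a-time recognition cannot. The candidate sets, unigram, and bigram probabilities below are made up for the example, and the spatial likelihoods are assumed equal across candidates:

```python
# Toy illustration: sentence-at-a-time recognition can use cross-word context
# that independent word-at-a-time recognition cannot. Probabilities are made up.
import math

unigram = {"see": 0.004, "sea": 0.002, "shore": 0.0005, "short": 0.0015}
bigram = {("the", "sea"): 0.003, ("the", "see"): 0.0001,
          ("sea", "shore"): 0.02,  ("see", "shore"): 0.0002,
          ("sea", "short"): 0.0005, ("see", "short"): 0.001}

# Two ambiguous tap groups; spatial likelihoods are assumed equal within a group.
candidates = [["see", "sea"], ["shore", "short"]]

# Word-at-a-time: pick each word independently using only a unigram prior.
word_at_a_time = [max(group, key=lambda w: unigram[w]) for group in candidates]

# Sentence-at-a-time: jointly score the word sequence with bigram context after "the".
def sentence_score(words):
    prev, total = "the", 0.0
    for w in words:
        total += math.log(bigram.get((prev, w), 1e-8))
        prev = w
    return total

best = max(((a, b) for a in candidates[0] for b in candidates[1]), key=sentence_score)
print("word-at-a-time:     ", word_at_a_time)   # ['see', 'short']
print("sentence-at-a-time: ", list(best))       # ['sea', 'shore']
```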