Subthreshold Jitter in VR Can Induce Visual Discomfort
Visual-vestibular conflicts (VVCs) are a primary contributor to visually induced motion sickness (VIMS) in head-mounted displays (HMDs). However, virtual reality (VR) comfort studies often rely on exposing seated or standing users to experiences with high-intensity visual motion (such as roller coasters). These drastic VVCs tend to induce pronounced VIMS symptoms that can be reliably detected across individuals using common survey measures. The conclusions from studies using these extreme motion-based conflicts may not accurately generalize to naturalistic use cases in VR where efforts are made to minimize, rather than maximize, VIMS symptoms. In this work, we show that a subthreshold visual-vestibular conflict can induce measurable discomfort during naturalistic, long-duration use. We first present a psychophysical study, conducted outside of an HMD, to rigorously identify the perceptual thresholds for sinusoidal noise in render pose (i.e., jitter) resulting in erroneous 3D motion of rendered content. We next introduce subthreshold levels of jitter to a Meta Quest 3 VR HMD and demonstrate that this can induce visual discomfort in participants playing the commercially available game Cubism across a three-session, repeated-measures study. Importantly, significant differences in comfort were identified using the Motion Illness Symptoms Classification (MISC) survey administered every 10 minutes across each 90-minute session, but no statistically significant differences between control and jitter conditions were identified using a more traditional comparison of pre- and post-test Simulator Sickness Questionnaire (SSQ) scores. This highlights the benefits of incorporating time-resolved data points and suggests that lightweight, more frequent surveys may be important tools for measuring visual discomfort.
Samuel J Levulis et al. (2025). Topics: Motion Sickness & Passenger Experience; Immersion & Presence Research. Venue: UIST.
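As a rough illustration of the manipulation described in this abstract, the sketch below adds sinusoidal noise to a render-pose position each frame. The amplitude, frequency, perturbed axis, and refresh rate are placeholder values, not those identified in the study.

```python
# Hypothetical sketch of injecting sinusoidal pose jitter into a render pose.
# Amplitude, frequency, and axis choices are illustrative, not the study's values.
import numpy as np

def jittered_position(base_position, t, amplitude_m=0.0005, frequency_hz=2.0):
    """Offset a render-pose position with sinusoidal noise at time t (seconds)."""
    offset = amplitude_m * np.sin(2.0 * np.pi * frequency_hz * t)
    return base_position + np.array([offset, 0.0, 0.0])  # jitter along one axis only

# Example: perturb the nominal head position each frame before rendering.
pose = np.array([0.0, 1.6, 0.0])   # nominal head position in meters
frame_time = 1.0 / 90.0            # assumed 90 Hz refresh
for frame in range(3):
    print(jittered_position(pose, frame * frame_time))
```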
ProMemAssist: Exploring Timely Proactive Assistance Through Working Memory Modeling in Multi-Modal Wearable Devices
Wearable AI systems aim to provide timely assistance in daily life, but existing approaches often rely on user initiation or predefined task knowledge, neglecting users' current mental states. We introduce ProMemAssist, a smart glasses system that models a user's working memory (WM) in real time using multi-modal sensor signals. Grounded in cognitive theories of WM, our system represents perceived information as memory items and episodes with encoding mechanisms such as displacement and interference. This WM model informs a timing predictor that balances the value of assistance against the cost of interruption. In a user study with 12 participants completing cognitively demanding tasks, ProMemAssist delivered more selective assistance and received higher engagement compared to an LLM baseline system. Qualitative feedback highlights the benefits of WM modeling for nuanced, context-sensitive support, offering design implications for more attentive and user-aware proactive agents.
Kevin Pu et al. (2025). Topics: In-Vehicle Haptic, Audio & Multimodal Feedback; Human-LLM Collaboration; Biosensors & Physiological Monitoring. Venue: UIST.
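The timing predictor described above weighs the value of assistance against the cost of interruption; the toy sketch below shows one way such a trade-off could be scored. The load estimate, relevance score, and threshold are assumptions for illustration, not ProMemAssist's model.

```python
# Illustrative value-vs-cost timing decision; scoring functions and threshold
# are assumptions, not ProMemAssist's actual working-memory model.
from dataclasses import dataclass

@dataclass
class WorkingMemoryState:
    active_items: int      # items currently held (proxy for cognitive load)
    task_relevance: float  # 0..1, how relevant the pending assistance is

def should_assist(state: WorkingMemoryState, threshold: float = 0.3) -> bool:
    value = state.task_relevance               # benefit of assisting now
    cost = min(1.0, state.active_items / 7.0)  # rough load estimate
    return (value - cost) > threshold

print(should_assist(WorkingMemoryState(active_items=2, task_relevance=0.9)))  # True
print(should_assist(WorkingMemoryState(active_items=6, task_relevance=0.4)))  # False
```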
Viago: Exploring Visual-Audio Modality Transitions for Social Media Consumption on the Go
As mobile phone use while walking becomes increasingly prevalent, users often divide their visual attention between their surroundings and phone displays, raising concerns around safety and interaction efficiency. Alternative input and output modalities, such as eyes-free touch gestures and audio feedback, offer a promising avenue for reducing visual demands in these contexts. However, the design of seamless transitions between visual and audio modalities for mobile interaction on the go remains underexplored. To fill this gap, we conducted a design probe study with ten participants, simulating screen reader-like experiences across diverse applications to identify five key design insights and three design guidelines. Informed by these insights, we developed Viago, a background service that facilitates fluid transitions between visual and audio modalities for mobile task management while walking. A subsequent evaluation with thirteen participants demonstrated that Viago effectively supports on-the-go interactions by enabling users to interleave modalities as needed. We conclude by discussing the broader implications of visual-audio modality transitions and their potential to enhance mobile interactions in everyday, dynamic environments.
Yu-Cheng Chang et al. (2025). Topics: Eye Tracking & Gaze Interaction; Voice User Interface (VUI) Design; Deaf & Hard-of-Hearing Support (Captions, Sign Language, Vibration). Venue: UIST.
Squiggle: Multimodal Lasso Selection in the Real World
Smart glasses are emerging with egocentric cameras and gaze tracking, raising the possibility of new interaction techniques that enable users to reference real-world objects they wish to digitally interact with. However, many of these devices lack a display, making precise object referencing difficult due to the lack of continuous visual feedback. We introduce Squiggle, an interaction technique that enables users to reference real-world objects without continuous feedback by drawing an invisible loop or "lasso" with an imagined ray-cast pointer. Through a virtual reality data collection study, we observed that this gesture can elicit useful gaze behavior in addition to providing drawing input itself. Based on these results, we implemented and evaluated a real-world prototype of Squiggle, demonstrating that it can improve the accuracy of object referencing over Gaze + Pinch alone, particularly for selecting compound objects and groups.
Jacqui Fashimpaur et al. (2025). Topics: Hand Gesture Recognition; Eye Tracking & Gaze Interaction; Context-Aware Computing. Venue: UIST.
Contextra: Detecting Object Grasps With Low-Power Cameras and Sensor Fusion On the Wrist
Knowing when a user picks up an object plays a vital role in many context-aware applications. For example, tracking water consumption, counting calories consumed, or reminding users to bring their keys are all context-centered scenarios that involve picking up objects. In this project, we propose Contextra, a wrist-worn system that uses sensor fusion to recognize when a user grasps objects. Sensor fusion allows all parts of the grasp to be sensed in ways that no single channel can alone. In our wristband, we fuse EMG and IMU data with video captured from three low-power IR cameras. These cameras maintain privacy by using an active-illumination technique to capture only features close to the sensors. Beyond grasps alone, we see Contextra as playing a foundational role in providing continuous awareness of context triggers to extend the functionality of existing AI devices that cannot run continuously due to power and privacy concerns.
Nathan DeVrio et al. (2025). Topics: Foot & Wrist Interaction; Context-Aware Computing. Venue: MobileHCI.
From Goals to Actions: Designing Context-aware LLM Chatbots for New Year's Resolutions
When pursuing new goals, people often struggle to determine what actions to take. Large language model (LLM) chatbots can provide information and interactivity, and combining them with context awareness could enhance the relevance and proactivity of action recommendations. However, there is a gap in understanding the role that such technologies can play in taking a holistic view of the user's multiple goals, complex contexts, and constraints over time. We developed a technology probe of a personalized, context-aware LLM chatbot and deployed it with 14 participants for 2-4 weeks for their 2024 New Year's resolutions. We observed that users adopted recommended actions at a high rate and had greater success in pursuing their goals during the first week, and that user needs evolved rapidly over time. We discuss how to best leverage context awareness for AI agent design, and the novel roles that AI could adopt within an ecosystem of services and agents.
Yan Xu et al. (2025). Topics: Conversational Chatbots; Human-LLM Collaboration; Context-Aware Computing. Venue: CUI.
Authoring LLM-Based Assistance for Real-World Contexts and Tasks
Advances in AI hold the possibility of assisting users with highly varied and individual needs, but the breadth of assistance that these systems could provide creates a challenge for how users specify their goals to the system. To support the authoring of AI assistance for real-world tasks, we propose the concept of Contextually-Driven Prompts (CDPs), which define how an AI assistant should respond to real-world context. We implemented a prototype system for authoring and executing CDPs, which provides suggestions to help users find the right level of assistance for their goal. We also conducted a user study (N=10) to investigate how participants express and refine their goals for real-world tasks. Results revealed a number of strategies for initiating and refining CDPs with suggestions, along with implications for the design of future authoring interfaces.
Hai Dang et al. (2025). Topics: Human-LLM Collaboration; Context-Aware Computing. Venue: IUI.
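A Contextually-Driven Prompt pairs a real-world context condition with an instruction for the assistant. The sketch below illustrates that pairing with a hypothetical data structure; the field names and trigger are invented for illustration and are not the paper's implementation.

```python
# Illustrative sketch of a contextually-driven prompt (CDP): a context condition
# paired with an instruction for the assistant. Field names are assumptions.
from dataclasses import dataclass
from typing import Callable

@dataclass
class ContextuallyDrivenPrompt:
    name: str
    trigger: Callable[[dict], bool]  # predicate over sensed context
    prompt: str                      # instruction sent to the LLM when triggered

cdp = ContextuallyDrivenPrompt(
    name="grocery-reminder",
    trigger=lambda ctx: ctx.get("location") == "kitchen" and ctx.get("fridge_open"),
    prompt="Remind me which items on my shopping list are missing from the fridge.",
)

context = {"location": "kitchen", "fridge_open": True}
if cdp.trigger(context):
    print(f"Dispatching prompt: {cdp.prompt}")
```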
A Dynamic Bayesian Network Based Framework for Multimodal Context-Aware Interactions
Multimodal context-aware interactions integrate multiple sensory inputs, such as gaze, gestures, speech, and environmental signals, to provide adaptive support across diverse user contexts. Building such systems is challenging due to the complexity of sensor fusion, real-time decision-making, and managing uncertainties from noisy inputs. To address these challenges, we propose a hybrid approach combining a dynamic Bayesian network (DBN) with a large language model (LLM). The DBN offers a probabilistic framework for modeling variables, relationships, and temporal dependencies, enabling robust, real-time inference of user intent, while the LLM incorporates world knowledge for contextual reasoning beyond explicitly modeled relationships. We demonstrate our approach with a tri-level DBN implementation for tangible interactions, integrating gaze and hand actions to infer user intent in real time. A user evaluation with 10 participants in an everyday office scenario showed that our system can accurately and efficiently infer user intentions, achieving 0.83 per-frame accuracy, even in complex environments. These results validate the effectiveness of the DBN+LLM framework for multimodal context-aware interactions.
Joel Chan et al. (2025). Topics: Context-Aware Computing; Computational Methods in HCI. Venue: IUI.
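To make the per-frame inference idea concrete, the sketch below runs a toy two-state Bayesian filter over discretized gaze and hand observations. The states, transition matrix, and likelihoods are illustrative stand-ins, not the paper's tri-level DBN or its LLM integration.

```python
# Toy two-state Bayesian filter over (gaze, hand) observations; all probabilities
# are invented for illustration and do not reflect the paper's DBN parameters.
import numpy as np

states = ["inspect", "grasp"]
transition = np.array([[0.9, 0.1],   # P(next state | current state), rows = current
                       [0.2, 0.8]])
# P(observation | state) for a discretized observation: (gaze_on_object, hand_moving)
likelihood = {
    (True, True):   np.array([0.2, 0.8]),
    (True, False):  np.array([0.6, 0.4]),
    (False, True):  np.array([0.3, 0.7]),
    (False, False): np.array([0.8, 0.2]),
}

belief = np.array([0.5, 0.5])
for obs in [(True, False), (True, True), (True, True)]:
    belief = transition.T @ belief       # predict next-state distribution
    belief = belief * likelihood[obs]    # weight by the observation likelihood
    belief /= belief.sum()               # normalize
    print(dict(zip(states, belief.round(3))))
```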
Less or More: Towards Glanceable Explanations for LLM Recommendations Using Ultra-Small Devices
Large Language Models (LLMs) have shown remarkable potential for recommending everyday actions as personal AI assistants, while Explainable AI (XAI) techniques are increasingly being used to help users understand why a recommendation is given. Personal AI assistants today are often located on ultra-small devices such as smartwatches, which have limited screen space. The verbosity of LLM-generated explanations, however, makes it challenging to deliver glanceable explanations on such devices. To address this, we explored 1) spatially structuring an LLM's explanation text using defined contextual components during prompting and 2) presenting temporally adaptive explanations to users based on confidence levels. We conducted a user study to understand how these approaches affected user experiences when interacting with LLM recommendations and explanations on ultra-small devices. The results showed that structured explanations reduced users' time to action and cognitive load when reading an explanation. Always-on structured explanations increased users' acceptance of AI recommendations. However, users were less satisfied with structured explanations than with unstructured ones due to their lack of sufficient, readable detail. Additionally, adaptively presenting structured explanations was less effective at improving user perceptions of the AI than always-on structured explanations. Together with users' interview feedback, these results lead to design implications for personalizing the content and timing of LLM explanations displayed on ultra-small devices.
Xinru Wang et al. (2025). Topics: Human-LLM Collaboration; Explainable AI (XAI). Venue: IUI.
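One plausible way to obtain spatially structured explanation text is to constrain the LLM's output to fixed contextual fields at prompt time. The template below is a hypothetical example of that idea; the field names and word limits are assumptions, not the prompts used in the study.

```python
# Hypothetical prompt template for structuring an explanation into contextual
# components suited to a small display; the component names are assumptions.
explanation_prompt = """Explain the recommendation in at most 12 words per field,
as key: value lines with exactly these fields:
Activity:
Time:
Location:
Reason:
Recommendation: {recommendation}
Context: {context}"""

print(explanation_prompt.format(
    recommendation="Take a 5-minute walk",
    context="User has been seated for 2 hours; calendar free until 3pm",
))
```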
Persistent Assistant: Seamless Everyday AI Interactions via Intent Grounding and Multimodal Feedback
Current AI assistants predominantly use natural language interactions, which can be time-consuming and cognitively demanding, especially for frequent, repetitive tasks in daily life. We propose Persistent Assistant, a framework for seamless and unobtrusive interactions with AI assistants. The framework has three key functionalities: (1) efficient intent specification through grounded interactions, (2) seamless target referencing through embodied input, and (3) intuitive response comprehension through multimodal perceptible feedback. We developed a proof-of-concept system for everyday decision-making tasks, where users can easily repeat queries over multiple objects using eye gaze and a pinch gesture, and receive multimodal haptic and speech feedback. Our study shows that multimodal feedback enhances user experience and preference by reducing physical demand, increasing perceived speed, and enabling intuitive and instinctive human-AI assistant interaction. We discuss how our framework can be applied to build seamless and unobtrusive AI assistants for everyday persistent tasks.
Hyunsung Cho et al. (2025). Meta Inc., Reality Labs Research; Carnegie Mellon University, Human-Computer Interaction Institute. Topics: In-Vehicle Haptic, Audio & Multimodal Feedback; Voice User Interface (VUI) Design; Intelligent Voice Assistants (Alexa, Siri, etc.). Venue: CHI.
VibraForge: A Scalable Prototyping Toolkit For Creating Spatialized Vibrotactile Feedback Systems
Spatialized vibrotactile feedback systems deliver tactile information by placing multiple vibrotactile actuators on the body. As increasing numbers of actuators are required to adequately convey information in complicated applications, haptic designers find it difficult to create such systems due to the limited scalability of existing toolkits. We propose VibraForge, an open-source vibrotactile toolkit that supports up to 128 vibrotactile actuators. Each actuator is encapsulated within a self-contained vibration unit and driven by its own microcontroller. By leveraging a chain-connection method, each unit receives independent vibration commands from a control unit, with fine-grained control over intensity and frequency. We also designed a GUI Editor to expedite the authoring of spatial vibrotactile patterns. Technical evaluation showed that vibration units reliably reproduced audio waveforms with low-latency, high-bandwidth data communication. Case studies of a phonemic tactile display, virtual reality fitness training, and drone teleoperation demonstrated the potential usage of VibraForge within different domains. A usability study with non-expert users highlighted the low technical barrier and customizability of the toolkit.
Bingjian Huang et al. (2025). University of Toronto, Dynamic Graphics Project Lab. Topics: Vibrotactile Feedback & Skin Stimulation; Force Feedback & Pseudo-Haptic Weight. Venue: CHI.
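Since each vibration unit in the chain is addressed individually with an intensity and a frequency, a command can be thought of as a small addressed frame. The sketch below shows a hypothetical encoding; the byte layout is an assumption for illustration, not VibraForge's actual protocol.

```python
# Hypothetical command frame for a chain of vibration units: address one unit
# and set its intensity and frequency. Byte layout is an assumed illustration.
import struct

def encode_vibration_command(unit_id: int, intensity: int, frequency_hz: int) -> bytes:
    """Pack unit address (0-127), intensity (0-255), and frequency in Hz (0-65535)."""
    assert 0 <= unit_id <= 127 and 0 <= intensity <= 255 and 0 <= frequency_hz <= 65535
    return struct.pack(">BBH", unit_id, intensity, frequency_hz)

frame = encode_vibration_command(unit_id=42, intensity=200, frequency_hz=170)
print(frame.hex())  # '2ac800aa'
```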
Gesture and Audio-Haptic Guidance Techniques to Direct Conversations with Intelligent Voice Interfaces
Advances in large language models (LLMs) empower new interactive capabilities for wearable voice interfaces, yet traditional voice-and-audio I/O techniques limit users' ability to flexibly navigate information and manage timing for complex conversational tasks. We developed a suite of gesture and audio-haptic guidance techniques that enable users to control conversation flows and maintain awareness of possible future actions, while simultaneously contributing and receiving conversation content through voice and audio. A 14-participant exploratory study compared our parallelized I/O techniques to a baseline of voice-only interaction. The results demonstrate the efficiency of gestures and haptics for information access, while allowing system speech to be redirected and interrupted in a socially acceptable manner. The techniques also raised user awareness of how to leverage intelligent capabilities. Our findings inform design recommendations to facilitate role-based collaboration between multimodal I/O techniques and reduce users' perception of time pressure when interleaving interactions with system speech.
Shwetha Rajaram et al. (2025). University of Michigan, School of Information. Topics: In-Vehicle Haptic, Audio & Multimodal Feedback; Hand Gesture Recognition; Voice User Interface (VUI) Design. Venue: CHI.
A Multimodal Approach for Targeting Error Detection in Virtual Reality Using Implicit User Behavior
Although the point-and-select interaction method has been shown to lead to user- and system-initiated errors, it is still prevalent in VR scenarios. Current solutions to facilitate selection interactions exist; however, they do not address the challenges caused by targeting inaccuracy. To reduce the effort required to target objects, we developed a model that quickly detects targeting errors after they occur. The model uses implicit multimodal user behavioral data to identify possible targeting outcomes. Using a dataset of 23 participants engaged in VR targeting tasks, we trained a deep learning model to differentiate between correct and incorrect targeting events within 0.5 seconds of a selection, achieving an AUC-ROC of 0.9. The utility of this model was then evaluated in a user study with 25 participants, which found that participants recovered from more errors, and recovered faster, when assisted by the model. These results advance our understanding of targeting errors in VR and facilitate the design of future intelligent error-aware systems.
Naveen Sendhilnathan et al. (2025). Meta. Topics: Social & Collaborative VR; Immersion & Presence Research; Human-LLM Collaboration. Venue: CHI.
SoundScroll: Robust Finger Slide Detection Using Friction Sound and Wrist-Worn Microphones
Smartwatches have firmly established themselves as a popular wearable form factor. Expanding their interaction space to nearby surfaces offers a promising avenue for enhancing input accuracy and usability beyond the confines of a small screen. However, a key challenge is detecting continuous contact states with the surface to mark the start and end of stateful interactions. In this paper, we introduce SoundScroll, which enables rapid and precise determination of the contact state and fingertip speed of a sliding finger. We leverage the vibrations produced by friction between a moving finger and a surface. Our proof-of-concept wristband captures a dual-channel vibration signal for robust sensing, considering both on-skin and in-air components. Our software predicts the finger sliding state within as little as 20 ms at an accuracy of 93.3%. Complementing prior approaches that detect tap events, SoundScroll can serve as a robust, low-latency, and precise contact and motion sensing technique.
Daehwa Kim et al. (2024). Topics: Vibrotactile Feedback & Skin Stimulation; Foot & Wrist Interaction; Smartwatches & Fitness Bands. Venue: UbiComp.
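At its core, slide detection from friction sound amounts to classifying short windows of the microphone signal. The toy sketch below uses a simple energy threshold over a 20 ms window to convey the idea; SoundScroll's actual classifier and features are not described here, so the threshold and feature are assumptions.

```python
# Toy sketch: detect a slide from friction-sound energy in a short window.
# The 20 ms window matches the paper's latency figure, but the RMS feature and
# threshold are assumptions, not SoundScroll's actual classifier.
import numpy as np

def slide_detected(signal: np.ndarray, sample_rate: int = 16000,
                   window_ms: float = 20.0, threshold: float = 0.01) -> bool:
    n = int(sample_rate * window_ms / 1000.0)
    window = signal[-n:]                     # most recent 20 ms of audio
    rms = np.sqrt(np.mean(window ** 2))      # broadband energy
    return rms > threshold

rng = np.random.default_rng(0)
quiet = 0.001 * rng.standard_normal(16000)            # no contact
sliding = quiet + 0.05 * rng.standard_normal(16000)   # friction-like energy
print(slide_detected(quiet), slide_detected(sliding))  # False True
```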
SonoHaptics: An Audio-Haptic Cursor for Gaze-Based Object Selection in XR
We introduce SonoHaptics, an audio-haptic cursor for gaze-based 3D object selection. SonoHaptics addresses challenges around providing accurate visual feedback during gaze-based selection in Extended Reality (XR), e.g., lack of world-locked displays in no- or limited-display smart glasses and visual inconsistencies. To enable users to distinguish objects without visual feedback, SonoHaptics employs the concept of cross-modal correspondence in human perception to map visual features of objects (color, size, position, material) to audio-haptic properties (pitch, amplitude, direction, timbre). We contribute data-driven models for determining cross-modal mappings of visual features to audio and haptic features, and a computational approach to automatically generate audio-haptic feedback for objects in the user's environment. SonoHaptics provides global feedback that is unique to each object in the scene, and local feedback to amplify differences between nearby objects. Our comparative evaluation shows that SonoHaptics enables accurate object identification and selection in a cluttered scene without visual feedback.
Hyunsung Cho et al. (2024). Topics: Mid-Air Haptics (Ultrasonic); Eye Tracking & Gaze Interaction; Social & Collaborative VR. Venue: UIST.
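The cross-modal correspondence idea maps each object's visual features to audio-haptic parameters. The sketch below is a hand-written illustration of such a mapping (hue to pitch, size to amplitude, position to direction, material to timbre); the specific formulas and ranges are assumptions, not the paper's data-driven models.

```python
# Illustrative cross-modal mapping from visual features to audio-haptic parameters,
# in the spirit of SonoHaptics; formulas and ranges are assumptions.
def crossmodal_feedback(hue_deg: float, size_m: float, azimuth_deg: float, material: str) -> dict:
    pitch_hz = 200.0 + 600.0 * (hue_deg / 360.0)    # color (hue) -> pitch
    amplitude = max(0.1, min(1.0, 0.5 / size_m))     # size -> amplitude (smaller = stronger)
    pan = max(-1.0, min(1.0, azimuth_deg / 90.0))    # position -> direction (stereo pan)
    timbre = {"metal": "bright", "wood": "warm"}.get(material, "neutral")  # material -> timbre
    return {"pitch_hz": pitch_hz, "amplitude": round(amplitude, 2), "pan": pan, "timbre": timbre}

print(crossmodal_feedback(hue_deg=240.0, size_m=0.3, azimuth_deg=-30.0, material="metal"))
```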
TouchpadAnyWear: Textile-Integrated Tactile Sensors for Multimodal High Spatial-Resolution Touch Inputs with Motion Artifacts Tolerance
This paper presents TouchpadAnyWear, a novel family of textile-integrated force sensors capable of multi-modal touch input, encompassing micro-gesture detection, two-dimensional (2D) continuous input, and force-sensitive strokes. This thin (< 1.5 mm) and conformal device features high spatial-resolution sensing and motion artifact tolerance through its unique capacitive sensor architecture. The sensor consists of a knitted textile compressive core, sandwiched by stretchable silver electrodes, and conductive textile shielding layers on both sides. With a high-density sensor pixel array (25/cm²), TouchpadAnyWear can detect touch input locations and sizes with millimeter-scale spatial resolution and a wide range of force inputs (0.05 N to 20 N). The incorporation of miniature polymer domes, referred to as "poly-islands", onto the knitted textile locally stiffens the sensing areas, thereby reducing motion artifacts during deformation. These poly-islands also provide passive tactile feedback to users, allowing for eyes-free localization of the active sensing pixels. Design choices and sensor performance are evaluated using in-depth mechanical characterization. Demonstrations include an 8-by-8 grid sensor as a miniature high-resolution touchpad and a T-shaped sensor for thumb-to-finger micro-gesture input. User evaluations validate the effectiveness and usability of TouchpadAnyWear in daily interaction contexts, such as tapping, forceful pressing, swiping, 2D cursor control, and 2D stroke-based gestures. The paper further discusses potential applications and explorations for TouchpadAnyWear in wearable smart devices, gaming, and augmented reality devices.
Junyi Zhao et al. (2024). Topics: Haptic Wearables; Shape-Changing Interfaces & Soft Robotic Materials; Foot & Wrist Interaction. Venue: UIST.
StegoType: Surface Typing from Egocentric Cameras
Text input is a critical component of any general-purpose computing system, yet efficient and natural text input remains a challenge in AR and VR. Headset-based hand tracking has recently become pervasive among consumer VR devices and affords the opportunity to enable touch typing on virtual keyboards. We present an approach for decoding touch typing on uninstrumented flat surfaces using only egocentric camera-based hand tracking as input. While egocentric hand-tracking accuracy is limited by issues like self-occlusion and image fidelity, we show that a sufficiently diverse training set of hand motions paired with typed text can enable a deep learning model to extract signal from this noisy input. Furthermore, by carefully designing a closed-loop data collection process, we can train an end-to-end text decoder that accounts for naturally sloppy typing on virtual keyboards. We evaluate our work with a user study (n=18) showing a mean online throughput of 42.4 WPM with an uncorrected error rate (UER) of 7% for our method, compared to a physical keyboard baseline of 74.5 WPM at 0.8% UER, showing progress towards unlocking productivity and high-throughput use cases in AR/VR.
Mark Richardson et al. (2024). Topics: Hand Gesture Recognition; Eye Tracking & Gaze Interaction; Immersion & Presence Research. Venue: UIST.
picoRing: Battery-Free Rings for Subtle Thumb-to-Index Input
Smart rings for subtle, reliable finger input offer an attractive path for ubiquitous interaction with wearable computing platforms. However, compared to ordinary rings worn for cultural or fashion reasons, smart rings are much bulkier and less comfortable, largely due to the space required for a battery, which also limits the space available for sensors. This paper presents picoRing, a flexible sensing architecture that enables a variety of battery-free smart rings paired with a wristband. By inductively connecting a wristband-based sensitive reader coil with a ring-based fully passive sensor coil, picoRing enables the wristband to stably detect the passive response from the ring via a weak inductive coupling. We demonstrate four different rings that support thumb-to-finger interactions like pressing, sliding, or scrolling. When users perform these interactions, the corresponding ring converts each input into a unique passive response through a network of passive switches. Combining the coil-based sensitive readout with the fully passive ring design enables a tiny ring that weighs as little as 1.5 g and achieves a stable 13 cm readout range despite finger bending and proximity to metal.
Ryo Takahashi et al. (2024). Topics: Force Feedback & Pseudo-Haptic Weight; Haptic Wearables; Foot & Wrist Interaction. Venue: UIST.
RadarHand: A Wrist-Worn Radar for On-Skin Touch-Based Proprioceptive Gestures
We introduce RadarHand, a wrist-worn wearable with millimetre-wave radar that detects on-skin, touch-based proprioceptive hand gestures. Radars are robust, private, small, able to penetrate materials, and require low computation costs. We first evaluated the proprioceptive and tactile perception nature of the back of the hand and found that tapping on the thumb produces the lowest proprioceptive error of all the finger joints, followed by the index, middle, ring, and pinky fingers, under eyes-free, high-cognitive-load conditions. Next, we trained deep learning models for gesture classification. We introduce two types of gestures based on locations on the back of the hand: generic gestures and discrete gestures. Discrete gestures start and end at specific locations on the back of the hand, in contrast to generic gestures, which can start and end anywhere on the back of the hand. Out of 27 possible gesture groups, we achieved 92% accuracy for a set of seven gestures and 93% accuracy for a set of eight discrete gestures. Finally, we evaluated RadarHand's performance in real time under two interaction modes: active interaction and reactive interaction. In active interaction, the user initiates input to achieve a desired output; in reactive interaction, the device initiates the interaction and requires the user to react. We obtained accuracies of 87% and 74% for active generic and discrete gestures, respectively, and 91% and 81.7% for reactive generic and discrete gestures, respectively. We discuss the implications of RadarHand for gesture recognition and directions for future work.
Ryo Hajika et al. (2024). Topics: Vibrotactile Feedback & Skin Stimulation; Foot & Wrist Interaction. Venue: UIST.
TouchInsight: Uncertainty-aware Rapid Touch and Text Input for Mixed Reality from Egocentric Vision
While passive surfaces offer numerous benefits for interaction in mixed reality, reliably detecting touch input solely from head-mounted cameras has been a long-standing challenge. Camera specifics, hand self-occlusion, and rapid movements of both head and fingers introduce considerable uncertainty about the exact location of touch events. Existing methods have thus not been capable of achieving the performance needed for robust interaction. In this paper, we present a real-time pipeline that detects touch input from all ten fingers on any physical surface, purely based on egocentric hand tracking. Our method, TouchInsight, comprises a neural network that predicts the moment of a touch event, the finger making contact, and the touch location. TouchInsight represents locations through a bivariate Gaussian distribution to account for uncertainties due to sensing inaccuracies, which we resolve through contextual priors to accurately infer intended user input. We first evaluated our method offline and found that it locates input events with a mean error of 6.3 mm, and accurately detects touch events (F1=0.99) and identifies the finger used (F1=0.96). In an online evaluation, we then demonstrate the effectiveness of our approach for a core application of dexterous touch input: two-handed text entry. In our study, participants typed 37.0 words per minute with an uncorrected error rate of 2.9% on average.
Paul Streli et al. (2024). Topics: Mixed Reality Workspaces; Immersion & Presence Research; Visualization Perception & Cognition. Venue: UIST.
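Representing a touch as a bivariate Gaussian and resolving it with a contextual prior can be illustrated with a small maximum-a-posteriori selection over candidate targets. The sketch below is such an illustration; the covariance, key layout, and prior values are invented, not TouchInsight's implementation.

```python
# Minimal sketch: the sensed touch is a bivariate Gaussian, candidate targets
# (e.g., keys) carry a contextual prior, and we pick the maximum-posterior target.
# All numbers are illustrative placeholders.
import numpy as np

def most_likely_target(touch_mean, touch_cov, targets, prior):
    """targets: {name: (x, y)}; prior: {name: probability}. Returns the best target name."""
    inv_cov = np.linalg.inv(touch_cov)
    scores = {}
    for name, pos in targets.items():
        d = np.asarray(pos) - np.asarray(touch_mean)
        likelihood = np.exp(-0.5 * d @ inv_cov @ d)  # unnormalized Gaussian density
        scores[name] = likelihood * prior[name]
    return max(scores, key=scores.get)

keys = {"e": (0.0, 0.0), "r": (19.0, 0.0), "d": (2.0, 19.0)}  # key centers, mm
language_prior = {"e": 0.6, "r": 0.25, "d": 0.15}             # e.g., from a language model
touch = most_likely_target([9.0, 2.0], np.diag([40.0, 40.0]), keys, language_prior)
print(touch)  # 'e'
```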