ProMemAssist: Exploring Timely Proactive Assistance Through Working Memory Modeling in Multi-Modal Wearable Devices
Wearable AI systems aim to provide timely assistance in daily life, but existing approaches often rely on user initiation or predefined task knowledge, neglecting users' current mental states. We introduce ProMemAssist, a smart glasses system that models a user's working memory (WM) in real time using multi-modal sensor signals. Grounded in cognitive theories of WM, our system represents perceived information as memory items and episodes with encoding mechanisms, such as displacement and interference. This WM model informs a timing predictor that balances the value of assistance with the cost of interruption. In a user study with 12 participants completing cognitively demanding tasks, ProMemAssist delivered more selective assistance and received higher engagement compared to an LLM baseline system. Qualitative feedback highlights the benefits of WM modeling for nuanced, context-sensitive support, offering design implications for more attentive and user-aware proactive agents.
Kevin Pu et al. UIST 2025. Topics: In-Vehicle Haptic, Audio & Multimodal Feedback; Human-LLM Collaboration; Biosensors & Physiological Monitoring.

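As a rough illustration of the two ideas in this abstract, the sketch below (not the ProMemAssist implementation; the capacity, salience values, and cost scaling are assumptions) keeps a capacity-limited buffer of memory items with displacement and triggers assistance only when its estimated value exceeds a load-scaled interruption cost.

```python
# Minimal sketch (not the ProMemAssist implementation) of a capacity-limited
# working-memory buffer with displacement, plus a timing rule that only
# delivers assistance when its estimated value exceeds the interruption cost.
from collections import deque
from dataclasses import dataclass

@dataclass
class MemoryItem:
    content: str        # e.g., "recipe step 3" perceived from sensors
    salience: float     # how strongly the item is currently encoded (0..1)

class WorkingMemoryModel:
    def __init__(self, capacity: int = 4):
        self.items = deque(maxlen=capacity)  # oldest item is displaced when full

    def encode(self, item: MemoryItem) -> None:
        """New percepts displace the oldest item once capacity is reached."""
        self.items.append(item)

    def load(self) -> float:
        """Crude proxy for current WM load: summed salience over capacity."""
        return sum(i.salience for i in self.items) / self.items.maxlen

def should_assist(wm: WorkingMemoryModel, assistance_value: float,
                  base_interruption_cost: float = 0.3) -> bool:
    """Deliver assistance only if its value outweighs the load-scaled interruption cost."""
    interruption_cost = base_interruption_cost * (1.0 + wm.load())
    return assistance_value > interruption_cost

wm = WorkingMemoryModel(capacity=4)
for step in ["measure flour", "preheat oven", "timer at 12 min", "grease pan", "add eggs"]:
    wm.encode(MemoryItem(step, salience=0.8))   # "measure flour" gets displaced
print(should_assist(wm, assistance_value=0.9))  # True: value exceeds scaled cost
```
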
Squiggle: Multimodal Lasso Selection in the Real World
Smart glasses are emerging with egocentric cameras and gaze tracking, raising the possibility of new interaction techniques that let users reference real-world objects they wish to interact with digitally. However, many of these devices lack a display, making precise object referencing difficult due to the lack of continuous visual feedback. We introduce Squiggle, an interaction technique that enables users to reference real-world objects without continuous feedback by drawing an invisible loop or "lasso" with an imagined ray-cast pointer. Through a virtual reality data collection study, we observed that this gesture can elicit useful gaze behavior in addition to providing drawing input itself. Based on these results, we implemented and evaluated a real-world prototype of Squiggle, demonstrating that it can improve the accuracy of object referencing over Gaze + Pinch alone, particularly for selecting compound objects and groups.
Jacqui Fashimpaur et al. UIST 2025. Topics: Hand Gesture Recognition; Eye Tracking & Gaze Interaction; Context-Aware Computing.

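A minimal sketch of the geometric core of lasso referencing, assuming object centroids and the drawn loop have already been projected to a common 2D plane (the paper's gaze integration and real-world pipeline are not modeled here): objects are selected when they fall inside the loop under a standard even-odd ray-crossing test.

```python
# Minimal sketch (assumed 2D geometry, not the Squiggle pipeline): treat the
# drawn "lasso" as a closed polygon in the imagined pointing plane and select
# every object whose projected centroid falls inside it.
from typing import List, Tuple

Point = Tuple[float, float]

def point_in_polygon(p: Point, poly: List[Point]) -> bool:
    """Standard even-odd ray-crossing containment test."""
    x, y = p
    inside = False
    j = len(poly) - 1
    for i in range(len(poly)):
        xi, yi = poly[i]
        xj, yj = poly[j]
        if (yi > y) != (yj > y):                      # edge straddles the scanline
            x_cross = (xj - xi) * (y - yi) / (yj - yi) + xi
            if x < x_cross:
                inside = not inside
        j = i
    return inside

def lasso_select(objects: dict, lasso: List[Point]) -> List[str]:
    return [name for name, pos in objects.items() if point_in_polygon(pos, lasso)]

objects = {"mug": (0.2, 0.3), "lamp": (0.9, 0.9), "book": (0.4, 0.5)}
lasso = [(0.0, 0.0), (0.7, 0.0), (0.7, 0.7), (0.0, 0.7)]  # square loop
print(lasso_select(objects, lasso))  # ['mug', 'book']
```
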
A Dynamic Bayesian Network Based Framework for Multimodal Context-Aware Interactions
Multimodal context-aware interactions integrate multiple sensory inputs, such as gaze, gestures, speech, and environmental signals, to provide adaptive support across diverse user contexts. Building such systems is challenging due to the complexity of sensor fusion, real-time decision-making, and managing uncertainties from noisy inputs. To address these challenges, we propose a hybrid approach combining a dynamic Bayesian network (DBN) with a large language model (LLM). The DBN offers a probabilistic framework for modeling variables, relationships, and temporal dependencies, enabling robust, real-time inference of user intent, while the LLM incorporates world knowledge for contextual reasoning beyond explicitly modeled relationships. We demonstrate our approach with a tri-level DBN implementation for tangible interactions, integrating gaze and hand actions to infer user intent in real time. A user evaluation with 10 participants in an everyday office scenario showed that our system can accurately and efficiently infer user intentions, achieving 0.83 per-frame accuracy, even in complex environments. These results validate the effectiveness of the DBN+LLM framework for multimodal context-aware interactions.
Joel Chan et al. IUI 2025. Topics: Context-Aware Computing; Computational Methods in HCI.

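The sketch below is a flattened, single-level stand-in for the tri-level DBN, with hand-picked intents, transition probabilities, and observation likelihoods as assumptions; it shows the per-frame predict-update cycle that fuses gaze and hand evidence into an intent belief (the LLM reasoning layer is omitted).

```python
# Minimal sketch of per-frame intent inference with a single-level dynamic
# Bayesian filter (a simplified stand-in for the paper's tri-level DBN; the
# intents, transition matrix, and observation likelihoods below are assumed).
import numpy as np

INTENTS = ["pick_up_mug", "read_document", "type_on_keyboard"]

# P(intent_t | intent_{t-1}): intents are sticky across frames.
TRANSITION = np.array([[0.90, 0.05, 0.05],
                       [0.05, 0.90, 0.05],
                       [0.05, 0.05, 0.90]])

# P(observation | intent), one table per modality (values are illustrative).
GAZE_LIK = {"mug": [0.7, 0.2, 0.1], "document": [0.1, 0.8, 0.1], "keyboard": [0.1, 0.2, 0.7]}
HAND_LIK = {"reach": [0.8, 0.1, 0.1], "still": [0.1, 0.6, 0.3], "typing": [0.05, 0.15, 0.8]}

def step(belief: np.ndarray, gaze_target: str, hand_action: str) -> np.ndarray:
    """One predict-update cycle: propagate the belief, then fuse gaze and hand evidence."""
    predicted = TRANSITION.T @ belief
    likelihood = np.array(GAZE_LIK[gaze_target]) * np.array(HAND_LIK[hand_action])
    posterior = predicted * likelihood
    return posterior / posterior.sum()

belief = np.ones(len(INTENTS)) / len(INTENTS)
for gaze, hand in [("mug", "still"), ("mug", "reach"), ("mug", "reach")]:
    belief = step(belief, gaze, hand)
print(INTENTS[int(belief.argmax())], belief.round(2))  # pick_up_mug dominates
```
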
Less or More: Towards Glanceable Explanations for LLM Recommendations Using Ultra-Small Devices
Large Language Models (LLMs) have shown remarkable potential in recommending everyday actions as personal AI assistants, while Explainable AI (XAI) techniques are increasingly used to help users understand why a recommendation is given. Personal AI assistants today are often located on ultra-small devices such as smartwatches, which have limited screen space. The verbosity of LLM-generated explanations, however, makes it challenging to deliver glanceable LLM explanations on such ultra-small devices. To address this, we explored 1) spatially structuring an LLM's explanation text using defined contextual components during prompting and 2) presenting temporally adaptive explanations to users based on confidence levels. We conducted a user study to understand how these approaches impacted user experiences when interacting with LLM recommendations and explanations on ultra-small devices. The results showed that structured explanations reduced users' time to action and cognitive load when reading an explanation. Always-on structured explanations increased users' acceptance of AI recommendations. However, users were less satisfied with structured explanations than with unstructured ones due to their lack of sufficient, readable details. Additionally, adaptively presenting structured explanations was less effective at improving user perceptions of the AI than always-on structured explanations. Together with users' interview feedback, these results yield design implications for personalizing the content and timing of LLM explanations displayed on ultra-small devices.
Xinru Wang et al. IUI 2025. Topics: Human-LLM Collaboration; Explainable AI (XAI).

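A minimal sketch of the two approaches explored, with assumed prompt wording, component labels, and confidence threshold: prompting for a spatially structured explanation built from fixed contextual components, and gating when the explanation is shown by the model's confidence.

```python
# Minimal sketch (assumed prompt wording, labels, and threshold) of the two
# ideas in the abstract: a spatially structured explanation assembled from
# fixed contextual components, and a confidence-gated adaptive presentation.
STRUCTURED_PROMPT = """Recommend one action for the user and explain it using
exactly these labeled components, each under six words:
WHEN: <relevant time context>
WHERE: <relevant location context>
WHY: <key reason for the recommendation>"""

def render_explanation(components: dict) -> str:
    """Lay the components out as short, glanceable lines for a watch face."""
    return "\n".join(f"{label}: {text}" for label, text in components.items())

def should_show_explanation(confidence: float, threshold: float = 0.75) -> bool:
    """One possible adaptive rule: surface the explanation only when the model is unsure."""
    return confidence < threshold

components = {"WHEN": "Meeting in 10 min", "WHERE": "Room B201", "WHY": "Calendar conflict ahead"}
if should_show_explanation(confidence=0.6):
    print(render_explanation(components))
```
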
Persistent Assistant: Seamless Everyday AI Interactions via Intent Grounding and Multimodal Feedback
Current AI assistants predominantly use natural language interactions, which can be time-consuming and cognitively demanding, especially for frequent, repetitive tasks in daily life. We propose Persistent Assistant, a framework for seamless and unobtrusive interactions with AI assistants. The framework has three key functionalities: (1) efficient intent specification through grounded interactions, (2) seamless target referencing through embodied input, and (3) intuitive response comprehension through multimodal perceptible feedback. We developed a proof-of-concept system for everyday decision-making tasks, where users can easily repeat queries over multiple objects using eye gaze and pinch gestures, and receive multimodal haptic and speech feedback. Our study shows that multimodal feedback enhances user experience and preference by reducing physical demand, increasing perceived speed, and enabling intuitive and instinctive human-AI assistant interaction. We discuss how our framework can be applied to build seamless and unobtrusive AI assistants for everyday persistent tasks.
Hyunsung Cho et al. (Meta Inc., Reality Labs Research; Carnegie Mellon University, Human-Computer Interaction Institute). CHI 2025. Topics: In-Vehicle Haptic, Audio & Multimodal Feedback; Voice User Interface (VUI) Design; Intelligent Voice Assistants (Alexa, Siri, etc.).

A Multimodal Approach for Targeting Error Detection in Virtual Reality Using Implicit User Behavior
Although the point-and-select interaction method has been shown to lead to user and system-initiated errors, it is still prevalent in VR scenarios. Current solutions to facilitate selection interactions exist; however, they do not address the challenges caused by targeting inaccuracy. To reduce the effort required to target objects, we developed a model that quickly detects targeting errors after they occur. The model uses implicit multimodal user behavioral data to identify possible targeting outcomes. Using a dataset composed of 23 participants engaged in VR targeting tasks, we trained a deep learning model to differentiate between correct and incorrect targeting events within 0.5 seconds of a selection, resulting in an AUC-ROC of 0.9. The utility of this model was then evaluated in a user study with 25 participants, which showed that participants recovered from more errors, and recovered faster, when assisted by the model. These results advance our understanding of targeting errors in VR and facilitate the design of future intelligent error-aware systems.
Naveen Sendhilnathan et al. (Meta). CHI 2025. Topics: Social & Collaborative VR; Immersion & Presence Research; Human-LLM Collaboration.

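A minimal sketch of the detection setup on synthetic data, with a small scikit-learn MLP standing in for the paper's deep model: each example is a flattened window of implicit behavior from the 0.5 s after a selection, labeled as a correct selection or a targeting error.

```python
# Minimal sketch (synthetic data; simple classifier standing in for the paper's
# deep model) of detecting a targeting error from a short window of implicit
# behavior recorded in the 0.5 s after a selection event.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n_events, window_len, n_channels = 400, 30, 4   # e.g., gaze x/y + hand x/y at 60 Hz

# Synthetic stand-in: after an error, implicit behavior shifts on average.
y = rng.integers(0, 2, n_events)                      # 1 = targeting error
X = rng.normal(0, 1, (n_events, window_len * n_channels)) + y[:, None] * 0.4

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
clf.fit(X_train, y_train)
print("AUC-ROC:", roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))
```
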
SonoHaptics: An Audio-Haptic Cursor for Gaze-Based Object Selection in XR
We introduce SonoHaptics, an audio-haptic cursor for gaze-based 3D object selection. SonoHaptics addresses challenges around providing accurate visual feedback during gaze-based selection in Extended Reality (XR), e.g., lack of world-locked displays in no- or limited-display smart glasses and visual inconsistencies. To enable users to distinguish objects without visual feedback, SonoHaptics employs the concept of cross-modal correspondence in human perception to map visual features of objects (color, size, position, material) to audio-haptic properties (pitch, amplitude, direction, timbre). We contribute data-driven models for determining cross-modal mappings of visual features to audio and haptic features, and a computational approach to automatically generate audio-haptic feedback for objects in the user's environment. SonoHaptics provides global feedback that is unique to each object in the scene, and local feedback to amplify differences between nearby objects. Our comparative evaluation shows that SonoHaptics enables accurate object identification and selection in a cluttered scene without visual feedback.
Hyunsung Cho et al. UIST 2024. Topics: Mid-Air Haptics (Ultrasonic); Eye Tracking & Gaze Interaction; Social & Collaborative VR.

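A minimal sketch of the cross-modal mapping idea, assuming linear mappings and parameter ranges that are not from the paper: normalized visual features of an object are converted into the audio-haptic properties the abstract pairs them with.

```python
# Minimal sketch (assumed ranges and linear mappings, not the paper's
# data-driven models): normalized visual features of a gazed-at object are
# mapped to audio-haptic cursor parameters.
def lerp(lo: float, hi: float, t: float) -> float:
    return lo + (hi - lo) * max(0.0, min(1.0, t))

def audio_haptic_cursor(obj: dict) -> dict:
    """obj holds visual features normalized to 0..1 (hue, size, azimuth, roughness)."""
    return {
        "pitch_hz": lerp(220.0, 880.0, obj["hue"]),            # color    -> pitch
        "amplitude": lerp(0.2, 1.0, obj["size"]),               # size     -> amplitude
        "pan": lerp(-1.0, 1.0, obj["azimuth"]),                 # position -> direction
        "timbre_roughness": lerp(0.0, 1.0, obj["roughness"]),   # material -> timbre
    }

mug = {"hue": 0.1, "size": 0.5, "azimuth": 0.8, "roughness": 0.3}
print(audio_haptic_cursor(mug))
```
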
MineXR: Mining Personalized Extended Reality Interfaces
Extended Reality (XR) interfaces offer engaging user experiences, but their effective design requires a nuanced understanding of user behavior and preferences. This knowledge is challenging to obtain without the widespread adoption of XR devices. We introduce MineXR, a design mining workflow and data analysis platform for collecting and analyzing personalized XR user interaction and experience data. MineXR enables elicitation of personalized interfaces from participants of a data collection: for any particular context, participants create interface elements using application screenshots from their own smartphone, place them in the environment, and simultaneously preview the resulting XR layout on a headset. Using MineXR, we contribute a dataset of personalized XR interfaces collected from 31 participants, consisting of 695 XR widgets created from 178 unique applications. We provide insights for XR widget functionalities, categories, clusters, UI element types, and placement. Our open-source tools and data support researchers and designers in developing future XR interfaces.
Hyunsung Cho et al. (Carnegie Mellon University). CHI 2024. Topics: Mixed Reality Workspaces; Immersion & Presence Research; Interactive Data Visualization.

Fast-Forward Reality: Authoring Error-Free Context-Aware Policies with Real-Time Unit Tests in Extended Reality
Advances in ubiquitous computing have enabled end-user authoring of context-aware policies (CAPs) that control smart devices based on specific contexts of the user and environment. However, authoring CAPs accurately and avoiding run-time errors is challenging for end-users, as it is difficult to foresee CAP behaviors under complex real-world conditions. We propose Fast-Forward Reality, an Extended Reality (XR) based authoring workflow that enables end-users to iteratively author and refine CAPs by validating their behaviors via simulated unit test cases. We develop a computational approach to automatically generate test cases based on the authored CAP and the user's context history. Our system delivers each test case with immersive visualizations in XR, helping users verify the CAP behavior and identify necessary refinements. We evaluated Fast-Forward Reality in a user study (N=12). Our authoring and validation process improved the accuracy of CAPs, and participants provided positive feedback on the system's usability.
Xun Qian et al. (Reality Labs Research). CHI 2024. Topics: Context-Aware Computing; Ubiquitous Computing.

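A minimal sketch, with an assumed CAP and context schema, of how unit test cases can be derived by replaying context snapshots from a user's history through the authored policy and recording the action it would fire.

```python
# Minimal sketch (assumed CAP and context schema, not the paper's generator):
# replay historical context snapshots through an authored policy and record
# the action it would fire, so the user can verify each test case.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Context:
    location: str
    hour: int
    motion: str   # e.g., "still", "walking"

def cap_lights_policy(ctx: Context) -> Optional[str]:
    """Example authored CAP: turn on the lights when home, after 19:00, and moving."""
    if ctx.location == "home" and ctx.hour >= 19 and ctx.motion == "walking":
        return "turn_on_living_room_lights"
    return None

def generate_test_cases(policy, context_history):
    """Each historical snapshot becomes a test case paired with the policy's expected action."""
    return [(ctx, policy(ctx)) for ctx in context_history]

history = [Context("home", 20, "walking"), Context("office", 20, "walking"), Context("home", 18, "still")]
for ctx, action in generate_test_cases(cap_lights_policy, history):
    print(ctx, "->", action)
```
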
A Meta-Bayesian Approach for Rapid Online Parametric Optimization for Wrist-based Interactions
Wrist-based input often requires tuning parameter settings to account for between-user and between-session differences, such as variations in hand anatomy, wearing position, and posture. Traditionally, users either work with predefined parameter values not optimized for individuals or undergo time-consuming calibration processes. We propose an online Bayesian Optimization (BO)-based method for rapidly determining the user-specific optimal settings of wrist-based pointing. Specifically, we develop a meta-Bayesian optimization (meta-BO) method that differs from traditional human-in-the-loop BO: by incorporating meta-learning of prior optimization data from a user population with BO, meta-BO enables rapid calibration of parameters for new users within a handful of trials. We evaluate our method with two representative and distinct wrist-based interactions: absolute and relative pointing. On a weighted-sum metric consisting of completion time, aiming error, and trajectory quality, meta-BO improves absolute pointing performance by 22.92% and 21.35% compared to BO and manual calibration, respectively, and improves relative pointing performance by 25.43% and 13.60%.
Yi-Chi Liao et al. (Aalto University). CHI 2024. Topics: Vibrotactile Feedback & Skin Stimulation; Force Feedback & Pseudo-Haptic Weight; Foot & Wrist Interaction.

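A numpy-only sketch of the meta-BO idea on a synthetic 1D calibration problem: the Gaussian-process prior mean is the average error-vs-gain curve from previous users, and a lower-confidence-bound acquisition picks the next gain to try for a new user. The objective, kernel length scale, and gain range are all assumptions, not values from the paper.

```python
# Minimal sketch (synthetic objective, numpy-only GP) of meta-Bayesian
# optimization: the GP prior mean is the average objective curve measured on
# previous users, so a new user's optimum is found in a handful of trials.
import numpy as np

rng = np.random.default_rng(1)
grid = np.linspace(0.5, 3.0, 50)                    # candidate pointing-gain values

def rbf(a, b, ls=0.4):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

# "Population" data: each previous user's error-vs-gain curve (to be minimized).
population = np.stack([(grid - rng.normal(1.6, 0.2)) ** 2 + rng.normal(0, 0.02, grid.size)
                       for _ in range(8)])
prior_mean = population.mean(axis=0)                # meta-learned prior over the grid

new_user_optimum = 1.9
objective = lambda g: (g - new_user_optimum) ** 2 + rng.normal(0, 0.01)

X_idx, y = [], []
for trial in range(5):                              # a handful of calibration trials
    if not X_idx:
        idx = int(prior_mean.argmin())              # start at the population optimum
    else:
        K = rbf(grid[X_idx], grid[X_idx]) + 1e-4 * np.eye(len(X_idx))
        k_star = rbf(grid, grid[X_idx])
        alpha = np.linalg.solve(K, np.array(y) - prior_mean[X_idx])
        mu = prior_mean + k_star @ alpha
        var = 1.0 - np.sum(k_star * np.linalg.solve(K, k_star.T).T, axis=1)
        idx = int((mu - np.sqrt(np.clip(var, 0, None))).argmin())   # lower confidence bound
    X_idx.append(idx)
    y.append(objective(grid[idx]))

print("best sampled gain:", grid[X_idx[int(np.argmin(y))]])
```
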
Investigating Wrist Deflection Scrolling Techniques for Extended Reality
Scrolling in extended reality (XR) is currently performed using handheld controllers or vision-based arm-in-front gestures, which have the limitations of encumbering the user's hands or requiring a specific arm posture, respectively. To address these limitations, we investigate freehand, posture-independent scrolling driven by wrist deflection. We propose two novel techniques: Wrist Joystick, which uses rate control, and Wrist Drag, which uses position control. In an empirical study of a rapid item acquisition task and a casual browsing task, both Wrist Drag and Wrist Joystick performed on par with a comparable state-of-the-art technique on one of the two tasks. Further, using a relaxed arm-at-side posture, participants retained their arm-in-front performance for both wrist techniques. Finally, we analyze behavioral and ergonomic data to provide design insights for wrist deflection scrolling. Our results demonstrate that wrist deflection provides a promising method for performant scrolling controls while offering additional benefits over existing XR interaction techniques.
Jacqui Fashimpaur et al. (Meta Inc.). CHI 2023. Topics: Foot & Wrist Interaction; Mixed Reality Workspaces; Immersion & Presence Research.

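A minimal sketch of the two control mappings named in the abstract, with assumed gains and dead zone: Wrist Joystick-style rate control turns sustained deflection into scroll velocity, while Wrist Drag-style position control turns deflection changes into scroll offsets.

```python
# Minimal sketch (assumed gains and dead zone) of the two control mappings:
# rate control (Wrist Joystick) and position control (Wrist Drag).
def rate_control_step(deflection_deg: float, dt: float, gain: float = 40.0,
                      dead_zone_deg: float = 3.0) -> float:
    """Scroll velocity proportional to deflection beyond a small dead zone."""
    if abs(deflection_deg) < dead_zone_deg:
        return 0.0
    sign = 1.0 if deflection_deg > 0 else -1.0
    return gain * (deflection_deg - sign * dead_zone_deg) * dt

def position_control_step(deflection_deg: float, prev_deflection_deg: float,
                          gain: float = 12.0) -> float:
    """Scroll offset proportional to the change in deflection since the last frame."""
    return gain * (deflection_deg - prev_deflection_deg)

# Example: holding a 10-degree flexion for one 60 Hz frame.
print(rate_control_step(10.0, dt=1 / 60))        # keeps scrolling while held
print(position_control_step(10.0, 8.0))          # scrolls only while the wrist moves
```
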
Investigating Eyes-away Mid-air Typing in Virtual Reality using Squeeze haptics-based Postural Reinforcement
In this paper, we investigate postural reinforcement haptics for mid-air typing using squeeze actuation on the wrist. We propose and validate eye-tracking-based objective metrics that capture the impact of haptics on the user's experience, which traditional performance metrics like speed and accuracy are not able to capture. To this end, we design four wrist-based haptic feedback conditions: no haptics, vibrations on keypress, squeeze + vibrations on keypress, and squeeze posture reinforcement + vibrations on keypress. We conduct a text input study with 48 participants to compare the four conditions on typing and gaze metrics. Our results show that for expert QWERTY users, posture reinforcement haptics significantly benefit typing by reducing visual attention on the keyboard by up to 44% relative to no haptics, thus enabling eyes-away behaviors.
Aakar Gupta et al. (Meta Inc.). CHI 2023. Topics: Vibrotactile Feedback & Skin Stimulation; Hand Gesture Recognition; Eye Tracking & Gaze Interaction.

XAIR: A Framework of Explainable AI in Augmented Reality
Explainable AI (XAI) has established itself as an important component of AI-driven interactive systems. With Augmented Reality (AR) becoming more integrated into daily life, the role of XAI also becomes essential in AR because end-users will frequently interact with intelligent services. However, it is unclear how to design effective XAI experiences for AR. We propose XAIR, a design framework that addresses when, what, and how to provide explanations of AI output in AR. The framework was based on a multi-disciplinary literature review of XAI and HCI research, a large-scale survey probing 500+ end-users' preferences for AR-based explanations, and three workshops with 12 experts collecting their insights about XAI design in AR. XAIR's utility and effectiveness were verified via a study with 10 designers and another study with 12 end-users. XAIR can provide guidelines for designers, inspiring them to identify new design opportunities and achieve effective XAI designs in AR.
Xuhai Xu et al. (Reality Labs Research, University of Washington). CHI 2023. Topics: AR Navigation & Context Awareness; Explainable AI (XAI).

RIDS: Implicit Detection of a Selection Gesture Using Hand Motion Dynamics During Freehand Pointing in Virtual Reality
Freehand interactions with augmented and virtual reality are growing in popularity, but they lack reliability and robustness. Implicit behavior from users, such as hand or gaze movements, might provide additional signals to improve the reliability of input. In this paper, the primary goal is to improve the detection of a selection gesture in VR during point-and-click interaction. Thus, we propose and investigate the use of information contained within the hand motion dynamics that precede a selection gesture. We built two models that classified whether a user is likely to perform a selection gesture at the current moment in time. We collected data during a pointing-and-selection task from 15 participants and trained two models with different architectures: a logistic regression classifier trained on predefined hand motion features, and a temporal convolutional network (TCN) classifier trained on raw hand motion data. Leave-one-subject-out cross-validation PR-AUCs of 0.36 and 0.90 were obtained for each model respectively, demonstrating that the models performed well above chance level (0.13). The TCN model was found to improve the precision of a noisy selection gesture by 11.2% without sacrificing recall performance. An initial analysis of the generalizability of the models demonstrated above-chance performance, suggesting that this approach could be scaled to other interaction tasks in the future.
Zhenhong Hu et al. UIST 2022. Topics: Hand Gesture Recognition; Human Pose & Activity Recognition.

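A minimal sketch of the simpler of the two model routes on synthetic data: hand-crafted motion features are computed from the window preceding a candidate selection and fed to a logistic regression, evaluated with PR-AUC as in the abstract. The feature choices and data generation are assumptions.

```python
# Minimal sketch (synthetic data; assumed features) of the feature-based route:
# a logistic regression over hand motion dynamics computed from the window
# preceding a candidate selection gesture, scored with PR-AUC.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score

rng = np.random.default_rng(2)

def motion_features(window_xyz: np.ndarray) -> np.ndarray:
    """window_xyz: (T, 3) fingertip positions; derive simple dynamics features."""
    vel = np.diff(window_xyz, axis=0)
    speed = np.linalg.norm(vel, axis=1)
    return np.array([speed.mean(), speed.std(), speed[-5:].mean(), np.ptp(window_xyz[:, 2])])

def make_window(is_selection: bool) -> np.ndarray:
    """Synthetic stand-in: motion slows down more before a selection gesture."""
    decay = np.linspace(1.0, 0.2 if is_selection else 0.9, 60)[:, None]
    return np.cumsum(rng.normal(0, 0.01, (60, 3)) * decay, axis=0)

y = rng.integers(0, 2, 300)
X = np.stack([motion_features(make_window(bool(label))) for label in y])

clf = LogisticRegression(max_iter=1000).fit(X[:200], y[:200])
scores = clf.predict_proba(X[200:])[:, 1]
print("PR-AUC:", average_precision_score(y[200:], scores))
```
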
Detecting Input Recognition Errors and User Errors Using Gaze Dynamics in Virtual Reality
Gesture-based recognition systems are susceptible to input recognition errors and user errors, both of which negatively affect user experiences and can be frustrating to correct. Prior work has suggested that user gaze patterns following an input event could be used to detect input recognition errors and subsequently improve interaction. However, to be useful, error detection systems would need to detect various types of high-cost errors. Furthermore, to build a reliable detection model for errors, gaze behaviour following these errors must be manifested consistently across different tasks. Using data analysis and machine learning models, this research examined gaze dynamics following input events in virtual reality (VR). Across three distinct point-and-select tasks, we found differences in user gaze patterns following three input events: correctly recognized input actions, input recognition errors, and user errors. These differences were consistent across tasks, selection versus deselection actions, and naturally occurring versus experimentally injected input recognition errors. A multi-class deep neural network successfully discriminated between these three input events using only gaze dynamics, achieving an AUC-ROC-OVR score of 0.78. Together, these results demonstrate the utility of gaze in detecting interaction errors and have implications for the design of intelligent systems that can assist with adaptive error recovery.
Naveen Sendhilnathan et al. UIST 2022. Topics: Eye Tracking & Gaze Interaction; Human Pose & Activity Recognition; Immersion & Presence Research.

Optimizing the Timing of Intelligent Suggestion in Virtual Reality
Intelligent suggestion techniques can enable low-friction selection-based input within virtual or augmented reality (VR/AR) systems. Such techniques leverage probability estimates from a target prediction model to provide users with an easy-to-use method to select the most probable target in an environment. For example, a system could highlight the predicted target and enable a user to select it with a simple click. However, as the probability estimates can be made at any time, it is unclear when an intelligent suggestion should be presented. Earlier suggestions could save a user time and effort but be less accurate. Later suggestions, on the other hand, could be more accurate but save less time and effort. This paper thus proposes a computational framework that can be used to determine the optimal timing of intelligent suggestions based on user-centric costs and benefits. A series of studies demonstrated the value of the framework for minimizing task completion time and maximizing suggestion usage and showed that it was both theoretically and empirically effective at determining the optimal timing for intelligent suggestions.
Difeng Yu et al. UIST 2022. Topics: Social & Collaborative VR; AI-Assisted Decision-Making & Automation.

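A minimal sketch of the cost-benefit timing idea with assumed curves (not the paper's framework): prediction accuracy rises as the movement unfolds while the time a correct suggestion can save shrinks, so the suggestion is presented where the expected net benefit peaks.

```python
# Minimal sketch (assumed accuracy and time-saving curves) of cost-benefit
# suggestion timing: present the suggestion at the point in the movement where
# expected net benefit (time saved minus correction cost) is maximized.
import numpy as np

t = np.linspace(0.0, 1.0, 101)            # fraction of the selection movement completed
accuracy = 0.3 + 0.65 * t                 # prediction accuracy improves over time
time_saved = 2.0 * (1.0 - t)              # seconds saved if the suggestion is correct
correction_cost = 0.8                     # seconds lost dismissing a wrong suggestion

expected_benefit = accuracy * time_saved - (1.0 - accuracy) * correction_cost
best_t = t[int(expected_benefit.argmax())]
print(f"show suggestion at {best_t:.0%} of the movement "
      f"(expected net benefit {expected_benefit.max():.2f} s)")
```
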