EchoMind: Supporting Real-time Complex Problem Discussions through Human-AI Collaborative FacilitationTeams often engage in group discussions to leverage collective intelligence when solving complex problems. However, in real-time discussions, such as face-to-face meetings, participants frequently struggle with managing diverse perspectives and structuring content, which can lead to unproductive outcomes like forgetfulness and off-topic conversations. Through a formative study, we explore a human-AI collaborative facilitation approach, where AI assists in establishing a shared knowledge framework to provide a guiding foundation. We present EchoMind, a system that visualizes discussion knowledge through real-time issue mapping. EchoMind empowers participants to maintain focus on specific issues, review key ideas or thoughts, and collaboratively expand the discussion. The system leverages large language models (LLMs) to dynamically organize dialogues into nodes based on the current context recorded on the map. Our user study with four teams (N=16) reveals that EchoMind helps clarify discussion objectives, trace knowledge pathways, and enhance overall productivity. We also discuss the design implications for human-AI collaborative facilitation and the potential of shared knowledge visualization to transform group dynamics in future collaborations.2025WCWeihao Chen et al.Human-AI (and Robot!) CollaborationCSCW
Understanding Users' Perceptions and Expectations toward a Social Balloon Robot via an Exploratory StudyWe are witnessing a new epoch in embodied social agents. Most of the work has focused on ground or desktop robots that enjoy technical maturity and rich social channels but are often limited by terrain. Drones, which enable spatial mobility, currently face issues with safety and proximity. This paper explores a social balloon robot as a viable alternative that combines these advantages and alleviates limitations. To this end, we developed a hardware prototype named BalloonBot that integrates various devices for social functioning and a helium balloon. We conducted an exploratory lab study on users’ perceptions and expectations about its demonstrated interactions and functions. Our results show promise in using such a robot as another form of socially embodied agent. We highlight its unique mobile and approachable characteristics that harvest novel user experiences and outline factors that should be considered before its broad applications.2025CWChongyang Wang et al.Social Robot InteractionUIST
InterQuest: A Mixed-Initiative Framework for Dynamic User Interest Modeling in Conversational SearchIn online information-seeking tasks (e.g., for products and restaurants), users seek information that aligns with their individual preferences to make informed decisions. However, existing systems often struggle to infer users' implicit interests—unstated yet essential preference factors that directly impact decision quality. Our formative study reveals that User-Centric Knowledge—cross-task persistent preference attributes of users (e.g., "user cares about functionality details for electronics")—serves as a key indicator for resolving users' implicit interests. However, constructing such knowledge from task-specific data alone is insufficient due to three types of uncertainties—cold-start limitation, content accuracy, and scope applicability—which require user-provided information for knowledge alignment. Based on these insights, we present InterQuest, an LLM-based conversational search agent that dynamically models user interests. InterQuest combines two strategies: (1) Dynamic User Knowledge Modeling, which infers and adjusts the content and scope of User-Centric Knowledge, and (2) Uncertainty-Driven Questioning, where InterQuest proactively asks questions to resolve knowledge uncertainties. A user study with 18 participants demonstrates that InterQuest outperforms the baselines in user interest inference, accuracy of user knowledge modeling, and the overall information-seeking experience. Additionally, our findings provide valuable design implications for improving mixed-initiative user modeling in future systems.2025YMYu Mei et al.Human-LLM CollaborationRecommender System UXAlgorithmic Fairness & BiasUIST
Unknown Word Detection for English as a Second Language (ESL) Learners using Gaze and Pre-trained Language ModelsEnglish as a Second Language (ESL) learners often encounter unknown words that hinder their text comprehension. Automatically detecting these words as users read can enable computing systems to provide just-in-time definitions, synonyms, or contextual explanations, thereby helping users learn vocabulary in a natural and seamless manner. This paper presents EyeLingo, a transformer-based machine learning method that predicts the probability of unknown words based on text content and eye gaze trajectory in real time with high accuracy. A 20-participant user study revealed that our method can achieve an accuracy of 97.6%, and an F1-score of 71.1%. We implemented a real-time reading assistance prototype to show the effectiveness of EyeLingo. The user study shows improvement in willingness to use and usefulness compared to baseline methods.2025JDJiexin Ding et al.Tsinghua University, Key Laboratory of Pervasive Computing, Ministry of Education, Department of Computer Science and Technology, Global Innovation Exchange (GIX) Institute; University of Washington, Paul G. Allen School of Computer Science & EngineeringHuman Pose & Activity RecognitionHuman-LLM CollaborationCHI
From Operation to Cognition: Automatic Modeling Cognitive Dependencies from User Demonstrations for GUI Task AutomationTraditional Programming by Demonstration (PBD) systems primarily automate tasks by recording and replaying operations on Graphical User Interfaces (GUIs), without fully considering the cognitive processes behind operations. This limits their ability to generalize tasks with interdependent operations to new contexts (e.g. collecting and summarizing introductions depending on different search keywords from varied websites). We propose TaskMind, a system that automatically identifies the semantics of operations, and the cognitive dependencies between operations from demonstrations, building a user-interpretable task graph. Users modify this graph to define new task goals, and TaskMind executes the graph to dynamically generalize new parameters for operations, with the integration of Large Language Models (LLMs). We compared TaskMind with a baseline end-to-end LLM which automates tasks from demonstrations and natural language commands, without task graph. In studies with 20 participants on both predefined and customized tasks, TaskMind significantly outperforms the baseline in both success rate and controllability.2025YYYiwen Yin et al.Tsinghua University, Department of Computer Science and TechnologyHuman-LLM CollaborationAI-Assisted Decision-Making & AutomationCHI
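TaskMind's task graph is executed in dependency order, with each operation receiving the outputs of the operations it depends on. The loop below is a minimal sketch of that idea, assuming an acyclic graph; the dict-based graph encoding and the `run_op` callback are hypothetical simplifications (the actual system additionally uses LLMs to generalize operation parameters at execution time):

```python
def run_graph(nodes, deps, run_op):
    """Execute operations in dependency order, feeding each operation the
    outputs of the nodes it depends on. Assumes `deps` describes an
    acyclic graph: deps[name] lists the nodes that must run first."""
    done, results = set(), {}
    while len(done) < len(nodes):
        for name in nodes:
            # Skip nodes already run or whose dependencies are not ready yet.
            if name in done or any(d not in done for d in deps.get(name, [])):
                continue
            inputs = {d: results[d] for d in deps.get(name, [])}
            results[name] = run_op(name, inputs)
            done.add(name)
    return results
```

For instance, a "search → collect → summarize" chain (as in the abstract's example of collecting and summarizing introductions from search results) would run `search` first and pass its output downstream.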
The Odyssey Journey: Top-Tier Medical Resource Seeking for Specialized Disorder in ChinaIt is pivotal for patients to receive accurate health information, diagnoses, and timely treatments. However, in China, the significantly imbalanced doctor-to-patient ratio intensifies the information and power asymmetries in doctor-patient relationships. Health information-seeking, which enables patients to collect information from sources beyond doctors, is a potential approach to mitigate these asymmetries. While HCI research predominantly focuses on common chronic conditions, our study focuses on specialized disorders, which are often familiar to specialists but not to general practitioners and the public. With Hemifacial Spasm (HFS) as an example, we aim to understand patients' health information and top-tier medical resource seeking journeys in China. Through interviews with three neurosurgeons and 12 HFS patients from rural and urban areas, and applying Actor-Network Theory, we provide empirical insights into the roles, interactions, and workflows of various actors in the health information-seeking network. We also identified five strategies patients adopted to mitigate asymmetries and access top-tier medical resources, illustrating these strategies as subnetworks within the broader health information-seeking network and outlining their advantages and challenges.2025KCKa I Chan et al.Tsinghua University, Global Innovation ExchangeChronic Disease Self-Management (Diabetes, Hypertension, etc.)Telemedicine & Remote Patient MonitoringCHI
Palmpad: Enabling Real-Time Index-to-Palm Touch Interaction with a Single RGB CameraIndex-to-palm interaction plays a crucial role in Mixed Reality (MR) interactions. However, achieving a satisfactory inter-hand interaction experience is challenging with existing vision-based hand tracking technologies, especially in scenarios where only a single camera is available. Therefore, we introduce Palmpad, a novel sensing method utilizing a single RGB camera to detect the touch of an index finger on the opposite palm. Our exploration reveals that incorporating optical flow techniques to extract motion information between consecutive frames for the index finger and palm leads to a significant improvement in touch status determination. By doing so, our CNN model achieves 97.0% recognition accuracy and a 96.1% F1 score. In a usability evaluation, we compared Palmpad with Quest's built-in hand gesture algorithms. Palmpad not only delivers superior accuracy (95.3%) but also reduces operational demands and significantly improves users’ willingness and confidence. Palmpad aims to enhance accurate touch detection for lightweight MR devices.2025ZHZhe He et al.Tsinghua University, Department of Computer Science and TechnologyHand Gesture RecognitionFull-Body Interaction & Embodied InputMixed Reality WorkspacesCHI
WritingRing: Enabling Natural Handwriting Input with a Single IMU RingTracking continuous 2D sequential handwriting trajectories accurately using a single IMU ring is extremely challenging due to the significant displacement between the IMU's wearing position and the location of the tracked fingertip. We propose WritingRing, a system that uses a single IMU ring worn at the base of the finger to support natural handwriting input and provide real-time 2D trajectories. To achieve this, we first built a handwriting dataset using a touchpad and an IMU ring (N=20). Next, we improved the LSTM model by incorporating streaming input and a TCN network, significantly enhancing accuracy and computational efficiency, and achieving an average trajectory accuracy of 1.63 mm. Real-time usability studies demonstrated that the system achieved 88.7% letter recognition accuracy and 68.2% word recognition accuracy, which reached 84.36% when restricting the output to words within a vocabulary of size 3000. WritingRing can also be embedded into existing ring systems, providing a natural and real-time solution for various applications.2025ZHZhe He et al.Tsinghua University, Department of Computer Science and TechnologyElectrical Muscle Stimulation (EMS)Hand Gesture RecognitionFoot & Wrist InteractionCHI
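The streaming-friendly TCN component mentioned in the WritingRing abstract rests on causal dilated convolutions: each output sample depends only on the current and past inputs, so the filter can run online as IMU readings arrive. Below is a minimal pure-Python sketch of one such filter; the kernel values are arbitrary placeholders, not WritingRing's learned weights, and the real model stacks such layers with an LSTM:

```python
from collections import deque

def causal_conv1d(stream, kernel, dilation=1):
    """Causal dilated 1-D convolution over a sample stream.

    Output y[t] combines x[t] and samples `dilation` steps apart in the
    past; missing history is zero-padded, so the filter is streamable."""
    pad = dilation * (len(kernel) - 1)          # left (past-side) padding only
    buf = deque([0.0] * pad, maxlen=pad + 1)    # sliding window of past samples
    out = []
    for x in stream:
        buf.append(x)
        # Taps spaced `dilation` apart, newest sample first.
        taps = [buf[-1 - i * dilation] for i in range(len(kernel))]
        # kernel[0] weights the oldest tap, kernel[-1] the newest.
        out.append(sum(w * t for w, t in zip(kernel, reversed(taps))))
    return out
```

With `kernel=[1.0, 0.0]` the filter reduces to a one-step delay, which makes the causality easy to verify by hand.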
VAction: A Lightweight and Integrated VR Training System for Authentic Film-Shooting ExperienceThe film industry exerts significant economic and cultural influence, and its rapid development is contingent upon the expertise of industry professionals, underscoring the critical importance of film-shooting education. However, this process typically necessitates repeated practice in complex professional venues using expensive equipment, presenting a significant obstacle for ordinary learners who struggle to access such training environments. Although VR technology has already shown its potential in education, existing research has not addressed the crucial learning component of replicating the shooting process. Moreover, the limited functionality of traditional controllers hinders the fulfillment of the educational requirements. Therefore, we developed the VAction VR system, combining high-fidelity virtual environments with a custom-designed controller to simulate the real-world camera operation experience. The system’s lightweight design ensures cost-effective and efficient deployment. Experiment results demonstrated that VAction significantly outperforms traditional methods in both practice effectiveness and user experience, indicating its potential and usefulness in film-shooting education.2025SWShaocong Wang et al.Tsinghua University, Department of Computer Science and TechnologyMixed Reality WorkspacesHome Energy ManagementCHI
AutoPBL: An LLM-powered Platform to Guide and Support Individual Learners Through Self Project-based LearningSelf project-based learning (SPBL) is a popular learning style where learners follow tutorials and build projects by themselves. SPBL combines project-based learning’s benefit of being engaging and effective with the flexibility of self-learning. However, insufficient guidance and support during SPBL may lead to unsatisfactory learning experiences and outcomes. While LLM chatbots (e.g., ChatGPT) could potentially serve as SPBL tutors, we have yet to see an SPBL platform with responsible and systematic LLM integration. To address this gap, we present AutoPBL, an interactive learning platform for SPBL learners. We examined human PBL tutors’ roles through formative interviews to inform our design. AutoPBL features an LLM-guided learning process with checkpoint questions and in-context Q&A. In a user study where 29 beginners learned machine learning through entry-level projects, we found that AutoPBL effectively improves learning outcomes and elicits better learning behavior and metacognition by clarifying current priorities and providing timely assistance.2025YZYihao Zhu et al.Tsinghua University, Department of Computer Science and TechnologyHuman-LLM CollaborationProgramming Education & Computational ThinkingIntelligent Tutoring Systems & Learning AnalyticsCHI
Enhancing Smartphone Eye Tracking with Cursor-Based Interactive Implicit CalibrationThe limited accuracy of eye-tracking on smartphones restricts its use. Existing RGB-camera-based eye-tracking relies on extensive datasets, which could be enhanced by continuous fine-tuning using calibration data implicitly collected from the interaction. In this context, we propose COMETIC (Cursor Operation Mediated Eye-Tracking Implicit Calibration), which introduces a cursor-based interaction and utilizes the inherent correlation between cursor and eye movement. By filtering valid cursor coordinates as proxies for the ground truth of gaze and fine-tuning the eye-tracking model with corresponding images, COMETIC enhances accuracy during the interaction. Both filtering and fine-tuning use pre-trained models and could be facilitated using personalized, dynamically updated data. Results show COMETIC achieves an average eye-tracking error of 278.3 px (1.60 cm, 2.29°), representing a 27.2% improvement compared to that without fine-tuning. We found that filtering cursor points whose actual distance to gaze is 150.0 px (0.86 cm) yields the best eye-tracking results.2025CLChang Liu et al.Tsinghua University, Department of Computer Science and TechnologyEye Tracking & Gaze InteractionHuman-LLM CollaborationVisualization Perception & CognitionCHI
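COMETIC's core move is to treat filtered cursor coordinates as implicit gaze labels for fine-tuning. As a simplified sketch, the filter below keeps only cursor samples that fall within a distance threshold of the current model's gaze estimate, using the 150 px value reported in the abstract; the sample format is hypothetical, and the actual system filters with a pre-trained model rather than a plain threshold:

```python
import math

def filter_calibration_samples(samples, max_px=150.0):
    """Keep cursor points close enough to the model's gaze estimate to
    serve as implicit ground-truth labels for fine-tuning.

    Each sample is (cursor_xy, predicted_gaze_xy, frame_id) — an
    illustrative format, not COMETIC's actual data layout."""
    kept = []
    for cursor, gaze, frame in samples:
        dist = math.hypot(cursor[0] - gaze[0], cursor[1] - gaze[1])
        if dist <= max_px:
            kept.append((frame, cursor))   # cursor stands in for gaze
    return kept
```

The kept (frame, label) pairs would then feed a fine-tuning step on the corresponding camera images, personalizing the pre-trained gaze model over time.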
Modeling the Impact of Visual Stimuli on Redirection Noticeability with Gaze Behavior in Virtual RealityWhile users could embody virtual avatars that mirror their physical movements in Virtual Reality, these avatars' motions can be redirected to enable novel interactions. Excessive redirection, however, could break the user's sense of embodiment due to perceptual conflicts between vision and proprioception. While prior work focused on avatar-related factors influencing the noticeability of redirection, we investigate how the visual stimuli in the surrounding virtual environment affect user behavior and, in turn, the noticeability of redirection. Given the wide variety of different types of visual stimuli and their tendency to elicit varying individual reactions, we propose to use users' gaze behavior as an indicator of their response to the stimuli and model the noticeability of redirection. We conducted two user studies to collect users' gaze behavior and noticeability, investigating the relationship between them and identifying the most effective gaze behavior features for predicting noticeability. Based on the data, we developed a regression model that takes users' gaze behavior as input and outputs the noticeability of redirection. We then conducted an evaluation study to test our model on unseen visual stimuli, achieving a mean squared error of 0.012. We further implemented an adaptive redirection technique and conducted a proof-of-concept study to evaluate its effectiveness with complex visual stimuli in two applications. The results indicated that participants experienced lower physical demand and a stronger sense of body ownership when using our adaptive technique, demonstrating the potential of our model to support real-world use cases.2025ZLZhipeng Li et al.ETH Zürich, Department of Computer ScienceEye Tracking & Gaze InteractionMixed Reality WorkspacesImmersion & Presence ResearchCHI
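The noticeability model above is a regression from gaze-behavior features to a noticeability score, evaluated by mean squared error. As a toy illustration of that setup only, here is ordinary least squares on a single hypothetical gaze feature, plus the MSE metric used to judge such a model (the paper's actual model uses multiple gaze features and is not specified here):

```python
def fit_linear(xs, ys):
    """Ordinary least squares for y = a*x + b on one feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

def mse(model, xs, ys):
    """Mean squared error of predictions against ground truth."""
    a, b = model
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
```

Evaluating on held-out stimuli, as the study does, amounts to computing `mse` on feature/noticeability pairs the model never saw during fitting.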
Investigating Context-Aware Collaborative Text Entry on Smartphones using Large Language ModelsText entry is a fundamental and ubiquitous task, but users often face challenges such as situational impairments or difficulties in sentence formulation. Motivated by this, we explore the potential of large language models (LLMs) to assist with text entry in real-world contexts. We propose a collaborative smartphone-based text entry system, CATIA, that leverages LLMs to provide text suggestions based on contextual factors, including screen content, time, location, activity, and more. In a 7-day in-the-wild study with 36 participants, the system offered appropriate text suggestions in over 80% of cases. Users exhibited different collaborative behaviors depending on whether they were composing text for interpersonal communication or information services. Additionally, the relevance of contextual factors beyond screen content varied across scenarios. We identified two distinct mental models: AI as a supportive facilitator or as a more equal collaborator. These findings outline the design space for human-AI collaborative text entry on smartphones.2025WCWeihao Chen et al.Tsinghua University, Department of Computer Science and TechnologyVoice User Interface (VUI) DesignHuman-LLM CollaborationContext-Aware ComputingCHI
UbiPhysio: Support Daily Functioning, Fitness, and Rehabilitation with Action Understanding and Feedback in Natural LanguageWang et al. developed the UbiPhysio system, which helps users with daily functional exercise, fitness, and rehabilitation training through action understanding and natural-language feedback.2024CWChongyang Wang et al.Vibrotactile Feedback & Skin StimulationFull-Body Interaction & Embodied InputUbiComp
Evaluating the Privacy Valuation of Personal Data on SmartphonesFan et al. study how smartphone users value the privacy of their personal data.2024LFLihua Fan et al.Privacy Perception & Decision-MakingUbiComp
EasyAsk: An In-App Contextual Tutorial Search Assistant for Older Adults with Voice and Touch InputsGao et al. designed EasyAsk, an in-app tutorial search assistant that uses voice and touch inputs to help older adults quickly find the tutorials they need within an app.2024WGWeiwei Gao et al.Voice User Interface (VUI) DesignAging-Friendly Technology DesignUbiComp
The EarSAVAS Dataset: Enabling Subject-Aware Vocal Activity Sensing on EarablesZhang et al. built the EarSAVAS dataset, which enables subject-aware vocal activity sensing on smart earable devices and supports research on related algorithms.2024XZXiyuxing Zhang et al.Biosensors & Physiological MonitoringUbiComp
G-VOILA: Gaze-Facilitated Information Querying in Daily ScenariosWang et al. proposed the G-VOILA system, which uses gaze-tracking technology to facilitate information-querying interactions in daily scenarios.2024ZWZeyu Wang et al.Eye Tracking & Gaze InteractionUbiComp
ReactGenie: A Development Framework for Complex Multimodal Interactions Using Large Language ModelsBy combining voice and touch interactions, multimodal interfaces can surpass the efficiency of either modality alone. Traditional multimodal frameworks require laborious developer work to support rich multimodal commands where the user’s multimodal command involves possibly exponential combinations of actions/function invocations. This paper presents ReactGenie, a programming framework that better separates multimodal input from the computational model to enable developers to create efficient and capable multimodal interfaces with ease. ReactGenie translates multimodal user commands into NLPL (Natural Language Programming Language), a programming language we created, using a neural semantic parser based on large-language models. The ReactGenie runtime interprets the parsed NLPL and composes primitives in the computational model to implement complex user commands. As a result, ReactGenie allows easy implementation and unprecedented richness in commands for end-users of multimodal apps. Our evaluation showed that 12 developers can learn and build a non-trivial ReactGenie application in under 2.5 hours on average. In addition, compared with a traditional GUI, end-users can complete tasks faster and with less task load using ReactGenie apps.2024JYJackie (Junrui) Yang et al.Stanford UniversityVoice User Interface (VUI) DesignGenerative AI (Text, Image, Music, Video)Human-LLM CollaborationCHI
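ReactGenie's runtime interprets parsed NLPL by composing primitives from the app's computational model. The sketch below captures only that composition idea: a parsed command is a sequence of (primitive, arguments) steps threaded through a shared value. The primitive names, the list-of-tuples program encoding, and the example data are all invented for illustration and are not ReactGenie's actual API:

```python
# Hypothetical computational-model primitives an app might register.
PRIMITIVES = {
    "filter": lambda items, key, value: [i for i in items if i.get(key) == value],
    "count":  lambda items: len(items),
}

def interpret(program, state):
    """Run a parsed command: each step applies a primitive to the
    running value, composing them into one complex user command."""
    for name, kwargs in program:
        state = PRIMITIVES[name](state, **kwargs)
    return state
```

A multimodal command like "how many of *these* are red?" (voice plus a touch selection supplying `state`) could then parse to a two-step program: filter by color, then count.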
MindShift: Leveraging Large Language Models for Mental-States-Based Problematic Smartphone Use Intervention Problematic smartphone use negatively affects physical and mental health. Despite the wide range of prior research, existing persuasive techniques are not flexible enough to provide dynamic persuasion content based on users’ physical contexts and mental states. We first conducted a Wizard-of-Oz study (N=12) and an interview study (N=10) to summarize the mental states behind problematic smartphone use: boredom, stress, and inertia. This informs our design of four persuasion strategies: understanding, comforting, evoking, and scaffolding habits. We leveraged large language models (LLMs) to enable the automatic and dynamic generation of effective persuasion content. We developed MindShift, a novel LLM-powered problematic smartphone use intervention technique. MindShift takes users’ in-the-moment app usage behaviors, physical contexts, mental states, goals & habits as input, and generates personalized and dynamic persuasive content with appropriate persuasion strategies. We conducted a 5-week field experiment (N=25) to compare MindShift with its simplified version (with mental states removed) and a baseline technique (a fixed reminder). The results show that MindShift improves intervention acceptance rates by 4.7-22.5% and reduces smartphone usage duration by 7.4-9.8%. Moreover, users showed a significant drop in smartphone addiction scale scores and a rise in self-efficacy scale scores. Our study sheds light on the potential of leveraging LLMs for context-aware persuasion in other behavior change domains.2024RWRuolan Wu et al.Tsinghua UniversityHuman-LLM CollaborationMental Health Apps & Online Support CommunitiesPrivacy by Design & User ControlCHI
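MindShift's pipeline selects a persuasion strategy from the detected mental state and then has an LLM generate matching content. A minimal sketch of that selection-plus-prompting step is below; the abstract names the three states and four strategies but not their pairing, so the mapping and the prompt wording here are assumptions, not MindShift's actual design:

```python
# Hypothetical state-to-strategy mapping (the pairing is an assumption;
# "understanding" serves as the fallback strategy).
STRATEGY = {
    "boredom": "evoking",
    "stress":  "comforting",
    "inertia": "scaffolding habits",
}

def build_prompt(mental_state, app, goal):
    """Assemble an LLM prompt for personalized persuasive content,
    conditioned on the user's state, current app, and stated goal."""
    strategy = STRATEGY.get(mental_state, "understanding")
    return (f"Using the '{strategy}' persuasion strategy, write a short "
            f"persuasive message for a user currently on {app} whose "
            f"goal is: {goal}.")
```

The resulting string would be sent to an LLM at intervention time, so the generated message adapts to both the moment and the user's habits.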