EchoMind: Supporting Real-time Complex Problem Discussions through Human-AI Collaborative Facilitation
Teams often engage in group discussions to leverage collective intelligence when solving complex problems. However, in real-time discussions, such as face-to-face meetings, participants frequently struggle with managing diverse perspectives and structuring content, which can lead to unproductive outcomes such as forgotten points and off-topic conversations. Through a formative study, we explore a human-AI collaborative facilitation approach, in which AI assists in establishing a shared knowledge framework that provides a guiding foundation. We present EchoMind, a system that visualizes discussion knowledge through real-time issue mapping. EchoMind empowers participants to maintain focus on specific issues, review key ideas or thoughts, and collaboratively expand the discussion. The system leverages large language models (LLMs) to dynamically organize dialogues into nodes based on the current context recorded on the map. Our user study with four teams (N=16) reveals that EchoMind helps clarify discussion objectives, trace knowledge pathways, and enhance overall productivity. We also discuss the design implications for human-AI collaborative facilitation and the potential of shared knowledge visualization to transform group dynamics in future collaborations.
2025 · Weihao Chen et al. · Human-AI (and Robot!) Collaboration · CSCW
Understanding Users' Perceptions and Expectations toward a Social Balloon Robot via an Exploratory Study
We are witnessing a new epoch in embodied social agents. Most prior work has focused on ground or desktop robots, which enjoy technical maturity and rich social channels but are often limited by terrain. Drones, which enable spatial mobility, currently face issues with safety and proximity. This paper explores a social balloon robot as a viable alternative that combines these advantages and alleviates their limitations. To this end, we developed a hardware prototype named BalloonBot that integrates various devices for social functioning with a helium balloon. We conducted an exploratory lab study on users' perceptions and expectations about its demonstrated interactions and functions. Our results show promise in using such a robot as another form of socially embodied agent. We highlight its uniquely mobile and approachable characteristics, which afford novel user experiences, and outline factors that should be considered before its broad application.
2025 · Chongyang Wang et al. · Social Robot Interaction · UIST
InterQuest: A Mixed-Initiative Framework for Dynamic User Interest Modeling in Conversational Search
In online information-seeking tasks (e.g., for products and restaurants), users seek information that aligns with their individual preferences to make informed decisions. However, existing systems often struggle to infer users' implicit interests—unstated yet essential preference factors that directly impact decision quality. Our formative study reveals that User-Centric Knowledge—cross-task persistent preference attributes of users (e.g., "user cares about functionality details for electronics")—serves as a key indicator for resolving users' implicit interests. However, constructing such knowledge from task-specific data alone is insufficient due to three types of uncertainties—cold-start limitation, content accuracy, and scope applicability—which require user-provided information for knowledge alignment. Based on these insights, we present InterQuest, an LLM-based conversational search agent that dynamically models user interests. InterQuest combines two strategies: (1) Dynamic User Knowledge Modeling, which infers and adjusts the content and scope of User-Centric Knowledge, and (2) Uncertainty-Driven Questioning, where InterQuest proactively asks questions to resolve knowledge uncertainties. A user study with 18 participants demonstrates that InterQuest outperforms the baselines in user interest inference, accuracy of user knowledge modeling, and the overall information-seeking experience. Additionally, our findings provide valuable design implications for improving mixed-initiative user modeling in future systems.
2025 · Yu Mei et al. · Human-LLM Collaboration · Recommender System UX · Algorithmic Fairness & Bias · UIST
From Operation to Cognition: Automatic Modeling Cognitive Dependencies from User Demonstrations for GUI Task Automation
Traditional Programming by Demonstration (PBD) systems primarily automate tasks by recording and replaying operations on Graphical User Interfaces (GUIs), without fully considering the cognitive processes behind those operations. This limits their ability to generalize tasks with interdependent operations to new contexts (e.g., collecting and summarizing introductions that depend on different search keywords from varied websites). We propose TaskMind, a system that automatically identifies the semantics of operations and the cognitive dependencies between operations from demonstrations, building a user-interpretable task graph. Users modify this graph to define new task goals, and TaskMind executes the graph to dynamically generalize new parameters for operations, with the integration of Large Language Models (LLMs). We compared TaskMind with a baseline end-to-end LLM that automates tasks from demonstrations and natural language commands without a task graph. In studies with 20 participants on both predefined and customized tasks, TaskMind significantly outperformed the baseline in both success rate and controllability.
2025 · Yiwen Yin et al. · Tsinghua University, Department of Computer Science and Technology · Human-LLM Collaboration · AI-Assisted Decision-Making & Automation · CHI
Palmpad: Enabling Real-Time Index-to-Palm Touch Interaction with a Single RGB Camera
Index-to-palm interaction plays a crucial role in Mixed Reality (MR) interactions. However, achieving a satisfactory inter-hand interaction experience is challenging with existing vision-based hand tracking technologies, especially in scenarios where only a single camera is available. Therefore, we introduce Palmpad, a novel sensing method utilizing a single RGB camera to detect the touch of an index finger on the opposite palm. Our exploration reveals that incorporating optical flow techniques to extract motion information between consecutive frames for the index finger and palm leads to a significant improvement in touch status determination. With this approach, our CNN model achieves 97.0% recognition accuracy and a 96.1% F1 score. In a usability evaluation, we compared Palmpad with Quest's built-in hand gesture algorithms. Palmpad not only delivers superior accuracy (95.3%) but also reduces operational demands and significantly improves users' willingness and confidence. Palmpad aims to enhance accurate touch detection for lightweight MR devices.
2025 · Zhe He et al. · Tsinghua University, Department of Computer Science and Technology · Hand Gesture Recognition · Full-Body Interaction & Embodied Input · Mixed Reality Workspaces · CHI
WritingRing: Enabling Natural Handwriting Input with a Single IMU Ring
Tracking continuous 2D sequential handwriting trajectories accurately using a single IMU ring is extremely challenging due to the significant displacement between the IMU's wearing position and the location of the tracked fingertip. We propose WritingRing, a system that uses a single IMU ring worn at the base of the finger to support natural handwriting input and provide real-time 2D trajectories. To achieve this, we first built a handwriting dataset using a touchpad and an IMU ring (N=20). Next, we improved the LSTM model by incorporating streaming input and a TCN network, significantly enhancing accuracy and computational efficiency, and achieving an average trajectory accuracy of 1.63 mm. Real-time usability studies demonstrated that the system achieved 88.7% letter recognition accuracy and 68.2% word recognition accuracy, which reached 84.36% when restricting the output to words within a vocabulary of size 3000. WritingRing can also be embedded into existing ring systems, providing a natural and real-time solution for various applications.
2025 · Zhe He et al. · Tsinghua University, Department of Computer Science and Technology · Electrical Muscle Stimulation (EMS) · Hand Gesture Recognition · Foot & Wrist Interaction · CHI
AutoPBL: An LLM-powered Platform to Guide and Support Individual Learners Through Self Project-based Learning
Self project-based learning (SPBL) is a popular learning style where learners follow tutorials and build projects by themselves. SPBL combines project-based learning's benefit of being engaging and effective with the flexibility of self-learning. However, insufficient guidance and support during SPBL may lead to unsatisfactory learning experiences and outcomes. While LLM chatbots (e.g., ChatGPT) could potentially serve as SPBL tutors, we have yet to see an SPBL platform with responsible and systematic LLM integration. To address this gap, we present AutoPBL, an interactive learning platform for SPBL learners. We examined human PBL tutors' roles through formative interviews to inform our design. AutoPBL features an LLM-guided learning process with checkpoint questions and in-context Q&A. In a user study where 29 beginners learned machine learning through entry-level projects, we found that AutoPBL effectively improves learning outcomes and elicits better learning behavior and metacognition by clarifying current priorities and providing timely assistance.
2025 · Yihao Zhu et al. · Tsinghua University, Department of Computer Science and Technology · Human-LLM Collaboration · Programming Education & Computational Thinking · Intelligent Tutoring Systems & Learning Analytics · CHI
Enhancing Smartphone Eye Tracking with Cursor-Based Interactive Implicit Calibration
The limited accuracy of eye-tracking on smartphones restricts its use. Existing RGB-camera-based eye-tracking relies on extensive datasets, which could be enhanced by continuous fine-tuning using calibration data implicitly collected from the interaction. In this context, we propose COMETIC (Cursor Operation Mediated Eye-Tracking Implicit Calibration), which introduces a cursor-based interaction and utilizes the inherent correlation between cursor and eye movement. By filtering valid cursor coordinates as proxies for the ground truth of gaze and fine-tuning the eye-tracking model with corresponding images, COMETIC enhances accuracy during the interaction. Both filtering and fine-tuning use pre-trained models and could be facilitated using personalized, dynamically updated data. Results show COMETIC achieves an average eye-tracking error of 278.3 px (1.60 cm, 2.29°), representing a 27.2% improvement compared to that without fine-tuning. We found that filtering cursor points whose actual distance to gaze is 150.0 px (0.86 cm) yields the best eye-tracking results.
2025 · Chang Liu et al. · Tsinghua University, Department of Computer Science and Technology · Eye Tracking & Gaze Interaction · Human-LLM Collaboration · Visualization Perception & Cognition · CHI
Investigating Context-Aware Collaborative Text Entry on Smartphones using Large Language Models
Text entry is a fundamental and ubiquitous task, but users often face challenges such as situational impairments or difficulties in sentence formulation. Motivated by this, we explore the potential of large language models (LLMs) to assist with text entry in real-world contexts. We propose a collaborative smartphone-based text entry system, CATIA, that leverages LLMs to provide text suggestions based on contextual factors, including screen content, time, location, activity, and more. In a 7-day in-the-wild study with 36 participants, the system offered appropriate text suggestions in over 80% of cases. Users exhibited different collaborative behaviors depending on whether they were composing text for interpersonal communication or information services. Additionally, the relevance of contextual factors beyond screen content varied across scenarios. We identified two distinct mental models: AI as a supportive facilitator or as a more equal collaborator. These findings outline the design space for human-AI collaborative text entry on smartphones.
2025 · Weihao Chen et al. · Tsinghua University, Department of Computer Science and Technology · Voice User Interface (VUI) Design · Human-LLM Collaboration · Context-Aware Computing · CHI
UbiPhysio: Support Daily Functioning, Fitness, and Rehabilitation with Action Understanding and Feedback in Natural Language
Wang et al. develop UbiPhysio, a system that supports users' daily functional exercise, fitness, and rehabilitation training through action understanding and natural-language feedback.
2024 · Chongyang Wang et al. · Vibrotactile Feedback & Skin Stimulation · Full-Body Interaction & Embodied Input · UbiComp
EasyAsk: An In-App Contextual Tutorial Search Assistant for Older Adults with Voice and Touch Inputs
Gao et al. design EasyAsk, an in-app contextual tutorial search assistant that helps older adults quickly find the tutorials they need within an app through voice and touch inputs.
2024 · Weiwei Gao et al. · Voice User Interface (VUI) Design · Aging-Friendly Technology Design · UbiComp
G-VOILA: Gaze-Facilitated Information Querying in Daily Scenarios
Wang et al. propose G-VOILA, a system that leverages gaze tracking to facilitate information-querying interactions in daily scenarios.
2024 · Zeyu Wang et al. · Eye Tracking & Gaze Interaction · UbiComp
ContextCam: Bridging Context Awareness with Creative Human-AI Image Co-Creation
The rapid advancement of AI-generated content (AIGC) promises to transform various aspects of human life significantly. This work focuses on the potential of AIGC to revolutionize image creation, such as photography and self-expression. We introduce ContextCam, a novel human-AI image co-creation system that integrates context awareness with mainstream AIGC technologies like Stable Diffusion. ContextCam provides inspiration for the user's image creation process by extracting relevant contextual data, and leverages Large Language Model (LLM)-based multi-agents to co-create images with the user. A study with 16 participants and 136 scenarios revealed that ContextCam was well-received, showcasing personalized and diverse outputs as well as interesting user behavior patterns. Participants provided positive feedback on their engagement and enjoyment when using ContextCam, and acknowledged its ability to inspire creativity.
2024 · Xianzhe Fan et al. · Tsinghua University · Generative AI (Text, Image, Music, Video) · Creative Collaboration & Feedback Systems · Photography & Image Processing · CHI
MindShift: Leveraging Large Language Models for Mental-States-Based Problematic Smartphone Use Intervention
Problematic smartphone use negatively affects physical and mental health. Despite the wide range of prior research, existing persuasive techniques are not flexible enough to provide dynamic persuasion content based on users' physical contexts and mental states. We first conducted a Wizard-of-Oz study (N=12) and an interview study (N=10) to summarize the mental states behind problematic smartphone use: boredom, stress, and inertia. This informed our design of four persuasion strategies: understanding, comforting, evoking, and scaffolding habits. We leveraged large language models (LLMs) to enable the automatic and dynamic generation of effective persuasion content. We developed MindShift, a novel LLM-powered problematic smartphone use intervention technique. MindShift takes users' in-the-moment app usage behaviors, physical contexts, mental states, goals & habits as input, and generates personalized and dynamic persuasive content with appropriate persuasion strategies. We conducted a 5-week field experiment (N=25) to compare MindShift with its simplified version (removing mental states) and a baseline technique (fixed reminders). The results show that MindShift improves intervention acceptance rates by 4.7-22.5% and reduces smartphone usage duration by 7.4-9.8%. Moreover, users showed a significant drop in smartphone addiction scale scores and a rise in self-efficacy scale scores. Our study sheds light on the potential of leveraging LLMs for context-aware persuasion in other behavior change domains.
2024 · Ruolan Wu et al. · Tsinghua University · Human-LLM Collaboration · Mental Health Apps & Online Support Communities · Privacy by Design & User Control · CHI
PepperPose: Full-Body Pose Estimation with a Companion Robot
Accurate full-body pose estimation across diverse actions in a user-friendly and location-agnostic manner paves the way for interactive applications in realms like sports, fitness, and healthcare. This task becomes challenging in real-world scenarios due to factors like the user's dynamic positioning, the diversity of actions, and the varying acceptability of the pose-capturing system. In this context, we present PepperPose, a novel companion robot system tailored for optimized pose estimation. Unlike traditional methods, PepperPose actively tracks the user and refines its viewpoint, facilitating enhanced pose accuracy across different locations and actions. This allows users to enjoy a seamless action-sensing experience. Our evaluation, involving 30 participants undertaking daily functioning and exercise actions in a home-like space, underscores the robot's promising capabilities. Moreover, we demonstrate the opportunities that PepperPose presents for human-robot interaction, its current limitations, and future developments.
2024 · Chongyang Wang et al. · Tsinghua University · Human Pose & Activity Recognition · Human-Robot Collaboration (HRC) · CHI
Exploring Experience Gaps Between Active and Passive Users During Multi-user Locomotion in VR
Multi-user locomotion in VR has grown increasingly common, posing numerous challenges. A key factor contributing to these challenges is the gap in experience between active and passive users during co-locomotion. Yet, there remains a limited understanding of how and to what extent these experiential gaps manifest in diverse multi-user co-locomotion scenarios. This paper systematically explores the gaps in physiological and psychological experience indicators between active and passive users across various locomotion situations. Such situations include when active users walk, fly by joystick, or teleport, while passive users stand still or look around. We also assess the impact of factors such as sub-locomotion type, speed/teleport-interval, and motion sickness susceptibility. Accordingly, we delineate acceptability disparities between active and passive users, offering insights into leveraging notable experimental findings to mitigate discomfort during co-locomotion through avoidance or intervention.
2024 · Tianren Luo et al. · Institute of Software, College of Computer Science and Technology · Social & Collaborative VR · Immersion & Presence Research · CHI
DRG-Keyboard: Enabling Subtle Gesture Typing on the Fingertip with Dual IMU Rings
We present DRG-Keyboard, a gesture keyboard enabled by dual IMU rings, allowing the user to swipe the thumb on the index fingertip to perform word gesture typing as if typing on a miniature QWERTY keyboard. With dual IMUs attached to the user's thumb and index finger, DRG-Keyboard can 1) measure the relative attitude while mapping it to 2D fingertip coordinates and 2) detect the thumb's touch-down and touch-up events by combining the relative attitude data and the synchronous frequency domain data, based on which a fingertip gesture keyboard can be implemented. To understand users' typing behavior on the index fingertip with DRG-Keyboard, we collected and analyzed user data in two typing manners. Based on the statistics of the gesture data, we enhanced the elastic matching algorithm with rigid pruning and distance measurement transform. The user study showed DRG-Keyboard achieved an input speed of 12.9 WPM (68.3% of participants' gesture typing speed on the smartphone). A follow-up study also demonstrated the superiority of DRG-Keyboard in its better form factor and wider usage scenarios. In sum, DRG-Keyboard not only achieves a good text entry speed on a tiny fingertip input surface, but is also well accepted by the participants for its input subtleness, accuracy, good haptic feedback, and availability. https://doi.org/10.1145/3569463
2023 · Chen Liang et al. · Vibrotactile Feedback & Skin Stimulation · Haptic Wearables · Hand Gesture Recognition · UbiComp
From 2D to 3D: Facilitating Single-Finger Mid-Air Typing on QWERTY Keyboards with Probabilistic Touch Modeling
Mid-air text entry on virtual keyboards suffers from the lack of tactile feedback, which brings challenges to both tap detection and input prediction. In this paper, we explored the feasibility of single-finger typing on virtual QWERTY keyboards in mid-air. We first conducted a study to examine users' 3D typing behavior on different sizes of virtual keyboards. Results showed that the participants perceived the vertical projection of the lowest point on the keyboard during a tap as the target location, and that inferring taps based on the intersection between the finger and the keyboard was not applicable. Aiming at this challenge, we derived a novel input prediction algorithm that incorporated the uncertainty of tap detection into the computation as a probability, and performed probabilistic decoding that could tolerate false detection. We analyzed the performance of the algorithm through a full-factorial simulation. Results showed that the SVM-based probabilistic touch detection together with a 2D elastic probabilistic decoding algorithm (elasticity = 2) could achieve an optimal top-5 accuracy of 94.2%. In the evaluation user study, the participants reached a single-finger typing speed of 26.1 WPM with a 3.2% uncorrected word-level error rate, which was significantly better than both tap-based and gesture-based baseline techniques. The proposed technique also received the highest preference score from the users, proving its usability in real text entry tasks. https://dl.acm.org/doi/10.1145/3580829
2023 · Xin Yi et al. · Mid-Air Haptics (Ultrasonic) · Hand Gesture Recognition · Voice User Interface (VUI) Design · UbiComp
ShadowTouch: Enabling Free-Form Touch-Based Hand-to-Surface Interaction with Wrist-Mounted Illuminant by Shadow Projection
We present ShadowTouch, a novel sensing method that recognizes the subtle hand-to-surface touch state of individual fingers using an optical auxiliary. ShadowTouch mounts a forward-facing light source on the user's wrist to cast shadows on the surface in front of the fingers when the corresponding fingers are close to the surface. With such an optical design, the subtle vertical movements of near-surface fingers are magnified and turned into shadow features cast on the surface, which are recognizable by computer vision algorithms. To efficiently recognize the touch state of each finger, we devised a two-stage CNN-based algorithm that first extracts all the fingertip regions from each frame and then classifies the touch state of each region from the cropped consecutive frames. Evaluations showed our touch state detection algorithm achieved a recognition accuracy of 99.1% and an F1 score of 96.8% in the leave-one-out cross-user evaluation setting. We further outlined the hand-to-surface interaction space enabled by ShadowTouch's sensing capability from the aspects of touch-based interaction, stroke-based interaction, and out-of-surface information, and developed four application prototypes to showcase ShadowTouch's interaction potential. A usability evaluation study showed the advantages of ShadowTouch over threshold-based techniques in terms of lower mental demand, effort, and frustration, as well as higher willingness to use, ease of use, integrity, and confidence.
2023 · Chen Liang et al. · Foot & Wrist Interaction · Context-Aware Computing · UIST
Sustainflatable: Harvesting, Storing and Utilizing Ambient Energy for Pneumatic Morphing Interfaces
While the majority of pneumatic interfaces are powered and controlled by traditional electric pumps and valves, alternative sustainable energy-harnessing technology has been attracting attention. This paper presents a novel solution to this challenge: the Sustainflatable system, a self-sustaining pneumatic system that can harvest renewable energy sources such as wind, water flow, moisture, and sunlight, convert the energy into compressed air, and store it for later use in a programmable and intelligent way. The system is completely electronics-free, incorporating customized energy-harvesting pumps, storage units with variable volume-pressure characteristics, and tailored valves that operate autonomously. Additionally, the paper provides a design tool to guide the development of the system and includes several environmental applications to showcase its capabilities.
2023 · Qiuyu Lu et al. · Shape-Changing Interfaces & Soft Robotic Materials · Ecological Design & Green Computing · UIST