Understanding Users' Perceptions and Expectations toward a Social Balloon Robot via an Exploratory Study. We are witnessing a new epoch in embodied social agents. Most prior work has focused on ground or desktop robots, which enjoy technical maturity and rich social channels but are often limited by terrain. Drones enable spatial mobility but currently face safety and proximity issues. This paper explores a social balloon robot as a viable alternative that combines these advantages while alleviating the limitations. To this end, we developed a hardware prototype named BalloonBot, which integrates a helium balloon with devices for social functioning, and conducted an exploratory lab study of users' perceptions and expectations about its demonstrated interactions and functions. Our results show promise in using such a robot as another form of socially embodied agent. We highlight its uniquely mobile and approachable character, which affords novel user experiences, and outline factors that should be considered before its broad application. (2025; Chongyang Wang et al.; Topics: Social Robot Interaction; UIST)
Exploring the Design of LLM-based Agent in Enhancing Self-disclosure Among the Older Adults. Social difficulties have become an increasingly serious issue among older adults, for whom regular self-disclosure is essential to maintaining mental health and building close relationships. Leveraging conversational agents to encourage self-disclosure in older adults has shown increasing potential, and understanding how LLM-based agents can influence and stimulate self-disclosure across different topics is crucial for designing future agents tailored to older users. This study introduces Disclosure-Agent, an LLM-based conversational agent, and examines its impact on self-disclosure in older adults through a user study involving 20 participants, 8 topics, and two interactive interfaces equipped with Disclosure-Agent. The findings provide valuable insights into how LLM-based agents can promote self-disclosure in older adults and offer design recommendations for future elderly-oriented conversational agents. (2025; Yijie Guo et al.; Tsinghua University, Academy of Arts and Design & Tsinghua University, The Future Laboratory; Topics: Agent Personality & Anthropomorphism, Human-LLM Collaboration; CHI)
Characterizing Developers' Linguistic Behaviors in Open Source Development across Their Social Statuses. Open Source Software (OSS) development has attracted numerous developers. As a typical complex sociotechnical system, an OSS project often forms a hierarchical social structure in which a few developers are elite while the rest are non-elite. Differences in social status may result in distinct language use in interpersonal communication, and characterizing such behaviors is critical for supporting efficient and effective communication among developers of different social statuses. This study empirically compared elite and non-elite developers' language behaviors in their communication. We compiled a corpus of ~216,000 discourses collected from 20 large projects on GitHub and investigated linguistic differences in three aspects: linguistic styles and characteristics, main concerns, and sentence patterns. Our findings reveal that elite and non-elite developers showed different linguistic patterns and had different concerns in their discourses, which also reflect variation in the main focuses of the development process. Furthermore, elite and non-elite developers exhibited noticeable linguistic patterns in accordance with their roles and corresponding divisions of labor in the production process, regardless of semantic context. These findings provide implications for supporting communication across social statuses in OSS development. (2024; Yisi Han et al.; Session 3b: Work, Non-Work, and Social Technologies; CSCW)
airTac: A Contactless Digital Tactile Receptor for Detecting Material and Roughness via Terahertz Sensing. Zhang et al. present airTac, a contactless digital tactile receptor that uses terahertz sensing to detect material and surface roughness, offering a new avenue for human-computer interaction. (2024; Zhan Zhang et al.; Topics: Mid-Air Haptics (Ultrasonic), Shape-Changing Interfaces & Soft Robotic Materials; UbiComp)
UHead: Driver Attention Monitoring System Using UWB Radar. Xu et al. present UHead, a system that uses ultra-wideband (UWB) radar to monitor a driver's attention state in real time and improve driving safety. (2024; Huadong Ma et al.; Topics: Human Pose & Activity Recognition; UbiComp)
AirECG: Contactless Electrocardiogram for Cardiac Disease Monitoring via mmWave Sensing and Cross-domain Diffusion Model. (2024; Anfu Zhou et al.; Topics: Mental Health Apps & Online Support Communities, Biosensors & Physiological Monitoring; UbiComp)
LlamaTouch: A Faithful and Scalable Testbed for Mobile UI Task Automation. Emerging large language/multimodal models facilitate the evolution of mobile agents, especially in mobile UI task automation. However, existing evaluation approaches, which rely on human validation or established datasets to compare agent-predicted actions with predefined action sequences, are unscalable and unfaithful. To overcome these limitations, this paper presents LlamaTouch, a testbed for on-device mobile UI task execution and faithful, scalable task evaluation. Observing that the task execution process only transfers UI states, LlamaTouch employs a novel evaluation approach that only assesses whether an agent traverses all manually annotated, essential application/system states. LlamaTouch comprises three key techniques: (1) on-device task execution that enables mobile agents to interact with realistic mobile environments; (2) fine-grained UI component annotation that merges pixel-level screenshots and textual screen hierarchies to explicitly identify and precisely annotate essential UI components with a rich set of designed annotation primitives; and (3) a multi-level application state matching algorithm that uses exact and fuzzy matching to accurately detect critical information on each screen, even with unpredictable UI layout/content dynamics. LlamaTouch currently incorporates four mobile agents and 496 tasks, encompassing tasks from widely used datasets as well as self-constructed ones that cover more diverse mobile applications. Evaluation results demonstrate LlamaTouch's high faithfulness of evaluation in real-world mobile environments and its better scalability than human validation. LlamaTouch also enables easy task annotation and integration of new mobile agents. Code and dataset are publicly available at https://github.com/LlamaTouch/LlamaTouch. (2024; Li Zhang et al.; Topics: Human-LLM Collaboration; UIST)
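The multi-level state matching idea (exact matching first, fuzzy matching as a fallback for dynamic screen content) can be illustrated with a small sketch. This is not LlamaTouch's actual implementation; the function name, threshold, and example strings are hypothetical:

```python
from difflib import SequenceMatcher

def screen_matches(observed_texts, annotated_texts, fuzzy_threshold=0.8):
    """Check whether every annotated essential UI text appears on the
    observed screen: exact match first, then fuzzy string similarity."""
    for target in annotated_texts:
        if target in observed_texts:      # level 1: exact match
            continue
        # level 2: fuzzy match against every text on the screen, to
        # tolerate dynamic UI content (timestamps, counters, etc.)
        best = max(
            (SequenceMatcher(None, target, seen).ratio() for seen in observed_texts),
            default=0.0,
        )
        if best < fuzzy_threshold:
            return False                  # an essential state component is missing
    return True

print(screen_matches(["Order placed!", "Total: $12.99"], ["Order placed"]))  # True
print(screen_matches(["Cart is empty"], ["Order placed"]))                   # False
```

Exact matching handles stable UI text cheaply, while the fuzzy fallback absorbs minor content drift without accepting a genuinely different screen.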
MindShift: Leveraging Large Language Models for Mental-States-Based Problematic Smartphone Use Intervention. Problematic smartphone use negatively affects physical and mental health. Despite a wide range of prior research, existing persuasive techniques are not flexible enough to provide dynamic persuasion content based on users' physical contexts and mental states. We first conducted a Wizard-of-Oz study (N=12) and an interview study (N=10) to summarize the mental states behind problematic smartphone use: boredom, stress, and inertia. This informed our design of four persuasion strategies: understanding, comforting, evoking, and scaffolding habits. We leveraged large language models (LLMs) to enable the automatic and dynamic generation of effective persuasion content, and developed MindShift, a novel LLM-powered problematic smartphone use intervention technique. MindShift takes users' in-the-moment app usage behaviors, physical contexts, mental states, and goals & habits as input, and generates personalized, dynamic persuasive content with appropriate persuasion strategies. We conducted a 5-week field experiment (N=25) comparing MindShift with a simplified version (removing mental states) and a baseline technique (fixed reminders). The results show that MindShift improves intervention acceptance rates by 4.7-22.5% and reduces smartphone usage duration by 7.4-9.8%. Moreover, users showed a significant drop in smartphone addiction scale scores and a rise in self-efficacy scale scores. Our study sheds light on the potential of leveraging LLMs for context-aware persuasion in other behavior change domains. (2024; Ruolan Wu et al.; Tsinghua University; Topics: Human-LLM Collaboration, Mental Health Apps & Online Support Communities, Privacy by Design & User Control; CHI)
Time2Stop: Adaptive and Explainable Human-AI Loop for Smartphone Overuse Intervention. Despite a rich history of investigating smartphone overuse intervention techniques, AI-based just-in-time adaptive intervention (JITAI) methods for overuse reduction are lacking. We develop Time2Stop, an intelligent, adaptive, and explainable JITAI system that leverages machine learning to identify optimal intervention timings, introduces interventions with transparent AI explanations, and collects user feedback to establish a human-AI loop and adapt the intervention model over time. We conducted an 8-week field experiment (N=71) to evaluate the effectiveness of both the adaptation and explanation aspects of Time2Stop. Our results indicate that our adaptive models significantly outperform baseline methods on intervention accuracy (>32.8% relative) and receptivity (>8.0%). Incorporating explanations further enhances effectiveness by 53.8% and 11.4% on accuracy and receptivity, respectively. Moreover, Time2Stop significantly reduces overuse, decreasing app visit frequency by 7.0∼8.9%. Our subjective data echoed these quantitative measures: participants preferred the adaptive interventions and rated the system highly on intervention time accuracy, effectiveness, and level of trust. We envision this work inspiring future research on JITAI systems with a human-AI loop that evolve with users. (2024; Adiba Orzikulova et al.; KAIST; Topics: Explainable AI (XAI), AI-Assisted Decision-Making & Automation, Notification & Interruption Management; CHI)
mmStress: Distilling Human Stress from Daily Activities via Contact-less Millimeter-wave Sensing. Long-term exposure to stress harms mental and even physical health, and stress monitoring is of increasing significance in the prevention, diagnosis, and management of mental illness and chronic disease. However, current stress monitoring methods are either burdensome or intrusive, which hinders their widespread use in practice. In this paper, we propose mmStress, a contact-less and non-intrusive solution that adopts a millimeter-wave radar to sense a subject's activities of daily living, from which it distills human stress. mmStress is built upon the psychologically validated relationship between human stress and "displacement activities": subjects under stress unconsciously perform fidgeting behaviors such as scratching, wandering around, and tapping a foot. Despite the conceptual simplicity, the key challenge in realizing mmStress lies in identifying and quantifying the latent displacement activities autonomously, as they are usually transitory, submerged in normal daily activities, and highly variable across subjects. To address these challenges, we custom-design a neural network that learns human activities at both macro and micro timescales and exploits the continuity of human activities to accurately extract features of abnormal displacement activities. Moreover, we address the imbalanced stress distribution by incorporating a post-hoc logit adjustment procedure during model training. We prototype, deploy, and evaluate mmStress in ten volunteers' apartments for over four weeks; the results show that mmStress achieves a promising accuracy of ~80% in classifying low, medium, and high stress. In particular, mmStress shows advantages under free human movement, advancing the state of the art, which has focused on stress monitoring in quasi-static scenarios. https://doi.org/10.1145/3610926 (2023; Kun Liang et al.; Topics: Human Pose & Activity Recognition, Sleep & Stress Monitoring, Biosensors & Physiological Monitoring; UbiComp)
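The post-hoc logit adjustment used by mmStress is a standard remedy for class imbalance: each class logit is offset by the (temperature-scaled) log of its class prior so that rare classes are no longer systematically under-predicted. A minimal sketch, with made-up priors and logits for the three stress levels (not the paper's actual values):

```python
import numpy as np

def logit_adjust(logits, class_priors, tau=1.0):
    """Post-hoc logit adjustment: subtract tau * log(prior) from each class
    logit, boosting the scores of under-represented classes."""
    return logits - tau * np.log(np.asarray(class_priors))

# Hypothetical skewed training distribution: 70% low, 20% medium, 10% high stress.
priors = [0.7, 0.2, 0.1]
logits = np.array([1.2, 1.0, 0.9])  # raw scores mildly favor the majority class

print(np.argmax(logits))                         # biased prediction: class 0
print(np.argmax(logit_adjust(logits, priors)))   # adjusted prediction: class 2
```

The same offset can equivalently be folded into the training loss rather than applied at inference time.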
Side-lobe Can Know More: Towards Simultaneous Communication and Sensing for mmWave. Thanks to its wide bandwidth, large antenna array, and short wavelength, millimeter wave (mmWave) offers superior performance in both communication and sensing, so integrating the two is a developing trend for the mmWave band. However, the directional transmission of mmWave limits the sensing scope to a narrow sector. Existing works coordinate sensing and communication in a time-division manner, using the sector-level sweep during the beam training interval for sensing and the data transmission interval for communication. Beam training is a low-frequency (e.g., 10 Hz), low-duty-cycle event, which makes it hard to track fast movement or perform continuous sensing. Such time-division designs imply a trade-off between sensing and communication, and it is hard to get the best of both worlds. In this paper, we try to resolve this dilemma by exploiting side lobes for sensing. We design Sidense, in which the transmitter's main lobe is directed towards the receiver while the side lobes sense ongoing activities in the surroundings. In this way, sensing and downlink communication work simultaneously and do not compete for hardware or radio resources. To compensate for the low antenna gain of side lobes, Sidense performs integration to boost the quality of sensing signals, and to handle the uneven side-lobe energy it adds a target separation scheme that tackles mutual interference in multi-target scenarios. We implement Sidense with a Sivers mmWave module. Results show that Sidense achieves millimeter-level motion tracking accuracy at 6 m, and we demonstrate a multi-person respiration monitoring application. As Sidense does not modify the communication procedure or the beamforming strategy, downlink communication performance is not sacrificed by concurrent sensing. We believe more fascinating applications can be built on this concurrent sensing and communication platform. https://dl.acm.org/doi/10.1145/3569498 (2023; Qian Yang et al.; Topics: V2X (Vehicle-to-Everything) Communication Design, Context-Aware Computing; UbiComp)
Midas: Generating mmWave Radar Data from Videos for Training Pervasive and Privacy-preserving Human Sensing Tasks. Millimeter wave radar is a promising sensing modality for enabling pervasive and privacy-preserving human sensing. However, the lack of large-scale radar datasets limits the potential of training deep learning models to achieve generalization and robustness. To close this gap, we design a software pipeline that leverages rich video repositories to generate synthetic radar data, which confronts key challenges including (i) multipath reflection and attenuation of radar signals among multiple humans, (ii) unconvertible generated data leading to poor generality across applications, and (iii) class imbalance in videos leading to low model stability. To this end, we design Midas to generate realistic, convertible radar data from videos via two components: (i) a data generation network (DG-Net) that combines several key modules (depth prediction, human mesh fitting, and a multi-human reflection model) to simulate the multipath reflection and attenuation of radar signals and output convertible coarse radar data, followed by a Transformer model that generates realistic radar data; and (ii) a variant Siamese network (VS-Net) that selects key video clips to eliminate data redundancy and address the class-imbalance issue. We implement and evaluate Midas with video data from various external sources and real-world radar data, demonstrating its great advantages over the state-of-the-art approach for both activity recognition and object detection tasks. https://dl.acm.org/doi/10.1145/3580872 (2023; Kaikai Deng et al.; Topics: Human Pose & Activity Recognition, Context-Aware Computing; UbiComp)
Combining Smart Speaker and Smart Meter to Infer Your Residential Power Usage by Self-supervised Cross-modal Learning. Energy disaggregation is a key enabling technology for residential power usage monitoring, which benefits applications such as carbon emission monitoring and human activity recognition. However, existing methods struggle to balance accuracy and usage burden (device costs, data labeling, and prior knowledge). As the high penetration of smart speakers offers a low-cost way for sound-assisted residential power usage monitoring, this work combines a smart speaker and a smart meter in a house to liberate the system from a high usage burden. It remains challenging, however, to extract and leverage the consistent/complementary information (two types of relationships between acoustic and power features) from acoustic and power data without data labeling or prior knowledge. To this end, we design COMFORT, a cross-modality system for self-supervised power usage monitoring, including (i) a cross-modality learning component to automatically learn the consistent and complementary information, and (ii) a cross-modality inference component to utilize that information. We implement and evaluate COMFORT with a self-collected dataset from six houses over 14 days, demonstrating that COMFORT finds the most appliances (98%), improves appliance recognition performance in F-measure by at least 41.1%, and reduces the Mean Absolute Error (MAE) of energy disaggregation by at least 30.4% over alternative solutions. https://doi.org/10.1145/3610905 (2023; Guanzhou Zhu et al.; Topics: Context-Aware Computing, Home Energy Management; UbiComp)
DEEP: 3D Gaze Pointing in Virtual Reality Leveraging Eyelid Movement. Gaze-based target pointing suffers from low input precision and target occlusion. In this paper, we explored leveraging continuous eyelid movement to support highly efficient, occlusion-robust dwell-based gaze pointing in virtual reality. We first conducted two user studies to examine users' eyelid movement patterns in both unintentional and intentional conditions. The results proved the feasibility of leveraging intentional eyelid movement, which is distinguishable from natural movement, for input. We also tested participants' dwelling patterns for targets of different sizes and locations. Based on these results, we propose DEEP, a novel technique that enables users to see through occlusions by controlling the aperture angle of their eyelids and to dwell to select targets with the help of a probabilistic input prediction model. Evaluation results showed that DEEP, incorporating dynamic depth and location selection, significantly outperformed its static variants as well as a naive dwelling baseline. Even for 100% occluded targets, it achieved an average selection speed of 2.5 s with an error rate of 2.3%. (2022; Xin Yi et al.; Topics: Eye Tracking & Gaze Interaction, Immersion & Presence Research; UIST)
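Probabilistic input prediction for dwell-based selection is commonly framed as a posterior over candidate targets given the dwelled gaze point. The sketch below assumes isotropic Gaussian gaze noise whose spread scales with target size; the function, target format, and scaling are illustrative assumptions, not DEEP's actual model:

```python
import math

def predict_target(gaze_xy, targets, sigma=1.0):
    """Score candidate targets by a Gaussian likelihood of the dwelled gaze
    point around each target center, normalized into a posterior.
    Each target is (center_x, center_y, radius)."""
    gx, gy = gaze_xy
    scores = []
    for cx, cy, radius in targets:
        s = sigma * radius               # larger targets tolerate more gaze jitter
        d2 = (gx - cx) ** 2 + (gy - cy) ** 2
        scores.append(math.exp(-d2 / (2 * s * s)))
    total = sum(scores)
    return [sc / total for sc in scores]

# Two nearby targets; the dwell point lies much closer to the second one.
posterior = predict_target((1.8, 0.0), [(0.0, 0.0, 1.0), (2.0, 0.0, 1.0)])
print(posterior.index(max(posterior)))  # -> 1
```

Such a model lets imprecise gaze resolve ambiguous, partially occluded targets by probability rather than by strict hit-testing.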
TypeOut: Leveraging Just-in-Time Self-Affirmation for Smartphone Overuse Reduction. Smartphone overuse is related to a variety of issues such as lack of sleep and anxiety. We explore the application of Self-Affirmation Theory to smartphone overuse intervention in a just-in-time manner. We present TypeOut, a just-in-time intervention technique that integrates two components: an in-situ typing-based unlock process to improve user engagement, and self-affirmation-based typing content to enhance effectiveness. We hypothesize that the integration of typing and self-affirmation content can better reduce smartphone overuse. We conducted a 10-week within-subject field experiment (N=54) and compared TypeOut against two baselines: one only showing the self-affirmation content (a common notification-based intervention), and one only requiring typing of non-semantic content (a state-of-the-art method). TypeOut reduces app usage by over 50%, and both app opening frequency and usage duration by over 25%, significantly outperforming both baselines. TypeOut can potentially be used in other domains where an intervention may benefit from integrating self-affirmation exercises with an engaging just-in-time mechanism. (2022; Xuhai Xu et al.; University of Washington; Topics: Mental Health Apps & Online Support Communities, Notification & Interruption Management; CHI)
MoveVR: Enabling Multiform Force Feedback in Virtual Reality using Household Cleaning Robot. Haptic feedback can significantly enhance the realism and immersiveness of virtual reality (VR) systems. In this paper, we propose MoveVR, a technique that enables realistic, multiform force feedback in VR by leveraging commonplace cleaning robots. MoveVR can generate tension, resistance, impact, and material rigidity force feedback with multiple levels of force intensity and direction, achieved by changing the robot's moving speed, rotation, and position as well as the carried proxies. We demonstrated the feasibility and effectiveness of MoveVR through interactive VR gaming. In our quantitative and qualitative evaluation studies, participants found that MoveVR provides a more realistic and enjoyable user experience than commercially available haptic solutions such as vibrotactile haptic systems. (2020; Yuntao Wang et al.; Tsinghua University & University of Washington; Topics: Force Feedback & Pseudo-Haptic Weight, Human-Robot Collaboration (HRC); CHI)
EarBuddy: Enabling On-Face Interaction via Wireless Earbuds. Past research on on-body interaction typically required custom sensors, limiting its scalability and generalizability. We propose EarBuddy, a real-time system that leverages the microphone in commercial wireless earbuds to detect tapping and sliding gestures near the face and ears. We developed a design space of 27 valid gestures and conducted a user study (N=16) to select the eight gestures optimal for both human preference and microphone detectability. We collected a dataset of those eight gestures (N=20) and trained deep learning models for gesture detection and classification; our optimized classifier achieved an accuracy of 95.3%. Finally, we conducted a user study (N=12) to evaluate EarBuddy's usability. Our results show that EarBuddy can facilitate novel interaction and that users feel very positively about the system. EarBuddy provides a new eyes-free, socially acceptable input method that is compatible with commercial wireless earbuds and has the potential for scalability and generalizability. (2020; Xuhai Xu et al.; University of Washington & Tsinghua University; Topics: Haptic Wearables, Foot & Wrist Interaction; CHI)
Tessutivo: Contextual Interactions on Interactive Fabrics with Inductive Sensing. We present Tessutivo, a contact-based inductive sensing technique for contextual interactions on interactive fabrics. Our technique recognizes conductive (mainly metallic) objects commonly found in households and workplaces, such as keys, coins, and electronic devices. We built a prototype containing six-by-six spiral-shaped coils made of conductive thread, sewn onto a four-layer fabric structure, and carefully designed the coil shape parameters to maximize sensitivity based on a new inductance approximation formula. Through a ten-participant study, we evaluated the performance of the proposed sensing technique across 27 common objects, yielding 93.9% real-time accuracy for object recognition. We conclude by presenting several applications that demonstrate the unique interactions enabled by our technique. (2019; Jun Gong et al.; Topics: Electronic Textiles (E-textiles), On-Skin Display & On-Skin Input; UIST)