ThingMoji: User-Captured Cut-Outs for In-Stream Visual Communication
Live streaming has become increasingly popular, driven by the desire for direct and real-time interactions between streamers and viewers. However, current text-based interactions and pre-defined emojis limit expressiveness, especially when referring to specific stream moments. We propose ThingMoji, a type of user-captured cut-out designed to enhance user expression and foster more effective communication between streamers and their audience. ThingMojis are unique digital icons that users create by capturing snapshots and annotating specific areas at any point during the stream. We developed StreamThing, a live-streaming platform integrated with ThingMojis, to explore their use in object-focused live-streaming contexts. A user study with three in-the-wild deployments reveals expressive use of ThingMojis in diverse live-streaming scenarios with rich visual content. Our findings show that ThingMojis enable viewers to reference specific objects, express emotions, and create shared visual narratives. Streamers found ThingMojis valuable for facilitating on-the-fly communication around visual content and fostering playful interactions. The study also uncovered challenges in ThingMoji comprehension, issues with long-term use of ThingMojis, and potential concerns regarding misuse. Based on these insights, we discuss new opportunities for supporting object-focused communication in live-streaming environments.
2025 · Erzhen Hu et al. · Online Interactions with Friends and Strangers · CSCW
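The core ThingMoji mechanic is capturing a stream frame and turning a user-annotated region into a small reusable icon. Below is a minimal sketch of that snapshot-and-crop step, assuming a rectangular annotation; the helper name and the Pillow-based approach are our illustrative assumptions, not StreamThing's actual implementation.

```python
from PIL import Image

def make_cutout(snapshot_path: str, box: tuple[int, int, int, int],
                out_path: str, icon_size: int = 64) -> None:
    """Crop an annotated region from a stream snapshot and shrink it to an
    emoji-sized icon. Hypothetical helper, not the paper's API."""
    frame = Image.open(snapshot_path).convert("RGBA")
    cutout = frame.crop(box)                  # (left, upper, right, lower)
    cutout.thumbnail((icon_size, icon_size))  # shrink, preserving aspect ratio
    cutout.save(out_path)

# Example: turn a user-selected region of a captured frame into an icon.
# make_cutout("snapshot.png", (120, 80, 360, 320), "thingmoji.png")
```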
Thing2Reality: Enabling Spontaneous Creation of 3D Objects from 2D Content using Generative AI in XR Meetings
During remote communication, participants often share both digital and physical content, such as product designs, digital assets, and environments, to enhance mutual understanding. Recent advances in augmented communication have enabled users to swiftly create digital 2D copies of physical objects from video feeds and share them in a shared space. However, conventional 2D representations of digital objects limit spatial referencing in immersive environments. To address this, we propose Thing2Reality, an Extended Reality (XR) meeting platform that facilitates spontaneous discussion of both digital and physical items during remote sessions. With Thing2Reality, users can quickly materialize ideas or objects in immersive environments and share them as conditioned multiview renderings or 3D Gaussians. Thing2Reality enables users to interact with remote objects or discuss concepts collaboratively. Our user studies revealed that the ability to interact with and manipulate 3D representations of objects significantly enhances the efficiency of discussions, with the potential to augment discussion of 2D artifacts.
2025 · Erzhen Hu et al. · Social & Collaborative VR · Identity & Avatars in XR · Generative AI (Text, Image, Music, Video) · UIST
DialogLab: Authoring, Simulating, and Testing Dynamic Human-AI Group Conversations
Designing compelling multi-party conversations involving both humans and AI agents presents significant challenges, particularly in balancing scripted structure with emergent, human-like interactions. We introduce DialogLab, a prototyping toolkit for authoring, simulating, and testing hybrid human-AI dialogues. DialogLab provides a unified interface to configure conversational scenes, define agent personas, manage group structures, specify turn-taking rules, and orchestrate transitions between scripted narratives and improvisation. Crucially, DialogLab allows designers to introduce controlled deviations from the script, via configurable agents that emulate human unpredictability, to systematically probe how conversations adapt and recover. DialogLab facilitates rapid iteration and evaluation of complex, dynamic multi-party human-AI dialogues. An evaluation with both end users and domain experts demonstrates that DialogLab supports efficient iteration and structured verification, with applications in training, rehearsal, and research on social dynamics. Our findings show the value of integrating real-time, human-in-the-loop improvisation with structured scripting to support more realistic and adaptable multi-party conversation design.
2025 · Erzhen Hu et al. · Conversational Chatbots · Human-LLM Collaboration · UIST
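The abstract enumerates what a scene configuration must cover: personas, group structure, turn-taking rules, and a script agents may deviate from. A minimal data model along those lines might look like the sketch below; every class and field name is an illustrative assumption, not DialogLab's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class AgentPersona:
    name: str
    role: str                   # e.g., "facilitator", "skeptic"
    improvisation: float = 0.0  # 0 = fully scripted, 1 = fully improvised

@dataclass
class Scene:
    title: str
    agents: list[AgentPersona]
    turn_order: list[str]       # a simple round-robin turn-taking rule
    script: list[tuple[str, str]] = field(default_factory=list)  # (speaker, line)

scene = Scene(
    title="Project kickoff rehearsal",
    agents=[AgentPersona("Ava", "facilitator"),
            AgentPersona("Ben", "skeptic", improvisation=0.4)],
    turn_order=["Ava", "Ben"],
    script=[("Ava", "Welcome, everyone."), ("Ben", "I have concerns.")],
)
```

A nonzero improvisation value is one way to express the paper's "controlled deviations from the script" as a per-agent knob.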
AR-Based Embodied Avatar Assistance for Nonspeaking Autistic People: Design and Feasibility Study
Many nonspeaking autistic individuals rely on Communication and Regulation Partners (CRPs) to develop spelling-based communication using physical letterboards, but this support is often geographically inaccessible. We developed a remote presence system using Augmented Reality (AR) to enable immersive, collaborative spelling instruction. The system features holographic letterboards and fully embodied avatars with real-time head and hand tracking, allowing remote interaction between students and CRPs. In a study with 18 nonspeaking autistic participants, 15 (83%) successfully completed avatar-supported sessions. Interaction was higher in the avatar condition, and participants reported preferring it over voice-only support. These findings demonstrate the feasibility of avatar-based AR telepresence for remote communication training. The system provides a demonstration of AR-supported interaction designed with nonspeaking autistic individuals, an underrepresented group in HCI, and offers design insights for inclusive telepresence technologies that address geographic and accessibility barriers.
2025 · Travis Dow et al. · Identity & Avatars in XR · Augmentative & Alternative Communication (AAC) · DIS
Grab-and-Release Spelling in XR: A Feasibility Study for Nonspeaking Autistic People Using Video-Passthrough Devices
This paper explores the feasibility of using video-passthrough Extended Reality (XR) devices to support communication in nonspeaking autistic individuals. Prior XR work relied on expensive AR headsets and near-hand tapping interactions. We present LetterBox, a novel application for video-passthrough XR headsets (e.g., Meta Quest) that enables spelling via a "grab-snap-release" interaction. The app includes three immersion levels and a dynamic pass-through window to maintain caregiver presence. We conducted a study with 19 participants across four North American sites. All completed a multiphase spelling task and answered open-ended questions. Despite tolerability concerns, all participants wore the headset throughout; only one requested a break. The average spelling accuracy was 90.91%. In open spelling, 14 participants responded, often independently. Reaction time and interaction speed data highlighted the impact of visual complexity, offering insights for reducing errors. These findings suggest video-passthrough XR is well tolerated and that grab-snap-release interactions may benefit users with motor challenges.
2025 · Lorans Alabood et al. · Identity & Avatars in XR · Special Education Technology · DIS
Infrastructures for Inspiration: The Routine of Creative Identity Through Inspiration on the Creative Internet
Online, visual artists have more places than ever to routinely share their creative work and connect with other artists. These interactions support the routine enactment of creative identity and provide artists with opportunities for inspiration. As creative work shifts online, interactions between artists, and the routines through which they get inspired to do creative work, are mediated by the logics of the online platforms where they take place. In an interview study of 22 artists, this paper explores the interplay between the development of artists' creative identities and their, at times, contradictory practices around getting inspired. We find that platforms support the disciplined practice of creative work while enabling spontaneous moments of inspiration, play an increasing role in passive approaches to searching for inspiration, and foster numerous small community spaces in which artists negotiate their creative identities. We discuss how platforms can better support inspiration by embedding mechanisms for it into their infrastructures, design, and platform policy.
2025 · Ellen Simpson et al. · University of Virginia, School of Data Science · Creative Collaboration & Feedback Systems · Knowledge Worker Tools & Workflows · CHI
Understanding Attitudes and Trust of Generative AI Chatbots for Social Anxiety Support
Social anxiety (SA) has become increasingly prevalent, and traditional coping strategies often face accessibility challenges. Generative AI (GenAI) chatbots, known for their knowledgeable and conversational capabilities, are emerging as alternative tools for mental well-being. With the increased integration of GenAI, it is important to examine individuals' attitudes toward, and trust in, GenAI chatbots' support for SA. Through a mixed-method approach involving surveys (n = 159) and interviews (n = 17), we found that individuals with severe symptoms tended to trust and embrace GenAI chatbots more readily, valuing their non-judgmental support and perceived emotional comprehension, whereas those with milder symptoms prioritized technical reliability. We identified factors influencing trust, such as GenAI chatbots' ability to generate empathetic responses and their context-sensitive limitations, which were particularly important to individuals with SA. We also discuss design implications and practical considerations for using GenAI chatbots to foster cognitive and emotional trust.
2025 · Yimeng Wang et al. · William & Mary · Conversational Chatbots · Human-LLM Collaboration · Mental Health Apps & Online Support Communities · CHI
Your Hands Can Tell: Detecting Redirected Hand Movements in Virtual Reality
In-air hand interactions are prevalent in Virtual Reality (VR), and prior studies have shown that manipulating the visual movement of the hand to differ from the actual hand movement, i.e., hand redirection, can create a more immersive and engaging VR experience. However, this manipulation risks degrading task performance and, if maliciously applied, poses a threat to user safety. Such manipulations may be introduced intentionally or inadvertently by VR applications, with potentially harmful outcomes. We advocate for a user's prerogative to be informed of any such potential manipulations before using an application. To address this, our study introduces an autoencoder-based anomaly detection technique that leverages users' inherent hand movements to identify hand redirection, thereby preserving the integrity of application use. Our model is trained on regular (i.e., non-manipulated) hand movement patterns and employs a stochastic thresholding approach for anomaly detection. We validated our method through a technical evaluation involving 21 participants engaged in reaching tasks under manipulated and non-manipulated scenarios. The results demonstrated a high hand-redirection detection accuracy of 93.7%, with an F1-score of 93.9%.
2025 · Md Aashikur Rahman Azim et al. · University of Virginia, Department of Computer Science · Vibrotactile Feedback & Skin Stimulation · Hand Gesture Recognition · Full-Body Interaction & Embodied Input · CHI
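The detection recipe the abstract describes is a standard one: train an autoencoder on non-manipulated movement, then flag segments whose reconstruction error is anomalous. The sketch below illustrates that recipe on synthetic data, using a fixed mean-plus-3-sigma cutoff as a simplification of the paper's stochastic thresholding; the network size and the flattened-trajectory feature layout are assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
# Stand-in data: each row is a flattened hand-movement trajectory segment.
normal = rng.normal(0.0, 1.0, size=(500, 30))

# A small MLP trained to reconstruct its input acts as an autoencoder.
ae = MLPRegressor(hidden_layer_sizes=(16, 8, 16), max_iter=2000, random_state=0)
ae.fit(normal, normal)

train_err = np.mean((ae.predict(normal) - normal) ** 2, axis=1)
# Fixed cutoff; the paper instead uses a stochastic thresholding approach.
threshold = train_err.mean() + 3 * train_err.std()

def is_redirected(segment: np.ndarray) -> bool:
    """Flag a trajectory segment whose reconstruction error is anomalous."""
    err = np.mean((ae.predict(segment[None, :]) - segment) ** 2)
    return err > threshold
```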
CommSense: A Wearable-Based Computational Framework for Evaluating Patient-Clinician Interactions
Quality patient-provider communication is critical to improving clinical care and patient outcomes. While progress has been made in communication skills training for clinicians, significant gaps remain in how to best monitor, measure, and evaluate the implementation of communication skills in actual clinical settings. Advancements in ubiquitous technology and natural language processing make it possible to realize more objective, real-time assessment of clinical interactions and, in turn, provide more timely feedback to clinicians about their communication effectiveness. In this paper, we propose CommSense, a computational sensing framework that combines smartwatch audio and transcripts with natural language processing methods to measure selected "best-practice" communication metrics in the context of palliative care interactions, including understanding, empathy, presence, emotion, and clarity. We conducted a pilot study involving N=40 clinician participants to test the technical feasibility and acceptability of CommSense in a simulated clinical setting. Our findings demonstrate that CommSense effectively captures most communication metrics and is well received by both practicing clinicians and student trainees. Our study also highlights the potential for digital technology to enhance communication skills training for healthcare providers and students, ultimately resulting in more equitable delivery of healthcare and accessible, lower-cost training tools with the potential to improve patient outcomes.
2024 · Zhiyuan Wang et al. · Session 4b: Patient-Centered Care and Youth Empowerment · CSCW
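To make the idea of transcript-derived communication measures concrete, here is one deliberately simple example: speaker talk-time balance computed from timestamped turns. This is our illustrative stand-in, not one of CommSense's published metrics, which rely on NLP over smartwatch audio and transcripts.

```python
from collections import defaultdict

def talk_time_shares(turns: list[tuple[str, float, float]]) -> dict[str, float]:
    """Fraction of total speaking time per speaker, from (speaker, start_s, end_s)
    transcript turns. Illustrative only, not a CommSense metric."""
    totals: dict[str, float] = defaultdict(float)
    for speaker, start, end in turns:
        totals[speaker] += end - start
    whole = sum(totals.values()) or 1.0
    return {speaker: t / whole for speaker, t in totals.items()}

turns = [("clinician", 0.0, 12.5), ("patient", 12.5, 40.0), ("clinician", 40.0, 52.0)]
print(talk_time_shares(turns))  # patient-centered visits skew toward the patient
```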
Changing Your Tune: Lessons for Using Music to Encourage Physical Activity
Clark et al. analyze how music interventions affect physical activity, distilling design principles for using music to improve exercise motivation and performance, and offering practical strategies for promoting physical activity.
2024 · Matthew Clark et al. · Fitness Tracking & Physical Activity Monitoring · Music Composition & Sound Design Tools · UbiComp
JupyterLab in Retrograde: Contextual Notifications That Highlight Fairness and Bias Issues for Data Scientists
Current algorithmic fairness tools focus on auditing completed models, neglecting the potential downstream impacts of iterative decisions about cleaning data and training machine learning models. In response, we developed Retrograde, a JupyterLab environment extension for Python that generates real-time, contextual notifications for data scientists about decisions they are making regarding protected classes, proxy variables, missing data, and demographic differences in model performance. Our novel framework uses automated code analysis to trace data provenance in JupyterLab, enabling these notifications. In a between-subjects online experiment, 51 data scientists constructed loan-decision models with Retrograde providing notifications continuously throughout the process, only at the end, or never. Retrograde's notifications successfully nudged participants to account for missing data, avoid using protected classes as predictors, minimize demographic differences in model performance, and exhibit healthy skepticism about their models.
2024 · Galen Harrison et al. · University of Virginia, University of Chicago · Generative AI (Text, Image, Music, Video) · Explainable AI (XAI) · Algorithmic Transparency & Auditability · CHI
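As a flavor of the checks such notifications surface, the sketch below flags DataFrame columns whose names suggest protected classes. Retrograde itself traces data provenance via automated code analysis rather than simple name matching; the term list and helper here are hypothetical.

```python
import pandas as pd

# Hypothetical term list; real detection must handle proxies and renamed columns.
PROTECTED_TERMS = {"race", "sex", "gender", "age", "religion", "disability"}

def flag_protected_columns(df: pd.DataFrame) -> list[str]:
    """Return columns whose names suggest a protected class."""
    return [col for col in df.columns
            if any(term in col.lower() for term in PROTECTED_TERMS)]

loans = pd.DataFrame({"income": [40_000, 72_000],
                      "applicant_age": [29, 51],
                      "approved": [1, 0]})
print(flag_protected_columns(loans))  # ['applicant_age']
```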
From Letterboards to Holograms: Advancing Assistive Technology for Nonspeaking Autistic Individuals with the HoloBoard
About one-third of autistic individuals are nonspeaking, i.e., they cannot use speech to convey their thoughts reliably. Many in this population communicate via spelling, a process in which they point to letters on a letterboard held upright in their field of view by a trained Communication and Regulation Partner (CRP). This paper focuses on transitioning such individuals to more independent, digital spelling that requires less support from the CRP, a goal most nonspeakers we consulted with desire. To enable this transition, we followed an approach that mimics an environment familiar to the nonspeaker and that harnesses the skills they already possess from physical letterboard training. Using this approach, we developed HoloBoard, a system that allows a nonspeaker, their CRP, and others, e.g., researchers, to share a common Augmented Reality (AR) environment containing a virtual letterboard. We configured the system to offer a brief (less than 10 minutes, on average) training module with graduated spelling tasks on the virtual letterboard. In a study involving 23 participants, 16 completed the entire module. These participants were able to spell words on the virtual letterboard without the CRP holding that board, an outcome we had not expected. When offered the opportunity to continue interacting with the virtual letterboard after the training module, 14 performed more complicated tasks than we had anticipated, spelling full sentences, or even offering feedback on the HoloBoard using solely the virtual board. Furthermore, five of these participants used the system solo, i.e., with the CRP and researchers absent from the virtual environment. These results suggest that training with the HoloBoard can lay the foundation for more independent communication, providing new social and educational opportunities for this marginalized population.
2024 · Lorans Alabood et al. · University of Calgary · AR Navigation & Context Awareness · Augmentative & Alternative Communication (AAC) · CHI
The Cadaver in the Machine: The Social Practices of Measurement and Validation in Motion Capture Technology
Motion capture systems, used across various domains, make body representations concrete through technical processes. We argue that the measurement of bodies and the validation of measurements for motion capture systems can be understood as social practices. By analyzing the findings of a systematic literature review (N=278) through the lens of social practice theory, we show how these practices, and their varying attention to errors, become ingrained in motion capture design and innovation over time. Moreover, we show how contemporary motion capture systems perpetuate assumptions about human bodies and their movements. We suggest that social practices of measurement and validation are ubiquitous in the development of data- and sensor-driven systems more broadly, and provide this work as a basis for investigating hidden design assumptions and their potential negative consequences in human-computer interaction.
2024 · Emma Harvey et al. · Cornell University · Human Pose & Activity Recognition · Computational Methods in HCI · CHI
PoseTron: Enabling Close-Proximity Human-Robot Collaboration Through Multi-human Motion Prediction
As robots enter human workspaces, there is a crucial need for robots to understand and predict human motion to achieve safe and fluent human-robot collaboration (HRC). However, accurate human motion prediction remains a significant challenge due to the lack of large-scale datasets capturing close-proximity HRC and the lack of efficient, generalizable algorithms that can reliably predict the motion of multiple humans in human-robot teams. To address these challenges, we introduce INTERACT, a comprehensive multimodal dataset comprising 3D skeleton data, RGB+D data from two viewpoints, ego-view, eye-tracking, and gaze data of two participants, and robot joint data, covering both human-human and human-robot collaboration in teams. Next, to address the gap in learning algorithms that predict multi-human motion accurately, we propose PoseTron, a novel transformer-based encoder-decoder architecture that can generalize to multiple agents and utilize various data modalities. One of PoseTron's key contributions is the novel conditional attention mechanism in the encoder, enabling efficient extraction and weighing of motion information from all agents to incorporate team dynamics. Additionally, the decoder introduces a novel multimodal attention mechanism, which weights representations from different modalities and the encoder outputs to predict future motion accurately. We extensively evaluated PoseTron by comparing its performance on human-human and human-robot collaboration scenarios from the INTERACT dataset against state-of-the-art multi-agent motion prediction methods. The results suggest that PoseTron outperformed all other methods across all scenarios and evaluated temporal horizons. Furthermore, we conducted a comprehensive ablation study that underscores the architectural and multimodal design choices. The superior performance of PoseTron provides a promising direction for integrating motion prediction with robot perception and enabling safe and effective HRC.
2024 · Mohammad Samin Yasar et al. · Human Pose & Activity Recognition · Human-Robot Collaboration (HRC) · HRI
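The encoder's job, as described, is to weigh motion information from all agents when predicting one of them. The sketch below shows plain scaled dot-product attention over per-agent feature vectors, which captures that weighting idea in its simplest form; PoseTron's conditional attention additionally conditions on team dynamics, so this is a stand-in, not the paper's mechanism.

```python
import numpy as np

def attend_over_agents(query: np.ndarray, agent_feats: np.ndarray) -> np.ndarray:
    """Scaled dot-product attention over per-agent motion features.
    query: (d,) feature of the agent being predicted; agent_feats: (n_agents, d).
    Simplified stand-in for PoseTron's conditional attention."""
    d = query.shape[0]
    scores = agent_feats @ query / np.sqrt(d)   # (n_agents,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                    # softmax over agents
    return weights @ agent_feats                # fused representation, (d,)

feats = np.random.default_rng(1).normal(size=(3, 8))  # e.g., 2 humans + 1 robot
fused = attend_over_agents(feats[0], feats)
```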
Personalizing an AR-based Communication System for Nonspeaking Autistic Users
Nonspeaking autistic individuals ("nonspeakers") represent about one-third of the autistic population, and most are never provided with an effective alternative to speech, hindering their educational, employment, and social opportunities. Some individuals have learned to spell words and sentences by pointing to letters on a physical letterboard held vertically in their field of view by a trained human assistant. While this method is effective, nonspeakers have expressed to us a desire to transition toward a more independent communication method that relies less on a human assistant, which would provide them with more autonomy and privacy. Augmented Reality (AR) based communication systems have the potential to address this objective. For example, an AR-based communication system can lessen the reliance on a human assistant by employing a virtual letterboard that is automatically and adaptively placed in a personalized manner that considers a given user's unique motor skills and movement patterns. In this paper, we explore the use of Behavioural Cloning (BC) to derive such a personalized placement policy. Specifically, we observe finger, palm, head, and physical letterboard poses during real-life interactions between a nonspeaker and their assistant. These observations are then used to train a BC Machine Learning (ML) model that can adapt the placement of a virtual letterboard for that user within an AR environment. Results show that our approach can accurately replicate the actions of the human assistant of any given user, outperforming a non-ML baseline personalized placement policy in both positional and rotational accuracy. This work represents a foundational step toward enabling more autonomous and private communication for nonspeakers, opening up new opportunities for them.
2024 · Ahmadreza Nazari et al. · Augmentative & Alternative Communication (AAC) · STEM Education & Science Communication · Special Education Technology · IUI
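Behavioural cloning in this setting reduces to supervised learning: the assistant's recorded board placements are the targets, and the nonspeaker's observed poses are the inputs. The sketch below frames it that way on synthetic data; the feature layout, the 6-DoF target encoding, and the choice of regressor are our assumptions, since the abstract does not pin the model to a specific form.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
# Stand-in observations: finger, palm, and head poses flattened to one vector.
X = rng.normal(size=(1000, 21))
# Stand-in targets: board position (x, y, z) and rotation (yaw, pitch, roll)
# as recorded while the human assistant held the physical letterboard.
y = rng.normal(size=(1000, 6))

# Behavioural cloning: supervised regression on assistant demonstrations.
policy = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

current_poses = rng.normal(size=(1, 21))
board_pose = policy.predict(current_poses)  # where to place the virtual board
```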
MMTSA: Multi-Modal Temporal Segment Attention Network for Efficient Human Activity Recognition
Multimodal sensors provide complementary information for developing accurate machine-learning methods for human activity recognition (HAR), but introduce significantly higher computational load, which reduces efficiency. This paper proposes an efficient multimodal neural architecture for HAR using an RGB camera and inertial measurement units (IMUs), called the Multimodal Temporal Segment Attention Network (MMTSA). MMTSA first transforms IMU sensor data into a temporal and structure-preserving gray-scale image using the Gramian Angular Field (GAF), representing the inherent properties of human activities. MMTSA then applies a multimodal sparse sampling method to reduce data redundancy. Lastly, MMTSA adopts an inter-segment attention module for efficient multimodal fusion. Using three well-established public datasets, we evaluated MMTSA's effectiveness and efficiency in HAR. Results show that our method achieves superior performance (an 11.13% cross-subject F1-score improvement on the MMAct dataset) over previous state-of-the-art (SOTA) methods. The ablation study and analysis confirm MMTSA's effectiveness in fusing multimodal data for accurate HAR. An efficiency evaluation on an edge device showed that MMTSA achieved significantly better accuracy, lower computational load, and lower inference latency than SOTA methods.
2023 · Ziqi Gao et al. · Human Pose & Activity Recognition · UbiComp · https://doi.org/10.1145/3610872
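The GAF step has a compact closed form: rescale each IMU channel to [-1, 1], take phi = arccos(x), and build an image whose (i, j) entry is cos(phi_i + phi_j). The sketch below implements the summation variant (GASF) on a synthetic channel; MMTSA's exact preprocessing choices (GAF variant, windowing, resizing) may differ.

```python
import numpy as np

def gramian_angular_field(series: np.ndarray) -> np.ndarray:
    """Gramian Angular (Summation) Field: encode a 1-D IMU channel as a
    2-D gray-scale image while preserving temporal structure."""
    lo, hi = series.min(), series.max()
    x = 2 * (series - lo) / (hi - lo) - 1       # rescale to [-1, 1]
    phi = np.arccos(np.clip(x, -1.0, 1.0))      # polar-coordinate angle
    return np.cos(phi[:, None] + phi[None, :])  # GASF_ij = cos(phi_i + phi_j)

accel_x = np.sin(np.linspace(0, 4 * np.pi, 64))  # stand-in IMU channel
image = gramian_angular_field(accel_x)           # (64, 64) image for a CNN
```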
CrowdQ: Predicting the Queue State of Hospital Emergency Department Using Crowdsensing Mobility Data-Driven Models
Hospital Emergency Departments (EDs) are essential for providing emergency medical services, yet are often overwhelmed by increasing healthcare demand. Current methods for monitoring ED queue states, such as manual monitoring, video surveillance, and front-desk registration, are inefficient, invasive, and too delayed to provide real-time updates. To address these challenges, this paper proposes a novel framework, CrowdQ, which harnesses spatiotemporal crowdsensing data for real-time ED demand sensing, queue state modeling, and prediction. By utilizing vehicle trajectory and urban geographic environment data, CrowdQ can accurately estimate emergency visits from noisy traffic flows. Furthermore, it employs queueing theory to model the complex emergency service process with medical service data, effectively considering spatiotemporal dependencies and the impact of event context on ED queue states. Experiments conducted on large-scale crowdsensing urban traffic datasets and hospital information system datasets from Xiamen City demonstrate the framework's effectiveness. It achieves an F1 score of 0.93 in ED demand identification, effectively models the ED queue state of key hospitals, and reduces the error in queue state prediction by 18.5%-71.3% compared to baseline methods. CrowdQ therefore offers a valuable alternative for disclosing public emergency-treatment information and maximizing medical resource allocation.
2023 · Tieqi Shou et al. · Content Moderation & Platform Governance · Public Transit & Trip Planning · UbiComp · https://doi.org/10.1145/3610875
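To show what "employs queueing theory to model the emergency service process" can mean concretely, here is the classic Erlang-C expected-wait formula for an M/M/c queue, with arrivals standing in for sensed ED demand and servers for on-duty clinicians. The M/M/c form is our simplifying assumption; the abstract does not specify CrowdQ's exact queueing model.

```python
import math

def erlang_c_wait(arrival_rate: float, service_rate: float, servers: int) -> float:
    """Expected time in queue for an M/M/c system (hours if rates are per hour).
    Illustrative assumption, not necessarily CrowdQ's exact model."""
    a = arrival_rate / service_rate                 # offered load
    rho = a / servers
    assert rho < 1, "queue is unstable"
    inv_p = sum(a**k / math.factorial(k) for k in range(servers))
    tail = a**servers / (math.factorial(servers) * (1 - rho))
    p_wait = tail / (inv_p + tail)                  # Erlang-C probability of waiting
    return p_wait / (servers * service_rate - arrival_rate)

# 12 patients/hour arriving, each clinician treats 3/hour, 5 clinicians on duty:
print(f"{erlang_c_wait(12, 3, 5) * 60:.1f} minutes expected wait")
```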
Detecting Social Contexts from Mobile Sensing Indicators in Virtual Interactions with Socially Anxious Individuals
Mobile sensing is a ubiquitous and useful tool for making inferences about individuals' mental health based on physiology and behavior patterns. Along with sensing features directly associated with mental health, it can be valuable to detect features of social contexts to learn about social interaction patterns over time and across different environments. This can provide insight into diverse communities' academic, work, and social lives, and their social networks. We posit that passively detecting social contexts can be particularly useful for social anxiety research, as it may ultimately help identify changes in social anxiety status and patterns of social avoidance and withdrawal. To this end, we recruited a sample of highly socially anxious undergraduate students (N=46) to examine whether we could detect the presence of experimentally manipulated virtual social contexts via wristband sensors. Using a multitask machine learning pipeline, we leveraged passively sensed biobehavioral streams to detect contexts relevant to social anxiety, including (1) whether people were in a social situation, (2) the size of the social group, (3) the degree of social evaluation, and (4) the phase of the social situation (anticipating, actively experiencing, or having just participated in an experience). Results demonstrated the feasibility of detecting most virtual social contexts, with stronger predictive accuracy when detecting whether individuals were in a social situation and the phase of the situation, and weaker predictive accuracy when detecting the level of social evaluation. They also indicated that sensing streams are differentially important to prediction depending on the context being predicted. Our findings further provide useful information regarding design elements relevant to passive context detection, including optimal sensing duration, the utility of different sensing modalities, and the need for personalization. We discuss implications of these findings for future work on context detection (e.g., just-in-time adaptive intervention development).
2023 · Zhiyuan Wang et al. · Human Pose & Activity Recognition · Mental Health Apps & Online Support Communities · Biosensors & Physiological Monitoring · UbiComp · https://doi.org/10.1145/3610916
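The simplest reading of a multitask setup over shared wristband features is one predictive head per context dimension, as sketched below on synthetic data. This only frames the problem; the paper's pipeline is more sophisticated, and the feature and label encodings here are our assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
# Stand-in wristband features (e.g., heart-rate and EDA summary statistics).
X = rng.normal(size=(300, 12))
labels = {                                 # one task per social-context dimension
    "in_social_situation": rng.integers(0, 2, 300),
    "phase": rng.integers(0, 3, 300),      # anticipating / experiencing / just after
    "evaluation_level": rng.integers(0, 3, 300),
}

# Shared features, one classifier head per task.
heads = {task: LogisticRegression(max_iter=1000).fit(X, y)
         for task, y in labels.items()}
sample = X[:1]
print({task: int(clf.predict(sample)[0]) for task, clf in heads.items()})
```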
Sounds of Health: Using Personalized Sonification Models to Communicate Health Information
This paper explores the feasibility of using sonification to deliver and communicate health and wellness status on personal devices. Ambient displays have proven to inform users of their health and wellness and help them make healthier decisions, yet little technology provides health assessments through sound, which can be even more pervasive than visual displays. We developed a method to generate music from user preferences and evaluated it in a two-step user study. In the first step, we acquired general healthiness impressions from each user. In the second step, we generated customized melodies from the music preferences collected in the first step to capture participants' perceived healthiness of those melodies. We deployed our surveys to 55 participants, who completed them on their own over 31 days. We analyzed the data to understand commonalities and differences in users' perceptions of music as an expression of health. Our findings show clear associations between perceived healthiness and different music features. We provide useful insights into how different musical features impact the perceived healthiness of music, how perceptions of healthiness vary between users, what trends exist across users' impressions, and what influences (or does not influence) a user's perception of healthiness in a melody. Overall, our results indicate validity in presenting health data through personalized music models. The findings can inform the design of behavior management applications on personal and ubiquitous devices.
2023 · Matthew Clark et al. · Generative AI (Text, Image, Music, Video) · Sleep & Stress Monitoring · Music Composition & Sound Design Tools · UbiComp · https://dl.acm.org/doi/10.1145/3570346
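At its core, sonification maps a data dimension onto a musical one. The toy sketch below maps daily health scores onto scale degrees so that healthier days sound higher; the fixed C-major scale and the mapping are our assumptions, whereas the paper personalizes melodies to each user's music preferences.

```python
import numpy as np

C_MAJOR = [60, 62, 64, 65, 67, 69, 71, 72]  # MIDI note numbers, one octave

def sonify(health_scores: list[float]) -> list[int]:
    """Map daily health scores in [0, 1] to scale degrees: healthier days
    sound higher. Toy mapping, not the paper's personalized model."""
    idx = np.clip((np.asarray(health_scores) * len(C_MAJOR)).astype(int),
                  0, len(C_MAJOR) - 1)
    return [C_MAJOR[i] for i in idx]

week = [0.2, 0.35, 0.5, 0.55, 0.7, 0.9, 0.85]  # rising activity levels
print(sonify(week))  # an ascending melodic contour
```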
Robust Finger Interactions with COTS Smartwatches via Unsupervised Siamese Adaptation
Wearable devices like smartwatches and smart wristbands have gained substantial popularity in recent years. However, their small interfaces create inconvenience and limit computing functionality. To fill this gap, we propose ViWatch, which enables robust finger interactions under deployment variations and relies on a single IMU sensor that is ubiquitous in COTS smartwatches. To this end, we design an unsupervised Siamese adversarial learning method. We built a real-time system on commodity smartwatches and tested it with over one hundred volunteers. Results show that system accuracy is about 97% over a week. In addition, the system is resistant to deployment variations such as different hand shapes, finger activity strengths, and smartwatch positions on the wrist. We also developed a number of mobile applications using our interactive system and conducted a user study in which all participants preferred our unsupervised approach to supervised calibration. A demonstration of ViWatch is available at https://youtu.be/N5-ggvy2qfI
2023 · Wenqiang Chen et al. · Foot & Wrist Interaction · Smartwatches & Fitness Bands · Biosensors & Physiological Monitoring · UIST
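The Siamese idea underlying such adaptation is a pairwise objective: embeddings of IMU segments from the same finger gesture are pulled together, and those from different gestures are pushed at least a margin apart. The sketch below shows that contrastive objective in isolation; ViWatch's full method is unsupervised and adversarial, so this illustrates only the pairwise-distance building block.

```python
import numpy as np

def contrastive_loss(emb_a: np.ndarray, emb_b: np.ndarray,
                     same: bool, margin: float = 1.0) -> float:
    """Siamese contrastive loss on a pair of gesture embeddings: pull
    same-gesture pairs together, push different-gesture pairs at least
    `margin` apart. Building block only, not ViWatch's full method."""
    d = float(np.linalg.norm(emb_a - emb_b))
    return d**2 if same else max(0.0, margin - d) ** 2

rng = np.random.default_rng(4)
tap_a, tap_b = rng.normal(size=8), rng.normal(size=8)
print(contrastive_loss(tap_a, tap_b, same=True))   # penalizes distance
print(contrastive_loss(tap_a, tap_b, same=False))  # penalizes closeness
```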