Conversational Agents on Your Behalf: Opportunities and Challenges of Shared Autonomy in Voice Communication for Multitasking
Advancements in computational agents will enable them to act as surrogates for users in online communication, promising enhanced productivity by supporting multitasking. This capability may be especially powerful when combined with human control, allowing users to retain agency while achieving better performance than either human or agent alone. However, it remains unclear how people might leverage this technology to multitask effectively. We present a study with 18 dyads exploring how users employ automated responses to support an arithmetic task while staying engaged in a voice call. Participants multitasked with a conversational agent under three levels of autonomy: none, shared, and full. Our findings indicate that fully automated systems can maintain conversational engagement, enabling users to multitask effectively. Surprisingly, shared autonomy hindered this ability. Based on our results, we discuss implications for designing shared autonomy in conversations, highlighting new considerations and challenges.
2025 · Yi Fei Cheng et al. · Carnegie Mellon University, Human-Computer Interaction Institute · Conversational Chatbots; Agent Personality & Anthropomorphism · CHI

Realism Drives Interpersonal Reciprocity but Yields to AI-Assisted Egocentrism in a Coordination Experiment
Virtual reality technologies that enhance realism and artificial intelligence (AI) systems that assist human behavior are increasingly interwoven in social applications. However, how these technologies might jointly influence interpersonal coordination remains unclear. We conducted an experiment with 240 participants in 120 pairs who interacted through remote-controlled robot cars in a physical space or virtual cars in a digital space, with or without autosteering assistance, using the chicken game, an established model of interpersonal coordination. We find that both realism and AI assistance help improve user performance but through opposing mechanisms. Real-world contexts enhanced communication, fostering reciprocal actions and collective benefits. In contrast, autosteering assistance diminished the need for interpersonal coordination, shifting participants’ focus towards self-interest. Notably, when combined, the egocentric effects of autosteering assistance outweighed the prosocial effects of realism. The design of HCI systems that involve social coordination will, we believe, need to take such effects into account.
2025 · Hirokazu Shirado et al. · Carnegie Mellon University, School of Computer Science · Automated Driving Interface & Takeover Design; Human-Robot Collaboration (HRC); Technology Ethics & Critical HCI · CHI

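The chicken game mentioned above can be illustrated as a two-player, two-action game. This is a minimal sketch, not the study's setup: the payoff numbers are hypothetical (the abstract does not give them), and the function `pure_nash_equilibria` is an illustrative helper. It shows why the game models coordination: the only stable outcomes are the asymmetric ones in which exactly one player yields, so the players must somehow settle who that will be.

```python
from itertools import product

# Hypothetical payoffs (row player, column player); not taken from the paper.
# "swerve" = yield, "straight" = keep going.
ACTIONS = ("swerve", "straight")
PAYOFFS = {
    ("swerve", "swerve"): (0, 0),          # both yield: tie
    ("swerve", "straight"): (-1, 1),       # column player wins
    ("straight", "swerve"): (1, -1),       # row player wins
    ("straight", "straight"): (-10, -10),  # crash: worst for both
}


def pure_nash_equilibria(payoffs):
    """Return action profiles where neither player gains by deviating alone."""
    equilibria = []
    for a, b in product(ACTIONS, repeat=2):
        row_payoff, col_payoff = payoffs[(a, b)]
        # Row player cannot do better by switching its own action.
        row_ok = all(payoffs[(alt, b)][0] <= row_payoff for alt in ACTIONS)
        # Column player cannot do better by switching its own action.
        col_ok = all(payoffs[(a, alt)][1] <= col_payoff for alt in ACTIONS)
        if row_ok and col_ok:
            equilibria.append((a, b))
    return equilibria


print(pure_nash_equilibria(PAYOFFS))
# → [('swerve', 'straight'), ('straight', 'swerve')]
```

Mutual swerving is not an equilibrium here because each player could gain by going straight, and mutual going straight is the costly crash.
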
PiaMuscle: Improving Piano Skill Acquisition by Cost-effectively Estimating and Visualizing Activities of Miniature Hand Muscles
Understanding neuromusculoskeletal mechanisms significantly impacts skill specialization and proficiency. While existing methods can infer large muscle activities during gross motor movements, the estimation of dexterous motor control involving miniature muscles remains underexplored. Targeting the coordinated hand muscles in advanced piano performance, we learn spatiotemporal discrete representations of electromyography (EMG) data and hand postures utilizing a multimodal dataset. Subsequently, we train a precise and cost-effective neural network model. Based on this model, PiaMuscle is introduced to investigate if visualizing muscle activities during piano training enhances piano performance. Quantitative and qualitative results of a user study with highly skilled professional pianists demonstrate that PiaMuscle provides reliable muscle activation data to support and optimize force control. Our research underscores the potential of a naturalistic workflow to estimate small muscles' activities from readily accessible human-centric information and more accurately when combined with tool-centric data, thereby enhancing skill acquisition.
2025 · Ruofan Liu et al. · Tokyo Institute of Technology, School of Computing; Sony Computer Science Laboratories Inc. · Human Pose & Activity Recognition; Biosensors & Physiological Monitoring · CHI

Morphing Identity: Exploring Self-Other Identity Continuum through Interpersonal Facial Morphing Experience
We explored continuous changes in self-other identity by designing an interpersonal facial morphing experience where the facial images of two users are blended and then swapped over time. Both users' facial images are displayed side by side, with each user controlling their own morphing facial images, allowing us to create and investigate a multifaceted interpersonal experience. To explore this with diverse social relationships, we conducted qualitative and quantitative investigations through public exhibitions. We found that there is a window of self-identification as well as a variety of interpersonal experiences in the facial morphing process. From these insights, we synthesized a Self-Other Continuum represented by a sense of agency and facial identity. This continuum has implications in terms of the social and subjective aspects of interpersonal communication, which enables further scenario design and could complement findings from research on interactive devices for remote communication.
2023 · Kye Shimizu et al. · Sony Computer Science Laboratories, Inc · Identity & Avatars in XR; Interactive Narrative & Immersive Storytelling · CHI

“I am both here and there” Parallel Control of Multiple Robotic Avatars by Disabled Workers in a Cafe
Robotic avatars can help disabled people extend their reach in interacting with the world. Technological advances make it possible for individuals to embody multiple avatars simultaneously. However, existing studies have been limited to laboratory conditions and did not involve disabled participants. In this paper, we present a real-world implementation of a parallel control system allowing disabled workers in a café to embody multiple robotic avatars at the same time to carry out different tasks. Our data corpus comprises semi-structured interviews with workers, customer surveys, and videos of café operations. Results indicate that the system increases workers' agency, enabling them to better manage customer journeys. Parallel embodiment and transitions between avatars create multiple interaction loops where the links between disabled workers and customers remain consistent, but the intermediary avatar changes. Based on our observations, we theorize that disabled individuals possess specific competencies that increase their ability to manage multiple avatar bodies.
2023 · Giulia Barbareschi et al. · Keio University · Domestic Robots; Social Robot Interaction; Robots in Education & Healthcare · CHI

WESPER: Zero-shot and Realtime Whisper to Normal Voice Conversion for Whisper-based Speech interactions
Recognizing whispered speech and converting it to normal speech creates many possibilities for speech interaction. Because the sound pressure of whispered speech is significantly lower than that of normal speech, it can be used as a semi-silent speech interaction in public places without being audible to others. Converting whispers to normal speech also improves the speech quality for people with speech or hearing impairments. However, conventional speech conversion techniques either do not provide sufficient conversion quality or require speaker-dependent datasets consisting of pairs of whispered and normal speech utterances. To address these problems, we propose WESPER, a zero-shot, real-time whisper-to-normal speech conversion mechanism based on self-supervised learning. WESPER consists of a speech-to-unit (STU) encoder, which generates hidden speech units common to both whispered and normal speech, and a unit-to-speech (UTS) decoder, which reconstructs speech from the encoded speech units. Unlike existing methods, this conversion is user-independent and does not require a paired dataset of whispered and normal speech. The UTS decoder can reconstruct speech in any target speaker's voice from speech units, and it requires only unlabeled speech data from the target speaker. We confirmed that the quality of the speech converted from a whisper was improved while preserving its natural prosody. Additionally, we confirmed the effectiveness of the proposed approach for speech reconstruction for people with speech or hearing disabilities.
2023 · Jun Rekimoto · The University of Tokyo, Sony CSL Kyoto · Intelligent Voice Assistants (Alexa, Siri, etc.); Voice Accessibility · CHI

Upvotes? Downvotes? No Votes? Understanding the relationship between reaction mechanisms and political discourse on Reddit
A significant share of political discourse occurs online on social media platforms. Policymakers and researchers try to understand the role of social media design in shaping the quality of political discourse around the globe. In the past decades, scholarship on political discourse theory has produced distinct characteristics of different types of prominent political rhetoric such as deliberative, civic, or demagogic discourse. This study investigates the relationship between social media reaction mechanisms (i.e., upvotes, downvotes) and political rhetoric in user discussions by engaging in an in-depth conceptual analysis of political discourse theory. First, we analyze 155 million user comments in 55 political subforums on Reddit between 2010 and 2018 to explore whether users' style of political discussion aligns with the essential components of deliberative, civic, and demagogic discourse. Second, we perform a quantitative study that combines confirmatory factor analysis with difference-in-differences models to explore whether different reaction mechanism schemes (e.g., upvotes only, upvotes and downvotes, no reaction mechanisms) correspond with political user discussion that is more or less characteristic of deliberative, civic, or demagogic discourse. We produce three main takeaways. First, despite being "ideal constructs of political rhetoric," we find that political discourse theories describe political discussions on Reddit to a large extent. Second, we find that discussions in subforums with only upvotes, or with both upvotes and downvotes, are associated with user discourse that is more deliberate and civic. Third, and perhaps most strikingly, social media discussions are most demagogic in subreddits with no reaction mechanisms at all. These findings offer valuable contributions for ongoing policy discussions on the relationship between social media interface design and respectful political discussion among users.
2023 · Orestis Papakyriakopoulos et al. · Sony AI · Social Platform Design & User Behavior; Content Moderation & Platform Governance; Activism & Political Participation · CHI

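The study above pairs confirmatory factor analysis with difference-in-differences (DiD) models. As a hedged illustration of the DiD idea only (the authors' models also involve factor scores and many subforums, which this sketch omits), the classic 2×2 estimator compares the before/after change in a group whose reaction mechanism changed against the change in a comparison group over the same period. The function name `diff_in_diff` and the toy scores are hypothetical, not data from the paper.

```python
from statistics import mean


def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Classic 2x2 difference-in-differences estimate.

    Each argument is a list of outcome values (e.g., a hypothetical
    per-comment discourse score) for one group in one period.
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    # Under the parallel-trends assumption, the control group's change
    # approximates what the treated group would have done without the change.
    return treated_change - control_change


# Toy example: a subforum that switches reaction scheme vs. one that does not.
effect = diff_in_diff(
    treated_pre=[2, 4], treated_post=[6, 8],
    control_pre=[2, 4], control_post=[4, 4],
)
print(effect)  # → 3
```

Here the treated group's mean rises by 4 while the comparison group's rises by 1, so the estimated effect attributed to the switch is 3.
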
Machine-Mediated Teaming: Mixture of Human and Machine in Physical Gaming Experience
Technological advancement has opened up opportunities for new sports and physical activities. We introduce a concept called machine-mediated teaming, in which a human and a surrogate machine form a team to participate in physical sports games. To understand the experience of machine-mediated teaming and to derive guidelines for designing systems that realize the concept, we built a case study system based on tug-of-war. Our system is a sports game played two against two. One team consists of a player who actually pulls the rope and another player who participates in the physical game by controlling the machine's actuators. We conducted user studies using this system to investigate the sport experience in this form and to reveal insights to inform future research on machine-mediated teaming. Based on the data obtained from the user studies, we identified three perspectives, machine stamina, action space, and explicit feedback, that should be considered when designing future machine-mediated teaming systems. The research presented in this paper offers a first step towards exploring how humans and machines can coexist in highly dynamic physical interactions.
2022 · Azumi Maekawa et al. · The University of Tokyo · Full-Body Interaction & Embodied Input; Serious & Functional Games · CHI

Preserving Agency During Electrical Muscle Stimulation Training Speeds up Reaction Time Directly After Removing EMS
Force feedback devices, such as motor-based exoskeletons or wearables based on electrical muscle stimulation (EMS), have the unique potential to accelerate users’ own reaction time (RT). However, this speedup has only been explored while the device is attached to the user. In fact, very little is known regarding whether this faster reaction time still occurs after the user removes the device from their body; this is precisely what we investigated by means of a simple RT experiment, in which participants were asked to tap as soon as they saw an LED flashing. Participants experienced this in three EMS conditions: (1) fast-EMS: the electrical impulses were synced with the LED; (2) agency-EMS: the electrical impulse was delivered 40 ms earlier than the participant’s own RT, a timing that prior work has shown to preserve one’s sense of agency over this movement; and (3) late-EMS: the impulse was delivered after the participant’s own RT. Our results revealed that the participants’ RT was significantly reduced by approximately 8 ms (up to 20 ms) only after training with the agency-EMS condition. This finding suggests that prioritizing agency during EMS training is key to motor adaptation, i.e., it enables a faster motor response even after the user has removed the EMS device from their body.
2021 · Shunichi Kasahara et al. · Sony CSL, The University of Tokyo · Vibrotactile Feedback & Skin Stimulation; Electrical Muscle Stimulation (EMS) · CHI

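The three training conditions above differ only in when the EMS impulse fires relative to the LED flash. The sketch below makes that timing explicit. It is an illustration under stated assumptions, not the authors' software: the function name `ems_onset_ms` is hypothetical, `baseline_rt_ms` is assumed to be the participant's own RT measured without EMS, and `late_delay_ms` is a hypothetical placeholder because the abstract only says the late-EMS impulse came after the participant's own RT.

```python
AGENCY_LEAD_MS = 40.0  # from the abstract: agency-EMS fires 40 ms before the own RT


def ems_onset_ms(condition: str, baseline_rt_ms: float,
                 late_delay_ms: float = 50.0) -> float:
    """Return the EMS onset in ms after the LED flash for one condition.

    late_delay_ms is a hypothetical value; the source does not specify
    how long after the participant's own RT the late-EMS impulse came.
    """
    if condition == "fast":
        return 0.0  # synced with the LED flash
    if condition == "agency":
        return baseline_rt_ms - AGENCY_LEAD_MS  # just ahead of the user's own tap
    if condition == "late":
        return baseline_rt_ms + late_delay_ms  # after the user's own tap
    raise ValueError(f"unknown condition: {condition!r}")


# Hypothetical participant with a 250 ms baseline RT.
for cond in ("fast", "agency", "late"):
    print(cond, ems_onset_ms(cond, baseline_rt_ms=250.0))
```

For the hypothetical 250 ms participant this prints onsets of 0, 210, and 300 ms; only the agency condition lands shortly before the participant would have moved on their own.
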
Evaluation of Machine Learning Techniques for Hand Pose Estimation on Handheld Device with Proximity Sensor
Finger tracking for natural hand-based interaction is widely studied. In vision-based finger tracking for virtual reality (VR) applications, the fingers are occluded by the handheld device needed for auxiliary input, so tracking finger movement with cameras remains challenging. Controllers that track fingers with capacitive proximity sensors on their surface are starting to appear. However, research on estimating articulated hand pose from curved capacitive sensing electrodes is still immature. Therefore, we built a prototype with 62 electrodes and recorded training datasets using an optical tracking system. We introduced a 2.5D representation to apply convolutional neural network methods to a capacitive image of the curved surface, and evaluated two types of network architectures based on recent achievements in the computer vision field with our dataset. We also implemented real-time interactive applications using the prototype and demonstrated the possibility of intuitive interaction using fingers in VR applications.
2020 · Kazuyuki Arimatsu et al. · Sony Interactive Entertainment Inc. · Hand Gesture Recognition; Context-Aware Computing · CHI

Preemptive Action: Accelerating Human Reaction using Electrical Muscle Stimulation Without Compromising Agency
We enable preemptive force-feedback systems to speed up human reaction time without fully compromising the user's sense of agency. Typically these interfaces actuate by means of electrical muscle stimulation (EMS) or mechanical actuators; they preemptively move the user to perform a task, such as to improve movement performance (e.g., EMS-assisted drumming). Unfortunately, when using preemptive force feedback, users do not feel in control and lose their sense of agency. We address this by actuating the user's body, using EMS, within a particular time window (160 ms after the visual stimulus), which we found to speed up reaction time by 80 ms in our first study. With this preemptive timing, when the user and system move congruently, the user feels that they initiated the motion, yet their reaction time is faster than usual. As our second study demonstrated, this particular timing significantly increased agency when compared to the current practice in EMS-based devices. We conclude by illustrating, using examples from the HCI literature, how to leverage our findings to provide more agency to automated haptic interfaces.
2019 · Shunichi Kasahara et al. · Sony CSL & University of Tokyo · Force Feedback & Pseudo-Haptic Weight; Electrical Muscle Stimulation (EMS) · CHI