Thing2Reality: Enabling Spontaneous Creation of 3D Objects from 2D Content using Generative AI in XR Meetings
During remote communication, participants often share both digital and physical content, such as product designs, digital assets, and environments, to enhance mutual understanding. Recent advances in augmented communication have enabled users to swiftly create and share digital 2D copies of physical objects from video feeds into a shared space. However, conventional 2D representations of digital objects limit spatial referencing in immersive environments. To address this, we propose Thing2Reality, an Extended Reality (XR) meeting platform that facilitates spontaneous discussions of both digital and physical items during remote sessions. With Thing2Reality, users can quickly materialize ideas or objects in immersive environments and share them as conditioned multiview renderings or 3D Gaussians. Thing2Reality enables users to interact with remote objects or discuss concepts collaboratively. Our user studies revealed that the ability to interact with and manipulate 3D representations of objects significantly enhances the efficiency of discussions, with the potential to augment discussions of 2D artifacts.
Erzhen Hu et al. · UIST 2025 · Topics: Social & Collaborative VR; Identity & Avatars in XR; Generative AI (Text, Image, Music, Video)
DialogLab: Authoring, Simulating, and Testing Dynamic Human-AI Group Conversations
Designing compelling multi-party conversations involving both humans and AI agents presents significant challenges, particularly in balancing scripted structure with emergent, human-like interactions. We introduce DialogLab, a prototyping toolkit for authoring, simulating, and testing hybrid human-AI dialogues. DialogLab provides a unified interface to configure conversational scenes, define agent personas, manage group structures, specify turn-taking rules, and orchestrate transitions between scripted narratives and improvisation. Crucially, DialogLab allows designers to introduce controlled deviations from the script, via configurable agents that emulate human unpredictability, to systematically probe how conversations adapt and recover. DialogLab facilitates rapid iteration and evaluation of complex, dynamic multi-party human-AI dialogues. An evaluation with both end users and domain experts demonstrates that DialogLab supports efficient iteration and structured verification, with applications in training, rehearsal, and research on social dynamics. Our findings show the value of integrating real-time, human-in-the-loop improvisation with structured scripting to support more realistic and adaptable multi-party conversation design.
Erzhen Hu et al. · UIST 2025 · Topics: Conversational Chatbots; Human-LLM Collaboration
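As a rough illustration of how a hybrid scripted/improvised scene of this kind might be expressed as data, the Python sketch below defines a minimal scene configuration; the class and field names (AgentPersona, SceneConfig, improvisation, turn_taking) and their values are assumptions for illustration, not DialogLab's actual schema.

```python
# A minimal, illustrative scene configuration for a hybrid human-AI dialogue.
# All names and values here are assumptions, not DialogLab's schema.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class AgentPersona:
    name: str
    role: str
    improvisation: float = 0.0   # 0 = stick to the script, 1 = deviate freely

@dataclass
class SceneConfig:
    title: str
    agents: list[AgentPersona]
    turn_taking: str = "round_robin"   # assumed options: "round_robin", "moderator", "free_for_all"
    # Script entries are (speaker, line); a line of None marks an improvised turn.
    script: list[tuple[str, Optional[str]]] = field(default_factory=list)

scene = SceneConfig(
    title="Project kickoff rehearsal",
    agents=[
        AgentPersona("Facilitator", "keeps the meeting on the agenda", improvisation=0.1),
        AgentPersona("Skeptic", "raises unplanned objections", improvisation=0.7),
    ],
    script=[
        ("Facilitator", "Welcome everyone, let's review the milestones."),
        ("Skeptic", None),   # controlled deviation: the agent improvises this turn
    ],
)

if __name__ == "__main__":
    print(scene.title, "-", [a.name for a in scene.agents])
```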
Gensors: Authoring Personalized Visual Sensors with Multimodal Foundation Models and Reasoning
Multimodal large language models (MLLMs), with their expansive world knowledge and reasoning capabilities, present a unique opportunity for end-users to create personalized AI sensors capable of reasoning about complex situations. A user could describe a desired sensing task in natural language (e.g., "let me know if my toddler is getting into mischief in the living room"), with the MLLM analyzing the camera feed and responding within just seconds. In a formative study, we found that users saw substantial value in defining their own sensors, yet struggled to articulate their unique personal requirements to the model and debug the sensors through prompting alone. To address these challenges, we developed Gensors, a system that empowers users to define customized sensors supported by the reasoning capabilities of MLLMs. Gensors 1) assists users in eliciting requirements through both automatically generated and manually created sensor criteria, 2) facilitates debugging by allowing users to isolate and test individual criteria in parallel, 3) suggests additional criteria based on user-provided images, and 4) proposes test cases to help users "stress test" sensors on potentially unforeseen scenarios. In a 12-participant user study, users reported a significantly greater sense of control, understanding, and ease of communication when defining sensors using Gensors. Beyond addressing model limitations, Gensors supported users in debugging, eliciting requirements, and expressing unique personal requirements to the sensor through criteria-based reasoning; it also helped uncover users' own "blind spots" by exposing overlooked criteria and revealing unanticipated failure modes. Finally, we describe insights into how unique characteristics of MLLMs, such as hallucinations and inconsistent responses, can impact the sensor-creation process. Together, these findings contribute to the design of future MLLM-powered sensing systems that are intuitive and customizable by everyday users.
Michael Xieyang Liu et al. · IUI 2025 · Topics: Eye Tracking & Gaze Interaction; Context-Aware Computing; Ubiquitous Computing
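The criteria-based reasoning described above can be pictured as asking one yes/no question per criterion about each camera frame and combining the answers. Below is a minimal Python sketch of that pattern under stated assumptions: call_mllm is a placeholder for whatever multimodal model backend is used, and the example criteria and prompts are illustrative, not Gensors' own.

```python
# Sketch of criteria-based sensor evaluation: each criterion is an independent
# yes/no question, so individual criteria can be isolated, tested, and debugged.
from dataclasses import dataclass
from typing import Callable

@dataclass
class SensorCriterion:
    name: str
    prompt: str   # a yes/no question the model answers about a single frame

def evaluate_sensor(image: bytes,
                    criteria: list[SensorCriterion],
                    call_mllm: Callable[[bytes, str], str]) -> dict[str, bool]:
    """Run each criterion independently against one frame."""
    results = {}
    for criterion in criteria:
        answer = call_mllm(image, criterion.prompt)
        results[criterion.name] = answer.strip().lower().startswith("yes")
    return results

def should_alert(results: dict[str, bool]) -> bool:
    """Fire the sensor only when every criterion holds for the current frame."""
    return all(results.values())

# Example: a "toddler mischief" sensor decomposed into testable criteria.
toddler_sensor = [
    SensorCriterion("child_present", "Is a young child visible? Answer yes or no."),
    SensorCriterion("near_hazard", "Is the child within reach of something fragile or unsafe? Answer yes or no."),
]

if __name__ == "__main__":
    # Stub backend so the sketch runs without a real multimodal model.
    stub_mllm = lambda image, prompt: "Yes"
    print(should_alert(evaluate_sensor(b"<frame bytes>", toddler_sensor, stub_mllm)))
```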
SpeechCompass: Enhancing Mobile Captioning with Diarization and Directional Guidance via Multi-Microphone Localization
Speech-to-text capabilities on mobile devices have proven helpful for hearing and speech accessibility, language translation, note-taking, and meeting transcripts. However, our foundational large-scale survey (n=263) shows that the inability to distinguish speakers and indicate speaker direction makes these tools challenging to use in group conversations. SpeechCompass addresses this limitation through real-time, multi-microphone speech localization, using the direction of speech to visually separate speakers and provide directional guidance (e.g., arrows) in the user interface. We introduce efficient real-time audio localization algorithms and custom sound perception hardware, running on a low-power microcontroller with four integrated microphones, which we characterize in technical evaluations. Informed by a large-scale survey (n=494), we conducted an in-person study of group conversations with eight frequent users of mobile speech-to-text, who provided feedback on five visualization styles. Participants consistently valued the diarization and localization visualizations, and all agreed on the potential of directional guidance for group conversations.
Artem Dementyev et al. (Google Inc.) · CHI 2025 · Topics: Voice Accessibility; Biosensors & Physiological Monitoring
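One common way to estimate speech direction from a pair of microphones is GCC-PHAT time-difference-of-arrival estimation; the numpy sketch below illustrates that general idea. The sample rate, microphone spacing, and use of GCC-PHAT are assumptions for illustration and do not reproduce the paper's on-device, low-power algorithm.

```python
# Minimal GCC-PHAT direction-of-arrival sketch for one microphone pair.
import numpy as np

SPEED_OF_SOUND = 343.0   # m/s
MIC_SPACING = 0.08       # assumed spacing between the two microphones, in meters
SAMPLE_RATE = 16_000     # Hz, assumed

def gcc_phat(sig, ref, fs, max_tau):
    """Estimate the time delay (seconds) of `sig` relative to `ref` via GCC-PHAT."""
    n = sig.size + ref.size
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    cross = SIG * np.conj(REF)
    cross /= np.abs(cross) + 1e-12           # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n)
    max_shift = int(fs * max_tau)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))  # lags -max..+max
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / float(fs)

def direction_of_arrival(mic_a, mic_b):
    """Bearing (degrees) of the source relative to the broadside of the mic pair."""
    max_tau = MIC_SPACING / SPEED_OF_SOUND
    tau = gcc_phat(mic_a, mic_b, SAMPLE_RATE, max_tau)
    ratio = np.clip(tau * SPEED_OF_SOUND / MIC_SPACING, -1.0, 1.0)
    return float(np.degrees(np.arcsin(ratio)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    source = rng.standard_normal(800)
    mic_a = source
    mic_b = np.roll(source, 2)               # signal arrives 2 samples later at mic B
    print(f"Estimated bearing: {direction_of_arrival(mic_a, mic_b):.1f} degrees")
```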
InstructPipe: Generating Visual Blocks Pipelines with Human Instructions and LLMs
Visual programming has the potential to provide novice programmers with a low-code experience for building customized processing pipelines. Existing systems typically require users to build pipelines from scratch, meaning that novice users are expected to set up and link appropriate nodes from a blank workspace. In this paper, we introduce InstructPipe, an AI assistant for prototyping machine learning (ML) pipelines with text instructions. We contribute two large language model (LLM) modules and a code interpreter as part of our framework. The LLM modules generate pseudocode for a target pipeline, and the interpreter renders the pipeline in the node-graph editor for further human-AI collaboration. Both the technical and user evaluation (N=16) show that InstructPipe empowers users to streamline their ML pipeline workflow, reduce their learning curve, and leverage open-ended commands to spark innovative ideas.
Zhongyi Zhou et al. (Google; The University of Tokyo) · CHI 2025 · Topics: Human-LLM Collaboration; AutoML Interfaces
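As a rough illustration of the "generate pseudocode, then interpret it into a node graph" idea, the Python sketch below parses a toy one-line-per-node pseudocode format into nodes and edges; the syntax and node names are assumptions for illustration, not InstructPipe's actual representation.

```python
# Sketch of interpreting pipeline pseudocode (`out = node_type(in1, in2)` per line)
# into a node graph that an editor could render. Format is illustrative only.
import re

NODE_PATTERN = re.compile(r"^(\w+)\s*=\s*(\w+)\((.*)\)$")

def parse_pipeline(pseudocode: str) -> tuple[dict[str, str], list[tuple[str, str]]]:
    """Parse pseudocode lines into (node id -> node type) and (source, target) edges."""
    nodes: dict[str, str] = {}
    edges: list[tuple[str, str]] = []
    for line in pseudocode.strip().splitlines():
        match = NODE_PATTERN.match(line.strip())
        if not match:
            continue  # skip comments or malformed lines rather than failing
        name, node_type, args = match.groups()
        nodes[name] = node_type
        for arg in filter(None, (a.strip() for a in args.split(","))):
            edges.append((arg, name))
    return nodes, edges

if __name__ == "__main__":
    pseudocode = """
    frame = camera_input()
    caption = image_captioner(frame)
    speech = text_to_speech(caption)
    out = audio_output(speech)
    """
    nodes, edges = parse_pipeline(pseudocode)
    print(nodes)
    print(edges)
```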
ChatDirector: Enhancing Video Conferencing with Space-Aware Scene Rendering and Speech-Driven Layout Transition
Remote video conferencing systems (RVCS) are widely adopted in personal and professional communication. However, they often lack the co-presence experience of in-person meetings. This is largely due to the absence of intuitive visual cues and clear spatial relationships among remote participants, which can lead to speech interruptions and loss of attention. This paper presents ChatDirector, a novel RVCS that overcomes these limitations by incorporating space-aware visual presence and speech-aware attention transition assistance. ChatDirector employs a real-time pipeline that converts participants' RGB video streams into 3D portrait avatars and renders them in a virtual 3D scene. We also contribute a decision tree algorithm that directs the avatar layouts and behaviors based on participants' speech states. We report on results from a user study (N=16) where we evaluated ChatDirector. The satisfactory algorithm performance and complimentary subjective user feedback imply that ChatDirector significantly enhances communication efficacy and user engagement.
Xun Qian et al. (Purdue University) · CHI 2024 · Topics: Social & Collaborative VR; Mixed Reality Workspaces
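The speech-driven direction can be pictured as a small decision procedure that maps participants' speech states to a layout and gaze targets. The Python sketch below is an illustrative stand-in under assumed states and rules, not the paper's decision tree.

```python
# Illustrative speech-state-to-layout decision procedure; states, layouts,
# and rules are assumptions, not ChatDirector's algorithm.
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional

class SpeechState(Enum):
    SILENT = auto()
    SPEAKING = auto()
    ADDRESSING = auto()   # speaking while clearly directed at one person

@dataclass
class Participant:
    name: str
    state: SpeechState
    target: Optional[str] = None   # who they are addressing, if anyone

def direct_layout(participants, local_user):
    """Return a layout directive: which layout to use, whom to emphasize, and gaze targets."""
    speakers = [p for p in participants if p.state is not SpeechState.SILENT]
    if not speakers:
        return {"layout": "side-by-side", "focus": None, "gaze": {}}
    addressing = [p for p in speakers if p.state is SpeechState.ADDRESSING]
    if addressing and addressing[0].target == local_user:
        # Someone is addressing the local user: bring their avatar front and center.
        return {"layout": "face-to-face", "focus": addressing[0].name, "gaze": {}}
    if len(speakers) >= 2:
        # A pairwise exchange: angle the two speakers' avatars toward each other.
        a, b = speakers[0], speakers[1]
        return {"layout": "pairwise", "focus": a.name, "gaze": {a.name: b.name, b.name: a.name}}
    # Single speaker: everyone else's avatar turns toward them.
    speaker = speakers[0]
    gaze = {p.name: speaker.name for p in participants if p.name != speaker.name}
    return {"layout": "spotlight", "focus": speaker.name, "gaze": gaze}

if __name__ == "__main__":
    people = [Participant("Ana", SpeechState.ADDRESSING, target="Me"),
              Participant("Ben", SpeechState.SILENT)]
    print(direct_layout(people, local_user="Me"))
```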
UI Mobility Control in XR: Switching UI Positionings between Static, Dynamic, and Self Entities
Extended reality (XR) has the potential for seamless user interface (UI) transitions across people, objects, and environments. However, the design space, applications, and common practices of 3D UI transitions remain underexplored. To address this gap, we conducted a need-finding study with 11 participants, identifying and distilling a taxonomy based on three types of UI placements: affixed to static, dynamic, or self entities. We further surveyed 113 commercial applications to understand the common practices of 3D UI mobility control, where only 6.2% of these applications allowed users to transition UI between entities. In response, we built interaction prototypes to facilitate UI transitions between entities. We report on results from a qualitative user study (N=14) on 3D UI mobility control using our FingerSwitches technique, which suggests that perceived usefulness is affected by types of entities and environments. We aspire to tackle a vital need in UI mobility within XR.
Siyou Pei et al. (University of California Los Angeles) · CHI 2024 · Topics: Mixed Reality Workspaces; Immersion & Presence Research
Rapsai: Accelerating Machine Learning Prototyping of Multimedia Applications through Visual Programming
In recent years, there has been a proliferation of multimedia applications that leverage machine learning (ML) for interactive experiences. Prototyping ML-based applications is, however, still challenging, given complex workflows that are not ideal for design and experimentation. To better understand these challenges, we conducted a formative study with seven ML practitioners to gather insights about common ML evaluation workflows. This study helped us derive six design goals, which informed Rapsai. Rapsai features a node-graph editor to facilitate interactive characterization and visualization of ML model performance. Rapsai streamlines end-to-end prototyping with interactive data augmentation and model comparison capabilities in its no-coding environment. Our evaluation of Rapsai in four real-world case studies (N=15) suggests that practitioners can accelerate their workflow, make more informed decisions, analyze strengths and weaknesses, and holistically evaluate model behavior with real-world input.
Ruofei Du et al. (Google) · CHI 2023 · Topics: Human-LLM Collaboration; Computational Methods in HCI
Visual Captions: Augmenting Verbal Communication with On-the-fly Visuals
Video conferencing solutions like Zoom, Google Meet, and Microsoft Teams are becoming increasingly popular for facilitating conversations, and recent advancements such as live captioning help people better understand each other. We believe that the addition of visuals based on the context of conversations could further improve comprehension of complex or unfamiliar concepts. To explore the potential of such capabilities, we conducted a formative study through remote interviews (N=10) and crowdsourced a dataset of over 1500 sentence-visual pairs across a wide range of contexts. These insights informed Visual Captions, a real-time system that integrates with a videoconferencing platform to enrich verbal communication. Visual Captions leverages a fine-tuned large language model to proactively suggest relevant visuals in open-vocabulary conversations. We present the findings from a lab study (N=26) and an in-the-wild case study (N=10), demonstrating how Visual Captions can help improve communication through visual augmentation in various scenarios.
Xingyu "Bruce" Liu et al. (UCLA) · CHI 2023 · Topics: Voice User Interface (VUI) Design; Deaf & Hard-of-Hearing Support (Captions, Sign Language, Vibration)
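The general pattern of suggesting visuals from a live transcript can be sketched as prompting a language model over a sliding window of recent utterances and parsing structured suggestions. In the Python sketch below, call_llm is a placeholder backend, and the prompt and JSON format are assumptions for illustration, not the fine-tuned model or schema described above.

```python
# Sketch of transcript-to-visual-suggestion prompting with a placeholder LLM backend.
import json
from typing import Callable

PROMPT_TEMPLATE = (
    "You suggest visuals for a live conversation.\n"
    "Given the last few utterances, return a JSON list of objects with\n"
    '"query" (an image search phrase) and "reason" fields, or [] if no visual would help.\n\n'
    "Transcript:\n{transcript}\n"
)

def suggest_visuals(utterances: list[str],
                    call_llm: Callable[[str], str],
                    window: int = 3) -> list[dict]:
    """Ask the model for visual suggestions based on the most recent utterances."""
    transcript = "\n".join(utterances[-window:])
    raw = call_llm(PROMPT_TEMPLATE.format(transcript=transcript))
    try:
        suggestions = json.loads(raw)
    except json.JSONDecodeError:
        return []
    return [s for s in suggestions if isinstance(s, dict) and "query" in s]

if __name__ == "__main__":
    # Stub backend so the sketch runs without a real language model.
    fake_llm = lambda prompt: '[{"query": "Golden Gate Bridge at sunset", "reason": "place mentioned"}]'
    print(suggest_visuals(["I visited the Golden Gate Bridge last week."], fake_llm))
```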
Hidden Interfaces for Ambient Computing: Enabling Interaction in Everyday Materials through High-brightness Visuals on Low-cost Matrix Displays
Consumer electronics are increasingly using everyday materials to blend into home environments, often using LEDs or symbol displays under textile meshes. Our surveys (n=1499 and n=1501) show interest in interactive graphical displays for hidden interfaces; however, covering such displays significantly limits brightness, material possibilities, and legibility. To overcome these limitations, we leverage parallel rendering to enable ultrabright graphics that can pass through everyday materials. We unlock expressive hidden interfaces using rectilinear graphics on low-cost, mass-produced passive-matrix OLED displays. A technical evaluation across materials, shapes, and display techniques suggests a 3.6-40X brightness increase compared to more complex active-matrix OLEDs. We present interactive prototypes that blend into wood, textile, plastic, and mirrored surfaces. Survey feedback (n=1572) on our prototypes suggests that smart mirrors are particularly desirable. A lab evaluation (n=11) reinforced these findings and allowed us to also characterize performance from hands-on interaction with different content, materials, and under varying lighting conditions.
Alex Olwal et al. (Google Inc.) · CHI 2022 · Topics: Context-Aware Computing; Ubiquitous Computing; Smart Home Interaction Design
SilentSpeller: Towards mobile, hands-free, silent speech text entry using electropalatography
Speech is inappropriate in many situations, limiting when voice control can be used. Most unvoiced speech text entry systems cannot be used on the go due to movement artifacts. Using a dental retainer with capacitive touch sensors, SilentSpeller tracks tongue movement, enabling users to type by spelling words without voicing. SilentSpeller achieves an average 97% character accuracy in offline isolated word testing on a 1164-word dictionary. Walking has little effect on accuracy; average offline character accuracy was roughly equivalent on 107 phrases entered while walking (97.5%) or seated (96.5%). To demonstrate extensibility, the system was tested on 100 unseen words, leading to an average 94% accuracy. Live text entry speeds for seven participants averaged 37 words per minute at 87% accuracy. Comparing silent spelling to current practice suggests that SilentSpeller may be a viable alternative for silent mobile text entry.
Naoki Kimura et al. (The University of Tokyo) · CHI 2022 · Topics: Electrical Muscle Stimulation (EMS); Augmentative & Alternative Communication (AAC)
ProtoSound: A Personalized and Scalable Sound Recognition System for Deaf and Hard-of-Hearing Users
Recent advances have enabled automatic sound recognition systems for deaf and hard of hearing (DHH) users on mobile devices. However, these tools use pre-trained, generic sound recognition models, which do not meet the diverse needs of DHH users. We introduce ProtoSound, an interactive system for customizing sound recognition models by recording a few examples, thereby enabling personalized and fine-grained categories. ProtoSound is motivated by prior work examining sound awareness needs of DHH people and by a survey we conducted with 472 DHH participants. To evaluate ProtoSound, we characterized performance on two real-world sound datasets, showing significant improvement over the state of the art (e.g., +9.7% accuracy on the first dataset). We then deployed ProtoSound's end-user training and real-time recognition through a mobile application and recruited 19 hearing participants who listened to the real-world sounds and rated the accuracy across 56 locations (e.g., homes, restaurants, parks). Results show that ProtoSound personalized the model on-device in real-time and accurately learned sounds across diverse acoustic contexts. We close by discussing open challenges in personalizable sound recognition, including the need for better recording interfaces and algorithmic improvements.
Dhruv Jain et al. (University of Washington, Google) · CHI 2022 · Topics: Deaf & Hard-of-Hearing Support (Captions, Sign Language, Vibration); Motor Impairment Assistive Input Technologies
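Personalizing a recognizer from a few user recordings is commonly done with prototype-based few-shot classification: embed each clip, average the embeddings per class into a prototype, and classify new clips by nearest prototype. The Python sketch below shows that general pattern with a stand-in random-projection embedding; it is illustrative and not ProtoSound's actual on-device model.

```python
# Prototype-based few-shot classification sketch. The "embedding" is a random
# projection of a 128-d feature vector, standing in for a learned audio encoder.
import numpy as np

rng = np.random.default_rng(0)
EMBED_DIM = 32
_projection = rng.standard_normal((128, EMBED_DIM))

def embed(features: np.ndarray) -> np.ndarray:
    """Map a 128-d audio feature vector to an L2-normalized embedding."""
    z = features @ _projection
    return z / (np.linalg.norm(z) + 1e-9)

def build_prototypes(examples: dict[str, list[np.ndarray]]) -> dict[str, np.ndarray]:
    """Average the embeddings of each user-recorded class into one prototype."""
    return {label: np.mean([embed(x) for x in clips], axis=0)
            for label, clips in examples.items()}

def classify(features: np.ndarray, prototypes: dict[str, np.ndarray]) -> str:
    """Return the class whose prototype is closest to the query embedding."""
    q = embed(features)
    return min(prototypes, key=lambda label: np.linalg.norm(q - prototypes[label]))

if __name__ == "__main__":
    examples = {"doorbell": [rng.standard_normal(128) for _ in range(3)],
                "microwave": [rng.standard_normal(128) for _ in range(3)]}
    protos = build_prototypes(examples)
    print(classify(examples["doorbell"][0], protos))  # expected: "doorbell"
```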
Haptics with Input: Back-EMF in Linear Resonant Actuators to Enable Touch, Pressure and Environmental Awareness
Today's wearable and mobile devices typically use separate hardware components for sensing and actuation. In this work, we introduce new opportunities for the Linear Resonant Actuator (LRA), which is ubiquitous in such devices due to its capability for providing rich haptic feedback. By leveraging strategies to enable active and passive sensing capabilities with LRAs, we demonstrate their benefits and potential as self-contained I/O devices. Specifically, we use the back-EMF voltage to detect whether the LRA is tapped or touched, as well as how much pressure is being applied. Back-EMF sensing is already integrated into many motor and LRA drivers. We developed a passive low-power tap sensing method that uses just 37.7 µA. Furthermore, we developed active touch and pressure sensing, which is low-power, quiet (2 dB), and minimizes vibration. The sensing method works with many types of LRAs. We show applications, such as pressure-sensing side buttons on a mobile phone. We have also implemented our technique directly on an existing mobile phone's LRA to detect whether the phone is handheld or placed on a soft or hard surface. Finally, we show that this method can be used for haptic devices to determine if the LRA makes good contact with the skin. Our approach can add rich sensing capabilities to the ubiquitous LRA actuators without requiring additional sensors or hardware.
Artem Dementyev et al. · UIST 2020 · Topics: In-Vehicle Haptic, Audio & Multimodal Feedback; Vibrotactile Feedback & Skin Stimulation
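One way to picture back-EMF-based contact sensing is that touching the actuator damps its resonance, so the back-EMF ring-down decays faster. The Python sketch below classifies simulated ring-downs by their fitted decay rate; the thresholds, sampling setup, and feature choice are illustrative assumptions, not the paper's method.

```python
# Sketch: classify contact from the decay rate of a simulated back-EMF ring-down.
import numpy as np

SAMPLE_RATE = 8_000  # Hz, assumed ADC rate for the back-EMF channel

def decay_rate(back_emf: np.ndarray) -> float:
    """Fit an exponential decay to the ring-down envelope and return its rate (1/s)."""
    envelope = np.abs(back_emf)
    # Keep samples well above the noise floor and fit log-envelope vs. time.
    mask = envelope > 0.05 * envelope.max()
    t = np.arange(back_emf.size)[mask] / SAMPLE_RATE
    slope, _ = np.polyfit(t, np.log(envelope[mask] + 1e-9), 1)
    return -slope

def classify_contact(back_emf: np.ndarray) -> str:
    """Map the decay rate to a coarse contact state (illustrative thresholds)."""
    rate = decay_rate(back_emf)
    if rate < 80:
        return "no contact"
    if rate < 200:
        return "light touch"
    return "pressed"

if __name__ == "__main__":
    t = np.arange(0, 0.05, 1 / SAMPLE_RATE)
    # Simulated ~170 Hz resonator ring-downs with increasing damping.
    ring = lambda damping: np.exp(-damping * t) * np.sin(2 * np.pi * 170 * t)
    for label, damping in [("free", 40), ("touched", 150), ("pressed", 400)]:
        print(label, "->", classify_contact(ring(damping)))
```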
Wearable Subtitles: Augmenting Spoken Communication with Lightweight Eyewear for All-day Captioning
Mobile solutions can help transform speech and sound into visual representations for people who are deaf or hard-of-hearing (DHH). However, where handheld phones present challenges, head-worn displays (HWDs) could further communication through privately transcribed text, hands-free use, improved mobility, and socially acceptable interactions. In this work, we introduce a lightweight HWD system specifically designed to augment communication through sound transcription for a full workday. Through a hybrid, low-power architecture, we enable up to 15 hours of continuous use. We describe a large survey (n=501) and three user studies with 24 deaf/hard-of-hearing participants which inform our development and help us refine our prototypes. Our studies and prior research identify critical challenges for the adoption of HWD systems which we address through extended battery life, lightweight and balanced mechanical design (54 g), fitting options, and form factors that are compatible with current social norms.
Alex Olwal et al. · UIST 2020 · Topics: In-Vehicle Haptic, Audio & Multimodal Feedback; Eye Tracking & Gaze Interaction; Voice User Interface (VUI) Design
E-Textile Microinteractions: Augmenting Twist with Flick, Slide and Grasp Gestures for Soft Electronics
E-textile microinteractions advance cord-based interfaces by enabling the simultaneous use of precise continuous control and casual discrete gestures. We leverage the recently introduced I/O Braid sensing architecture to enable a series of user studies and experiments which help design suitable interactions and a real-time gesture recognition pipeline. Informed by a gesture elicitation study with 36 participants, we developed a user-dependent classifier for eight discrete gestures with 94% accuracy for 12 participants. In a formal evaluation we show that we can enable precise manipulation with the same architecture. Our quantitative targeting experiment suggests that twisting is faster than existing headphone button controls and is comparable in speed to a capacitive touch surface. Qualitative interview feedback indicates a preference for I/O Braid's interaction over that of in-line headphone controls. Our applications demonstrate how continuous and discrete gestures can be combined to form new, integrated e-textile microinteraction techniques for real-time continuous control, discrete actions and mode switching.
Alex Olwal et al. (Google Research) · CHI 2020 · Topics: Haptic Wearables; Hand Gesture Recognition; Electronic Textiles (E-textiles)
SensorSnaps: Integrating Wireless Sensor Nodes into Fabric Snap Fasteners for Textile Interfaces
Adding electronics to textiles can be time-consuming and requires technical expertise. We introduce SensorSnaps, low-power wireless sensor nodes that seamlessly integrate into the caps of fabric snap fasteners. SensorSnaps provide a new technique to quickly and intuitively augment any location on clothing with sensors. SensorSnaps securely attach to and detach from ubiquitous commercial snap fasteners. Using inertial measurement units, SensorSnaps detect tap and rotation gestures and support body motion tracking. We optimized the power consumption for SensorSnaps to work continuously for 45 minutes and up to 4 hours in capacitive touch standby mode. We present applications where SensorSnaps are used as gesture interfaces for a music player controller, chorded keyboard, cursor control, and motion tracking suit. The user study showed that SensorSnaps could be attached in around 71 seconds, similar to attaching off-the-shelf snaps, and participants found the gestures easy to learn and perform. SensorSnaps will allow anyone to add sophisticated sensing capabilities to ubiquitous snap fasteners.
Artem Dementyev et al. · UIST 2019 · Topics: Haptic Wearables; Shape-Changing Interfaces & Soft Robotic Materials; On-Skin Display & On-Skin Input
I/O Braid: Scalable Touch-Sensitive Lighted Cords Using Spiraling, Repeating Sensing Textiles and Fiber Optics
We introduce I/O Braid, an interactive textile cord with embedded sensing and visual feedback. I/O Braid senses proximity, touch, and twist through a spiraling, repeating braiding topology of touch matrices. This sensing topology is uniquely scalable, requiring only a few sensing lines to cover the whole length of a cord. The same topology allows us to embed fiber optic strands to integrate co-located visual feedback. We provide an overview of the enabling braiding techniques, design considerations, and approaches to gesture detection. These allow us to derive a set of interaction techniques, which we demonstrate with different form factors and capabilities. Our applications illustrate how I/O Braid can invisibly augment everyday objects, such as touch-sensitive headphones and interactive drawstrings on garments, while enabling discoverability and feedback through embedded light sources.
Alex Olwal et al. · UIST 2018 · Topics: Haptic Wearables; Shape-Changing Interfaces & Soft Robotic Materials