A Framework for Efficient Development and Debugging of Role-Playing Agents with Large Language Models

We propose a framework that leverages large language models (LLMs) to semi-automate the development and debugging of role-playing agents, reducing the need for extensive manual effort. Role-playing agents powered by LLMs offer scalable solutions that enhance communication and interaction in applications such as employee training, healthcare, and software development. However, creating prompts manually is time-consuming, and sequential debugging makes it difficult to anticipate conversation flow, increasing cognitive load. Our framework addresses these challenges by generating and summarizing dialogue examples, providing a clearer overview of conversation flow and reducing mental workload. It also enhances role-playing quality by mitigating LLMs' tendency to produce generic or vague responses. In a user study, the proposed method significantly improved perceived workload on five of the six NASA-TLX dimensions. Moreover, it can generate agents comparable to those created with expertly crafted prompts. The framework is model-agnostic, enabling integration of advances in LLM capabilities and prompting techniques, and is applicable to diverse domains.

2025 · Hirohane Takagi et al. · Agent Personality & Anthropomorphism · Human-LLM Collaboration · AI-Assisted Creative Writing · IUI

User-Guided Correction of Reconstruction Errors in Structure-from-Motion

We propose a user-guided method to correct reconstruction errors in Structure-from-Motion (SfM) processes. SfM takes a set of camera images as input and estimates the cameras' poses and three-dimensional point clouds based on keypoint matching. However, scenes with repetitive or similar structures often produce false matches, leading to inaccuracies in camera pose estimation. While automatic methods for removing false matches exist, achieving perfect accuracy with them remains challenging. Conversely, human intervention can ensure high accuracy, but manually identifying and eliminating false matches is a tedious and error-prone process. Our method strikes a balance with a more efficient user-guided approach: users provide approximate camera poses, which the system then uses to detect false matches. Specifically, the system examines overlaps between the view frustums of camera pairs after the user's adjustments, classifying a pair as a false match if no overlap is found. This leverages the user's recollection of camera movements during scene capture to guide the reconstruction process. Evaluation with test cases and a user study confirms that our technique can efficiently remove false matches and enable accurate reconstruction of camera poses.

2025 · Sotaro Kanazawa et al. · User Research Methods (Interviews, Surveys, Observation) · Computational Methods in HCI · IUI

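The frustum-overlap criterion at the heart of this method can be sketched in a few lines. The following is a simplified illustration rather than the paper's implementation: camera poses are assumed to be given as a position plus a rotation matrix whose rows are the camera's right/up/forward axes, the frustum is symmetric with fixed near/far planes and field of view, and overlap is approximated by sampling points inside one frustum and testing them against the other instead of computing an exact geometric intersection.

```python
import math

NEAR, FAR, HALF_FOV = 0.1, 10.0, math.radians(30)

def point_in_frustum(p, cam):
    """Test whether world point p lies inside a camera's view frustum.
    cam = {'pos': [x, y, z], 'R': 3x3 rotation, rows = right/up/forward}."""
    d = [p[i] - cam['pos'][i] for i in range(3)]
    x = sum(cam['R'][0][i] * d[i] for i in range(3))  # camera-space coords
    y = sum(cam['R'][1][i] * d[i] for i in range(3))
    z = sum(cam['R'][2][i] * d[i] for i in range(3))
    if not (NEAR <= z <= FAR):
        return False
    limit = z * math.tan(HALF_FOV)
    return abs(x) <= limit and abs(y) <= limit

def frustums_overlap(cam_a, cam_b, steps=8):
    """Approximate overlap test: sample a grid of points inside one camera's
    frustum and check them against the other camera, in both directions."""
    def samples(cam):
        pts = []
        for zi in range(1, steps + 1):
            z = NEAR + (FAR - NEAR) * zi / steps
            lim = z * math.tan(HALF_FOV)
            for xi in (-1, 0, 1):
                for yi in (-1, 0, 1):
                    x, y = xi * lim, yi * lim
                    # camera-space sample -> world space
                    pts.append([cam['pos'][i]
                                + cam['R'][0][i] * x
                                + cam['R'][1][i] * y
                                + cam['R'][2][i] * z
                                for i in range(3)])
        return pts
    return (any(point_in_frustum(p, cam_b) for p in samples(cam_a))
            or any(point_in_frustum(p, cam_a) for p in samples(cam_b)))
```

Classifying a camera pair as a false match when `frustums_overlap` returns False mirrors the paper's criterion; a production system would likely use an exact convex-polytope intersection test rather than point sampling.
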
VocabEncounter: NMT-powered Vocabulary Learning by Presenting Computer-Generated Usages of Foreign Words into Users' Daily Lives

We demonstrate that recent natural language processing (NLP) techniques introduce a new paradigm of vocabulary learning that benefits from both micro-learning and usage-based learning by generating and presenting usages of foreign words based on the learner's context. The user can then become familiar with how the words are used, without allocating dedicated study time, by seeing example usages during daily activities such as Web browsing. To achieve this, we introduce VocabEncounter, a vocabulary-learning system that encapsulates given words into materials the user is reading in near real time by leveraging recent NLP techniques. After confirming with crowdworkers that the system generates translated phrases of human-comparable quality, we conducted a series of user studies, which demonstrated its effectiveness for vocabulary learning and the favorable experiences it provides. Our work shows how NLP-based generation techniques can transform our daily activities into a field for vocabulary learning.

2022 · Riku Arakawa et al. · Carnegie Mellon University · Generative AI (Text, Image, Music, Video) · Human-LLM Collaboration · Programming Education & Computational Thinking · CHI

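The presentation idea, encapsulating a foreign word into text the user is already reading, can be illustrated with a toy substitution. This sketch is purely illustrative and hypothetical: the function name and gloss format are invented, and it uses naive string replacement where the real system uses neural machine translation to produce fluent usages.

```python
def embed_vocabulary(text, vocab):
    """Toy illustration of VocabEncounter's presentation idea: swap a known
    native-language word for the foreign word being learned, with a gloss.
    vocab maps native word -> foreign word. (Hypothetical sketch, not the
    paper's NMT pipeline.)"""
    for native, foreign in vocab.items():
        text = text.replace(native, f"{foreign} ({native})")
    return text
```

Note that naive replacement yields awkward output such as "an pomme (apple)"; producing grammatically natural usages in context is exactly why the actual system relies on NMT-based generation rather than in-place substitution.
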
Interactive Hyperparameter Optimization with Paintable Timelines

We propose a method to integrate more interactivity into automatic hyperparameter optimization systems to leverage the user's prior knowledge of parameter distributions. In our method, the user continuously observes the automatic optimization's progress and dynamically specifies where to search in the parameter space. We present a prototype implementation of an interactive dashboard for an optimizer to show our method's feasibility. The dashboard's main feature is a "paintable timeline", where the user can not only observe the past parameter values tested, as in a standard timeline, but also specify the range of future parameters to be tested with simple painting operations. We show three examples where user intervention might improve the performance of automatic optimization. We ran a user study with experts; the results show that, with prior knowledge about the parameter distribution of the target problem, interactive optimization can reach better results than fully automatic optimization.

2021 · Keita Higuchi et al. · Human-LLM Collaboration · AutoML Interfaces · DIS

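The paintable-timeline idea, constraining where future trials sample, can be sketched with a minimal random-search loop. This is a simplified stand-in for the paper's optimizer and dashboard: the function name, the `painted` tuple format, and the use of plain random search are all assumptions made for illustration.

```python
import random

def interactive_random_search(objective, bounds, n_trials, painted=None, seed=0):
    """Random search over one parameter, where 'painted' regions narrow the
    sampling range for a window of future trials (a sketch of the
    paintable-timeline idea, not the paper's implementation).

    painted: list of (start_trial, end_trial, low, high); a trial index in
    [start_trial, end_trial) samples uniformly from [low, high] instead of
    the full bounds."""
    rng = random.Random(seed)
    painted = painted or []
    best_x, best_y = None, float('inf')
    history = []
    for t in range(n_trials):
        lo, hi = bounds
        for s, e, plo, phi in painted:
            if s <= t < e:          # the user painted this timeline region
                lo, hi = plo, phi
        x = rng.uniform(lo, hi)
        y = objective(x)
        history.append((t, x, y))
        if y < best_y:
            best_x, best_y = x, y
    return best_x, best_y, history
```

In the real system the painted ranges arrive interactively while the optimizer runs, and the timeline visualizes `history`; here they are passed up front only to keep the sketch self-contained.
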
Phonetroller: Visual Representations of Fingers for Precise Touch Input when using a Phone in VR

Smartphone touch screens are potentially attractive for interaction in virtual reality (VR). However, the user cannot see the phone or their hands in a fully immersive VR setting, impeding their ability to perform precise touch input. We propose mounting a mirror above the phone screen such that the front-facing camera captures the thumbs on or near the screen. This enables the creation of semi-transparent overlays of thumb shadows and inference of fingertip hover points with deep learning, which help the user aim for targets on the phone. A study compares the effect of visual feedback on touch precision in a controlled task and qualitatively evaluates three example applications demonstrating the potential of the technique. The results show that the enabled style of feedback is effective for thumb-size targets, and that the VR experience can be enriched by using smartphones as VR controllers supporting precise touch input.

2021 · Fabrice Matulic et al. · Preferred Networks Inc. · Social & Collaborative VR · Immersion & Presence Research · CHI

PenSight: Enhanced Interaction with a Pen-Top Camera

We propose mounting a downward-facing camera above the top end of a digital tablet pen. This creates a unique and practical viewing angle for capturing the pen-holding hand and the immediate surroundings, which can include the other hand. We describe the fabrication of a prototype device and explore the enabled interaction design space, including dominant and non-dominant hand pose recognition, tablet grip detection, hand gestures, capturing physical content in the environment, and detecting users and pens. A deep learning computer vision pipeline is developed for classification, regression, and keypoint detection to enable these interactions. Example applications demonstrate usage scenarios, and a qualitative user evaluation confirms the potential of the approach.

2020 · Fabrice Matulic et al. · Preferred Networks Inc. · Hand Gesture Recognition · Human Pose & Activity Recognition · Prototyping & User Testing · CHI

Unimanual Pen+Touch Input Using Variations of Precision Grip Postures

We introduce a new pen input space by forming postures with the same hand that grips the pen while writing, drawing, or selecting. The postures contact the multitouch surface around the pen, enabling detection without special sensors. A formative study investigates the effectiveness, accuracy, and comfort of 33 candidate postures in controlled tasks. The results indicate a useful subset of postures. Using raw capacitive sensor data captured in the study, a convolutional neural network is trained to recognize 10 postures in real time. This recognizer is used to create application demonstrations for pen-based document annotation and vector drawing. A small usability study shows the approach is feasible.

2018 · Drini Cami et al. · Hand Gesture Recognition · Full-Body Interaction & Embodied Input · UIST
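The recognition step, mapping a raw capacitive frame to one of the posture classes, can be illustrated with a deliberately simplified classifier. The paper trains a convolutional neural network; the sketch below substitutes a nearest-centroid classifier over flattened sensor frames purely to show the input/output shape of the task, and the names and data format are invented for illustration.

```python
def classify_posture(frame, centroids):
    """Stand-in for the paper's CNN recognizer: nearest-centroid
    classification over a raw capacitive frame (a 2D grid of sensor
    values), flattened to a vector. centroids maps posture label ->
    flattened reference frame. (Illustrative only.)"""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    flat = [v for row in frame for v in row]
    return min(centroids, key=lambda label: sq_dist(flat, centroids[label]))
```

A real recognizer learns spatial features from labeled capture data instead of comparing against fixed templates, but the interface is the same: one capacitive frame in, one posture label out.
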