Outline and Detail: A Semantic-Driven Framework for Layered 2D Character Generation
2D cartoon-style digital characters represent an important art form in games, animation, and virtual live streaming. However, traditional 2D character creation workflows involve tedious manual layering, complex skeleton rigging, and professional animation skills, posing challenges for independent studios and non-professional users. While existing AI generation technologies can quickly create visual content, they typically produce non-layered, difficult-to-edit composite images that cannot be integrated into current workflows. This paper presents Spiritus, a semantic-driven 2D character generation framework. Unlike existing text-based AI animation workflows, Spiritus integrates mixed text and sketch inputs, achieving character image generation and automatic component segmentation through an open mask library and semantic matching. We validated the system's effectiveness in character generation freedom, character animation quality, and technical barrier reduction through comparative evaluation of user experiment results. Finally, we explored the possibilities of applying generated characters to various workflows and scenarios, including game development, animation production, and interactive illustrations.
2025 · Qirui Sun et al. · Music Composition & Sound Design Tools · Video Production & Editing · 3D Modeling & Animation · UIST

Conversational Explanations: Discussing Explainable AI with Non-AI Experts
Explainable AI (XAI) aims to provide insights into the decisions made by AI models. To date, most XAI approaches provide only one-time, static explanations, which cannot cater to users' diverse knowledge levels and information needs. Conversational explanations have been proposed as an effective method to customize XAI explanations. However, building conversational explanation systems is hindered by the scarcity of training data. Training with synthetic data faces two main challenges: lack of data diversity and hallucination in the generated data. To alleviate these issues, we introduce a repetition penalty to promote data diversity and exploit a hallucination detector to filter out untruthful synthetic conversation turns. We conducted both automatic and human evaluations on the proposed system, fEw-shot Multi-round ConvErsational Explanation (EMCEE). For automatic evaluation, EMCEE achieves relative improvements of 81.6% in BLEU and 80.5% in ROUGE compared to the baselines. EMCEE also mitigates the degeneration of data quality caused by training on synthetic data. In human evaluations (N=60), EMCEE outperforms baseline models and the control group in improving users' comprehension, acceptance, trust, and collaboration with static explanations by large margins. Through a fine-grained analysis of model responses, we further demonstrate that training on self-generated synthetic data improves the model's ability to generate more truthful and understandable answers, leading to better user interactions. To the best of our knowledge, this is the first conversational explanation method that can answer free-form user questions following static explanations.
2025 · Tong Zhang et al. · Explainable AI (XAI) · IUI

AppAgent: Multimodal Agents as Smartphone Users
Recent advancements in large language models (LLMs) have led to the creation of intelligent agents capable of performing complex tasks. This paper introduces a novel LLM-based multimodal agent framework designed to operate smartphone applications. Our framework allows the agent to mimic human-like interactions such as tapping and swiping through a simplified action space, eliminating the need for system back-end access and enhancing its versatility across various apps. Central to the agent's functionality is an innovative in-context learning method, where it either autonomously explores or learns from human demonstrations, creating a knowledge base used to execute complex tasks across diverse applications. We conducted extensive testing with our agent on over 50 tasks spanning 10 applications, ranging from social media to sophisticated image editing tools. Additionally, a user study confirmed the agent's superior performance and practicality in handling a diverse array of high-level tasks, demonstrating its effectiveness in real-world settings. Our project page is available at https://appagent-official.github.io/.
2025 · Chi Zhang et al. · Westlake University, School of Engineering · Human-LLM Collaboration · CHI

Bridging Simulation and Reality: Augmented Virtuality for Mass Casualty Triage Training - From Landscape Analysis to Empirical Insights
Live drills are the gold standard for mass casualty incident (MCI) training but are often too resource-intensive for widespread implementation. Immersive technologies offer a promising alternative, but can they deliver comparable fidelity and effectiveness? Working with a local disaster response academy, this paper investigated the potential of Augmented Virtuality (AV) in MCI training through two phases. First, we conducted a landscape analysis of 126 papers across the virtuality continuum, revealing trends in population, training focus, and evaluation metrics. Second, we empirically evaluated an AV system for mass casualty triage training against traditional role-playing and Virtual Reality (VR) approaches, involving 60 trainees in an operational curriculum. Results indicated that both AV and VR surpassed traditional simulations, with AV's tactile integration significantly enhancing physical engagement, satisfaction, and triage accuracy. Through the lens of triage, we discussed the broader practical implications of integrating immersive technologies like AV into real-world MCI education.
2025 · Yang Chen et al. · National University of Singapore, College of Design and Engineering · Social & Collaborative VR · VR Medical Training & Rehabilitation · CHI

Exploring the Design of Human Speech Indicators to Enhance Waiting Experience in Voice User Interface
Waiting for system loading is a common scenario that often diminishes user experience, leading to dissatisfaction. Well-established visual indicators like progress bars cannot be directly applied to interactions with voice assistants (VAs) like Siri. As VAs continue to rise in popularity, this research aims to explore the design of auditory indicators, particularly human speech, for optimizing waiting experiences in Voice User Interfaces (VUIs). We first organized focus groups (N=35) to identify design considerations for speech indicators, uncovering design opportunities in integrating explanations and humor. Subsequently, we conducted an empirical study (N=30) to evaluate the effects of speech indicators with two levels of explanation and humor on the waiting experience, measured by attention, perceived time, pleasure, and overall satisfaction, during both short and long loading durations. Our findings suggest significant potential for incorporating explanations and humor into VUIs, offering actionable insights for designing effective speech indicators that improve waiting experiences.
2025 · Wenan Li et al. · Information Hub, The Hong Kong University of Science and Technology (Guangzhou) · Voice User Interface (VUI) Design · Agent Personality & Anthropomorphism · CHI

Enhancing UX Evaluation Through Collaboration with Conversational AI Assistants: Effects of Proactive Dialogue and Timing
Usability testing is vital for enhancing the user experience (UX) of interactive systems. However, analyzing test videos is complex and resource-intensive. Recent AI advancements have spurred exploration into human-AI collaboration for UX analysis, particularly through natural language. Unlike user-initiated dialogue, our study investigated the potential of proactive conversational assistants to aid UX evaluators through automatic suggestions at three distinct times: before, in sync with, and after potential usability problems. We conducted a hybrid Wizard-of-Oz study involving 24 UX evaluators, using ChatGPT to generate automatic problem suggestions and a human actor to respond to impromptu questions. While timing did not significantly impact analytic performance, suggestions appearing after potential problems were preferred, enhancing trust and efficiency. Participants found the automatic suggestions useful, but they collectively identified more than twice as many problems, underscoring the irreplaceable role of human expertise. Our findings also offer insights into future human-AI collaborative tools for UX evaluation.
2024 · Emily Kuang et al. · Rochester Institute of Technology · Human-LLM Collaboration · Prototyping & User Testing · Computational Methods in HCI · CHI

Improving Attention Using Wearables via Haptic and Multimodal Rhythmic Stimuli
Rhythmic light, sound, and haptic stimuli can improve cognition through neural entrainment and by modifying autonomic nervous system function. However, the effects and user experience of using wearables for inducing such rhythmic stimuli have been under-investigated. We conducted a study with 20 participants to understand the effects of rhythmic stimulation wearables on attention. We found that combined sound and light stimuli from a glasses device provided the strongest improvement to attention but were the least usable and socially acceptable. Haptic vibration stimuli from a wristband also improved attention and were the most usable and socially acceptable. Our field study (N=12) with haptic stimuli from a smartwatch showed that such systems can be easy to use and were used frequently in a range of contexts, but more exploration is needed to improve comfort. Our work contributes to developing future wearables to support attention and cognition.
2024 · Nathan W Whitmore et al. · Massachusetts Institute of Technology · Vibrotactile Feedback & Skin Stimulation · Haptic Wearables · Foot & Wrist Interaction · CHI

SoK: An Exhaustive Taxonomy of Display Issues for Mobile Applications
Display issues, often arising from design inconsistencies or software problems, can have a significant impact on both user experience and system functionality. This study focuses on three primary challenges in the field of display issues: the absence of a standardized classification system, the limitations of existing detection tools, and the inadequacy of available data. To systematically address these challenges, we introduce a Comprehensive Display Issue Analysis Framework (DIS). Utilizing this framework, we construct a comprehensive and industry-validated taxonomy for display issues. When evaluating the capabilities of existing detection tools and the completeness of available data against this taxonomy, we find that current mainstream tools can identify only 77% of the cataloged display issues. This finding suggests that, although the field has received some attention from the industry, there is still room for further improvement and research. This study not only deepens our understanding of the classification of display issues and the capabilities of detection tools, but also provides valuable insights for future research and applications in this domain.
2024 · Liming Nie et al. · Interactive Data Visualization · Prototyping & User Testing · IUI

AI and Disaster Risk: A Practitioner Perspective
Emerging techniques developed by AI researchers promise to offer the capacity to support disaster risk management (DRM) by making data collection or analysis practices faster, less costly, or more accurate. However, in every socially consequential domain in which AI tools have been applied, these technologies have been demonstrated to have some degree of negative consequences. This paper explores an attempt to convene technical experts in the area of DRM to discuss potential negative impacts and their approaches toward mitigating these impacts, as well as to identify some of the overarching challenges. In doing so, we contribute new findings about a domain that has received relatively little attention from critical and ethical AI researchers, and the opportunities and limitations that are presented by working with domain experts to evaluate the social consequences of emerging technologies.
2022 · Aparna Moitra et al. · Disaster Response · CSCW

Becoming Interdisciplinary: Fostering Critical Engagement With Disaster Data
Information systems such as mapping platforms, algorithms, and databases are a central component of how society responds to the threats posed by disasters. However, these systems have come under increasing criticism in recent years for prioritizing technical disciplines over insights from the humanities and social sciences, and for failing to adequately incorporate the perspectives of at-risk or affected communities. This paper describes a unique month-long workshop that convened interdisciplinary experts to collaborate on projects related to flood data. In addition to findings about the practical accomplishment of interdisciplinary collaboration, we offer three interrelated contributions. First, we position interdisciplinarity as a critical practice and offer a detailed example of how we staged this process. We then discuss the benefits to interdisciplinarity of expanding the range of temporal logics normally deployed in design workshops. Finally, we reflect on approaches to evaluating the event's contributions toward sustained critique and reform of expert practice.
2021 · Robert Soden et al. · Data Work Across Contexts and Disciplines · CSCW

Real Differences between OT and CRDT in Correctness and Complexity for Consistency Maintenance in Co-Editors
OT (Operational Transformation) was invented for supporting real-time co-editors in the late 1980s and has evolved into a core technique widely used in today's working co-editors and adopted in industrial products. CRDT (Commutative Replicated Data Type) for co-editors was first proposed around 2006, under the name of WOOT (WithOut Operational Transformation). Follow-up CRDT variations are commonly labeled as "post-OT" techniques capable of making concurrent operations natively commutative in co-editors. On top of that, CRDT solutions have made broad claims of superiority over OT solutions, and have often portrayed OT as an incorrect and inefficient technique. Over one decade later, however, CRDT is rarely found in working co-editors; OT remains the choice for building the vast majority of today's co-editors. Contradictions between this reality and CRDT's purported advantages have been the source of much confusion and debate in the co-editing research and developer communities. To seek truth from facts, we set out to conduct a comprehensive and critical review of representative OT and CRDT solutions and the working co-editors based on them. From this work, we have made important discoveries about OT and CRDT, and revealed facts and evidence that refute CRDT's claims over OT on all accounts. These discoveries help explain the underlying reasons for the choice between OT and CRDT in the real world. We report these results in a series of three articles. In this article (the second in the series), we reveal the differences between OT and CRDT in their basic approaches to realizing the same general transformation and how such differences have resulted in different technical challenges and consequential correctness and complexity issues. Moreover, we reveal hidden complexity and algorithmic flaws in representative CRDT solutions, and discuss common myths and facts related to the correctness and complexity of OT and CRDT. We hope the discoveries from this work help clear up common myths and confusions surrounding OT and CRDT, and accelerate progress in co-editing technology for real-world applications.
2020 · David Q. Sun et al. · Collaboration: Creating and Writing Together · CSCW

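The "general transformation" this abstract refers to can be illustrated with a minimal sketch of inclusion transformation for two concurrent single-character inserts. The `(position, char, site_id)` encoding and function names below are illustrative assumptions, not notation from the article itself:

```python
def apply_op(doc, op):
    """Apply a single-character insert op = (position, char, site_id)."""
    pos, ch, _site = op
    return doc[:pos] + ch + doc[pos:]

def it_insert_insert(op1, op2):
    """Inclusion transformation: adjust op1 so it can be applied after
    op2 has already taken effect. Ties at the same position are broken
    deterministically by site id so all replicas converge."""
    p1, c1, s1 = op1
    p2, _c2, s2 = op2
    if p1 < p2 or (p1 == p2 and s1 < s2):
        return (p1, c1, s1)      # op1 is unaffected by op2
    return (p1 + 1, c1, s1)      # shift right past op2's inserted char

# Two sites concurrently insert into "ab" at position 1.
op1 = (1, "x", 1)  # site 1
op2 = (1, "y", 2)  # site 2
site_a = apply_op(apply_op("ab", op1), it_insert_insert(op2, op1))
site_b = apply_op(apply_op("ab", op2), it_insert_insert(op1, op2))
assert site_a == site_b == "axyb"  # both replicas converge
```

Full OT systems also transform inserts against deletes (and vice versa) and handle operation histories; this toy shows only the insert-insert case at the heart of the convergence argument.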
The Disaster and Climate Change Artathon: Staging Art/Science Collaborations in Crisis Informatics
Information systems increasingly shape our knowledge of crises such as disasters and climate change. While these tools improve our capacity to understand, prepare for, and mitigate such challenges, critical questions are being raised about how their design shapes public imagination of these problems and delimits potential solutions. Prior work in human-computer interaction (HCI) has pointed to art/science collaboration as one approach for helping to explore such questions. As an attempt to draw on this potential, our team designed and facilitated a 2-day "artathon" that brought together artists and scientists to create new works of art based on disaster and climate data. Reflecting on the artathon and its outcomes, we contribute two sets of findings. First, we articulate opportunities, suggested by the artwork, for expanding research and design in crisis informatics. Second, we offer suggestions for HCI researchers seeking to stage successful art/science collaborations or similar interdisciplinary events.
2020 · Robert Soden et al. · Sustainable HCI · Climate Change Communication Tools · Human-Nature Relationships (More-than-Human Design) · DIS

Why do people watch others eat food? An Empirical Study on the Motivations and Practices of Mukbang Viewers
We present a mixed-methods study of viewers on their practices and motivations around watching mukbang video streams of people eating large quantities of food. Viewers' experiences provide insight into future technologies for multisensorial video streams and technology-supported commensality (eating with others). We surveyed 104 viewers and interviewed 15 of them about their attitudes and reflections on their mukbang viewing habits, the physiological aspects of watching someone eat, and their perceived social relationship with mukbangers. Based on our findings, we propose design implications for remote commensality and for synchronized multisensorial video streaming content.
2020 · Laurensia Anjani et al. · Nanyang Technological University · Live Streaming & Spectating Experience · Live Streaming & Content Creators · CHI

By the Community & For the Community: A Deep Learning Approach to Assist Collaborative Editing in Q&A Sites
Community edits to questions and answers (called post edits) play an important role in improving content quality in Stack Overflow. Our study of post edits in Stack Overflow shows that a large number of edits concern formatting, grammar, and spelling. These post edits usually involve small-scale sentence edits, and our survey of trusted contributors suggests that most of them care much or very much about such small sentence edits. To assist users in making small sentence edits, we develop an edit-assistance tool for identifying minor textual issues in posts and recommending sentence edits for correction. We formulate the sentence editing task as a machine translation problem, in which an original sentence is "translated" into an edited sentence. Our tool implements a character-level Recurrent Neural Network (RNN) encoder-decoder model, trained with about 6.8 million original-edited sentence pairs from Stack Overflow post edits. We evaluate our edit-assistance tool using a large-scale archive of post edits, a field study of assisting a novice post editor, and a survey of trusted contributors. Our evaluation demonstrates the feasibility of training a deep learning model with post edits by the community and then using the trained model to assist post editing for the community.
2018 · Chunyang Chen et al. · Collaboration in Online Communities · CSCW

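The machine-translation framing described in this abstract amounts to treating each original/edited sentence as a pair of character sequences for an encoder-decoder model. A minimal data-preparation sketch, with toy pairs and a hypothetical tokenizer that makes no claim about the authors' actual pipeline:

```python
def to_char_ids(text, vocab):
    """Map a sentence to a list of character ids, growing the
    vocabulary as new characters are encountered."""
    return [vocab.setdefault(ch, len(vocab)) for ch in text]

# Toy original-edited sentence pairs, standing in for pairs mined
# from community post edits.
pairs = [
    ("i use python", "I use Python."),
    ("teh code fails", "The code fails."),
]

vocab = {}
training_data = [
    (to_char_ids(src, vocab), to_char_ids(tgt, vocab))
    for src, tgt in pairs
]
# Each element is a (source_ids, target_ids) pair ready to feed a
# character-level sequence-to-sequence model.
```

The actual system would train an RNN encoder-decoder on millions of such pairs; this sketch only shows how the "translation" formulation turns edits into supervised sequence pairs.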
Design Vocabulary for Human–IoT Systems Communication
Digital devices and intelligent systems are becoming popular and ubiquitous all around us. However, they seldom provide sufficient feedforward and feedback to reassure users about their current status and indicate what actions they are about to perform. In this study, we selected and analyzed nine concept videos on future IoT products/systems. Through systematic analysis of the interactions and communications of users with the machines and systems demonstrated in the videos, we extracted 38 design vocabulary items and clustered them into 12 groups: Active, Request, Trigger functions, Approve, Reject, Notify, Recommend, Guide, Show problems, Express emotions, Exchange info, and Socialize. This framework can not only inspire designers to create self-explanatory intelligence, but also support developers in providing a language structure at different levels of the periphery of human attention. Through the enhancement of situated awareness, human–IoT system interaction can become more seamless and graceful.
2018 · Yaliang Chuang et al. · Eindhoven University of Technology · IoT Device Privacy · Context-Aware Computing · Ubiquitous Computing · CHI