PlanFitting: Personalized Exercise Planning with Large Language Model-driven Conversational Agent
Creating personalized and actionable exercise plans often requires iteration with experts, which can be costly and inaccessible to many individuals. This work explores the capabilities of Large Language Models (LLMs) in addressing these challenges. We present PlanFitting, an LLM-driven conversational agent that assists users in creating and refining personalized weekly exercise plans. By engaging users in free-form conversations, PlanFitting helps elicit users’ goals, availabilities, and potential obstacles, and enables individuals to generate personalized exercise plans aligned with established exercise guidelines. Through a user study, an intrinsic evaluation, and an expert evaluation, we demonstrate PlanFitting’s ability to guide users in creating tailored, actionable, and evidence-based plans. We discuss future design opportunities for LLM-driven conversational agents to create plans that better comply with exercise principles and accommodate personal constraints.
2025 · Donghoon Shin et al. · Human-LLM Collaboration; Fitness Tracking & Physical Activity Monitoring · CUI

ELMI: Interactive and Intelligent Sign Language Translation of Lyrics for Song Signing
d/Deaf and hearing song-signers have become prevalent across video-sharing platforms, but translating songs into sign language remains cumbersome and inaccessible. Our formative study revealed the challenges song-signers face, including semantic, syntactic, expressive, and rhythmic considerations in translation. We present ELMI, an accessible song-signing tool that assists in translating lyrics into sign language. ELMI enables users to edit glosses line by line, with lyric and music-video snippets synced in real time. Users can also chat with a large language model-driven AI to discuss meaning, glossing, emoting, and timing. Through an exploratory study with 13 song-signers, we examined how ELMI facilitates their workflows and how song-signers leverage and respond to the LLM-driven chat for translation. Participants successfully adopted ELMI for song-signing and engaged in active discussion throughout. They also reported improved confidence and independence in their translations, finding ELMI encouraging, constructive, and informative. We discuss research and design implications for accessible and culturally sensitive song-signing translation tools.
2025 · Suhyeon Yoo et al. (University of Toronto, Computer Science) · Voice User Interface (VUI) Design; Conversational Chatbots; Voice Accessibility · CHI

ExploreSelf: Fostering User-driven Exploration and Reflection on Personal Challenges with Adaptive Guidance by Large Language Models
Expressing stressful experiences in words is proven to improve mental and physical health, but individuals often disengage from writing interventions as they struggle to organize their thoughts and emotions. Reflective prompts have been used to provide direction, and large language models (LLMs) have demonstrated the potential to provide tailored guidance. However, current systems often limit users' flexibility to direct their reflections. We thus present ExploreSelf, an LLM-driven application designed to empower users to control their reflective journey, providing adaptive support through dynamically generated questions. Through an exploratory study with 19 participants, we examine how participants explore and reflect on personal challenges using ExploreSelf. Our findings demonstrate that participants valued the flexible navigation of adaptive guidance to control their reflective journey, leading to deeper engagement and insight. Building on our findings, we discuss the implications of designing LLM-driven tools that facilitate user-driven and effective reflection on personal challenges.
2025 · Inhwa Song et al. (KAIST, School of Computing) · Human-LLM Collaboration; Mental Health Apps & Online Support Communities · CHI

A Matter of Perspective(s): Contrasting Human and LLM Argumentation in Subjective Decision-Making on Subtle Sexism
In subjective decision-making, where decisions are based on contextual interpretation, Large Language Models (LLMs) can be integrated to present users with additional rationales to consider. The diversity of these rationales is mediated by the ability to consider the perspectives of different social actors; however, it remains unclear whether and how models differ in the distribution of perspectives they provide. We compare the perspectives taken by humans and different LLMs when assessing subtle sexism scenarios. We show that these perspectives can be classified within a finite set (perpetrator, victim, decision-maker) and are consistently present in the argumentation produced by humans and LLMs, but in different distributions and combinations, revealing differences and similarities both with human responses and between models. We argue for the need to systematically evaluate LLMs’ perspective-taking to identify the most suitable models for a given decision-making task. We discuss the implications for model evaluation.
2025 · Paula Akemi Aoyagui et al. (University of Toronto, Faculty of Information) · Human-LLM Collaboration; AI Ethics, Fairness & Accountability; Algorithmic Fairness & Bias · CHI

Making the Write Connections: Linking Writing Support Tools with Writer Needs
This work sheds light on whether and how creative writers' needs are met by existing research and commercial writing support tools (WST). We conducted a need-finding study to gain insight into writers' processes during creative writing through a qualitative analysis of responses from an online questionnaire and Reddit discussions on r/Writing. Using a systematic analysis of 115 tools and 67 research papers, we map out the landscape of how digital tools facilitate the writing process. Our triangulation of data reveals that research predominantly focuses on the writing activity and overlooks pre-writing activities and the importance of visualization. We distill 10 key takeaways to inform future research on WST and point to opportunities surrounding underexplored areas. Our work offers a holistic and up-to-date account of how tools have transformed the writing process, guiding the design of future tools that address writers' evolving and unmet needs.
2025 · Zixin Zhao et al. (University of Toronto, Department of Computer Science) · AI-Assisted Creative Writing · CHI

Understanding Public Agencies' Expectations and Realities of AI-Driven Chatbots for Public Health Monitoring
Advances in artificial intelligence (AI) offer the potential for chatbots to support public health monitoring by automating tasks traditionally performed by frontline workers. While introducing AI impacts public agency workers across decision-making, administration, and monitoring roles, the perceptions of workers regarding these technologies and their actual impact on labor are underexplored. We examine the case of CareCall, a large language model (LLM)-driven chatbot used to monitor socially isolated individuals, by interviewing 21 public agency workers across 13 sites involved in its adoption and rollout. We find that CareCall helped expand public reach but increased burdens on frontline workers due to insufficient resources and new labor demands, such as handling lapses in user engagement. We discuss how implementing LLM-driven chatbots in public health contexts can complicate decision-makers' articulation work and impose additional maintenance work on frontline workers. We recommend AI chatbots in this space leverage public infrastructure and incorporate fallback mechanisms.
2025 · Eunkyung Jo et al. (University of California, Irvine) · Human-LLM Collaboration; Mental Health Apps & Online Support Communities; Activism & Political Participation · CHI

Enhancing Pediatric Communication: The Role of an AI-Driven Chatbot in Facilitating Child-Parent-Provider Interaction
Communication with child patients is challenging due to their developing ability to express emotions and symptoms. Additionally, healthcare providers often have limited time to offer resources to parents. By leveraging AI to facilitate free-form conversations, our study aims to design an AI-driven chatbot to bridge these gaps in child-parent-provider communication. We conducted two studies: 1) design sessions with 12 children with cancer and their parents, which informed the development of our chatbot, ARCH, and 2) an interview study with 15 pediatric care experts to identify potential challenges and refine ARCH's role in pediatric communication. Our findings highlight three key roles for ARCH: providing an expressive outlet for children, offering reassurance to parents, and serving as an assessment tool for providers. We conclude by discussing design considerations for AI-driven chatbots in pediatric communication, such as creating communication spaces, balancing the expectations of children and parents, and addressing potential cultural differences.
2025 · Woosuk Seo et al. (University of Michigan, School of Information) · Conversational Chatbots; Cognitive Impairment & Neurodiversity (Autism, ADHD, Dyslexia); Mental Health Apps & Online Support Communities · CHI

AACessTalk: Fostering Communication between Minimally Verbal Autistic Children and Parents with Contextual Guidance and Card Recommendation
As minimally verbal autistic (MVA) children communicate with parents through few words and nonverbal cues, parents often struggle to encourage their children to express subtle emotions and needs and to grasp their nuanced signals. We present AACessTalk, a tablet-based, AI-mediated communication system that facilitates meaningful exchanges between an MVA child and a parent. AACessTalk provides real-time guides to the parent to engage the child in conversation and, in turn, recommends contextual vocabulary cards to the child. Through a two-week deployment study with 11 MVA child-parent dyads, we examine how AACessTalk fosters everyday conversation practice and mutual engagement. Our findings show high engagement from all dyads, leading to increased frequency of conversation and turn-taking. AACessTalk also encouraged parents to explore their own interaction strategies and empowered the children to have more agency in communication. We discuss the implications of designing technologies for balanced communication dynamics in parent-MVA child interaction.
2025 · Dasom Choi et al. (KAIST, Department of Industrial Design) · Cognitive Impairment & Neurodiversity (Autism, ADHD, Dyslexia); Augmentative & Alternative Communication (AAC) · CHI

Textoshop: Interactions Inspired by Drawing Software to Facilitate Text Editing
We explore how interactions inspired by drawing software can help edit text. Making an analogy between visual and text editing, we consider words as pixels, sentences as regions, and tones as colours. For instance, direct manipulations move, shorten, expand, and reorder text; tools change number, tense, and grammar; colours map to tones explored along three dimensions in a tone picker; and layers help organize and version text. This analogy also leads to new workflows, such as boolean operations on text fragments to construct more elaborate text. A study shows participants were more successful at editing text and preferred using the proposed interface over existing solutions. Broadly, our work highlights the potential of interaction analogies to rethink existing workflows, while capitalizing on familiar features.
2025 · Damien Masson et al. (University of Toronto, Department of Computer Science) · Human-LLM Collaboration; AI-Assisted Creative Writing; Prototyping & User Testing · CHI

The Explanation That Hits Home: The Characteristics of Verbal Explanations That Affect Human Perception in Subjective Decision-Making
Human-AI collaborative decision-making can achieve better outcomes than either party individually. The success of this collaboration can depend on whether the human decision-maker perceives the AI contribution as beneficial to the decision-making process. Beneficial AI explanations are often described as relevant, convincing, and trustworthy. Yet, we know little about the characteristics of explanations that result in these perceptions. Focusing on collaborative subjective decision-making, using the context of subtle sexism, where explanations can surface new interpretations, we conducted a user study (N=20) to explore the structural and content characteristics that affect perceptions of human- and AI-generated verbal (text and audio) explanations. We find four groups of characteristics (Tone, Grammatical Elements, Argumentative Sophistication, and Relation to User), and that the effect of these characteristics on the perception of explanations for subtle sexism depends on the perceived author. Thus, we also identify which explanation characteristics participants use to identify the author of an explanation. Demonstrating the relationship between these characteristics and explanation perceptions, we present a categorized set of characteristics that system builders can leverage to produce the appropriate perception of an explanation for various sensitive contexts. We also highlight human perception biases and associated issues resulting from these perceptions.
2024 · Sharon A Ferguson et al. · Session 1c: Tech Adoption: From Delivery Robots to Dumbphones · CSCW

DiaryMate: Understanding User Perceptions and Experience in Human-AI Collaboration for Personal Journaling
With their generative capabilities, large language models (LLMs) have transformed the role of technological writing assistants from simple editors to writing collaborators. Such a transition emphasizes the need for understanding user perception and experience, such as balancing user intent and the involvement of LLMs across various writing domains in designing writing assistants. In this study, we delve into the less explored domain of personal writing, focusing on the use of LLMs in introspective activities. Specifically, we designed DiaryMate, a system that assists users in journal writing with an LLM. Through a 10-day field study (N=24), we observed that participants used the diverse sentences generated by the LLM to reflect on their past experiences from multiple perspectives. However, we also observed that participants over-relied on the LLM, often prioritizing its emotional expressions over their own. Drawing from these findings, we discuss design considerations for leveraging LLMs in personal writing practice.
2024 · Taewan Kim et al. (KAIST) · Human-LLM Collaboration; Mental Health Apps & Online Support Communities; AI-Assisted Creative Writing · CHI

GenQuery: Supporting Expressive Visual Search with Generative Models
Designers rely on visual search to explore and develop ideas in early design stages. However, designers can struggle to identify suitable text queries to initiate a search or to discover images for similarity-based search that can adequately express their intent. We propose GenQuery, a novel system that integrates generative models into the visual search process. GenQuery can automatically elaborate on users' queries and surface concrete search directions when users only have abstract ideas. To support precise expression of search intents, the system enables users to generatively modify images and use these in similarity-based search. In a comparative user study (N=16), designers felt that they could more accurately express their intents and find more satisfactory outcomes with GenQuery compared to a tool without generative features. Furthermore, the unpredictability of generations allowed participants to uncover more diverse outcomes. By supporting both convergence and divergence, GenQuery led to a more creative experience.
2024 · Kihoon Son et al. (KAIST) · Generative AI (Text, Image, Music, Video); Recommender System UX · CHI

Understanding the Impact of Long-Term Memory on Self-Disclosure with Large Language Model-Driven Chatbots for Public Health Intervention
Recent large language models (LLMs) offer the potential to support public health monitoring by facilitating health disclosure through open-ended conversations but rarely preserve the knowledge gained about individuals across repeated interactions. Augmenting LLMs with long-term memory (LTM) presents an opportunity to improve engagement and self-disclosure, but we lack an understanding of how LTM impacts people's interaction with LLM-driven chatbots in public health interventions. We examine the case of CareCall—an LLM-driven voice chatbot with LTM—through the analysis of 1,252 call logs and interviews with nine users. We found that LTM enhanced health disclosure and fostered positive perceptions of the chatbot by offering familiarity. However, we also observed challenges in promoting self-disclosure through LTM, particularly around addressing chronic health conditions and privacy concerns. We discuss considerations for LTM integration in LLM-driven chatbots for public health monitoring, including carefully deciding what topics need to be remembered in light of public health goals.
2024 · Eunkyung Jo et al. (University of California, Irvine) · Conversational Chatbots; Human-LLM Collaboration; Mental Health Apps & Online Support Communities · CHI

Redefining Activity Tracking Through Older Adults' Reflections on Meaningful Activities
Activity tracking has the potential to promote active lifestyles among older adults. However, current activity tracking technologies may inadvertently perpetuate ageism by focusing on age-related health risks. Advocating for a personalized approach in activity tracking technology, we sought to understand what activities older adults find meaningful to track and the underlying values of those activities. We conducted a reflective interview study following 7-day activity journaling with 13 participants. We identified various underlying values motivating participants to track activities they deemed meaningful. These values, whether competing or aligned, shape the desirability of activities. Older adults appreciate low-exertion activities, but such activities are difficult to track. We discuss how these activities can become central in designing activity tracking systems. Our research offers insights for creating value-driven, personalized activity trackers that resonate more fully with the meaningful activities of older adults.
2024 · Yiwen Wang et al. (University of Maryland) · Fitness Tracking & Physical Activity Monitoring; Elderly Care & Dementia Support · CHI

ChaCha: Leveraging Large Language Models to Prompt Children to Share Their Emotions about Personal Events
Children typically learn to identify and express their emotions by sharing stories and feelings with others, particularly family members. However, it is challenging for parents or siblings to have effective emotion communication with children since children are still developing their communication skills. We present ChaCha, a chatbot that encourages and guides children to share personal events and associated emotions. ChaCha combines a state machine and large language models (LLMs) to keep the dialogue on track while carrying on free-form conversations. Through an exploratory study with 20 children (aged 8-12), we examine how ChaCha prompts children to share personal events and guides them to describe associated emotions. Participants perceived ChaCha as a close friend and shared their stories on various topics, such as family trips and personal achievements. Based on the findings, we discuss opportunities for leveraging LLMs to design child-friendly chatbots to support children in sharing emotions.
2024 · Woosuk Seo et al. (University of Michigan) · Conversational Chatbots; Agent Personality & Anthropomorphism; Human-LLM Collaboration · CHI

MindfulDiary: Harnessing Large Language Model to Support Psychiatric Patients' Journaling
Large Language Models (LLMs) offer promising opportunities in mental health domains, although their inherent complexity and low controllability elicit concern regarding their applicability in clinical settings. We present MindfulDiary, an LLM-driven journaling app that helps psychiatric patients document daily experiences through conversation. Designed in collaboration with mental health professionals, MindfulDiary takes a state-based approach to safely comply with the experts' guidelines while carrying on free-form conversations. Through a four-week field study involving 28 patients with major depressive disorder and five psychiatrists, we examined how MindfulDiary facilitates patients' journaling practice and clinical care. The study revealed that MindfulDiary supported patients in consistently enriching their daily records and helped clinicians better empathize with their patients through an understanding of their thoughts and daily contexts. Drawing on these findings, we discuss the implications of leveraging LLMs in the mental health domain, bridging technical feasibility and integration into clinical settings.
2024 · Taewan Kim et al. (KAIST) · Human-LLM Collaboration; AI-Assisted Decision-Making & Automation; Mental Health Apps & Online Support Communities · CHI

EvalLM: Interactive Evaluation of Large Language Model Prompts on User-Defined Criteria
By simply composing prompts, developers can prototype novel generative applications with Large Language Models (LLMs). To refine prototypes into products, however, developers must iteratively revise prompts by evaluating outputs to diagnose weaknesses. Formative interviews (N=8) revealed that developers invest significant effort in manually evaluating outputs as they assess context-specific and subjective criteria. We present EvalLM, an interactive system for iteratively refining prompts by evaluating multiple outputs on user-defined criteria. By describing criteria in natural language, users can employ the system's LLM-based evaluator to get an overview of where prompts excel or fail, and improve these based on the evaluator's feedback. A comparative study (N=12) showed that EvalLM, when compared to manual evaluation, helped participants compose more diverse criteria, examine twice as many outputs, and reach satisfactory prompts with 59% fewer revisions. Beyond prompts, our work can be extended to augment model evaluation and alignment in specific application contexts.
2024 · Taesu Kim et al. (KAIST) · Human-LLM Collaboration; Prototyping & User Testing · CHI

Designing a Direct Feedback Loop between Humans and Convolutional Neural Networks through Local Explanations
Local explanations provide heatmaps on images to explain how Convolutional Neural Networks (CNNs) derive their output. Due to its visual straightforwardness, the method has been one of the most popular explainable AI (XAI) methods for diagnosing CNNs. Through our formative study (S1), however, we captured ML engineers' ambivalence about local explanations: they view them as a valuable and indispensable aid in building CNNs, yet find the process exhausting due to the heuristic nature of detecting vulnerabilities. Moreover, steering the CNNs based on the vulnerabilities learned from the diagnosis seemed highly challenging. To mitigate this gap, we designed DeepFuse, the first interactive system to realize a direct feedback loop between a user and CNNs for diagnosing and revising a CNN's vulnerabilities using local explanations. DeepFuse helps CNN engineers systematically search for "unreasonable" local explanations and annotate new boundaries for those identified as unreasonable in a labor-efficient manner. Next, it steers the model based on the given annotations so that the model does not repeat similar mistakes. We conducted a two-day study (S2) with 12 experienced CNN engineers. Using DeepFuse, participants made a more accurate and "reasonable" model than the current state of the art. Participants also found that the way DeepFuse guides case-based reasoning can practically improve their current practice. We provide implications for design that explain how future HCI-driven design can move our practice forward to make XAI-driven insights more actionable.
2023 · Tong Steven Sun et al. · Human AI Collaboration II · CSCW

AVscript: Accessible Video Editing with Audio-Visual Scripts
Sighted and blind and low vision (BLV) creators alike use videos to communicate with broad audiences. Yet, video editing remains inaccessible to BLV creators. Our formative study revealed that current video editing tools make it difficult to access the visual content, assess the visual quality, and efficiently navigate the timeline. We present AVscript, an accessible text-based video editor. AVscript enables users to edit their video using a script that embeds the video's visual content, visual errors (e.g., dark or blurred footage), and speech. Users can also efficiently navigate between scenes and visual errors or locate objects in the frame or spoken words of interest. A comparison study (N=12) showed that AVscript significantly lowered BLV creators' mental demands while increasing confidence and independence in video editing. We further demonstrate the potential of AVscript through an exploratory study (N=3) where BLV creators edited their own footage.
2023 · Mina Huh et al. (University of Texas, Austin) · Visual Impairment Technologies (Screen Readers, Tactile Graphics, Braille); Accessible Gaming; Video Production & Editing · CHI

DataHalo: A Customizable Notification Visualization System for Personalized and Longitudinal Interactions
People struggle with the overflow of smartphone notifications, often facing two challenges: (1) prioritizing the informative notifications as they wish and (2) retaining the delivered information as long as they want to utilize it. In this paper, we present DataHalo, a customizable notification visualization system that represents notifications as prolonged ambient visualizations on the home screen. DataHalo supports keyword-based filtering and categorization, and draws graphical marks based on a time-varying importance model to enable longitudinal interaction with notifications. We evaluated DataHalo through a usability study (N=17), from which we improved the interface. We then conducted a three-week deployment study (N=12) to assess how people use DataHalo in their domestic contexts. Our study revealed that people generated various visualization settings for different kinds of apps. Drawing on both quantitative and qualitative findings, we discuss implications for supporting effective notification management through customizable ambient visualizations.
2023 · Guhyun Han et al. (Seoul National University) · Notification & Interruption Management · CHI