AI-Mediated Feedback Improves Student Revisions: A Randomized Trial with FeedbackWriter in a Large Undergraduate Course
Despite growing interest in using LLMs to generate feedback on students' writing, little is known about how students respond to AI-mediated versus human-provided feedback. We address this gap through a randomized controlled trial in a large introductory economics course (N=354), where we introduce and deploy FeedbackWriter, a system that generates AI suggestions for teaching assistants (TAs) while they provide feedback on students' knowledge-intensive essays. TAs retain full control to adopt, edit, or dismiss the suggestions. Students were randomly assigned to receive either handwritten feedback from TAs (baseline) or AI-mediated feedback, where TAs received suggestions from FeedbackWriter. Students then revised their drafts based on the feedback, and the revisions were graded. In total, 1,366 essays were graded using the system. We found that students receiving AI-mediated feedback produced significantly higher-quality revisions, with gains increasing as TAs adopted more AI suggestions. TAs found the AI suggestions useful for spotting gaps and clarifying rubrics.
2026 · Xinyi Lu et al. · University of Michigan · Human-LLM Collaboration; Intelligent Tutoring Systems & Learning Analytics; AI-Assisted Writing & Text Generation · CHI

From Conversation to Human-AI Common Ground: Extracting Cognitive Workflows for Reuse in Sense-making Tasks
Knowledge workers increasingly rely on conversational AI for sense-making tasks (e.g., conducting market analysis), yet must repeatedly reconstruct context and intent to meet their goals. A formative study (N=10) showed that workflow reuse with AI often failed. Current tools either remember only preferences or enforce rigid, predefined workflows; neither adapts to evolving goals. We present ThinkFlow, a system that maintains a dynamic common ground through a cognitive workflow schema, enabling users to express intent and the AI to adapt and reuse workflows across contexts. An expert-rating study shows that the schema can accurately capture the collocutor's reasoning process and that, when reused for a similar task, it improves the AI's responses compared with responses generated without the schema. A user study with eight knowledge workers demonstrates that ThinkFlow supports awareness of evolving workflows, intent expression, and flexible application across contexts.
2026 · Xinyue Chen et al. · University of Michigan · Human-LLM Collaboration; AI-Assisted Decision-Making & Automation; Participatory Design · CHI

MeetMap: Balancing AI Assistance and User Agency for Effective Real-Time Sense-Making in Meetings
Video meeting platforms display conversations linearly through transcripts or summaries. However, ideas do not emerge linearly during a meeting. We leverage LLMs to create dialogue maps in real time to help people visually structure and connect ideas. To balance reducing users' cognitive load during the conversation against giving them sufficient control over AI-generated content, we explore two human-AI collaborative methods. In Human-Map, the AI generates conversation summaries as nodes, and users build dialogue maps from those nodes. In AI-Map, the AI produces dialogue maps that users can edit. We ran a within-subject experiment with ten pairs of users, comparing the two MeetMap variants with a baseline. Users preferred MeetMap over traditional note-taking methods, finding that it aligned better with their mental models of conversations. Users liked AI-Map for its ease of use and low effort demands, and appreciated the hands-on sense-making opportunity that Human-Map offered. This work informs the future design of AI-assisted tools for real-time cognitive scaffolding in meetings by emphasizing the need to balance AI assistance with synchronicity and user agency to enhance collaborative sense-making.
2025 · Xinyue Chen et al. · Making Work Meetings Better · CSCW

More AI Assistance Reduces Cognitive Engagement: Examining the AI Assistance Dilemma in AI-Supported Note-Taking
As AI tools become increasingly integrated into cognitively demanding tasks such as note-taking, questions remain about whether they enhance or compromise cognitive engagement. This paper investigates the "AI Assistance Dilemma" in note-taking, examining how varying levels of AI support affect user engagement and comprehension. In a within-subject experiment, we asked participants (N=30) to take notes during lecture videos under three conditions: Automated AI (high assistance with structured notes), Intermediate AI (moderate assistance with a real-time summary), and Minimal AI (low assistance with a transcript). Results reveal that Intermediate AI yields the highest post-test scores and Automated AI the lowest. Participants, however, preferred the automated setup for its perceived ease of use and lower perceived cognitive effort, suggesting a discrepancy between preferred convenience and cognitive benefit. Our study provides insights into designing AI assistance that preserves cognitive engagement, offering implications for designing moderate AI support in cognitive tasks.
2025 · Xinyue Chen et al. · AI & Writing · CSCW

Rubikon: Intelligent Tutoring for Rubik's Cube Learning Through AR-enabled Physical Task Reconfiguration
Learning to solve a Rubik's Cube requires learners to repeatedly practice a skill component, e.g., identifying a misplaced square and putting it back. However, for 3D physical tasks such as this, generating sufficient repeated practice opportunities can be challenging, in part because it is difficult for novices to reconfigure the physical object into specific states. We propose Rubikon, an intelligent tutoring system for learning to solve the Rubik's Cube. Rubikon reduces the need for repeated manual reconfiguration of the cube without compromising the tactile experience of handling a physical cube. The foundational design of Rubikon is an AR setup, where learners manipulate a physical cube while seeing an AR-rendered cube on a display. Rubikon automatically generates cube configurations that target learners' weaknesses and help them exercise diverse knowledge components. In a between-subjects experiment, we showed that Rubikon learners scored 25% higher on a post-test than baseline learners.
2025 · Haocheng Ren et al. · AR Navigation & Context Awareness; Intelligent Tutoring Systems & Learning Analytics · DIS

eXplainMR: Generating Real-time Textual and Visual eXplanations to Facilitate UltraSonography Learning in MR
Mixed-reality (MR) physical task guidance systems have the benefit of providing virtual instructions while enabling learners to interact with the tangible world. However, they are mostly built around single-path tasks and often employ visual cues for motion guidance without explaining why an action was recommended. In this paper, we introduce eXplainMR, a mixed-reality tutoring system that teaches medical trainees to perform cardiac ultrasound. eXplainMR automatically generates subgoals for obtaining an ultrasound image that contains clinically relevant information, as well as textual and visual explanations for each recommended move based on the visual difference between two consecutive subgoals. We performed a between-subject experiment (N=16) at a US teaching hospital comparing eXplainMR with a baseline MR system that offers commonly used arrow and shadow guidance. We found that after using eXplainMR, medical trainees demonstrated a better understanding of anatomy and showed more systematic reasoning when deciding on the next moves, which was facilitated by the real-time explanations provided in eXplainMR.
2025 · Jingying Wang et al. · University of Michigan · Mixed Reality Workspaces; VR Medical Training & Rehabilitation; Explainable AI (XAI) · CHI

TeachTune: Reviewing Pedagogical Agents Against Diverse Student Profiles with Simulated Students
Large language models (LLMs) can empower teachers to build pedagogical conversational agents (PCAs) customized for their students. As students have different prior knowledge and motivation levels, teachers must review how well their PCAs adapt to diverse students. Existing chatbot reviewing methods (e.g., direct chat and benchmarks) are either labor-intensive across multiple iterations or limited to testing only single-turn interactions. We present TeachTune, a system with which teachers can create simulated students and review PCAs by observing automated chats between PCAs and simulated students. Our technical pipeline instructs an LLM-based student to simulate prescribed knowledge levels and traits, helping teachers explore diverse conversation patterns. Our pipeline could produce simulated students whose behaviors correlate highly with their input knowledge and motivation levels, within 5% and 10% accuracy gaps. Thirty science teachers designed PCAs in a between-subjects study, and using TeachTune resulted in a lower task load and higher student profile coverage compared with a baseline.
2025 · Hyoungwook Jin et al. · KAIST, School of Computing · Generative AI (Text, Image, Music, Video); Human-LLM Collaboration; Intelligent Tutoring Systems & Learning Analytics · CHI

3DPFIX: Improving Remote Novices' 3D Printing Troubleshooting through Human-AI Collaboration Design
The wide availability of consumer-grade 3D printers and online learning resources enables novices to self-train in remote settings. While troubleshooting is an essential part of 3D printing, the process remains challenging for many remote novices even with the help of well-developed online resources, such as online troubleshooting archives and online community help. We conducted a formative study with 76 active 3D printing users to learn how remote novices use online resources for troubleshooting and what challenges they face. We found that remote novices cannot fully utilize online resources. For example, online archives provide general information in a static way, making it hard for novices to search them and relate their unique cases to existing descriptions. Online communities can potentially ease these struggles by providing more targeted suggestions, but helpers who can provide customized help are scarce, making it hard to obtain timely assistance. We propose 3DPFIX, an interactive 3D printing troubleshooting system powered by a pipeline that facilitates human-AI collaboration, designed to improve novices' 3D printing experiences and help them easily accumulate domain knowledge. 3DPFIX supports automated diagnosis and solution seeking. It is built on dialogues about failure cases shared in Q&A discussions accumulated in online communities. We leverage social annotations (i.e., comments) to build an annotated failure image dataset for AI classifiers and to extract a solution pool. Our summative study revealed that 3DPFIX helped participants spend significantly less effort diagnosing failures and find more accurate solutions than with their usual practice. We also found that 3DPFIX users acquired domain-specific 3D printing knowledge. We discuss the implications of leveraging community-driven data in developing future human-AI collaboration designs.
2024 · Nahyun Kwon et al. · Session 1e: Human-AI Collaboration · CSCW

Surgment: Segmentation-enabled Semantic Search and Creation of Visual Question and Feedback to Support Video-Based Surgery Learning
Videos are prominent learning materials for preparing surgical trainees before they enter the operating room (OR). In this work, we explore techniques to enrich the video-based surgery learning experience. We propose Surgment, a system that helps expert surgeons create exercises with feedback based on surgery recordings. Surgment is powered by a few-shot-learning-based pipeline (SegGPT+SAM) that segments surgery scenes, achieving an accuracy of 92%. The segmentation pipeline enables the creation of the visual questions and feedback that surgeons asked for in a formative study. Surgment enables surgeons to (1) retrieve frames of interest through sketches and (2) design exercises that target specific anatomical components and offer visual feedback. In an evaluation study with 11 surgeons, participants praised the search-by-sketch approach for identifying frames of interest and found the resulting image-based questions and feedback to be of high educational value.
2024 · Jingying Wang et al. · University of Michigan · Surgical Assistance & Medical Training; Prototyping & User Testing · CHI

Looking Together ≠ Seeing the Same Thing: Understanding Surgeons' Visual Needs During Intra-operative Coordination and Instruction
Shared gaze visualizations have been found to enhance collaboration and communication outcomes in diverse HCI subfields, including collaborative work and learning. Despite the importance of gaze in surgical operations, especially when a surgeon trainer and trainee need to coordinate their actions, research on using gaze to facilitate intra-operative coordination and instruction has been limited and shows mixed implications. We performed a field observation of 8 surgeries and an interview study with 14 surgeons to understand their visual needs during operations, informing ways to leverage and augment gaze to enhance intra-operative coordination and instruction. We found that trainees have varying needs for visual guidance, which the trainers' instructions often leave unmet. It is critical for surgeons to control the timing of gaze-based visualizations and to interpret gaze data effectively. We suggest overlay technologies, e.g., gaze-based summaries and depth sensing, to augment raw gaze in support of surgical coordination and instruction.
2024 · Vitaliy Popov et al. · University of Michigan · Eye Tracking & Gaze Interaction; VR Medical Training & Rehabilitation · CHI

MeetScript: Designing Transcript-based Interactions to Support Active Participation in Group Video Meetings
While video conferencing is prevalent, concurrent participation channels are limited. People struggle to keep up with the discussion, and misunderstandings occur frequently. Through a formative study, we probed the design space of providing real-time transcripts as an extra communication space for video meeting attendees. We then present MeetScript, a system that provides parallel participation channels through real-time interactive transcripts. MeetScript visualizes the discussion through a chat-like interface and allows meeting attendees to make real-time collaborative annotations. Over time, MeetScript gradually hides extraneous content to retain the most essential information in the transcript, with the goal of reducing the cognitive load users need to process the information in real time. In an experiment with 80 users in 22 teams, we compared MeetScript with two baseline conditions in which participants used Zoom alone (business as usual) or Zoom with an add-on transcription service (Otter.ai). We found that MeetScript significantly enhanced people's non-verbal participation and recollection of their teams' decision-making processes compared to the baselines. Users liked that MeetScript allowed them to navigate the transcript easily and to contextualize feedback and new ideas with existing ones.
2023 · Xinyue Chen et al. · Social Connections · CSCW

ReadingQuizMaker: A Human-NLP Collaborative System to Support Instructors Design High Quality Reading Quiz Questions
Although reading assignments are prevalent, methods to encourage students to read actively are limited. We propose ReadingQuizMaker, a system that helps instructors conveniently design high-quality questions to help students comprehend readings. ReadingQuizMaker adapts to instructors' natural question-creation workflows while providing NLP-based, process-oriented support. ReadingQuizMaker enables instructors to decide when and which NLP models to use, select the input to the models, and edit the outcomes. In an evaluation study, instructors found the resulting questions to be comparable to their previously designed quizzes. Instructors praised ReadingQuizMaker for its ease of use and considered the NLP suggestions to be satisfactory and helpful. We compared ReadingQuizMaker with a control condition in which instructors were given automatically generated questions to edit. Instructors showed a strong preference for the human-AI teaming approach provided by ReadingQuizMaker. Our findings suggest the importance of giving users control and showing an immediate preview of AI outcomes when providing AI support.
2023 · Xinyi Lu et al. · University of Michigan · Generative AI (Text, Image, Music, Video); Human-LLM Collaboration; Programming Education & Computational Thinking · CHI

Practice-Based Teacher Questioning Strategy Training with ELK: A Role-Playing Simulation for Eliciting Learner Knowledge
Practice is essential for learning. However, for many interpersonal skills, novices often lack opportunities and venues to practice repeatedly. Role-playing simulations offer a promising framework to advance practice-based professional training for complex communication skills in fields such as teaching. In this work, we introduce ELK (Eliciting Learner Knowledge), a role-playing simulation system that helps K-12 teachers develop effective questioning strategies to elicit learners' prior knowledge. We evaluate ELK with 75 pre-service teachers through a mixed-method study. We find that teachers demonstrate a modest increase in effective questioning strategies and develop sympathy towards students after using ELK for three rounds. We implement a supplementary activity in ELK in which users evaluate transcripts generated from past role-play sessions. We demonstrate that evaluating conversation moves is as effective for learning as role-playing, without requiring the presence of a partner. We contribute design implications for role-play systems for communication strategy training.
2021 · Xu Wang et al. · Learning and Mentoring · CSCW

Seeing Beyond Expert Blind Spots: Online Learning Design for Scale and Quality
System scalability and quality are sometimes at odds. This work provides an example showing that scalability and quality can be achieved at the same time in instructional design, contrary to what instructors may believe or expect. We situate our study in HCI methods education and provide suggestions to improve active learning within the HCI education community. When designing learning and assessment activities, many instructors face the choice between open-ended and closed-ended activities. Closed-ended activities such as multiple-choice questions (MCQs) enable automated feedback to students. However, a survey of 22 HCI professors revealed a belief that MCQs are less valuable than open-ended questions, so using them means sacrificing quality to achieve scalability. A study with 178 students produced no evidence to support this belief. This paper suggests more promise than concern in using MCQs for scalable instruction and assessment in at least some HCI domains.
2021 · Xu Wang et al. · University of Michigan · Online Learning & MOOC Platforms; Prototyping & User Testing · CHI