Agentic Audio Moderators vs Humans in Think-Aloud Usability Testing

CHI '26

Authors

WZ

Hong Kong Polytechnic University

GC

Hong Kong Polytechnic University

YW

Hong Kong Polytechnic University

PA

Southern University of Science and Technology

JD

Piipivo Technology

CL

Hong Kong Polytechnic University

Subtopics: Generative AI (Text, Image, Music, Video); AI-Assisted Decision-Making & Automation; Explainable AI (XAI); User Research Methods (Interviews, Surveys, Observation)
Professions: UI/UX Designers; AI/ML Researchers & Engineers; HCI Researchers

Paper Title

Agentic Audio Moderator vs Human Moderator in Think-Aloud Usability Testing: Results from a Randomized Controlled Trial

Publication Info

  • Topic area: Usability testing and human-computer interaction using AI moderators.
  • Keywords: AI moderation, think-aloud usability testing, human-computer interaction, agentic AI, user experience, social presence, anthropomorphism, verbalization behaviors, randomized controlled trial, UX research.

Background and Problem

  • Problem / challenge: Traditional usability testing with human moderators is resource-intensive, variable in quality, and hard to scale. It is unclear whether AI moderators can fulfill the complex role of facilitating think-aloud usability testing while maintaining social presence and contextual adaptability.
  • Significance: Automating the moderation process could reduce costs, improve consistency, and scale usability testing, benefiting both academia and industry.
  • Motivation and related work: Prior research has explored AI in structured tasks like tutoring and co-design, but the role of AI moderators in usability testing remains underexplored. Existing studies highlight the importance of moderators in maintaining engagement, eliciting verbalizations, and creating a sense of social presence, which are challenging for AI to replicate.

Solution

  • Proposed approach: Development and evaluation of an agentic audio moderator for think-aloud usability testing, designed to balance structured guidance with adaptive interaction.
  • Novelty:
    1. Iterative design and development of an AI moderator informed by UX expert interviews and prior literature.
    2. First randomized controlled trial (RCT) comparing AI and human moderators in think-aloud usability testing.
    3. Identification of design implications for improving AI moderators in usability testing.
  • Procedure and key techniques:
    • Conducted semi-structured interviews with nine UX experts to identify design requirements.
    • Iteratively developed an AI moderator with five core features: natural language guidance, think-aloud facilitation, context-sensitive follow-up questions, human intervention protocols, and trust-building cues.
    • Evaluated the AI moderator in an RCT (N = 60) using a note-taking application, comparing it to a human moderator across task performance, verbalization behaviors, user experience, and social perceptions.
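
The two-arm allocation described above (N = 60, 30 per group) can be illustrated with a minimal sketch. The summary does not say how the authors randomized participants, so the balanced-shuffle scheme, the function name `assign_balanced`, the participant IDs, and the fixed seed are all assumptions made for illustration only.

```python
import random


def assign_balanced(participant_ids, arms=("ai", "human"), seed=0):
    """Randomly assign participants to arms with equal group sizes.

    Hypothetical helper: the source does not describe the actual
    randomization procedure used in the study.
    """
    if len(participant_ids) % len(arms) != 0:
        raise ValueError("participant count must divide evenly across arms")
    rng = random.Random(seed)  # fixed seed so the allocation is reproducible
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    per_arm = len(shuffled) // len(arms)
    # Slice the shuffled list into consecutive, equally sized chunks.
    return {
        arm: sorted(shuffled[i * per_arm:(i + 1) * per_arm])
        for i, arm in enumerate(arms)
    }


# Placeholder IDs P01..P60, not real participant data.
participants = [f"P{n:02d}" for n in range(1, 61)]
allocation = assign_balanced(participants)
print({arm: len(ids) for arm, ids in allocation.items()})
# prints {'ai': 30, 'human': 30}
```

Shuffling once and slicing guarantees equal group sizes, which a per-participant coin flip would not.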

Results

  • Concrete findings:
    • No significant differences in participants’ task performance, verbalization behaviors, or physiological stress levels between the AI and human moderator conditions.
    • Participants rated the human moderator significantly higher in anthropomorphism, social presence, trust-building, and context-aware questioning.
    • The AI moderator prompted more frequently and more quickly but was perceived as less natural and engaging.
  • Advantage over baselines:
    • The AI moderator performed comparably to the human moderator in facilitating task completion and eliciting verbalizations (no significant differences were detected), suggesting feasibility for structured usability tasks.
    • The human moderator outperformed the AI moderator on social and relational dimensions, such as emotional resonance and contextual sensitivity.
  • Experiments / evaluation:
    • RCT with 60 participants (30 per group), using a note-taking application and think-aloud methodology.
    • Measured task performance, verbalization behaviors, physiological arousal via galvanic skin response (GSR), and user perceptions via questionnaires (Godspeed, Social Presence Scale, Functional Behavior Ratings).
    • The AI moderator's voice was cloned from the human moderator's voice to keep vocal characteristics comparable across conditions.
  • Limitations and future work:
    • Limited to a single application domain and a homogeneous participant pool.
    • Technical constraints included occasional delays and overly active prompting by the AI.
    • Future work should explore hybrid human-AI moderation models, broader application domains, and equivalence testing to validate AI moderators as interchangeable with humans.
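
The equivalence testing mentioned under future work matters because a non-significant difference does not by itself show that two conditions are interchangeable. The sketch below shows one common approach, a large-sample (normal-approximation) version of the two one-sided tests (TOST) procedure. The function name `tost_equivalence`, the equivalence margins, and the synthetic scores are assumptions for illustration. None of the numbers come from the paper, and a real analysis with 30 participants per group would more likely use t-based tests from a statistics package.

```python
import math
import random
from statistics import NormalDist, mean, variance


def tost_equivalence(a, b, low, high, alpha=0.05):
    """Large-sample TOST: is mean(a) - mean(b) inside (low, high)?

    Hypothetical helper using a normal approximation, not the
    analysis reported in the paper.
    """
    if not low < high:
        raise ValueError("low must be smaller than high")
    diff = mean(a) - mean(b)
    # Standard error of the difference, without assuming equal variances.
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        raise ValueError("both samples are constant; the test is undefined")
    phi = NormalDist().cdf
    # Test 1: H0 is diff <= low; a small p-value supports diff > low.
    p_lower = 1.0 - phi((diff - low) / se)
    # Test 2: H0 is diff >= high; a small p-value supports diff < high.
    p_upper = phi((diff - high) / se)
    # Equivalence is claimed only if BOTH one-sided tests reject.
    p_value = max(p_lower, p_upper)
    return {"diff": diff, "p_value": p_value, "equivalent": p_value < alpha}


# Synthetic scores for illustration only (not data from the study).
rng = random.Random(42)
ai_scores = [rng.gauss(70, 10) for _ in range(30)]
human_scores = [rng.gauss(70, 10) for _ in range(30)]
print(tost_equivalence(ai_scores, human_scores, low=-5.0, high=5.0))
```

The margins `low` and `high` encode the largest difference one is willing to treat as practically irrelevant, so they should be fixed before looking at the data.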

Summary

This study developed and evaluated an agentic audio moderator for think-aloud usability testing, comparing it to a human moderator in a randomized controlled trial. While the AI moderator effectively facilitated task completion and verbalization behaviors, it was rated lower in social presence and contextual sensitivity. The findings suggest that AI moderators are suitable for structured, low-stakes usability tasks but fall short in fostering trust and engagement. Future research should focus on hybrid moderation models and refining AI interaction strategies to better complement human moderators.

DOI: https://doi.org/10.1145/3772318.3791653

At a Glance

Paper Snapshot

  • Source: CHI
  • Year: 2026
  • Award: No award tagged
  • Authors: 6 authors
  • Subtopics: Generative AI (Text, Image, Music, Video), AI-Assisted Decision-Making & Automation, Explainable AI (XAI), User Research Methods (Interviews, Surveys, Observation)
  • Professions: UI/UX Designers, AI/ML Researchers & Engineers, HCI Researchers