Agentic Audio Moderators vs Humans in Think-Aloud Usability Testing
Authors
Hong Kong Polytechnic University
Hong Kong Polytechnic University
Hong Kong Polytechnic University
Southern University of Science and Technology
Piipivo Technology
Hong Kong Polytechnic University
Paper Title
Agentic Audio Moderator vs Human Moderator in Think-Aloud Usability Testing: Results from a Randomized Controlled Trial
Publication Info
- Topic area: AI-moderated usability testing in human-computer interaction.
- Keywords: AI moderation, think-aloud usability testing, human-computer interaction, agentic AI, user experience, social presence, anthropomorphism, verbalization behaviors, randomized controlled trial, UX research.
Background and Problem
- Problem / challenge: Traditional usability testing with human moderators is resource-intensive, variable in quality, and hard to scale. It is unclear whether AI moderators can fulfill the complex role of facilitating think-aloud usability testing while maintaining social presence and contextual adaptability.
- Significance: Automating the moderation process could reduce costs, improve consistency, and scale usability testing, benefiting both academia and industry.
- Motivation and related work: Prior research has explored AI in structured tasks like tutoring and co-design, but the role of AI moderators in usability testing remains underexplored. Existing studies highlight the importance of moderators in maintaining engagement, eliciting verbalizations, and creating a sense of social presence, which are challenging for AI to replicate.
Solution
- Proposed approach: Development and evaluation of an agentic audio moderator for think-aloud usability testing, designed to balance structured guidance with adaptive interaction.
- Novelty:
  - Iterative design and development of an AI moderator informed by UX expert interviews and prior literature.
  - First randomized controlled trial (RCT) comparing AI and human moderators in think-aloud usability testing.
  - Identification of design implications for improving AI moderators in usability testing.
- Procedure and key techniques:
  - Conducted semi-structured interviews with nine UX experts to identify design requirements.
  - Iteratively developed an AI moderator with five core features: natural language guidance, think-aloud facilitation, context-sensitive follow-up questions, human intervention protocols, and trust-building cues.
  - Evaluated the AI moderator in an RCT (N = 60) using a note-taking application, comparing it to a human moderator across task performance, verbalization behaviors, user experience, and social perceptions.
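The summary does not say how participants were allocated to the two conditions. As a minimal sketch, assuming a simple balanced randomization, the 60 participants could be split into two arms of 30 as follows. The function name `assign_conditions`, the condition labels, and the seed are hypothetical and not taken from the paper.

```python
import random


def assign_conditions(participant_ids, conditions=("ai", "human"), seed=0):
    """Balanced random assignment: shuffle the IDs, then deal them round-robin.

    Assumption: equal-sized arms and a fixed seed for reproducibility.
    The paper summary does not describe the actual randomization procedure.
    """
    ids = list(participant_ids)
    if len(ids) % len(conditions) != 0:
        raise ValueError("participant count must divide evenly across conditions")
    rng = random.Random(seed)  # local generator, so global random state is untouched
    rng.shuffle(ids)
    return {pid: conditions[i % len(conditions)] for i, pid in enumerate(ids)}


# Hypothetical participant IDs 1..60, matching N = 60 with 30 per group.
allocation = assign_conditions(range(1, 61))
counts = {c: sum(1 for v in allocation.values() if v == c) for c in ("ai", "human")}
print(counts)  # {'ai': 30, 'human': 30}
```

Dealing shuffled IDs round-robin guarantees equal group sizes, which independent per-participant coin flips would not.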
Results
- Concrete findings:
  - No significant differences in participants’ task performance, verbalization behaviors, or physiological stress levels between AI and human moderators.
  - Participants rated human moderators significantly higher in anthropomorphism, social presence, trust-building, and context-aware questioning.
  - AI moderators provided more frequent and rapid prompts but were perceived as less natural and engaging.
- Advantage over baselines:
  - AI moderators did not differ significantly from human moderators in facilitating task completion and eliciting verbalizations, which suggests they are feasible for structured usability tasks.
  - Human moderators outperformed AI in social and relational dimensions, such as emotional resonance and contextual sensitivity.
- Experiments / evaluation:
  - RCT with 60 participants (30 per group), using a note-taking application and think-aloud methodology.
  - Measured task performance, verbalization behaviors, physiological arousal (GSR), and user perceptions via questionnaires (Godspeed, Social Presence Scale, Functional Behavior Ratings).
  - The AI moderator's voice was cloned from the human moderator's so that vocal characteristics were comparable across conditions.
- Limitations and future work:
  - Limited to a single application domain and a homogeneous participant pool.
  - Technical constraints included occasional response delays and overly frequent prompting by the AI.
  - Future work should explore hybrid human-AI moderation models, broader application domains, and equivalence testing to validate AI moderators as interchangeable with humans.
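The call for equivalence testing matters because a non-significant difference does not by itself show that two conditions are interchangeable. One common approach is the two one-sided tests (TOST) procedure, sketched below under stated assumptions: it uses a large-sample normal approximation (a t-based version would suit 30 participants per group better), the equivalence margin is a hypothetical choice, and the scores are synthetic placeholders, not data from the study.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def tost_equivalence(a, b, margin, alpha=0.05):
    """Two one-sided tests (TOST) for equivalence of two independent group means.

    Assumption: large-sample normal approximation with unpooled variances.
    Returns (p_value, equivalent); equivalence is claimed only when the
    larger of the two one-sided p-values falls below alpha.
    """
    diff = mean(a) - mean(b)
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        raise ValueError("standard error is zero; the test is undefined")
    std = NormalDist()
    p_lower = 1.0 - std.cdf((diff + margin) / se)  # H0: diff <= -margin
    p_upper = std.cdf((diff - margin) / se)        # H0: diff >= +margin
    p_value = max(p_lower, p_upper)
    return p_value, p_value < alpha


# Synthetic placeholder scores (NOT study data): 30 values per group.
ai_scores = [10 + (i % 5) for i in range(30)]
human_scores = [10 + ((i + 2) % 5) for i in range(30)]

# The margin of 1.0 is a hypothetical smallest difference of practical interest.
p, equivalent = tost_equivalence(ai_scores, human_scores, margin=1.0)
print(equivalent)  # True for this synthetic data
```

The margin has to be justified before data collection. With a narrower margin such as 0.1, the same synthetic data no longer supports a claim of equivalence.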
Summary
This study developed and evaluated an agentic audio moderator for think-aloud usability testing, comparing it to a human moderator in a randomized controlled trial. The AI moderator did not differ significantly from the human moderator in task performance or verbalization behaviors, but it was rated lower in social presence and contextual sensitivity. The findings suggest that AI moderators are suitable for structured, low-stakes usability tasks but fall short in fostering trust and engagement. Future research should focus on hybrid moderation models and on refining AI interaction strategies to better complement human moderators.
https://hci.top/en/papers/chi/222659/2026