Sensemaking in User-Driven Algorithm Auditing: A Case Study on Gender Bias in an Image Captioning Model

Behnoosh Mohammadzadeh; Jules Françoise; Michele Gouiffes; Baptiste Caramiaux

doi:10.1145/3772318.3790784

Best Paper

CHI '26

Authors

Behnoosh Mohammadzadeh, Université Paris-Saclay
Jules Françoise, Université Paris-Saclay
Michele Gouiffes, Université Paris-Saclay
Baptiste Caramiaux, Sorbonne University

Subtopics: Explainable AI (XAI), Algorithmic Transparency & Auditability, Privacy by Design & User Control
Professions: AI/ML Researchers & Engineers, UI/UX Designers, HCI Researchers

DOI

https://doi.org/10.1145/3772318.3790784

Publication Info

Topic area: User-driven algorithm auditing for detecting gender bias in AI systems.
Keywords: Sensemaking, algorithm auditing, gender bias, image captioning, user interfaces, non-expert users, visual-linguistic models, transparency, accountability, human-AI interaction.

Background and Problem

Problem / challenge: Algorithmic systems often exhibit biases, such as gender bias, but tools for non-expert users to audit these systems are limited. Existing tools focus on performance evaluation rather than open-ended exploration, leaving gaps in supporting iterative sensemaking processes.
Significance: Addressing bias in AI systems is critical for ensuring fairness, transparency, and accountability, particularly as these systems influence societal norms and decision-making.
Motivation and related work: Prior research has documented gender bias in image captioning models, such as reinforcing stereotypes and misclassifying roles based on gender. While expert-led audits have identified such biases, user-driven audits by non-experts remain underexplored. This study builds on the sensemaking framework to design tools that empower non-experts to uncover and reason about biases.

Solution

Proposed approach: Development and evaluation of three interfaces (Baseline, Image Masking Tool, and Text Filtering Tool) designed to support non-experts in auditing gender bias in image captioning models through iterative sensemaking.
Novelty:
1. Application of the sensemaking framework to user-driven algorithm auditing.
2. Design and evaluation of specialized tools (Masking and Filtering) to support hypothesis generation and evidence collection.
3. Empirical demonstration of how interface design shapes bias detection and user confidence.
4. Thematic analysis of gender bias patterns identified by non-expert auditors.
Procedure and key techniques:
- Conducted a between-subjects study with 60 participants using the Salesforce BLIP image captioning model and the VisoGender dataset.
- Participants audited the model under one of three conditions: Baseline (open-ended exploration), Masking (manipulating visual inputs), and Filtering (querying captions by keywords).
- Data collected included bias cards (hypotheses and evidence), confidence ratings, and thematic analysis of identified biases.
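The Filtering condition described above (querying captions by keywords) can be illustrated with a minimal sketch. This is not the study's implementation: the `Caption` record, the `filter_captions` helper, and the sample captions are all hypothetical, and the captions are invented for illustration rather than taken from BLIP output. The sketch assumes whole-word, case-insensitive matching, so that a query for "man" does not also return captions containing "woman".

```python
import re
from dataclasses import dataclass


@dataclass
class Caption:
    image_id: str  # hypothetical identifier for an image in the audit set
    text: str      # caption text (invented here, not real model output)


def filter_captions(captions, keywords):
    """Return captions containing any keyword as a whole word (case-insensitive)."""
    if not keywords:
        return []
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    return [c for c in captions if pattern.search(c.text)]


captions = [
    Caption("img_01", "a woman in a white coat holding a clipboard"),
    Caption("img_02", "a doctor talking to a patient"),
    Caption("img_03", "a female doctor looking at a screen"),
    Caption("img_04", "a man in scrubs standing in a hallway"),
]

# Collect captions that mark gender explicitly with these two terms.
marked = filter_captions(captions, ["woman", "female"])
print([c.image_id for c in marked])
```

Whole-word matching is a design choice made for this sketch; a substring match would conflate "man" with "woman" and distort any comparison of gendered terms.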

Results

Concrete findings:
- Participants identified four main patterns of gender bias: reinforcement of stereotypes, prioritization of gender over profession, biased reliance on visual cues, and gendered language hierarchies.
- The Masking Tool revealed inconsistencies in role attribution based on visual cues, while the Filtering Tool exposed systemic linguistic asymmetries.
- Participants in the Filtering condition collected significantly more evidence per hypothesis (mean = 5.25 items) compared to the Masking condition.
- Confidence ratings correlated with the amount of evidence collected in tool-enabled conditions but not in the Baseline condition.
Advantage over baselines:
- Masking enabled fine-grained, counterfactual testing of visual cues, uncovering biases like role misattribution when gender cues were obscured.
- Filtering facilitated the detection of broader linguistic patterns, such as markedness and gendered descriptors.
- Both tools supported more diverse and systematic bias identification compared to the Baseline interface.
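The counterfactual testing attributed to the Masking tool can be sketched as follows: hide a region of the image, caption both versions, and compare. Everything here is an assumption made for illustration. The image is a toy grid of pixel values, `mask_region` and `counterfactual_pair` are hypothetical helpers, and `toy_captioner` is a stand-in for a real captioning model, not the BLIP model used in the study.

```python
def mask_region(image, box, fill=0):
    """Return a copy of `image` (rows of pixel values) with `box` set to `fill`.

    `box` is (top, left, bottom, right); bottom and right are exclusive.
    """
    top, left, bottom, right = box
    masked = [row[:] for row in image]  # copy rows; the original stays untouched
    for r in range(max(top, 0), min(bottom, len(masked))):
        row = masked[r]
        for c in range(max(left, 0), min(right, len(row))):
            row[c] = fill
    return masked


def counterfactual_pair(image, box, caption_fn):
    """Caption the original and the masked image so the two can be compared."""
    return caption_fn(image), caption_fn(mask_region(image, box))


# Stand-in captioner (assumption): it "sees" a cue only if pixel (0, 0) is non-zero.
def toy_captioner(image):
    return "caption with cue" if image[0][0] else "caption without cue"


image = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
before, after = counterfactual_pair(image, (0, 0, 2, 2), toy_captioner)
print(before, "->", after)
```

A difference between the two captions suggests the masked region influenced the output, which is the kind of observation an auditor could record as evidence for a hypothesis.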
Experiments / evaluation:
- Participants: 60 non-experts (balanced gender, diverse educational backgrounds, no prior auditing experience).
- Dataset: 80 images from the VisoGender dataset focusing on medical professions.
- Metrics: Number of bias cards, evidence items per hypothesis, thematic distribution of biases, and confidence ratings.
Limitations and future work:
- Limited to a fixed dataset and specific domain (medical professions).
- The participant sample did not include members of marginalized communities, which may have narrowed the perspectives represented.
- Focused on individual audits; future work should explore collaborative and longitudinal auditing.
- Interaction logs and real-world settings could provide deeper insights into sensemaking processes.

Summary

This study demonstrates how interface design grounded in the sensemaking framework can empower non-experts to audit gender bias in AI systems. Through a case study on an image captioning model, participants using the Masking and Filtering tools identified diverse patterns of bias, such as stereotyped role assignments and linguistic asymmetries. The tools shaped the granularity of observations and confidence in hypotheses, highlighting the interplay between visual and linguistic signals in model behavior. Future research should expand this approach to other domains, support collaborative auditing, and integrate sensemaking tools into everyday AI interactions to foster transparency and accountability.
