Seeing Eye to AI? Applying Deep-Feature-Based Similarity Metrics to Information Visualization
Judging the similarity of visualizations is crucial to various applications, such as visualization-based search and visualization recommendation systems. Recent studies show deep-feature-based similarity metrics correlate well with perceptual judgments of image similarity and serve as effective loss functions for tasks like image super-resolution and style transfer. We explore the application of such metrics to judgments of visualization similarity. We extend a similarity metric using five ML architectures and three pre-trained weight sets. We replicate results from previous crowdsourced studies on scatterplot and visual channel similarity perception. Notably, our metric using pre-trained ImageNet weights outperformed gradient-descent tuned MS-SSIM, a multi-scale similarity metric based on luminance, contrast, and structure. Our work contributes to understanding how deep-feature-based metrics can enhance similarity assessments in visualization, potentially improving visual analysis tools and techniques. Supplementary materials are available at https://osf.io/dj2ms/.
CHI 2025. Sheng Long et al., Northwestern University, Computer Science. Topics: Recommender System UX; Interactive Data Visualization; Visualization Perception & Cognition.
ExploreSelf: Fostering User-driven Exploration and Reflection on Personal Challenges with Adaptive Guidance by Large Language Models
Expressing stressful experiences in words has been shown to improve mental and physical health, but individuals often disengage from writing interventions as they struggle to organize their thoughts and emotions. Reflective prompts have been used to provide direction, and large language models (LLMs) have demonstrated the potential to provide tailored guidance. However, current systems often limit users' flexibility to direct their reflections. We thus present ExploreSelf, an LLM-driven application designed to empower users to control their reflective journey, providing adaptive support through dynamically generated questions. Through an exploratory study with 19 participants, we examine how participants explore and reflect on personal challenges using ExploreSelf. Our findings demonstrate that participants valued the flexible navigation of adaptive guidance to control their reflective journey, leading to deeper engagement and insight. Building on our findings, we discuss the implications of designing LLM-driven tools that facilitate user-driven and effective reflection on personal challenges.
CHI 2025. Inhwa Song et al., KAIST, School of Computing. Topics: Human-LLM Collaboration; Mental Health Apps & Online Support Communities.
Underspecified Human Decision Experiments Considered Harmful
Decision-making with information displays is a key focus of research in areas like human-AI collaboration and data visualization. However, what constitutes a decision problem, and what is required for an experiment to conclude that decisions are flawed, remain imprecise. We present a widely applicable definition of a decision problem synthesized from statistical decision theory and information economics. We claim that to attribute loss in human performance to bias, an experiment must provide the information that a rational agent would need to identify the normative decision. We evaluate whether recent empirical research on AI-assisted decisions achieves this standard. We find that only 10 (26%) of 39 studies that claim to identify biased behavior presented participants with sufficient information to make this claim in at least one treatment condition. We motivate the value of studying well-defined decision problems by describing a characterization of the performance losses that such problems make it possible to conceive.
CHI 2025. Jessica Hullman et al., Northwestern University, Computer Science. Topics: Explainable AI (XAI); AI-Assisted Decision-Making & Automation.
Characterizing Photorealism and Artifacts in Diffusion Model-Generated Images
Diffusion model-generated images can appear indistinguishable from authentic photographs, but these images often contain artifacts and implausibilities that reveal their AI-generated provenance. Given the challenge to public trust in media posed by photorealistic AI-generated images, we conducted a large-scale experiment measuring human detection accuracy on 450 diffusion-model generated images and 149 real images. Based on collecting 749,828 observations and 34,675 comments from 50,444 participants, we find that scene complexity of an image, artifact types within an image, display time of an image, and human curation of AI-generated images all play significant roles in how accurately people distinguish real from AI-generated images. Additionally, we propose a taxonomy characterizing artifacts often appearing in images generated by diffusion models. Our empirical observations and taxonomy offer nuanced insights into the capabilities and limitations of diffusion models to generate photorealistic images in 2024.
CHI 2025. Negar Kamali et al., Northwestern University, Computer Science. Topics: Generative AI (Text, Image, Music, Video); Explainable AI (XAI); Deepfake & Synthetic Media Detection.
Designing Shared Information Displays for Agents of Varying Strategic Sophistication
Data-driven predictions are often perceived as inaccurate in hindsight due to behavioral responses (Perdomo et al., 2020). We consider the role of interface design choices on how individuals respond to predictions presented on a shared information display in a strategic setting. We introduce a novel staged experimental design to investigate the effects of interface design features, such as the visualization of prediction uncertainty and prediction error, within a repeated congestion game. In this game, participants assume the role of taxi drivers and use a shared information display to decide where to search for their next ride. Our experimental design endows agents with varying level-k depths of thinking (Camerer et al., 2004), allowing some agents to possess greater sophistication in anticipating the decisions of others using the same information display. Through several large pre-registered experiments, we identify trade-offs between displays that are optimal for individual decisions and those that best serve the collective social welfare of the system. Additionally, we note that the influence of display characteristics varies based on an agent's strategic sophistication. We observe that design choices promoting individual-level decision-making can lead to suboptimal system outcomes, as manifested by a lower realization of potential social welfare. However, this decline in social welfare is offset by a slight reduction in distribution shift, narrowing the gap between predicted and realized system outcomes. This may enhance the perceived reliability and trustworthiness of the information display post hoc. Our findings pave the way for new research questions concerning the design of effective prediction interfaces in strategic environments.
CSCW 2024. Dongping Zhang et al. Session 4f: Multiplayer Gaming and Communication.
Erie: A Declarative Grammar for Data Sonification
Data sonification (mapping data variables to auditory variables, such as pitch or volume) is used for data accessibility, scientific exploration, and data-driven art (e.g., museum exhibitions), among other uses. While a substantial amount of research has been done on effective and intuitive sonification design, software support is not commensurate, limiting researchers from fully exploring its capabilities. We contribute Erie, a declarative grammar for data sonification that enables abstractly expressing auditory mappings. Erie supports specifying extensible tone designs (e.g., periodic wave, sampling, frequency/amplitude modulation synthesizers), various encoding channels, auditory legends, and composition options like sequencing and overlaying. Using standard Web Audio and Web Speech APIs, we provide an Erie compiler for web environments. We demonstrate the expressiveness and feasibility of Erie by replicating research prototypes presented by prior work and provide a sonification design gallery. We discuss future steps to extend Erie toward other audio computing environments and support interactive data sonification.
CHI 2024. Hyeok Kim et al., Northwestern University. Topics: Interactive Data Visualization; Music Composition & Sound Design Tools.
Milliways: Taming Multiverses through Principled Evaluation of Data Analysis Paths
Multiverse analyses involve conducting all combinations of reasonable choices in a data analysis process. A reader of a study containing a multiverse analysis might ask: are all the choices included in the multiverse reasonable and equally justifiable? How much do results vary if we make different choices in the analysis process? In this work, we identify principles for validating the composition of, and interpreting the uncertainty in, the results of a multiverse analysis. We present Milliways, a novel interactive visualisation system to support principled evaluation of multiverse analyses. Milliways provides interlinked panels presenting result distributions, individual analysis composition, multiverse code specification, and data summaries. Milliways supports interactions to sort, filter and aggregate results based on the analysis specification to identify decisions in the analysis process to which the results are sensitive. To represent the two qualitatively different types of uncertainty that arise in multiverse analyses (probabilistic uncertainty from estimating unknown quantities of interest such as regression coefficients, and possibilistic uncertainty from choices in the data analysis), Milliways uses consonance curves and probability boxes. Through an evaluative study with five users familiar with multiverse analysis, we demonstrate how Milliways can support multiverse analysis tasks, including a principled assessment of the results of a multiverse analysis.
CHI 2024. Abhraneel Sarma et al., Northwestern University. Topics: Interactive Data Visualization; Uncertainty Visualization.
Evaluating the Utility of Conformal Prediction Sets for AI-Advised Image Labeling
As deep neural networks are more commonly deployed in high-stakes domains, their black-box nature makes uncertainty quantification challenging. We investigate the effects of presenting conformal prediction sets (a distribution-free class of methods for generating prediction sets with specified coverage) to express uncertainty in AI-advised decision-making. Through a large online experiment, we compare the utility of conformal prediction sets to displays of Top-1 and Top-k predictions for AI-advised image labeling. In a pre-registered analysis, we find that the utility of prediction sets for accuracy varies with the difficulty of the task: while they result in accuracy on par with or less than Top-1 and Top-k displays for easy images, prediction sets excel at assisting humans in labeling out-of-distribution (OOD) images, especially when the set size is small. Our results empirically pinpoint practical challenges of conformal prediction sets and provide implications on how to incorporate them for real-world decision-making.
CHI 2024. Dongping Zhang et al., Northwestern University. Topics: Explainable AI (XAI); Recommender System UX; Uncertainty Visualization.
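To make the notion of a prediction set with specified coverage concrete, here is a minimal sketch of split conformal prediction for classification. It is an illustration under stated assumptions, not the procedure used in the paper: the score (one minus the probability assigned to the true label), the function names, and the toy calibration data are all hypothetical.

```python
import math

def conformal_threshold(cal_probs, cal_labels, alpha):
    """Calibrate a score threshold on held-out data (hypothetical helper).

    Score = 1 - probability the model assigned to the true label.
    """
    scores = sorted(1.0 - probs[label] for probs, label in zip(cal_probs, cal_labels))
    n = len(scores)
    # Finite-sample corrected rank for target coverage 1 - alpha.
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return 1.0  # too few calibration points: a threshold of 1.0 admits every label
    return scores[rank - 1]

def prediction_set(probs, threshold):
    """Return all labels whose score does not exceed the calibrated threshold."""
    return [k for k, p in enumerate(probs) if 1.0 - p <= threshold]

# Toy calibration data (assumed): softmax outputs and true labels.
cal_probs = [[0.9, 0.1], [0.6, 0.4], [0.3, 0.7]]
cal_labels = [0, 0, 0]
threshold = conformal_threshold(cal_probs, cal_labels, alpha=0.25)

confident = prediction_set([0.9, 0.1], threshold)    # one label survives
uncertain = prediction_set([0.55, 0.45], threshold)  # both labels survive
```

In this sketch the set grows when the model is less certain, which is the property a display of prediction sets relies on to convey uncertainty.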
MetaExplorer: Facilitating Reasoning with Epistemic Uncertainty in Meta-analysis
Scientists often use meta-analysis to characterize the impact of an intervention on some outcome of interest across a body of literature. However, threats to the utility and validity of meta-analytic estimates arise when scientists average over potentially important variations in context like different research designs. Uncertainty about quality and commensurability of evidence casts doubt on results from meta-analysis, yet existing software tools for meta-analysis do not provide an explicit software representation of these concerns. We present MetaExplorer, a prototype system for meta-analysis that we developed using iterative design with meta-analysis experts to provide a guided process for eliciting assessments of uncertainty and reasoning about how to incorporate them during statistical inference. Our qualitative evaluation of MetaExplorer with experienced meta-analysts shows that imposing a structured workflow both elevates the perceived importance of epistemic concerns and presents opportunities for tools to engage users in dialogue around goals and standards for evidence aggregation.
CHI 2023. Alex Kale et al., University of Chicago. Topics: Uncertainty Visualization; Computational Methods in HCI.
multiverse: Multiplexing Alternative Data Analyses in R Notebooks
There are myriad ways to analyse a dataset. But which one to trust? In the face of such uncertainty, analysts may adopt multiverse analysis: running all reasonable analyses on the dataset. Yet this is cognitively and technically difficult with existing tools (how does one specify and execute all combinations of reasonable analyses of a dataset?) and often requires discarding existing workflows. We present multiverse, a tool for implementing multiverse analyses in R with expressive syntax supporting existing computational notebook workflows. multiverse supports building up a multiverse through local changes to a single analysis and optimises execution by pruning redundant computations. We evaluate how multiverse supports programming multiverse analyses using (a) principles of cognitive ergonomics to compare with two existing multiverse tools; and (b) case studies based on semi-structured interviews with researchers who have successfully implemented an end-to-end analysis using multiverse. We identify design tradeoffs (e.g. increased flexibility versus learnability), and suggest future directions for multiverse tool design.
CHI 2023. Abhraneel Sarma et al., Northwestern University. Topics: Interactive Data Visualization; Computational Methods in HCI.
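The idea of running all combinations of reasonable analysis choices can be sketched in a few lines. The example below is written in Python for illustration only and does not use or imitate the API of the R package described above; the decision names (`outlier_cutoff`, `summary`), the helper functions, and the toy data are hypothetical.

```python
from itertools import product
from statistics import mean, median

def enumerate_universes(decisions):
    """Expand a dict of {decision name: options} into every combination."""
    names = list(decisions)
    return [dict(zip(names, combo))
            for combo in product(*(decisions[name] for name in names))]

def run_universe(data, universe):
    """Run one analysis path: drop outliers, then summarize what is left."""
    kept = [x for x in data if x <= universe["outlier_cutoff"]]
    summarize = {"mean": mean, "median": median}[universe["summary"]]
    return summarize(kept)

# Two decisions with two options each give four universes (assumed example).
decisions = {"outlier_cutoff": [10, 100], "summary": ["mean", "median"]}
data = [1, 2, 3, 4, 50]

results = [(universe, run_universe(data, universe))
           for universe in enumerate_universes(decisions)]
```

Here the estimate ranges from 2.5 to 12 depending on the path taken, which is the kind of sensitivity a multiverse analysis is meant to expose. This naive enumeration recomputes every path from scratch and does not attempt the pruning of redundant computations that the abstract mentions.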
Cicero: A Declarative Grammar for Responsive Visualization
Designing responsive visualizations can be cast as applying transformations to a source view to render it suitable for a different screen size. However, designing responsive visualizations is often tedious as authors must manually apply and reason about candidate transformations. We present Cicero, a declarative grammar for concisely specifying responsive visualization transformations which paves the way for more intelligent responsive visualization authoring tools. Cicero's flexible specifier syntax allows authors to select visualization elements to transform, independent of the source view's structure. Cicero provides a concise set of actions to encode a diverse set of transformations in both desktop-first and mobile-first design processes. Authors can ultimately reuse design-agnostic transformations across different visualizations. To demonstrate the utility of Cicero, we develop a compiler to an extended version of Vega-Lite, and provide principles for our compiler. We further discuss the incorporation of Cicero into responsive visualization authoring tools, such as a design recommender.
CHI 2022. Hyeok Kim et al., Northwestern University. Topics: Interactive Data Visualization; Geospatial & Map Visualization.
Human Factors in Model Interpretability: Industry Practices, Challenges, and Needs
As the use of machine learning (ML) models in product development and data-driven decision-making processes has become pervasive in many domains, people's focus on building a well-performing model has increasingly shifted to understanding how their model works. While scholarly interest in model interpretability has grown rapidly in research communities like HCI, ML, and beyond, little is known about how practitioners perceive and aim to provide interpretability in the context of their existing workflows. This lack of understanding of interpretability as practiced may prevent interpretability research from addressing important needs, or lead to unrealistic solutions. To bridge this gap, we conduct 22 semi-structured interviews with industry practitioners to understand how they conceive of and design for interpretability while they plan, build, and use their models. Based on a qualitative analysis of our results, we differentiate interpretability roles, processes, goals and strategies as they exist within organizations making heavy use of ML models. The characterization of interpretability work that emerges from our analysis suggests that model interpretability frequently involves cooperation and mental model comparison between people in different roles, often aimed at building trust not only between people and models, but also between people within the organization. We present implications for design that discuss gaps between the interpretability challenges that practitioners face in their practice and approaches proposed in the literature, highlighting possible research directions that can better address real-world needs.
CSCW 2020. Sungsoo Ray Hong et al. Session: Interpreting and Explaining AI.
Decision-Making Under Uncertainty in Research Synthesis: Designing for the Garden of Forking Paths
To make evidence-based recommendations to decision-makers, researchers conducting systematic reviews and meta-analyses must navigate a garden of forking paths: a series of analytical decision-points, each of which has the potential to influence findings. To identify challenges and opportunities related to designing systems to help researchers manage uncertainty around which of multiple analyses is best, we interviewed 11 professional researchers who conduct research synthesis to inform decision-making within three organizations. We conducted a qualitative analysis identifying 480 analytical decisions made by researchers throughout the scientific process. We present descriptions of current practices in applied research synthesis and corresponding design challenges: making it more feasible for researchers to try and compare analyses, shifting researchers' attention from rationales for decisions to impacts on results, and supporting communication techniques that acknowledge decision-makers' aversions to uncertainty. We identify opportunities to design systems which help researchers explore, reason about, and communicate uncertainty in decision-making about possible analyses in research synthesis.
CHI 2019. Alex Kale et al., University of Washington. Topics: Interactive Data Visualization; Uncertainty Visualization.
A Bayesian Cognition Approach to Improve Data Visualization
People naturally bring their prior beliefs to bear on how they interpret new information, yet few formal models exist for accounting for the influence of users' prior beliefs in interactions with data presentations like visualizations. We demonstrate a Bayesian cognitive model for understanding how people interpret visualizations in light of prior beliefs and show how this model provides a guide for improving visualization evaluation. In a first study, we show how applying a Bayesian cognition model to a simple visualization scenario indicates that people's judgments are consistent with a hypothesis that they are doing approximate Bayesian inference. In a second study, we evaluate how sensitive our observations of Bayesian behavior are to different techniques for eliciting people's subjective distributions, and to different datasets. We find that people don't behave consistently with Bayesian predictions for large sample size datasets, and this difference cannot be explained by elicitation technique. In a final study, we show how normative Bayesian inference can be used as an evaluation framework for visualizations, including visualizations of uncertainty.
CHI 2019. Yea-Seul Kim et al., University of Washington. Topics: Interactive Data Visualization; Uncertainty Visualization; Visualization Perception & Cognition.
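As a rough illustration of what a normative Bayesian benchmark looks like, the sketch below updates a prior belief about a proportion after seeing data, using a conjugate Beta-Binomial model. The choice of model, the function names, and the numbers are assumptions made for this example; the abstract does not state which model the authors used.

```python
def beta_binomial_update(prior_a, prior_b, successes, failures):
    """Conjugate update: a Beta(a, b) prior plus binomial data gives a Beta posterior."""
    return prior_a + successes, prior_b + failures

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

# Assumed example: a weak prior centered on 0.5, then two datasets with the
# same observed proportion (70%) but different sample sizes.
prior = (2, 2)
small_sample = beta_binomial_update(*prior, successes=7, failures=3)
large_sample = beta_binomial_update(*prior, successes=700, failures=300)

small_posterior_mean = beta_mean(*small_sample)  # pulled toward the prior
large_posterior_mean = beta_mean(*large_sample)  # dominated by the data
```

A normative posterior of this kind can be compared against the belief a person reports after viewing a visualization of the same data. With the larger sample the posterior is dominated by the data, which is the regime where the abstract reports that people depart from Bayesian predictions.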
Some Prior(s) Experience Necessary: Templates for Getting Started With Bayesian Analysis
Bayesian statistical analysis has gained attention in recent years, including in HCI. The Bayesian approach has several advantages over traditional statistics, including producing results with more intuitive interpretations. Despite growing interest, few papers in CHI use Bayesian analysis. Existing tools to learn Bayesian statistics require significant time investment, making it difficult to casually explore Bayesian methods. Here, we present a tool that lowers the barrier to exploration: a set of R code templates that guide Bayesian novices through their first analysis. The templates are tailored to CHI, supporting analyses found to be most common in recent CHI papers. In a user study, we found that the templates were easy to understand and use. However, we found that participants without a statistical background were not confident in their use. Together our contributions provide a concise analysis tool and empirical results for understanding and addressing barriers to using Bayesian analysis in HCI.
CHI 2019. Chanda Phelan et al., University of Michigan. Topics: Telemedicine & Remote Patient Monitoring; Computational Methods in HCI.
Vocal Shortcuts for Creative Experts
Vocal shortcuts, short spoken phrases to control interfaces, have the potential to reduce cognitive and physical costs of interactions. They may benefit expert users of creative applications (e.g., designers, illustrators) by helping them maintain creative focus. To aid the design of vocal shortcuts and gather use cases and design guidelines for speech interaction, we interviewed ten creative experts. Based on our findings, we built VoiceCuts, a prototype implementation of vocal shortcuts in the context of an existing creative application. In contrast to other speech interfaces, VoiceCuts targets experts' unique needs by handling short and partial commands and leverages document model and application context to disambiguate user utterances. We report on the viability and limitations of our approach based on feedback from creative experts.
CHI 2019. Yea-Seul Kim et al., University of Washington. Topics: Voice User Interface (VUI) Design; Music Composition & Sound Design Tools.