HAGI: Head-Assisted Gaze Imputation for Mobile Eye Trackers
Mobile eye tracking plays a vital role in capturing human visual attention across both real-world and extended reality (XR) environments, making it an essential tool for applications ranging from behavioural research to human-computer interaction. However, missing values due to blinks, pupil detection errors, or illumination changes pose significant challenges for further gaze data analysis. To address this challenge, we introduce HAGI – a multi-modal diffusion-based approach for gaze data imputation that, for the first time, uses the integrated head orientation sensors to exploit the inherent correlation between head and eye movements. Our method includes a head-movement feature extraction module alongside a novel hybrid feature fusion mechanism that effectively integrates gaze and head motion features at multiple levels. Additionally, we introduce a tailored loss function to enhance gaze imputation accuracy further. Extensive evaluations on the large-scale Nymeria, Ego-Exo4D, and HOT3D datasets demonstrate that HAGI consistently outperforms conventional interpolation methods and deep learning-based time-series imputation baselines, reducing mean angular error by up to 22%. Furthermore, statistical analyses confirm that HAGI produces gaze velocity distributions that more closely match actual human gaze behaviour than baselines, ensuring more realistic gaze imputations. Our method paves the way for more complete and accurate eye gaze recordings in real-world settings and has significant potential for enhancing gaze-based analysis and interaction across various application domains.
Chuhan Jiao et al., UIST 2025. Topics: Eye Tracking & Gaze Interaction; Human Pose & Activity Recognition; Context-Aware Computing.

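To make the hybrid fusion idea above more concrete, the sketch below shows one plausible way to combine gaze and head-motion features at two levels – early concatenation plus cross-attention from gaze to head features. The module layout, dimensions, and PyTorch realisation are illustrative assumptions, not the published HAGI architecture.

```python
# Illustrative sketch only: two-level gaze/head feature fusion.
# Dimensions and layer choices are assumptions, not the HAGI model.
import torch
import torch.nn as nn

class HybridGazeHeadFusion(nn.Module):
    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.early = nn.Linear(2 * dim, dim)  # early fusion: per-step concatenation
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)  # feature-level fusion
        self.norm = nn.LayerNorm(dim)

    def forward(self, gaze_feat: torch.Tensor, head_feat: torch.Tensor) -> torch.Tensor:
        # gaze_feat, head_feat: (batch, time, dim), temporally aligned
        fused = self.early(torch.cat([gaze_feat, head_feat], dim=-1))
        attended, _ = self.cross_attn(query=fused, key=head_feat, value=head_feat)
        return self.norm(fused + attended)

if __name__ == "__main__":
    gaze = torch.randn(8, 120, 64)   # e.g. features of 120 gaze samples (some imputed)
    head = torch.randn(8, 120, 64)   # aligned head-orientation features
    print(HybridGazeHeadFusion()(gaze, head).shape)  # torch.Size([8, 120, 64])
```
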
SummAct: Uncovering User Intentions Through Interactive Behaviour Summarisation
Recent work has highlighted the potential of modelling interactive behaviour analogously to natural language. We propose interactive behaviour summarisation as a novel computational task and demonstrate its usefulness for automatically uncovering latent user goals while interacting with graphical user interfaces. To tackle this task, we introduce SummAct – a novel hierarchical method that summarises low-level input actions into high-level goals. SummAct first identifies sub-goals from user actions using a large language model and in-context learning. In a second step, high-level goals are obtained by fine-tuning the model using a novel UI element weighting mechanism to preserve detailed context information embedded within UI elements during summarisation. Through a series of evaluations, we demonstrate that SummAct significantly outperforms baseline methods across desktop and mobile user interfaces and interactive tasks by up to 21.9%. We further introduce two exciting example use cases enabled by our method: interactive behaviour forecasting and automatic behaviour synonym identification.
Guanhua Zhang et al. (University of Stuttgart, Institute for Visualisation and Interactive Systems), CHI 2025. Topics: Human-LLM Collaboration; AI-Assisted Decision-Making & Automation.

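As a rough illustration of the hierarchical summarisation pipeline described above, the sketch below chunks low-level actions into sub-goals and then summarises the sub-goals into a single high-level goal. The prompts, the `llm` callable, and the fixed chunk size are placeholders, not the published SummAct method (which additionally fine-tunes the model with a UI element weighting mechanism).

```python
# Minimal two-stage summarisation sketch; prompts and chunking are assumptions.
from typing import Callable, List

def summarise_actions(actions: List[str],
                      llm: Callable[[str], str],
                      chunk_size: int = 5) -> str:
    # Stage 1: summarise consecutive low-level actions into sub-goals.
    sub_goals = []
    for i in range(0, len(actions), chunk_size):
        chunk = actions[i:i + chunk_size]
        prompt = ("Summarise the following UI actions into one short sub-goal:\n"
                  + "\n".join(chunk))
        sub_goals.append(llm(prompt))
    # Stage 2: summarise the sub-goals into a single high-level goal.
    prompt = ("Summarise the following sub-goals into the user's overall goal:\n"
              + "\n".join(sub_goals))
    return llm(prompt)

if __name__ == "__main__":
    # Stub LLM for demonstration; replace with a real model call.
    def fake_llm(prompt: str) -> str:
        return prompt.splitlines()[-1]
    actions = ["click 'Compose'", "type recipient", "type subject",
               "type body", "click 'Send'"]
    print(summarise_actions(actions, fake_llm))
```
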
Chartist: Task-driven Eye Movement Control for Chart Reading
To design data visualizations that are easy to comprehend, we need to understand how people with different interests read them. Computational models of predicting scanpaths on charts could complement empirical studies by offering estimates of user performance inexpensively; however, previous models have been limited to gaze patterns and overlooked the effects of tasks. Here, we contribute Chartist, a computational model that simulates how users move their eyes to extract information from the chart in order to perform analysis tasks, including value retrieval, filtering, and finding extremes. The novel contribution lies in a two-level hierarchical control architecture. At the high level, the model uses LLMs to comprehend the information gained so far and applies this representation to select a goal for the lower-level controllers, which, in turn, move the eyes in accordance with a sampling policy learned via reinforcement learning. The model is capable of predicting human-like task-driven scanpaths across various tasks. It can be applied in fields such as explainable AI, visualization design evaluation, and optimization. While it displays limitations in terms of generalizability and accuracy, it takes modeling in a promising direction, toward understanding human behaviors in interacting with charts.
Danqing Shi et al. (Aalto University), CHI 2025. Topics: Interactive Data Visualization; Computational Methods in HCI.

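The two-level control architecture can be pictured as a loop in which a high-level controller picks the next target and a low-level policy generates fixations towards it. The toy below uses a trivial selection rule and a noisy step instead of an LLM and a learned reinforcement-learning policy, so it only illustrates the control flow, not the Chartist model itself.

```python
# Toy two-level control loop; all rules and names are illustrative placeholders.
import random
from typing import List, Tuple

def high_level_select(seen: set, elements: dict) -> str:
    # Placeholder for the LLM-based controller: pick any not-yet-visited element.
    remaining = [e for e in elements if e not in seen]
    return remaining[0] if remaining else ""

def low_level_fixate(current: Tuple[float, float],
                     target: Tuple[float, float]) -> Tuple[float, float]:
    # Placeholder for the learned sampling policy: a noisy step towards the target.
    step = 0.6
    return (current[0] + step * (target[0] - current[0]) + random.gauss(0, 0.01),
            current[1] + step * (target[1] - current[1]) + random.gauss(0, 0.01))

def simulate_scanpath(elements: dict, n_fixations: int = 10) -> List[Tuple[float, float]]:
    gaze, seen, scanpath = (0.5, 0.5), set(), []
    for _ in range(n_fixations):
        goal = high_level_select(seen, elements)
        if not goal:
            break
        gaze = low_level_fixate(gaze, elements[goal])
        scanpath.append(gaze)
        seen.add(goal)
    return scanpath

print(simulate_scanpath({"title": (0.5, 0.9), "x-axis": (0.5, 0.1), "bar_max": (0.8, 0.6)}))
```
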
Mindful Explanations: Prevalence and Impact of Mind Attribution in XAI Research
When users perceive AI systems as mindful, independent agents, they hold them responsible instead of the AI experts who created and designed these systems. So far, it has not been studied whether explanations support this shift in responsibility through the use of mind-attributing verbs like "to think". To better understand the prevalence of mind-attributing explanations, we analyse AI explanations in 3,533 explainable AI (XAI) research articles from the Semantic Scholar Open Research Corpus (S2ORC). Using methods from semantic shift detection, we identify three dominant types of mind attribution: (1) metaphorical (e.g. "to learn" or "to predict"), (2) awareness (e.g. "to consider"), and (3) agency (e.g. "to make decisions"). We then analyse the impact of mind-attributing explanations on awareness and responsibility in a vignette-based experiment with 199 participants. We find that participants who were given a mind-attributing explanation were more likely to rate the AI system as aware of the harm it caused. Moreover, the mind-attributing explanation had a responsibility-concealing effect: Considering the AI experts' involvement led to reduced ratings of AI responsibility for participants who were given a non-mind-attributing or no explanation. In contrast, participants who read the mind-attributing explanation still held the AI system responsible despite considering the AI experts' involvement. Taken together, our work underlines the need to carefully phrase explanations about AI systems in scientific writing to reduce mind attribution and clearly communicate human responsibility.
Susanne Hindennach et al., CSCW 2024. Session 3e: Trust and Understanding in Explainable AI.

DisMouse: Disentangling Information from Mouse Movement Data
Mouse movement data contain rich information about users, performed tasks, and user interfaces, but separating the respective components remains challenging and unexplored. As a first step to address this challenge, we propose DisMouse – the first method to disentangle user-specific and user-independent information and stochastic variations from mouse movement data. At the core of our method is an autoencoder trained in a semi-supervised fashion, consisting of a self-supervised denoising diffusion process and a supervised contrastive user identification module. Through evaluations on three datasets, we show that DisMouse 1) captures complementary information of mouse input, hence providing an interpretable framework for modelling mouse movements, 2) can be used to produce refined features, thus enabling various applications such as personalised and variable mouse data generation, and 3) generalises across different datasets. Taken together, our results underline the significant potential of disentangled representation learning for explainable, controllable, and generalised mouse behaviour modelling.
Guanhua Zhang et al., UIST 2024. Topics: Explainable AI (XAI); Computational Methods in HCI.

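A minimal way to picture the disentanglement idea is an encoder that splits a trajectory embedding into a user-specific factor (supervised with user labels) and a user-independent factor. The sketch below uses a plain GRU and a classification head as stand-ins; DisMouse itself relies on a denoising diffusion process and a contrastive user identification module, which are not shown here.

```python
# Illustrative split-latent encoder; sizes and objective are assumptions.
import torch
import torch.nn as nn

class SplitEncoder(nn.Module):
    def __init__(self, in_dim=2, hidden=64, user_dim=16, content_dim=48, n_users=10):
        super().__init__()
        self.gru = nn.GRU(in_dim, hidden, batch_first=True)
        self.to_user = nn.Linear(hidden, user_dim)        # user-specific factor
        self.to_content = nn.Linear(hidden, content_dim)  # user-independent factor
        self.user_head = nn.Linear(user_dim, n_users)     # supervision on the user factor

    def forward(self, traj):
        # traj: (batch, time, 2) mouse x/y positions
        _, h = self.gru(traj)
        h = h.squeeze(0)
        z_user, z_content = self.to_user(h), self.to_content(h)
        return z_user, z_content, self.user_head(z_user)

if __name__ == "__main__":
    z_u, z_c, logits = SplitEncoder()(torch.randn(4, 200, 2))
    print(z_u.shape, z_c.shape, logits.shape)  # (4, 16) (4, 48) (4, 10)
```
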
Mouse2Vec: Learning Reusable Semantic Representations of Mouse Behaviour
The mouse is a pervasive input device used for a wide range of interactive applications. However, computational modelling of mouse behaviour typically requires time-consuming design and extraction of handcrafted features, or approaches that are application-specific. We instead propose Mouse2Vec – a novel self-supervised method designed to learn semantic representations of mouse behaviour that are reusable across users and applications. Mouse2Vec uses a Transformer-based encoder-decoder architecture, which is specifically geared for mouse data: During pretraining, the encoder learns an embedding of input mouse trajectories while the decoder reconstructs the input and simultaneously detects mouse click events. We show that the representations learned by our method can identify interpretable mouse behaviour clusters and retrieve similar mouse trajectories. We also demonstrate on three sample downstream tasks that the representations can be practically used to augment mouse data for training supervised methods and serve as an effective feature extractor.
Guanhua Zhang et al. (University of Stuttgart), CHI 2024. Topics: Visualization Perception & Cognition; Computational Methods in HCI.

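The encoder side of such a model can be sketched as a Transformer over per-time-step mouse coordinates with two output heads, one reconstructing the trajectory and one detecting clicks. Layer counts, dimensions, and the joint objective below are illustrative assumptions rather than the published Mouse2Vec architecture.

```python
# Sketch of a Transformer encoder with reconstruction and click-detection heads.
import torch
import torch.nn as nn

class MouseEncoder(nn.Module):
    def __init__(self, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Linear(2, d_model)  # x/y position per time step
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.reconstruct = nn.Linear(d_model, 2)  # reconstruct the input trajectory
        self.click = nn.Linear(d_model, 1)        # per-step click logit

    def forward(self, traj):
        z = self.encoder(self.embed(traj))        # (batch, time, d_model) embedding
        return self.reconstruct(z), self.click(z).squeeze(-1), z

if __name__ == "__main__":
    recon, click_logits, z = MouseEncoder()(torch.randn(4, 300, 2))
    print(recon.shape, click_logits.shape, z.shape)
```
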
SalChartQA: Question-driven Saliency on Information Visualisations
Understanding the link between visual attention and users' information needs when visually exploring information visualisations is under-explored due to a lack of large and diverse datasets to facilitate these analyses. To fill this gap, we introduce SalChartQA – a novel crowd-sourced dataset that uses the BubbleView interface to track user attention and a question-answering (QA) paradigm to induce different information needs in users. SalChartQA contains 74,340 answers to 6,000 questions on 3,000 visualisations. Informed by our analyses demonstrating the close correlation between information needs and visual saliency, we propose the first computational method to predict question-driven saliency on visualisations. Our method outperforms state-of-the-art saliency models for several metrics, such as the correlation coefficient and the Kullback-Leibler divergence. These results show the importance of information needs for shaping attentive behaviour and pave the way for new applications, such as task-driven optimisation of visualisations or explainable AI in chart question-answering.
Yao Wang et al. (University of Stuttgart), CHI 2024. Topics: Explainable AI (XAI); Interactive Data Visualization; Visualization Perception & Cognition.

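For reference, the two evaluation metrics mentioned above can be computed as follows for a predicted and a ground-truth saliency map; the normalisation and epsilon choices are common conventions and not necessarily the exact evaluation protocol used in the paper.

```python
# Common saliency metrics: KL divergence and correlation coefficient.
import numpy as np

def kl_divergence(pred, gt, eps=1e-7):
    # Treat both maps as probability distributions over pixels.
    pred = pred / (pred.sum() + eps)
    gt = gt / (gt.sum() + eps)
    return float(np.sum(gt * np.log(eps + gt / (pred + eps))))

def correlation_coefficient(pred, gt, eps=1e-7):
    # Pearson correlation between the standardised maps.
    pred = (pred - pred.mean()) / (pred.std() + eps)
    gt = (gt - gt.mean()) / (gt.std() + eps)
    return float(np.corrcoef(pred.ravel(), gt.ravel())[0, 1])

rng = np.random.default_rng(0)
pred, gt = rng.random((64, 64)), rng.random((64, 64))
print(kl_divergence(pred, gt), correlation_coefficient(pred, gt))
```
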
SUPREYES: SUPer Resolution for EYES Using Implicit Neural Representation Learning
We introduce SUPREYES – a novel self-supervised method to increase the spatio-temporal resolution of gaze data recorded using low(er)-resolution eye trackers. Despite continuing advances in eye tracking technology, the vast majority of current eye trackers – particularly mobile ones and those integrated into mobile devices – suffer from low-resolution gaze data, thus fundamentally limiting their practical usefulness. SUPREYES learns a continuous implicit neural representation from low-resolution gaze data to up-sample the gaze data to arbitrary resolutions. We compare our method with commonly used interpolation methods on arbitrary scale super-resolution and demonstrate that SUPREYES outperforms these baselines by a significant margin. We also test on the sample downstream task of gaze-based user identification and show that our method improves performance over the original low-resolution gaze data and outperforms other baselines. These results are promising as they open up a new direction for increasing eye tracking fidelity as well as enabling new gaze-based applications without the need for new eye tracking equipment.
Chuhan Jiao et al., UIST 2023. Topics: Eye Tracking & Gaze Interaction; Human Pose & Activity Recognition.

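The core idea of fitting a continuous implicit representation to low-rate gaze samples and querying it at an arbitrary rate can be illustrated with a tiny MLP, as below. The synthetic signal, network size, and training loop are assumptions for illustration only, not the published SUPREYES method.

```python
# Toy implicit representation: fit t -> gaze(x, y), then query at a higher rate.
import torch
import torch.nn as nn

t_low = torch.linspace(0, 1, 30).unsqueeze(1)   # low-rate time stamps (e.g. 30 samples)
gaze_low = torch.cat([torch.sin(6.28 * t_low), torch.cos(6.28 * t_low)], dim=1)  # synthetic gaze

mlp = nn.Sequential(nn.Linear(1, 64), nn.ReLU(),
                    nn.Linear(64, 64), nn.ReLU(),
                    nn.Linear(64, 2))
opt = torch.optim.Adam(mlp.parameters(), lr=1e-2)

for _ in range(500):                             # fit the continuous representation
    opt.zero_grad()
    loss = nn.functional.mse_loss(mlp(t_low), gaze_low)
    loss.backward()
    opt.step()

t_high = torch.linspace(0, 1, 120).unsqueeze(1)  # query at 4x the original rate
gaze_high = mlp(t_high).detach()
print(gaze_high.shape)                            # torch.Size([120, 2])
```
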
Usable and Fast Interactive Mental Face Reconstruction
We introduce an end-to-end interactive system for mental face reconstruction – the challenging task of visually reconstructing a face image a person only has in their mind. In contrast to existing methods that suffer from low usability and high mental load, our approach only requires the user to rank images over multiple iterations according to the perceived similarity with their mental image. Based on these rankings, our mental face reconstruction system extracts image features in each iteration, combines them into a joint feature vector, and then uses a generative model to visually reconstruct the mental image. To avoid the need for collecting large amounts of human training data, we further propose a computational user model that can simulate human ranking behaviour using data from an online crowd-sourcing study (N=215). Results from a 12-participant user study show that our method can reconstruct mental images that are visually similar to those of existing approaches, but with significantly higher usability, lower perceived workload, and 40% faster reconstruction. In addition, results from a third 22-participant lineup study in which we validated our reconstructions on a face ranking task show an identification rate of 55.3%, which is in line with prior work. These results represent an important step towards new interactive intelligent systems that can robustly and effortlessly reconstruct a user's mental image.
Florian Strohm et al., UIST 2023. Topics: Generative AI (Text, Image, Music, Video); Human-LLM Collaboration; Explainable AI (XAI).

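One simple way to combine features of ranked images into a joint feature vector is rank-based weighting, sketched below; the weighting scheme, feature dimensionality, and downstream generative model are hypothetical placeholders and not the system described above.

```python
# Illustrative rank-weighted feature combination; the scheme is an assumption.
import numpy as np

def combine_ranked_features(features, ranking):
    # features: (n_images, feat_dim); ranking: indices from most to least similar
    weights = np.array([1.0 / (rank + 1) for rank in range(len(ranking))])
    weights /= weights.sum()                      # higher-ranked images weigh more
    return np.sum(weights[:, None] * features[ranking], axis=0)

feats = np.random.rand(5, 512)                    # e.g. latent codes of 5 candidate faces
joint = combine_ranked_features(feats, ranking=[2, 0, 4, 1, 3])
print(joint.shape)                                # (512,) joint feature vector
```
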
Impact of Privacy Protection Methods of Lifelogs on Remembered Memories
Lifelogging is traditionally used for memory augmentation. However, recent research shows that users' trust in the completeness and accuracy of lifelogs might skew their memories. Privacy-protection alterations such as body blurring and content deletion are commonly applied to photos to avoid capturing sensitive information. However, their impact on how users remember memories remains unclear. To this end, we conduct a white-hat memory attack and report on an iterative experiment (N=21) to compare the impact of viewing 1) unaltered lifelogs, 2) blurred lifelogs, and 3) a subset of the lifelogs after deleting private ones, on confidently remembering memories. Findings indicate that all the privacy methods impact memories' quality similarly and that users tend to change their answers in recognition more than in recall scenarios. Results also show that users have high confidence in their remembered content across all privacy methods. Our work raises awareness about the mindful designing of technological interventions.
Passant ElAgroudy et al. (German Research Centre for Artificial Intelligence (DFKI), LMU Munich), CHI 2023. Topics: Privacy by Design & User Control; Privacy Perception & Decision-Making.

Designing for Noticeability: The Impact of Visual Importance on Desktop Notifications
Desktop notifications should be noticeable but are also subject to a number of design choices, e.g. concerning their size, placement, or opacity. It is currently unknown, however, how these choices interact with the desktop background and their influence on noticeability. To address this limitation, we introduce a software tool to automatically synthesize realistically looking desktop images for major operating systems and applications. Using these images, we present a user study (N=34) to investigate the noticeability of notifications during a primary task. We are the first to show that the visual importance of the background at the notification location significantly impacts whether users detect notifications. We analyse the utility of visual importance to compensate for suboptimal design choices with respect to noticeability, e.g. small notification size. Finally, we introduce noticeability maps – 2D maps encoding the predicted noticeability across the desktop – and inform designers how to trade off notification design and noticeability.
Philipp Müller et al. (DFKI GmbH), CHI 2022. Topics: Notification & Interruption Management.

A Critical Assessment of the Use of SSQ as a Measure of General Discomfort in VR Head-Mounted Displays
Based on a systematic literature review of more than 300 papers published over the last 10 years, we provide indicators that the simulator sickness questionnaire (SSQ) is extensively used and widely accepted as a general discomfort measure in virtual reality (VR) research – although it actually only accounts for one category of symptoms. This results in important other categories (digital eye strain (DES) and ergonomics) being largely neglected. To contribute to a more comprehensive picture of discomfort in VR head-mounted displays, we further conducted an online study (N=352) on the severity and relevance of all three symptom categories. Most importantly, our results reveal that symptoms of simulator sickness are significantly less severe and of lower prevalence than those of DES and ergonomics. In light of these findings, we critically discuss the current use of SSQ as the only discomfort measure and propose a more comprehensive factor model that also includes DES and ergonomics.
Teresa Hirzle et al. (Ulm University), CHI 2021. Topics: Motion Sickness & Passenger Experience; Immersion & Presence Research.

Quantification of Users' Visual Attention During Everyday Mobile Device Interactions
We present the first real-world dataset and quantitative evaluation of visual attention of mobile device users in-situ, i.e. while using their devices during everyday routine. Understanding user attention is a core research challenge in mobile HCI, but previous approaches relied on usage logs or self-reports that are only proxies and consequently reflect attention neither completely nor accurately. Our evaluations are based on Everyday Mobile Visual Attention (EMVA), a new 32-participant dataset containing around 472 hours of video snippets recorded over more than two weeks in real life using the front-facing camera, as well as associated usage logs, interaction events, and sensor data. Using an eye contact detection method, we are the first to quantify the highly dynamic nature of everyday visual attention across users, mobile applications, and usage contexts. We discuss key insights from our analyses that highlight the potential and inform the design of future mobile attentive user interfaces.
Mihai Bâce et al. (ETH Zürich), CHI 2020. Topics: Eye Tracking & Gaze Interaction; Context-Aware Computing.

A Design Space for Gaze Interaction on Head-mounted Displays
Augmented and virtual reality (AR/VR) has entered the mass market and, with it, eye tracking will soon become a core technology for next-generation head-mounted displays (HMDs). In contrast to existing gaze interfaces, the 3D nature of AR and VR requires estimating a user's gaze in 3D. While first applications, such as foveated rendering, hint at the compelling potential of combining HMDs and gaze, a systematic analysis is missing. To fill this gap, we present the first design space for gaze interaction on HMDs. Our design space covers human depth perception and technical requirements in two dimensions, aiming to identify challenges and opportunities for interaction design. As such, our design space provides a comprehensive overview and serves as an important guideline for researchers and practitioners working on gaze interaction on HMDs. We further demonstrate how our design space is used in practice by presenting two interactive applications: EyeHealth and XRay-Vision.
Teresa Hirzle et al. (Ulm University), CHI 2019. Topics: Eye Tracking & Gaze Interaction; Social & Collaborative VR; AR Navigation & Context Awareness.

Evaluation of Appearance-Based Methods and Implications for Gaze-Based Applications
Appearance-based gaze estimation methods that only require an off-the-shelf camera have improved significantly, but they are not yet widely used in the human-computer interaction (HCI) community. This is partly because it remains unclear how they perform compared to model-based approaches as well as dominant, special-purpose eye tracking equipment. To address this limitation, we evaluate the performance of state-of-the-art appearance-based gaze estimation for interaction scenarios with and without personal calibration, indoors and outdoors, for different sensing distances, as well as for users with and without glasses. We discuss the obtained findings and their implications for the most important gaze-based applications, namely explicit eye input, attentive user interfaces, gaze-based user modelling, and passive eye monitoring. To democratise the use of appearance-based gaze estimation and interaction in HCI, we finally present OpenGaze (www.opengaze.org), the first software toolkit for appearance-based gaze estimation and interaction.
Xucong Zhang et al. (Saarland Informatics Campus), CHI 2019. Topics: Eye Tracking & Gaze Interaction.