PrivWeb: Unobtrusive and Content-aware Privacy Protection For Web Agents
While web agents have gained popularity by automating web interactions, their requirement for interface access introduces privacy risks that are understudied, particularly from the users' perspective. Through a formative study (N=15), we found that users frequently misunderstand agent data practices and desire unobtrusive, transparent data management. To achieve this, we developed PrivWeb, a trusted add-on for web agents that utilizes a localized LLM to anonymize private information on interfaces based on user preferences. It employs tiered delegation to balance automation and intrusiveness, using ambient notifications for low-sensitivity data and enforcing a mandatory pause for high-sensitivity data. The user study (N=14) across travel, information retrieval, shopping, and entertainment tasks showed that PrivWeb enhances perceived privacy protection and trust compared to transparency-only baselines, without increasing cognitive load. Crucially, we identified user delegation strategies: users prefer to manually execute sensitive steps for high-sensitivity data, while granting the agent access to low-sensitivity data.
2026 | Shuning Zhang et al. | Tsinghua University | Privacy by Design & User Control | Privacy Perception & Decision-Making | Human-LLM Collaboration | CHI
VisGuardian: A Lightweight Group-based Visual Privacy Control Technique For Smart Glasses in Home Environments
Always-on sensing of AI applications on AR glasses makes traditional permission techniques inefficient for context-dependent private visual data within home environments. The home presents a challenging privacy context due to the large number of sensitive objects and the intimate nature of daily routines. We propose VisGuardian, a fine-grained content-based visual permission technique for AR glasses. VisGuardian features a group-based control mechanism that enables users to efficiently manage permissions for multiple private objects. VisGuardian detects objects using YOLO and adopts a pre-classified schema to group them. By selecting a single object, users can obscure groups of related objects based on criteria including privacy sensitivity, object category, or spatial proximity. A technical evaluation shows VisGuardian achieves mAP50 of 0.6704 with only 14.0 ms latency and a 1.7% increase in battery consumption per hour. Furthermore, a user study (N=24) comparing VisGuardian to slider-based and object-based baselines found that it was significantly faster for setting permissions and was preferred by users for its efficiency, effectiveness, and ease of use.
2026 | Shuning Zhang et al. | Tsinghua University | Smart Home Privacy & Security | Privacy by Design & User Control | AR Navigation & Context Awareness | CHI
"Privacy across the boundary": Examining Perceived Privacy Risk Across Data Transmission and Sharing Ranges of Smart Home Personal AssistantsAs Smart Home Personal Assistants (SPAs) evolve into social agents, understanding user privacy necessitates interpersonal communication frameworks, such as Privacy Boundary Theory (PBT). To ground our investigation, our three-phase preliminary study (1) identified transmission and sharing ranges as key boundary-related risk factors, (2) categorized relevant SPA functions and data types, and (3) analyzed commercial practices, revealing widespread data sharing and non-transparent safeguards. A subsequent mixed-methods study (N=412 survey, N=40 interviews among the survey participants) assessed users' perceived privacy risks across data types, transmission ranges and sharing ranges. Results demonstrate a significant, non-linear escalation in perceived risk when data crosses two critical boundaries: the `public network' (transmission) and `third parties' (sharing). This boundary effect holds across data types and demographics. Furthermore, risk perception is modulated by data attributes, and contextual privacy calculus. Conversely, anonymization show limited efficacy especially for third-party sharing, a finding attributed to user distrust. These findings empirically ground PBT in SPA context and inform design of boundary-aware privacy protection.2026SZShuning Zhang et al.Tsinghua UniversityPrivacy by Design & User ControlSmart Home Privacy & SecurityCHI
Characterizing Unintended Consequences of GUI Agents For Web Browsing
The integration of LLMs into GUI agents promises to revolutionize web browsing automation, yet the practical user experience remains challenging. This paper systematically characterizes user-reported issues with GUI agents by focusing on three dimensions: phenomena, influences, and user-centric mitigation. We adopted a two-phase method combining social media analysis (N=221 posts) and semi-structured interviews (N=21). Our findings reveal a taxonomy of complaints unique to GUI agents, including deficits in grounding abstract intent into concrete interface affordances, the inability to adapt to dynamic visual states, and the execution of erroneous actions. These lead to influences distinct from text-based hallucinations, ranging from task abandonment to security risks like uncontrolled file system access. In response, users are forced to employ ad-hoc mitigation strategies, including ecological sandboxing and cursor shadowing, to correct GUI agents' behaviors. We contribute: (1) a comprehensive characterization of complaints specific to GUI agent interaction, (2) an analysis of how these phenomena degrade interaction integrity, and (3) design implications for creating consequence-aware agents.
2026 | Shuning Zhang et al. | Tsinghua University | Human-LLM Collaboration | Explainable AI (XAI) | Privacy by Design & User Control | CHI
Collab: Fostering Critical Identification of Deepfake Videos on Social Media via Synergistic Annotation
Identifying deepfake videos on social media platforms is challenged by dynamic spatio-temporal artifacts and inadequate user tools. This hinders both critical viewing by users and scalable moderation on platforms. Here, we present Collab, a web plugin enabling users to collaboratively annotate deepfake videos. Collab integrates three key components: (i) an intuitive interface for spatio-temporal labeling where users provide confidence scores and rationales, facilitating detailed input even from non-experts, (ii) a novel confidence-weighted spatio-temporal Intersection-over-Union (IoU) algorithm to combine diverse user annotations into accurate aggregations, and (iii) a hierarchical demonstration strategy presenting aggregated results to guide attention toward contentious regions and foster critical evaluation. A seven-day online study (N=90), where participants annotated suspicious videos while viewing an online experimental platform, compared Collab against two conditions without aggregation or demonstration, respectively. Collab significantly improved identification accuracy and enhanced reflection compared to the non-demonstration condition, while outperforming the non-aggregation condition in novelty and effectiveness.
2026 | Shuning Zhang et al. | Tsinghua University | Deepfake & Synthetic Media Detection | Content Moderation & Platform Governance | Misinformation & Fact-Checking | CHI
Request a Note: How the Request Function Shapes X's Community Notes System
X's Community Notes is a crowdsourced fact-checking system. To improve its scalability, X introduced the "Request Community Note" feature, enabling users to solicit fact-checks from contributors on specific posts. Yet its implications for the system (what gets checked, by whom, and with what quality) remain unclear. Using 98,685 requested posts and their associated notes, we evaluate how requests shape the Community Notes system. We find that requested posts with higher GPT-estimated misleadingness and from authors with greater misinformation exposure are more likely to receive notes. Conversely, requested political posts (vs. non-political) are less likely to receive notes. We also observe partisan asymmetries: posts from Republicans are more likely to receive notes than those from Democrats. Although only 12% of requested posts receive request-fostered notes from top contributors, these notes are rated as more helpful and less polarized than others, partly reflecting top contributors' selective fact-checking of misleading posts. Our findings highlight both the limitations and promise of requests for scaling high-quality community-based fact-checking.
2026 | Yuwei Chuai et al. | University of Luxembourg | Content Moderation & Platform Governance | Misinformation & Fact-Checking | Volunteer Coordination & Crowdsourced Disaster Relief | CHI
A Scoping Review and Guidelines on Privacy Policy's Visualization from an HCI Perspective
Privacy policies are a cornerstone of informed consent, yet a persistent gap exists between their legal intent and practical efficacy. Despite decades of research proposing various visualizations, user comprehension remains low, and designs rarely see widespread adoption. To understand this landscape and chart a path forward, we synthesized 65 top-tier papers using a framework adapted from user-centered design lifecycles. Our analysis yielded four findings on the field's evolution: (1) a trade-off between information load and decision efficacy, which shows a shift from augmenting disclosures to cognitive load management, (2) a co-evolutionary dynamic of design and automation, revealing that designs such as context-awareness drove automation needs, while LLM breakthroughs enable the semantic interpretation required to realize those designs, (3) a tension between generality and specificity, highlighting the divergence between standardized solutions and the increasing necessity for specialized interaction in IoT and immersive environments, and (4) the balancing of stakeholder opinions, where visualization efficacy is constrained by the interplay of regulatory mandates, developer capabilities, and provider incentives.
2026 | Shuning Zhang et al. | Tsinghua University | Privacy Perception & Decision-Making | Privacy by Design & User Control | Explainable AI (XAI) | CHI
Exploring Collaboration Patterns and Strategies in Human-AI Co-creation through the Lens of Agency: A Scoping Review of the Top-tier HCI Literature
As Artificial Intelligence (AI) increasingly becomes an active collaborator in co-creation, understanding the distribution and dynamics of agency is paramount. The Human-Computer Interaction (HCI) perspective is crucial for this analysis, as it uniquely reveals the interaction dynamics and specific control mechanisms that dictate how agency manifests in practice. Despite this importance, a systematic synthesis mapping agency configurations and control mechanisms within the HCI/CSCW literature is lacking. Addressing this gap, we reviewed 134 papers from top-tier HCI/CSCW venues (e.g., CHI, UIST, CSCW) over the past 20 years. This review yields four primary contributions: (1) an integrated theoretical framework structuring agency patterns, control mechanisms, and interaction contexts; (2) a comprehensive operational catalog of control mechanisms detailing how agency is implemented; (3) an actionable cross-context map linking agency configurations to diverse co-creative practices; and (4) grounded implications and guidance for future CSCW research and the design of co-creative systems, addressing aspects like trust and ethics.
2025 | Shuning Zhang et al. | Getting Things Done With AI | CSCW
PrivCAPTCHA: Interactive CAPTCHA to Facilitate Effective Comprehension of APP Privacy Policy
Traditional app privacy policies are often lengthy and non-interactive, leading users to skip them and remain uninformed. To address this, we proposed PrivCAP, a technique to enhance user comprehension by presenting policies in a concise, interactive format. PrivCAP adopted a CAPTCHA-based design, requiring users to interact with clickable chunks of concise policy content, thus reducing physical and cognitive load. A formative study (N=38) demonstrated that participants valued informed consent alongside concerns over data collection and sharing, marking the first such evaluation among Chinese users. This study further found a preference for concise visualizations and interactive formats. PrivCAP, leveraging few-shot prompting on Large Language Models (LLMs), accurately translates privacy policies into clickable, chunked formats optimized for smartphone screens. In an evaluation (N=28), PrivCAP outperformed traditional policy presentations in improving user understanding, reducing cognitive load, and maintaining efficiency, with participants favoring its engaging design and reporting more informed decision-making.
2025 | Shuning Zhang et al. | Tsinghua University, Institute for Network Sciences and Cyberspace | VR Medical Training & Rehabilitation | Privacy by Design & User Control | Privacy Perception & Decision-Making | CHI
Actual Achieved Gain and Optimal Perceived Gain: Modeling Human Take-over Decisions Towards Automated Vehicles' Suggestions
Driver decision quality in take-overs is critical for effective human-Autonomous Driving System (ADS) collaboration. However, current research lacks detailed analysis of its variations. This paper introduces two metrics, Actual Achieved Gain (AAG) and Optimal Perceived Gain (OPG), to assess decision quality, with OPG representing optimal decisions and AAG reflecting actual outcomes. Both are calculated as weighted averages of perceived gains and losses, influenced by ADS accuracy. Study 1 (N=315) used a 21-point Thurstone scale to measure perceived gains and losses (key components of AAG and OPG) across typical tasks: route selection, overtaking, and collision avoidance. Studies 2 (N=54) and 3 (N=54) modeled decision quality under varying ADS accuracy and decision time. Results show that with sufficient time (>3.5s), AAG converges towards OPG, indicating rational decision-making, while limited time leads to intuitive and deterministic choices. Study 3 also linked AAG-OPG deviations to irrational behaviors. An intervention study (N=8) and a pilot (N=4) employing voice alarms and multi-modal alarms based on these deviations demonstrated AAG's potential to improve decision quality.
2025 | Shuning Zhang et al. | Tsinghua University, Institute for Network Sciences and Cyberspace | Automated Driving Interface & Takeover Design | Head-Up Display (HUD) & Advanced Driver Assistance Systems (ADAS) | AI-Assisted Decision-Making & Automation | CHI
Vision-Based Multimodal Interfaces: A Survey and Taxonomy for Enhanced Context-Aware System Design
The recent surge in artificial intelligence, particularly in multimodal processing technology, has advanced human-computer interaction by altering how intelligent systems perceive, understand, and respond to contextual information (i.e., context awareness). Despite such advancements, there is a significant gap in comprehensive reviews examining these advances, especially from a multimodal data perspective, which is crucial for refining system design. This paper addresses a key aspect of this gap by conducting a systematic survey of data modality-driven Vision-based Multimodal Interfaces (VMIs). VMIs are essential for integrating multimodal data, enabling more precise interpretation of user intentions and complex interactions across physical and digital environments. Unlike previous task- or scenario-driven surveys, this study highlights the critical role of the visual modality in processing contextual information and facilitating multimodal interaction. Adopting a design framework moving from the whole to the details and back, it classifies VMIs across dimensions, providing insights for developing effective, context-aware systems.
2025 | Yongquan 'Owen' Hu et al. | University of New South Wales | Context-Aware Computing | Ubiquitous Computing | CHI
ReactGenie: A Development Framework for Complex Multimodal Interactions Using Large Language Models
By combining voice and touch interactions, multimodal interfaces can surpass the efficiency of either modality alone. Traditional multimodal frameworks require laborious developer work to support rich multimodal commands, where the user’s multimodal command may involve exponentially many combinations of actions/function invocations. This paper presents ReactGenie, a programming framework that better separates multimodal input from the computational model to enable developers to create efficient and capable multimodal interfaces with ease. ReactGenie translates multimodal user commands into NLPL (Natural Language Programming Language), a programming language we created, using a neural semantic parser based on large language models. The ReactGenie runtime interprets the parsed NLPL and composes primitives in the computational model to implement complex user commands. As a result, ReactGenie allows easy implementation and unprecedented richness in commands for end-users of multimodal apps. Our evaluation showed that 12 developers can learn and build a non-trivial ReactGenie application in under 2.5 hours on average. In addition, compared with a traditional GUI, end-users can complete tasks faster and with less task load using ReactGenie apps.
2024 | Jackie (Junrui) Yang et al. | Stanford University | Voice User Interface (VUI) Design | Generative AI (Text, Image, Music, Video) | Human-LLM Collaboration | CHI
Squeez'In: Private Authentication on Smartphones based on Squeezing Gestures
In this paper, we proposed Squeez'In, a technique on smartphones that enabled private authentication by holding and squeezing the phone with a unique pattern. We first explored the design space of practical squeezing gestures for authentication by analyzing the participants' self-designed gestures and squeezing behavior. Results showed that varying-length gestures with two levels of touch pressure and duration were the most natural and unambiguous. We then implemented Squeez'In on an off-the-shelf capacitive sensing smartphone, and employed an SVM-GBDT model for recognizing gestures and user-specific behavioral patterns, achieving 99.3% accuracy and 0.93 F1-score when tested on 21 users. A subsequent 14-day study validated the memorability and long-term stability of Squeez'In. During usability evaluation, compared with gesture and PIN code, Squeez'In achieved significantly faster authentication speed and higher user preference in terms of privacy and security.
2023 | Xin Yi et al. | Tsinghua University | Force Feedback & Pseudo-Haptic Weight | Passwords & Authentication | CHI