Exploring Collaboration Patterns and Strategies in Human-AI Co-creation through the Lens of Agency: A Scoping Review of the Top-tier HCI Literature
As Artificial Intelligence (AI) increasingly becomes an active collaborator in co-creation, understanding the distribution and dynamics of agency is paramount. The Human-Computer Interaction (HCI) perspective is crucial for this analysis, as it uniquely reveals the interaction dynamics and specific control mechanisms that dictate how agency manifests in practice. Despite this importance, a systematic synthesis mapping agency configurations and control mechanisms within the HCI/CSCW literature is lacking. Addressing this gap, we reviewed 134 papers from top-tier HCI/CSCW venues (e.g., CHI, UIST, CSCW) over the past 20 years. This review yields four primary contributions: (1) an integrated theoretical framework structuring agency patterns, control mechanisms, and interaction contexts; (2) a comprehensive operational catalog of control mechanisms detailing how agency is implemented; (3) an actionable cross-context map linking agency configurations to diverse co-creative practices; and (4) grounded implications and guidance for future CSCW research and the design of co-creative systems, addressing aspects such as trust and ethics.
2025 · Shuning Zhang et al. · Getting Things Done With AI · CSCW

PrivCAPTCHA: Interactive CAPTCHA to Facilitate Effective Comprehension of APP Privacy Policy
Traditional app privacy policies are often lengthy and non-interactive, leading users to skip them and remain uninformed. To address this, we proposed PrivCAP, a technique that enhances user comprehension by presenting policies in a concise, interactive format. PrivCAP adopts a CAPTCHA-based design, requiring users to interact with clickable chunks of concise policy content, thus reducing physical and cognitive load. A formative study (N=38), the first such evaluation among Chinese users, showed that participants valued informed consent alongside concerns over data collection and sharing, and that they preferred concise visualizations and interactive formats. Leveraging few-shot prompting on Large Language Models (LLMs), PrivCAP accurately translates privacy policies into clickable, chunked formats optimized for smartphone screens. In an evaluation (N=28), PrivCAP outperformed traditional policy presentations in improving user understanding, reducing cognitive load, and maintaining efficiency, with participants favoring its engaging design and reporting more informed decision-making.
2025 · Shuning Zhang et al. · Tsinghua University, Institute for Network Sciences and Cyberspace · VR Medical Training & Rehabilitation · Privacy by Design & User Control · Privacy Perception & Decision-Making · CHI

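As flavor for the few-shot chunking step described in this entry, here is a minimal sketch. Everything in it is illustrative: `call_llm` is a hypothetical stand-in for whatever LLM client is used, and the prompt wording and JSON output format are assumptions, not the paper's actual prompt.

```python
import json

# Hypothetical stand-in for an LLM call; the paper's actual model, client,
# and prompt are not specified in the abstract.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in an LLM client here")

FEW_SHOT = """Split the privacy-policy excerpt into short, self-contained \
chunks that a user can tap through on a phone. Return a JSON list of strings.

Excerpt: "We collect your email address to create your account and may \
share aggregate usage statistics with partners."
Chunks: ["We collect your email address to create your account.", \
"We may share aggregate usage statistics with partners."]

Excerpt: "{policy}"
Chunks:"""

def chunk_policy(policy_text: str) -> list[str]:
    """Turn one policy excerpt into tappable chunks via few-shot prompting."""
    raw = call_llm(FEW_SHOT.format(policy=policy_text))
    return json.loads(raw)  # expect e.g. ["chunk one.", "chunk two."]
```
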
Actual Achieved Gain and Optimal Perceived Gain: Modeling Human Take-over Decisions Towards Automated Vehicles' Suggestions
Driver decision quality in take-overs is critical for effective human-Autonomous Driving System (ADS) collaboration. However, current research lacks detailed analysis of its variations. This paper introduces two metrics, Actual Achieved Gain (AAG) and Optimal Perceived Gain (OPG), to assess decision quality, with OPG representing optimal decisions and AAG reflecting actual outcomes. Both are calculated as weighted averages of perceived gains and losses, influenced by ADS accuracy. Study 1 (N=315) used a 21-point Thurstone scale to measure perceived gains and losses (the key components of AAG and OPG) across typical tasks: route selection, overtaking, and collision avoidance. Studies 2 (N=54) and 3 (N=54) modeled decision quality under varying ADS accuracy and decision time. Results show that with sufficient time (>3.5 s), AAG converges towards OPG, indicating rational decision-making, while limited time leads to intuitive and deterministic choices. Study 3 also linked AAG-OPG deviations to irrational behaviors. An intervention study (N=8) and a pilot (N=4) employing voice alarms and multi-modal alarms based on these deviations demonstrated AAG's potential to improve decision quality.
2025 · Haihua Zhang et al. · Tsinghua University, Institute for Network Sciences and Cyberspace · Automated Driving Interface & Takeover Design · Head-Up Display (HUD) & Advanced Driver Assistance Systems (ADAS) · AI-Assisted Decision-Making & Automation · CHI

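As a worked illustration of the two metrics, the sketch below computes AAG and OPG as accuracy-weighted averages of perceived gains and losses, following the abstract's description; the concrete weighting and sign conventions are assumptions rather than the paper's exact formulation.

```python
def expected_gain(accept: bool, ads_accuracy: float,
                  gain_correct: float, loss_wrong: float) -> float:
    """Weighted average of perceived gain and loss for one take-over choice.
    Signs and weighting are assumptions for illustration (losses negative)."""
    if accept:
        return ads_accuracy * gain_correct + (1 - ads_accuracy) * loss_wrong
    # Rejecting pays off exactly when the ADS suggestion was wrong.
    return (1 - ads_accuracy) * gain_correct + ads_accuracy * loss_wrong

def aag_and_opg(driver_accepts: bool, ads_accuracy: float,
                gain_correct: float, loss_wrong: float) -> tuple[float, float]:
    aag = expected_gain(driver_accepts, ads_accuracy, gain_correct, loss_wrong)
    opg = max(expected_gain(a, ads_accuracy, gain_correct, loss_wrong)
              for a in (True, False))
    return aag, opg  # AAG <= OPG; the gap flags sub-optimal decisions

# e.g. a driver accepts a 90%-accurate suggestion:
aag, opg = aag_and_opg(True, 0.9, gain_correct=1.0, loss_wrong=-2.0)
```
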
Raise Your Eyebrows Higher: Facilitating Emotional Communication in Social Virtual Reality Through Region-Specific Facial Expression Exaggeration
While exaggerated facial expressions in cartoon avatars can enhance emotional communication in social virtual reality (VR), they risk triggering the uncanny valley effect. Our research reveals that this effect varies significantly across different emotions. In Study 1 (N=30), participants evaluated scaled facial expressions during simulated VR conversations. We found that expression exaggeration had opposing effects: it decreased facial realism for joy, surprise, and disgust due to overly dramatic mouth movements, while enhancing realism for fear, sadness, and anger, emotions that rely on upper facial expressions typically constrained by HMD pressure. Based on these findings, we developed a region-specific facial expression exaggeration strategy that enhances under-expressed upper facial features while maintaining natural lower facial movements. Study 2 (N=20) validated this approach, demonstrating enhanced emotional intensity and contagion for negative emotions while mitigating the uncanny valley effect. Our research provides practical guidelines for optimizing avatar-mediated emotional communication in social VR environments.
2025 · Xueyang Wang et al. · Tsinghua University, Institute for Network Sciences and Cyberspace · Social & Collaborative VR · Immersion & Presence Research · Identity & Avatars in XR · CHI

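The region-specific strategy can be pictured as a gain applied per blendshape region. The sketch below assumes an ARKit-style blendshape rig, an illustrative region split (`UPPER`), and made-up gain values; the paper's actual rig and scaling factors are not given in the abstract.

```python
import numpy as np

# Illustrative ARKit-style blendshape names split by face region (assumed).
UPPER = {"browInnerUp", "browOuterUpLeft", "browOuterUpRight",
         "eyeWideLeft", "eyeWideRight"}

def exaggerate(weights: dict[str, float], upper_gain: float = 1.5,
               lower_gain: float = 1.0) -> dict[str, float]:
    """Amplify under-expressed upper-face coefficients while keeping the
    lower face natural; coefficients stay clamped to the valid [0, 1] range."""
    return {name: float(np.clip(w * (upper_gain if name in UPPER else lower_gain),
                                0.0, 1.0))
            for name, w in weights.items()}

# A fearful frame: eyebrow/eye coefficients are boosted 1.5x, the jaw is
# left as captured.
frame = {"browInnerUp": 0.3, "eyeWideLeft": 0.4, "jawOpen": 0.2}
print(exaggerate(frame))
```
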
Evaluating the Privacy Valuation of Personal Data on Smartphones
Fan et al. investigate how smartphone users value the privacy of their personal data.
2024 · Lihua Fan et al. · Privacy Perception & Decision-Making · UbiComp

The EarSAVAS Dataset: Enabling Subject-Aware Vocal Activity Sensing on Earables
Zhang et al. build the EarSAVAS dataset, which enables subject-aware vocal activity sensing on smart earables and supports research on related algorithms.
2024 · Xiyuxing Zhang et al. · Biosensors & Physiological Monitoring · UbiComp

From 2D to 3D: Facilitating Single-Finger Mid-Air Typing on QWERTY Keyboards with Probabilistic Touch Modeling
Mid-air text entry on virtual keyboards suffers from the lack of tactile feedback, which brings challenges to both tap detection and input prediction. In this paper, we explored the feasibility of single-finger typing on virtual QWERTY keyboards in mid-air. We first conducted a study to examine users' 3D typing behavior on different sizes of virtual keyboards. Results showed that participants perceived the vertical projection of the finger's lowest point onto the keyboard during a tap as the target location, and that inferring taps from the intersection between the finger and the keyboard was not applicable. To address this challenge, we derived a novel input prediction algorithm that modeled the uncertainty of tap detection as probabilities and performed probabilistic decoding that could tolerate false detections. We analyzed the performance of the algorithm through a full-factorial simulation. Results showed that SVM-based probabilistic touch detection together with a 2D elastic probabilistic decoding algorithm (elasticity = 2) achieved the optimal top-5 accuracy of 94.2%. In the evaluation user study, participants reached a single-finger typing speed of 26.1 WPM with a 3.2% uncorrected word-level error rate, significantly better than both tap-based and gesture-based baseline techniques. The proposed technique also received the highest preference score from users, proving its usability in real text entry tasks. https://dl.acm.org/doi/10.1145/3580829
2023 · Xin Yi et al. · Mid-Air Haptics (Ultrasonic) · Hand Gesture Recognition · Voice User Interface (VUI) Design · UbiComp

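To make the decoding idea concrete, here is a small sketch of probability-weighted word scoring: each tap's spatial likelihood under a Gaussian touch model is weighted by the detector's tap confidence and combined with a language-model prior. The `KEY_POS` layout stub, `SIGMA` value, and equal-length constraint are simplifications; the paper's elastic decoder additionally tolerates false detections.

```python
import math

KEY_POS = {"q": (0.0, 0.0), "w": (1.0, 0.0), "e": (2.0, 0.0)}  # stub layout
SIGMA = 0.45  # touch-point spread in key widths (assumed value)

def log_touch_likelihood(touch, key):
    """Gaussian spatial log-likelihood of a tap landing at `touch` given `key`."""
    (tx, ty), (kx, ky) = touch, KEY_POS[key]
    return -((tx - kx) ** 2 + (ty - ky) ** 2) / (2 * SIGMA ** 2)

def word_score(word, touches, tap_probs, lm_logprob):
    """Spatial likelihoods weighted by per-tap detection confidence, plus a
    language-model prior. The equal-length constraint is a simplification."""
    if len(word) != len(touches):
        return -math.inf
    return lm_logprob + sum(p * log_touch_likelihood(t, ch)
                            for ch, t, p in zip(word, touches, tap_probs))
```
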
Modeling the Trade-off of Privacy Preservation and Activity Recognition on Low-Resolution Images
A computer vision system using low-resolution image sensors can provide intelligent services (e.g., activity recognition) while withholding unnecessary visual privacy information at the hardware level. However, preserving visual privacy and enabling accurate machine recognition place conflicting demands on image resolution. Modeling the trade-off between privacy preservation and machine recognition performance can guide future privacy-preserving computer vision systems that use low-resolution image sensors. In this paper, using at-home activities of daily living (ADLs) as the scenario, we first identified the most important visual privacy features through a user survey. We then quantified and analyzed the effects of image resolution on human and machine recognition performance in activity recognition and privacy awareness tasks, and investigated how modern image super-resolution techniques influence these effects. Based on the results, we proposed a method for modeling the trade-off between privacy preservation and activity recognition on low-resolution images.
2023 · Yuntao Wang et al. · Tsinghua University · Human Pose & Activity Recognition · Privacy Perception & Decision-Making · CHI

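One way to operationalize such trade-off modeling is to score each candidate resolution by a weighted combination of empirically measured recognition accuracy and privacy leakage. The sketch below does exactly that; the linear utility and the `alpha` weight are assumptions for illustration, not the paper's model.

```python
import numpy as np

def best_resolution(resolutions, recog_acc, privacy_leak, alpha=0.5):
    """Pick the resolution maximizing a weighted combination of recognition
    accuracy and privacy protection (1 - leakage). Both curves are measured
    empirically per resolution; the linear utility and alpha are assumptions."""
    utility = (alpha * np.asarray(recog_acc)
               + (1 - alpha) * (1 - np.asarray(privacy_leak)))
    return resolutions[int(np.argmax(utility))], utility

# Toy curves: accuracy and leakage both rise with resolution (in pixels).
res, u = best_resolution([8, 16, 32, 64],
                         recog_acc=[0.42, 0.65, 0.81, 0.88],
                         privacy_leak=[0.05, 0.12, 0.35, 0.70])
print(res)  # -> 16 under these toy numbers
```
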
Squeez'In: Private Authentication on Smartphones based on Squeezing Gestures
In this paper, we proposed Squeez'In, a technique on smartphones that enabled private authentication by holding and squeezing the phone with a unique pattern. We first explored the design space of practical squeezing gestures for authentication by analyzing participants' self-designed gestures and squeezing behavior. Results showed that varying-length gestures with two levels of touch pressure and duration were the most natural and unambiguous. We then implemented Squeez'In on an off-the-shelf capacitive sensing smartphone, employing an SVM-GBDT model to recognize gestures and user-specific behavioral patterns, achieving 99.3% accuracy and a 0.93 F1-score when tested on 21 users. A following 14-day study validated the memorability and long-term stability of Squeez'In. During usability evaluation, compared with gesture and PIN code, Squeez'In achieved significantly faster authentication speed and higher user preference in terms of privacy and security.
2023 · Xin Yi et al. · Tsinghua University · Force Feedback & Pseudo-Haptic Weight · Passwords & Authentication · CHI

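The abstract names an SVM-GBDT model without detailing how the two parts couple. One plausible reading, sketched below with scikit-learn, feeds the SVM's per-class decision scores into a gradient-boosted tree ensemble alongside the raw features; treat this coupling as an assumption, not the paper's pipeline.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.ensemble import GradientBoostingClassifier

def fit_svm_gbdt(X_train, y_train):
    """Fit an SVM, then a GBDT on raw features plus SVM decision scores."""
    svm = SVC(kernel="rbf").fit(X_train, y_train)
    scores = svm.decision_function(X_train).reshape(len(X_train), -1)
    gbdt = GradientBoostingClassifier().fit(
        np.hstack([X_train, scores]), y_train)
    return svm, gbdt

def predict_svm_gbdt(svm, gbdt, X):
    scores = svm.decision_function(X).reshape(len(X), -1)
    return gbdt.predict(np.hstack([X, scores]))
```
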
DEEP: 3D Gaze Pointing in Virtual Reality Leveraging Eyelid Movement
Gaze-based target selection suffers from low input precision and target occlusion. In this paper, we explored leveraging continuous eyelid movement to support efficient and occlusion-robust dwell-based gaze pointing in virtual reality. We first conducted two user studies to examine users' eyelid movement patterns in both unintentional and intentional conditions. The results proved the feasibility of leveraging intentional eyelid movements, which are distinguishable from natural movements, for input. We also tested the participants' dwelling patterns for targets of different sizes and locations. Based on these results, we propose DEEP, a novel technique that enables users to see through occlusions by controlling the aperture angle of their eyelids, and to dwell to select targets with the help of a probabilistic input prediction model. Evaluation results showed that DEEP with dynamic depth and location selection significantly outperformed its static variants, as well as a naive dwelling baseline. Even for 100% occluded targets, it achieved an average selection time of 2.5 s with an error rate of 2.3%.
2022 · Xin Yi et al. · Eye Tracking & Gaze Interaction · Immersion & Presence Research · UIST

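A toy version of the probabilistic selection might look like the following: eyelid aperture is mapped to a selection depth, and each candidate target is scored by Gaussian proximity in both gaze angle and depth. The linear aperture-to-depth mapping and the sigma parameters are invented for illustration.

```python
import math

def select_target(aperture, targets, sigma_ang=2.0, sigma_depth=0.15,
                  max_depth=5.0):
    """Score candidates by Gaussian proximity in gaze angle and depth.
    `aperture` in [0, 1] maps linearly to a selection depth (an assumed
    mapping); each target carries its angular offset from the gaze ray
    (degrees) and its depth (meters)."""
    depth = aperture * max_depth
    def score(t):
        return (-t["angle_off"] ** 2 / (2 * sigma_ang ** 2)
                - (t["depth"] - depth) ** 2 / (2 * sigma_depth ** 2))
    return max(targets, key=score)["id"]

# A near and a far target on almost the same ray: wide eyelids pick the far one.
targets = [{"id": "near", "angle_off": 0.5, "depth": 1.0},
           {"id": "far", "angle_off": 0.8, "depth": 4.5}]
print(select_target(aperture=0.9, targets=targets))  # -> "far"
```
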
SemanticAdapt: Optimization-based Adaptation of Mixed Reality Layouts Leveraging Virtual-Physical Semantic Connections
We present an optimization-based approach that automatically adapts Mixed Reality (MR) interfaces to different physical environments. Current MR layouts, including the position and scale of virtual interface elements, need to be manually adapted by users whenever they move between environments and whenever they switch tasks. This process is tedious and time consuming, and arguably needs to be automated for MR systems to be beneficial for end users. We contribute an approach that formulates this challenge as a combinatorial optimization problem and automatically decides the placement of virtual interface elements in new environments. To achieve this, we exploit the semantic association between the virtual interface elements and physical objects in an environment. Our optimization furthermore considers the utility of elements for the user's current task, layout factors, and spatio-temporal consistency with previous layouts. All these factors are combined in a single linear program, which is used to adapt the layout of MR interfaces in real time. We demonstrate a set of application scenarios showcasing the versatility and applicability of our approach. Finally, we show that compared to a naive adaptive baseline that does not take semantic associations into account, our approach decreased the number of manual interface adaptations by 33%.
2021 · Yifei Cheng et al. · AR Navigation & Context Awareness · Mixed Reality Workspaces · Context-Aware Computing · UIST

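The paper solves a single linear program over several factors; as a reduced sketch, treating element placement as an assignment problem already captures the core idea. Below, SciPy's Hungarian solver assigns virtual elements to physical anchors by maximizing a weighted sum of semantic association, task utility, and layout consistency; the weights are assumptions, and the paper's actual program covers more factors and constraints.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def place_elements(semantic, utility, consistency, w=(1.0, 0.5, 0.3)):
    """Assign virtual elements (rows) to physical anchors (columns) by
    maximizing a weighted sum of semantic association, current-task
    utility, and consistency with the previous layout."""
    score = w[0] * semantic + w[1] * utility + w[2] * consistency
    rows, cols = linear_sum_assignment(-score)  # negate to maximize
    return dict(zip(rows.tolist(), cols.tolist()))

# 3 virtual elements, 4 candidate anchor surfaces, scores in [0, 1]:
rng = np.random.default_rng(0)
layout = place_elements(rng.random((3, 4)), rng.random((3, 4)),
                        rng.random((3, 4)))  # element index -> anchor index
```
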
Facilitating Text Entry on Smartphones with QWERTY Keyboard for Users with Parkinson’s Disease
QWERTY is the primary smartphone text input keyboard configuration. However, insertion and substitution errors caused by hand tremors, often experienced by users with Parkinson's disease, can severely affect typing efficiency and user experience. In this paper, we investigated Parkinson's users' typing behavior on smartphones. In particular, we identified and compared the typing characteristics generated by users with and without Parkinson's symptoms. We then proposed an elastic probabilistic model for input prediction. By incorporating both spatial and temporal features, this model generalized the classical statistical decoding algorithm to correct insertion, substitution, and omission errors, while maintaining direct physical interpretation. User study results confirmed that the proposed algorithm outperformed baseline techniques: users reached a 22.8 WPM typing speed with a significantly lower error rate and higher user-perceived performance and preference. We concluded that our method could effectively improve the text entry experience on smartphones for users with Parkinson's disease.
2021 · Yuntao Wang et al. · Tsinghua University, University of Washington · Motor Impairment Assistive Input Technologies · Shape-Changing Materials & 4D Printing · CHI

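The elastic model's ability to absorb insertion, substitution, and omission errors can be sketched as an edit-distance-style dynamic program over observed taps and a word's intended letters. The penalty values below are placeholders, and the paper's model additionally uses temporal features.

```python
import math

def elastic_score(word, touch_loglik, ins_pen=-4.0, omit_pen=-4.0):
    """DP alignment of observed taps to a word's letters, allowing
    insertions (spurious taps) and omissions (missed letters).
    touch_loglik[i][ch] is the spatial log-likelihood that tap i was
    aimed at letter ch; penalties are illustrative, not the paper's."""
    n, m = len(touch_loglik), len(word)
    dp = [[-math.inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i < n:  # spurious tap inserted by tremor
                dp[i + 1][j] = max(dp[i + 1][j], dp[i][j] + ins_pen)
            if j < m:  # intended letter omitted
                dp[i][j + 1] = max(dp[i][j + 1], dp[i][j] + omit_pen)
            if i < n and j < m:  # tap matched to letter (substitution-aware)
                dp[i + 1][j + 1] = max(dp[i + 1][j + 1],
                                       dp[i][j] + touch_loglik[i][word[j]])
    return dp[n][m]  # combine with a language-model prior when ranking words
```
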
EarBuddy: Enabling On-Face Interaction via Wireless Earbuds
Past research on on-body interaction typically requires custom sensors, limiting scalability and generalizability. We propose EarBuddy, a real-time system that leverages the microphone in commercial wireless earbuds to detect tapping and sliding gestures near the face and ears. We developed a design space of 27 valid gestures and conducted a user study (N=16) to select the eight gestures that were optimal for both human preference and microphone detectability. We collected a dataset of those eight gestures (N=20) and trained deep learning models for gesture detection and classification. Our optimized classifier achieved an accuracy of 95.3%. Finally, we conducted a user study (N=12) to evaluate EarBuddy's usability. Our results show that EarBuddy can facilitate novel interaction and that users felt very positive about the system. EarBuddy provides a new eyes-free, socially acceptable input method that is compatible with commercial wireless earbuds and has the potential for scalability and generalizability.
2020 · Xuhai Xu et al. · University of Washington & Tsinghua University · Haptic Wearables · Foot & Wrist Interaction · CHI

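As a sketch of the classification stage, a small PyTorch CNN over audio-spectrogram patches is shown below. The input size, architecture, and feature choice (log-mel patches from the earbud microphone) are assumptions; the abstract does not specify the paper's model.

```python
import torch
import torch.nn as nn

# A minimal CNN over (1 x 64 x 64) log-mel spectrogram patches; the
# paper's actual architecture is not given in the abstract.
class GestureNet(nn.Module):
    def __init__(self, n_gestures: int = 8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, n_gestures),  # 64 -> 32 -> 16 after pooling
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

logits = GestureNet()(torch.randn(4, 1, 64, 64))  # -> (4, 8) class scores
```
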
PalmBoard: Leveraging Implicit Touch Pressure in Statistical Decoding for Indirect Text Entry
We investigated how to incorporate implicit touch pressure, the finger pressure applied to a touch surface during typing, to improve text entry performance via statistical decoding. We focused on one-handed touch-typing on an indirect interface as an example scenario. We first collected typing data on a pressure-sensitive touchpad and analyzed users' typing behavior, such as touch point distribution, key-to-finger mappings, and pressure images. Our investigation revealed distinct pressure patterns for different keys. Based on these findings, we performed a series of simulations to iteratively optimize the statistical decoding algorithm, arriving at a Markov-Bayesian decoder that incorporates pressure image data into decoding. It improved the top-1 accuracy from 53% to 74% over a naive Bayesian decoder. We then implemented PalmBoard, a text entry method built on the Markov-Bayesian decoder that effectively supports one-handed touch-typing on indirect interfaces. A user study showed participants achieved an average speed of 32.8 WPM with a 0.6% error rate. Expert typists could achieve 40.2 WPM with 30 minutes of practice. Overall, our investigation showed that incorporating implicit touch pressure is effective in improving text entry decoding.
2020 · Xin Yi et al. · Tsinghua University & Key Laboratory of Pervasive Computing, Ministry of Education · Force Feedback & Pseudo-Haptic Weight · CHI

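The Markov-Bayesian idea can be summarized as: per-tap pressure-image likelihoods plus a Markov prior over key-to-key transitions. The sketch below takes both models as injected callables, since the learned distributions themselves are not described in the abstract.

```python
import math

def decode(taps, candidates, press_loglik, trans_logprob):
    """Rank candidate words by per-tap pressure-image log-likelihood plus
    a Markov log-prior over key-to-key transitions. Both callables stand
    in for the paper's learned distributions (a sketch, not the decoder)."""
    def score(word):
        if len(word) != len(taps):
            return -math.inf
        return (sum(press_loglik(tap, ch) for tap, ch in zip(taps, word))
                + sum(trans_logprob(a, b) for a, b in zip(word, word[1:])))
    return max(candidates, key=score)
```
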
VIPBoard: Improving Screen-Reader Keyboard for Visually Impaired People with Character-Level Auto Correction
Modern touchscreen keyboards rely on word-level auto-correction to handle input errors. Unfortunately, visually impaired users are deprived of this benefit because a screen-reader keyboard offers only character-level input and provides no correction ability. In this paper, we present VIPBoard, a smart keyboard for visually impaired people that improves the underlying keyboard algorithm without altering the current input interaction. Upon each tap, VIPBoard predicts the probability of each key by considering both the touch location and a language model, and reads aloud the most likely key, saving calibration time when the touchdown point misses the target key. Meanwhile, the keyboard layout automatically scales according to the user's touch point location, which enables easy selection of other keys. A user study shows that, compared with the current keyboard technique, VIPBoard can reduce the touch error rate by 63.0% and increase text entry speed by 12.6%.
2019 · Weinan Shi et al. · Tsinghua University & Ministry of Education · Voice Accessibility · Visual Impairment Technologies (Screen Readers, Tactile Graphics, Braille) · CHI

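The per-tap fusion described above, combining a spatial touch model with a language model before announcing a key, can be written in a few lines. The log-space combination below is a standard naive-Bayes-style sketch, not necessarily the paper's exact model.

```python
import math

def most_likely_key(touch_loglik: dict[str, float],
                    key_logprob: dict[str, float]) -> str:
    """P(key | touch) is proportional to P(touch | key) * P(key | prefix):
    fuse the spatial model with the language model and announce the winner."""
    return max(touch_loglik,
               key=lambda k: touch_loglik[k] + key_logprob.get(k, -math.inf))

# A tap lands between 'q' and 'w', but the typed prefix strongly favors 'w':
key = most_likely_key({"q": -1.1, "w": -1.3},
                      {"q": -6.0, "w": -0.7})  # -> "w"
```
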
VirtualGrasp: Leveraging Experience of Interacting with Physical Objects to Facilitate Digital Object Retrieval
We propose VirtualGrasp, a novel gestural approach to retrieve virtual objects in virtual reality. Using VirtualGrasp, a user retrieves an object by performing a barehanded gesture as if grasping its physical counterpart. The object-gesture mapping under this metaphor is highly intuitive, which enables users to easily discover and remember the gestures that retrieve the objects. We conducted three user studies to demonstrate the feasibility and effectiveness of the approach. Progressively, we investigated the consensus on the object-gesture mapping across users, the expressivity of grasping gestures, and the learnability and performance of the approach. Results showed that users achieved high agreement on the mapping, with an average agreement score [35] of 0.68 (SD=0.27). Without prior exposure to the gestures, users successfully retrieved 76% of objects with VirtualGrasp. A week after learning the mapping, they could recall the gestures for 93% of objects.
2018 · Yukang Yan et al. · Tsinghua University · Hand Gesture Recognition · 3D Modeling & Animation · CHI

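The agreement score the abstract reports (its citation [35]) is commonly computed, per object, as the sum of squared proportions of identical gesture proposals; the sketch below assumes that standard guessability formula.

```python
from collections import Counter

def agreement_score(proposals: list[str]) -> float:
    """Agreement score for one object: sum over identical-gesture groups of
    (group size / total proposals) squared (assumed standard formula)."""
    total = len(proposals)
    return sum((c / total) ** 2 for c in Counter(proposals).values())

# Four users propose gestures for "cup": three grasp, one pinch.
print(agreement_score(["grasp", "grasp", "grasp", "pinch"]))  # -> 0.625
```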