The Power of Speech in the Wild: Discriminative Power of Daily Voice Diaries in Understanding Auditory Verbal Hallucinations Using Deep Learning
Mobile phone sensing is increasingly being used in clinical research studies to assess a variety of mental health conditions (e.g., depression, psychosis). However, in-the-wild speech analysis -- beyond conversation detection -- is a missing component of these mobile sensing platforms and studies. We augment an existing mobile sensing platform with a daily voice diary to assess and predict the severity of auditory verbal hallucinations (i.e., hearing sounds or voices in the absence of any speaker), a condition that affects people with and without psychiatric or neurological diagnoses. We collect 4,809 audio diaries from N=384 subjects over a one-month study period. We investigate the performance of various deep-learning architectures using different combinations of sensor behavioral streams (e.g., voice, sleep, mobility, phone usage) and show the discriminative power of using only the audio recordings of speech and their automatically generated transcripts; specifically, our deep learning model achieves a weighted F1 score of 0.78 from daily voice diaries alone. Our results surprisingly indicate that a simple periodic voice diary combined with deep learning is a sufficient signal for assessing complex psychiatric symptoms (e.g., auditory verbal hallucinations) reported by people in the wild as they go about their daily lives.
https://doi.org/10.1145/3610890 | 2023 | Weichen Wang et al. | Brain-Computer Interface (BCI) & Neurofeedback; Mental Health Apps & Online Support Communities | UbiComp

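The entry above reports a weighted F1 score over fused audio and transcript features. The sketch below illustrates only that evaluation metric: the features, labels, and classifier are placeholders, not the authors' deep learning model.

```python
# Illustrative sketch (not the authors' code): scoring a severity classifier on
# fused audio + transcript features with the weighted F1 metric reported above.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical per-diary features: e.g., a 64-dim audio embedding + 32-dim text embedding.
audio_feats = rng.normal(size=(4809, 64))
text_feats = rng.normal(size=(4809, 32))
X = np.concatenate([audio_feats, text_feats], axis=1)
y = rng.integers(0, 4, size=4809)          # placeholder hallucination-severity labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Weighted F1 averages per-class F1 scores weighted by class support, as in the abstract.
print("weighted F1:", f1_score(y_te, clf.predict(X_te), average="weighted"))
```
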
SmartASL: “Point-of-Care” Comprehensive ASL Interpreter Using Wearables
Sign language builds an important bridge between the d/Deaf and hard-of-hearing (DHH) and hearing people. Regrettably, most hearing people face challenges in comprehending sign language, necessitating sign language translation. However, state-of-the-art wearable-based techniques mainly concentrate on recognizing manual markers (e.g., hand gestures), while frequently overlooking non-manual markers, such as negative head shaking, question markers, and mouthing. This oversight results in the loss of substantial grammatical and semantic information in sign language. To address this limitation, we introduce SmartASL, a novel proof-of-concept system that can 1) recognize both manual and non-manual markers simultaneously using a combination of earbuds and a wrist-worn IMU, and 2) translate the recognized American Sign Language (ASL) glosses into spoken language. Our experiments demonstrate the SmartASL system's significant potential to accurately recognize the manual and non-manual markers in ASL, effectively bridging the communication gaps between ASL signers and hearing people using commercially available devices.
https://dl.acm.org/doi/10.1145/3596255 | 2023 | Yincheng Jin et al. | Foot & Wrist Interaction; Deaf & Hard-of-Hearing Support (Captions, Sign Language, Vibration); Augmentative & Alternative Communication (AAC) | UbiComp

GLOBEM: Cross-Dataset Generalization of Longitudinal Human Behavior Modeling
There is a growing body of research revealing that longitudinal passive sensing data from smartphones and wearable devices can capture daily behavior signals for human behavior modeling, such as depression detection. Most prior studies build and evaluate machine learning models using data collected from a single population. However, to ensure that a behavior model can work for a larger group of users, its generalizability needs to be verified on multiple datasets from different populations. We present the first work evaluating cross-dataset generalizability of longitudinal behavior models, using depression detection as an application. We collect multiple longitudinal passive mobile sensing datasets with over 500 users from two institutes over a two-year span, leading to four institute-year datasets. Using the datasets, we closely re-implement and evaluate nine prior depression detection algorithms. Our experiments reveal the lack of generalizability of these methods. We also implement eight recently popular domain generalization algorithms from the machine learning community. Our results indicate that these methods also do not generalize well on our datasets, with barely any advantage over the naive baseline of guessing the majority class. We then present two new algorithms with better generalizability. Our new algorithm, Reorder, significantly and consistently outperforms existing methods on most cross-dataset generalization setups. However, the overall advantage is incremental, leaving great room for improvement. Our analysis reveals that individual differences (both within and between populations) may play the most important role in the cross-dataset generalization challenge. Finally, we provide an open-source benchmark platform, GLOBEM (short for Generalization of Longitudinal BEhavior Modeling), to consolidate all 19 algorithms. GLOBEM can support researchers in using, developing, and evaluating different longitudinal behavior modeling methods. We call for researchers' attention to model generalizability evaluation for future longitudinal human behavior modeling studies.
https://dl.acm.org/doi/10.1145/3569485 | 2023 | Xuhai Xu et al. | Human Pose & Activity Recognition; Mental Health Apps & Online Support Communities; Biosensors & Physiological Monitoring | UbiComp

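The cross-dataset evaluation described above can be organized as a leave-one-dataset-out loop. The sketch below illustrates that protocol with placeholder datasets and a generic classifier; the institute-year names, features, and model are assumptions, not GLOBEM's actual data or algorithms.

```python
# Illustrative sketch (not the GLOBEM code): a leave-one-dataset-out protocol for
# checking cross-dataset generalizability of a depression-detection model.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score

rng = np.random.default_rng(0)
# Hypothetical stand-ins for the four institute-year datasets (features, labels).
datasets = {
    f"institute_year_{i}": (rng.normal(size=(200, 30)), rng.integers(0, 2, size=200))
    for i in range(4)
}

for held_out, (X_te, y_te) in datasets.items():
    # Train on the other three datasets, test on the held-out one.
    X_tr = np.vstack([X for name, (X, _) in datasets.items() if name != held_out])
    y_tr = np.concatenate([y for name, (_, y) in datasets.items() if name != held_out])
    model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
    score = balanced_accuracy_score(y_te, model.predict(X_te))
    # A majority-class guesser scores 0.5 balanced accuracy, the naive baseline above.
    print(f"held out {held_out}: balanced accuracy = {score:.2f}")
```
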
Modeling the Trade-off of Privacy Preservation and Activity Recognition on Low-Resolution Images
A computer vision system using low-resolution image sensors can provide intelligent services (e.g., activity recognition) while withholding unnecessary visual privacy information at the hardware level. However, preserving visual privacy and enabling accurate machine recognition place adversarial demands on image resolution. Modeling the trade-off between privacy preservation and machine recognition performance can guide future privacy-preserving computer vision systems that use low-resolution image sensors. In this paper, using at-home activities of daily living (ADLs) as the scenario, we first identified the most important visual privacy features through a user survey. We then quantified and analyzed the effects of image resolution on human and machine recognition performance in activity recognition and privacy awareness tasks. We also investigated how modern image super-resolution techniques influence these effects. Based on the results, we proposed a method for modeling the trade-off between privacy preservation and activity recognition on low-resolution images.
2023 | Yuntao Wang et al. | Tsinghua University | Human Pose & Activity Recognition; Privacy Perception & Decision-Making | CHI

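One ingredient of the resolution sweep described above is simulating a low-resolution sensor by downsampling. The sketch below shows a block-averaging downsampler on a placeholder image; it is an assumption-laden illustration, not the paper's pipeline, and the recognition and privacy evaluators it would feed are omitted.

```python
# Illustrative sketch (not the paper's pipeline): simulating low-resolution capture
# by block-averaging, the first step needed to chart a privacy/recognition
# trade-off curve across resolutions.
import numpy as np

def downsample(img: np.ndarray, factor: int) -> np.ndarray:
    """Block-average an HxW grayscale image to mimic a low-resolution sensor."""
    h, w = img.shape
    h, w = h - h % factor, w - w % factor
    blocks = img[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

img = np.random.default_rng(0).random((128, 128))   # placeholder "ADL scene"
for factor in (2, 4, 8, 16):
    low_res = downsample(img, factor)
    # In the paper's setting, each low_res variant would be scored both by an
    # activity-recognition model and by human privacy-awareness ratings.
    print(f"1/{factor} resolution -> {low_res.shape}")
```
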
Reviewing and Reflecting Smart Home Research from the Human-Centered Perspective
While there has been rapid growth in smart home research from a technical perspective – focusing on home automation, devices, software, and protocols – few review papers examine the human-centered perspective. A human-centered focus is crucial for achieving the goals of providing natural, convenient, comfortable, friendly, and safe user experiences in the smart home. To understand key innovations in human-centered smart home research, we analyzed keyword changes over time via 19,091 papers from 2000 to 2022, then selected 55 papers from high-impact venues in the last five years, and summarized them through a combination of qualitative and quantitative methods. Our analysis revealed five research trends with unique characteristics and interdependence. Drawing on this review, we elaborate on the future of smart home design research with respect to multidisciplinary development, stakeholder involvement, and the shift of design implications.
2023 | Yuan Yao et al. | School of Architecture and Design, Academy of Arts & Design | Universal & Inclusive Design; Smart Home Interaction Design | CHI

XAIR: A Framework of Explainable AI in Augmented Reality
Explainable AI (XAI) has established itself as an important component of AI-driven interactive systems. With Augmented Reality (AR) becoming more integrated into daily life, the role of XAI also becomes essential in AR because end-users will frequently interact with intelligent services. However, it is unclear how to design effective XAI experiences for AR. We propose XAIR, a design framework that addresses when, what, and how to provide explanations of AI output in AR. The framework was based on a multi-disciplinary literature review of XAI and HCI research, a large-scale survey probing 500+ end-users' preferences for AR-based explanations, and three workshops with 12 experts collecting their insights about XAI design in AR. XAIR's utility and effectiveness were verified via a study with 10 designers and another study with 12 end-users. XAIR can provide guidelines for designers, inspiring them to identify new design opportunities and achieve effective XAI designs in AR.
2023 | Xuhai Xu et al. | Reality Labs Research, University of Washington | AR Navigation & Context Awareness; Explainable AI (XAI) | CHI

TypeOut: Leveraging Just-in-Time Self-Affirmation for Smartphone Overuse Reduction
Smartphone overuse is related to a variety of issues such as lack of sleep and anxiety. We explore the application of Self-Affirmation Theory to smartphone overuse intervention in a just-in-time manner. We present TypeOut, a just-in-time intervention technique that integrates two components: an in-situ typing-based unlock process to improve user engagement, and self-affirmation-based typing content to enhance effectiveness. We hypothesize that the integration of typing and self-affirmation content can better reduce smartphone overuse. We conducted a 10-week within-subject field experiment (N=54) and compared TypeOut against two baselines: one only showing the self-affirmation content (a common notification-based intervention), and one only requiring typing non-semantic content (a state-of-the-art method). TypeOut reduces app usage by over 50%, and both app opening frequency and usage duration by over 25%, all significantly outperforming the baselines. TypeOut can potentially be used in other domains where an intervention may benefit from integrating self-affirmation exercises with an engaging just-in-time mechanism.
2022 | Xuhai Xu et al. | University of Washington | Mental Health Apps & Online Support Communities; Notification & Interruption Management | CHI

ReflecTrack: Enabling 3D Acoustic Position Tracking Using Commodity Dual-Microphone Smartphones
3D position tracking on smartphones has the potential to unlock a variety of novel applications, but has not been made widely available due to limitations in smartphone sensors. In this paper, we propose ReflecTrack, a novel 3D acoustic position tracking method for commodity dual-microphone smartphones. A ubiquitous speaker (e.g., a smartwatch or earbud) generates inaudible Frequency Modulated Continuous Wave (FMCW) acoustic signals that are picked up by both smartphone microphones. To enable 3D tracking with two microphones, we introduce a reflective surface that can be easily found in everyday objects near the smartphone. Thus, the microphones can receive sound from the speaker and echoes from the surface for FMCW-based acoustic ranging. To simultaneously estimate the distances from the direct and reflective paths, we propose the echo-aware FMCW technique with a new signal pattern and target detection process. Our user study shows that ReflecTrack achieves a median error of 28.4 mm in a 60 cm × 60 cm × 60 cm space and 22.1 mm in a 30 cm × 30 cm × 30 cm space for 3D positioning. We demonstrate the easy accessibility of ReflecTrack using everyday surfaces and objects with several typical applications of 3D position tracking, including 3D input for smartphones, fine-grained gesture recognition, and motion tracking in smartphone-based VR systems.
2021 | Yuzhou Zhuang et al. | Full-Body Interaction & Embodied Input; Biosensors & Physiological Monitoring | UIST

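ReflecTrack builds on FMCW acoustic ranging, in which the transmitted chirp is mixed with the received echo and the resulting beat frequency encodes the path length. The sketch below simulates that basic principle with assumed parameters (sample rate, chirp band, path length); it is not ReflecTrack's echo-aware signal design or target detection process.

```python
# Illustrative sketch (not ReflecTrack itself): basic single-path FMCW acoustic ranging.
# All parameters below are assumptions for the simulation, not the paper's values.
import numpy as np

fs, T = 48_000, 0.04          # sample rate (Hz), chirp duration (s)
f0, B = 18_000, 4_000         # chirp start frequency and sweep bandwidth (Hz)
c = 343.0                     # speed of sound (m/s)
t = np.arange(int(fs * T)) / fs

tx = np.cos(2 * np.pi * (f0 * t + 0.5 * (B / T) * t ** 2))     # transmitted chirp

true_path = 0.50                                   # speaker-to-mic path length (m)
delay = int(round(true_path / c * fs))             # one-way delay in samples
rx = np.concatenate([np.zeros(delay), tx])[: len(tx)]          # delayed echo

# Mixing tx and rx yields a beat tone whose frequency is proportional to the delay.
beat = tx * rx
n_fft = len(beat) * 16                             # zero-pad for finer peak location
spectrum = np.abs(np.fft.rfft(beat * np.hanning(len(beat)), n=n_fft))
freqs = np.fft.rfftfreq(n_fft, d=1 / fs)
beat_freq = freqs[np.argmax(spectrum[freqs < 2_000])]          # look only at the low band
print(f"estimated path: {beat_freq * T * c / B:.3f} m (true {true_path} m)")
```
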
Understanding the Design Space of Mouth Microgestures
As wearable devices move toward the face (i.e., smart earbuds, glasses), there is an increasing need to facilitate intuitive interactions with these devices. Current sensing techniques can already detect many mouth-based gestures; however, users' preferences for these gestures are not fully understood. In this paper, we investigate the design space and usability of mouth-based microgestures. We first conducted brainstorming sessions (N=16) and compiled an extensive set of 86 user-defined gestures. Then, with an online survey (N=50), we assessed the physical and mental demand of our gesture set and identified a subset of 14 gestures that can be performed easily and naturally. Finally, we conducted a remote Wizard-of-Oz usability study (N=11) mapping gestures to various daily smartphone operations under sitting and walking contexts. From these studies, we develop a taxonomy for mouth gestures, finalize a practical gesture set for common applications, and provide design guidelines for future mouth-based gesture interactions.
2021 | Victor Chen et al. | Haptic Wearables; Hand Gesture Recognition | DIS

HulaMove: Using Commodity IMU for Waist Interaction
We present HulaMove, a novel interaction technique that leverages the movement of the waist as a new eyes-free and hands-free input method for both the physical world and the virtual world. We first conducted a user study (N=12) to understand users' ability to control their waist. We found that users could easily discriminate eight shifting directions and two rotating orientations, and quickly confirm actions by returning to the original position (quick return). We developed a design space with eight gestures for waist interaction based on the results and implemented an IMU-based real-time system. Using a hierarchical machine learning model, our system could recognize waist gestures at an accuracy of 97.5%. Finally, we conducted a second user study (N=12) for usability testing in both real-world scenarios and virtual reality settings. Our usability study indicated that HulaMove significantly reduced interaction time by 41.8% compared to a touch screen method, and greatly improved users' sense of presence in the virtual world. This novel technique provides an additional input method when users' eyes or hands are busy, accelerates users' daily operations, and augments their immersive experience in the virtual world.
2021 | Xuhai Xu et al. | University of Washington | Full-Body Interaction & Embodied Input; Immersion & Presence Research | CHI

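The hierarchical recognition idea mentioned above can be pictured as a two-stage classifier: one model separates gesture families (shifts vs. rotations), and a per-family model picks the specific gesture. Everything in the sketch below (features, data, model choice) is a placeholder, not HulaMove's implementation.

```python
# Illustrative sketch (not HulaMove's model): a two-stage "hierarchical" classifier --
# first separate shifts from rotations, then classify the gesture within each family.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(800, 24))                 # e.g., windowed IMU statistics per sample
family = rng.integers(0, 2, size=800)          # 0 = shift, 1 = rotation (placeholder labels)
gesture = np.where(family == 0,
                   rng.integers(0, 8, size=800),   # eight shifting directions
                   rng.integers(0, 2, size=800))   # two rotating orientations

stage1 = RandomForestClassifier(random_state=0).fit(X, family)
stage2 = {
    fam: RandomForestClassifier(random_state=0).fit(X[family == fam], gesture[family == fam])
    for fam in (0, 1)
}

def predict(x: np.ndarray) -> tuple[int, int]:
    """Return (family, gesture) for a single feature vector."""
    fam = int(stage1.predict(x[None])[0])
    return fam, int(stage2[fam].predict(x[None])[0])

print(predict(X[0]))
```
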
Voicemoji: Emoji Entry Using Voice for Visually Impaired People
Keyboard-based emoji entry can be challenging for people with visual impairments: users have to sequentially navigate emoji lists using screen readers to find their desired emojis, which is a slow and tedious process. In this work, we explore the design and benefits of emoji entry with speech input, a popular text entry method among people with visual impairments. After conducting interviews to understand blind or low vision (BLV) users' current emoji input experiences, we developed Voicemoji, which (1) outputs relevant emojis in response to voice commands, and (2) provides context-sensitive emoji suggestions through speech output. We also conducted a multi-stage evaluation study with six BLV participants from the United States and six BLV participants from China, finding that Voicemoji significantly reduced entry time by 91.2% and was preferred by all participants over the Apple iOS keyboard. Based on our findings, we present Voicemoji as a feasible solution for voice-based emoji entry.
2021 | Mingrui Ray Zhang et al. | University of Washington | Intelligent Voice Assistants (Alexa, Siri, etc.); Visual Impairment Technologies (Screen Readers, Tactile Graphics, Braille) | CHI

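The first Voicemoji capability above, returning relevant emojis for a voice command, can be pictured as keyword matching against emoji descriptions. The toy sketch below assumes a tiny hand-written lexicon and skips speech recognition entirely; it only illustrates the command-to-candidates step, not Voicemoji's implementation.

```python
# Illustrative sketch (not Voicemoji's implementation): resolving a spoken "emoji ..."
# command to candidate emojis via keyword matching over emoji descriptions.
EMOJI_LEXICON = {
    "😀": "grinning face smile happy",
    "😂": "face with tears of joy laugh funny",
    "🎂": "birthday cake celebration",
    "❤️": "red heart love",
}

def emoji_candidates(spoken_command: str, top_k: int = 3) -> list[str]:
    """Rank emojis by how many command words appear in their description."""
    words = spoken_command.lower().replace("emoji", "").split()
    scored = [(sum(w in desc for w in words), e) for e, desc in EMOJI_LEXICON.items()]
    return [e for score, e in sorted(scored, reverse=True) if score > 0][:top_k]

print(emoji_candidates("emoji happy face"))   # -> ['😀', '😂']
```
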
LightWrite: Teach Handwriting to The Visually Impaired with A Smartphone
Learning to write is challenging for blind and low vision (BLV) people because of the lack of visual feedback. Despite the rapid advancement of digital technology, handwriting is still an essential part of daily life. Although tools designed to teach BLV people to write exist, many are expensive and require the help of sighted teachers. We propose LightWrite, a low-cost, easy-to-access smartphone application that uses voice-based descriptive instruction and feedback to teach BLV users to write English lowercase letters and Arabic digits in a specifically designed font. A two-stage study with 15 BLV users with little prior writing knowledge shows that LightWrite can successfully teach users to write characters in an average of 1.09 minutes per letter. After initial training and 20-minute daily practice for 5 days, participants were able to write an average of 19.9 out of 26 letters that were recognizable by sighted raters.
2021 | Zihan Wu et al. | Tsinghua University, University of Michigan | Visual Impairment Technologies (Screen Readers, Tactile Graphics, Braille) | CHI

"Learning for the Rise of China": Exploring Uses and Gratifications of State-Owned Online PlatformOn January 1, 2019, the Chinese government launched the online platform XueXi QiangGuo, which translates into "Learning for the Rise of China." Within two months, XueXi became the top-downloaded item of the month on Apple's App Store in China. In response, we conducted semi-structured interviews with 28 active XueXi users in China to investigate their uses and gratifications of this state-owned online platform. Our results reveal seven key motivations: compliance, self-status seeking, general information seeking, job support, entertainment, patriotism, and learning. This state-owned platform introduced a new model for official information dissemination and political communication through direct surveillance and monitoring, leveraging and fostering emotional attachment, and offering heterogeneous apolitical content. We discuss the intended and unintended ramifications of these components, highlighting the importance of future CSCW research to critically engage with pluralist political narratives situated in varied societies, especially those outside the reach of Western democracy.2020ALJiaan Lu et al.Civic Engagement and PoliticsCSCW