PromptCharm: Text-to-Image Generation through Multi-modal Prompting and Refinement
The recent advancements in Generative AI have significantly advanced the field of text-to-image generation. The state-of-the-art text-to-image model, Stable Diffusion, is now capable of synthesizing high-quality images with a strong sense of aesthetics. Crafting text prompts that align with the model's interpretation and the user's intent thus becomes crucial. However, prompting remains challenging for novice users due to the complexity of the Stable Diffusion model and the non-trivial effort required to iteratively edit and refine text prompts. To address these challenges, we propose PromptCharm, a mixed-initiative system that facilitates text-to-image creation through multi-modal prompt engineering and refinement. To assist novice users in prompting, PromptCharm first automatically refines and optimizes the user's initial prompt. Furthermore, PromptCharm supports the user in exploring and selecting different image styles within a large database. To assist users in effectively refining their prompts and images, PromptCharm renders model explanations by visualizing the model's attention values. If the user notices any unsatisfactory areas in the generated images, they can further refine the images through model attention adjustment or image inpainting within PromptCharm's rich feedback loop. To evaluate the effectiveness and usability of PromptCharm, we conducted a controlled user study with 12 participants and an exploratory user study with another 12 participants. These two studies show that participants using PromptCharm were able to create images of higher quality, better aligned with their expectations, compared with using two variants of PromptCharm that lacked interaction or visualization support.
2024 · Zhijie Wang et al. · University of Alberta · Generative AI (Text, Image, Music, Video); Explainable AI (XAI); AI-Assisted Creative Writing · CHI
Toward Supporting Adaptation: Exploring Affect’s Role in Cognitive Load when Using a Literacy Game
Educational technologies have been argued to enhance specific aspects of affect, such as motivation, and thereby learner experiences and outcomes. Until recently, affect has been considered separately from cognition. In this study, we investigated how learner affect (valence and activation) was tied to learner cognitive load and behaviours during game-based literacy activities. We employed experience sampling as part of a lab-based case study in which 35 English language learners used an adaptive educational game. The results indicated that both positive and negative affect predicted learner cognitive load, with negative affect predicting extraneous (unnecessary) load. These results, and the newly identified interaction patterns that accompanied learner affect and cognitive-load trajectories, provide insight into the role of affect during learning. They show a need to consider affect when studying cognitive load and have implications for how systems should adapt to learners.
2024 · Minghao Cai et al. · University of Alberta · Cognitive Impairment & Neurodiversity (Autism, ADHD, Dyslexia); Serious & Functional Games; STEM Education & Science Communication · CHI
“It’s Not an Issue of Malice, but of Ignorance”: Towards Inclusive Video Conferencing for Presenters Who are d/Deaf or Hard of Hearing
As video conferencing (VC) has become necessary for many professional, educational, and social tasks, people who are d/Deaf and hard of hearing (DHH) face distinct accessibility barriers. We conducted studies to understand the challenges DHH people face during VCs and found that they struggled to present or communicate effectively due to accessibility limitations of VC platforms. These limitations include the lack of tools for DHH speakers to discreetly communicate their accommodation needs to the group. Based on these findings, we prototyped a suite of tools, called Erato, that enables DHH speakers to be aware of their performance while speaking and reminds participants of proper etiquette. We evaluated Erato by running a mock classroom case study over VC for three sessions. All participants felt more confident in their speaking ability and paid closer attention to making the classroom more inclusive while using our tool. We share implications of these results for the design of VC interfaces and human-in-the-loop assistive systems that can support users who are DHH to communicate effectively and advocate for their accessibility needs. https://doi.org/10.1145/3610901
2023 · Josh Urban Davis et al. · Deaf & Hard-of-Hearing Support (Captions, Sign Language, Vibration); Universal & Inclusive Design · UbiComp
Exploring Collaborative Culture Sharing Dynamics in Immigrant Families through Digital Crafting and Storytelling
Families strengthen bonds by collectively constructing social identity through sharing stories, language, and culture. For immigrant families, language and culture barriers disrupt the mechanisms for maintaining intergenerational connection. Immigrant grandparents and grandchildren are particularly at risk of disconnect. In this paper, we investigate existing design guidelines using a tool (StoryTapestry) to explore the storytelling and crafting process of South Asian immigrant grandparents and grandchildren. In this exploration, pairs used culturally relevant images to create digital visual artifacts that tell their stories. Grandparent-grandchild pairs from 10 South Asian immigrant families participated in this exploration of how the digital process fosters positive social connection, culture sharing, and co-construction. A thematic analysis revealed how collaborative digital crafting encourages the crossing of language and culture barriers, knowledge sharing, and creativity. We contribute an understanding of the interaction dynamics and socio-technical implications of intergenerational and cross-cultural collaboration by demonstrating (1) that collaborative digital crafting can reverse traditional educator and learner roles to create culture-sharing opportunities, (2) that grandparents play a central role in maintaining social interaction, (3) that structure can guide grandparent-grandchild pairs to a shared goal, and (4) that flexibility encourages engagement from children. We synthesize ideas from migration and collaboration research, and we discuss how the culture, language, and generational dynamics in our study extend what is known about each of these spaces. Together, our design implications offer insight into building digital tools that promote engagement, knowledge sharing, and collaboration between immigrant grandparents and grandchildren navigating social disconnect post-migration.
2023 · Amna Liaqat et al. · Collaboration II · CSCW
DeepSeer: Interactive RNN Explanation and Debugging via State Abstraction
Recurrent Neural Networks (RNNs) have been widely used in Natural Language Processing (NLP) tasks given their superior performance on processing sequential data. However, it is challenging to interpret and debug RNNs due to their inherent complexity and lack of transparency. While many explainable AI (XAI) techniques have been proposed for RNNs, most of them only support local explanations rather than global explanations. In this paper, we present DeepSeer, an interactive system that provides both global and local explanations of RNN behavior in multiple tightly coordinated views for model understanding and debugging. The core of DeepSeer is a state abstraction method that bundles semantically similar hidden states in an RNN model and abstracts the model as a finite state machine. Users can explore the global model behavior by inspecting text patterns associated with each state and the transitions between states. Users can also dive into individual predictions by inspecting the state trace and intermediate prediction results of a given input. A between-subjects user study with 28 participants shows that, compared with a popular XAI technique, LIME, participants using DeepSeer made a deeper and more comprehensive assessment of RNN model behavior, identified the root causes of incorrect predictions more accurately, and came up with more actionable plans to improve model performance.
2023 · Zhijie Wang et al. · University of Alberta · Explainable AI (XAI); Computational Methods in HCI · CHI
Dynamics of eye-hand coordination are flexibly preserved in eye-cursor coordination during an online, digital, object interaction task
Do patterns of eye-hand coordination observed during real-world object interactions apply to digital, screen-based object interactions? We adapted a real-world object interaction task (physically transferring cups in sequence about a tabletop) into a two-dimensional screen-based task (dragging and dropping circles in sequence with a cursor). We collected gaze (with webcam eye-tracking) and cursor position data from 51 fully remote, crowd-sourced participants who performed the task on their own computers. We applied real-world time-series data segmentation strategies to resolve the self-paced movement sequence into phases of object interaction and rigorously cleaned the webcam eye-tracking data. In this preliminary investigation, we found that: (1) real-world eye-hand coordination patterns persist and adapt in this digital context, and (2) remote, online cursor-tracking and webcam eye-tracking are useful tools for capturing visuomotor behaviours during this ecologically valid human-computer interaction task. We discuss how these findings might inform design principles and further investigations into natural behaviours that persist in digital environments.
2023 · Jennifer K. Bertrand et al. · University of Alberta · Eye Tracking & Gaze Interaction; Computational Methods in HCI · CHI
Co-Writing Screenplays and Theatre Scripts with Language Models: Evaluation By Industry Professionals
Language models are increasingly attracting interest from writers. However, such models lack long-range semantic coherence, limiting their usefulness for long-form creative writing. We address this limitation by applying language models hierarchically, in a system we call Dramatron. By building structural context via prompt chaining, Dramatron can generate coherent scripts and screenplays complete with title, characters, story beats, location descriptions, and dialogue. We illustrate Dramatron’s usefulness as an interactive co-creative system with a user study of 15 theatre and film industry professionals. Participants co-wrote theatre scripts and screenplays with Dramatron and engaged in open-ended interviews. We report reflections both from our interviewees and from independent reviewers who critiqued performances of several of the scripts, illustrating how both Dramatron and hierarchical text generation could be useful for human-machine co-creativity. Finally, we discuss the suitability of Dramatron for co-creativity, ethical considerations (including plagiarism and bias), and participatory models for the design and deployment of such tools.
2023 · Piotr Mirowski et al. · DeepMind · AI-Assisted Creative Writing; Creative Collaboration & Feedback Systems · CHI
DeepLens: Interactive Out-of-distribution Data Detection in NLP Models
Machine Learning (ML) has been widely used in Natural Language Processing (NLP) applications. A fundamental assumption in ML is that training data and real-world data should follow a similar distribution. However, a deployed ML model may suffer from out-of-distribution (OOD) issues due to distribution shifts in the real-world data. Though many algorithms have been proposed to detect OOD data from text corpora, there is still a lack of interactive tool support for ML developers. In this work, we propose DeepLens, an interactive system that helps users detect and explore OOD issues in massive text corpora. Users can efficiently explore different OOD types in DeepLens with the help of a text clustering method. Users can also dig into a specific text by inspecting salient words highlighted through neuron activation analysis. In a within-subjects user study with 24 participants, participants using DeepLens were able to accurately identify nearly twice as many types of OOD issues, with 22% more confidence, compared with a variant of DeepLens that had no interaction or visualization support.
2023 · Da Song et al. · University of Alberta · Explainable AI (XAI); Algorithmic Transparency & Auditability; Computational Methods in HCI · CHI
Videoconference and Embodied VR: Communication Patterns Across Task and Medium
Videoconferencing has become the dominant technology for remote meetings. Embodied virtual reality is a potential alternative that employs motion tracking to place people in a shared virtual environment as avatars. This paper describes a 210-participant study focused on behavioral measures that compares multiparty interaction in videoconference and embodied VR across a range of task types: a factual intellective task, a subjective judgment task, and two negotiation tasks, one with visual grounding. It uses state-of-the-art body, face, and finger tracking to drive the avatars in VR and a carefully matched videoconferencing implementation. Significant behavioral differences were observed. These include increased activity in videoconference related to maintaining the social connection: more person-directed gaze and increased verbal and nonverbal backchannel behavior. Videoconference also had reduced conversational overlap, increased self-adaptor gestures, and reduced deictic gestures compared with embodied VR. Potential explanations and implications are discussed.
2021 · Ahsan Abdullah et al. · VR and Immersive Interfaces · CSCW
Tele-Immersive Improv: Effects of Immersive Visualisations on Rehearsing and Performing Theatre Online
Performers acutely need, but lack, tools to remotely rehearse and create live theatre, particularly due to global restrictions on social interactions during the Covid-19 pandemic. No prior studies, however, have examined how remote video collaboration affects performance. This paper presents the findings of a six-week field study with 16 domain experts investigating how tele-immersion affects the rehearsal and performance of improvisational theatre. To conduct the study, an original media server was developed for co-locating remote performers in shared virtual 3D environments, which were accessed through popular video conferencing software. The results of this qualitative study indicate that tele-immersive environments uniquely provide performers with a strong sense of co-presence, feelings of physical connection, and an increased ability to enter the social-flow states required for improvisational theatre. Based on our observations, we put forward design recommendations for video collaboration tools tailored to the unique demands of live performance.
2021 · Boyd Branch et al. · University of Kent · Social & Collaborative VR; Immersion & Presence Research; Interactive Narrative & Immersive Storytelling · CHI
Touch-Supported Voice Recording to Facilitate Forced Alignment of Text and Speech in an E-Reading Interface
Reading a book together with a family member who has impaired vision or other difficulties reading is an important social bonding activity. However, for the person being read to, there is little support for making these experiences repeatable. While audio can easily be recorded, synchronizing it with the text for later playback requires forced alignment algorithms, which do not perform well on amateur read-aloud speech. We propose a human-in-the-loop approach to augmenting such algorithms in the form of touch metaphors during collocated read-aloud sessions using tablet e-readers. The metaphor is implemented as a finger-follows-text tracker. We explore how this could better handle the variability of amateur reading, which poses accuracy challenges for existing forced alignment techniques. Data collected from users reading aloud as assisted by touch metaphors show increases in the accuracy of forced alignment algorithms and reveal opportunities to better support reading aloud.
2018 · Benett Axtell et al. · Voice Accessibility; Cognitive Impairment & Neurodiversity (Autism, ADHD, Dyslexia); Augmentative & Alternative Communication (AAC) · IUI