Creativity Supportive Ecosystems: A Framework for Understanding Function and Disruption in Online Art Worlds (CHI 2025)
Shm Garanganao Almeda et al., University of California, Berkeley (Berkeley Institute of Design Lab)

The online art world is a double-edged sword: the Internet’s vibrant culture of open, cooperative art-sharing also attracts nonconsensual reuse and appropriation. Artists continually navigate supportive and challenging interactions on social platforms, including community-shifting disruptions; the reuse of creative work for training generative AI is only the latest such disruption. Research into creativity support tools (CSTs) often centers artifact-making, leaving the HCI community with few strategies for understanding the downstream impacts CSTs can have on artifact-sharing. Seeking a framework that captures this, we develop the creativity supportive ecosystem (CSE) through interviews with 20 online artists and 8 data “stewards” with experience reusing creative data for training GenAI. We use the CSE to describe how creative communities perceive and respond to disruption, identifying opportunities to empower artists in their collective negotiations with disruptive technologies like GenAI: by centering artists as producers of value, identifying creative and alternative data practices, and empowering inter-community flexibility.

Topics: Generative AI (Text, Image, Music, Video); AI Ethics, Fairness & Accountability; Privacy by Design & User Control

Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences (UIST 2024)
Shreya Shankar et al.

Due to the cumbersome nature of human evaluation and limitations of code-based evaluation, Large Language Models (LLMs) are increasingly being used to assist humans in evaluating LLM outputs. Yet LLM-generated evaluators simply inherit all the problems of the LLMs they evaluate, requiring further human validation. We present a mixed-initiative approach to “validate the validators” — aligning LLM-generated evaluation functions (be it prompts or code) with human requirements. Our interface, EvalGen, provides automated assistance to users in generating evaluation criteria and implementing assertions. While generating candidate implementations (Python functions, LLM grader prompts), EvalGen asks humans to grade a subset of LLM outputs; this feedback is used to select implementations that better align with user grades. A qualitative study finds overall support for EvalGen but underscores the subjectivity and iterative nature of alignment. In particular, we identify a phenomenon we dub criteria drift: users need criteria to grade outputs, but grading outputs helps users define criteria. What is more, some criteria appear dependent on the specific LLM outputs observed (rather than independent and definable a priori), raising serious questions for approaches that assume the independence of evaluation from observation of model outputs. We present our interface and implementation details, a comparison of our algorithm with a baseline approach, and implications for the design of future LLM evaluation assistants.

Topics: Human-LLM Collaboration

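The selection step described above — using a small set of human grades to choose among candidate assertion implementations — can be sketched in a few lines. This is an illustrative toy, not EvalGen's actual algorithm; the criterion, candidate functions, and graded examples are all invented:

```python
# Toy sketch: pick the candidate assertion that agrees most often with
# human pass/fail grades on a small labeled subset of LLM outputs.

def select_best_assertion(candidates, graded_outputs):
    """candidates: list of (name, fn) where fn(output) -> bool.
    graded_outputs: list of (output, human_grade) with human_grade a bool."""
    def agreement(fn):
        hits = sum(fn(out) == grade for out, grade in graded_outputs)
        return hits / len(graded_outputs)
    return max(candidates, key=lambda c: agreement(c[1]))

# Hypothetical candidate implementations of a "response must be concise" criterion:
candidates = [
    ("under_50_words", lambda s: len(s.split()) < 50),
    ("under_280_chars", lambda s: len(s) < 280),
]
graded = [
    ("Short answer.", True),
    ("word " * 60, False),
    ("x" * 300, False),   # one very long "word": word-count check misses this
]
best = select_best_assertion(candidates, graded)
```

Here the character-count candidate wins because the word-count candidate wrongly passes the single 300-character token, illustrating how human grades can discriminate between superficially similar implementations.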
What's the Game, then? Opportunities and Challenges for Runtime Behavior Generation (UIST 2024)
Nicholas Jennings et al.

Procedural content generation (PCG), the process of algorithmically creating game components instead of authoring them manually, has been a common tool of game development for decades. Recent advances in large language models (LLMs) enable the generation of game behaviors based on player input at runtime. Such code generation brings with it the possibility of entirely new gameplay interactions that may be difficult to integrate with typical game development workflows. We explore these implications through GROMIT, a novel LLM-based runtime behavior generation system for Unity. When triggered by a player action, GROMIT generates a relevant behavior that is compiled without developer intervention and incorporated into the game. We create three demonstration scenarios with GROMIT to investigate how such a technology might be used in game development. In a system evaluation we find that our implementation is able to produce behaviors that result in significant downstream impacts on gameplay. We then conduct an interview study with n=13 game developers using GROMIT as a probe to elicit their current opinions on runtime behavior generation tools, and enumerate the specific themes curtailing the wider use of such tools. We find that the main themes of concern are quality considerations, community expectations, and fit with developer workflows, and that several of the subthemes are unique to runtime behavior generation specifically. We outline a future work agenda to address these concerns, including the need for additional guardrail systems for behavior generation.

Topics: Generative AI (Text, Image, Music, Video); AI-Assisted Decision-Making & Automation; Game UX & Player Behavior

UICrit: Enhancing Automated Design Evaluation with a UI Critique Dataset (UIST 2024)
Peitong Duan et al.

Automated UI evaluation can be beneficial for the design process; for example, to compare different UI designs, or conduct automated heuristic evaluation. LLM-based UI evaluation, in particular, holds the promise of generalizability to a wide variety of UI types and evaluation tasks. However, current LLM-based techniques do not yet match the performance of human evaluators. We hypothesize that automatic evaluation can be improved by collecting a targeted UI feedback dataset and then using this dataset to enhance the performance of general-purpose LLMs. We present a targeted dataset of 3,059 design critiques and quality ratings for 983 mobile UIs, collected from seven designers, each with at least a year of professional design experience. We carried out an in-depth analysis to characterize the dataset's features. We then applied this dataset to achieve a 55% performance gain in LLM-generated UI feedback via various few-shot and visual prompting techniques. We also discuss future applications of this dataset, including training a reward model for generative UI techniques, and fine-tuning a tool-agnostic multi-modal LLM that automates UI evaluation.

Topics: Human-LLM Collaboration; Explainable AI (XAI)

Prompting for Discovery: Flexible Sense-Making for AI Art-Making with Dreamsheets (CHI 2024)
Shm Garanganao Almeda et al., University of California, Berkeley

Design space exploration (DSE) for Text-to-Image (TTI) models entails navigating a vast, opaque space of possible image outputs, through a commensurately vast input space of hyperparameters and prompt text. Perceptually small movements in prompt-space can surface unexpectedly disparate images. How can interfaces support end-users in reliably steering prompt-space explorations towards interesting results? Our design probe, DreamSheets, supports user-composed exploration strategies with LLM-assisted prompt construction and large-scale simultaneous display of generated results, hosted in a spreadsheet interface. Two studies, a preliminary lab study and an extended two-week study where five expert artists developed custom TTI sheet-systems, reveal various strategies for targeted TTI design space exploration, such as using templated text generation to define and layer semantic “axes” for exploration. We identified patterns in exploratory structures across our participants' sheet-systems: configurable exploration “units” that we distill into a UI mockup, and generalizable UI components to guide future interfaces.

Topics: Generative AI (Text, Image, Music, Video); AI-Assisted Creative Writing; Graphic Design & Typography Tools

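The "semantic axes" pattern described above can be sketched as a small templating function: each axis is a list of interchangeable phrases, and their cross-product fills a grid of prompts, one per spreadsheet cell. The template and axis values here are illustrative, not taken from DreamSheets itself:

```python
from itertools import product

def prompt_grid(template, axes):
    """axes: dict mapping placeholder name -> list of phrase options."""
    keys = list(axes)
    return [template.format(**dict(zip(keys, combo)))
            for combo in product(*(axes[k] for k in keys))]

# Hypothetical two-axis exploration: 2 subjects x 3 styles -> 6 prompts.
prompts = prompt_grid(
    "a {subject} in the style of {style}",
    {"subject": ["lighthouse", "orchard"],
     "style": ["ukiyo-e", "art deco", "risograph"]},
)
```

Laying the resulting images out along the same two axes is what lets a sheet-system expose how each axis independently shifts the output.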
Interactive Flexible Style Transfer for Vector Graphics (UIST 2023)
Jeremy Warner et al.

Vector graphics are an industry-standard way to represent and share visual designs. Designers frequently source and incorporate styles from existing designs into their work. Unfortunately, popular design tools are not well suited for this task. We present VST, Vector Style Transfer, a novel design tool for flexibly transferring visual styles between vector graphics. The core of VST lies in leveraging automation while respecting designers' tastes and the subjectivity inherent to style transfer. In VST, designers tune a cross-design element correspondence and customize which style attributes to change. We report results from a user study in which designers used VST to control style transfer between several designs, including designs participants created with external tools beforehand. VST shows that enabling design correspondence tuning and customization is one way to support interactive, flexible style transfer.

Topics: Graphic Design & Typography Tools; Customizable & Personalized Objects

Herding AI Cats: Lessons from Designing a Chatbot by Prompting GPT-3 (DIS 2023)
J.D. Zamfirescu-Pereira et al.

Prompting Large Language Models (LLMs) is an exciting new approach to designing chatbots. But can it improve an LLM’s user experience (UX) reliably enough to power chatbot products? Our attempt to design a robust chatbot by prompting GPT-3/4 alone suggests: not yet. Prompts made achieving “80%” UX goals easy, but not the remaining 20%. Fixing the few remaining interaction breakdowns resembled herding cats: We could not address one UX issue or test one design solution at a time; instead, we had to handle everything everywhere all at once. Moreover, because no prompt could make GPT reliably say “I don’t know” when it should, the user-GPT conversations had no guardrails after a breakdown occurred, often leading to UX downward spirals. These risks incentivized us to design highly prescriptive prompts and scripted bots, counter to the promises of LLM-powered chatbots. This paper describes this case study, unpacks prompting’s fickleness and its impact on UX design processes, and discusses implications for LLM-based design methods and tools.

Topics: Conversational Chatbots; Human-LLM Collaboration

Dual Body Bimanual Coordination in Immersive Environments (DIS 2023)
James Smith et al.

A common way to enable immersion in VR is to render a virtual body that mirrors the user's physical movements. VR allows us to design interaction schemes that go beyond direct avatar embodiments. In particular, there is a growing body of literature investigating the simultaneous control of multiple bodies in VR. We contribute to this literature by investigating the important case where multiple bodies perform a coordinated interaction with each other. Such actions directly question what kind of embodiment users experience. Concretely, we investigate people's abilities to perform coordinated bimanual selection and handoff tasks between a first-person and third-person body through a user study with 19 participants. Results provide quantitative & qualitative evidence for people's ability to perform complex coordinated tasks through two bodies. Furthermore, we characterize participant performance in different task and interaction configurations, summarize the strategies they employed, and discuss qualities of users' proprioception.

Topics: Full-Body Interaction & Embodied Input; Immersion & Presence Research; Identity & Avatars in XR

NFT Art World: The Influence of Decentralized Systems on the Development of Novel Online Creative Communities and Cooperative Practices (DIS 2023)
Shm Garanganao Almeda et al.

Reporting on the Non-Fungible Token (NFT) ecosystem overwhelmingly focuses on the community that drove its growth and price volatility, gaining widespread media attention in 2021. This overlooks the communities developing novel creative practices on NFT platforms. Interviews with 16 creatives utilizing NFTs reveal a vast Art World: networks of distinct communities maturing into cooperative ecosystems with unique artistic subcultures, philosophies, and interactions. We observe unique qualities of these decentralized distribution platforms and identify patterns of activity comparable to those of traditional art worlds. We identify how aspects of these systems might subvert, or replicate, existing systems of power, value, and access. The impacts of policy and platform design on online creative communities in the NFT Art World carry valuable lessons for developers of digital interventions into the creative industry, exemplifying pertinent considerations for the future of creative labor and cooperation online.

Topics: Creative Coding & Computational Art; Crowdsourcing Task Design & Quality Control

Weaving Schematics and Code: Interactive Visual Editing for Hardware Description Languages (UIST 2021)
Richard Lin et al.

In many engineering disciplines such as circuit board, chip, and mechanical design, a hardware description language (HDL) approach provides important benefits over direct manipulation interfaces by supporting concepts like abstraction and generator meta-programming. While several such HDLs have emerged recently and promised power and flexibility, they also present challenges, especially to designers familiar with current graphical workflows. In this work, we investigate an IDE approach to provide a graphical editor for a board-level circuit design HDL. Unlike GUI builders which convert an entire diagram to code, we instead propose generating equivalent HDL from individual graphical edit actions. By keeping code as the primary design input, we preserve the full power of the underlying HDL, while remaining useful even to advanced users. We discuss our concept, design considerations such as performance, system implementation, and report on the results of an exploratory remote user study with four experienced hardware designers.

Topics: Interactive Data Visualization; Desktop 3D Printing & Personal Fabrication; Circuit Making & Hardware Prototyping

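The "equivalent HDL from individual graphical edit actions" idea described above can be illustrated with a toy: each edit action in the diagram editor emits one code statement appended to the design source, so code stays the primary input. The syntax here is invented for illustration and is not the paper's actual HDL:

```python
# Toy illustration: a graphical "connect two ports" action is translated into
# a single statement appended to the (hypothetical) HDL design source.

def connect_action(net_name, port_a, port_b):
    """Emit the HDL statement equivalent to wiring port_a to port_b."""
    return f"self.connect({port_a}, {port_b})  # net: {net_name}"

source_lines = [
    "class Blinker(Block):",
    "  def contents(self):",
]
# User drags a wire between the MCU's GPIO and the LED in the schematic view:
source_lines.append("    " + connect_action("led_out", "self.mcu.gpio0", "self.led.sig"))
```

Because each action maps to a small, local code edit rather than a whole-diagram regeneration, hand-written generator code elsewhere in the file is left untouched.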
LabelAR: A Spatial Guidance Interface for Fast Computer Vision Image Collection (UIST 2019)
Michael Laielli et al.

Computer vision is applied in an ever-expanding range of applications, many of which require custom training data to perform well. We present a novel interface for rapid collection and labeling of training images to improve computer vision based object detectors. LabelAR leverages the spatial tracking capabilities of an AR-enabled camera, allowing users to place persistent bounding volumes that stay centered on real-world objects. The interface then guides the user to move the camera to cover a wide variety of viewpoints. We eliminate the need for post-hoc manual labeling of images by automatically projecting 2D bounding boxes around objects in the images as they are captured from AR-marked viewpoints. Across 12 users, LabelAR significantly outperforms existing approaches in terms of the trade-off between model performance and collection time.

Topics: AR Navigation & Context Awareness; Generative AI (Text, Image, Music, Video)

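The automatic 2D labeling step described above amounts to projecting an AR-anchored 3D bounding volume into each captured frame. A rough sketch of that geometry, using a pinhole camera model with hypothetical intrinsics (this is not LabelAR's implementation):

```python
# Sketch: project the eight corners of a 3D box (already in camera coordinates)
# through a pinhole camera, then take min/max image coordinates as the 2D box.

def project_box(corners_cam, fx, fy, cx, cy):
    """corners_cam: iterable of (x, y, z) corners in camera coordinates, z > 0."""
    us = [fx * x / z + cx for x, y, z in corners_cam]
    vs = [fy * y / z + cy for x, y, z in corners_cam]
    return min(us), min(vs), max(us), max(vs)  # (left, top, right, bottom)

# Unit cube centered 4 m in front of the camera; intrinsics are made up.
corners = [(x, y, z) for x in (-0.5, 0.5) for y in (-0.5, 0.5) for z in (3.5, 4.5)]
box = project_box(corners, fx=800, fy=800, cx=320, cy=240)
```

In practice each corner would first be transformed from the AR world anchor into the current camera pose before this projection, which is what lets the same placed volume label every viewpoint automatically.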
Loki: Facilitating Remote Instruction of Physical Tasks Using Bi-Directional Mixed-Reality Telepresence (UIST 2019)
Balasaravanan Thoravi Kumaravel et al.

Remotely instructing and guiding users in physical tasks has offered the promise of training surgeons remotely, guiding complex repair tasks by an expert, or enabling novel learning workflows. While it has been the subject of many research projects, current approaches are often limited in the communication bandwidth (lacking context, spatial information) or interactivity (unidirectional, asynchronous) between expert and learner. To address some of these issues, we explore the design space of bi-directional mixed-reality telepresence systems for teaching physical tasks, and present Loki, a novel system which explores the various dimensions of this space. Loki leverages video, audio, and spatial capture along with mixed reality presentation methods to allow users to explore and annotate the local and remote environments, and record and review their own performance as well as their peer’s. In this paper, we explore the design space of mixed reality telepresence for physical tasks, contribute the system design of Loki which enables transitions between the elements of this design space, and validate its utility through a varied set of scenarios.

Topics: Mixed Reality Workspaces; Collaborative Learning & Peer Teaching; Teleoperation & Telepresence

Interactive Extraction of Examples from Existing Code (CHI 2018)
Andrew Head et al., UC Berkeley

Programmers frequently learn from examples produced and shared by other programmers. However, it can be challenging and time-consuming to produce concise, working code examples. We conducted a formative study where 12 participants made examples based on their own code. This revealed a key hurdle: making meaningful simplifications without introducing errors. Based on this insight, we designed a mixed-initiative tool, CodeScoop, to help programmers extract executable, simplified code from existing code. CodeScoop enables programmers to "scoop" out a relevant subset of code. Techniques include selectively including control structures and recording an execution trace that allows authors to substitute literal values for code and variables. In a controlled study with 19 participants, CodeScoop helped programmers extract executable code examples with the intended behavior more easily than with a standard code editor.

Topics: Open-Source Collaboration & Code Review; Computational Methods in HCI
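The trace-based substitution technique described above — replacing a variable with the literal value it held at runtime, so an extracted example runs without its original setup code — can be sketched as follows. The example is invented and far simpler than the tool's actual implementation, which operates on the program's syntax tree rather than raw strings:

```python
# Toy sketch: given recorded runtime values for variables, inline each
# variable reference in a line of code as the literal it held.

def inline_from_trace(line, trace):
    """trace: dict of variable name -> value recorded during execution."""
    for name, value in trace.items():
        line = line.replace(name, repr(value))
    return line

# Hypothetical trace captured while the original program ran:
trace = {"config_path": "/tmp/app.cfg"}
simplified = inline_from_trace("load(config_path)", trace)
```

After substitution, the extracted example no longer depends on the code that computed `config_path`, which is exactly the simplification-without-breakage that the formative study found difficult to do by hand.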