Selenite: Scaffolding Online Sensemaking with Comprehensive Overviews Elicited from Large Language Models
Sensemaking in unfamiliar domains can be challenging, demanding considerable user effort to compare different options with respect to various criteria. Prior research and our formative study found that people would benefit from reading an overview of an information space upfront, including the criteria others previously found useful. However, existing sensemaking tools struggle with the "cold-start" problem -- not only do they require significant input from previous users to generate and share these overviews, but the overviews may also turn out to be biased and incomplete. In this work, we introduce a novel system, Selenite, which leverages Large Language Models (LLMs) as reasoning machines and knowledge retrievers to automatically produce a comprehensive overview of options and criteria to jumpstart users' sensemaking processes. Subsequently, Selenite also adapts as people use it, helping users find, read, and navigate unfamiliar information in a systematic yet personalized manner. Through three studies, we found that Selenite produced accurate and high-quality overviews reliably, significantly accelerated users' information processing, and effectively improved their overall comprehension and sensemaking experience.
2024 | Michael Xieyang Liu et al. | Carnegie Mellon University | Human-LLM Collaboration; Explainable AI (XAI); AI-Assisted Decision-Making & Automation | CHI
Meta-Manager: A Tool for Collecting and Exploring Meta Information about Code
Modern software engineering is in a state of flux. With more development utilizing AI code generation tools and the continued reliance on online programming resources, understanding code and the original intent behind it is becoming more important than it ever has been. To this end, we have developed the "Meta-Manager", a Visual Studio Code extension, with a supplementary browser extension, that automatically collects and organizes changes made to code while keeping track of the provenance of each part of the code, including code that has been AI-generated or copy-pasted from popular programming resources online. These sources and subsequent changes are represented in the editor and may be explored using searching and filtering mechanisms to help developers answer historically hard-to-answer questions about code, its provenance, and its design rationale. In our evaluation of Meta-Manager, we found developers were successfully able to use it to answer otherwise unanswerable questions about an unfamiliar code base.
2024 | Amber Horvath et al. | Carnegie Mellon University | Interactive Data Visualization; Computational Methods in HCI | CHI
Understanding Documentation Use Through Log Analysis: A Case Study of Four Cloud Services
Almost no modern software system is written from scratch, and developers are required to effectively learn to use third-party libraries and software services. Thus, many practitioners and researchers have looked for ways to create effective documentation that supports developers' learning. However, few efforts have focused on how people actually use the documentation. In this paper, we report on an exploratory, multi-phase, mixed methods empirical study of documentation page-view logs from four cloud-based industrial services. By analyzing page-view logs for over 100,000 users, we find diverse patterns of documentation page visits. Moreover, we show statistically that which documentation pages people visit often correlates with user characteristics such as past experience with the specific product, on the one hand, and with future adoption of the API on the other hand. We discuss the implications of these results on documentation design and propose documentation page-view log analysis as a feasible technique for design audits of documentation, from ones written for software developers to ones designed to support end users (e.g., Adobe Photoshop).
2024 | Daye Nam et al. | Carnegie Mellon University | Knowledge Worker Tools & Workflows; Prototyping & User Testing | CHI
FrameKit: A Tool for Authoring Adaptive UIs Using Keyframes
Adaptive user interfaces (AUIs) can improve user experience by adapting how information and functionality are presented in a user interface. However, the dynamic nature and potentially numerous variations of AUIs make them challenging to author. In this paper, we present a generalized framework for defining adaptation as interpolations between UIs and introduce a computational approach for intelligently generating new variations of a UI from a small set of designs. Based on this approach, we built FrameKit, an authoring tool with a programming-by-example interface that retains flexibility and control afforded by manual authoring while reducing effort through automatic generation. We demonstrate that FrameKit can support adaptations that typically require domain-specific toolkits, such as those found in context-aware applications, responsive UIs, and ability-based adaptation. We evaluated FrameKit with ten front-end developers, who successfully authored AUIs after a short tutorial session and suggested that FrameKit provides an effective mental model for AUI authoring.
2024 | Jason Wu et al. | Human-LLM Collaboration; Prototyping & User Testing | IUI
ONYX: Assisting Users in Teaching Natural Language Interfaces Through Multi-Modal Interactive Task Learning
Users are increasingly empowered to personalize natural language interfaces (NLIs) by teaching how to handle new natural language (NL) inputs. However, our formative study found that when teaching new NL inputs, users require assistance in clarifying ambiguities that arise and want insight into which parts of the input the NLI understands. In this paper we introduce ONYX, an intelligent agent that interactively learns new NL inputs by combining NL programming and programming-by-demonstration, also known as multi-modal interactive task learning. To address the aforementioned challenges, ONYX provides suggestions on how ONYX could handle new NL inputs based on previously learned concepts or user-defined procedures, and poses follow-up questions to clarify ambiguities in user demonstrations, using visual and textual aids to clarify the connections. Our evaluation shows that users provided with ONYX’s new features achieved significantly higher accuracy in teaching new NL inputs (median: 93.3%) in contrast to those without (median: 73.3%).
2023 | Marcel Ruoff et al. | Karlsruhe Institute of Technology (KIT) | Voice User Interface (VUI) Design; Human-LLM Collaboration; Explainable AI (XAI) | CHI
Wigglite: Low-cost Information Collection and Triage
Consumers conducting comparison shopping, researchers making sense of competitive space, and developers looking for code snippets online all face the challenge of capturing the information they find for later use without interrupting their current flow. In addition, during many learning and exploration tasks, people need to externalize their mental context, such as estimating how urgent a topic is to follow up on, or rating a piece of evidence as a "pro" or "con," which helps scaffold subsequent deeper exploration. However, current approaches incur a high cost, often requiring users to select, copy, context switch, paste, and annotate information in a separate document without offering specific affordances that capture their mental context. In this work, we explore a new interaction technique called "wiggling," which can be used to fluidly collect, organize, and rate information during early sensemaking stages with a single gesture. Wiggling involves rapid back-and-forth movements of a pointer or up-and-down scrolling on a smartphone, which can indicate the information to be collected and its valence, using a single, light-weight gesture that does not interfere with other interactions that are already available. Through implementation and user evaluation, we found that wiggling helped participants accurately collect information and encode their mental context with a 58% reduction in operational cost while being 24% faster compared to a common baseline.
2022 | Michael Xieyang Liu et al. | Prototyping & User Testing | UIST
Using Annotations for Sensemaking About Code
Developers spend significant amounts of time finding, relating, navigating, and, more broadly, making sense of code. While sensemaking, developers must keep track of many pieces of information including the objectives of their task, the code locations of interest, their questions and hypotheses about the behavior of the code, and more. Despite this process being such an integral aspect of software development, there is little tooling support for externalizing and keeping track of developers' information, which led us to develop Catseye -- an annotation tool for lightweight notetaking about code. Catseye has advantages over traditional methods of externalizing code-related information, such as commenting, in that the annotations retain the original context of the code while not actually modifying the underlying source code, and can support richer interactions such as lightweight versioning, following-up on the annotation content, and can be used as navigational aids. In our investigation of developers' notetaking processes using Catseye, we found developers were able to successfully use annotations to support their code sensemaking when completing a debugging task, with participants in a small study who used Catseye fixing more bugs, on average, than the baseline.
2022 | Amber Horvath et al. | Knowledge Worker Tools & Workflows; Computational Methods in HCI | UIST
Crystalline: Lowering the Cost for Developers to Collect and Organize Information for Decision Making
Developers perform online sensemaking on a daily basis, such as researching and choosing libraries and APIs. Prior research has introduced tools that help developers capture information from various sources and organize it into structures useful for subsequent decision-making. However, it remains a laborious process for developers to manually identify and clip content, maintaining its provenance and synthesizing it with other content. In this work, we introduce a new system called Crystalline that automatically collects and organizes information into tabular structures as the user searches and browses the web. It leverages natural language processing to automatically group similar criteria together to reduce clutter, and uses passive behavioral signals such as mouse movement and dwell time to infer what information to collect and how to visualize and prioritize it. Our user study suggests that developers are able to create comparison tables about 20% faster with a 60% reduction in operational cost without sacrificing the quality of the tables.
2022 | Michael Xieyang Liu et al. | Carnegie Mellon University | Human-LLM Collaboration; Interactive Data Visualization; Computational Methods in HCI | CHI
Understanding How Programmers Can Use Annotations on Documentation
Modern software development requires developers to find and effectively utilize new APIs and their documentation, but documentation has many well-known issues. Despite this, developers eventually overcome these issues but have no way of sharing what they learned. We investigate sharing this documentation-specific information through annotations, which have advantages over developer forums as the information is contextualized, not disruptive, and is short, thus easy to author. Developers can also author annotations to support their own comprehension. In order to support the documentation usage behaviors we found, we built the Adamite annotation tool, which provides features such as multiple anchors, annotation types, and pinning. In our user study, we found that developers are able to create annotations that are useful to themselves and are able to utilize annotations created by other developers when learning a new API, with readers of the annotations completing 67% more of the task, on average, than the baseline.
2022 | Amber Horvath et al. | Carnegie Mellon University | Programming Education & Computational Thinking; Knowledge Worker Tools & Workflows | CHI
To Reuse or Not To Reuse? A Framework and System for Evaluating Summarized Knowledge
As the amount of information online continues to grow, a correspondingly important opportunity is for individuals to reuse knowledge which has been summarized by others rather than starting from scratch. However, appropriate reuse requires judging the relevance, trustworthiness, and thoroughness of others' knowledge in relation to an individual's goals and context. In this work, we explore augmenting judgements of the appropriateness of reusing knowledge in the domain of programming, specifically of reusing artifacts that result from other developers' searching and decision making. Through an analysis of prior research on sensemaking and trust, along with new interviews with developers, we synthesized a framework for reuse judgements. The interviews also validated that developers express a desire for help with judging whether to reuse an existing decision. From this framework, we developed a set of techniques for capturing the initial decision maker's behavior and visualizing signals calculated based on the behavior, to facilitate subsequent consumers' reuse decisions, instantiated in a prototype system called Strata. Results of a user study suggest that the system significantly improves the accuracy, depth, and speed of reusing decisions. These results have implications for systems involving user-generated content in which other users need to evaluate the relevance and trustworthiness of that content.
2021 | Michael Xieyang Liu et al. | Algorithms and Decision Making | CSCW
Tabs.do: Task-Centric Browser Tab Management
Despite the increasing complexity and scale of people’s online activities, browser interfaces have stayed largely the same since tabs were introduced in major browsers nearly 20 years ago. The gap between simple tab-based browser interfaces and the complexity of users’ tasks can lead to serious adverse effects -- commonly referred to as "tab overload." This paper introduces a Chrome extension called Tabs.do, which explores bringing a task-centric approach to the browser, helping users to group their tabs into tasks and then organize, prioritize, and switch between those tasks fluidly. To lower the cost of importing, Tabs.do uses machine learning to make intelligent suggestions for grouping users’ open tabs into task bundles by exploiting behavioral and semantic features. We conducted a field deployment study where participants used Tabs.do with their real-life tasks in the wild, and showed that Tabs.do decreased tab clutter, enabled users to create rich task structures with lightweight interactions, and allowed participants to context-switch among tasks more efficiently.
2021 | Joseph Chee Chang et al. | Knowledge Worker Tools & Workflows; Notification & Interruption Management | UIST
Screen2Vec: Semantic Embedding of GUI Screens and GUI Components
Representing the semantics of GUI screens and components is crucial to data-driven computational methods for modeling user-GUI interactions and mining GUI designs. Existing GUI semantic representations are limited to encoding either the textual content, the visual design and layout patterns, or the app contexts. Many representation techniques also require significant manual data annotation efforts. This paper presents Screen2Vec, a new self-supervised technique for generating representations in embedding vectors of GUI screens and components that encode all of the above GUI features without requiring manual annotation using the context of user interaction traces. Screen2Vec is inspired by the word embedding method Word2Vec, but uses a new two-layer pipeline informed by the structure of GUIs and interaction traces and incorporates screen- and app-specific metadata. Through several sample downstream tasks, we demonstrate Screen2Vec's key useful properties: representing between-screen similarity through nearest neighbors, composability, and capability to represent user tasks.
2021 | Toby Jia-Jun Li et al. | Carnegie Mellon University | Explainable AI (XAI); Prototyping & User Testing; Computational Methods in HCI | CHI
Multi-Modal Repairs of Conversational Breakdowns in Task-Oriented Dialogs
A major problem in task-oriented conversational agents is the lack of support for the repair of conversational breakdowns. Prior studies have shown that current repair strategies for these kinds of errors are often ineffective due to: (1) the lack of transparency about the state of the system’s understanding of the user’s utterance; and (2) the system’s limited capabilities to understand the user’s verbal attempts to repair natural language understanding errors. This paper introduces SOVITE, a new multi-modal (speech plus direct manipulation) interface that helps users discover, identify the causes of, and recover from conversational breakdowns using the resources of existing mobile app GUIs for grounding. SOVITE displays the system’s understanding of user intents using GUI screenshots, allows users to refer to third-party apps and their GUI screens in conversations as inputs for intent disambiguation, and enables users to repair breakdowns using direct manipulation on these screenshots. The results from a remote user study with 10 users using SOVITE in 7 scenarios suggested that SOVITE’s approach is usable and effective.
2020 | Toby Jia-Jun Li et al. | Voice User Interface (VUI) Design; Conversational Chatbots | UIST
Privacy-Preserving Script Sharing in GUI-based Programming-by-Demonstration Systems
An important concern in end user development (EUD) is accidentally embedding private information in program artifacts when sharing them. This issue is particularly important in GUI-based programming-by-demonstration (PBD) systems due to the lack of direct developer control of script contents. Prior studies reported that these privacy concerns were the main barrier to script sharing in EUD. We present a new approach that can identify and obfuscate potential personal information in GUI-based PBD scripts based on the uniqueness of information entries with respect to the corresponding app GUI context. Compared with the prior approaches, ours supports broader types of personal information beyond explicitly pre-specified ones, requires minimal user effort, addresses the threat of re-identification attacks, and can work with third-party apps from any task domain. Our approach also recovers obfuscated fields locally on the script consumer’s side to preserve the shared scripts’ transparency, readability, robustness, and generalizability. Our evaluation shows that our approach (1) accurately identifies the potential personal information in scripts across different apps in diverse task domains; (2) allows end-user developers to feel comfortable sharing their own scripts; and (3) enables script consumers to understand the operation of shared scripts despite the obfuscated fields.
2020 | Toby Jia-Jun Li et al. | Privacy and Security | CSCW
Unakite: Scaffolding Developers’ Decision-Making Using the Web
Developers spend a significant portion of their time searching for solutions and methods online. While numerous tools have been developed to support this exploratory process, in many cases the answers to developers’ questions involve trade-offs among multiple valid options and not just a single solution. Through interviews, we discovered that developers express a desire for help with decision-making and understanding trade-offs. Through an analysis of Stack Overflow posts, we observed that many answers describe such trade-offs. These findings suggest that tools designed to help a developer capture information and make decisions about trade-offs can provide crucial benefits for both the developers and others who want to understand their design rationale. In this work, we probe this hypothesis with a prototype system named Unakite that captures, organizes, and keeps track of information about trade-offs and builds a comparison table, which can be saved for later as the design rationale. Our evaluation results show that Unakite reduces the cost of collecting tradeoff-related information by 45%, and that the resulting comparison table speeds up a subsequent developer’s ability to understand the trade-offs by about a factor of 3.
2019 | Michael Xieyang Liu et al. | Explainable AI (XAI); AI-Assisted Decision-Making & Automation; Computational Methods in HCI | UIST
PUMICE: A Multi-Modal Agent that Learns Concepts and Conditionals from Natural Language and Demonstrations
Natural language programming is a promising approach to enable end users to instruct new tasks for intelligent agents. However, our formative study found that end users would often use unclear, ambiguous or vague concepts when naturally instructing tasks in natural language, especially when specifying conditionals. Existing systems have limited support for letting the user teach agents new concepts or explaining unclear concepts. In this paper, we describe a new multi-modal domain-independent approach that combines natural language programming and programming-by-demonstration to allow users to first naturally describe tasks and associated conditions at a high level, and then collaborate with the agent to recursively resolve any ambiguities or vagueness through conversations and demonstrations. Users can also define new procedures and concepts by demonstrating and referring to contents within GUIs of existing mobile apps. We demonstrate this approach in PUMICE, an end-user programmable agent that implements this approach. A lab study with 10 users showed its usability.
2019 | Toby Jia-Jun Li et al. | Conversational Chatbots; Human-LLM Collaboration | UIST
The Story in the Notebook: Exploratory Data Science using a Literate Programming Tool
Literate programming tools are used by millions of programmers today, and are intended to facilitate presenting data analyses in the form of a narrative. We interviewed 21 data scientists to study coding behaviors in a literate programming environment and how data scientists kept track of variants they explored. For participants who tried to keep a detailed history of their experimentation, both informal and formal versioning attempts led to problems, such as reduced notebook readability. During iteration, participants actively curated their notebooks into narratives, although primarily through cell structure rather than markdown explanations. Next, we surveyed 45 data scientists and asked them to envision how they might use their past history in a future version control system. Based on these results, we give design guidance for future literate programming tools, such as providing history search based on how programmers recall their explorations, through contextual details including images and parameters.
2018 | Mary Beth Kery et al. | Carnegie Mellon University | Interactive Data Visualization; Computational Methods in HCI | CHI