University of Michigan School of Information
GPT-4 | Automated Vehicles | Privacy Concerns: UMSI Research Roundup
Wednesday, 12/06/2023
University of Michigan School of Information faculty and PhD students are creating and sharing knowledge that helps build a better world. Here are some of their recent publications.
RESEARCH
Cross-Contextual Examination of Older Adults’ Privacy Concerns, Behaviors, and Vulnerabilities
Proceedings on Privacy Enhancing Technologies, January 2024
Yixin Zou, Kaiwen Sun, Tanisha Afnan, Ruba Abu-Salma, Robin Brewer, Florian Schaub
A growing body of research has examined the privacy concerns and behaviors of older adults, often within specific contexts. It remains unclear to what extent older adults’ privacy concerns and behaviors vary across contexts and whether old age is the primary factor influencing privacy vulnerabilities. To address this gap, we conducted semi-structured interviews with 43 older adults (aged 65 to 89) in the United States. Our interviews were grounded in five scenarios: account and device sharing, healthcare, online advertising, social networking, and cybercrime. Our cross-contextual analysis showed that cybercrime was a recurring and pressing concern across scenarios; privacy concerns and protective behaviors were rarely mentioned in the healthcare scenario. Across all scenarios, participants’ threat models and strategies revolved around data collection rather than other stages in which privacy harms may occur; they employed various active strategies to safeguard their privacy while trusting service providers to protect their information. Our findings underscore the need to revisit the discussion around privacy vulnerability and aging. Vulnerability levels among our participants varied widely and were often influenced by factors beyond age, such as tech savviness and income. We discuss opportunities for privacy interventions, technologies, and education that promote positive aging and recognize diversity among older adults.
Investigating Student Mistakes in Introductory Data Science Programming
SIGCSE, March 2024
Anjali Singh, Anna Fariha, Christopher Brooks, Gustavo Soares, Austin Henley, Ashish Tiwari, Chethan M, Heeryung Choi, Sumit Gulwani
Data Science (DS) has emerged as a new academic discipline where students are introduced to data-centric thinking and generating data-driven insights through programming. Unlike traditional introductory programming education, which focuses on program syntax and core Computer Science (CS) topics (e.g., algorithms and data structures), introductory DS education emphasizes skills such as studying the data at hand to gain insights and making effective use of programming libraries (e.g., re, NumPy, pandas, scikit-learn). To better understand learners’ needs and pain points when they are introduced to DS programming, we investigated a large online course on data manipulation designed for graduate students who do not have a CS or Statistics undergraduate degree. We qualitatively analyzed incorrect student code submissions for computational notebook-based programming assignments in Python. We identified common mistakes and grouped them into the following themes: (1) programming language and environment misconceptions, (2) logical mistakes due to data or problem-statement misunderstanding or incorrectly dealing with missing values, (3) semantic mistakes from incorrect usage of DS libraries, and (4) suboptimal coding. Our work provides instructors valuable insights to understand student needs in introductory DS courses and improve course pedagogy, along with recommendations for developing assessment and feedback tools to better support students in large courses.
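To make one of these mistake themes concrete, here is a minimal illustrative sketch of the "incorrectly dealing with missing values" pattern in pandas; the data and column name are hypothetical and not drawn from the course materials:

import numpy as np
import pandas as pd

df = pd.DataFrame({"score": [88.0, np.nan, 95.0, np.nan, 72.0]})

# Mistake: comparing against NaN with != never filters anything out,
# because NaN comparisons do not behave like ordinary equality checks.
buggy = df[df["score"] != np.nan]
print(len(buggy))            # 5 -- the NaN rows are silently kept

# Correct approach: use the library's missing-value predicates.
fixed = df[df["score"].notna()]
print(len(fixed))            # 3
print(fixed["score"].mean()) # 85.0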
Bridging Learnersourcing and AI: Exploring the Dynamics of Student-AI Collaborative Feedback Generation
arXiv, November 2023
Anjali Singh, Christopher Brooks, Xu Wang, Warren Li, Juho Kim, Deepti Wilson
This paper explores the space of optimizing feedback mechanisms in complex domains such as data science, by combining two prevailing approaches: Artificial Intelligence (AI) and learnersourcing. Towards addressing the challenges posed by each approach, this work compares traditional learnersourcing with an AI-supported approach. We report on the results of a randomized controlled experiment conducted with 72 Master’s level students in a data visualization course, comparing two conditions: students writing hints independently versus revising hints generated by GPT-4. The study aimed to evaluate the quality of learnersourced hints, examine the impact of student performance on hint quality, gauge learner preference for writing hints with or without AI support, and explore the potential of the student-AI collaborative exercise in fostering critical thinking about LLMs. Based on our findings, we provide insights for designing learnersourcing activities leveraging AI support and optimizing students’ learning as they interact with LLMs.
Automating Human Tutor-Style Programming Feedback: Leveraging GPT-4 Tutor Model for Hint Generation and GPT-3.5 Student Model for Hint Validation
arXiv, October 2023
Tung Phung, Victor-Alexandru Pădurean, Anjali Singh, Christopher Brooks, José Cambronero, Sumit Gulwani, Adish Singla, Gustavo Soares
Generative AI and large language models hold great promise in enhancing programming education by automatically generating individualized feedback for students. We investigate the role of generative AI models in providing human tutor-style programming hints to help students resolve errors in their buggy programs. Recent works have benchmarked state-of-the-art models for various feedback generation scenarios; however, their overall quality is still inferior to human tutors and not yet ready for real-world deployment. In this paper, we seek to push the limits of generative AI models toward providing high-quality programming hints and develop a novel technique, GPT4Hints-GPT3.5Val. As a first step, our technique leverages GPT-4 as a “tutor” model to generate hints: it boosts generation quality by including symbolic information about failing test cases and fixes in the prompts. As a next step, our technique leverages GPT-3.5, a weaker model, as a “student” model to further validate hint quality: it performs an automatic quality validation by simulating the potential utility of providing this feedback. We show the efficacy of our technique via extensive evaluation using three real-world datasets of Python programs covering a variety of concepts, ranging from basic algorithms to regular expressions and data analysis using the pandas library.
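A rough sketch of the two-stage idea described above; the helper functions, prompt wording, and validation criterion are hypothetical placeholders rather than the paper's actual implementation:

from typing import Optional

def generate_validated_hint(buggy_program: str,
                            failing_tests: str,
                            repaired_program: str) -> Optional[str]:
    # Stage 1: the stronger "tutor" model drafts a hint, with the prompt
    # grounded in symbolic information (failing tests, a repaired program).
    tutor_prompt = (
        "Write one natural-language hint for the student.\n"
        f"Buggy program:\n{buggy_program}\n"
        f"Failing tests:\n{failing_tests}\n"
        f"Repaired program:\n{repaired_program}\n"
    )
    hint = call_gpt4(tutor_prompt)  # hypothetical model call

    # Stage 2: the weaker "student" model checks whether the hint is usable
    # by trying to repair the program from the hint alone.
    student_prompt = f"Hint: {hint}\nFix this program:\n{buggy_program}"
    attempt = call_gpt35(student_prompt)      # hypothetical model call
    if passes_tests(attempt, failing_tests):  # hypothetical test runner
        return hint
    return None  # withhold the hint rather than show low-quality feedback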
Finding the Right Voice: Exploring the Impact of Gender Similarity and Gender-Role Congruity on the Efficacy of Automated Vehicle Explanations
Association for the Advancement of Artificial Intelligence, 2023
Qiaoning Zhang, X. Jessie Yang, Lionel P. Robert Jr.
Automated Vehicles (AVs), acting as social robots, hold potential benefits for society. Prior research highlights how AV explanations can enhance passenger trust by clarifying the vehicle’s reasoning and actions. However, an underexplored area is the impact of voice gender in AV explanations on this trust dynamic. To bridge this gap, our study, inspired by the gender-role congruity and similarity attraction theories, investigates the impacts of AV voice gender on user trust. The anticipated findings from our research are poised to play a critical role in designing AV explanations that enhance trust, thereby advancing the human-AV interaction landscape.
Meta Semantic Template for Evaluation of Large Language Models
arXiv, October 2023
Yachuan Liu, Liang Chen, Jindong Wang, Qiaozhu Mei, Xing Xie
Do large language models (LLMs) genuinely understand the semantics of language, or do they just memorize the training data? Recent concern about potential data contamination in LLMs has prompted the community to research LLM evaluation. In this paper, we propose MSTEMP, an approach that creates meta semantic templates to evaluate the semantic understanding ability of LLMs. The core of MSTEMP is not to perform evaluation directly on existing benchmark datasets, but to generate new out-of-distribution (OOD) evaluation sets using existing datasets as seeds. Specifically, for a given sentence, MSTEMP leverages another language model to generate new samples while preserving its semantics; these new samples serve as semantic templates of the original sentence. MSTEMP then generates evaluation samples via sentence parsing and random word replacement on the semantic templates. MSTEMP is highly flexible, dynamic, and cost-effective. Our initial experiments show that samples generated by MSTEMP from existing seed datasets can significantly reduce the performance of LLMs. We hope this initial work sheds light on future research on LLM evaluation.
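A loose sketch of the pipeline the abstract describes; the paraphrasing model and replacement lexicon are hypothetical stand-ins, and the whitespace split below is a simplification of the paper's sentence parsing step:

import random

def make_ood_sample(seed_sentence: str) -> str:
    # Step 1: a separate language model rewrites the seed sentence while
    # preserving its meaning; this rewrite serves as the semantic template.
    template = paraphrase_model(seed_sentence)      # hypothetical call

    # Step 2: parse the template and randomly replace a word to push the
    # evaluation sample away from anything the evaluated LLM may have memorized.
    tokens = template.split()
    idx = random.randrange(len(tokens))
    tokens[idx] = sample_replacement(tokens[idx])   # hypothetical lexicon lookup
    return " ".join(tokens)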
Can LLMs Effectively Leverage Graph Structural Information: When and Why
arXiv, September 2023
Jin Huang, Xingjian Zhang, Qiaozhu Mei, Jiaqi Ma
This paper studies Large Language Models (LLMs) augmented with structured data, particularly graphs, a crucial data modality that remains underexplored in the LLM literature. We aim to understand when and why the incorporation of structural information inherent in graph data can improve the prediction performance of LLMs on node classification tasks with textual features. To address the “when” question, we examine a variety of prompting methods for encoding structural information, in settings where textual node features are either rich or scarce. For the “why” question, we probe two potential contributing factors to LLM performance: data leakage and homophily. Our exploration of these questions reveals that (i) LLMs can benefit from structural information, especially when textual node features are scarce; (ii) there is no substantial evidence indicating that the performance of LLMs is significantly attributed to data leakage; and (iii) the performance of LLMs on a target node is strongly positively related to the local homophily ratio of the node.
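For reference, the local homophily ratio cited in finding (iii) is commonly computed as the fraction of a node's neighbors that share its label. A minimal sketch under that standard definition; the adjacency-list representation is an assumption, not the paper's code:

from typing import Dict, Hashable, List

def local_homophily(node: Hashable,
                    adjacency: Dict[Hashable, List[Hashable]],
                    labels: Dict[Hashable, str]) -> float:
    # Fraction of the node's neighbors carrying the same label as the node.
    neighbors = adjacency.get(node, [])
    if not neighbors:
        return 0.0
    same = sum(1 for n in neighbors if labels[n] == labels[node])
    return same / len(neighbors)

# Example: a node labeled "ML" whose neighbors are labeled "ML", "ML", "DB"
# has a local homophily ratio of 2/3.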
Reframe: An Augmented Reality Storyboarding Tool for Character-Driven Analysis of Security & Privacy Concerns
UIST: Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, October 2023
Shwetha Rajaram, Franziska Roesner, Michael Nebeling
While current augmented reality (AR) authoring tools lower the technical barrier for novice AR designers, they lack explicit guidance to consider potentially harmful aspects of AR with respect to security & privacy (S&P). To address potential threats in the earliest stages of AR design, we developed Reframe, a digital storyboarding tool for designers with no formal training to analyze S&P threats. We accomplish this through a frame-based authoring approach, which captures and enhances storyboard elements that are relevant for threat modeling, and character-driven analysis tools, which personify S&P threats from an underlying threat model to provide simple abstractions for novice AR designers. Based on evaluations with novice AR designers and S&P experts, we find that Reframe enables designers to analyze threats and propose mitigation techniques that experts consider good quality. We discuss how Reframe can facilitate collaboration between designers and S&P professionals and propose extensions to Reframe to incorporate additional threat models.
Applying AI and Guidelines to Assist Medical Students in Recognizing Patients With Heart Failure: Protocol for a Randomized Trial
JMIR Research Protocols, October 2023
Hyeon Joo, Michael R Mathis, Marty Tam, Cornelius James, Peijin Han, Rajesh S Mangrulkar, Charles P Friedman, VG Vinod Vydiswaran
Background: The integration of artificial intelligence (AI) is transforming both clinical practice and medical education. AI-based systems aim to improve the efficacy of clinical tasks, enhancing diagnostic accuracy and tailoring treatment delivery. As these systems become increasingly prevalent in high-quality patient care, it is critical for health care providers to use them responsibly to mitigate bias, ensure effective outcomes, and provide safe clinical practices. In this study, the clinical task is the identification of heart failure (HF) prior to surgery, with the intention of enhancing clinical decision-making skills. HF is a common and severe disease, but detection remains challenging due to its subtle manifestation, its frequent co-occurrence with other medical conditions, and the absence of a simple and effective diagnostic test. While advanced HF algorithms have been developed, the use of these AI-based systems to enhance clinical decision-making in medical education remains understudied.
Objective: This research protocol describes our study design, the systematic procedures for selecting surgical cases from electronic health records, and the interventions. The primary objective of this study is to measure the effectiveness of interventions aimed at improving HF recognition before surgery; the second objective is to evaluate the impact of inaccurate AI recommendations; and the third objective is to explore the relationship between the inclination to accept AI recommendations and their accuracy.
Methods: Our study used a 3 × 2 factorial design (intervention type × order of pre-post sets) for this randomized trial with medical students. The student participants are asked to complete a 30-minute e-learning module that includes key information about the intervention and a 5-question quiz, and a 60-minute review of 20 surgical cases to determine the presence of HF. To mitigate selection bias in the pre- and posttests, we adopted a feature-based systematic sampling procedure. From a pool of 703 expert-reviewed surgical cases, 20 were selected based on features such as case complexity, model performance, and positive and negative labels. This study comprises three interventions: (1) a direct AI-based recommendation with a predicted HF score, (2) an indirect AI-based recommendation gauged through the area under the curve metric, and (3) an HF guideline-based intervention.
Results: As of July 2023, 62 enrolled medical students have completed participation in this study, including the short quiz and the review of 20 surgical cases. Subject enrollment commenced in August 2022 and will end in December 2023, with the goal of recruiting 75 medical students in years 3 and 4 with clinical experience.
Conclusions: We demonstrated a study protocol for the randomized trial, measuring the effectiveness of interventions using AI and HF guidelines among medical students to enhance HF recognition in preoperative care with electronic health record data.
Identifying Design Opportunities for Adaptive mHealth Interventions That Target General Well-Being: Interview Study With Informal Care Partners
JMIR Formative Research, October 2023
Xinghui Yan, Mark W Newman, Sun Young Park, Angelle Sander, Sung Won Choi, Jennifer Miner, Zhenke Wu, Noelle Carlozzi
Background: Mobile health (mHealth) interventions can deliver personalized behavioral support to users in daily contexts. These interventions have been increasingly adopted to support individuals who require low-cost and low-burden support. Prior research has demonstrated the feasibility and acceptability of an mHealth intervention app (CareQOL) designed for use with informal care partners. To further optimize the intervention delivery, we need to investigate how care partners, many of whom lack the time for self-care, react and act in response to different behavioral messages.
Objective: The goal of this study was to understand the factors that impact care partners’ decision-making and actions in response to different behavioral messages. Insights from this study will help optimize future tailored and personalized behavioral interventions.
Methods: We conducted semistructured interviews with participants who had recently completed a 3-month randomized controlled feasibility trial of the CareQOL mHealth intervention app. Of the 36 participants from the treatment group of the randomized controlled trial, 23 (64%) participated in these interviews. To prepare for each interview, the team first selected representative behavioral messages (eg, targeting different health dimensions) and presented them to participants during the interview to probe their influence on participants’ thoughts and actions. The time of delivery, self-reported perceptions of the day, and user ratings of a message were presented to the participants during the interviews to assist with recall.
Results: The interview data showed that after receiving a message, participants took various actions in response to different messages. Participants performed suggested behaviors or adjusted them either immediately or in a delayed manner (eg, sometimes up to a month later). We identified 4 factors that shape the variations in user actions in response to different behavioral messages: uncertainties about the workload required to perform suggested behaviors, concerns about one’s ability to routinize suggested behaviors, in-the-moment willingness and ability to plan for suggested behaviors, and overall capability to engage with the intervention.
Conclusions: Our study showed that care partners use mHealth behavioral messages differently regarding the immediacy of actions and the adaptation to suggested behaviors. Multiple factors influence people’s perceptions and decisions regarding when and how to take actions. Future systems should consider these factors to tailor behavioral support for individuals and design system features to support the delay or adaptation of the suggested behaviors. The findings also suggest extending the assessment of user adherence by considering the variations in user actions on behavioral support (ie, performing suggested or adjusted behaviors immediately or in a delayed manner).
“Fifty Shades of Bias”: Normative Ratings of Gender Bias in GPT Generated English Text
arXiv, accepted for EMNLP 2023, October 2023
Rishav Hada, Agrima Seth, Harshita Diddee, Kalika Bali
Language serves as a powerful tool for the manifestation of societal belief systems. In doing so, it also perpetuates the prevalent biases in our society. Gender bias is one of the most pervasive biases in our society and is seen in online and offline discourses. With LLMs increasingly gaining human-like fluency in text generation, gaining a nuanced understanding of the biases these systems can generate is imperative. Prior work often treats gender bias as a binary classification task. However, acknowledging that bias must be perceived on a relative scale, we investigate the generation of bias of varying degrees and human annotators’ receptivity to it. Specifically, we create the first dataset of GPT-generated English text with normative ratings of gender bias. Ratings were obtained using Best-Worst Scaling, an efficient comparative annotation framework. Next, we systematically analyze the variation of themes of gender bias in the observed ranking and show that identity attack is most closely related to gender bias. Finally, we show the performance of existing automated models trained on related concepts on our dataset.
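Best-Worst Scaling typically converts annotators' best/worst choices over item tuples into per-item scores via simple counting. A minimal sketch of that standard scoring rule; the 4-tuples below are hypothetical, and the paper's annotation setup is richer:

from collections import Counter

def bws_scores(annotations):
    # annotations: list of (items_shown, chosen_best, chosen_worst).
    # Standard counting score: (#best - #worst) / #appearances, per item.
    appears, best, worst = Counter(), Counter(), Counter()
    for items, b, w in annotations:
        appears.update(items)
        best[b] += 1
        worst[w] += 1
    return {item: (best[item] - worst[item]) / appears[item] for item in appears}

example = [
    (("s1", "s2", "s3", "s4"), "s1", "s4"),
    (("s1", "s2", "s3", "s4"), "s2", "s4"),
]
print(bws_scores(example))
# {'s1': 0.5, 's2': 0.5, 's3': 0.0, 's4': -1.0}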
Data, not documents: Moving beyond theories of information-seeking behavior to advance data discovery
M | Library Deep Blue Documents, 2023
A.J. Million, Libby Hemphill, Jeremy York, Sara Lafia
Many theories of human information behavior (HIB) assume that information objects are in text document format. This paper argues that four important HIB theories are insufficient for describing users’ search strategies for research data because of differences between data and text documents. We first review and compare four HIB theories: Bates’ theory of berrypicking, Marchionini’s theory of electronic information search, Dervin’s theory of sense-making, and Meho and Tibbo’s model of social scientist information-seeking. All four theories assume that information-seekers search for text documents. We compare these theories to user search behavior by analyzing the Inter-university Consortium for Political and Social Research’s (ICPSR) search logs. Users took direct, orienting, and scenic paths when searching for research data. We interviewed ICPSR data users (n=20), and they said they needed dataset documentation and contextual information absent from text documents to find data, which suggested ongoing sense-making. However, sense-making alone does not explain the information-seeking behavior we observed. What mattered most to secondary data discovery were information attributes determined by the type of objects that users sought (i.e., data, not documents). We conclude by suggesting an alternative frame for building data discovery tools.
Trust Signals: An Intersectional Approach to Understanding Women of Color’s News Trust
Media and Communication, November 2023
Journalism scholars have increasingly become concerned with how our changing media environment has shifted traditional understandings of how news outlets create trust with audiences. While many scholars have focused on broad avenues of building trust with audiences through transparency, community engagement, and funding, arguably less attention has been paid to how audience members’ social positionality, determined by factors such as race, class, and socioeconomic status, can shape their varying understandings of what makes a news source trustworthy. Thus, in this study, I conducted focus groups with US women of color, a community marginalized along at least race and gender, to understand how their positionality shapes how they conceptualize news trust. Through eight focus groups with N = 45 women of color, I found that while participants used known antecedents of news trust, these were often more specifically rooted in their own experiences with racism, heterosexism, and classism. Further, participants had varying conceptualizations of antecedents of trust, such as accuracy and bias. Through these findings, I suggest how news organizations can better establish trust across marginalized communities.
Report of the 1st Workshop on Generative AI and Law
arXiv, November 2023
A. Feder Cooper, Katherine Lee, James Grimmelmann, Daphne Ippolito, Christopher Callison-Burch, Christopher A. Choquette-Choo, Niloofar Mireshghallah, Miles Brundage, David Mimno, Madiha Zahrah Choksi, Jack M. Balkin, Nicholas Carlini, Christopher De Sa, Jonathan Frankle, Deep Ganguli, Bryant Gipson, Andres Guadamuz, Swee Leng Harris, Abigail Z. Jacobs, Elizabeth Joh, Gautam Kamath, Mark Lemley, Cass Matthews, Christine McLeavey, Corynne McSherry, Milad Nasr, Paul Ohm, Adam Roberts, Tom Rubin, Pamela Samuelson, Ludwig Schubert, Kristen Vaccaro, Luis Villa, Felix Wu, Elana Zeide
This report presents the takeaways of the inaugural Workshop on Generative AI and Law (GenLaw), held in July 2023. A cross-disciplinary group of practitioners and scholars from computer science and law convened to discuss the technical, doctrinal, and policy challenges presented by law for Generative AI, and by Generative AI for law, with an emphasis on U.S. law in particular. We begin the report with a high-level statement about why Generative AI is both immensely significant and immensely challenging for law. To meet these challenges, we conclude that there is an essential need for 1) a shared knowledge base that provides a common conceptual language for experts across disciplines; 2) clarification of the distinctive technical capabilities of generative-AI systems, as compared and contrasted to other computer and AI systems; 3) a logical taxonomy of the legal issues these systems raise; and, 4) a concrete research agenda to promote collaboration and knowledge-sharing on emerging issues at the intersection of Generative AI and law. In this report, we synthesize the key takeaways from the GenLaw workshop that begin to address these needs. All of the listed authors contributed to the workshop upon which this report is based, but they and their organizations do not necessarily endorse all of the specific claims in this report.
Perspectives on Privacy in the Post-Roe Era: A Mixed-Methods of Machine Learning and Qualitative Analyses of Tweets
arXiv, November 2023
Yawen Guo, Rachael Zehrung, Katie Genuario, Xuan Lu, Qiaozhu Mei, Yunan Chen, Kai Zheng
Abortion is a controversial topic that has long been debated in the US. With the recent Supreme Court decision to overturn Roe v. Wade, access to safe and legal reproductive care is once again in the national spotlight. A key issue central to this debate is patient privacy, as in the post-HITECH Act era it has become easier for medical records to be electronically accessed and shared. This study analyzed a large Twitter dataset from May to December 2022 to examine the public’s reactions to Roe v. Wade’s overruling and its implications for privacy. Using a mixed-methods approach consisting of computational and qualitative content analysis, we found a wide range of concerns voiced from the confidentiality of patient–physician information exchange to medical records being shared without patient consent. These findings may inform policy making and healthcare industry practices concerning medical privacy related to reproductive rights and women’s health.
WORKSHOPS
CASMI, TRAILS and FAS Collaborating with Federal Standards Body to Assess AI Impacts and Risks
The Northwestern Center for Advancing Safety of Machine Intelligence (CASMI) designed and led a workshop to support the expansion of the National Institute of Standards and Technology (NIST) AI Risk Management Framework (AI RMF) guidance from a sociotechnical lens. CASMI co-hosted the workshop with TRAILS – the NIST-National Science Foundation (NSF) Institute for Trustworthy AI in Law & Society – and the Federation of American Scientists (FAS) on Oct. 16-17 at The George Washington University (GW) in Washington, D.C., to discuss the sociotechnical methods and processes that are necessary to develop a testbed, or a controlled environment to measure and validate AI systems.
Abigail Jacobs, assistant professor of information and of complex systems at the University of Michigan, collaborated with TRAILS and FAS to help CASMI organize two workshops: “Operationalizing the Measure Function of the NIST AI Risk Management Framework” and “Sociotechnical Approaches to Measurement and Validation for Safety in AI.”
Operationalizing the Measure Function of the NIST AI Risk Management Framework
October 16-17, 2023
Washington, D.C.
CASMI co-hosted this workshop with Abigail Jacobs (University of Michigan) and two other organizations: the Institute for Trustworthy AI in Law & Society (TRAILS) and the Federation of American Scientists (FAS). The goal was to support expansion of the Measure function of the National Institute of Standards and Technology (NIST) AI Risk Management Framework (AI RMF).
Sociotechnical Approaches to Measurement and Validation for Safety in AI
July 18-19, 2023
Abigail Jacobs and CASMI convened a range of scholars from academia, industry, and government to discuss how to meaningfully operationalize safe, functional AI systems by focusing on measurement and validity in the AI pipeline.