Focus on AI: Robot Acceptance | AI Chatbots | Moral Economy
Monday, 09/22/2025
By Noor Hindi
University of Michigan School of Information faculty and PhD students are advancing the field of artificial intelligence through innovative research and impactful contributions. Here are some of their recent publications.
Publications
Understanding Explanation Content for Cognitive and Affective Trust in Automated Vehicles
The Human Factors and Ergonomics Society, October 2025
Qiaoning Zhang, X. Jessie Yang, Lionel P. Robert Jr.
Trust is essential for the adoption and success of automated vehicles (AVs); however, many users remain skeptical about transferring control to these technologies. To build trust, AVs can provide explanations about their decisions, but what types of explanations matter most? This research explores how different kinds of AV explanations—describing what the vehicle is doing, explaining why it is doing it, or combining both elements—influence user trust. Through an experiment involving 121 U.S. drivers who viewed simulated driving scenarios, this study examined the effects on cognitive trust (related to perceptions of reliability and competence) and affective trust (related to emotional connection). The results indicate that providing explanations significantly improves cognitive trust compared to offering no explanations at all. More importantly, explanations that include reasons for actions particularly enhance affective trust, helping users feel emotionally connected to AVs. This study highlights the importance of transparent communication in AV design, showing that effective explanations are crucial not only for fostering user understanding but also for developing lasting emotional connections. These insights can guide future AV designs that encourage broader public acceptance and trust in automated driving technologies.
Understanding User Needs in Automated Vehicle Explanations: A Qualitative Approach
The Human Factors and Ergonomics Society, October 2025
Qiaoning Zhang, X. Jessie Yang, Lionel P. Robert Jr.
Clear explanations about automated vehicle (AV) decisions are critical for enhancing user understanding and reducing uncertainty. However, how users perceive different AV explanations and how they would improve them remains underexplored. Through qualitative interviews, this study explored user responses to four types of AV explanations: no explanation, action explanations ("what"), reasoning explanations ("why"), and combined action and reasoning ("what and why"). Participants highlighted evident differences: the absence of explanations caused anxiety and confusion, while explanations offering reasons or actions alone had strengths and weaknesses. Combined explanations generally provided the best balance by enhancing predictability and transparency, though they occasionally risked information overload. Participants also suggested practical improvements for AV explanations, emphasizing the inclusion of visual cues, clear descriptions of consequences, and delivery through natural conversational speech. These insights underscore the importance of adaptable explanation designs tailored to diverse user preferences and contexts. This research provides user-driven recommendations for designing effective AV explanations, enhancing transparency, and strengthening public trust in automated driving technologies.
Voice Similarity and Its Impact on Cognitive and Affective Trust in Automated Vehicles
The Human Factors and Ergonomics Society, October 2025
Qiaoning Zhang, X. Jessie Yang, Lionel P. Robert Jr.
Building user trust is critical for the widespread adoption of automated vehicles (AVs) as they become more integrated into our daily lives. This study explores how the voice used by AVs can influence two types of trust: cognitive trust (belief in the AV’s competence and reliability) and affective trust (emotional connection with the AV). Drawing from similarity-attraction theory, the research investigates whether users are more likely to trust AVs whose voices match their age and gender. In an online study involving over 300 U.S. drivers, participants experienced AV explanations delivered in voices that aligned with or differed from their demographic characteristics. The results revealed that users reported significantly higher cognitive and affective trust when the AV voice matched their own. Gender similarity strongly impacted both types of trust, while age similarity mainly affected affective trust. These findings highlight the power of personalized voice design in making AVs feel more relatable and trustworthy. This research offers valuable insights for designers and developers aiming to enhance human-AV interaction through more socially attuned and emotionally resonant communication strategies.
Rewarding Trust: How Reward Power Shapes Security Robot Acceptance
The Human Factors and Ergonomics Society, October 2025
As security robots take on more societal roles, public resistance can hinder their effectiveness. This study examines how a security robot's ability to offer rewards ("reward power") affects public acceptance and trust, which is vital for integrating robots into communities. Using a between-subjects experiment with 106 participants, we tested the impact of high versus low reward power through online video interactions. The results showed that reward power significantly increased robot acceptance by fostering trust during initial interactions. This research contributes to the field of human-security-robot interaction by highlighting the importance of reward power in building trust and acceptance. These findings provide design guidelines for improving public trust and acceptance, essential for successful real-world deployments of security robots.
Generalizing machine learning models from clinical free text
Nature, August 2025
Balaji Pandian, John Vandervest, Graciela Mentz, Jomy Varghese, Shavano D. Steadman, Sachin Kheterpal, Maggie Makar, V. G. Vinod Vydiswaran, Michael L. Burns
To assess strategies for enhancing the generalizability of healthcare artificial intelligence models, we analyzed the impact of preprocessing approaches applied to medical free text, compared single- versus multiple-institution data models, and evaluated data divergence metrics. From 1,607,393 procedures across 44 U.S. institutions, deep neural network models were created to classify anesthesiology Current Procedural Terminology codes from medical free text. Three levels of text preprocessing were analyzed, ranging from minimal to automated (cSpell) with comprehensive physician review. Kullback–Leibler Divergence and k-medoid clustering were used to predict single- vs multiple-institutional model performances. Single-institution models showed a mean accuracy of 92.5% [2.8% SD] and 0.923 [0.029] F1 on internal data but generalized poorly on external data (−22.4% [7.0%]; −0.223 [0.081]). Free text preprocessing minimally altered performance (+0.51% [2.23]; +0.004 [0.020]). An all-institution model performed worse on internal data (−4.88% [2.43%]; −0.045 [0.020]) but improved generalizability to external data (+17.1% [8.7%]; +0.182 [0.073]). Compared to vocabulary overlap and Jaccard similarity, Kullback–Leibler Divergence correlated more strongly with model performance (R² of 0.41 vs 0.16 vs 0.08, respectively) and was successful in clustering institutions and identifying outlier data. Overall, preprocessing medical free text showed limited utility in improving generalization of machine learning models; single-institution models performed best but generalized poorly, while combined data models improved generalization but never achieved the performance of single-institution models. Kullback–Leibler Divergence provided valuable insight as a reliable heuristic to evaluate generalizability. These results have important implications for developing broad-use artificial intelligence healthcare applications, providing valuable insight into their development and evaluation.
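For readers unfamiliar with the divergence metrics in this abstract, the following minimal Python sketch compares two institutions' free-text notes using a smoothed unigram Kullback–Leibler divergence and the Jaccard similarity of their vocabularies. It is not the authors' pipeline: the whitespace tokenization, the additive smoothing parameter `alpha`, and the helper names (`token_distribution`, `compare_institutions`) are illustrative assumptions.

```python
import math
from collections import Counter


def token_distribution(texts, vocab, alpha=1.0):
    """Additively smoothed unigram distribution over a shared vocabulary (assumed tokenization)."""
    counts = Counter(tok for t in texts for tok in t.lower().split())
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """D_KL(P || Q) in nats; smoothing guarantees q[w] > 0 for every word."""
    return sum(pw * math.log(pw / q[w]) for w, pw in p.items() if pw > 0)


def jaccard(vocab_a, vocab_b):
    """Size of the vocabulary intersection divided by the size of the union."""
    return len(vocab_a & vocab_b) / len(vocab_a | vocab_b)


def compare_institutions(texts_a, texts_b, alpha=1.0):
    """Hypothetical helper: divergence of institution A's text from institution B's."""
    vocab_a = {tok for t in texts_a for tok in t.lower().split()}
    vocab_b = {tok for t in texts_b for tok in t.lower().split()}
    shared_vocab = vocab_a | vocab_b
    p = token_distribution(texts_a, shared_vocab, alpha)
    q = token_distribution(texts_b, shared_vocab, alpha)
    return {"kl": kl_divergence(p, q), "jaccard": jaccard(vocab_a, vocab_b)}


same = compare_institutions(["knee arthroscopy"], ["knee arthroscopy"])
diff = compare_institutions(["knee arthroscopy"], ["cardiac bypass"])
# same["kl"] is 0.0 and same["jaccard"] is 1.0; diff["kl"] > 0 and diff["jaccard"] is 0.0
```

KL divergence is asymmetric, so D(P‖Q) and D(Q‖P) generally differ; which direction to use (for example, training institution versus target institution) is a design choice this sketch leaves to the caller.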
Readability Assessment and Comparison of Large Language Model-Generated Summaries of Trial Descriptions on ClinicalTrials.gov
National Library of Medicine, August 2025
Tzu-Chun Wu, Hanniel Shih, Anunita Nattam, Himaja Chintalapalli, David A Hanauer, Kai Zheng, Danny T Y Wu
This study evaluated the readability of ClinicalTrials.gov trial information using traditional readability measures (TRMs) and compared it to summaries generated by large language models (LLMs), specifically ChatGPT and a fine-tuned BART-Large-CNN (FBLC). The study involved: 1) assessing required reading levels (RRL) with TRMs, 2) generating sample LLM-based summaries, and 3) evaluating summary quality based on scores provided by two independent reviewers. The results show that the original ClinicalTrials.gov trial descriptions were scored above the recommended readability level. In contrast, ChatGPT-generated summaries had significantly lower RRLs and higher quality scores. We conclude that ChatGPT shows great promise for creating readable, high-quality summaries. Future research is warranted to assess whether LLMs could be a viable solution to improve the readability of ClinicalTrials.gov to facilitate comprehension by laypersons.
Privacy Perceptions in the Use of ChatGPT Across Different Contexts: A Survey Study of Commercial vs. University-specific Implementations
Twenty-First Symposium on Usable Privacy and Security, August 2025
Yuting Yang, Zixin Wang, Florian Schaub
As AI assistants like ChatGPT and Google Gemini become increasingly embedded in academic and everyday contexts, some universities have introduced institutional tools to address privacy and data security concerns. To examine how trust, usability, and privacy perceptions influence tool choice, we conducted a quantitative survey of 260 University of Michigan students, staff, and faculty. The survey collected data on usage patterns, perceived value, and user concerns, with open-text responses providing additional context. Results show that while commercial AI tools are preferred for their accuracy and efficiency, university-developed tools are rated higher on ethical standards, transparency, and data privacy. Paid commercial tools like ChatGPT Plus were rated significantly higher in user satisfaction and performance (p = 0.00089, paired t-test). These findings suggest that institutional tools could improve adoption by enhancing usability, while commercial tools may benefit from greater transparency and privacy safeguards.
Digital Health and AI Chatbots to Promote Sexual and Reproductive Health Among LBQ+ Women of Color
Medinfo, August 2025
Megan Threats, Yongjie Sha, Morgan Gray
This study investigated the acceptability of using digital health, including telemedicine (e.g., video calls on a computer or smartphone), mHealth (a mobile app, text messaging, and social media messages), and artificial intelligence-enabled chatbots to communicate health information and facilitate access and uptake of sexual and reproductive healthcare among lesbian, bisexual, and queer (LBQ+) women of color (WOC) in the United States.
The Moral Economy of AI
AAR '25: Proceedings of the sixth decennial Aarhus conference, August 2025
Yuchen Chen, Silvia Lindtner, Yuling Sun
The Chinese party-state frames AI as an ideal instrument to transform its rising elderly population from a national crisis into an opportunity. This must be done, its leaders argue, by integrating AI into society in ways that cultivate moral values of a harmonious society and traditional family structures. Drawing on ethnographic research on the implementation of three elderly care programs in Shanghai, we examine how the moral economy of AI comes into being, with a specific focus on the affective labor it necessitates from citizens. The lens of moral economy contributes to prior research on the political economy of technology and labor, as well as to discussions of AI and ethics.
Feeling like a State: Affect and Control in the Age of AI
AAR '25: Proceedings of the sixth decennial Aarhus conference, August 2025
Many commentators fear that AI enables new levels of surveillance and thus a crisis for liberal democracy. I argue that an obsession with surveillance has clouded other forms of control that are more difficult to notice, operating through the production and circulation of affect. Twenty years ago, the Aarhus Conference published some of the first critical writing on the themes of affect, AI, and control. I return to this groundbreaking work, which offers fresh insights into how we understand contemporary governance of people, nature, and regions. This article offers a feminist ethnography to rethink control and pursue avenues for resistance and alternatives, bringing into conversation my observations from research in rural China and the use of AI in population management.
Understanding Predictive Models of Student Success with a Multiverse Analysis
Proceedings of the 18th International Conference on Educational Data Mining, July 2025
Yunxuan Tang, Emma Harvey, Chengyuan Yao, Renzhe Yu, Rene F. Kizilcec, Christopher Brooks
Predictive models of student success can provide timely information to inform interventions in K-12 and higher education. However, the design and implementation of these predictive models require various stakeholders to make decisions about the prediction target, data sources, processing, training, models, and deployment strategies. These choices are often poorly documented in the scholarly literature, even when code is openly available, limiting our ability to generalize and translate research findings to other institutions or contexts. More importantly, this poor documentation obscures the potential trade-offs of decisions with respect to prediction performance and other objectives, such as group fairness criteria. To address these challenges, we advocate for a multiverse approach in student success modeling and demonstrate the approach using a case study. In the multiverse framework, each plausible choice made to refine the problem space results in a separate analysis (each referred to as a “universe”), with the final result being the collection of all universes explored. We demonstrate the mechanics and merits of this approach by building a first-year retention model for higher education. We interpret the findings of this analysis, specifically considering both model goodness-of-fit and fairness by group, demonstrating the value of the multiverse technique in engaging education-specific stakeholders, from administrative supervisors to model developers, in making predictive models that are robust, reproducible, and equitable.
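The multiverse idea can be sketched as a Cartesian product over analysis decisions: every combination of options defines one universe, and every universe is evaluated and kept rather than reporting a single favored pipeline. The decision names and options below are hypothetical placeholders, not the choices made in the paper, and `evaluate` stands in for whatever model training and fairness auditing a real study would run.

```python
from itertools import product

# Hypothetical decision points and options (placeholders, not the paper's actual choices).
DECISIONS = {
    "target": ["retained_to_year2", "enrolled_next_fall"],
    "features": ["admissions_only", "admissions_plus_first_term"],
    "model": ["logistic_regression", "gradient_boosting"],
}


def enumerate_universes(decisions):
    """Yield one dict per universe: a single option chosen for every decision point."""
    names = list(decisions)
    for combo in product(*(decisions[name] for name in names)):
        yield dict(zip(names, combo))


def run_multiverse(decisions, evaluate):
    """Evaluate every universe and keep all results, rather than reporting only one."""
    results = []
    for universe in enumerate_universes(decisions):
        metrics = evaluate(universe)  # e.g. {"auc": ..., "fairness_gap": ...}
        results.append({**universe, **metrics})
    return results


if __name__ == "__main__":
    # Dummy evaluator standing in for real model training and group-fairness auditing.
    def dummy_evaluate(universe):
        return {"auc": 0.5, "fairness_gap": 0.0}

    for row in run_multiverse(DECISIONS, dummy_evaluate):
        print(row)
```

With three binary decisions this yields 2 × 2 × 2 = 8 universes; the number grows multiplicatively with each added decision point, which is why documenting the full set of choices matters.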
Using large language models to categorize strategic situations and decipher motivations behind human behaviors
Proceedings of the National Academy of Sciences, July 2025
Yutong Xie, Qiaozhu Mei, Walter Yuan, Matthew O. Jackson
By varying prompts to a large language model, we can elicit the full range of human behaviors in a variety of different scenarios in classic economic games. By analyzing which prompts elicit which behaviors, we can categorize and compare different strategic situations, which can also help provide insight into what different economic scenarios might induce people to think about. We discuss how this provides a step toward a nonstandard method of inferring (deciphering) the motivations behind the human behaviors. We also show how this deciphering process can be used to categorize differences in the behavioral tendencies of different populations.
Plan More, Debug Less: Applying Metacognitive Theory to AI-Assisted Programming Education
Artificial Intelligence in Education, July 2025
Tung Phung, Heeryung Choi, Mengyan Wu, Adish Singla, Christopher Brooks
The growing adoption of generative AI in education highlights the need to integrate established pedagogical principles into AI-assisted learning environments. This study investigates the potential of metacognitive theory to inform AI-assisted programming education through a hint system designed around the metacognitive phases of planning, monitoring, and evaluation. Upon request, the system can provide three types of AI-generated hints (planning, debugging, and optimization) to guide students at different stages of problem-solving. Through a study with 102 students in an introductory data science programming course, we find that students perceive and engage with planning hints most highly, whereas optimization hints are rarely requested. We observe a consistent association between requesting planning hints and achieving higher grades across question difficulty and student competency. However, when facing harder tasks, students seek additional debugging support but not more planning support. These insights contribute to the growing field of AI-assisted programming education by providing empirical evidence on the importance of pedagogical principles in AI-assisted learning.
Evaluating an AI Tutor for Bias Across Different Foundation Models
Artificial Intelligence in Education, July 2025
Aditya Vinodh, Emma Harvey, Husni Almoubayyed, Renzhe Yu, Christopher Brooks, Allison Koenecke, Rene F. Kizilcec
AI tutors are increasingly deployed to diverse groups of learners, raising the need to provide high-quality responses independent of the identity of learners who use them. We present a collaborative audit that assesses whether LiveHint AI, a large language model-based AI tutor that is currently under development by Carnegie Learning, meets this goal. We repeatedly prompt LiveHint AI with realistic student queries modified to include explicit or implicit statements of identity; e.g., identifying as a particular nationality or writing in a particular dialect. We then assess the responses based on their tone and level of detail. By evaluating different versions of LiveHint AI powered by GPT-4, GPT-4o, and Claude-3.5-Sonnet, we found that the choice of foundation model impacts the level of differentiation in responses. This differentiation may reflect pedagogical strategies (e.g., reducing text complexity when observing typos) or it may be undesirable (e.g., responding to an English prompt in a different language). Education researchers can use this approach to select foundation models that best fit their pedagogical approach, and build guardrails around potentially biased, inconsistent, or undesired behavior.
Applications of Generative AI to Support Teaching and Learning in Higher Education: A Half-Day Workshop
Artificial Intelligence in Education, July 2025
René F. Kizilcec, Ryan S. Baker, Christopher Brooks, Jadon Geathers, Yann Hicke, Steven Moore, Bo Wu
Generative AI-based technologies offer new ways to engage students in personalized, inclusive, and interactive learning experiences. This includes applications such as course assistants, simulations, interactive activities, and tailored feedback on assignments. The goal of this half-day workshop is to share ongoing work in this rapidly evolving space and discuss what features of implementations contribute to successful adoption and use in university courses. Bringing together researchers, educators, and developers, we will showcase cutting-edge systems and empirical findings that examine the effectiveness of AI in higher education. Through presentations and group activities, participants will collaboratively identify emerging trends, challenges, and best practices. The workshop aims to synthesize insights into a set of recommendations to guide future research and implementation of AI-enhanced teaching and learning strategies.
From Parts to Whole: How Trust in AI and Humans Shape System Trust
Proceedings of the Human Factors and Ergonomics Society Annual Meeting, July 2025
Hyesun Chung, X. Jessie Yang
An increasing number of studies on human-autonomy interaction (HAI) have started to expand their focus to investigating complex multi-agent HAI, involving multiple humans and/or agents within the interaction. This shift introduces new challenges. Organizational theory suggests that trust is multi-referent, meaning it is directed toward various targets. Within a human organization, trust typically has three possible referents: interpersonal, team, and organization. Applied to HAI, trust can be directed at three different referents: humans, autonomy, and the entire system. However, no study has yet examined the relationships between trust in these different referents. This study addresses this gap by exploring relationships between trust in three referents: the AI planner, peers, and the system. The study is conducted in a scenario where multiple people in the system must evacuate, with the ability to support each other by reporting roadblock information while receiving assistance from an AI planner that provides the shortest route.
Pre-prints, Working Papers, Articles, Workshops and Talks
Modeling Annotator Disagreement with Demographic-Aware Experts and Synthetic Perspectives
arXiv, August 2025
Yinuo Xu, Veronica Derricks, Allison Earl, David Jurgens
We present an approach to modeling annotator disagreement in subjective NLP tasks through both architectural and data-centric innovations. Our model, DEM-MOE (Demographic-Aware Mixture of Experts), routes inputs to expert subnetworks based on annotator demographics, enabling it to better represent structured, group-level variation compared to prior models. DEM-MOE consistently performs competitively across demographic groups and shows especially strong results on datasets with high annotator disagreement. To address sparse demographic coverage, we test whether LLM-generated synthetic annotations via zero-shot persona prompting can be used for data imputation. We show these synthetic judgments align moderately well with human annotations on our data and offer a scalable way to potentially enrich training data. We then propose and evaluate strategies for blending real and synthetic data tailored to dataset structure, and find that the optimal strategy depends on that structure. Together, these contributions improve the representation of diverse perspectives.
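Demographic-based routing in a mixture of experts can be illustrated with a toy forward pass in which a gating function looks only at annotator demographic features and softly weights several expert scorers. This is a minimal sketch under loud assumptions: the linear experts, the linear-plus-softmax gate, and the names `moe_predict` and `gate_weights` are hypothetical and do not reproduce the DEM-MOE architecture.

```python
import math


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def moe_predict(text_features, demo_features, experts, gate_weights):
    """Toy demographic-gated mixture of experts (hypothetical, not DEM-MOE itself).

    experts[i]: weight vector for expert i, which scores the text features linearly.
    gate_weights[i]: weight vector mapping annotator demographics to expert i's gate logit.
    Returns the gated prediction and the gate distribution over experts.
    """
    # The gate sees only demographics, so similar annotators get similar expert mixtures.
    gate_logits = [sum(g * d for g, d in zip(row, demo_features)) for row in gate_weights]
    gate = softmax(gate_logits)
    # Each expert produces its own score for the same text.
    expert_scores = [sum(w * x for w, x in zip(e, text_features)) for e in experts]
    prediction = sum(g * s for g, s in zip(gate, expert_scores))
    return prediction, gate


experts = [[1.0, 0.0], [0.0, 1.0]]
gate_weights = [[10.0, -10.0], [-10.0, 10.0]]  # hypothetical: group A -> expert 0, group B -> expert 1
pred_a, _ = moe_predict([2.0, 5.0], [1.0, 0.0], experts, gate_weights)  # close to 2.0
pred_b, _ = moe_predict([2.0, 5.0], [0.0, 1.0], experts, gate_weights)  # close to 5.0
```

A learned model would train both the experts and the gate; here the weights are fixed by hand only to make the routing behavior visible.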
Distributional Alignment for Social Simulation with LLMs: A Prompt Mixture Modeling Approach
OpenReview, July 2025
Yutong Xie, Ruoyi Gao, Qiaozhu Mei
Social simulation is crucial for understanding complex population dynamics across various disciplines. Recent advancements in large language models (LLMs) have significantly boosted this field. However, a persistent challenge remains: accurately capturing the inherent distributional diversity of social activities. In this work, we propose a novel methodology for distributional alignment in social simulation by modeling social behavior or social attribute distributions as a mixture of system prompts. We introduce expectation-maximization (EM) and gradient boosting algorithms specifically designed for LLMs to efficiently identify the effective prompt mixtures. We demonstrate superior performance in two fundamental social simulation tasks: simulating personality traits and economic behaviors. Compared to existing approaches, our method significantly reduces disparities in the simulated populations, yielding distributions that closely match the observed real-world data. Our tool offers a robust solution for accurately simulating diverse social populations, promising to facilitate advancements across social sciences and related fields.
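One way to picture fitting a prompt mixture is classic EM for mixture weights with fixed components: suppose each candidate system prompt has already been sampled enough to estimate a categorical distribution over behavior categories, and we want weights on the prompts whose mixture best matches observed human data. This is a simplified sketch under that assumption; the paper's EM and gradient-boosting algorithms may also search over the prompts themselves, and `fit_mixture_weights` is a hypothetical name.

```python
def fit_mixture_weights(component_probs, observed_counts, n_iter=500, tol=1e-10):
    """Estimate mixture weights over fixed per-prompt behavior distributions via EM.

    component_probs[k][j]: probability that prompt k yields behavior category j
        (assumed to be pre-estimated by sampling the LLM under prompt k).
    observed_counts[j]: how often category j appears in the real (human) data.
    Returns a list of weights pi[k] that sum to 1.
    """
    K = len(component_probs)
    pi = [1.0 / K] * K  # start from a uniform mixture
    for _ in range(n_iter):
        expected = [0.0] * K  # expected number of observations credited to each prompt
        for j, n_j in enumerate(observed_counts):
            if n_j == 0:
                continue
            # E-step: responsibility of each prompt for behavior category j
            weighted = [pi[k] * component_probs[k][j] for k in range(K)]
            z = sum(weighted)
            if z == 0:
                raise ValueError(f"category {j} has zero probability under every prompt")
            for k in range(K):
                expected[k] += n_j * weighted[k] / z
        # M-step: new weights are the normalized expected counts
        total = sum(expected)
        new_pi = [e / total for e in expected]
        converged = max(abs(a - b) for a, b in zip(new_pi, pi)) < tol
        pi = new_pi
        if converged:
            break
    return pi


# Two hypothetical prompts: one mostly yields category 0, the other mostly category 1.
probs = [[0.9, 0.1], [0.1, 0.9]]
weights = fit_mixture_weights(probs, observed_counts=[300, 700])
# weights is approximately [0.25, 0.75], since 0.25 * 0.9 + 0.75 * 0.1 = 0.3
```

Because the component distributions are held fixed, the log-likelihood is concave in the weights, so EM started from a uniform mixture should approach the global maximum rather than a merely local one.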