Focus on AI: Moral Reasoning | Critical Turns | Bridging Gaps

Monday, 07/28/2025

By Noor Hindi

University of Michigan School of Information faculty and PhD students are advancing the field of artificial intelligence through innovative research and impactful contributions. Here are some of their recent publications.

Publications

Human-Autonomy Collaboration for Escaping Local Minima

Proceedings of the 34th IEEE International Conference on Robot and Human Interactive Communication, August 2025

Alia Gilbert, Gurnoor Kaur, Kevin Mendez, Yule Xie, Lionel Robert, Dawn Tilbury

Effective human supervision of autonomous robots in high-stakes scenarios requires efficient intervention, particularly when unmanned ground vehicles (UGVs) encounter local minima problems. This study investigates user interface designs to support human intervention in resolving such issues without a complete system takeover. We conducted a human-subjects experiment comparing two intervention methods: direct waypoint selection via mouse input and directional commands via arrow keys. Participants supervised two UGVs while simultaneously performing a secondary task, simulating real-world multitasking scenarios. Results demonstrate that mouse-based waypoint selection led to significantly more efficient UGV paths than arrow key controls and was also preferred by participants. Our findings contribute to the design of human-autonomy interfaces.

Estimating Situation Awareness for Human-Robot Teaming

Proceedings of the 34th IEEE International Conference on Robot and Human Interactive Communication, August 2025

Arsha Ali, Lionel Robert, Dawn M. Tilbury

When humans supervise multiple semiautonomous robots while also attending to their own tasks, they may lack the situation awareness needed to assist their robot teammates. The human's situation awareness therefore needs to be monitored in real time so that interventions can be made when it is poor. While prior work has developed models to estimate human situation awareness, these models rely heavily on advanced machine learning and on a single input source, eye tracking, which can pose operational challenges. We develop a real-time human situation awareness estimator based on data from a human-robot teaming experiment. The estimator uses simple, interpretable logistic regression models that take inputs from both eye-tracking and behavioral measures. Cross-validation showed that the estimator had an average accuracy of 74%. The estimator is robust to missing inputs and can monitor human situation awareness non-intrusively in real time.
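The abstract does not spell out how the estimator copes with missing inputs. One possible approach, shown below as a minimal Python sketch rather than the authors' implementation, fits a separate logistic regression for every combination of input sources (here a hypothetical "eye" column and a hypothetical "behavior" column) and, at run time, uses whichever model matches the inputs currently available. The class name, feature layout, learning rate and epoch count are illustrative assumptions.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: List[float], x: Sequence[float]) -> float:
    # w = [bias, w_1, ..., w_d]
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X: List[List[float]], y: List[int],
                 lr: float = 0.5, epochs: int = 2000) -> List[float]:
    """Plain batch gradient descent on the mean log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


class SituationAwarenessEstimator:
    """Hypothetical design: one logistic model per combination of input sources,
    so a prediction is still possible when a source (e.g. the eye tracker) drops out."""

    def __init__(self, sources: Dict[str, List[int]]):
        # Maps a source name to the feature columns it provides.
        self.sources = sources
        self.models: Dict[frozenset, Tuple[List[int], List[float]]] = {}

    def fit(self, X: List[List[float]], y: List[int]) -> "SituationAwarenessEstimator":
        names = list(self.sources)
        # Train one model for every non-empty subset of sources.
        for mask in range(1, 2 ** len(names)):
            subset = frozenset(n for i, n in enumerate(names) if (mask >> i) & 1)
            cols = sorted(c for n in subset for c in self.sources[n])
            rows = [[row[c] for c in cols] for row in X]
            self.models[subset] = (cols, fit_logistic(rows, y))
        return self

    def predict(self, row: Sequence[Optional[float]]) -> Optional[float]:
        """Probability of adequate situation awareness; None marks a missing feature."""
        available = frozenset(n for n, cols in self.sources.items()
                              if all(row[c] is not None for c in cols))
        if not available:
            return None
        cols, w = self.models[available]
        return predict_proba(w, [row[c] for c in cols])
```

If the eye tracker drops out, passing None for its column makes `predict` fall back to the behavior-only model. Training every subset is cheap with two sources, but the number of models grows as 2^k with k sources.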

Can Robots Take Over Security? A Brief Review and Critique of Security Robot vs. Human Security Agent

Proceedings of the 34th IEEE International Conference on Robot and Human Interactive Communication, August 2025

Xin Ye, Lionel Robert

Security robots are becoming increasingly prevalent for maintaining law and order, offering cost efficiencies and safety benefits in hazardous environments. Despite these advantages, significant questions remain regarding the public acceptance of robots as replacements for human security agents. This paper presents a systematic literature review to explore whether there is a discernible public preference between human security personnel and their robotic counterparts. The review identifies a contextual pattern: individuals tend to prefer human agents in citizen-initiated interactions, and security robots in police-initiated ones. This paper offers valuable insights to guide the future design and deployment of security robots.

Rebranding Sex Robots: Realbotix's Corporate Metamorphosis

Proceedings of the 34th IEEE International Conference on Robot and Human Interactive Communication, August 2025

Annette M. Masterson, Lionel Robert

Robots are rapidly becoming more interactive and dyadic. With advancements in artificial intelligence and robotic movements, companies are shifting their corporate messaging to highlight the social and companionship features of their robots. Realbotix’s recent rebranding exemplifies a deliberate effort to carve a new path within the humanoid robotics industry. Grounded in political economy and discourse analysis, this paper examines 86 publicity interviews and press releases from Realbotix to assess the positioning of intimacy and its associated corporate power. The findings reveal a focus on the robot’s social intelligence, framing the company as a leader in humanoid robotics and reshaping human–robot interactions.

Sexual and Emotional Intimacy with Robots: A Brief Review

Proceedings of the 31st Americas Conference on Information Systems, August 2025

Annette M. Masterson, Shiyu Li, Lionel Robert

Interactive robots foster closer human-robot connections, but emotional and sexual intimacy are often conflated with other traits. This paper bridges theory and practice by providing a framework for understanding intimacy with physical robots, drawing on models of interpersonal intimacy. Through a systematic literature review and qualitative analysis, we clarify definitions of intimacy and examine the nuances of human-robot interactions. Key contributions include: (1) examining definitions of emotional and sexual intimacy, (2) integrating two thematic domains, and (3) highlighting key findings and research gaps. Results indicate a tendency to view intimacy as primarily emotional or physical closeness, emphasize its benefits, and highlight the need to redefine human-robot boundaries.

Threats to scientific software from over-reliance on AI code assistants

Nature Computational Science, July 2025

Elle O'Brien

The adoption of generative artificial intelligence (AI) code assistants in scientific software development is promising, but user studies across an array of programming contexts suggest that programmers are at risk of over-reliance on these tools, leading them to accept undetected errors in generated code. Scientific software may be particularly vulnerable to such errors because most research code is untested and scientists are undertrained in software development skills. This Comment outlines the factors that place scientific code at risk and suggests directions for research groups, educators, publishers and funders to counter these liabilities.

Design Knowledge in AI: Navigating Temporality and Continuity

DIS '25 Companion: Companion Publication of the 2025 ACM Designing Interactive Systems Conference, July 2025

Hyungjun Cho, Jiyeon Amy Seo, Heekyoung Jung, EunJeong Cheon, Woosuk Seo, Maria Luce Lupetti, James Pierce, Graham Dove

Artificial intelligence (AI) is advancing at a rapid pace, particularly with regard to large language models (LLMs) where capabilities, applications, and interfaces are frequently reshaped. While these developments create new opportunities for design research, they also bring forth important considerations regarding the temporality and continuity of design knowledge. The fast-paced evolution of AI reshapes research focus and methodologies, often rendering previously established knowledge obsolete. Additionally, emerging concerns related to usability, ethics, and societal impact necessitate ongoing reassessment of research priorities. Despite these challenges, foundational theories in design research remain relevant, offering valuable frameworks for sustaining design inquiry amid AI’s rapid progression. This one-day workshop examines how design research can navigate AI’s evolving technical landscape while fostering knowledge that remains relevant over time. It aims to equip design researchers with strategies to sustain meaningful contributions in an ever-changing technological landscape.

Local Minima Prediction using Dynamic Bayesian Filtering for UGV Navigation in Unstructured Environments

IFAC 14th Symposium on Robotics, July 2025

Seung Hun Lee, Wonse Jo, Lionel Robert, Dawn M. Tilbury

Path planning is crucial for the navigation of autonomous vehicles, yet these vehicles face challenges in complex, real-world environments. Although a global view may be provided, it is often outdated, so Unmanned Ground Vehicles (UGVs) must rely on real-time local information. This reliance on partial information, without the global context, can leave UGVs stuck in local minima. This paper develops a method to proactively predict local minima using Dynamic Bayesian filtering, based on the obstacles detected in the local view and the global goal. The approach aims to improve autonomous navigation by allowing vehicles to anticipate potential pitfalls before they get stuck and then either ask a human for help or re-plan an alternate trajectory.
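For readers unfamiliar with Bayesian filtering, the Python sketch below shows the generic predict-update loop of a two-state discrete Bayes filter. The states ("free path" vs. "approaching a local minimum"), the single binary cue (whether detected obstacles block the straight line to the goal), every probability value and the 0.7 alert threshold are hypothetical; the paper's actual state and observation models are not described here.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical states: index 0 = "free path", 1 = "approaching a local minimum".
# TRANSITION[i][j] = P(state_t = j | state_{t-1} = i)  (assumed values)
TRANSITION = [[0.95, 0.05],
              [0.10, 0.90]]
# P(cue "obstacles block the line to the goal" is observed | state)  (assumed values)
P_BLOCKED = [0.2, 0.8]


def predict(belief: List[float]) -> List[float]:
    # Propagate the belief through the transition model.
    return [sum(belief[i] * TRANSITION[i][j] for i in range(2)) for j in range(2)]


def update(prior: List[float], blocked: bool) -> List[float]:
    # Weight the prior by the observation likelihood, then normalize.
    lik = [P_BLOCKED[s] if blocked else 1.0 - P_BLOCKED[s] for s in range(2)]
    unnorm = [l * p for l, p in zip(lik, prior)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def run_filter(observations: Sequence[bool],
               initial: Tuple[float, float] = (0.9, 0.1),
               threshold: float = 0.7) -> Tuple[Optional[int], List[float]]:
    """Return (first step whose local-minimum belief reaches the threshold, belief history)."""
    belief = list(initial)
    history: List[float] = []
    for t, blocked in enumerate(observations):
        belief = update(predict(belief), blocked)
        history.append(belief[1])
        if belief[1] >= threshold:
            return t, history  # time to ask a human for help or re-plan
    return None, history
```

With these assumed values, two consecutive "blocked" readings push the local-minimum belief past the threshold, at which point the vehicle would ask for help or re-plan; a run of clear readings keeps the belief low.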

Predicting Human Altruistic and Compliance Behaviors in Multiple-Operator Single-Agent (MOSA) Interaction

International Journal of Human–Computer Interaction, July 2025

Hyesun Chung, Ruiwei Jiang, Siqian Shen, X. Jessie Yang

Human interaction with autonomous technologies has been extensively studied, mostly focusing on one-to-one dyadic interactions. In contrast, this study examines human altruistic and compliance behaviors in multiple-operator single-agent (MOSA) interaction. We developed a testbed where multiple players perform an evacuation task, assisted by an AI agent that plans the optimal route for everyone. During the evacuation, players could exhibit altruism by reporting additional information, albeit at a personal cost. A lab study with 32 participants, each completing four trials under varying display configurations that manipulated the communication of altruistic actions, yielded 1,012 and 3,865 data points on altruism and compliance, respectively. Using mixed-effects logistic regression, we identified key predictors of altruistic and compliance behaviors and developed prediction models, with accuracies of 73.36% and 91.07%, respectively. These findings offer valuable insights into the role of information transparency, reciprocal altruism, and compliance in MOSA interaction, with implications for designing AI-assisted collaborative systems.

Bridging Gaps Between Student and Expert Evaluations of AI-Generated Programming Hints

L@S '25: Proceedings of the Twelfth ACM Conference on Learning @ Scale, July 2025

Tung Phung, Mengyan Wu, Heeryung Choi, Gustavo Soares, Sumit Gulwani, Adish Singla, Christopher Brooks

Generative AI has the potential to enhance education by providing personalized feedback to students at scale. Recent work has proposed techniques to improve AI-generated programming hints and has evaluated their performance based on expert-designed rubrics or student ratings. However, it remains unclear how well the rubrics used to design these techniques align with how helpful students find the hints. In this paper, we systematically study the mismatches in perceived hint quality from students' and experts' perspectives based on the deployment of AI-generated hints in a Python programming course. We analyze scenarios with discrepancies between student and expert evaluations, in particular those where experts rated a hint as high-quality while the student found it unhelpful. We identify key reasons for these discrepancies and classify them into categories, such as hints not accounting for the student's main concern or not considering previous help requests. Finally, we propose and discuss preliminary results on potential methods to bridge these gaps, first by extending the expert-designed quality rubric and then by adapting the hint generation process, e.g., by incorporating the student's comments or history. These efforts contribute toward scalable, personalized, and pedagogically sound AI-assisted feedback systems, which are particularly important for high-enrollment educational settings.

HCI and Older Adults: The Critical Turn and What Comes Next

Foundations and Trends in Human-Computer Interaction, June 2025

Amanda Lazar, Robin N. Brewer, Bran Knowles

Human-Computer Interaction (HCI) has long studied the design of technology for older adults. A critical turn problematizing how older adults were being framed gained momentum in the 2010s. The literature comprising this critical turn offered insights into what researchers should avoid in their work as well as high-level future directions. Past work was critiqued for positioning older adults as incapable technology users, as a homogeneous group, and as chronically ill and in need of care. In this monograph, we summarize some of the research that followed and responded to the critiques that began this critical turn. We focus our review on three spaces: technology use, intersectionality, and care. We describe how researchers have fruitfully drawn upon other disciplines including feminist and critical studies, gerontology, social computing, and disability studies to further break down myths, generate knowledge, and open new research spaces. We include our view of the gaps that remain and what should come next.

What's in a Prompt?: A Large-Scale Experiment to Assess the Impact of Prompt Design on the Compliance and Accuracy of LLM-Generated Text Annotations

Proceedings of the International AAAI Conference on Web and Social Media, June 2025

Shubham Atreja, Joshua Ashkinaze, Lingyao Li, Julia Mendelsohn, Libby Hemphill

Manually annotating data for computational social science (CSS) tasks can be costly, time-consuming, and emotionally draining. While recent work suggests that LLMs can perform such annotation tasks in zero-shot settings, little is known about how prompt design impacts LLMs' compliance and accuracy. We conduct a large-scale multi-prompt experiment to test how model selection (GPT-4o, GPT-3.5, PaLM2, and Falcon7b) and prompt design features (definition inclusion, output type, explanation, and prompt length) impact the compliance and accuracy of LLM-generated annotations on four highly relevant and diverse CSS tasks (toxicity, sentiment, rumor stance, and news frames). Our results show that LLM compliance and accuracy are prompt-dependent. For instance, prompting for numerical scores instead of labels reduces all LLMs' compliance and accuracy. Concise prompts can significantly reduce prompting costs but also lead to lower accuracy on tasks like toxicity. Furthermore, minor prompt changes like asking for an explanation can cause large changes in the distribution of LLM-generated labels. By assessing the impact of prompt design on the quality and distribution of LLM-generated annotations, this work serves as both a practical guide and a warning for using LLMs in CSS research.
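Compliance and accuracy can be measured in several ways. As a hedged illustration, the Python sketch below treats a response as compliant only when it is exactly one of the allowed labels (ignoring case and a trailing period) and computes accuracy over compliant responses only; both choices, and the function names, are assumptions rather than the study's documented procedure.

```python
from typing import Dict, Optional, Sequence


def parse_label(response: str, labels: Sequence[str]) -> Optional[str]:
    """Return the normalized label if the response is exactly one allowed label, else None."""
    text = response.strip().rstrip(".").strip().lower()
    allowed = {label.lower() for label in labels}
    return text if text in allowed else None


def compliance_and_accuracy(responses: Sequence[str], gold: Sequence[str],
                            labels: Sequence[str]) -> Dict[str, float]:
    parsed = [parse_label(r, labels) for r in responses]
    # Compliance: share of responses that follow the requested output format.
    compliant = [(p, g.lower()) for p, g in zip(parsed, gold) if p is not None]
    compliance = len(compliant) / len(responses) if responses else 0.0
    # Accuracy is computed only over compliant responses (an assumption).
    accuracy = sum(p == g for p, g in compliant) / len(compliant) if compliant else 0.0
    return {"compliance": compliance, "accuracy": accuracy}
```

For example, the responses "Toxic", "not toxic.", "I think this is toxic because it insults someone." and "toxic", scored against gold labels toxic, toxic, toxic and not toxic, give a compliance of 0.75 and an accuracy of 1/3.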

Pre-prints, Working Papers, Articles, Workshops and Talks

Recommendation and Temptation

arXiv, July 2025

Md Sanzeed Anwar, Paramveer S. Dhillon, Grant Schoenebeck

Traditional recommender systems based on revealed preferences often fail to capture the fundamental duality in user behavior, where consumption choices are driven by both inherent value (enrichment) and instant appeal (temptation). Consequently, these systems may generate recommendations that prioritize short-term engagement over long-lasting user satisfaction. We propose a novel recommender design that explicitly models the tension between enrichment and temptation. We introduce a behavioral model that accounts for how both enrichment and temptation influence user choices, while incorporating the reality of off-platform alternatives. Building on this model, we formulate a novel recommendation objective aligned with maximizing consumed enrichment and prove the optimality of a locally greedy recommendation strategy. Finally, we present an estimation framework that leverages the distinction between explicit user feedback and implicit choice data while making minimal assumptions about off-platform options. Through comprehensive evaluation using both synthetic simulations and real-world data from the MovieLens dataset, we demonstrate that our approach consistently outperforms competitive baselines that ignore temptation dynamics either by assuming revealed preferences or recommending solely based on enrichment. Our work represents a paradigm shift toward more nuanced and user-centric recommender design, with significant implications for developing responsible AI systems that genuinely serve users’ long-term interests rather than merely maximizing engagement.

Are Economists Always More Introverted? Analyzing Consistency in Persona-Assigned LLMs

arXiv, June 2025

Manon Reusens, Bart Baesens, David Jurgens

Personalized Large Language Models (LLMs) are increasingly used in diverse applications, where they are assigned a specific persona—such as a happy high school teacher—to guide their responses. While prior research has examined how well LLMs adhere to predefined personas in writing style, a comprehensive analysis of consistency across different personas and task types is lacking. In this paper, we introduce a new standardized framework to analyze consistency in persona-assigned LLMs. We define consistency as the extent to which a model maintains coherent responses when assigned the same persona across different tasks and runs. Our framework evaluates personas across four different categories (happiness, occupation, personality, and political stance) spanning multiple task dimensions (survey writing, essay generation, social media post generation, single turn, and multi-turn conversations). Our findings reveal that consistency is influenced by multiple factors, including the assigned persona, stereotypes, and model design choices. Consistency also varies across tasks, increasing with more structured tasks and additional context. All code is available on GitHub.
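One simple way to put a number on "the same persona across runs" is pairwise agreement: run the same persona prompt several times and measure how often each pair of runs gives the same answer to each survey item. The Python sketch below is an assumed operationalization for illustration, not the framework's own metric; the persona names and answers in the usage note are made up.

```python
from itertools import combinations
from statistics import mean
from typing import Dict, List, Sequence


def pairwise_agreement(runs: Sequence[Sequence[str]]) -> float:
    """Mean share of identical answers across every pair of runs (needs >= 2 runs)."""
    if len(runs) < 2:
        raise ValueError("need at least two runs to measure consistency")
    return mean(
        sum(a == b for a, b in zip(r1, r2)) / len(r1)
        for r1, r2 in combinations(runs, 2)
    )


def consistency_by_persona(answers: Dict[str, List[List[str]]]) -> Dict[str, float]:
    # answers[persona] holds one list of survey answers per run of the same prompt.
    return {persona: pairwise_agreement(runs) for persona, runs in answers.items()}
```

For a "happy teacher" persona whose three runs answer (agree, agree, neutral), (agree, agree, neutral) and (agree, disagree, neutral), the score is 7/9: one pair agrees fully and the other two agree on two of three items.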

The Impact of Generative AI on Social Media: An Experimental Study

arXiv, June 2025

Anders Giovanni Møller, Daniel M. Romero, David Jurgens, Luca Maria Aiello

Generative Artificial Intelligence (AI) tools are increasingly deployed across social media platforms, yet their implications for user behavior and experience remain understudied, particularly regarding two critical dimensions: (1) how AI tools affect the behaviors of content producers in a social media context, and (2) how content generated with AI assistance is perceived by users. To fill this gap, we conduct a controlled experiment with a representative sample of 680 U.S. participants in a realistic social media environment. The participants are randomly assigned to small discussion groups, each consisting of five individuals in one of five distinct experimental conditions: a control group and four treatment groups, each employing a unique AI intervention (chat assistance, conversation starters, feedback on comment drafts, and reply suggestions). Our findings highlight a complex duality: some AI tools increase user engagement and volume of generated content, but at the same time decrease the perceived quality and authenticity of discussion, and introduce a negative spillover effect on conversations. Based on our findings, we propose four design principles and recommendations aimed at social media platforms, policymakers, and stakeholders: ensuring transparent disclosure of AI-generated content, designing tools with user-focused personalization, incorporating context-sensitivity to account for both topic and user intent, and prioritizing intuitive user interfaces. These principles aim to guide an ethical and effective integration of generative AI into social media.

Structured Moral Reasoning in Language Models: A Value-Grounded Evaluation Framework

arXiv, June 2025

Mohna Chakraborty, Lu Wang, David Jurgens

Large language models (LLMs) are increasingly deployed in domains requiring moral understanding, yet their reasoning often remains shallow and misaligned with human reasoning (Jiang et al., 2021). Unlike humans, whose moral reasoning integrates contextual trade-offs, value systems, and ethical theories, LLMs often rely on surface patterns, leading to biased decisions in morally and ethically complex scenarios. To address this gap, we present a value-grounded framework for evaluating and distilling structured moral reasoning in LLMs. We benchmark 12 open-source models across four moral datasets using a taxonomy of prompts grounded in value systems, ethical theories, and cognitive reasoning strategies. Our evaluation is guided by four questions: (1) Does reasoning improve LLM decision-making over direct prompting? (2) Which types of value/ethical frameworks most effectively guide LLM reasoning? (3) Which cognitive reasoning strategies lead to better moral performance? (4) Can small-sized LLMs acquire moral competence through distillation? We find that prompting with explicit moral structure consistently improves accuracy and coherence, with first-principles reasoning and Schwartz’s + care-ethics scaffolds yielding the strongest gains. Furthermore, our supervised distillation approach transfers moral competence from large to small models without additional inference cost. Together, our results offer a scalable path toward interpretable and value-grounded models.

Keep up with research from UMSI experts by subscribing to our free research roundup newsletter!
