Focus on AI: Human–Robot Team Effectiveness

Monday, 02/09/2026

Last Updated: Monday, 02/09/2026

By Noor Hindi

University of Michigan School of Information faculty and PhD students are advancing the field of artificial intelligence through innovative research and impactful contributions. Here are some of their recent publications.

Publications

Promoting Human–Robot Team Effectiveness: Shared Mental Models and Communication Improve Team Situation Awareness and Performance

IEEE, January 2025

Arsha Ali, Jonathon M. Smereka, Kayla Riegner, Lionel P. Robert Jr, Dawn M. Tilbury

Human-robot teaming can benefit many domains. Teams with sufficient team situation awareness may better accomplish their goals, but team situation awareness can be challenging to develop and maintain. We interpret team situation awareness as the team’s collective understanding of the whole situation at a given time. In order to determine how team situation awareness can be developed and maintained in a human-robot team, we conducted a between-subjects experiment to investigate how shared mental models and communication impact team situation awareness, and how team situation awareness relates to performance. Results from 48 subjects showed the impact of shared mental models is relative to communication. A high shared mental model improved team situation awareness and performance efficiency when there was little communication, while the level of shared mental model was inconsequential when high communication was provided. In addition, team situation awareness was positively related to performance efficiency. The findings indicate that team situation awareness can be achieved through either high communication or a high shared mental model under limited communication, which consequently allows for improved performance.

The Structure of Major Life Transitions Among Older Suicide Decedents: An Application of Large Language Models

National Library of Medicine, December 2025

Briana Mezuk, Viktoryia Kalesnikava, David Jurgens, Lily Johns, Kara Zivin

The role of life events in shaping suicide risk for older adults is unclear. Structure and meaning of events are interconnected: structure shapes the relationship between event elements (e.g., timing), with meaning emerging through their dynamic interactions. Large language models (LLMs) provide an opportunity to analyze textual data at scale, but they have not yet been widely used to understand the contributing circumstances of suicide. In this study, we applied LLMs to narrative texts from the National Violent Death Reporting System (2003-2022), a nationwide registry of suicide deaths, to (1) identify salient events in the lives of suicide decedents aged 50+ (n = 164,240), (2) classify aspects of the structure of these events (e.g., timing, sequence, expectedness), and (3) explore variation by age. Mean age of decedents was 63 years, 24% were female, 88% were non-Hispanic White; 54% did not have a known mental health problem at the time of their death. Most narratives described sequences of life events involving interlinked health or family problems that often escalated days before suicide (e.g., mobility loss after a recent surgery); less frequently, narratives mentioned isolated events, referring to financial or relationship crises. For decedents aged 65+, events often involved legal issues or relationship loss (e.g., widowhood), while for those aged <65, events typically related to employment or relationship discord. Preliminary analyses demonstrate the feasibility of identifying structural elements of life events using LLMs; next steps will include refining prompts to improve classification of complex transitions from these texts.

Artificial Intelligence Use and Views: Key Findings From the National Poll on Healthy Aging

National Library of Medicine, December 2025

Erica Solway, Robin Brewer, J Scott Roberts, Dianne Singer, Sydney Strunk, Nicholas Box, Matthias Kirch

The spread of artificial intelligence (AI) technologies offers opportunities to improve home safety, provide health information, and support aging in place, but it also raises concerns about risks to health and well-being. Little is known about older adults’ use and views on AI related to healthy aging. In February 2025, the University of Michigan National Poll on Healthy Aging surveyed a diverse sample of 2,883 adults age 50 and older about their experiences with and views on AI technologies. Overall, 55% of adults age 50 and older said they had ever used AI technologies that one speaks or types to, with 14% using AI for health information. Many reported benefits of using AI-powered devices, including voice assistants and home security devices and systems, for aging in place. Overall, 35% of adults age 50 and older reported interest in using AI in their day-to-day lives. At the same time, 92% of adults age 50 and older agreed they want to know if the information they receive is from a person or from AI, and 81% wanted to learn more about the risks of AI. Nearly half (46%) had very little or no trust in AI-generated health information. This presentation will describe notable differences in AI use and perspectives by demographic subgroups and will conclude with a discussion of how these findings can be used for the development of programs to assist older adults in understanding the risks of AI and in safely and effectively using AI to support healthy aging.

Acceptability and Use of Digital Health and Artificial Intelligence–Enabled Chatbots for Sexual and Reproductive Health Among Lesbian, Bisexual, and Queer Women of Color in the United States: Cross-Sectional Survey Study

Journal of Medical Internet Research, December 2025

Megan Threats, Morgan Gray

Background: Cisgender lesbian, bisexual, and queer (LBQ+) women of color experience barriers to accessing sexual and reproductive health (SRH) services in the United States. Barriers, including limited provider access and poor patient-provider communication, contribute to SRH service underutilization and poorer outcomes among these women than their heterosexual counterparts. Digital health modalities, including telemedicine, mobile health, and chatbots enabled by artificial intelligence (AI), offer potential to expand access to SRH information and services among these women.

Objective: This study investigated the influencing factors, acceptability, and concerns regarding the use of digital health modalities (video calls, SMS text messaging, and mobile apps) and AI-enabled chatbots to support SRH information and service access among LBQ+ women of color in the United States. It also assessed their awareness and knowledge of human papillomavirus (HPV) and cervical cancer prevention, and attitudes toward HIV prevention medication.

Methods: A self-administered online survey was conducted from November 2020 to March 2021 with 285 LBQ+ women of color (aged ≥18 years) residing in the United States. The 88-item survey assessed digital health use, SRH knowledge and awareness, and acceptability of and concerns about digital health use for SRH information and services. Data were analyzed using descriptive statistics, Fisher exact tests, multivariable logistic regression, and thematic analysis.

Results: Most respondents (233/285, 81.8%) were comfortable using video calls to communicate with health care providers for SRH support. Respondents with a bachelor’s degree or higher (95% CI 0.00‐0.24), with health insurance (95% CI 56.1‐1025.7), and without a usual place of care (95% CI 0.07‐0.43) were significantly (P<.001) more likely to agree with using video calls. Respondents with a bachelor’s degree or higher (95% CI 0.23‐0.74), aged <45 years (95% CI 0.07‐0.25), and with health insurance (95% CI 3.23‐12.45) were significantly (P<.001) more likely to agree with using mobile apps. Respondents aged ≥45 years (95% CI 0.14‐0.53), without health insurance (95% CI 0.01‐0.06), and with an income of <US $49,000 (95% CI 1.32‐3.93) were significantly (P<.001) more likely to agree with the use of SMS text messaging. There was high acceptance of using chatbots for self-assessing sexually transmitted infection risk (229/285, 80.3%) but lower acceptance for self-assessing cervical cancer risk (136/285, 47.7%). Key concerns included data privacy and confidentiality, lack of affective communication, and technology connectivity and digital literacy issues. Respondents also demonstrated low knowledge of HPV and cervical cancer prevention.

Conclusions: Digital health was highly acceptable for supporting access to SRH information and services among LBQ+ women of color. Culturally tailored digital tools and interventions could improve awareness, knowledge, and attitudes toward SRH services. Addressing various digital literacy levels, data privacy concerns, and technology access and communication issues when developing digital health solutions may advance SRH equity among LBQ+ women of color.

Beyond Autonomy: A Plastic Surgeon's Responsibility in the Face of AI-Driven Misinformation

Annals of Plastic Surgery, December 2025

Pranav Rajaram, Megan Lane, Nazanin Andalibi, Oliver Haimson, Rachel Hooper, Hannes Prescher

A promising development in medical practice is a shift from a “doctor knows best” model toward patient autonomy rooted in shared decisions, transparent risk, and respect for values.1,2 Increasing Internet access, social media communities, and more recently generative large language models (LLMs) such as ChatGPT (OpenAI, San Francisco, CA) may have helped push patient autonomy forward by providing near-instant access to clinical information from trusted sources, but can also relay harmful medical misinformation. LLMs' fluent style can mask factual hallucinations and transmit the same gaps that already trouble online medical content.1 As a result, medical visits increasingly begin with fact-checking claims, shifting time away from substantive decision making toward correction. We argue that without deliberate safeguards for autonomy, increasing patient reliance on LLMs can pull clinical encounters back toward paternalism.

Ads that Talk Back: Implications and Perceptions of Injecting Personalized Advertising into LLM Chatbots

Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, December 2025

Brian Jay Tang, Kaiwen Sun, Noah T. Curran, Florian Schaub, Kang G. Shin

Recent advances in large language models (LLMs) have enabled the creation of highly effective chatbots. However, the compute costs of widely deploying LLMs have raised questions about profitability. Companies have proposed exploring ad-based revenue streams for monetizing LLMs, which could serve as the new de facto platform for advertising. This paper investigates the implications of personalizing LLM advertisements to individual users via a between-subjects experiment with 179 participants. We developed a chatbot that embeds personalized product advertisements within LLM responses, inspired by similar forays by AI companies. The evaluation of our benchmarks showed that ad injection only slightly impacted LLM performance, particularly response desirability. Results revealed that participants struggled to detect ads, and even preferred LLM responses with hidden advertisements. Rather than clicking on our advertising disclosure, participants tried changing their advertising settings using natural language queries. We created an advertising dataset and an open-source LLM, Phi-4-Ads, fine-tuned to serve ads and flexibly adapt to user preferences.

SemNovel – A new approach to detecting semantic novelty of biomedical publications using embeddings of large language models

Journal of Biomedical Informatics, December 2025

Xueqing Peng, Yutong Xie, Huan He, Brian Ondov, Kalpana Raja, Qijia Liu, Qiaozhu Mei, Hua Xu

Objective: The rapid growth of scientific literature necessitates robust methods to identify novel contributions. However, there is currently no widely-recognized measurement of novelty in biomedical research. Existing approaches typically quantify novelty using isolated article features, such as keywords, MeSH terms, or references, potentially losing important context and nuance from the semantic content of the text.

Methods: We propose SemNovel, a semantic novelty detection framework that leverages embeddings from Large Language Models (LLMs) to capture richer semantic content. Specifically, for semantic universe construction we adopt LLM-embedder (BAAI/llm-embedder), a unified embedding model that integrates Llama2-7B-Chat as its foundation and BGE base as the embedding backbone. We employ t-distributed Stochastic Neighbor Embedding (t-SNE) for 2D visualization and project the entire PubMed library into a “semantic universe”. A SemNovel score is calculated for each article based on its distance from prior publications. We validated SemNovel’s effectiveness through its correlation with future research impact and its ability to distinguish groundbreaking studies. We further explored its potential for analyzing trends in research trajectories and interdisciplinary collaboration. To enhance usability, we developed an interactive interface for users to analyze SemNovel scores.

Results: The SemNovel score exhibited a positive correlation with future research impact, as measured by citation counts (ρ = 0.1782, p < 0.001, Spearman rank correlation), independent of factors such as journal impact factors (JIFs), publication years, and author counts, and outperformed previous semantic novelty indicators. It effectively identified highly novel papers, including Nobel Prize-winning studies (p < 0.001, Kolmogorov-Smirnov test). SemNovel also revealed trends in the evolution of scientific research, exemplified in the PD-1/PD-L1 field, and underscored the role of interdisciplinary collaboration in enhancing biomedical research novelty.

Conclusion: SemNovel represents a scalable and robust method for quantifying semantic novelty in biomedical literature. It provides a powerful tool for uncovering groundbreaking research, tracking scientific progress, and analyzing trends in innovation.
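
As a rough illustration of the distance-based scoring mentioned in the Methods, the sketch below scores each article by its mean Euclidean distance to its k nearest earlier articles in a 2D space. The abstract does not state the exact formula, so the function name `novelty_scores`, the use of k nearest neighbors, the Euclidean metric, the "strictly earlier year" rule, and the toy coordinates are all assumptions made for illustration, not SemNovel's actual implementation.

```python
import math
from typing import List, Optional, Tuple

# Hypothetical record: (publication year, 2D coordinates in the "semantic universe").
Article = Tuple[int, Tuple[float, float]]


def novelty_scores(articles: List[Article], k: int = 2) -> List[Optional[float]]:
    """Score each article by its mean distance to its k nearest prior articles.

    Assumption: "prior" means a strictly earlier publication year.
    Articles with no prior publications get None.
    """
    scores: List[Optional[float]] = []
    for year, point in articles:
        # Sorted distances to every article published strictly earlier.
        dists = sorted(
            math.dist(point, other_point)
            for other_year, other_point in articles
            if other_year < year
        )
        if not dists:
            scores.append(None)
            continue
        nearest = dists[:k]
        scores.append(sum(nearest) / len(nearest))
    return scores


# Toy data, invented for illustration only.
toy_articles: List[Article] = [
    (2000, (0.0, 0.0)),
    (2001, (3.0, 4.0)),
    (2002, (0.0, 1.0)),    # close to existing work
    (2002, (30.0, 40.0)),  # far from everything published before
]
toy_scores = novelty_scores(toy_articles, k=1)
```

In this toy setup, the article placed far from all earlier points receives a much larger score than the one placed next to existing work, which is the intuition behind a distance-from-prior-publications measure.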

The AI-aging-enterprise: a political economy of aging and artificial intelligence

The Gerontologist, November 2025

Vera Gallistl, Clara Berridge, Muneeb Ul Lateef Banday, Justyna Stypinska, Anita Ho, Robin N Brewer, Alisa Grigorovich, Alexander Peine, Anna Wanka

The current discourse on artificial intelligence (AI) in gerontology remains mostly on an interventionist level and focused on solving problems faced by individuals, leaving the wider social conditions that shape the relationship between aging and AI out of view. The considerable accumulation of power, particularly for technology development companies, in the development and implementation of AI, however, calls for a deeper and more complex analysis of the relationships between AI, older adults and the “aging enterprise.” Building on the classical political economy of aging, and expanding it with concepts from material gerontology, we propose a “political economy of aging and AI” as a conceptual tool to analyze the relationships between (older) individuals, social and political structures, and (technological) materialities. We exemplify how such a political economy perspective enables a more complex consideration of the current discourse on (a) AI solutions, (b) AI ethics, and (c) AI policies. We close by summarizing our understanding of a political economy of aging and AI that centers materialities as the critical-gerontological focus of analysis and by outlining calls for action toward strengthening the power of older adults in the emerging AI-aging-enterprise.

Demo: Networked iGYM for AR Exergames

ACM MOBICOM '25: Proceedings of the 31st Annual International Conference on Mobile Computing and Networking, November 2025

Brandon McDonald, Michael Nebeling, Roland Graf, Hun-Seok Kim, Jiasi Chen

iGYM is an augmented reality exercise game for inclusive play that allows people with and without wheelchairs to participate equally in a soccer game with a projected virtual field and ball. However, iGYM currently requires all players to be co-located and lacks capabilities for remote play. In this work, we describe a networked iGYM implementation that allows teams of players to play with each other remotely. The key networking challenge is meeting tight end-to-end latency requirements for interactive play over the Internet. We demonstrate a portable tabletop version of iGYM, implemented in Unity and ROS2, using a distributed authority model to keep track of ownership and propagate state updates about players, game objects, and scores. Each client receives local player tracking updates and spawns player peripersonal circles under its own authority; a shared ball object is synchronized under server ownership with client-side interpolation. The demo GUI will let attendees inject artificial network delay to explore its impact on interactivity.
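
The idea of a server-owned ball with client-side interpolation can be sketched in isolation. The example below is a minimal Python illustration (the actual system is described as using Unity and ROS2), assuming the client buffers timestamped ball positions from the server and renders slightly in the past. The names `Snapshot`, `interpolate_ball`, and `RENDER_DELAY`, as well as the delay value, are hypothetical and not taken from the iGYM implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical server update: ball position at a server timestamp (seconds)."""
    t: float
    x: float
    y: float


def interpolate_ball(snapshots: List[Snapshot], render_time: float) -> Tuple[float, float]:
    """Linearly interpolate the ball position at render_time.

    snapshots must be non-empty and sorted by time. Times outside the
    buffered range are clamped to the first or last snapshot
    (this sketch does not extrapolate).
    """
    if not snapshots:
        raise ValueError("need at least one snapshot")
    if render_time <= snapshots[0].t:
        return (snapshots[0].x, snapshots[0].y)
    if render_time >= snapshots[-1].t:
        return (snapshots[-1].x, snapshots[-1].y)
    # Find the pair of snapshots that brackets render_time.
    for older, newer in zip(snapshots, snapshots[1:]):
        if older.t <= render_time <= newer.t:
            alpha = (render_time - older.t) / (newer.t - older.t)
            return (
                older.x + alpha * (newer.x - older.x),
                older.y + alpha * (newer.y - older.y),
            )
    raise ValueError("snapshots must be sorted by time")


# Toy buffer; the client renders slightly in the past so that it
# usually has a newer snapshot to interpolate toward.
buffer = [Snapshot(0.0, 0.0, 0.0), Snapshot(0.1, 1.0, 2.0), Snapshot(0.2, 3.0, 2.0)]
RENDER_DELAY = 0.05  # assumed value, in seconds
latest_server_time = 0.2
ball_x, ball_y = interpolate_ball(buffer, latest_server_time - RENDER_DELAY)
```

Rendering behind the newest update trades a small fixed delay for smoother motion, which is one reason added network delay, such as the artificial delay the demo lets attendees inject, can affect how interactive the game feels.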

Pre-prints, Working Papers, Articles, Workshops and Talks

Ethical AI Use in Galleries, Libraries, Archives, and Museums (GLAM) (Workshop)

Ethics, AI, and the Public Humanities Speaker Series, February 2026

Thursday, February 26, 2026, 2:00 pm Eastern

Ken Axford, Hope Dunbar, Jesse Johnston

Of the jobs most threatened by artificial intelligence, Microsoft has named “Historian” as number two on their list. In their “Guiding Principles for Artificial Intelligence in History Education” (August 5, 2025), the American Historical Association acknowledges the power of AI, yet recognizes that its utility cannot replace our ability as historians to “appreciate the complexity of our shared past and what it means to be human.” In this four-part speaker series, the National Council on Public History (NCPH) and the American Conservation Experience (ACE) Mellon Humanities Program will bring together experts to help us consider the ways that artificial intelligence is reshaping our world and our work.

Beyond Consensus: Perspectivist Modeling and Evaluation of Annotator Disagreement in NLP

arXiv, January 2026

Yinuo Xu, David Jurgens

Annotator disagreement is widespread in NLP, particularly for subjective and ambiguous tasks such as toxicity detection and stance analysis. While early approaches treated disagreement as noise to be removed, recent work increasingly models it as a meaningful signal reflecting variation in interpretation and perspective. This survey provides a unified view of disagreement-aware NLP methods. We first present a domain-agnostic taxonomy of the sources of disagreement spanning data, task, and annotator factors. We then synthesize modeling approaches using a common framework defined by prediction targets and pooling structure, highlighting a shift from consensus learning toward explicitly modeling disagreement, and toward capturing structured relationships among annotators. We review evaluation metrics for both predictive performance and annotator behavior, noting that most fairness evaluations remain descriptive rather than normative. We conclude by identifying open challenges and future directions, including integrating multiple sources of variation, developing disagreement-aware interpretability frameworks, and grappling with the practical tradeoffs of perspectivist modeling.
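
To make the contrast between consensus learning and disagreement-aware targets concrete, the sketch below builds two prediction targets from the same set of annotations: a single majority label and an empirical label distribution (a "soft label"). This is only one simple instance of the distinction; the function names and the toy votes are invented for illustration and are not drawn from the survey.

```python
from collections import Counter
from typing import Dict, List


def majority_label(annotations: List[str]) -> str:
    """Consensus target: the single most frequent label.

    Expects a non-empty list; ties go to the label seen first.
    """
    return Counter(annotations).most_common(1)[0][0]


def soft_label(annotations: List[str]) -> Dict[str, float]:
    """Disagreement-aware target: the empirical distribution over labels.

    Expects a non-empty list.
    """
    counts = Counter(annotations)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Toy annotations for one item, invented for illustration.
votes = ["toxic", "toxic", "not_toxic", "toxic", "not_toxic"]
consensus = majority_label(votes)  # collapses the 3-to-2 split into one label
distribution = soft_label(votes)   # keeps the split as label proportions
```

The majority label discards the fact that two of five annotators disagreed, while the distribution preserves it as a training or evaluation signal.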

Empathy Applicability Modeling for General Health Queries

arXiv, January 2026

Shan Randhawa, Agha Ali Raza, Kentaro Toyama, Julie Hui, Mustafa Naseem

LLMs are increasingly being integrated into clinical workflows, yet they often lack clinical empathy, an essential aspect of effective doctor-patient communication. Existing NLP frameworks focus on reactively labeling empathy in doctors' responses but offer limited support for anticipatory modeling of empathy needs, especially in general health queries. We introduce the Empathy Applicability Framework (EAF), a theory-driven approach that classifies patient queries in terms of the applicability of emotional reactions and interpretations, based on clinical, contextual, and linguistic cues. We release a benchmark of real patient queries, dual-annotated by humans and GPT-4o. In the subset with human consensus, we also observe substantial human-GPT alignment. To validate EAF, we train classifiers on human-labeled and GPT-only annotations to predict empathy applicability, achieving strong performance and outperforming the heuristic and zero-shot LLM baselines. Error analysis highlights persistent challenges: implicit distress, clinical-severity ambiguity, and contextual hardship, underscoring the need for multi-annotator modeling, clinician-in-the-loop calibration, and culturally diverse annotation. EAF provides a framework for identifying empathy needs before response generation, establishes a benchmark for anticipatory empathy modeling, and supports empathetic communication in asynchronous healthcare.

Cross-Lingual Prompt Steerability: Towards Accurate and Robust LLM Behavior across Languages

arXiv, December 2025

Lechen Zhang, Yusheng Zhou, Tolga Ergen, Lajanugen Logeswaran, Moontae Lee, David Jurgens

System prompts provide a lightweight yet powerful mechanism for conditioning large language models (LLMs) at inference time. While prior work has focused on English-only settings, real-world deployments benefit from a single prompt that operates reliably across languages. This paper presents a comprehensive study of how different system prompts steer models toward accurate and robust cross-lingual behavior. We propose a unified four-dimensional evaluation framework to assess system prompts in multilingual environments. Through large-scale experiments on five languages, three LLMs, and three benchmarks, we find that certain prompt components, such as chain-of-thought (CoT), emotion, and scenario, correlate with robust multilingual behavior. We develop a prompt optimization framework for multilingual settings and show it can automatically discover prompts that improve all metrics by 5-10%. Finally, we analyze over 10 million reasoning units and find that more performant system prompts induce more structured and consistent reasoning patterns, while reducing unnecessary language-switching. Together, we highlight system prompt optimization as a scalable path to accurate and robust multilingual LLM behavior.

Reflection-Satisfaction Tradeoff: Investigating Impact of Reflection on Student Engagement with AI-Generated Programming Hints

arXiv, December 2025

Heeryung Choi, Tung Phung, Mengyan Wu, Adish Singla, Christopher Brooks

Generative AI tools, such as AI-generated hints, are increasingly integrated into programming education to offer timely, personalized support. However, little is known about how to effectively leverage these hints while ensuring autonomous and meaningful learning. One promising approach involves pairing AI-generated hints with reflection prompts that ask students to review and analyze their learning when they request hints. This study investigates the interplay between AI-generated hints and different designs of reflection prompts in an online introductory programming course. We conducted a two-trial field experiment. In Trial 1, students were randomly assigned to receive prompts either before or after receiving hints, or no prompt at all. Each prompt also targeted one of three self-regulated learning (SRL) phases: planning, monitoring, and evaluation. In Trial 2, we examined two types of prompt guidance: directed (offering more explicit and structured guidance) and open (offering more general and less constrained guidance). Findings show that students in the before-hint (RQ1), planning (RQ2), and directed (RQ3) prompt groups produced higher-quality reflections but reported lower satisfaction with AI-generated hints than those in other conditions. Immediate performance did not differ across conditions. This negative relationship between reflection quality and hint satisfaction aligns with previous work on student mental effort and satisfaction. Our results highlight the need to reconsider how AI models are trained and evaluated for education, as prioritizing user satisfaction can undermine deeper learning.
