University of Michigan School of Information
Robot Power | Emotion AI | Superchats: AI in Focus

Friday, 03/28/2025
By Noor Hindi

University of Michigan School of Information faculty and PhD students are advancing the field of artificial intelligence through innovative research and impactful contributions. Here are some of their recent publications.
Publications
Understanding entangled human-technology-world relations: use of intelligent voice assistants by older adults
iConference, March 2025
Alisha Pradhan, Shaan Chopra, Pooja Upadhyay, Robin Brewer, Amanda Lazar
Introduction. Emerging technologies like intelligent voice assistants or social robots can shape human relations with the world. To illustrate how an emerging technology mediates relations and shapes social practices in the context of aging, we present findings on the use of voice assistants by older adults.
Method. We analyzed interviews with 24 older adults by adopting a post-phenomenological perspective to examine how an emerging technology actively mediates relations between older individuals and their larger social world.
Results. Our findings surface the different types of relations that voice assistants mediate between older adults and their larger social world, unpacking how these relations shape social practices around what it means to give company to pets, to live alone, or to give and receive care.
Discussion. We discuss implications for understanding the mutually constitutive relations between older adults and the emerging technologies they use, opportunities for designing to support neglected relations, and the need to account for nonhuman actors in technology and aging research.
Conclusion. We provide a preliminary understanding of how an emerging technology shapes social practices in later life. This understanding is crucial for aging and technology research, as several emerging technologies (e.g., social robots) target older adults, yet little is known about the relationships and discursive practices that shape their use.
Repairing Trust in Robots?: A Meta-analysis of HRI Trust Repair Studies with a No-Repair Condition
HRI '25: Proceedings of the 2025 ACM/IEEE International Conference on Human-Robot Interaction, March 2025
Connor Esterwood, Lionel P. Robert
As robots become more integrated into various sectors, understanding human–robot interaction (HRI) dynamics, particularly trust repair, is crucial for successful collaboration. For this paper, the authors conducted a meta-analysis of 22 HRI trust repair studies with 3,763 participants to evaluate the effectiveness of strategies for restoring trust after breaches relative to offering no repair. The analysis identified three key findings: (1) strategies are differentially effective, showing limited success in restoring trustworthiness; (2) the overall impact on repairing trust is marginal, with a small effect size; and (3) apologies and explanations are the most effective strategies for trust repair. These insights enrich HRI literature by providing a comprehensive evaluation of trust repair mechanisms, offering valuable guidance for future research and practical improvements in human–robot collaboration.
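For a concrete sense of how a pooled effect size like the one reported above is typically computed, here is a minimal random-effects meta-analysis sketch using the standard DerSimonian-Laird estimator. The per-study effect sizes and variances are invented placeholders, not data from the paper.

```python
import numpy as np

# Per-study Hedges' g and sampling variances (made-up placeholders)
g = np.array([0.41, 0.12, 0.25, -0.05, 0.30])
v = np.array([0.02, 0.05, 0.03, 0.04, 0.02])

w = 1.0 / v                                      # fixed-effect weights
g_fixed = np.sum(w * g) / np.sum(w)

# Between-study heterogeneity (tau^2) via DerSimonian-Laird
q = np.sum(w * (g - g_fixed) ** 2)
c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
tau2 = max(0.0, (q - (len(g) - 1)) / c)

w_re = 1.0 / (v + tau2)                          # random-effects weights
g_re = np.sum(w_re * g) / np.sum(w_re)
se = np.sqrt(1.0 / np.sum(w_re))
print(f"pooled g = {g_re:.3f} (95% CI {g_re - 1.96*se:.3f} to {g_re + 1.96*se:.3f})")
```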
Virtually the Same or Realistically Different?: A Meta-analysis of Real vs. ‘Not So Real’ Robots
HRI '25: Proceedings of the 2025 ACM/IEEE International Conference on Human-Robot Interaction, March 2025
Connor Esterwood, Ruijia (Hannah) Guan, Xin Ye, Lionel P. Robert
This study examined an important debate in Human-Robot Interaction (HRI) research: the suitability of non-physically collocated robots as substitutes for physically collocated robots. This meta-analysis (N=34 studies) examined the equivalence of physically and non-physically collocated robots in HRI research, focusing on anthropomorphism, social presence, and user engagement. No significant differences were found, suggesting that non-physical representations are viable alternatives. However, observed heterogeneity indicates potential moderating factors (e.g., task complexity, user characteristics, design features) warranting further investigation. These findings inform choices in resource-constrained environments.
Security Robot Power and Acceptance: Exploring French and Raven's Five Forms of Power
HRI '25: Proceedings of the 2025 ACM/IEEE International Conference on Human-Robot Interaction, March 2025
The increasing deployment of robots in authority roles, such as security, necessitates understanding public acceptance of robot-exercised power. This study investigated the relationship between perceived power bases (expert, legitimate, referent, reward, coercive) and public acceptance of security robots. One hundred participants viewed videos depicting robot-citizen interactions. Results revealed positive correlations between perceived expert and legitimate power and acceptance and a negative correlation between perceived coercive power and acceptance; reward power showed no significant relationship. These findings contribute to the human-robot interaction (HRI) literature by demonstrating the influence of perceived power on public acceptance and offering design guidelines for enhancing the acceptance of security robots by emphasizing expertise and legitimate authority while minimizing coercive tactics.
Emotion AI Will Not Fix the Workplace
Association for Computing Machinery Interactions Magazine, February 2025
The European Union recently decided to ban emotion AI in workplaces and education settings, with exceptions for medical and safety settings. The ban took effect on August 1, 2024. Technologists, policymakers, and the public may wonder what this means for the U.S. and how the EU might regulate emotion AI better.
Socially Aware Language Technologies: Perspectives and Practices
Computational Linguistics, February 2025
Diyi Yang, Dirk Hovy, David Jurgens, Barbara Plank
Language technologies have advanced substantially, particularly with the introduction of large language models. However, these advancements can exacerbate several issues that models have traditionally faced, including bias, evaluation, and risk. In this perspective paper, we argue that many of these issues share a common core: a lack of awareness of the social factors, interactions, and implications of the social environment in which NLP operates. We call this social awareness. While NLP is improving at addressing linguistic issues, there has been relatively limited progress in incorporating social awareness into models to work in all situations for all users. Integrating social awareness into NLP will improve the naturalness, usefulness, and safety of applications while also opening up new applications. Today, we are only at the start of a new, important era in the field.
The Development and Validation of the Critical Reflection and Agency in Computing Scale
SIGCSE TS 2025: Proceedings of the 56th ACM Technical Symposium on Computer Science Education, February 2025
Aadarsh Padiyath, Mark Guzdial, Barbara Ericson
As discussions of computing's impact on society increase in public discourse, so does the recognition that computing students should address the ethical and sociotechnical implications of their work. While efforts to integrate issues of ethics and social justice into computing curricula are nascent, we lack a standardized measure to monitor our progress toward these goals. In this poster, we report on the development and validation of the Critical Reflection and Agency in Computing Index, a novel instrument designed to assess undergraduate computing students' attitudes toward practicing critically conscious computing. The resulting index is a theoretically grounded, expert-reviewed tool with evidence for reliability and validity to support research and practice in computing ethics education. It enables researchers and educators to gain insight into students' perspectives, inform the design of targeted ethics interventions, and monitor the effectiveness of computing ethics education initiatives.
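Scale-validation work of this kind usually reports internal-consistency reliability. The sketch below computes Cronbach's alpha, one common such statistic, on synthetic Likert responses; it illustrates the general technique only and is not the authors' analysis code.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, n_items) response matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the sum score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Synthetic 1-5 Likert responses driven by one latent trait (illustration only)
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
responses = np.clip(np.rint(3 + latent + rng.normal(scale=0.7, size=(200, 6))), 1, 5)
print(f"alpha = {cronbach_alpha(responses):.2f}")
```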
Benchmarking LLMs' Judgments with No Gold Standard
The Thirteenth International Conference on Learning Representations, January 2025
Shengwei Xu, Yuxuan Lu, Grant Schoenebeck, Yuqing Kong
We introduce GEM (Generative Estimator for Mutual Information), an evaluation metric for assessing language generation by Large Language Models (LLMs), particularly in generating informative judgments, without the need for a gold standard reference. GEM broadens the scenarios where we can benchmark LLM generation performance, from traditional ones like machine translation and summarization, where gold standard references are readily available, to subjective tasks without clear gold standards, such as academic peer review. GEM uses a generative model to estimate mutual information between candidate and reference responses, without requiring the reference to be a gold standard. In experiments on a human-annotated dataset, GEM demonstrates competitive correlations with human scores compared to the state-of-the-art GPT-4o Examiner, and outperforms all other baselines. Additionally, GEM is more robust against strategic manipulations, such as rephrasing or elongation, which can artificially inflate scores under a GPT-4o Examiner. We also present GRE-bench (Generating Review Evaluation Benchmark), which evaluates LLMs based on how well they can generate high-quality peer reviews for academic research papers. Because GRE-bench is based upon GEM, it inherits its robustness properties. Additionally, GRE-bench circumvents data contamination problems (or data leakage) by using the continuous influx of new open-access research papers and peer reviews each year. We show GRE-bench results of various popular LLMs on their peer review capabilities using the ICLR2023 dataset.
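The core idea behind a mutual-information-based evaluation score can be sketched compactly: compare a generative model's likelihood of the reference response with and without the candidate as context. The sketch below uses GPT-2 purely as a stand-in model and illustrates the general principle, not the authors' GEM implementation.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2")

def logprob(text: str, context: str = "") -> float:
    """Sum of log p(text tokens | context tokens) under the LM."""
    ctx = tok(context).input_ids if context else []
    txt = tok(text).input_ids
    ids = torch.tensor([[tok.bos_token_id] + ctx + txt])
    with torch.no_grad():
        logits = lm(ids).logits.log_softmax(-1)
    start = 1 + len(ctx)                 # index of the first text token
    targets = ids[0, start:]
    # each text token is predicted by the logits one position earlier
    return logits[0, start - 1 : -1].gather(-1, targets.unsqueeze(-1)).sum().item()

def gem_style_score(candidate: str, reference: str) -> float:
    # pointwise mutual information: log p(ref | cand) - log p(ref)
    return logprob(reference, context=candidate + "\n") - logprob(reference)
```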
Pre-prints, Working Papers, Articles, Reports, Workshops and Talks
Who Reaps All the Superchats? A Large-Scale Analysis of Income Inequality in Virtual YouTuber Livestreaming
arXiv, March 2025
Ruijing Zhao, Brian Diep, Jiaxin Pei, Dongwook Yoon, David Jurgens, Jian Zhu
The explosive growth of Virtual YouTubers (VTubers)-streamers who perform behind virtual anime avatars-has created a unique digital economy with profound implications for content creators, platforms, and viewers. Understanding the economic landscape of VTubers is crucial for designing equitable platforms, supporting content creator livelihoods, and fostering sustainable digital communities. To this end, we conducted a large-scale study of over 1 million hours of publicly available streaming records from 1,923 VTubers on YouTube, covering tens of millions of dollars in actual profits. Our analysis reveals stark inequality within the VTuber community and characterizes the sources of income for VTubers from multiple perspectives. Furthermore, we also found that the VTuber community is increasingly monopolized by two agencies, driving the financial disparity. This research illuminates the financial dynamics of VTuber communities, informing the design of equitable platforms and sustainable support systems for digital content creators.
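Income-inequality analyses like this one are commonly summarized with the Gini coefficient, where 0 means perfect equality and values near 1 indicate extreme concentration. The sketch below shows the standard computation on made-up earnings figures, not the paper's data.

```python
import numpy as np

def gini(incomes) -> float:
    """Gini coefficient via the rank-weighted formulation on sorted incomes."""
    x = np.sort(np.asarray(incomes, dtype=float))
    n = len(x)
    ranks = np.arange(1, n + 1)          # 1-based ranks
    return 2 * np.sum(ranks * x) / (n * x.sum()) - (n + 1) / n

# Made-up earnings illustrating a heavy-tailed creator economy
earnings = [120, 450, 800, 1_500, 4_000, 9_000, 250_000]
print(f"Gini = {gini(earnings):.2f}")    # close to 1 => highly unequal
```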
Beyond Demographics: Fine-tuning Large Language Models to Predict Individuals’ Subjective Text Perceptions
arXiv, February 2025
Matthias Orlikowski, Jiaxin Pei, Paul Röttger, Philipp Cimiano, David Jurgens, Dirk Hovy
People naturally vary in their annotations for subjective questions and some of this variation is thought to be due to the person’s sociodemographic characteristics. LLMs have also been used to label data, but recent work has shown that models perform poorly when prompted with sociodemographic attributes, suggesting limited inherent sociodemographic knowledge. Here, we ask whether LLMs can be trained to be accurate sociodemographic models of annotator variation. Using a curated dataset of five tasks with standardized sociodemographics, we show that models do improve in sociodemographic prompting when trained but that this performance gain is largely due to models learning annotator-specific behaviour rather than sociodemographic patterns. Across all tasks, our results suggest that models learn little meaningful connection between sociodemographics and annotation, raising doubts about the current use of LLMs for simulating sociodemographic variation and behaviour.
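For readers unfamiliar with sociodemographic prompting, the sketch below shows the general shape of the technique: annotator attributes are prepended to the labeling instruction before querying the model. The template and profile fields are illustrative assumptions, not the paper's exact format.

```python
# Illustrative sociodemographic prompt construction; the template and
# attribute fields are hypothetical, not the paper's exact setup.
def build_prompt(text: str, profile: dict[str, str]) -> str:
    persona = ", ".join(f"{k}: {v}" for k, v in profile.items())
    return (
        f"You are an annotator with the following profile: {persona}.\n"
        f"Rate how offensive the following text is on a scale of 1-5.\n"
        f"Text: {text}\nRating:"
    )

prompt = build_prompt(
    "An example social media post.",
    {"age": "45-54", "gender": "woman", "education": "college degree"},
)
print(prompt)
```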
When People are Floods: Analyzing Dehumanizing Metaphors in Immigration Discourse with Large Language Models
arXiv, February 2025
Julia Mendelsohn, Ceren Budak
Metaphor, discussing one concept in terms of another, is abundant in politics and can shape how people understand important issues. We develop a computational approach to measure metaphorical language, focusing on immigration discourse on social media. Grounded in qualitative social science research, we identify seven concepts evoked in immigration discourse (e.g. WATER or VERMIN). We propose and evaluate a novel technique that leverages both word-level and document-level signals to measure metaphor with respect to these concepts. We then study the relationship between metaphor, political ideology, and user engagement in 400K US tweets about immigration. While conservatives tend to use dehumanizing metaphors more than liberals, this effect varies widely across concepts. Moreover, creature-related metaphor is associated with more retweets, especially for liberal authors. Our work highlights the potential for computational methods to complement qualitative approaches in understanding subtle and implicit language in political discourse.
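One simple proxy for a document-level metaphor signal is embedding similarity between a post and seed phrases for a source concept; the sketch below illustrates that intuition with the sentence-transformers library. The seed phrases and post are invented, and this is a simplification of the paper's combined word- and document-level method.

```python
import numpy as np
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

model = SentenceTransformer("all-MiniLM-L6-v2")

# Hypothetical seed phrases for the WATER concept and an example post
water_seeds = ["a flood of people", "waves pouring across the border", "a surge into the country"]
post = "Migrants are flooding into our cities every day."

vecs = model.encode([post] + water_seeds)
post_vec, seed_vecs = vecs[0], vecs[1:]
# Cosine similarity between the post and each seed phrase
sims = seed_vecs @ post_vec / (np.linalg.norm(seed_vecs, axis=1) * np.linalg.norm(post_vec))
print(f"WATER metaphor signal: {sims.max():.2f}")
```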
Tokenization is Sensitive to Language Variation
arXiv, February 2025
Anna Wegmann, Dong Nguyen, David Jurgens
Variation in language is ubiquitous and often systematically linked to regional, social, and contextual factors. Tokenizers split texts into smaller units and might behave differently for less common linguistic forms. This might affect downstream LLM performance differently on two types of tasks: tasks where the model should be robust to language variation (e.g., for semantic tasks like NLI, labels do not depend on whether a text uses British or American spelling) and tasks where the model should be sensitive to language variation (e.g., for form-based tasks like authorship verification, labels depend on whether a text uses British or American spelling). We pre-train BERT base models with the popular Byte-Pair Encoding algorithm to investigate how key algorithmic design choices (fitting corpus, pre-tokenizer, and vocabulary size) impact downstream model performance. We find that the best tokenizer varies across the two task types, with the pre-tokenizer having the biggest impact on performance. Further, we introduce a new approach to estimate tokenizer impact on downstream LLM performance, showing significant improvement over techniques like Rényi efficiency. We encourage more work on language variation and its relation to tokenizers and thus LLM performance.
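The design choices the paper varies can be reproduced in miniature with the Hugging Face tokenizers library: pick a fitting corpus, a pre-tokenizer, and a vocabulary size, then inspect how spelling variants tokenize. The corpus and settings below are toy placeholders, not the paper's configuration.

```python
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

# Toy fitting corpus mixing British and American spellings
corpus = ["the colour of the harbour", "the color of the harbor"] * 500

tok = Tokenizer(models.BPE(unk_token="[UNK]"))
tok.pre_tokenizer = pre_tokenizers.Whitespace()          # one of several choices
trainer = trainers.BpeTrainer(vocab_size=500, special_tokens=["[UNK]"])
tok.train_from_iterator(corpus, trainer)

# Spelling variants may split differently: the kind of sensitivity that
# can help form-based tasks but hurt semantic ones.
for word in ["colour", "color"]:
    print(word, "->", tok.encode(word).tokens)
```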
Neurobiber: Fast and Interpretable Stylistic Feature Extraction
arXiv, February 2025
Kenan Alkiek, Anna Wegmann, Jian Zhu, David Jurgens
Linguistic style is pivotal for understanding how texts convey meaning and fulfill communicative purposes, yet extracting detailed stylistic features at scale remains challenging. We present NEUROBIBER, a transformer-based system for fast, interpretable style profiling built on Biber’s Multidimensional Analysis (MDA). NEUROBIBER predicts 96 Biber-style features from our open-source BIBERPLUS library—a Python toolkit that computes stylistic features and provides integrated analytics (e.g., PCA, factor analysis). Despite being up to 56 times faster than existing open source systems, NEUROBIBER replicates classic MDA insights on the CORE corpus and achieves competitive performance on the PAN 2020 authorship verification task without extensive retraining. Its efficient and interpretable representations readily integrate into downstream NLP pipelines, facilitating large-scale stylometric research, forensic analysis, and real-time text monitoring. All components are made publicly available.
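To make "Biber-style feature extraction" concrete, the toy sketch below counts two classic features with regular expressions and normalizes them per 100 words, as MDA-style analyses typically do. It is a stand-in for illustration only, not the BIBERPLUS or NEUROBIBER API.

```python
import re

# Two classic Biber-style features, approximated crudely with regexes
FEATURES = {
    "first_person_pronouns": re.compile(r"\b(I|me|my|we|us|our)\b", re.I),
    "agentless_passive": re.compile(r"\b(is|are|was|were|been|being)\s+\w+ed\b(?!\s+by\b)", re.I),
}

def style_profile(text: str) -> dict[str, float]:
    n_words = max(1, len(text.split()))
    # Normalize counts per 100 words, as MDA-style analyses typically do
    return {name: 100 * len(rx.findall(text)) / n_words for name, rx in FEATURES.items()}

print(style_profile("We were told the report was finished. I read it twice."))
```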
The Muddy Waters of Modeling Empathy in Language: The Practical Impacts of Theoretical Constructs
arXiv, January 2025
Allison Lahnala, Charles Welch, David Jurgens, Lucie Flek
Conceptual operationalizations of empathy in NLP are varied, with some having specific behaviors and properties, while others are more abstract. How these variations relate to one another and capture properties of empathy observable in text remains unclear. To provide insight into this, we analyze the transfer performance of empathy models adapted to empathy tasks with different theoretical groundings. We study (1) the dimensionality of empathy definitions, (2) the correspondence between the defined dimensions and measured/observed properties, and (3) the conduciveness of the data to represent them, finding that they have a significant impact on performance compared to other transfer-setting features. Characterizing the theoretical grounding of empathy tasks as direct, abstract, or adjacent further indicates that tasks that directly predict specified empathy components have higher transferability. Our work provides empirical evidence for the need for precise and multidimensional empathy operationalizations.
Bridging AI and Science: Implications from a Large-Scale Literature Analysis of AI4Science
arXiv, February 2025
Yutong Xie, Yijun Pan, Hua Xu, Qiaozhu Mei
Artificial Intelligence has proven to be a transformative tool for advancing scientific research across a wide range of disciplines. However, a significant gap still exists between AI and scientific communities, limiting the full potential of AI methods in driving broad scientific discovery. Existing efforts in identifying and bridging this gap have often relied on qualitative examination of small samples of literature, offering a limited perspective on the broader AI4Science landscape. In this work, we present a large-scale analysis of the AI4Science literature, starting by using large language models to identify scientific problems and AI methods in publications from top science and AI venues. Leveraging this new dataset, we quantitatively highlight key disparities between AI methods and scientific problems, revealing substantial opportunities for deeper AI integration across scientific disciplines. Furthermore, we explore the potential and challenges of facilitating collaboration between AI and scientific communities through the lens of link prediction. Our findings and tools aim to promote more impactful interdisciplinary collaborations and accelerate scientific discovery through deeper and broader AI integration. Our code and dataset are available at: https://github.com/charles-pyj/Bridging-AI-and-Science.
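The link-prediction framing can be illustrated on a toy bipartite graph of AI methods and scientific problems: score an unobserved method-problem pair by how similar the method is to methods already applied to that problem. All nodes, edges, and the scoring rule below are invented placeholders, not the paper's dataset or model.

```python
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("graph neural networks", "protein folding"),
    ("graph neural networks", "drug discovery"),
    ("transformers", "protein folding"),
    ("transformers", "weather forecasting"),
])

def predict(G: nx.Graph, method: str, problem: str) -> float:
    """Jaccard similarity between `method`'s problem set and those of
    methods already linked to `problem`; the max is the pair's score."""
    problems_m = set(G[method])
    scores = [
        len(problems_m & set(G[other])) / len(problems_m | set(G[other]))
        for other in G[problem]          # methods already on this problem
    ]
    return max(scores, default=0.0)

print(predict(G, "transformers", "drug discovery"))   # 0.33...
```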
Are Rules Meant to be Broken? Understanding Multilingual Moral Reasoning as a Computational Pipeline with UNIMORAL
arXiv, February 2025
Moral reasoning is a complex cognitive process shaped by individual experiences and cultural contexts, and it presents unique challenges for computational analysis. While natural language processing (NLP) offers promising tools for studying this phenomenon, current research lacks cohesion, employing discordant datasets and tasks that examine isolated aspects of moral reasoning. We bridge this gap with UNIMORAL, a unified dataset integrating psychologically grounded and social-media-derived moral dilemmas annotated with labels for action choices, ethical principles, contributing factors, and consequences, alongside annotators' moral and cultural profiles. Recognizing the cultural relativity of moral reasoning, UNIMORAL spans six languages (Arabic, Chinese, English, Hindi, Russian, and Spanish), capturing diverse socio-cultural contexts. We demonstrate UNIMORAL's utility through benchmark evaluations of three large language models (LLMs) across four tasks: action prediction, moral typology classification, factor attribution analysis, and consequence generation. Key findings reveal that while implicitly embedded moral contexts enhance the moral reasoning capability of LLMs, there remains a critical need for increasingly specialized approaches to further advance moral reasoning in these models.
“It felt more real”: Investigating the User Experience of the MiWaves Personalizing JITAI Pilot Study
arXiv, February 2025
Susobhan Ghosh, Pei-Yao Hung, Lara N. Coughlin, Erin E. Bonar, Yongyi Guo, Inbal Nahum-Shani, Maureen Walton, Mark W. Newman, Susan A. Murphy
Cannabis use among emerging adults is increasing globally, posing significant health risks and creating a need for effective interventions. We present an exploratory analysis of the MiWaves pilot study, a digital intervention aimed at supporting cannabis use reduction among emerging adults (ages 18-25). Our findings indicate the potential of self-monitoring check-ins and trend visualizations in fostering self-awareness and promoting behavioral reflection in participants. MiWaves intervention message timing and frequency were also generally well received by participants. Participants were also asked about the perceived effort of intervention messages with different tasks; our findings suggest that messages with tasks like exploring links and typing in responses are perceived as requiring more effort than messages involving reading and acknowledging. Finally, we discuss the findings and limitations of this study and analysis, and their role in informing future iterations of MiWaves.
Capital and CHI: Technological Capture and How It Structures CHI Research
arXiv, January 2025
This paper advances a theoretical argument about the role capital plays in structuring CHI research. We introduce the concept of technological capture to theorize the mechanism by which this happens. Using this concept, we decompose the effect on CHI into four broad forms: technological capture creates market-creating, market-expanding, market-aligned, and externality-reducing CHI research. We place different CHI subcommunities into these forms, arguing that many of their values are inherited from the capital underlying the field. Rather than a disciplinary- or conference-oriented conceptualization of the field, this work theorizes CHI as tightly coupled with capital via technological capture. The paper concludes by discussing some implications for CHI.
AI Companies Threaten Independent Social Media Research
Tech Policy Press, January 2025
Ryan McGrady, Ethan Zuckerman, Kevin Zheng
The machine learning models that produce text, images, and video based on written prompts require massive amounts of data. Popular models like ChatGPT may have started small, with public domain ebooks, Congressional transcripts, and freely licensed content like Wikipedia, but the competitive demand for more and more data to build better and better models has led these companies to look elsewhere. Their bold decisions to ingest large quantities of copyrighted books, news, photos, and other content, implicitly arguing that transforming media into an AI model is a form of fair use, have triggered many ongoing lawsuits. Given a tech culture that values speed over permission, the next inevitable tranche of media to ingest was social media: a vast treasure trove of the world's user-generated content.
Web Scraping for Research: Legal, Ethical, Institutional, and Scientific Considerations
arXiv, December 2024
Megan A. Brown, Andrew Gruen, Gabe Maldoff, Solomon Messing, Zeve Sanderson, Michael Zimmer
Scientists across disciplines often use data from the internet to conduct research, generating valuable insights about human behavior. However, as generative AI relying on massive text corpora becomes increasingly valuable, platforms have greatly restricted access to data through official channels. As a result, researchers will likely engage in more web scraping to collect data, introducing new challenges and concerns. This paper proposes a comprehensive framework for web scraping in social science research for U.S.-based researchers, examining the legal, ethical, institutional, and scientific factors that we recommend researchers consider when scraping the web. We present an overview of the current regulatory environment impacting when and how researchers can access, collect, store, and share data via scraping. We then provide researchers with recommendations to conduct scraping in a scientifically legitimate and ethical manner. We aim to equip researchers with the relevant information to mitigate risks and maximize the impact of their research amidst this evolving data access landscape.
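One of the paper's practical recommendations, scraping in a way that respects site owners' signals, has a simple mechanical core: check robots.txt, identify yourself, and rate-limit requests. The sketch below shows those mechanics with placeholder URLs and contact details; it covers only the technical side, not the legal and ethical analysis the paper provides.

```python
import time
import urllib.robotparser

import requests

# BASE and the contact address are placeholders
BASE = "https://example.com"
HEADERS = {"User-Agent": "research-scraper/0.1 (contact: researcher@university.edu)"}

robots = urllib.robotparser.RobotFileParser()
robots.set_url(f"{BASE}/robots.txt")
robots.read()

def polite_get(path: str, delay: float = 2.0) -> requests.Response | None:
    url = f"{BASE}{path}"
    if not robots.can_fetch(HEADERS["User-Agent"], url):
        return None                      # path disallowed by robots.txt
    time.sleep(delay)                    # crude rate limit between requests
    return requests.get(url, headers=HEADERS, timeout=10)

response = polite_get("/articles")
```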
Keep up with research from UMSI experts by subscribing to our free research roundup newsletter!