University of Michigan School of Information


Tour-guide robots | Lonely data science: UMSI Research Roundup

Thursday, 02/29/2024

University of Michigan School of Information faculty and PhD students are creating and sharing knowledge that helps build a better world. Here are some of their recent publications.


From Information Poverty to Information Deficit: An Intersectional Analysis of Women of Color’s News Information-Seeking Habits in the Digital Age 

International Journal of Communication, February 2024 

Chelsea Peterson-Salahuddin

Scholars have used information poverty theory for decades to understand when and why marginalized individuals feel disconnected from news and information. However, by focusing on how individuals create information-poor environments, these studies shift attention away from the role of institutions in sustaining informational deficits. This article engages intersectionality as a systemic analysis of power to understand the structural, societal-level dimension of women of color’s news information-seeking habits in the digital age. Through eight focus groups with N = 45 women of color, this study elucidates the dynamic role of intersecting forms of systemic marginalization in informing women of color’s information-seeking habits. This study contributes to our understanding of the role of media institutions in creating and sustaining informational inequities.  

Social Acceptability of Health Behavior Posts on Social Media: An Experiment

American Journal of Preventive Medicine, January 2024 

Ashley N. Bhogal, Veronica J. Berrocal, Daniel M. Romero, Matthew A. Willis, V.G. Vinod Vydiswaran, Tiffany C. Veinot

Introduction: Social media sites like Twitter (now X) are increasingly used to create health behavior metrics for public health surveillance. Yet little is known about social norms that may bias the content of posts about health behaviors. Social norms for posts about four health behaviors (smoking tobacco, drinking alcohol, physical activity, eating food) on Twitter/X were evaluated.

Methods: This was a randomized experiment delivered via web-based survey to adult, English-speaking Twitter/X users in three Michigan, USA, counties from 2020 to 2022 (n=559). Each participant viewed 24 posts presenting experimental manipulations regarding four health behaviors and answered questions about each post's social acceptability. Principal component analysis was used to combine survey responses into one perceived social acceptability measure. Linear mixed models with the Benjamini–Hochberg correction were implemented to test seven study hypotheses in 2023.

Results: Supporting six hypotheses, posts presenting healthier (CI: 0.028, 0.454), less stigmatized behaviors (CI: 0.552, 0.157) were more socially acceptable than posts regarding unhealthier, stigmatized behaviors. Unhealthy (CI: −0.268, −0.109) and stigmatized behavior (CI: −0.261, −0.103) posts were less acceptable for more educated participants. Posts about collocated activities (CI: 0.410, 0.573) and accompanied by expressions of liking (CI: 0.906, 1.11) were more acceptable than activities undertaken alone or disliked. Contrary to one hypothesis, posts reporting unusual activities were less acceptable than usual ones (CI: −0.472, 0.312).

Conclusions: Perceived social acceptability may be associated with the frequency and content of health behavior posts. Those who use posts from Twitter/X and other social media platforms to estimate health behavior prevalence should account for potential estimation biases from perceived social acceptability of posts.

On the Influence and Political Leaning of Overlap between Propaganda Communities

ACM Journal on Computing and Sustainable Societies, January 2024

Anirban Sen, Soumyasis Gun, Soham De, Joyojeet Pal 

Social media offers increasingly diverse mechanisms for the distribution of motivated information, with multiple propaganda communities exhibiting overlaps with respect to user, content, and network characteristics. This has particularly been an issue in the Global South, where recent work has shown various forms of strife related to polarizing speech online. It has also emerged that propagandist information, including fringe positions on issues, can find its way into the mainstream when sufficiently reinforced in tone and frequency, some of which often requires sophisticated organizing and information manipulation. In this study, we analyze the overlap between three events with varying degrees of propagandist messaging by analyzing the content and network characteristics of users leading to overlap between their users and discourse. We find that a significant fraction of users leading to overlap between the three event communities are influential in information spread across the three event networks, and political leaning is one of the factors that helps explain what brings the communities together. Our work sheds light on the importance of network characteristics of users, which can prove to be instrumental in establishing the role of political leaning on overlap between multiple propaganda communities. 

A Physical Activity and Diet Just‐in‐Time Adaptive Intervention to Reduce Blood Pressure: The myBPmyLife Study Rationale and Design

Journal of the American Heart Association, January 2024

Jessica R. Golbus, V. Swetha E. Jeganathan, Rachel Stevens, Weena Ekechukwu, Zahera Farhan, Rocio Contreras, Nikhila Rao, Brad Trumpower, Tanima Basu, Evan Luff, Lesli E. Skolarus, Mark W. Newman, Brahmajee K. Nallamothu and Michael P. Dorsch

BACKGROUND: Smartphone applications and wearable devices are promising mobile health interventions for hypertension self‐management. However, most mobile health interventions fail to use contextual data, potentially diminishing their impact. The myBPmyLife Study is a just‐in‐time adaptive intervention designed to promote personalized self‐management for patients with hypertension.

METHODS AND RESULTS: The study is a 6‐month prospective, randomized‐controlled, remotely administered trial. Participants were recruited from the University of Michigan Health in Ann Arbor, Michigan or the Hamilton Community Health Network, a federally qualified health center network in Flint, Michigan. Participants were randomized to a mobile application with a just‐in‐time adaptive intervention promoting physical activity and lower‐sodium food choices as well as weekly goal setting or usual care. The mobile study application encourages goal attainment through a central visualization displaying participants' progress toward their goals for physical activity and lower‐sodium food choices. Participants in both groups are followed for up to 6 months with a primary end point of change in systolic blood pressure. Exploratory analyses will examine the impact of notifications on step count and self‐reported lower‐sodium food choices. The study launched on December 9, 2021, with 484 participants enrolled as of March 31, 2023. Enrollment of participants was completed on July 3, 2023. After 6 months of follow‐up, it is expected that results will be available in the spring of 2024.

CONCLUSIONS: The myBPmyLife study is an innovative mobile health trial designed to evaluate the effects of a just‐in‐time adaptive intervention focused on improving physical activity and dietary sodium intake on blood pressure in diverse patients with hypertension.

Resource Requirements for Participant Enrollment From a University Health System and a Federally Qualified Health Center Network in a Mobile Health Study: The myBPmyLife Trial

Journal of the American Heart Association, January 2024

Lesli E. Skolarus, Zahera Farhan, Sonali R. Mishra, Nikhila Rao, Kaitlyn Bowie, Sarah Bailey, Michael P. Dorsch, Mark W. Newman, Brahmajee K. Nallamothu and Jessica R. Golbus

Black and low‐income people are more likely to have hypertension but less likely to enroll in blood pressure (BP) control trials than their counterparts. Mobile health (mHealth) interventions have shown promise for improving BP control, although clinical trials to examine intervention efficacy are needed in diverse communities. As barriers to BP control may differ by race and income, diverse representation is needed to ensure participant safety, generalizability of results, and health equity. Barriers to trial participation include mistrust, resource constraints, and the need for frequent in‐person trial visits during working hours. To understand what resource commitment is needed to overcome these barriers, we explored the amount of study team contact required to enroll participants from a university clinic compared with a federally qualified health center (FQHC) clinic for a community‐engaged, remote mHealth trial.

Evaluating Interpretive Research in HCI

IX Interactions, January 2024 

Robert Soden, Austin Toombs, Michaelanne Thomas

Over the past few review cycles at CHI, CSCW, and other HCI venues, there has been a significant increase in the demands that reviewers and associate chairs (ACs) place on methods reporting for qualitative research. Some of this is appropriate, and to be expected as the community continues to grow and our engagement with a broad range of disciplinary and theoretical perspectives matures. However, this has not come without problems. In this article, we highlight what appears to be a growing misunderstanding of interpretive research practices. We discuss how to evaluate their methods and claims, and the vital contributions they make to HCI research and practice.

Persisting through friction: growing a community driven knowledge infrastructure

Archival Science, January 2024

Alexandria J. Rayburn, Ricardo L. Punzalan, Andrea K. Thomer

Many memory institutions hold heritage items belonging to Indigenous peoples. There are current efforts to share knowledge about these heritage items with their communities; one way this is done is through digital access. This paper examines The Great Lakes Research Alliance for the Study of Aboriginal Arts and Cultures (GRASAC), a network of researchers, museum professionals, and community members who maintain a digital platform that aggregates museum and archival research on Anishinaabe, Haudenosaunee, and Huron-Wendat cultures into a centralized database. The database, known as the GRASAC Knowledge Sharing System (GKS), is at a point of infrastructural growth, moving from a password protected system to one that is open to the public. Rooted in qualitative research from semi-structured interviews with the creators, maintainers, and users of the database, we examine the frictions in this expanding knowledge infrastructure (KI), and how they are eased over time. We find the friction within GRASAC resides in three main categories: collaborative friction, data friction, and our novel contribution: systemic friction.

Citizen attitudes toward science and technology, 1957–2020: measurement, stability, and the Trump challenge

Science and Public Policy, January 2024

Jon D Miller, Belén Laspra, Carmelo Polino, Glenn Branch, Mark S Ackerman, Robert T Pennock

In democratic societies around the world, the number of science policy decisions is increasing. One of the fundamental principles of democracy is that citizens should be able to understand the issues before them. Using a 63-year cross-sectional US data set, we use confirmatory factor analysis to construct and test a two-dimensional measure of attitude to science and technology that has been relatively stable over the last six decades. Previous and current research tells us that only one in three US adults is scientifically literate, meaning that trust in scientific expertise is important to many citizens. We find that trust in scientific expertise polarized during the Trump administration. Using the same data set, we construct two structural equation models to determine the factors that predict positive attitudes toward science and technology. Comparing 2016 and 2020, we find that the Trump attacks on science did not reduce public support for science.

Human technology intermediation to reduce cognitive load: understanding healthcare staff members’ practices to facilitate telehealth access in a Federally Qualified Health Center patient population

Journal of the American Medical Informatics Association, January 2024

Alicia K. Williamson, Marcy G. Antonio, Sage Davis, Vaishnav Kameswaran, Tawanna R. Dillahunt, Lorraine R. Buis, Tiffany C. Veinot

Objectives: The aim of this study was to investigate how healthcare staff intermediaries support Federally Qualified Health Center (FQHC) patients’ access to telehealth, how their approaches reflect cognitive load theory (CLT) and determine which approaches FQHC patients find helpful and whether their perceptions suggest cognitive load (CL) reduction. 

Materials and Methods: Semistructured interviews with staff (n = 9) and patients (n = 22) at an FQHC in a Midwestern state. First-cycle coding of interview transcripts was performed inductively to identify helping processes and participants’ evaluations of them. Next, these inductive codes were mapped onto deductive codes from CLT.

Results: Staff intermediaries used 4 approaches to support access to, and usage of, video visits and patient portals for FQHC patients: (1) shielding patients from cognitive overload; (2) drawing from long-term memory; (3) supporting the development of schemas; and (4) reducing the extraneous load of negative emotions. These approaches could contribute to CL reduction and each was viewed as helpful to at least some patients. For patients, there were beneficial impacts on learning, emotions, and perceptions about the self and technology. Intermediation also resulted in successful visits despite challenges. 

Discussion: Staff intermediaries made telehealth work for FQHC patients, and emotional support was crucial. Without prior training, staff discovered approaches that aligned with CLT and helped patients access technologies. Future healthcare intermediary interventions may benefit from the application of CLT in their design. Staff providing brief explanations about technical problems and solutions might help patients learn about technologies informally over time. 

Conclusion: CLT can help with developing intermediary approaches for facilitating telehealth access. 

In the Academy, Data Science is Lonely: Barriers to adopting data science methods for scientific research

Harvard Data Science Review, February 2024

Elle O’Brien, Jordan Mick

Data science has been heralded as a transformative family of methods for scientific discovery. Despite this excitement, putting these methods into practice in scientific research has proven challenging. We conducted a qualitative interview study of 25 researchers at the University of Michigan, all scientists who currently work outside of data science (in fields such as astronomy, education, chemistry, and political science) and wish to adopt data science methods as part of their research program. Semi-structured interviews explored the barriers they faced and strategies scientists used to persevere. These scientists quickly identified that they lacked the expertise to confidently implement and interpret new methods. For most, independent study was unsuccessful, owing to limited time, missing foundational skills, and difficulty navigating the marketplace of educational data science resources. Overwhelmingly, participants reported isolation in their endeavors and a desire for a greater community. Many sought to bootstrap a community on their own, with mixed results. Based on their narratives, we provide preliminary recommendations for academic departments, training programs, campuswide data science initiatives, and universities to build supportive communities of practice that cultivate expertise. These community relationships may be key to growing the research capacity of scientific institutions. 

Pre-prints, Working Papers, and Reports

Assessment of Discoverability Metrics for Harmful Content

University of Michigan Center for Social Media Responsibility, December 2023 

Paul Resnick, Siqi Wu, James Park

Many stakeholders are interested in tracking prevalence metrics of harmful content on social media platforms. TrustLab tracks several metrics and has produced a report for the European Commission’s Code of Practice. One of TrustLab’s prevalence metrics, which they refer to as “discoverability,” is calculated by simulating a set of user searches, classifying the results as harmful or not, and reporting the proportion of harmful results. At TrustLab’s request, the University of Michigan Center for Social Media Responsibility (CSMR) has identified key concerns and considerations in producing and interpreting any discoverability metric, and some possible approaches for addressing these concerns. There is a family of possible discoverability metrics, each based on alternative design choices. This report analyzes the impacts of design alternatives on two key principles from measurement theory [1], validity and reliability, alongside a third principle, “robustness to strategic actors.”

● Validity: the accuracy of the metric – whether it correctly measures the thing that it is supposed to measure. For discoverability metrics in particular, a key element of validity is comparability – the extent to which comparisons across platforms, countries, and time periods are meaningful. For example, is harmful content more prevalent on X or YouTube? Is it more prevalent in Slovakia or Poland? Did the prevalence decline on YouTube in Poland since last quarter? 

● Reliability: the consistency of the metric – a measure could be accurate on average but have a high degree of variability between repeated individual measurements. In that case, any particular measurement would have to be treated as unreliable. 

● Robustness to strategic actors: whether, for example, a platform could manipulate or game the discoverability metric without changing what real users experience on the platform.
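As a rough illustration of the calculation described above (simulate searches, classify each result, report the harmful share), here is a minimal Python sketch. It is not TrustLab's or CSMR's implementation: the function name `discoverability`, the toy index, the stand-in search function, and the stand-in classifier are all hypothetical and exist only to show the shape of the computation.

```python
from typing import Callable, Iterable, List


def discoverability(
    queries: Iterable[str],
    search: Callable[[str], List[str]],
    is_harmful: Callable[[str], bool],
) -> float:
    """Return the proportion of simulated search results classified as harmful."""
    total = 0
    harmful = 0
    for query in queries:
        for result in search(query):
            total += 1
            if is_harmful(result):
                harmful += 1
    if total == 0:
        raise ValueError("no search results to score")
    return harmful / total


# Hypothetical toy data standing in for a real platform search and classifier.
FAKE_INDEX = {
    "query a": ["ok-1", "bad-1", "ok-2", "ok-3"],
    "query b": ["bad-2", "ok-4"],
}


def fake_search(query: str) -> List[str]:
    """Assumed stand-in for a platform search; returns canned results."""
    return FAKE_INDEX.get(query, [])


def fake_classifier(result: str) -> bool:
    """Assumed stand-in for a harm classifier; flags results named 'bad-*'."""
    return result.startswith("bad")


score = discoverability(FAKE_INDEX.keys(), fake_search, fake_classifier)
print(f"discoverability = {score:.3f}")
```

Even in this toy form, the design choices the report highlights are visible: the result depends on which queries are simulated, how many results per query are kept, and how the classifier draws the line, so two metrics built with different choices may not be directly comparable.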

Landscape of Generative AI in Global News: Topics, Sentiments, and Spatiotemporal Analysis

arXiv, January 2024

Lu Xian, Lingyao Li, Yiwei Xu, Ben Zefeng Zhang, Libby Hemphill

Generative AI has exhibited considerable potential to transform various industries and public life. The role of news media coverage of generative AI is pivotal in shaping public perceptions and judgments about this significant technological innovation. This paper provides in-depth analysis and rich insights into the temporal and spatial distribution of topics, sentiment, and substantive themes within global news coverage focusing on the latest emerging technology – generative AI. We collected a comprehensive dataset of news articles (January 2018 to November 2023, N = 24,827). For topic modeling, we employed the BERTopic technique and combined it with qualitative coding to identify semantic themes. Subsequently, sentiment analysis was conducted using the RoBERTa-base model. Analysis of temporal patterns in the data reveals notable variability in coverage across key topics – business, corporate technological development, regulation and security, and education – with spikes in articles coinciding with major AI developments and policy discussions. Sentiment analysis shows a predominantly neutral to positive media stance, with the business-related articles exhibiting more positive sentiment, while regulation and security articles receive a reserved, neutral to negative sentiment. Our study offers a valuable framework to investigate global news discourse and evaluate news attitudes and themes related to emerging technologies.

PRewrite: Prompt Rewriting with Reinforcement Learning

arXiv, January 2024 

Weize Kong, Spurthi Amba Hombaiah, Mingyang Zhang, Qiaozhu Mei, Michael Bendersky

Prompt engineering is critical for the development of LLM-based applications. However, it is usually done manually in a “trial and error” fashion. This manual procedure can be time-consuming and ineffective, and the generated prompts are, in a lot of cases, sub-optimal. Even for the prompts which seemingly work well, there is always a lingering question: can the prompts be made better with further modifications?

To address these questions, in this paper, we investigate prompt engineering automation. We consider a specific use case scenario in which developers/users have drafted initial prompts, but lack the time/expertise to optimize them. We propose PRewrite, an automated tool to rewrite these drafts and to generate highly effective new prompts. PRewrite is based on the Reinforcement Learning (RL) framework which allows for end-to-end optimization and our design allows the RL search to happen in a large action space. The automated tool leverages manually crafted prompts as starting points which makes the rewriting procedure more guided and efficient. The generated prompts are human readable, and self explanatory, unlike some of those in previous works. We conducted extensive experiments on diverse datasets and found that the prompts generated with this new method not only outperform professionally crafted prompts, but also prompts generated with other previously proposed methods.

Bridging the Preference Gap between Retrievers and LLMs

arXiv, January 2024 

Zixuan Ke, Weize Kong, Cheng Li, Mingyang Zhang, Qiaozhu Mei, Michael Bendersky

Large Language Models (LLMs) have demonstrated superior results across a wide range of tasks, while retrieval has long been established as an effective means of obtaining task-relevant information for humans. Retrieval-augmented Generation (RAG) methods are known for their effectiveness in knowledge-intensive tasks by locating relevant information and placing it within the context window of the LLM. However, the relationship between retrievers and LLMs is still under-investigated. Most existing work treats the retriever and the LLM as independent components and leaves a gap between retrieving human-friendly information and assembling an LLM-friendly context. In this work, we examine a novel bridge model, validate the ranking and selection assumptions in retrievers in the context of RAG, and propose a training framework that chains together supervised and reinforcement learning to learn a bridge model. Empirical results demonstrate the effectiveness of our method in both question-answering and personalized generation tasks.

Toward Personalized Tour-Guide Robot: Adaptive Content Planner based on Visitor’s Engagement

19th Annual ACM/IEEE International Conference on Human Robot Interaction, March 2024

Yanran Lin, Wonse Jo, Arsha Ali, Lionel P. Robert Jr, Dawn M. Tilbury

In the evolving landscape of human-robot interactions, tour-guide robots are increasingly being integrated into various settings. However, the existing paradigm of these robots relies heavily on prerecorded content, which limits effective engagement with visitors. We propose to address this issue of visitor engagement by transforming tour-guide robots into dynamic, adaptable companions that cater to individual visitor needs and preferences. Our primary objective is to enhance visitor engagement during tours through a robotic system capable of assessing and reacting to visitor preference and engagement. Leveraging this data, the system can calibrate and adapt the tour-guide robot’s content in real-time to meet individual visitor preferences. Through this research, we aim to enhance the tour-guide robots’ impact in delivering engaging and personalized visitor experiences by providing an adaptive tour-guide robot solution that can learn from humans’ preferences and adapt its behaviors by itself.

Designing Healthcare Robots at Home for Older Adults: A Kano Model Perspective

19th Annual ACM/IEEE International Conference on Human Robot Interaction, March 2024

Qiaoning Zhang, Feng Zhou, Lionel P. Robert Jr, X. Jessie Yang

Healthcare robots at home are increasingly essential for promoting the independence of older adults, yet their widespread acceptance is hindered by a lack of clarity regarding optimal design features. To address this, this study employs the Kano model to systematically identify and prioritize the features of healthcare robots that most significantly influence user satisfaction and acceptance among older adults. We conducted a survey study with 253 U.S. older adults to evaluate a variety of robot features. The results highlight design features that markedly affect user satisfaction and acceptance. ‘Medication Management’ and ‘Managing Illness and Monitoring Health’ are identified as one-dimensional features, whereas ‘Animal-like Appearance’ is a less favored reverse feature, potentially diminishing satisfaction. ‘Housework’, along with seven other features, is recognized as attractive, with sixteen features deemed indifferent.

How AI Ideas Affect the Creativity, Diversity, and Evolution of Human Ideas: Evidence From a Large, Dynamic Experiment

arXiv, January 2024 

Joshua Ashkinaze, Julia Mendelsohn, Li Qiwei, Ceren Budak, Eric Gilbert

Exposure to large language model output is rapidly increasing. How will seeing AI-generated ideas affect human ideas? We conducted an experiment (800+ participants, 40+ countries) where participants viewed creative ideas that were from ChatGPT or prior experimental participants and then brainstormed their own idea. We varied the number of AI-generated examples (none, low, or high exposure) and if the examples were labeled as “AI” (disclosure). Our dynamic experiment design—ideas from prior participants in an experimental condition are used as stimuli for future participants in the same experimental condition—mimics the interdependent process of cultural creation: creative ideas are built upon prior ideas. Hence, we capture the compounding effects of having LLMs “in the culture loop”. We find that high AI exposure (but not low AI exposure) did not affect the creativity of individual ideas but did increase the average amount and rate of change of collective idea diversity. AI made ideas different, not better. There were no main effects of disclosure. We also found that self-reported creative people were less influenced by knowing an idea was from AI, and that participants were more likely to knowingly adopt AI ideas when the task was difficult. Our findings suggest that introducing AI ideas into society may increase collective diversity but not individual creativity.

The Dynamics of (Not) Unfollowing Misinformation Spreaders

arXiv, January 2024 

Joshua Ashkinaze, Eric Gilbert, Ceren Budak

Many studies explore how people “come into” misinformation exposure. But much less is known about how people “come out of” misinformation exposure. Do people organically sever ties to misinformation spreaders? And what predicts doing so? Over six months, we tracked the frequency and predictors of ∼1M followers unfollowing ∼5K health misinformation spreaders on Twitter. We found that misinformation ties are persistent. Monthly unfollowing rates are just 0.52%. Users are also 31% more likely to unfollow non-misinformation spreaders than they are to unfollow misinformation spreaders. Although generally infrequent, the factors most associated with unfollowing misinformation spreaders are (1) redundancy and (2) ideology. First, users initially following many spreaders, or who follow spreaders that tweet often, are most likely to unfollow later. Second, liberals are more likely to unfollow than conservatives. Overall, we observe strong persistence of misinformation ties. The fact that users rarely unfollow misinformation spreaders suggests a need for external nudges and the importance of preventing exposure from arising in the first place.

Does the Use of Unusual Combinations of Datasets Contribute to Greater Scientific Impact?

arXiv, February 2024

Yulin Yu, Daniel M. Romero

Scientific datasets play a crucial role in contemporary data-driven research, as they allow for the progress of science by facilitating the discovery of new patterns and phenomena. This mounting demand for empirical research raises important questions on how strategic data utilization in research projects can stimulate scientific advancement. In this study, we examine the hypothesis inspired by the recombination theory, which suggests that innovative combinations of existing knowledge, including the use of unusual combinations of datasets, can lead to high-impact discoveries. We investigate the scientific outcomes of such atypical data combinations in more than 30,000 publications that leverage over 6,000 datasets curated within one of the largest social science databases, ICPSR. This study offers four important insights. First, combining datasets, particularly those infrequently paired, significantly contributes to both scientific and broader impacts (e.g., dissemination to the general public). Second, the combination of datasets with atypically combined topics has the opposite effect – the use of such data is associated with fewer citations. Third, younger and less experienced research teams tend to use atypical combinations of datasets in research at a higher frequency than their older and more experienced counterparts. Lastly, despite the benefits of data combination, papers that amalgamate data remain infrequent. This finding suggests that the unconventional combination of datasets is an under-utilized but powerful strategy correlated with the scientific and broader impact of scientific discoveries.

Form-From: A Design Space of Social Media Systems

arXiv, February 2024

Amy X. Zhang, Michael S. Bernstein, David R. Karger, Mark S. Ackerman

Social media systems are as varied as they are pervasive. They have been almost universally adopted for a broad range of purposes including work, entertainment, activism, and decision making. As a result, they have also diversified, with many distinct designs differing in content type, organization, delivery mechanism, access control, and many other dimensions. In this work, we aim to characterize and then distill a concise design space of social media systems that can help us understand similarities and differences, recognize potential consequences of design choice, and identify spaces for innovation. Our model, which we call Form-From, characterizes social media based on (1) the form of the content, either threaded or flat, and (2) from where or from whom one might receive content, ranging from spaces to networks to the commons. We derive Form-From inductively from a larger set of 62 dimensions organized into 10 categories. To demonstrate the utility of our model, we trace the history of social media systems as they traverse the Form-From space over time, and we identify common design patterns within cells of the model.