University of Michigan School of Information
Service Robots in Restaurants | Social Media and Job Market Success: UMSI Research Roundup

Wednesday, 07/03/2024
By Noor Hindi

University of Michigan School of Information faculty and PhD students are creating and sharing knowledge that helps build a better world. Here are some of their recent publications.
Publications
Text Messages to Promote Physical Activity in Patients With Cardiovascular Disease: A Micro-Randomized Trial of a Just-In-Time Adaptive Intervention
Circulation: Cardiovascular Quality and Outcomes, June 2024
Jessica R. Golbus, Jieru Shi, Kashvi Gupta, Rachel Stevens, V. Swetha E. Jeganathan, Evan Luff, Thomas Boyden, Bhramar Mukherjee, Sarah Kohnstamm, Vlad Taralunga, Vik Kheterpal, Sachin Kheterpal, Kenneth Resnicow, Susan Murphy, Walter Dempsey, Predrag Klasnja, Brahmajee K. Nallamothu
Background: Text messages may enhance physical activity levels in patients with cardiovascular disease, including those enrolled in cardiac rehabilitation. However, the independent and long-term effects of text messages remain uncertain.
Methods: The VALENTINE study (Virtual Application-supported Environment to Increase Exercise) was a micro-randomized trial that delivered text messages through a smartwatch (Apple Watch or Fitbit Versa) to participants initiating cardiac rehabilitation. Participants were randomized 4× per day over 6 months to receive no text message or a message encouraging low-level physical activity. Text messages were tailored on contextual factors (e.g., weather). Our primary outcome was step count in the 60 minutes following a text message, and we used a centered and weighted least squares mean method to estimate causal effects. Given potential measurement differences between devices, determined a priori, data were assessed separately for Apple Watch and Fitbit Versa users over 3 time periods corresponding to the initiation (0–30 days), maintenance (31–120 days), and completion (121–182 days) of cardiac rehabilitation.
Results: One hundred eight participants were included, with 70,552 randomizations over 6 months; mean age was 59.5 (SD, 10.7) years, with 36 (32.4%) female and 68 (63.0%) Apple Watch participants. For Apple Watch participants, text messages were associated with a trend toward a 10% increase in step count in the 60 minutes following a message during days 1 to 30 (95% CI, −1% to +20%), with no effect from days 31 to 120 (+1% [95% CI, −4% to +5%]) and a significant 6% increase during days 121 to 182 (95% CI, +0% to +11%). For Fitbit users, text messages significantly increased step count by 17% (95% CI, +7% to +28%) in the 60 minutes following a message in the first 30 days of the study, with no effect subsequently.
Conclusions: In patients undergoing cardiac rehabilitation, contextually tailored text messages may increase physical activity, but this effect varies over time and by device.
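For readers curious what the trial's "centered and weighted least squares" analysis looks like in practice, here is a minimal sketch of that style of estimator in Python. It is a simplification, not the authors' exact model; the column names, the fixed reference randomization probability, and the log transform of step count are all illustrative assumptions.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def proximal_effect(df: pd.DataFrame, p_ref: float = 0.5) -> float:
    """Sketch of a weighted, centered least squares proximal-effect estimate.

    Hypothetical columns: participant_id, steps_60min (post-decision-point
    step count), treated (1 if a message was sent), prob (randomization
    probability at that decision point).
    """
    y = np.log(df["steps_60min"] + 1)                # log step count; +1 handles zeros
    a_centered = df["treated"] - p_ref               # center the treatment indicator
    X = sm.add_constant(a_centered.rename("a_centered"))
    # Weights rebalance decision points toward the common reference probability
    w = np.where(df["treated"] == 1,
                 p_ref / df["prob"],
                 (1 - p_ref) / (1 - df["prob"]))
    fit = sm.WLS(y, X, weights=w).fit(
        cov_type="cluster", cov_kwds={"groups": df["participant_id"]})
    return fit.params["a_centered"]                  # estimated proximal log-effect
```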
Causal Inference for Human-Language Model Collaboration
North American Chapter of the Association for Computational Linguistics, March 2024
Bohan Zhang, Yixin Wang, Paramveer S. Dhillon
In this paper, we examine the collaborative dynamics between humans and language models (LMs), where the interactions typically involve LMs proposing text segments and humans editing or responding to these proposals. Productive engagement with LMs in such scenarios necessitates that humans discern effective text-based interaction strategies, such as editing and response styles, from historical human-LM interactions. This objective is inherently causal, driven by the counterfactual ‘what-if’ question: how would the outcome of collaboration change if humans employed a different text editing/refinement strategy? A key challenge in answering this causal inference question is formulating an appropriate causal estimand: the conventional average treatment effect (ATE) estimand is inapplicable to text-based treatments due to their high dimensionality. To address this concern, we introduce a new causal estimand—Incremental Stylistic Effect (ISE), which characterizes the average impact of infinitesimally shifting a text towards a specific style, such as increasing formality. We establish the conditions for the non-parametric identification of ISE. Building on this, we develop CausalCollab, an algorithm designed to estimate the ISE of various interaction strategies in dynamic human-LM collaborations. Our empirical investigations across three distinct human-LM collaboration scenarios reveal that CausalCollab effectively reduces confounding and significantly improves counterfactual estimation over a set of competitive baselines.
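The Incremental Stylistic Effect can be pictured as the average change in outcome when every text's style score is nudged slightly while confounders are held fixed. The sketch below is a toy finite-difference plug-in estimate under that reading, not the paper's CausalCollab algorithm; the gradient-boosted outcome model and the step size `delta` are assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def incremental_stylistic_effect(style, covariates, outcome, delta=0.05):
    """style: (n,) style scores (e.g., formality); covariates: (n, d)
    confounders; outcome: (n,) collaboration outcomes."""
    X = np.column_stack([style, covariates])
    model = GradientBoostingRegressor().fit(X, outcome)
    X_shifted = X.copy()
    X_shifted[:, 0] += delta                 # nudge every text's style slightly
    # Average finite-difference effect per unit of style shift
    return float((model.predict(X_shifted) - model.predict(X)).mean() / delta)
```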
Service Robots in Restaurants: Anticipated Changes in Employee Feedback, Performance, Satisfaction, and Turnover Intention
Association for Information Systems, August 2024
Samia Cornelius Bhatti, Aarushi Jain, Lionel Peter Robert
Service robots are increasingly integrated into workplaces such as restaurants, offering numerous benefits but also posing challenges. Employees often express skepticism and apprehension about the potential changes in their work resulting from the presence of robots. Consequently, both industry practitioners and researchers have emphasized the importance of investigating how employees might perceive and react to working alongside robots. This paper addresses this concern by presenting findings from a qualitative study employing thematic content analysis based on the Job Characteristics Model (JCM) with 220 restaurant employees. The results reinforce the associations posited by the JCM but also find an inverse relationship between feedback and knowledge of results, and between knowledge of results and satisfaction. They also provide insights into a direct association between knowledge of results and turnover intention.
“Why is Everything in the Cloud?”: Co-Designing Visual Cues Representing Data Processes with Children
IDC ‘24: Proceedings of the 23rd Annual ACM Interaction Design and Children Conference, June 2024
Kaiwen Sun, Ritesh Kanchi, Frances Marie Tabio Ello, Li-Neishin Co, Mandy Wu, Susan A. Gelman, Jenny Radesky, Florian Schaub, Jason Yip
Children struggle to understand hidden data processes (e.g., inferences) and related privacy implications (e.g., profiling). Children use visual cues to reason about technical processes in digital products, sometimes drawing inaccurate conclusions when interface cues are vague or absent. We conducted five consecutive participatory design sessions with children (ages 7–12), probing their perceptions of visual cues and data processes, and iteratively designed and reviewed new visual cues with them. We found that children conceptualized data collection concretely, lacked awareness of its pervasive nature, expressed limited understanding of data inferences, and recognized certain visual cues (e.g., loading, cloud) but were unable to explain their meanings. We designed visual cues in “symbolic” and “concrete” styles using icons and metaphors, which helped children understand data flows. Our work contributes to developing comprehensible visual cues for children to support their data and privacy literacy. We discuss design and policy implications of our findings.
Making a Metaphor Sandwich: Analyzing Children’s use of Metaphor During Tabletop Telepresence Robot Supported Participatory Design
IDC ‘24: Proceedings of the 23rd Annual ACM Interaction Design and Children Conference, June 2024
Casey Lee Hunt, Kaiwen Sun, Kaitlyn Tseng, Priyanka Balasubramaniyam, Allison Druin, Amanda Huynh, Daniel Leithinger, Jason Yip
Strengthening telepresence for children can improve their educational and socio-emotional outcomes. Meanwhile, understanding how children conceptualize new technologies helps designers create engaging and intuitive interactions for them. In this pictorial, we explore children’s relationship to a promising and emerging approach to telepresence: tabletop robots. We analyze metaphors children used to describe a tabletop telepresence robot platform during two years (~100 hours) of online participatory design with this technology. We use illustrations to convey and contextualize how children imagined the tabletop telepresence robots. We find that children used three categories of metaphor in their imaginings: (1) robot capabilities (magic/fragile), (2) robot roles (competitive/play-acting/creative), and (3) robot agency (remote controlled/autonomous). We discuss these metaphors in the context of existing child-robot interaction, tangible interaction, and telepresence literature. Finally, we contribute the theoretical framework of a “metaphor sandwich” to describe children’s use of mixed metaphors during high engagement with the tabletop telepresence robots.
Misunderstanding the harms of online misinformation
Nature, June 2024
Ceren Budak, Brendan Nyhan, David M. Rothschild, Emily Thorson, Duncan J. Watts
The controversy over online misinformation and social media has opened a gap between public discourse and scientific research. Public intellectuals and journalists frequently make sweeping claims about the effects of exposure to false content online that are inconsistent with much of the current empirical evidence. Here we identify three common misperceptions: that average exposure to problematic content is high, that algorithms are largely responsible for this exposure and that social media is a primary cause of broader social problems such as polarization. In our review of behavioral science research on online misinformation, we document a pattern of low exposure to false and inflammatory content that is concentrated among a narrow fringe with strong motivations to seek out such information. In response, we recommend holding platforms accountable for facilitating exposure to false and extreme content in the tails of the distribution, where consumption is highest and the risk of real-world harm is greatest. We also call for increased platform transparency, including collaborations with outside researchers, to better evaluate the effects of online misinformation and the most effective responses to it. Taking these steps is especially important outside the USA and Western Europe, where research and data are scant and harms may be more severe.
Filter Bubble or Homogenization? Disentangling the Long-Term Effects of Recommendations on User Consumption Patterns
WWW ‘24: Proceedings of the ACM on Web Conference, May 2024
Md Sanzeed Anwar, Grant Schoenebeck, Paramveer S. Dhillon
Recommendation algorithms play a pivotal role in shaping our media choices, which makes it crucial to comprehend their long-term impact on user behavior. These algorithms are often linked to two critical outcomes: homogenization, wherein users consume similar content despite disparate underlying preferences, and the filter bubble effect, wherein individuals with differing preferences only consume content aligned with their preferences (without much overlap with other users). Prior research assumes a trade-off between homogenization and filter bubble effects and then shows that personalized recommendations mitigate filter bubbles by fostering homogenization. However, this assumed trade-off prevents prior work from developing a more nuanced view of how recommendation systems may independently impact homogenization and filter bubble effects. We develop a more refined definition of homogenization and the filter bubble effect by decomposing them into two key metrics: how different the average consumption is between users (inter-user diversity) and how varied an individual’s consumption is (intra-user diversity). We then use a novel agent-based simulation framework that enables a holistic view of the impact of recommendation systems on homogenization and filter bubble effects. Our simulations show that traditional recommendation algorithms (based on past behavior) mainly reduce filter bubbles by affecting inter-user diversity without significantly impacting intra-user diversity. Building on these findings, we introduce two new recommendation algorithms that take a more nuanced approach by accounting for both types of diversity.
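The two diversity metrics are straightforward to compute given item embeddings for each user's consumption history. The sketch below shows one plausible operationalization; the distance measure and averaging are assumptions, not necessarily the paper's exact formulas.

```python
import numpy as np

def intra_user_diversity(histories):
    """Average pairwise distance among the items each user consumed.
    histories: list of (k_i, d) arrays of consumed-item embeddings."""
    scores = []
    for items in histories:
        k = len(items)
        if k < 2:
            continue                                  # need at least one pair
        dists = np.linalg.norm(items[:, None, :] - items[None, :, :], axis=-1)
        scores.append(dists.sum() / (k * (k - 1)))    # mean over off-diagonal pairs
    return float(np.mean(scores))

def inter_user_diversity(histories):
    """Average pairwise distance between users' mean consumption vectors."""
    centroids = np.stack([items.mean(axis=0) for items in histories])
    n = len(centroids)
    dists = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=-1)
    return float(dists.sum() / (n * (n - 1)))
```

Under these definitions, a filter bubble shows up as high inter-user diversity with low intra-user diversity, while homogenization shows up as low inter-user diversity.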
Characteristics of and Variation in Suicide Mortality Related to Retirement During the Great Recession: Perspectives From the National Violent Death Reporting System
The Gerontologist, May 2024
Aparna Ananthasubramaniam, David Jurgens, Eskira Kahsay, Briana Mezuk
Background and Objectives: Suicide rates typically increase during recessions. However, few studies have explored how recessions affect risk among older adults nearing retirement. This study used a large suicide mortality registry to characterize and quantify suicide related to retirement during the Great Recession (GR).
Research Design and Methods: Data come from the National Violent Death Reporting System (NVDRS, 2004–2017; N = 53,298 suicide deaths at age ≥50). We analyzed the text narratives (i.e., descriptions of the circumstances most salient to each suicide) of these decedents using natural language processing (NLP) to identify cases that were “retirement-related” (RR; e.g., anticipating retirement, being unable to retire, or having recently retired). We used time-series analysis to quantify variation in RR suicide over the GR and compared these trends with those among retirees (i.e., decedents whose occupation was “retired”) and all decedents aged ≥50. We used content and network analysis to characterize themes represented in the narratives.
Results: There were 878 RR cases (1.6% of suicides aged ≥50) identified by the NLP model; only 52% of these cases were among retirees. RR cases were younger (62 vs 75 years) and more educated (41.5% vs 24.5% college degree) than retirees. The rate of RR suicide was positively associated with indicators of the GR (e.g., short-term unemployment R2 = 0.70, p = .024), but economic indicators were not correlated with the suicide rate among retirees or older adults in general. Economic issues were more central to the narratives of RR cases during the GR compared to other periods.
Discussion and Implications: Recessions shape suicide risk related to retirement transitions.
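The abstract does not detail the NLP model, but a standard supervised text classifier illustrates the general approach of flagging retirement-related narratives. Everything below (features, model choice, hyperparameters) is an illustrative assumption, not the study's pipeline.

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def train_rr_classifier(narratives, labels):
    """narratives: list of NVDRS-style text narratives;
    labels: 1 if hand-coded as retirement-related (RR), else 0."""
    clf = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2), min_df=5, stop_words="english"),
        LogisticRegression(max_iter=1000, class_weight="balanced"),
    )
    return clf.fit(narratives, labels)

# rr_model = train_rr_classifier(train_texts, train_labels)
# flags = rr_model.predict(all_narratives)   # 1 = retirement-related case
```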
Pre-prints, Working Papers, Articles, Reports, Workshops and Talks
ROB 204: Introduction to Human-Robot Systems at the University of Michigan, Ann Arbor
arXiv, May 2024
Leia Stirling, Joseph Montgomery, Mark Draelos, Christoforos Mavrogiannis, Lionel P. Robert Jr, Odest Chadwicke Jenkins
The University of Michigan Robotics program focuses on the study of embodied intelligence that must sense, reason, act, and work with people to improve quality of life and productivity equitably across society. ROB 204, part of the core curriculum towards the undergraduate degree in Robotics, introduces students to topics that enable conceptually designing a robotic system to address users’ needs from a sociotechnical context. Students are introduced to human-robot interaction (HRI) concepts and the process for socially engaged design with a Learn-Reinforce-Integrate approach. In this paper, we discuss the course topics and our teaching methodology, and provide recommendations for delivering this material. Overall, students leave the course with a new understanding and appreciation for how human capabilities can inform requirements for a robotics system, how humans can interact with a robot, and how to assess the usability of robotic systems.
Value Alignment and Trust in Human-Robot Interaction: Insights from Simulation and User Study
arXiv, May 2024
Shreyas Bhat, Joseph B. Lyons, Cong Shi, X. Jessie Yang
With the advent of AI technologies, humans and robots are increasingly teaming up to perform collaborative tasks. To enable smooth and effective collaboration, the topic of value alignment (operationalized herein as the degree of dynamic goal alignment within a task) between the robot and the human is gaining increasing research attention. Prior literature on value alignment makes an inherent assumption that aligning the values of the robot with those of the human benefits the team. This assumption, however, has not been empirically verified. Moreover, prior literature does not account for the human’s trust in the robot when analyzing human-robot value alignment. Thus, a research gap needs to be bridged by answering two questions: How does alignment of values affect trust? Is it always beneficial to align the robot’s values with those of the human? We present a simulation study and a human-subject study to answer these questions. Results from the simulation study show that alignment of values is important for trust when the overall risk level of the task is high. We also present an adaptive strategy for the robot that uses Inverse Reinforcement Learning (IRL) to match the values of the robot with those of the human during interaction. Our simulations suggest that such an adaptive strategy is able to maintain trust across the full spectrum of human values. We also present results from an empirical study that validate these findings from simulation. Results indicate that real-time personalized value alignment is beneficial to trust and perceived performance by the human when the robot does not have a good prior on the human’s values.
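The adaptive strategy's core loop, estimating the human's reward weights from observed choices, can be sketched with a softmax (maximum-entropy) choice model. This toy update is one common IRL-flavored approach, not necessarily the paper's exact method.

```python
import numpy as np

def update_value_estimate(w, options, chosen_idx, lr=0.1):
    """w: (d,) current estimate of the human's reward weights;
    options: (m, d) feature vectors of the candidate actions;
    chosen_idx: index of the action the human actually picked."""
    logits = options @ w
    p = np.exp(logits - logits.max())
    p /= p.sum()                              # softmax choice probabilities
    # Gradient ascent on the log-likelihood of the observed choice:
    # features of the chosen action minus their expectation under the model.
    grad = options[chosen_idx] - p @ options
    return w + lr * grad
```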
Nudging Users to Change Breached Passwords Using the Protection Motivation Theory
arXiv, May 2024
Yixin Zou, Khue Le, Peter Mayer, Alessandro Acquisti, Adam J. Aviv, Florian Schaub
We draw on the Protection Motivation Theory (PMT) to design nudges that encourage users to change breached passwords. Our online experiment (n = 1,386) compared the effectiveness of a threat appeal (highlighting negative consequences of breached passwords) and a coping appeal (providing instructions on how to change the breached password) in a 2×2 factorial design. Compared to the control condition, participants receiving the threat appeal were more likely to intend to change their passwords, and participants receiving both appeals were more likely to end up changing their passwords; both comparisons have a small effect size. Participants’ password change behaviors are further associated with other factors such as their security attitudes (SA-6) and time passed since the breach, suggesting that PMT-based nudges are useful but insufficient to fully motivate users to change their passwords. Our study contributes to PMT’s application in security research and provides concrete design implications for improving compromised credential notifications.
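A 2×2 factorial like this one is commonly analyzed with a logistic regression that includes both appeals and their interaction. A minimal sketch follows, with hypothetical column names, since the paper's exact analysis is not described here.

```python
import statsmodels.formula.api as smf

def analyze_factorial(df):
    """df columns (hypothetical): changed_password (0/1),
    threat (0/1 appeal shown), coping (0/1 appeal shown)."""
    model = smf.logit("changed_password ~ threat * coping", data=df).fit()
    return model.summary()   # main effects of each appeal plus their interaction
```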
Social Media and Job Market Success: A Field Experiment on Twitter
SSRN, May 2024
Jingyi Qiu, Yan Chen, Alain Cohn, Alvin Roth
We conducted a field experiment on Twitter to examine the impact of social media promotion on job market outcomes in economics. Half of the 519 job market papers tweeted from our research account were randomly assigned to be quote-tweeted by prominent economists. Papers assigned to be quote-tweeted received 442% more views and 303% more likes. Moreover, candidates in the treatment group received one additional flyout, with women receiving 0.9 more job offers. These findings suggest that social media promotion can improve the visibility and success of job market candidates, especially for underrepresented groups in economics such as women.
MASSW: A New Dataset and Benchmark Tasks for AI-Assisted Scientific Workflows
arXiv, June 2024
Xingjian Zhang, Yutong Xie, Jin Huang, Jinge Ma, Zhaoying Pan, Qijia Liu, Ziyang Xiong, Tolga Ergen, Dongsub Shim, Honglak Lee, Qiaozhu Mei
Scientific innovation relies on detailed workflows, which include critical steps such as analyzing literature, generating ideas, validating these ideas, interpreting results, and inspiring follow-up research. However, scientific publications that document these workflows are extensive and unstructured. This makes it difficult for both human researchers and AI systems to effectively navigate and explore the space of scientific innovation. To address this issue, we introduce MASSW, a comprehensive text dataset on Multi-Aspect Summarization of Scientific Workflows. MASSW includes more than 152,000 peer-reviewed publications from 17 leading computer science conferences spanning the past 50 years. Using Large Language Models (LLMs), we automatically extract five core aspects from these publications – context, key idea, method, outcome, and projected impact – which correspond to five key steps in the research workflow. These structured summaries facilitate a variety of downstream tasks and analyses. The quality of the LLM-extracted summaries is validated by comparing them with human annotations. We demonstrate the utility of MASSW through multiple novel machine-learning tasks, benchmarked on this new dataset, that make various types of predictions and recommendations along the scientific workflow. MASSW holds significant potential for researchers to create and benchmark new AI methods for optimizing scientific workflows and fostering scientific innovation in the field. Our dataset is openly available at https://github.com/xingjian-zhang/massw.
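A hypothetical loader for MASSW-style records is sketched below, assuming a JSON-lines layout whose field names mirror the five aspects in the abstract; the actual file format and schema may differ, so consult the linked repository.

```python
import json

# Field names assumed from the abstract's five aspects; verify against the repo.
ASPECTS = ["context", "key_idea", "method", "outcome", "projected_impact"]

def load_massw(path):
    """Yield one dict of the five aspect summaries per publication record."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            yield {aspect: record.get(aspect) for aspect in ASPECTS}
```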
Insights from Social Shaping Theory: The Appropriation of Large Language Models in an Undergraduate Programming Course
arXiv, June 2024
Aadarsh Padiyath, Xinying Hou, Amy Pang, Diego Viramontes Vargas, Xingjian Gu, Tamara Nelson-Fromm, Zihan Wu, Mark Guzdial, Barbara Ericson
The capability of large language models (LLMs) to generate, debug, and explain code has sparked the interest of researchers and educators in undergraduate programming, with many anticipating their transformative potential in programming education. However, decisions about why and how to use LLMs in programming education may involve more than just the assessment of an LLM’s technical capabilities. Using the social shaping of technology theory as a guiding framework, our study explores how students’ social perceptions influence their own LLM usage. We then examine the correlation of self-reported LLM usage with students’ self-efficacy and midterm performances in an undergraduate programming course. Triangulating data from an anonymous end-of-course student survey (n = 158), a mid-course self-efficacy survey (n = 158), student interviews (n = 10), self-reported LLM usage on homework, and midterm performances, we discovered that students’ use of LLMs was associated with their expectations for their future careers and their perceptions of peer usage. Additionally, early self-reported LLM usage in our context correlated with lower self-efficacy and lower midterm scores, while students’ perceived over-reliance on LLMs, rather than their usage itself, correlated with decreased self-efficacy later in the course.
Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions
arXiv, June 2024
Hua Shen, Tiffany Knearem, Reshmi Ghosh, Kenan Alkiek, Kundan Krishna, Yachuan Liu, Ziqiao Ma, Savvas Petridis, Yi-Hao Peng, Li Qiwei, Sushrita Rakshit, Chenglei Si, Yutong Xie, Jeffrey P. Bigham, Frank Bentley, Joyce Chai, Zachary Lipton, Qiaozhu Mei, Rada Mihalcea, Michael Terry, Diyi Yang, Meredith Ringel Morris, Paul Resnick, David Jurgens
Recent advancements in general-purpose AI have highlighted the importance of guiding AI systems towards the intended goals, ethical principles, and values of individuals and groups, a concept broadly recognized as alignment. However, the lack of clarified definitions and scopes of human-AI alignment poses a significant obstacle, hampering collaborative efforts across research domains to achieve this alignment. In particular, ML- and philosophy-oriented alignment research often views AI alignment as a static, unidirectional process (i.e., aiming to ensure that AI systems’ objectives match humans’) rather than an ongoing, mutual alignment problem. This perspective largely neglects the long-term interaction and dynamic changes of alignment. To understand these gaps, we introduce a systematic review of over 400 papers published between 2019 and January 2024, spanning multiple domains such as Human-Computer Interaction (HCI), Natural Language Processing (NLP), Machine Learning (ML), and others. We characterize, define, and scope human-AI alignment. From this, we present a conceptual framework of “Bidirectional Human-AI Alignment” to organize the literature from a human-centered perspective. This framework encompasses both 1) conventional studies of aligning AI to humans, which ensure AI produces the intended outcomes determined by humans, and 2) a proposed concept of aligning humans to AI, which aims to help individuals and society adjust to AI advancements both cognitively and behaviorally. Additionally, we articulate the key findings derived from literature analysis, including discussions about human values, interaction techniques, and evaluations. To pave the way for future studies, we envision three key challenges for future directions and propose examples of potential future solutions.
Prompt Design Matters for Computational Social Science Tasks but in Unpredictable Ways
arXiv, June 2024
Shubham Atreja, Joshua Ashkinaze, Lingyao Li, Julia Mendelsohn, Libby Hemphill
Manually annotating data for computational social science tasks can be costly, time-consuming, and emotionally draining. While recent work suggests that LLMs can perform such annotation tasks in zero-shot settings, little is known about how prompt design impacts LLMs’ compliance and accuracy. We conduct a large-scale multi-prompt experiment to test how model selection (ChatGPT, PaLM2, and Falcon7b) and prompt design features (definition inclusion, output type, explanation, and prompt length) impact the compliance and accuracy of LLM-generated annotations on four CSS tasks (toxicity, sentiment, rumor stance, and news frames). Our results show that LLM compliance and accuracy are highly prompt-dependent. For instance, prompting for numerical scores instead of labels reduces all LLMs’ compliance and accuracy. The overall best prompting setup is task-dependent, and minor prompt changes can cause large changes in the distribution of generated labels. By showing that prompt design significantly impacts the quality and distribution of LLM-generated annotations, this work serves as both a warning and practical guide for researchers and practitioners.
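The experimental design suggests a simple harness: generate every combination of prompt features, send each variant to a model, and record the responses. The sketch below assumes a generic `call_llm` callback rather than any specific API, and uses a toxicity task with two of the four prompt features as an illustration.

```python
from itertools import product

def build_prompt(text, include_definition, output_type):
    """Assemble one prompt variant for a toxicity-annotation task."""
    parts = []
    if include_definition:
        parts.append("Toxicity: rude, disrespectful, or unreasonable language.")
    if output_type == "label":
        parts.append("Answer with exactly one label: 'toxic' or 'not toxic'.")
    else:
        parts.append("Answer with a single score from 0 (benign) to 10 (toxic).")
    parts.append(f"Text: {text}")
    return "\n".join(parts)

def sweep(texts, call_llm):
    """call_llm: any function mapping a prompt string to a model response."""
    rows = []
    for include_def, out_type in product([True, False], ["label", "score"]):
        for text in texts:
            rows.append({
                "definition": include_def,
                "output_type": out_type,
                "response": call_llm(build_prompt(text, include_def, out_type)),
            })
    return rows     # compliance/accuracy can then be scored per prompt variant
```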