University of Michigan School of Information


Wildfires, Privacy and Digital Health: UMSI Research Roundup

Monday, 08/21/2023

University of Michigan School of Information faculty and PhD students are creating and sharing knowledge that helps build a better world. Here are some of their recent publications.

How to Train Your YouTube Recommender to Avoid Unwanted Videos

arXiv, August 2023

Alexander Liu, Siqi Wu, Paul Resnick

YouTube provides features for users to indicate disinterest when presented with unwanted recommendations, such as the "Not interested" and "Don't recommend channel" buttons. These buttons are purported to allow the user to correct "mistakes" made by the recommendation system. Yet, relatively little is known about the empirical efficacy of these buttons. Neither is much known about users' awareness of and confidence in them. To address these gaps, we simulated YouTube users with sock puppet agents. Each agent first executed a "stain phase", where it watched many videos of one assigned topic; it then executed a "scrub phase", where it tried to remove recommendations of the assigned topic. Each agent repeatedly applied a single scrubbing strategy, either indicating disinterest in one of the videos visited in the stain phase (disliking it or deleting it from the watch history), or indicating disinterest in a video recommended on the homepage (clicking the "Not interested" or "Don't recommend channel" button or opening the video and clicking the dislike button). We found that the stain phase significantly increased the fraction of the recommended videos dedicated to the assigned topic on the user's homepage. For the scrub phase, using the "Not interested" button worked best, significantly reducing such recommendations in all topics tested, on average removing 88% of them. Neither the stain phase nor the scrub phase, however, had much effect on videopage recommendations. We also ran a survey (N = 300) asking adult YouTube users in the US whether they were aware of and had previously used these buttons, as well as how effective they found these buttons to be. We found that 44% of participants were not aware that the "Not interested" button existed. However, those who were aware of this button often used it to remove unwanted recommendations (82.8%) and found it to be modestly effective (3.42 out of 5).

Researchers’ Experiences in Analyzing Privacy Policies: Challenges and Opportunities

Proceedings on Privacy Enhancing Technologies, August 2023

Abraham Mhaidli, Selin Fidan, An Doan, Gina Herakovic, Mukund Srinath, Lee Matheson, Shomir Wilson, Florian Schaub

Companies’ privacy policies and their contents are being analyzed for many reasons, including to assess the readability, usability, and utility of privacy policies; to extract and analyze data practices of apps and websites; to assess compliance of companies with relevant laws and their own privacy policies, and to develop tools and machine learning models to summarize and read policies. Despite the importance and interest in studying privacy policies from researchers, regulators, and privacy activists, few best practices or approaches have emerged and infrastructure and tool support is scarce or scattered. In order to provide insight into how researchers study privacy policies and the challenges they face when doing so, we conducted 26 interviews with researchers from various disciplines who have conducted research on privacy policies. We provide insights on a range of challenges around policy selection, policy retrieval, and policy content analysis, as well as multiple overarching challenges researchers experienced across the research process. Based on our findings, we discuss opportunities to better facilitate privacy policy research, including research directions for methodologically advancing privacy policy analysis, potential structural changes around privacy policies, and avenues for fostering an interdisciplinary research community and maturing the field.

Feasibility, perceived impact, and acceptability of a socially assistive robot to support emotion regulation with highly anxious university students: an open trial

JMIR Mental Health, August 2023

A. Jess Williams, Maureen Freed, Nikki Theofanopoulou, Claudia Dauden Roquet, Predrag Klasnja, James J. Gross, Jessica Schleider, Petr Slovak

Background: Mental health difficulties among university students have been rising rapidly over the last decade, and the demand for university mental health services commonly far exceeds available resources. Digital interventions are seen as one potential solution to these challenges. However, as in other mental health contexts, digital programmes often face low engagement and uptake, and the field lacks usable, engaging, evidence-supported mental health interventions that may be used flexibly when students need them most. Objectives: The aim of this study is to investigate the feasibility and acceptability of a new, in-situ intervention tool (Purrble) among university students experiencing anxiety. As an intervention, Purrble was designed to provide in-situ support for emotion regulation, a well-known transdiagnostic construct, directly in the moments when individuals are facing emotionally challenging situations. A secondary aim is to consider the perceived impact of Purrble on youth mental health, as reported by students over a 7-week deployment. Methods: A mixed-methods open trial was conducted, with 78 under- and post-graduate students at Oxford University. Participants were recruited based on moderate to high levels of anxiety measured by GAD-7 at baseline (M: 16.09; SD: 3.03). All participants had access to Purrble for 7 weeks during the spring term, with data on their perceived anxiety, emotion dysregulation, emotion regulation self-efficacy, and engagement with the intervention collected at baseline (pre-), week 4 (mid-), and week 8 (post-intervention). Qualitative responses were also collected at the mid- and post-intervention points. Results: Findings demonstrated a sustained engagement with Purrble over the 7-week period, with the acceptability further supported by the qualitative data indicating that students accepted Purrble and that Purrble was well-integrated into their daily routines. Exploratory quantitative data analysis indicated that Purrble was associated with reductions in student anxiety (dz = 0.96, 95% CI [0.62, 1.29]) and emotion dysregulation (dz = 0.69, 95% CI [0.38, 0.99]), and with increases in emotion regulation self-efficacy (dz = -0.56, 95% CI [-0.86, -0.26]). Conclusions: This is a first trial of a simple, physical intervention that aims to provide ongoing emotion regulation support to university students. Both quantitative and qualitative data suggest that Purrble is an acceptable and feasible intervention among students, the engagement with which can be sustained at a stable level across a 7-week period, while retaining a perceived benefit for those who use it (61% of our sample). The consistency of use is particularly promising given that there was no clinician engagement or further support provided beyond Purrble being delivered to the students. These results show promise for an innovative intervention model, which could be complementary to the existing interventions.
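
For readers unfamiliar with the dz statistic reported above, here is a minimal sketch of how a paired-samples effect size is commonly computed: the mean of the within-person differences divided by their standard deviation. The function name, the scores, and the pre-minus-post sign convention are illustrative assumptions, not the study's data or analysis code. Under this convention a drop in a score gives a positive dz and a rise gives a negative dz, which would match the signs in the abstract.

```python
from statistics import mean, stdev


def cohens_dz(pre, post):
    """Paired-samples effect size: mean difference / SD of differences.

    Differences are taken as pre minus post (an assumed convention),
    so a reduction in the measured score yields a positive value.
    """
    if len(pre) != len(post):
        raise ValueError("pre and post must have the same length")
    diffs = [a - b for a, b in zip(pre, post)]
    return mean(diffs) / stdev(diffs)


if __name__ == "__main__":
    # Hypothetical scores for three participants, for illustration only.
    pre_scores = [10, 12, 14]
    post_scores = [8, 9, 13]
    print(cohens_dz(pre_scores, post_scores))
```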

Unpacking the Effect of Political Affiliation on Organizational Resilience Following a Policy Shock

Academy of Management Proceedings, July 2023

Amrita Lahiri, Nanjundi Karthick Krishnan, Alex Kier, Aditya Johri, Joyojeet Pal

Resilience results from the interaction between an organization and its environment under conditions of adversity. While research has shown the positive association between resources and resilience, an understanding of how partisan contexts impact this relationship is missing from the literature. In our study, through a combination of exploratory interviews and a survey of entrepreneurs, we account for the socio-political context of adversity in influencing entrepreneurs’ adaptation to a crisis by examining India’s 2016 demonetization policy shock. We find that entrepreneurs’ political affiliation not only shapes how they interpret their environment, but also influences the extent to which they leverage available financial and human capital resources towards business resilience. Our study sheds light on the mechanisms behind entrepreneurial actions in the aftermath of a policy shock and how they ultimately shape organizational resilience.

Styling STEM: How African American Women Cosmetologists Can Help to Reimagine STEM Education 

International Journal of Gender, Science and Technology, August 2023

Holly Okonkwo, Michael Lachney, Madison C. Allen Kuyenga, LaQuita Love, William Babbitt, Audrey Bennett, Ron Eglash

This paper analyzes interviews with African American women cosmetologists who collaborated in designing and implementing a series of community-centered science, technology, engineering, and mathematics (STEM) education programs to support broadening the participation of Black children in those fields. These collaborations used technologies and media as bridges between STEM knowledge as it appears in schools and STEM knowledge as it has been and is communicated, produced, and used by Black hair care experts. We discuss how acknowledging these experts as knowledge producers who have unique pedagogical expertise not only provides new ways for reimagining STEM fields for Black children, but also helps to acknowledge STEM's existing and generative presence in Black communities. Our findings reveal three ways that this group of African American cosmetologists helped reimagine STEM education: 1) STEM as personal and situated; 2) STEM as a blend of public and community institutions; and 3) STEM as community.

RCT Rejection Sampling for Causal Estimation Evaluation

arXiv, August 2023

Katherine A. Keith, Sergey Feldman, David Jurgens, Jonathan Bragg, Rohit Bhattacharya

Confounding is a significant obstacle to unbiased estimation of causal effects from observational data. For settings with high-dimensional covariates—such as text data, genomics, or the behavioral social sciences—researchers have proposed methods to adjust for confounding by adapting machine learning methods to the goal of causal estimation. However, empirical evaluation of these adjustment methods has been challenging and limited. In this work, we build on a promising empirical evaluation strategy that simplifies evaluation design and uses real data: subsampling randomized controlled trials (RCTs) to create confounded observational datasets while using the average causal effects from the RCTs as ground-truth. We contribute a new sampling algorithm, which we call RCT rejection sampling, and provide theoretical guarantees that causal identification holds in the observational data to allow for valid comparisons to the ground-truth RCT. Using synthetic data, we show our algorithm indeed results in low bias when oracle estimators are evaluated on the confounded samples, which is not always the case for a previously proposed algorithm. In addition to this identification result, we highlight several finite data considerations for evaluation designers who plan to use RCT rejection sampling on their own datasets. As a proof of concept, we implement an example evaluation pipeline and walk through these finite data considerations with a novel, real-world RCT—which we release publicly—consisting of approximately 70k observations and text data as high-dimensional covariates. Together, these contributions build towards a broader agenda of improved empirical evaluation for causal estimation. 
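
To make the subsampling idea concrete, the following is a minimal sketch of a generic rejection-sampling step over RCT rows, written from the abstract's description rather than from the authors' released code. The function name, the row layout `(c, t, y)`, and the `target_prob` argument are illustrative assumptions. Each row is kept with probability proportional to the ratio between a chosen confounded treatment distribution P*(T | C) and the empirical marginal P(T) from the trial.

```python
import random


def rct_rejection_sample(rows, target_prob, seed=0):
    """Subsample RCT rows so that treatment depends on the covariate.

    rows: list of (c, t, y) tuples with binary treatment t that was
        randomized independently of the covariate c.
    target_prob: hypothetical function (t, c) -> desired P*(T=t | C=c).
    Returns the accepted rows (a confounded observational-style sample).
    """
    rng = random.Random(seed)
    n = len(rows)
    p_treated = sum(t for _, t, _ in rows) / n
    marginal = {1: p_treated, 0: 1.0 - p_treated}  # empirical P(T)

    # Ratio of the target conditional to the RCT marginal for each row.
    ratios = [target_prob(t, c) / marginal[t] for c, t, _ in rows]
    bound = max(ratios)  # normalizer so acceptance probabilities are <= 1

    # Standard rejection step: accept a row with probability ratio / bound.
    return [row for row, r in zip(rows, ratios) if rng.random() <= r / bound]


if __name__ == "__main__":
    # Synthetic RCT: binary covariate, coin-flip treatment, placeholder outcome.
    gen = random.Random(1)
    rct = [(gen.randint(0, 1), gen.randint(0, 1), 0.0) for _ in range(20000)]
    # Assumed target: units are likely to receive the treatment matching c.
    sample = rct_rejection_sample(rct, lambda t, c: 0.9 if t == c else 0.1)
    print(len(sample), "of", len(rct), "rows kept")
```

In the kept rows, treatment is correlated with the covariate by construction, while the effect estimated from the full RCT remains available as a reference point for comparison.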

Bursts of contemporaneous publication among high- and low-credibility online information providers

New Media and Society, July 2023

Ceren Budak, Lia Bozarth, Robert M Bond, Drew Margolin, Jason J Jones, R Kelly Garrett

In studies of misinformation, the distinction between high- and low-credibility publishers is fundamental. However, there is much that we do not know about the relationship between the subject matter and timing of content produced by the two types of publishers. By analyzing the content of several million unique articles published over 28 months, we show that high- and low-credibility publishers operate in distinct news ecosystems. Bursts of news coverage generated by the two types of publishers tend to cover different subject matter at different times, even though fluctuations in their overall news production tend to be highly correlated. Regardless of the mechanism, temporally convergent coverage among low-credibility publishers has troubling implications for American news consumers.

Promises and Trust Repair in UGVs

Proceedings of the 67th Annual Meeting of the Human Factors and Ergonomics Society (HFES), October 2023

Connor Esterwood, Arsha Ali, Zariq George, Samantha Dubrow, Jonathon Smereka, Kayla Riegner, Dawn Tilbury, Lionel P. Robert Jr.

Unmanned ground vehicles (UGVs) are autonomous robots capable of performing tasks through self-navigation and decision-making. They have the potential to replace humans in dangerous driving scenarios. However, UGVs must be viewed as trustworthy to be accepted, and like any automation, they can make mistakes that decrease human trust in them. Trust repair strategies can mitigate the consequences of trust violations, but they are not always effective. To better understand their effectiveness for UGVs, we designed a between-subjects study examining the effect of promises on a UGV’s trustworthiness. Preliminary results showed that promises had a marginal impact on overall trustworthiness and were influential in repairing benevolence, but not ability or integrity. These findings have implications for the design of UGVs and for trust repair theory.

A Metadata-Driven Approach to Understand Graph Neural Networks

The 3rd Workshop on Graph Learning Benchmarks, August 2023

Ting Wei Li, Qiaozhu Mei, Jiaqi Ma

Graph Neural Networks (GNNs) have achieved remarkable success in various applications, but their performance can be sensitive to specific data properties of the graph datasets they operate on. Current literature on understanding the limitations of GNNs has primarily employed a model-driven approach that leverages heuristics and domain knowledge from network science or graph theory to model the GNN behaviors, which is time-consuming and highly subjective. In this work, we propose a metadata-driven approach to analyze the sensitivity of GNNs to graph data properties, motivated by the increasing availability of graph learning benchmarks. We perform a multivariate sparse regression analysis on the metadata derived from benchmarking GNN performance across diverse datasets, yielding a set of salient data properties. To validate the effectiveness of our data-driven approach, we focus on one identified data property, the degree distribution, and investigate how this property influences GNN performance through theoretical analysis and controlled experiments. Our theoretical findings reveal that datasets with a more balanced degree distribution exhibit better linear separability of node representations, thus leading to better GNN performance. We also conduct controlled experiments using synthetic datasets with varying degree distributions, and the results align well with our theoretical findings. Collectively, both the theoretical analysis and controlled experiments verify that the proposed metadata-driven approach is effective in identifying critical data properties for GNNs.

Institutional isomorphism in corporate Twitter discourse on citizenship and immigration in India and the United States

Global Policy, August 2023

Shehla Rashid Shora, Arshia Arya, Joyojeet Pal

High net-worth individuals (HNIs) play important roles in influencing policy through their voices. Technology-mediated means of addressing issues, such as social media activism, have become a central part of such policy advocacy. We examined the Twitter engagement of the 50 wealthiest individuals and their ‘networks’ in India and the United States, specifically their engagement with citizens' movements and policy issues related to citizenship and immigration, with a focus on debates triggered by the enactment of the Citizenship Amendment Act (CAA) and rescission of Deferred Action for Childhood Arrivals (DACA), respectively. We quantified the level of engagement of ‘HNI networks’ with these debates through a textual analysis of their tweets using computational methods combined with manual annotation, followed by qualitative analysis and comparison of subjective meanings attached by actors to key terms. We found that American HNIs leveraged their social media presence to advocate for inclusive immigration and naturalisation policies, their model of advocacy characterised by confrontation, collective action and ownership by key actors, thus exhibiting mimetic isomorphism. Indian HNIs' tweets on CAA were few and far between, with no call for change and no evidence of either collective action or individual ownership, and a hesitation to challenge the central government, thus exhibiting coercive isomorphism.

The Impact of Modality, Technology Suspicion, and NDRT Engagement on the Effectiveness of AV Explanations

IEEE Access, August 2023

Qiaoning Zhang, Connor Esterwood, Anuj K. Pradhan, Dawn Tilbury, X. Jessie Yang, Lionel P. Robert Jr.

Explanations — reasons or justifications for action — are being used to promote the acceptance of automated vehicles (AVs). Yet, it is unclear whether and how the modality of explanation affects its effectiveness. Despite its importance in the technology acceptance literature, the impact of technology suspicion on the adoption of AVs is yet to be fully examined. To expand our understanding of AV explanation, we conducted a within-subjects experiment with 32 participants using a high-fidelity driving simulator. Four conditions were presented to participants: (1) auditory explanation with a non-driving-related task (NDRT), (2) auditory explanation without NDRT, (3) visual explanation with NDRT, and (4) visual explanation without NDRT. The results indicate that auditory explanations are more effective in reducing anxiety and unsafety perception for high-suspicion individuals, especially in the absence of NDRT. Conversely, individuals who are less technology suspicious prefer visual explanations, which can result in lower levels of anxiety and perceived unsafety. The study highlights the importance of considering individuals’ technology suspicion and engagement with NDRT when selecting the appropriate explanation modality, and the findings can guide the design of future AV systems to promote effective human-machine interaction.


The 3rd Workshop on Graph Learning Benchmarks (GLB 2023)

Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, August 2023

Jiaqi Ma, Jiong Zhu, Yuxiao Dong, Danai Koutra, Jingrui He, Qiaozhu Mei, Anton Tsitsulin, Xingjian Zhang, Marinka Zitnik

Recent years have witnessed a surge of research interest in graph machine learning. However, the benchmark datasets available to the field are rather limited in both quantity and diversity, an issue particularly notable given the immense potential applications of graph learning. The lack of diverse benchmark datasets may have biased the development of graph machine learning techniques towards narrow directions. By crowdsourcing novel tasks and datasets, this workshop aims to increase the diversity of graph learning benchmarks, identify new demands of graph machine learning in general, and gain a better synergy of how concrete techniques perform on these benchmarks. Moreover, this workshop offers a platform for discussions of best practices in curating graph learning benchmarks and data-centric approaches for graph learning.

Privacy Now or Never: Large-Scale Extraction and Analysis of Dates in Privacy Policy Text

DocEng '23: Proceedings of the ACM Symposium on Document Engineering 2023, August 2023

Mukund Srinath, Lee Matheson, Pranav Narayanan Venkit, Gabriela Zanfir-Fortuna, Florian Schaub, C. Lee Giles, Shomir Wilson

The General Data Protection Regulation (GDPR) and other recent privacy laws require organizations to post their privacy policies, and place specific expectations on organizations' privacy practices. Privacy policies take the form of documents written in natural language, and one of the expectations placed upon them is that they remain up to date. To investigate legal compliance with this recency requirement at a large scale, we create a novel pipeline that includes crawling, regex-based extraction, candidate date classification and date object creation to extract updated and effective dates from privacy policies written in English. We then analyze patterns in policy dates using four web crawls and find that only about 40% of privacy policies online contain a date, thereby making it difficult to assess their regulatory compliance. We also find that updates in privacy policies are temporally concentrated around passage of laws regulating digital privacy (such as the GDPR), and that more popular domains are more likely to have policy dates as well as more likely to update their policies regularly.
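
As a rough illustration of the extraction, classification, and date-object steps named in the abstract, here is a minimal sketch, not the authors' pipeline. The regular expression (which only handles dates written like "March 5, 2023"), the keyword list, the 60-character context window, and the function name are all simplifying assumptions made for this example.

```python
import re
from datetime import date

MONTHS = {name: number for number, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}

# Candidate extraction: "Month D, YYYY" (assumed format, English only).
DATE_RE = re.compile(
    r"\b(" + "|".join(MONTHS) + r")\s+(\d{1,2}),?\s+(\d{4})\b",
    re.IGNORECASE,
)

# Hypothetical cue words mapped to the two date types of interest.
KEYWORDS = {"effective": "effective", "updated": "updated",
            "revised": "updated", "modified": "updated"}


def extract_policy_dates(text, window=60):
    """Return (label, date) pairs for date mentions found in policy text."""
    results = []
    for match in DATE_RE.finditer(text):
        month = MONTHS[match.group(1).lower()]
        try:
            parsed = date(int(match.group(3)), month, int(match.group(2)))
        except ValueError:
            continue  # skip impossible dates such as "February 30, 2021"

        # Classify by the cue word nearest to (and before) the date.
        context = text[max(0, match.start() - window):match.start()].lower()
        label, best_pos = "unknown", -1
        for cue, cue_label in KEYWORDS.items():
            pos = context.rfind(cue)
            if pos > best_pos:
                label, best_pos = cue_label, pos
        results.append((label, parsed))
    return results


if __name__ == "__main__":
    sample = "Effective date: January 1, 2020. Last updated: March 5, 2023."
    for label, found in extract_policy_dates(sample):
        print(label, found.isoformat())
```

A production pipeline would need many more date formats and a trained classifier rather than keyword matching; the sketch only shows how the stages fit together.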

What’s Next for Modernizing Gender, Sex, and Sexual Orientation Terminology in Digital Health Systems? Viewpoint on Research and Implementation Priorities

Journal of Medical Internet Research, July 2023 

Roz Queen, Karen L Courtney, Francis Lau, Kelly Davison, Aaron Devor, Marcy G Antonio

In 2021, Canada Health Infoway and the University of Victoria's Gender, Sex, and Sexual Orientation Research Team hosted a series of discussions to successfully and safely modernize gender, sex, and sexual orientation information practices within digital health systems. Five main topic areas were covered: (1) terminology standards; (2) digital health and electronic health record functions; (3) policy and practice implications; (4) primary care settings; and (5) acute and tertiary care settings. In this viewpoint paper, we provide priorities for future research and implementation projects and recommendations that emerged from these discussions.

AI Consent Futures: A Case Study on Voice Data Collection with Clinicians

Proceedings of the ACM on Human-Computer Interaction, October 2023

Lauren Wilcox, Robin Brewer, Fernando Diaz

As new forms of data capture emerge to power new AI applications, questions abound about the ethical implications of these data collection practices. In this paper, we present clinicians’ perspectives on the prospective benefits and harms of voice data collection during health consultations. Such data collection is being proposed as a means to power models to assist clinicians with medical data entry, administrative tasks, and consultation analysis. Yet, clinicians’ attitudes and concerns are largely absent from the AI narratives surrounding these use cases, and the academic literature investigating them. Our qualitative interview study used the concept of an informed consent process as a type of design fiction, to support elicitation of clinicians’ perspectives on voice data collection and use associated with a fictional, near-term AI assistant. Through reflexive thematic analysis of in-depth sessions with physicians, we distilled eight classes of potential risks that clinicians are concerned about, including workflow disruptions, self-censorship, and errors that could impact patient eligibility for services. We conclude with an in-depth discussion of these prospective risks, reflect on the use of the speculative processes that illuminated them, and reconsider evaluation criteria for AI-assisted clinical documentation technologies in light of our findings.

Investigating disaster response through social media data and the Susceptible-Infected-Recovered (SIR) model: A case study of 2020 Western U.S. wildfire season

arXiv, August 2023

Zihui Ma, Lingyao Li, Libby Hemphill, Gregory B. Baecher

Effective disaster response is critical for affected communities. Responders and decision-makers would benefit from reliable, timely measures of the issues impacting their communities during a disaster, and social media offers a potentially rich data source. Social media can reflect public concerns and demands during a disaster, offering valuable insights for decision-makers to understand evolving situations and optimize resource allocation. We used Bidirectional Encoder Representations from Transformers (BERT) topic modeling to cluster topics from Twitter data. Then, we conducted a temporal-spatial analysis to examine the distribution of these topics across different regions during the 2020 western U.S. wildfire season. Our results show that Twitter users mainly focused on three topics: “health impact,” “damage,” and “evacuation.” We used the Susceptible-Infected-Recovered (SIR) theory to explore the magnitude and velocity of topic diffusion on Twitter. The results displayed a clear relationship between topic trends and wildfire propagation patterns. The estimated parameters obtained from the SIR model in selected cities revealed that residents exhibited a high level of several concerns during the wildfire. Our study details how the SIR model and topic modeling using social media data can provide decision-makers with a quantitative approach to measure disaster response and support their decision-making processes.
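
For context on the SIR model mentioned above, the following is a minimal sketch of the classic compartmental equations, integrated with a simple Euler step. It is not the authors' fitting procedure. The function name and all parameter values are hypothetical; in a topic-diffusion reading, "infected" would loosely correspond to users currently posting about a topic, which is an interpretation assumed here for illustration.

```python
def simulate_sir(beta, gamma, s0, i0, r0, days, dt=0.1):
    """Integrate the SIR equations with a fixed-step Euler scheme.

    dS/dt = -beta * S * I / N
    dI/dt =  beta * S * I / N - gamma * I
    dR/dt =  gamma * I

    Returns a list of (time, S, I, R) tuples.
    """
    n = s0 + i0 + r0
    s, i, r = float(s0), float(i0), float(r0)
    history = [(0.0, s, i, r)]
    steps = int(round(days / dt))
    for k in range(1, steps + 1):
        new_infected = beta * s * i / n * dt   # S -> I during this step
        new_recovered = gamma * i * dt         # I -> R during this step
        s -= new_infected
        i += new_infected - new_recovered
        r += new_recovered
        history.append((k * dt, s, i, r))
    return history


if __name__ == "__main__":
    # Hypothetical parameters chosen only to produce a visible peak.
    run = simulate_sir(beta=0.5, gamma=0.1, s0=990, i0=10, r0=0, days=60)
    peak_time, _, peak_i, _ = max(run, key=lambda row: row[2])
    print(f"peak of {peak_i:.0f} active users around day {peak_time:.1f}")
```

In this framing, beta governs how quickly attention to a topic spreads and gamma how quickly it fades, so fitted values of the two give one way to compare the speed and size of topic diffusion across cities.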