University of Michigan School of Information
Climate change, empathy and roboviz: UMSI research roundup
“Whose Language Counts as High Quality? Measuring Language Ideologies in Text Data Selection”
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, December 2022.
Suchin Gururangan, Dallas Card, Sarah Dreier, Emily Gade, Leroy Wang, Zeyu Wang, Luke Zettlemoyer, Noah A. Smith
Language models increasingly rely on massive web crawls for diverse text data. However, these sources are rife with undesirable content. As such, resources like Wikipedia, books, and news often serve as anchors for automatically selecting the web text most suitable for language modeling, a process typically referred to as quality filtering. Using a new dataset of U.S. high school newspaper articles, written by students from across the country, we investigate whose language is preferred by the quality filter used for GPT-3. We find that newspapers from larger schools located in wealthier, more educated, and more urban ZIP codes are more likely to be classified as high quality. We also show that this quality measurement is not aligned with other sensible metrics, such as factuality or literary acclaim. We argue that privileging any corpus as high quality entails a language ideology, and that more care is needed to construct training corpora for language models, with better transparency and justification for the inclusion or exclusion of various texts.
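To make "quality filtering" concrete, here is a minimal sketch of a classifier-based filter. It assumes a logistic regression over hashed word unigrams, trained to separate "reference" text (standing in for Wikipedia, books, and news) from raw web text, plus a stochastic keep rule driven by the classifier's score. The GPT-3 paper reports a filter of this general kind, with a Pareto-sampled keep rule, but this is not the filter the authors audited: the names `featurize`, `QualityClassifier`, and `keep`, the feature size, the learning rate, and the default `alpha` are all illustrative assumptions.

```python
import math
import random
import re
import zlib
from typing import Optional

N_FEATURES = 2 ** 16  # hypothetical hashing-trick dimension


def featurize(text: str) -> dict[int, float]:
    """Hash lowercase word unigrams into a sparse bag-of-words vector."""
    counts: dict[int, float] = {}
    for tok in re.findall(r"[a-z']+", text.lower()):
        # crc32 is stable across runs, unlike Python's salted built-in hash()
        idx = zlib.crc32(tok.encode("utf-8")) % N_FEATURES
        counts[idx] = counts.get(idx, 0.0) + 1.0
    return counts


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class QualityClassifier:
    """Logistic regression estimating P(document resembles the reference corpus)."""

    def __init__(self, lr: float = 0.1, epochs: int = 50) -> None:
        self.lr = lr
        self.epochs = epochs
        self.w: dict[int, float] = {}
        self.b = 0.0

    def score(self, text: str) -> float:
        x = featurize(text)
        z = self.b + sum(self.w.get(i, 0.0) * v for i, v in x.items())
        return sigmoid(z)

    def fit(self, reference: list[str], web: list[str]) -> "QualityClassifier":
        # Reference text is labeled 1 ("high quality"), web text 0
        data = [(t, 1.0) for t in reference] + [(t, 0.0) for t in web]
        for _ in range(self.epochs):
            for text, y in data:
                err = self.score(text) - y  # gradient of log loss w.r.t. z
                self.b -= self.lr * err
                for i, v in featurize(text).items():
                    self.w[i] = self.w.get(i, 0.0) - self.lr * err * v
        return self


def keep(score: float, alpha: float = 9.0, rng: Optional[random.Random] = None) -> bool:
    """Stochastically keep a document; higher scores are kept far more often.

    paretovariate() samples a Pareto distribution with minimum 1, so subtracting
    1 gives a Lomax draw >= 0 (the distribution numpy.random.pareto samples).
    """
    if rng is None:
        rng = random.Random()
    return rng.paretovariate(alpha) - 1.0 > 1.0 - score


if __name__ == "__main__":
    clf = QualityClassifier().fit(
        reference=["The committee published its annual report on economic policy.",
                   "The council reviewed the policy report in detail."],
        web=["click here to buy cheap pills now lol",
             "cheap pills click click buy now!!!"],
    )
    print(clf.score("the committee report on economic policy"))
    print(clf.score("buy cheap pills click now"))
```

The point the paper makes is visible even in this toy: "quality" here means nothing more than resemblance to whatever text was chosen as the positive class.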
“Parasitic Knowledge Infrastructures: Data Reuse by Anthropogenic Climate Change Skeptics”
Proceedings of the Association for Information Science and Technology, October 2022.
Stakeholders from academia, industry, funding agencies, and scholarly publishing are increasingly investing in open data, partly in the hope that it will democratize science and promote more diverse data reuse. However, few studies examine how unconventional communities outside academia and industry use open data. Through an investigative digital ethnography, I observed the data practices of anthropogenic climate change (ACC) skeptics, specifically how they discuss, evaluate, and reuse open data. This poster focuses on the knowledge infrastructure that affords the data practices of ACC skeptics. I argue that ACC skeptics are building a parasitic knowledge infrastructure on the back of the climate science knowledge infrastructure they often seek to discredit. Understanding the infrastructure that supports skeptics' data reuse can inform how we design policies and infrastructure to realize open data's promises while minimizing its perils.
“Development and Validation of Multivariable Prediction Algorithms to Estimate Future Walking Behavior in Adults: Retrospective Cohort Study”
JMIR mHealth and uHealth, January 2023.
Junghwan Park, Gregory J Norman, Predrag Klasnja, Daniel E Rivera, Eric Hekler
Background: Physical inactivity is associated with numerous health risks, including cancer, cardiovascular disease, type 2 diabetes, increased health care expenditure, and preventable, premature deaths. The majority of Americans fall short of clinical guideline goals (ie, 8000-10,000 steps per day). Behavior prediction algorithms could enable efficacious interventions to promote physical activity by facilitating delivery of nudges at appropriate times.
Objective: The aim of this paper is to develop and validate algorithms that predict walking (ie, >5 min) within the next 3 hours from the participants' previous 5 weeks of steps-per-minute data.
Methods: We conducted a retrospective, closed-cohort, secondary analysis of a 6-week microrandomized trial of the HeartSteps mobile health physical activity intervention, conducted in 2015. The prediction performance of 6 algorithms was evaluated: logistic regression, radial basis function support vector machine, eXtreme Gradient Boosting (XGBoost), multilayer perceptron (MLP), decision tree, and random forest. For the MLP, 90 random layer architectures were tested to optimize the network structure. Hourly walking data from the prior 5 weeks, including missingness, were used as predictors. Whether the participant walked during the next 3 hours was used as the outcome. K-fold cross-validation (K=10) was used for internal validation. The primary outcome measures were classification accuracy, the Matthews correlation coefficient, sensitivity, and specificity.
Results: The sample included 6 weeks of data from 44 participants. Of the 44 participants, 31 (71%) were female, 26 (59%) were White, 36 (82%) had a college degree or more, and 15 (34%) were married. The mean age was 35.9 (SD 14.7) years. Three participants (7%) with insufficient data (fewer than 10 days) were excluded, leaving 41 (93%) participants. MLP with optimized layer architecture showed the best accuracy (82.0%, SD 1.1), followed by logistic regression (77.2%, SD 1.2); XGBoost (76.3%, SD 1.5), random forest (69.5%, SD 1.0), support vector machine (69.3%, SD 1.0), and decision tree (63.6%, SD 1.5) all performed worse than logistic regression. MLP also outperformed all other algorithms tested on the Matthews correlation coefficient (0.643, SD 0.021), sensitivity (86.1%, SD 3.0), and specificity (77.8%, SD 3.3).
Conclusions: Walking behavior prediction models were developed and validated. MLP showed the highest overall performance of all algorithms tested. A random search for optimal layer structure is a promising approach for prediction engine development. Future studies can test the real-world application of this algorithm in a "smart" intervention for promoting physical activity.
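The four outcome measures in this study all come from a binary confusion matrix; in particular, MCC = (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below computes them in plain Python, together with a simple K-fold splitter. It is an illustration, not the authors' code: the function names are hypothetical, labels assume 1 = walked within the next 3 hours, and the splitter uses contiguous, unshuffled folds because the study's exact fold assignment is not described here.

```python
import math
from typing import Iterator, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for binary labels (1 = walked in the next 3 h)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / len(y_true),
        # Matthews correlation coefficient; set to 0 when any margin is empty
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,  # true positive rate
        "specificity": tn / (tn + fp) if tn + fp else 0.0,  # true negative rate
    }


def k_fold_indices(n: int, k: int = 10) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train, test) index lists for k contiguous, near-equal folds."""
    start = 0
    for fold in range(k):
        size = n // k + (1 if fold < n % k else 0)  # spread the remainder
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


if __name__ == "__main__":
    print(classification_metrics([1, 1, 1, 0, 0, 0, 1, 0], [1, 1, 0, 0, 0, 1, 1, 0]))
    # {'accuracy': 0.75, 'mcc': 0.5, 'sensitivity': 0.75, 'specificity': 0.75}
```

MCC is a useful companion to accuracy here because it stays informative when walking and non-walking windows are unbalanced; per-fold metrics would then be averaged, which is where the reported SDs come from.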
“TIP: A Trust Inference and Propagation Model in Multi-Human Multi-Robot Teams”
arXiv, January 2023. To appear at the 2023 ACM/IEEE International Conference on Human-Robot Interaction (HRI '23).
Yaohui Guo, X. Jessie Yang, Cong Shi
Trust has been identified as a central factor for effective human-robot teaming. Existing literature on trust modeling predominantly focuses on dyadic human-autonomy teams where one human agent interacts with one robot. There is little, if any, research on trust modeling in teams consisting of multiple human agents and multiple robotic agents. To fill this research gap, we present the trust inference and propagation (TIP) model for trust modeling in multi-human multi-robot teams. We assert that in a multi-human multi-robot team, there exist two types of experiences that any human agent has with any robot: direct and indirect experiences. The TIP model presents a novel mathematical framework that explicitly accounts for both types of experiences. To evaluate the model, we conducted a human-subject experiment with 15 pairs of participants (N = 30). Each pair performed a search and detection task with two drones. Results show that our TIP model successfully captured the underlying trust dynamics and significantly outperformed a baseline model. To the best of our knowledge, the TIP model is the first mathematical framework for computational trust modeling in multi-human multi-robot teams.
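The abstract does not give the TIP equations, so what follows is only a generic sketch of combining direct and indirect experience, using a Beta-distribution trust model in the style of Beta reputation systems. Each human holds pseudo-counts of positive and negative evidence about a robot; a direct experience (the robot succeeds or fails at a detection) adds a full observation, while an indirect experience (a teammate reports their own trust in that robot) is folded in as a down-weighted pseudo-observation. The class `BetaTrust`, its methods, and the weights are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BetaTrust:
    """One human's trust in one robot, modeled as Beta(alpha, beta).

    Hypothetical sketch: this is NOT the published TIP update rule.
    """
    alpha: float = 1.0  # pseudo-count of positive evidence (uniform prior)
    beta: float = 1.0   # pseudo-count of negative evidence

    @property
    def mean(self) -> float:
        """Expected trust in [0, 1]."""
        return self.alpha / (self.alpha + self.beta)

    def direct(self, success: bool, weight: float = 1.0) -> None:
        """Update from the human's own observation of the robot."""
        if success:
            self.alpha += weight
        else:
            self.beta += weight

    def indirect(self, reported_trust: float, weight: float = 0.5) -> None:
        """Update from a teammate's reported trust, discounted by `weight`."""
        if not 0.0 <= reported_trust <= 1.0:
            raise ValueError("reported_trust must be in [0, 1]")
        self.alpha += weight * reported_trust
        self.beta += weight * (1.0 - reported_trust)


if __name__ == "__main__":
    a_trusts_drone = BetaTrust()
    for outcome in (True, True, True, False):  # drone succeeds 3 times, fails once
        a_trusts_drone.direct(outcome)
    print(round(a_trusts_drone.mean, 4))  # 0.6667
    a_trusts_drone.indirect(0.9)  # teammate B reports high trust in the same drone
    print(round(a_trusts_drone.mean, 4))  # 0.6846
```

Because evidence accumulates, later observations move the estimate less; the discount `weight` encodes the assumption that secondhand reports should count for less than firsthand experience.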
“A Community Participatory Approach to Creating Contextually Tailored mHealth Notifications: myBPmyLife Project”
Health Promotion Practice, January 2023.
Abby Katherine Hellem, Amanda Casetti, Kaitlyn Bowie, Jessica R. Golbus, Beza Merid, Brahmajee K. Nallamothu, Michael P. Dorsch, Mark W. Newman, Lesli Skolarus
Just-in-time adaptive interventions (JITAIs) are a novel approach to mobile health (mHealth) interventions that send contextually tailored behavior change notifications to participants when they are more likely to engage, as determined by data from wearable devices. We describe a community participatory approach to JITAI notification development for the myBPmyLife Project, a JITAI focused on decreasing sodium consumption and increasing physical activity to reduce blood pressure. Eighty-six participants were interviewed: 50 at a federally qualified health center (FQHC) and 36 at a university clinic. Participants were asked to propose encouraging physical activity and low-sodium diet notifications and to give feedback on researcher-generated notifications to inform revisions. Participant notifications were thematically analyzed using an inductive approach. Participants noted challenging vocabulary, phrasing, and culturally incongruent suggestions in some of the researcher-generated notifications. Community-generated notifications were more direct, used colloquial language, and contained themes of grace. The FQHC participants' notifications expressed more compassion and religiosity and addressed health-related social needs. University clinic participants' notifications frequently focused on office environments. In summary, our participatory approach to notification development embedded a distinctive community voice within our notifications. Our approach may be generalizable to other communities and serve as a model for creating mHealth notifications tailored to their own populations.
“A Critical Reflection and Forward Perspective on Empathy and Natural Language Processing”
arXiv, October 2022.
Allison Lahnala, Charles Welch, David Jurgens, Lucie Flek
We review the state of research on empathy in natural language processing and identify the following issues: (1) empathy definitions are absent or abstract, which (2) leads to low construct validity and reproducibility. Moreover, (3) emotional empathy is overemphasized, skewing our focus to a narrow subset of simplified tasks. We believe these issues hinder research progress and argue that current directions will benefit from a clear conceptualization that includes operationalizing cognitive empathy components. Our main objectives are to provide insight and guidance on empathy conceptualization for NLP research objectives and to encourage researchers to pursue the overlooked opportunities in this area, which are highly relevant to, for example, the clinical and educational sectors.
“Putting Humans Back in the Loop: An Affordance Conceptualization of the 4th Industrial Revolution”
Information Systems Journal, December 2022.
Nigel P. Melville, Lionel Robert, Xiao Xiao
The current technology epoch—sometimes called the fourth industrial revolution (4IR)—involves the innovative application of rapidly advancing digital technologies such as artificial intelligence. Societal implications of the 4IR are significant and wide-ranging, from life-saving drug development to privacy loss and app addiction. A review of the information systems literature, however, reveals a narrow focus on technology-enabled business benefits. Scant research attention has been paid to the role of humans and humanistic outcomes. To spur new research addressing these issues, formalized affordance theory is employed to develop a new 4IR conceptualization. Four groupings of affordances that capture salient 4IR action possibilities are developed within two categories: machine emulation of human cognition (expansive decision-making and creativity automation) and machine emulation of human communication (relationship with humans and intermachine teaming). Implications are explored in the context of human-machine coworking and the development of artificial intelligence safety regulations. Overall, the affordance conceptualization of the 4IR advances a new sociotechnical lexicon of action possibilities and their joint enactment in achieving humanistic and instrumental outcomes, enabling alignment of the scope of 4IR research with the scope of 4IR phenomena—and bringing humans back into the loop.
“Targeting Patients’ Cognitive Load for Telehealth Video Visits Through Student-Delivered Helping Sessions at a United States Federally Qualified Health Center: Equity-Focused, Mixed Methods Pilot Intervention Study”
Journal of Medical Internet Research, January 2023.
Marcy G Antonio, Alicia Williamson, Vaishnav Kameswaran, Ashley Beals, Elizabeth Ankrah, Shannon Goulet, Yucen Wang, Grecia Macias, Jade James-Gist, Lindsay K Brown, Sage Davis, Srijanani Pillai, Lorraine Buis, Tawanna Dillahunt, Tiffany C Veinot
Background: The task complexity involved in connecting to telehealth video visits may disproportionately impact health care access in populations already experiencing inequities. Human intermediaries can be a strategy for addressing health care access disparities by acting as technology helpers to reduce the cognitive load demands required to learn and use patient-facing telehealth technologies.
Objective: We conducted a cognitive load theory–informed pilot intervention involving warm accompaniment telehealth helping sessions with patients at a Federally Qualified Health Center (FQHC). We demonstrate how to design and report recruitment methods, reach, delivery process, and the preliminary impact of a novel equity-focused intervention.
Methods: Early in the COVID-19 pandemic, a telehealth helping session was offered by phone to patients at the FQHC. Graduate students led the sessions, either conducting a telehealth video test run or helping with patient portal log-in. They systematically recorded their recruitment efforts, intervention observations, and daily reflection notes. Following the intervention, we invited the intervention participants to take part in an interview, and we asked all patients who had telehealth visits during the intervention period, or in the 4 weeks before or after it, to complete a survey. Electronic health records were reviewed to assess changes in telehealth visit format. Descriptive and inferential statistical analyses of the recruitment records, electronic health record data, and surveys were performed. Through integrative analysis, we developed process-related themes and recommendations for future equity-focused telehealth interventions.
Results: Of the 239 eligible patients, 34 (14.2%) completed the intervention and 3 (1.2%) completed subsequent interviews. The intervention participants who completed the survey (n=15) had lower education and less technological experience than the nonintervention survey participants (n=113). We identified 3 helping strategies for cognitive load reduction: providing step-by-step guidance for configuring and learning, building rapport to create confidence while problem-solving, and being on the same page to counter informational distractions. Intervention participants reported increased understanding but found learning the video visit software more difficult than did nonintervention participants. A comparison of visit experiences found no differences in difficulty using telehealth-related technologies (a cognitive load measure), changes to visit modality, or reported technical problems during the visit. However, the intervention participants were significantly less satisfied with the video visits.
Conclusions: Although a limited number of people participated in the intervention, it may have reached individuals more likely to need technology assistance. We postulate that significant differences between intervention and nonintervention participants were rooted in baseline differences between the groups' education level, technology experience, and technology use frequency; however, small sample sizes limit conclusions. The barriers encountered during the intervention suggest that patients at the FQHC may require both improved access to web-based technologies and human intermediary support to make telehealth video visits feasible. Future large, randomized, equity-focused studies should investigate blended strategies to facilitate video visit access.
“Equitable Research PRAXIS: A Framework for Health Informatics Methods”
Yearbook of Medical Informatics, December 2022.
Tiffany C. Veinot, Phillipa J. Clarke, Daniel M. Romero, Lorraine R. Buis, Tawanna R. Dillahunt, Vinod V.G. Vydiswaran, Ashley Beals, Lindsay Brown, Olivia Richards, Alicia Williamson, Marcy G. Antonio
Objectives: There is growing attention to health equity in health informatics research. However, the literature lacks a comprehensive framework outlining critical considerations for health informatics research with marginalized groups.
Methods: We drew on a literature review and on experiences from nine equity-focused health informatics studies conducted in the United States and Canada. These studies focus on disparities related to age, disability or chronic illness, gender/sex, place of residence (rural/urban), race/ethnicity, sexual orientation, and socioeconomic status.
Results: We found four key equity-related methodological considerations. To assist informaticists in addressing equity, we contribute a novel framework to synthesize these four considerations: PRAXIS (Participation and Representation, Appropriate methods and interventions, conteXtualization and structural competence, Investigation of Systematic differences). Participation and representation refers to the necessity for meaningful participation of marginalized groups in research, to elevate the voices of marginalized people, and to represent marginalized people in ways they are comfortable with (e.g., asset-based versus deficit-based). Appropriate methods and interventions mean targeting methods, instruments, and interventions to reach and engage marginalized people. Contextualization and structural competence mean avoiding individualization of systematic disparities and targeting social conditions that (re-)produce inequities. Investigation of systematic differences highlights that experiences of people marginalized according to specific traits differ from those not so marginalized, and thus encourages studying the specificity of these differences and investigating and preventing intervention-generated inequality. We outline guidance for operationalizing these considerations at four research stages.
Conclusions: This framework can assist informaticists in systematically addressing these considerations in their research in four research stages: project initiation; sampling and recruitment; data collection; and data analysis. We encourage others to use these insights from multiple studies to advance health equity in informatics.
“Roboviz: A Game-Centered Project for Information Visualization Education”
arXiv, August 2022.
Due to their pedagogical advantages, large final projects have become standard practice in information visualization courses. Students take on a client (real or simulated), a dataset, and a vague set of goals, and create a complete visualization or visual analytics product. Unfortunately, many projects suffer from ambiguous goals, over- or under-constrained client expectations, and data constraints that have students spending their time on non-visualization problems (e.g., data cleaning). These are important skills, but they are often secondary course objectives, and unforeseen problems can seriously hinder students. We created an alternative for our information visualization course: Roboviz, a real-time game that students play by building a visualization-focused interface. Because the game mechanics are designed around four different data types, the project allows students to create a wide array of interactive visualizations. Student teams play against their classmates with the objective of collecting the most (good) robots. The flexibility of possible strategies encourages variability, a range of approaches, and grappling with wicked design constraints. We describe the construction of this game and report on student projects over two years. We further show how the game mechanics can be extended or adapted to other game-based projects.