Winter roster of Data Science/Computational Social Science Seminars speakers announced
The University of Michigan Data Science / Computational Social Science (DS/CSS) faculty have announced the speakers for the group’s winter 2022 seminar series.
The Data Science/Computational Social Science seminar series brings together a vibrant and diverse community of scholars whose cutting-edge research in information science, computer science and the social sciences aims to broaden our understanding of important social and technological issues.
The events are scheduled for Thursdays at noon ET. All seminar talks will be held online via Zoom. Registration to attend the events can be found at umsi.info/DSCSS.
Coordinator for the winter 2022 series is Paramveer Dhillon, assistant professor of information at the School of Information.
Topics and abstracts will be announced separately, closer to each event.
The winter 2022 seminar schedule of speakers:
Networks and Identity Drive Geographical Properties of the Diffusion of Linguistic Innovation
Adoption of cultural innovation (e.g., music, beliefs, language) is often geographically correlated, with adopters largely residing within the boundaries of relatively few well-studied, socially significant areas. These cultural regions are often hypothesized to be the result of either (i) identity performance driving the adoption of cultural innovation, or (ii) homophily in the networks underlying diffusion. While social scientists often treat either network or identity as the core social structure in modeling language change, we show that key geographic properties of diffusion actually depend on both factors, as each one influences different mechanisms of diffusion. Specifically, we find that the network principally drives spread between urban counties via weak-tie diffusion, while identity plays a disproportionate role in transmission between rural counties via strong-tie diffusion. Our work suggests that models must integrate network and identity in order to understand and reproduce the adoption of innovation.
Demographic Disparities in Wikipedia Coverage: A Global Perspective
Wikipedia has become one of the primary sources of knowledge on the web. It aims to document knowledge from a neutral point of view. However, studies have identified the existence of various kinds of bias in Wikipedia articles. For example, gender inequality has been found in topics, word choice, coverage, and references in Wikipedia biographies. Prior studies have focused only on measuring bias in a single language edition of Wikipedia or in multiple languages separately. This helps us understand how bias can impact who becomes popular within one culture. It is still unclear how demographic bias limits people from passing the threshold of recognition across cultures. In this paper, using about 800K articles from WikiProject Biography over ten years across the 12 largest language editions of Wikipedia, we study global demographic bias in Wikipedia coverage across multiple languages regarding gender, ethnicity, age, and nationality. We measure global coverage in several ways, including page existence, length, and global consensus, which measures content similarity across languages. We find that minorities in ethnicity and nationality are still covered less than their majority counterparts. Fortunately, from 2010 to 2020, we observe a significant reduction in global coverage bias.
Can Machine Learning and Big Data Improve the Targeting of Humanitarian Assistance?
Targeting is a central challenge in the administration of anti-poverty programs: Given available data, how does one rapidly identify the individuals and families with the greatest need? Here we show that non-traditional “big” data from satellites and mobile phone networks can improve the targeting of anti-poverty programs. Our analysis compares outcomes – including exclusion errors, total social welfare, and measures of fairness – under different targeting regimes. Relative to other available approaches, the machine learning system reduces errors of exclusion by 4-21%. These results highlight the potential for new data sources to contribute to humanitarian response efforts, particularly in crisis settings when traditional data are missing or out of date.
Eliciting Thinking Hierarchy without a Prior
A key challenge in crowdsourcing is that the majority may make systematic mistakes. Prior work focuses on eliciting the best answer without a prior even when the majority is wrong. Here, without any prior, we want to elicit the full hierarchy, in which the higher-ranking answers, which may not be supported by the majority, come from more sophisticated people. We propose a new model, called the thinking web, that describes the hierarchy among people's thinking types through a weighted directed acyclic graph. To learn the thinking web without any prior, we propose a novel, powerful and practical elicitation paradigm, the Answer-Guess paradigm, which works as follows. First, we ask a single open-response question and ask each respondent for both their answer and their guess(es) for other people's answers. Second, we construct an Answer-Guess matrix that records the number of people who report a specific Answer-Guess pair. Third, by ranking the answers to maximize the sum of the upper triangular area of the matrix, we obtain and visualize the hierarchy of the answers without any prior. We also conduct four empirical studies to demonstrate the superiority of our approach compared to the plurality vote and to validate our thinking web model: more sophisticated people can reason about less sophisticated people’s minds, and the hierarchy can be approximately described by a directed acyclic graph.
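The three steps described in the abstract above can be illustrated with a small sketch. This is not the speaker's implementation: the function names, the brute-force search over all rankings, and the toy responses are assumptions made for illustration, and the exhaustive search is only practical for a handful of distinct answers.

```python
from collections import Counter
from itertools import permutations


def build_answer_guess_matrix(responses):
    """Count how many respondents reported each (answer, guess) pair."""
    counts = Counter()
    for answer, guesses in responses:
        for guess in guesses:
            counts[(answer, guess)] += 1
    return counts


def rank_answers(responses):
    """Return the ordering of answers (highest rank first) that maximizes
    the sum of the strictly upper triangular part of the matrix."""
    counts = build_answer_guess_matrix(responses)
    answers = sorted({a for a, _ in counts} | {g for _, g in counts})

    def upper_triangle_sum(order):
        return sum(
            counts[(order[i], order[j])]
            for i in range(len(order))
            for j in range(i + 1, len(order))
        )

    # Brute force over all orderings (an assumption for clarity, not scale).
    return max(permutations(answers), key=upper_triangle_sum)


# Hypothetical toy data: each entry is (own answer, guesses for others' answers).
responses = (
    [("A", ["A"])] * 5        # plurality answer; these respondents guess only "A"
    + [("A", ["B"])]          # one "A" respondent guesses "B"
    + [("B", ["A"])] * 3      # "B" respondents anticipate "A"
    + [("C", ["A", "B"])] * 2 # "C" respondents anticipate both "A" and "B"
)

ranking = rank_answers(responses)
print(ranking)
```

In this toy data, "A" wins a plurality vote, but the ranking places "C" first because the respondents who answer "C" can predict the other answers while the reverse does not hold.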
Using machine learning to increase equity in healthcare and public health
Our society remains profoundly unequal. Worse, there is abundant evidence that algorithms can, improperly applied, exacerbate inequality in healthcare and other domains. This talk pursues a more optimistic counterpoint – that data science and machine learning can also be used to illuminate and reduce inequality in healthcare and public health – by presenting vignettes about women's health, COVID-19 and pain.
Understanding and Countering Problematic Information on Social Media Platforms
Online social media platforms have brought numerous positive changes, including access to vast amounts of news and information. Yet, those very opportunities have created new challenges — our information ecosystem is now rife with problematic content, ranging from misinformation and conspiracy theories to hateful and incendiary propaganda. As a social computing researcher, my work introduces computational methods and systems to understand and design defenses against such problematic online content. In this talk, I will focus on two aspects of problematic online information: 1) conspiracy theories and 2) extremist propaganda.
First, leveraging data spanning millions of conspiratorial posts on Reddit, 4chan and 8chan, I will present scalable methods to unravel who participates in online conspiratorial discussions, what causes users to join conspiratorial communities and then potentially abandon them. Second, I will dive into a special type of problematic content: extremist hate groups. Merging theories from social movement research with big data analyses, I will discuss the ecosystem of extremists' communication and the roles played by them. Finally, I will close by previewing important new opportunities to address some of these problems, including conducting social audits to defend against algorithmically generated misinformation and designing socio-technical interventions to promote meaningful credibility assessment of information.
You Might Also Think This Is Unfair: Operationalizing Fairness and Respect in Information Systems
Every day, information access systems mediate our experience of the world beyond our immediate senses. Google and Bing help us find what we seek, Amazon and Netflix recommend things for us to buy and watch, Apple News gives us the day's events, and LinkedIn helps us find new jobs. These systems deliver immense value, but also have profound influence on how we experience information and the resources and perspectives we see. The influence and impacts of these systems raise a number of questions: How are the costs and benefits of search, recommendation, and other information access systems distributed? Is that distribution equitable, or does it benefit a few at the expense of many? Are they designed with respect for their users, producers, and other affected people?
In this talk, I will discuss how to locate specific questions about the equity of an information access system in a landscape of harms, present some of my own group's work on quantifying and measuring systematic biases, and look to a future of engaged, human-centered research and development on information access systems grounded in the dignity and well-being of everyone they affect.
Responsible AI: Challenges and Opportunities
Artificial intelligence (AI) is a top national priority of the United States, and it promises to drive the next wave of economic growth worldwide. As AI and machine learning technologies continue to shape our future, it is critical that we understand the opportunities ahead of us, and avoid the perils. William Wang will describe the key recent advances in responsible artificial intelligence and outline the new challenges for building human-centered AI technologies, focusing on issues of fairness, bias, transparency and energy efficiency of AI algorithms.
Availability of the Gig Economy and Long Run Labor Supply Effects for the Unemployed
A growing number of American workers earn income through platforms in the gig economy which provide access to flexible work (e.g. Uber, Lyft, TaskRabbit). This major labor market innovation presents individuals with a new set of income smoothing opportunities when they lose their job. I use U.S. administrative tax records to measure take-up of gig employment following unemployment spells and to evaluate the effect of working in the gig economy on workers' overall labor supply, skill acquisition and earnings trajectory. To do so, I utilize penetration of gig platforms across counties over time, along with variation in individual-level predicted propensities for gig work based on pre-unemployment characteristics. In the short run, I show an increase in gig work following an unemployment spell and that individuals are correspondingly better able to smooth the resulting drop in income. However, individuals stay in these positions and are less likely to return to traditional wage jobs. Thus, several years later, prime-age (25-54) workers' income lags significantly behind comparable individuals who did not have gig work available. Among older workers (55+), I find an increase in gig work corresponds to a postponement of Social Security retirement benefits and a reduction in receipt of Social Security Disability Insurance (SSDI).
Understanding and Designing Graph Neural Networks as Graph Signal Denoising
Graph Neural Networks (GNNs) have shown their power in graph representation learning, which has advanced various real-world applications in many domains such as biology and health care. As a result, a large number of GNNs have been developed in recent years. In this talk, I first connect numerous GNNs with a traditional optimization problem on graphs, i.e., the graph denoising problem. This connection not only provides a unified understanding of various existing GNNs but also paves a principled and innovative way to design GNNs as graph denoising problems. As an illustration, I then demonstrate how to design GNNs from this new perspective to advance real-world applications.
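For readers unfamiliar with graph signal denoising, the following sketch shows one common textbook form of the problem: find a signal that stays close to the noisy input while varying little across edges. The objective, the function names, the parameter values, and the toy path graph are assumptions chosen for illustration and may differ from the formulation used in the talk.

```python
def denoise(signal, edges, c=1.0, step=0.1, iters=50):
    """Gradient descent on
        sum_i (f_i - x_i)^2  +  c * sum_{(i, j) in edges} (f_i - f_j)^2
    where x is the noisy input signal and f is the denoised estimate."""
    f = list(signal)
    for _ in range(iters):
        # Gradient of the fidelity term.
        grad = [2.0 * (fi - xi) for fi, xi in zip(f, signal)]
        # Gradient of the smoothness term, accumulated edge by edge.
        for i, j in edges:
            diff = f[i] - f[j]
            grad[i] += 2.0 * c * diff
            grad[j] -= 2.0 * c * diff
        f = [fi - step * gi for fi, gi in zip(f, grad)]
    return f


def roughness(f, edges):
    """Total squared difference of the signal across edges."""
    return sum((f[i] - f[j]) ** 2 for i, j in edges)


# Hypothetical toy graph: a path of four nodes with an alternating signal.
edges = [(0, 1), (1, 2), (2, 3)]
noisy = [1.0, 0.0, 1.0, 0.0]
smoothed = denoise(noisy, edges)
print(roughness(noisy, edges), roughness(smoothed, edges))
```

Each update pulls a node's value toward its neighbors' values, which resembles the neighborhood aggregation step found in many GNN layers; that resemblance is the intuition behind viewing GNNs through a denoising lens.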
A computational linguistic model of organizational identification
Identification, the quality of associating the self with a group, is a fundamental human tendency. In contemporary market societies, organizations often serve as important sources of identification. Why is it that some people identify with the organization that employs them more strongly than others? Whereas existing work often attributes this variation to differences between organizational attributes or individual personalities, we trace it to differences in the network positions people occupy. To do so, we use person-specific word embeddings to develop a method for detecting organizational identification in interpersonal email communication. We validate our measure using established survey-based methods and apply it to members of three different organizations. Drawing on structuralist sociological theories, we demonstrate that individuals embedded in dense networks tend to exhibit higher organizational identification. Our findings and methods apply to other settings where the strength of group identification strongly predicts individual behavior.
Collective emotions on social media: Validity and applications to the COVID-19 pandemic
In this talk, I will first present two case studies on the relationship between population-level social media measures of emotion and emotions reported in surveys. The studies included both dictionary-based and machine-learning-based emotion measures, and one included data from two different social media platforms. Overall, we found that social media emotion measures closely tracked emotions reported in surveys from the UK and Austria in a time period before and during COVID-19. The results show that daily and weekly social media indicators can represent emotional trends in societies at large, but also highlight the need for further validation studies.
Second, I will present a large-scale study on collective emotions during the early COVID-19 outbreak. Social media emotion measures based on tweets in six different languages showed strong and enduring increases of anxiety and sadness expressions, together with decreases in anger expressions in 18 countries around the world. These changes were in part related to increases in COVID-19 cases and the stringency of measures against the spread of the virus. Taken together, these studies illustrate that social media emotion measures can provide added value in addition to representative surveys, in particular during unexpected crisis events.