406: Enhancing remote work sustainability through human-centered data science with Xuan Lu

Information Changes Everything: The Podcast. Xuan Lu, research fellow, University of Michigan School of Information. News and research from the world of information science.

June 25, 2024

Information Changes Everything

News and research from the world of information science
Presented by the University of Michigan School of Information (UMSI)

Episode

406

Released

June 25, 2024

Recorded

2023

Guests

Xuan Lu is an assistant professor at the University of Arizona College of Information Science.

When this talk was recorded, Lu was a research fellow at the University of Michigan School of Information.

Summary

In this episode of “Information Changes Everything,” we look at the shift from in-person workplaces to remote work. UMSI research fellow Xuan Lu shares a human-centered data science approach to improving the sustainability of remote workers and teams.

Resources and links mentioned

Reach out to us at [email protected].

Timestamps

Intro (0:00)

Information news from UMSI (1:22)

Hear excerpts from Xuan Lu’s 2023 talk, “Towards Sustainable Remote Work: A Practice of Human-Centered Data Science” at UMSI (2:35)

Next time: Beth Patin on the role of information in fighting epistemicide (21:32)

Outro (22:10)

Subscribe to “Information Changes Everything” on your favorite podcasting platform for more intriguing discussions and expert insights.

About us

The “Information Changes Everything” podcast is a service of the University of Michigan School of Information, leaders and best in research and education for applied data science, information analysis, user experience, data analytics, digital curation, libraries, health informatics and the full field of information science. Visit us at si.umich.edu.

Questions or comments

If you have questions, comments, or topics you'd like us to cover, please reach out to us at [email protected].

Xuan Lu (00:00):

Since the shift to remote work, the emotional issues became more significant due to this isolation. However, as remote workers are using online platforms to communicate in their daily working, is it feasible to use the platforms as a channel to monitor their emotions at scale?

Kate Atkins, host (00:21):

That was UMSI research fellow Xuan Lu during a talk at UMSI focused on sustainable remote work, and this is "Information Changes Everything," where we put the spotlight on news and research from the world of information science. You're going to hear from experts, students, researchers, and other people making a real difference. As always, we're presented by the University of Michigan School of Information – UMSI for short. Learn more about us at si.umich.edu. I'm your host, Kate Atkins. Today we'll hear more from Xuan Lu’s talk from UMSI’s 2023 Data Science and Computational Social Science seminar series. She highlights the accelerated transition from in-person workplaces to remote work that took place during the COVID-19 pandemic. Xuan Lu also introduces a human-centered data science approach to analyzing and improving the sustainability of remote workers and teams. Before we jump in, a few other people and projects that you should know about. | According to a University of Michigan study, marginalized social media users believe their content is being suppressed through shadowbanning.

(01:36):

Despite platform denials, users engage in collaborative algorithm investigation to test and share suspicions of reduced visibility. | UMSI has introduced the Global Engagement Fellowship to enhance undergraduate students’ international experience through workshops, study abroad programs and reflective activities. Participants who study in various countries will return to serve as peer mentors, fostering cultural intelligence. | Where else but in Britain would you find a Royal School of Needlework, located in a palace no less? From its home in Hampton Court Palace, the RSN has just launched a new website showcasing the first 100 pieces of its huge collection. Each entry is beautifully and meticulously annotated, as you'd expect from needleworkers. | For more on all of these stories, check out si.umich.edu or click the link in our show notes. Now back to Xuan Lu.

Xuan Lu (02:38):

All of us have observed the workspace transformation under the pandemic in recent years, and at the early stage of the COVID-19 pandemic, many organizations shifted to working remotely as a response. Some companies even made it a permanent choice for their employees to work remotely, especially IT companies such as Amazon, Twitter, Microsoft, and Google. For example, in October 2020, Microsoft announced that they'll let more employees work from home permanently. And I believe almost everyone here has a similar experience working remotely. However, as it comes to the late stage of the pandemic, companies are changing their workplace strategies, and there are the office-first, the hybrid and the remote-first strategies. For some companies, the remote work experience has been so positive that they have chosen to go fully remote, and some stand firm in their belief that going back to the office is the best option. And many more companies are currently choosing the hybrid approach as a trade-off.

(03:44):

So what is the future? What is the general future of the workplace? Will people eventually go back to offices as they did before the pandemic, or will more organizations tend to remote work again, or at least hybrid work? So actually we're not going to make a conclusion here because this is still an ongoing debate about where to go, but my opinion is that in the long term, there is still a trend towards remote work. Okay, so I'll show you a brief overview of how business organizations have been changing in history. So by the 18th century, most businesses were organized as small, local, often family affairs. Then innovations were created for communications, including carbon paper, telegraph, typewriter, and telephone. And in the 19th century businesses were allowed to grow and centralize on a large scale. And since the 20th century, new information technologies were developed, like instant messaging, email and the internet itself.

(04:49):

So they're making it possible to collaborate from different places and businesses. And in the last two centuries, we can observe a significant decrease in the cost of communication. So this is a key factor of the evolution of business organizations. And another observable change is that the workers' freedom and flexibility decreased a lot during the centralization phase and it was increasing during the following decentralizing phase. So workers with this kind of freedom and flexibility, such as following their own working schedules, such as making their own decisions, often work harder and often show more dedication and more creativity. So it was anticipated long before the pandemic that decentralization will become more desirable in the coming decades, as qualities like motivation, creativity and flexibility are important in many places. So this trend was accelerated by the COVID pandemic. Although most organizations and teams switched to remote work due to the pandemic, the paradigm of remote work has long been considered as a potential trend of the future of work. And the emerging changes made to the workplaces due to the pandemic posed

(06:13):

many challenges to both the organizations and the individual workers. For example, how will the switch to remote work affect the workers' productivity, creativity, and innovation? And with the lack of communication richness in virtual teamwork, how to deal with the typical teamwork problems such as conflicts and misunderstandings in an effective way? And with a lack of emotional intelligence in virtual communications, how will workers communicate and regulate their emotions, and how to handle the boundaries between work and non-work, and how to stay in a good status of health and wellbeing? So these kinds of challenges and many others are critical to the sustainability of both the organizations and the individual workers in the context of remote work. And they are actually persistent challenges that should be considered for the future of work. So in my work, we use a human-centered data science approach to address the challenges with an end-to-end methodology.

(07:15):

In particular, the data are generated and collected from human behaviors. Then we use data science technologies, especially machine learning and causal inference, to analyze the data at scale and derive findings that will ultimately benefit the humans. And in the future of work domain, the human behaviors we are focusing on are mainly workers' working activities, including their communication activities. In addition, we also care about human health, which is hard to measure at scale, but can be reflected by the working activities to some extent. So we measure the outcomes related to sustainability at both the team level, including the team productivity and the team growth, and also the worker level, including the dropout risk and the emotional status and so on. And we use data science to understand them and to find potential interventions to help improve the sustainability of both teams and individual workers in remote work.

(08:16):

I'm going to introduce two parts of my work. One is about team sustainability under external shock, which focuses on understanding the shock effects of the COVID-19 pandemic on remote teams. And the other is about worker sustainability in remote teams, which investigates how to promote the long-term contribution of remote workers. As mentioned before, the remote work paradigm occurred long before the pandemic. Many sectors practice remote work, and the sector of IT and other communication services ranked number one in the percentage of remote workers in the year 2018 in the European Union. We chose the open source software development community on GitHub to be the subjects because the developers in this community are probably the most familiar with remote work and remote collaborations. And also the data are available for research, and can be accessed through public APIs and third-party services. So the research question here is that for teams that have already started collaborating remotely before the pandemic, how will they react to the shock and what kind of teams are more resilient?

(09:30):

Answering this question can help develop more resilient teams in the open source software community. And also, as many organizations have switched to virtual collaborations, such an analysis would provide valuable insights for them to deal with future shocks such as another wave of the pandemic. So first of all, we'd like to make a visual examination of the overall effect of the shock on the GitHub community. The number of active repositories shows an increasing trend over the years, and the forecasted values during the year 2020 also follow the same trend. And it is interesting to find that the observed values are significantly larger than expected since March 2020, when COVID-19 became a worldwide pandemic, which shows that the shock does have effects on the activeness of the teams on GitHub. So it is likely that many new teams started to use GitHub or existing teams have returned to their GitHub repositories for remote collaborations.

(10:34):

And we can also see that the shock effects are changing over time. And the trends suggest a potential heterogeneity across different kinds of teams. From a causal perspective, the pandemic can be regarded as a treatment; then its effect on a team is the individual treatment effect, or ITE, which is the difference between the potential outcome of the team if we make the team exposed to the pandemic shock and the potential outcome if we make it not exposed to the shock. Then what we need to do is to estimate the individual treatment effect from data. The ITE can be calculated as the observed outcome of this team minus the expected outcome if the shock had never happened. However, this cannot be done with standard causal inference methods, as we are facing a non-conventional situation here. To be more specific, in the causal inference literature, the standard way to estimate the counterfactual is to match treated subjects with untreated subjects that have similar properties. In our setting, however, because the pandemic is worldwide, every team is actually treated and we cannot find a control group for the teams.
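The definition given here can be written compactly. The following is only a notational sketch: the symbols (team index i, month t, potential outcomes Y(1) and Y(0), and the hat for an estimate) are assumptions introduced for illustration and were not used in the talk.

```latex
\[
\mathrm{ITE}_{i,t} = Y_{i,t}(1) - Y_{i,t}(0),
\qquad
\widehat{\mathrm{ITE}}_{i,t} = Y_{i,t}^{\mathrm{obs}} - \widehat{Y}_{i,t}(0)
\]
```

Here Y(1) is the outcome under the pandemic shock, which is what is observed for every team, and Y(0) is the outcome had the shock never happened, which has to be estimated because no untreated team exists.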

(11:49):

And to achieve this, we used data from a year before to train the models, which means that we use the team properties in the last quarter of the year 2018 to predict the outcomes in month t of 2019. And we proposed a comprehensive set of features to characterize the properties of the team from these dimensions, including the team size, the level of multitasking of the team members, the tenures of the developers, and their prestige and so on. Okay, so now we have the individual treatment effects of each team, then we can visualize the distribution of them in each month. And the mean of the distribution in a particular month is essentially the average treatment effect. When the mean is below zero, it means the outcome has declined because of the pandemic. So by comparing the distributions in these two data sets, we can observe that when the pandemic hits there are noticeable negative effects on team productivity, especially in the first three months of 2020.
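The procedure described here can be sketched roughly as follows. This is a minimal illustration, not the model used in the talk: a one-feature least-squares line stands in for the real predictive model, the toy numbers are invented, and it is an assumption that the fitted model is then applied to last-quarter-2019 properties to obtain the no-shock expectation for 2020.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (single feature)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def individual_treatment_effects(train_x, train_y, shock_x, shock_y):
    """ITE per team = observed outcome - outcome predicted by the pre-shock model."""
    intercept, slope = fit_line(train_x, train_y)
    return [y - (intercept + slope * x) for x, y in zip(shock_x, shock_y)]


# Hypothetical toy data: one team property (team size) and one outcome (commits).
size_q4_2018 = [2, 4, 6, 8]      # properties used for training
commits_2019 = [10, 20, 30, 40]  # pre-shock outcomes used for training
size_q4_2019 = [3, 5, 7]         # properties of the teams hit by the shock
commits_2020 = [12, 25, 30]      # outcomes observed during the shock

ites = individual_treatment_effects(
    size_q4_2018, commits_2019, size_q4_2019, commits_2020
)
ate = mean(ites)  # the mean of the ITE distribution is the average treatment effect
print(ites, ate)
```

A negative value of ate in this sketch corresponds to the case described above, where the outcome has declined relative to what the pre-shock model expected.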

(12:57):

While for the team size, a clearly declining pattern can be observed from April. So here are the two key findings about this first part of the work. So we found that the effect of a shock can be immediately observed in team productivity, while the effect on team growth is lagged. And we found that quite a lot of team properties are correlated with the resilience of the individual teams under the shock. So when recalling the big picture, we can see that when the subjects are the teams, the outcomes of the productivity and the team growth and their factors can be well measured with the working activity data, while it is hard to measure outcomes related to team members' health and wellbeing, especially at scale. So considering that this kind of status can be related to their working activities, we then investigate the possibility to sense such status through their activities, especially in communications, and explore their relation with some explicit outcomes that can be accurately measured.

(14:05):

So in the next work, we select the long-term contribution as this explicit outcome, as it is related to the emotional and mental health status of remote workers, such as the motivation for work and the passion for work and the depression and burnout and something like that. And also the long-term developers or long-term contributors are of significance to project success in the open source development community. So we then move to the second part, using human-centered data science to promote the long-term contribution of remote workers. Since the shift to remote work, the emotional issues became more significant due to something like isolation. For example, in a BBC report, remote workers are suffering from loneliness, stress and anxiety and some other kinds of issues. However, at virtual workplaces without face-to-face interactions, emotions are much harder to observe. However, as remote workers are using online platforms to communicate in their daily working, is it feasible to use the platforms as a channel to monitor their emotions at scale? People are using different elements including texts, emojis, and figures to express themselves.

(15:30):

And if we split these elements into different layers and regard them as mediums of online communications, we can have the medium of text, emoji and multimedia. So can we use some kind of technologies to extract emotions from each layer? Actually, there can be different benefits and challenges in leveraging each layer. And in my work I propose to use the emoji layer because they were originally created to help express emotions, and they were adopted by Unicode in 2010, and they are ubiquitously used by worldwide users. So to check the feasibility of doing so, we first measure the use of emojis in the whole GitHub community. So we calculate the proportion of posts containing emojis and also the top emojis in different types of posts. It's not surprising that people are using fewer emojis in the work context in comparison to social media like Twitter, right?

(16:36):

However, we can observe a relatively high level of usage of emojis in pull request comments and commit comments, which are related to the communications about the code contribution here. And we can also observe that the emojis related to emotions are really popular on this platform, like the faces and the gestures are really popular. And also we can see the very specified usage of some emojis like the rocket, which means deploying, launching or shipping, so shipping a project. And the ticket here is used in issue tracking and technical support, which means that emojis are very well adapted to this kind of highly specified context. So what we did is to run OLS regressions of emoji usage on work status measurements. And we found that indeed emoji usage is highly reflecting the working status of the individual developers. So the next thing we did was to explore whether emojis can be used to predict the future dropouts of developers.

(17:44):

So here we use the dropout as an indicator of long-term contribution. And we looked at whether developers who are active in the year 2018 would drop out in 2019, which means that they will conduct no activity on GitHub in that whole year. And we found that for non-emoji users, which means not even one emoji was used by them in the year 2018, they have a three times higher ratio of dropouts in comparison to emoji users. And then we use machine learning models to predict the dropouts for emoji users with the features extracted from their emoji usage in the year 2018, and we can get a really good performance of the models regarding both the accuracy and AUC. And we find that, okay, future dropout can be effectively predicted using emoji features alone. So we're very interested in this question: can emojis be used as the treatments to reduce the future dropout risk?

(18:55):

So we frame this as a causal inference problem. And so the treatment here is using emoji in the year 2018, and the outcome is whether one will drop out in the year 2019. So to make the causal inference, we adjust for the confounders with propensity scores. And the propensity score estimates one's probability of using emojis given the confounding factors. And with this propensity score, we can estimate the treatment effect by matching emoji users with non-emoji users. So here is the result. We can see that after matching, the probability of dropping out by using emoji can be reduced by around a half. So this supports our hypothesis that using emojis can help reduce the future dropouts. So the key findings about the second part of the work come from the prediction task and also the causal inference task here. And beyond what we have been talking about today, my research actually focuses on the intersection of humans, technology and the future of work domain.
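The matching step described here can be illustrated with a small sketch. Everything in it is an assumption made for illustration: a single discrete confounder named activity, a propensity score estimated as the share of emoji users within each activity level (a stand-in for whatever model the study used), nearest-neighbor matching with replacement, and invented toy records.

```python
from collections import defaultdict
from statistics import mean


def propensity_scores(records):
    """Estimate P(emoji user | activity) as the emoji-user share in each activity level."""
    counts = defaultdict(lambda: [0, 0])  # activity -> [emoji users, all developers]
    for r in records:
        counts[r["activity"]][0] += r["emoji"]
        counts[r["activity"]][1] += 1
    return [counts[r["activity"]][0] / counts[r["activity"]][1] for r in records]


def matched_effect(records):
    """Mean dropout difference between each emoji user and the non-users
    whose propensity score is closest (ties are averaged)."""
    scores = propensity_scores(records)
    treated = [(s, r) for s, r in zip(scores, records) if r["emoji"] == 1]
    control = [(s, r) for s, r in zip(scores, records) if r["emoji"] == 0]
    diffs = []
    for s_t, r_t in treated:
        best = min(abs(s_t - s_c) for s_c, _ in control)
        nearest = [r_c["dropout"] for s_c, r_c in control if abs(s_t - s_c) == best]
        diffs.append(r_t["dropout"] - mean(nearest))
    return mean(diffs)


def make(activity, emoji, dropouts):
    """Build toy records: one developer per dropout flag (1 = dropped out in 2019)."""
    return [{"activity": activity, "emoji": emoji, "dropout": d} for d in dropouts]


# Hypothetical developers: emoji = 1 means at least one emoji used in 2018.
records = (
    make("low", 1, [1, 0])
    + make("low", 0, [1, 1, 1, 0])
    + make("high", 1, [0, 0, 0, 1])
    + make("high", 0, [1, 0])
)

effect = matched_effect(records)
print(effect)
```

A negative effect in this sketch means that emoji users dropped out less often than comparable non-users. Matching only balances the confounders that are included, so a result like this supports a causal reading only to the extent that the relevant confounders were measured.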

(20:18):

And my previous work also studies the user interactions with technology innovations and how to improve work efficiency with novel machine learning methods for the future of work. Based on what we have done, there are a lot of interesting directions for future work. So regarding the software development community, we have learned several quite interesting outcomes. And I believe that the methodology and the insights from this software development community can be further generalized to other domains like the scientific workflow. And what we have done in the domain of future of work can be further generalized to broader communities and societal scenarios such as the future of education. There are discussions about which is better, to fully return to campus or to further promote the online learning platforms. And the future of healthcare is also an interesting domain to look at.

Kate Atkins, host (21:21):

You can watch the full talk by clicking the link in our show notes. To learn more about upcoming events like this, visit us at umsi.info/events. And tune in next time to hear from Beth Patin, an assistant professor within the Syracuse University School of Information Studies, during a 2024 UMSI Data, Archives, and Information in Society seminar. The talk highlights the transformative potential of libraries, archives, and museums in fighting against epistemicide.

Beth Patin (21:53):

As librarians, you should not be waiting for somebody to come in and challenge books. We know that it's happening. We should have a plan. As institutions, as teachers, as professors, we know that things are unequitable. What is your plan?

Kate Atkins, host (22:08):

That's in our next episode. Before we go, have you ever struggled to explain what in the world is information science? We can help. Our website features frequently asked questions and videos that answer everything you've ever wanted to know or share about information science. Visit us at si.umich.edu or click the link in our show notes. The University of Michigan School of Information creates and shares knowledge so that people like you will use information with technology to build a better world. Don't forget to subscribe to "Information Changes Everything" on your favorite podcasting platform. And if you've got questions, comments, or episode ideas, send us an email at [email protected]. From all of us at the University of Michigan School of Information, thanks for listening.