To manage the growing scale and task complexity of applied data science, data science workers often collaborate within and across teams. Yet tools for performing and sharing data analysis offer limited support for data scientists working together. My research in human-computer interaction (HCI) explores barriers in real-world data science collaboration practices and reimagines the workflows and interfaces for collaborative data science.
My work combines thoughtful qualitative inquiry into data science collaboration in practice with technical HCI work that leverages emerging computing technologies. First, I investigate the challenges of sharing analysis code and artifacts in data science across a series of collaboration scenarios, including the use of synchronous editing in computational notebooks, the communication challenges faced by AI developers in multidisciplinary teams, and the strategies data scientists in software teams use for reuse and sharing. Second, to address these challenges, I design, build, and evaluate computational tools that help data scientists hand off work during collaboration. These include: (1) capturing the contextual links between messages and notebook elements; (2) leveraging NLP and AI techniques to automatically document data science code; and (3) implementing data frame visualizations as first-class citizens in data science programming environments to explain the impact of code changes. Lastly, I design collaborative tools that help novice data science programmers learn and talk about programming. I explore a series of designs that make it easier for students to seek help from instructors, or to ask peers for help testing and reviewing their code.
My work has been published at top-tier HCI venues (e.g., CHI, TOCHI, CSCW) and has received several paper awards. Several of my projects have also led to publications at the intersection of HCI and other areas, in venues such as ICSE, EMNLP, and IJCAI. In addition, I pursue broader impact by working with industry and the open-source community on real-world problems. Beyond scholarly outputs, my work has also been used to improve the authentic work environments that data scientists use daily.
Designing Future Computational Notebooks for Collaboration and Learning
Computing technologies allow people to work, learn, and socialize together remotely yet seamlessly, including on complex computational tasks such as programming and data science. Yet collaboration in data science is often hard. Because data science is highly exploratory, analyses and artifacts iterate quickly, which makes it difficult to maintain a shared understanding among collaborators. Tools such as computational notebooks give data scientists a convenient way to run, document, and share analyses in a storytelling format. However, many open questions remain about how to design collaborative data science tools that improve the collaboration experience. For example, data scientists often neglect to keep documentation up to date during rapid exploration, which results in messy notebooks that are difficult to read; and without strategic planning, collaborators working in a shared notebook may block each other's work. My dissertation draws on human-centered design techniques to identify barriers in real-world data science programming practices and explores the design space of collaborative data science environments through tool-building.
Fields of interest
Human-Computer Interaction (HCI)
Data Science, Analytics, and Visualization
Educational Technology and Learning Analytics
BEng, Zhejiang University, 2016
MSc, Simon Fraser University, 2018