Science friction: Data, metadata and collaboration

To investigate data practices empirically, this project examined four large cyberinfrastructure efforts: the Long Term Ecological Research Network, the Center for Embedded Network Sensing, the Water and Environmental Research Systems Network, and the Earth System Modeling Framework.

Science continues to become more data-driven, collaborative, and interdisciplinary, demanding greater interoperability among data, tools, and services. Metadata, usually viewed simply as “data about data” describing objects such as books, journal articles, or datasets, play a key role in interoperability. The project examined what is called “science friction,” which occurs when scientists from two or more disciplines work together on related problems, and found that metadata may be a source of friction between scientific collaborators, impeding data sharing.

Start date: 10/1/2008
End date: 9/30/2012


Science friction occurs far beyond laboratories and e-science networks, since scientists are not the only ones who want to know about other people’s data. Email and other internal communications among scientists can sometimes be understood as metadata, that is, as significant information about datasets and their interpretation. As a result, these communications can be misunderstood by others with whom the scientists share little common ground.

Scientists and supporters of science have long worked to improve the sharing, reuse, storage, and retrieval of scientific data. Today, efforts focus on advanced cyberinfrastructure: using networked computers, databases, and organizations to bridge divides among diverse scientific disciplines. Cyberinfrastructure divides into three main activities. First, large numbers of automatic sensors monitor subjects of interest, such as ecosystems and the Earth's climate, producing massive volumes of digitized data. Second, in many fields computer models have replaced laboratory experiments as the principal means of data collection, prediction, and decision-making. Third, increasingly vast data resources (scientific memory) are now available, but are often distributed across thousands of research sites and institutions, in numerous incompatible formats. 

For cyberinfrastructure-enabled science to deliver on its transformative potential, designers need better ways of understanding how scientists actually create and share data in practice, and how they use those data to create new knowledge. The investigators studied how cyberinfrastructure is used in monitoring, modeling, and memory. The cyberinfrastructure projects studied were spread across many disciplines addressing three important domains related to climate change concerns: ecology and environment, hydrology and water management, and earth system science. In conducting the project, the investigators developed innovative methods of distributed ethnography, collaborative history, and multimodal network analysis.

The research argued that metadata represent a form of scientific communication, requiring precision and lubrication to reduce science friction. Precision makes it possible to join one part (dataset) more perfectly to another one. Lubrication refers to the processes by which people overcome friction without precise solutions or the need to modify components. Well-codified metadata products increase the precision with which a dataset can be fitted to purposes for which it was not originally intended, or can be reused by people who did not participate in creating it. Temporary, incomplete, ad hoc metadata processes act as lubricants in disjointed, imprecise scientific communication. This category of metadata frequently appears alone, in the case of datasets for which no metadata products exist, and has typically been brushed aside in the quest to achieve comprehensive, stable, permanent catalogs. 

The project partnered with organizations working to enhance the role of women in computing, to build database systems for American Indian communities, and to engage other groups often ignored in information infrastructure development. 

Collaborators on this project from outside of U-M were Christine Borgman (University of California, Los Angeles), Geoffrey C. Bowker (University of Pittsburgh), and David Ribes (Georgetown University).

To get more information about this project, please view the article “Science friction: Data, metadata, and collaboration,” located on Paul Edwards’ website.

Grants

Collaborative Research: AOC: Monitoring, Modeling and Memory: Dynamics of Data and Knowledge in Scientific Cyberinfrastructures, National Science Foundation: $1,249,911

 

The National Science Foundation (NSF) is an independent federal agency created by Congress in 1950 "to promote the progress of science; to advance the national health, prosperity, and welfare; to secure the national defense…"