Validating Quality in Large-Scale Digitization: Metrics, Measurement, and Use-Cases

Project Description

Overview

This project examines how to validate quality in large-scale digitization, focusing on the HathiTrust Digital Library. The two-year research project is investigating methods for detecting and measuring errors and other quality issues in mass-digitized literature. It is also analyzing how detected errors may affect educational and scholarly use across a representative set of use cases: reading online, printing copies, mining texts, and managing print collections.

Impact/Benefits

The findings of this study will make a significant contribution to the field of information quality and will help digital repositories assess the quality of the objects they have committed to preserving at scale. Understanding how to judge the quality of HathiTrust digital deposits will help libraries decide whether to re-digitize materials and how to manage print collections whose volumes have secure and usable copies held in digital repositories. The ability to assess and document the quality of volumes will pave the way for certifying those volumes for specific uses, helping users and stakeholders decide which volume or set of volumes to select for a particular purpose.

Background

Ongoing mass digitization of books and serials is generating vast digital collections and transforming education and research at all levels. However, these efforts have also raised questions about the value of the digital copies such large-scale projects produce. For digital repositories and their communities of users to trust that deposited objects can support the uses envisioned for them, repositories must validate the quality and fitness for use of the objects they preserve. This project addresses several of these questions about the value of digital copies and takes a major step toward automating quality review and sharing the characteristics of digitized books and journals.

SI Lead Investigator:

Research Team:

Funding Partner:

  • Institute of Museum and Library Services

Amount Awarded:

$674,722

Start Date:

11/01/2010

End Date:

10/31/2012
