The Project

This Atlas is a product of Oceanic Exchanges: Tracing Global Information Networks in Historical Newspaper Repositories, 1840–1914. The project was funded through the Transatlantic Partnership for Social Sciences and Humanities 2016 Digging into Data Challenge and was undertaken by researchers from Universität Stuttgart, Staatsbibliothek zu Berlin, Universidad Nacional Autónoma de México, Universiteit Utrecht, Turun Yliopisto, Loughborough University, University College London, University of Edinburgh, Northeastern University, North Carolina State University, and University of Nebraska-Lincoln between 2017 and 2019.

The dramatic expansion of newspapers over the nineteenth century created a global culture of abundant, rapidly circulating information. Scholarship has largely defined the significance of the newspaper in metropolitan and national terms, and digitisation by local institutions further situates newspapers in national contexts. Oceanic Exchanges brought together leading efforts in computational periodicals research from six countries (Finland, Germany, Mexico, the Netherlands, the United Kingdom, and the United States) to examine patterns of information flow across national and linguistic boundaries. Through computational analysis, the project crosses the boundaries that separate digitised newspaper corpora to illustrate the global connectedness of nineteenth-century newspapers, uncovering how the international was refracted through the local as news, advice, vignettes, popular science, poetry, fiction, and more. By linking research across large-scale digital newspaper collections, Oceanic Exchanges offers a model for data custodians who host large-scale humanities data. Recent research from the Always Already Computational: Collections as Data project (2016–2018) has highlighted the need to reshape our understanding of digital collections and to find ways to better support computational use of data, not as an afterthought but as part of the design of those collections. Our Atlas is among the first major projects to align with the principles outlined by the Collections as Data project, including lowering barriers to use, sharing documentation, fostering interoperability, and publishing in an open access format.

The Atlas of Digitised Newspapers and Metadata arose out of the need to transform data from different datasets into a single standard that could be fed into workflows across the project. At the most basic level, what was needed was a bespoke JSON format compatible with the text-matching software passim, the primary tool used by the project to identify textual reappearances across the collections, as well as discrete plain-text files of the newspaper content. Although the basic bibliographical and content fields of a database could be quickly identified to allow for unique identification of specific texts, a deeper understanding of the meaning of the metadata, and therefore of its full potential for digital research, was difficult to obtain. The different vocabularies (such as Dublin Core, METS, ALTO, PREMIS, MIX) were used inconsistently and combined differently in different instantiations. In order to connect these collections meaningfully, the researchers needed to bring them together (their data, metadata and paradata) and then examine them as research objects in and of themselves.
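
As a rough illustration of the kind of transformation described above, the following Python sketch writes a set of newspaper texts both as line-delimited JSON and as discrete plain-text files. It is not the project's actual converter, and the bespoke format itself is not specified here. The field names `id`, `series` and `text` follow what we understand to be passim's usual input convention; the `date` field, the function names and the output file names are hypothetical and chosen only for this example.

```python
import json
from pathlib import Path


def to_passim_record(doc_id, series, date, text):
    """Build one record. `id`, `series` and `text` are assumed to match
    passim's input convention; `date` is a hypothetical extra field."""
    return {"id": doc_id, "series": series, "date": date, "text": text}


def write_outputs(records, out_dir):
    """Write records as JSON lines plus one plain-text file per record.

    File and directory names here are illustrative assumptions.
    """
    out_dir = Path(out_dir)
    text_dir = out_dir / "text"
    text_dir.mkdir(parents=True, exist_ok=True)

    json_path = out_dir / "passim_input.json"
    with json_path.open("w", encoding="utf-8") as fh:
        for rec in records:
            # One JSON object per line; keep non-ASCII characters readable.
            fh.write(json.dumps(rec, ensure_ascii=False) + "\n")
            # Replace path separators so the identifier is a safe file name.
            safe_name = rec["id"].replace("/", "_")
            (text_dir / f"{safe_name}.txt").write_text(
                rec["text"], encoding="utf-8"
            )
    return json_path
```

Here `series` stands for whatever grouping (for instance, a newspaper title) should not be compared against itself; how each collection's metadata maps onto that field is exactly the kind of question the Atlas was created to answer.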

In 2017–18, led by Paul Fyfe of North Carolina State University, Oceanic Exchanges gathered fourteen instantiations of ten distinct digitised newspaper databases, detailed below, alongside histories of their creation, composition and licensing. In 2018–19, a team led by M. H. Beals of Loughborough University worked to catalogue the data and metadata available across these collections, to undertake detailed interviews with data providers and libraries, and to develop a robust taxonomy for discussing the digitised newspaper not only as a facsimile but as a research object in its own right. This Atlas represents our current conception of this often-misunderstood research object: an ontology that describes the relationships between a database’s internal components, between the data and metadata in different collections, and between the digital object and its physical predecessors.