The Atlas aims to facilitate more historically informed understandings of digitised newspapers for researchers across disciplines. The nineteenth-century newspaper was a messy object, filled with an ever-changing mix of material in innumerable amorphous layouts; digitised newspapers are no different. Each database contains a theoretically standardised collection of data, metadata, and images; however, the precise nature and nuance of the data are often occluded by the automatic processes that encoded it. Moreover, no truly universal standard has been implemented to facilitate cross-database analysis, which encourages digital research to remain within existing institutional or commercial silos. To overcome this, and to promote a remixing of discrete repositories, researchers must solve several technical and philosophical challenges.
At the start of the project, Oceanic Exchanges aimed to explore six digitised newspaper datasets from the North Atlantic, covering Mexico, the United States, the United Kingdom, the Netherlands, Germany, and Finland. These collections were chosen on two primary criteria. First, they were collections with which the Principal Investigators from each national team were intimately familiar, having previously investigated internal reprinting patterns with them. Second, the full machine-readable data and metadata sets were made available to the team either as public domain data or under widely available licensing agreements for text-mining. Once the project began, two additional English-language collections, Papers Past from the National Library of New Zealand and the (London) Times Digital Archive from the commercial publisher Gale, were made available to the team. Because these datasets, alongside the Trove newspaper collection of the National Library of Australia, were well known to team members and were quickly accessible to researchers under clear licensing schemes, they have been included in the Atlas despite being beyond its initial North Atlantic remit. Although we recognise that many other digitised newspaper databases exist, and that large sections of the globe (namely South America, Africa and Asia) are not represented in this guide, we feel that the Atlas nonetheless represents the first international, multi-lingual collaboration of its kind, providing an exemplar of physically reconnecting siloed datasets and of working with collections as data, with their content, structure and metadata intimately bound together. Moreover, by opening the digital Atlas to public collaboration, we hope that additional collections will be chronicled and mapped by the wider community in the future.
Each database instantiation that we have surveyed uses different language to describe physical objects (for example a newspaper issue, edition or volume), layout terminology (article or advertisement, title or heading), and more abstract concepts (genre, document type). Moreover, each database organises these terms into different hierarchies of classification. Layers of nested items, containers, and technical metadata unique to different standards (and often unique to specific repositories) raise challenging questions about what data matters and what data can be dismissed as too technical to be of interest to the digital humanities researcher. Finally, although there is some truth to the claim that “everyone uses METS/ALTO”, or something very similar, when encoding digitised newspapers, this surface-level consistency lulls us into a false sense of security. We are rarely comparing apples to apples; sometimes we are not even comparing fruit to fruit.
However, it is not the aim of this Atlas to provide a single “better” standard for digitised newspaper data, a catalogue of what should exist across all collections. Instead, our aim is to provide a specific type of map for this rough and often perilous terrain, one that allows everyone, regardless of their previous experience, to explore these collections in relative safety. It is hoped that the electronic version of this Atlas will continue to be updated with the most recent surveys of these collections, as well as with additional databases. To this end, we hereby release the collection, open access, on GitHub, so that the community of periodical researchers and digitisers may strive not to homogenise our knowledge systems, but to make them mutually intelligible and navigable, now and in the future.