Resources

The Atlas of Digitised Newspapers and Metadata

The Full Report, 30 January 2020

Beals, M. H. and Emily Bell, with contributions by Ryan Cordell, Paul Fyfe, Isabel Galina Russell, Tessa Hauswedell, Clemens Neudecker, Julianne Nyhan, Mila Oiva, Sebastian Padó, Miriam Peña Pimentel, Lara Rose, Hannu Salmi, Melissa Terras, and Lorella Viola. The Atlas of Digitised Newspapers and Metadata: Reports from Oceanic Exchanges. Loughborough: 2020. DOI:10.6084/m9.figshare.11560059.

The Static Dataset, 30 January 2020

M. H. Beals and Emily Bell. (2020). Map of Digitised Newspaper Metadata v.1.0.0 [Data set]. Figshare. DOI:10.6084/m9.figshare.11560110.

The Dynamic Dataset

M. H. Beals and Emily Bell, eds. (2020). Map of Digitised Newspaper Metadata (Dynamic) [Data set]. https://github.com/AtlasOfDigitisedNewspapers/AtlasOfDigitisedNewspapers.github.io/tree/master/downloads.

External Projects

The Oceanic Exchanges Project

Through computational analysis, Oceanic Exchanges crosses the boundaries that separate digitized newspaper corpora to illustrate the global connectedness of 19th-century newspapers. The project uncovers how the international was refracted through the local as news, advice, vignettes, popular science, poetry, fiction, and more. By linking research across large-scale digital newspaper collections, OcEx offers a model for data custodians that host large-scale humanities data.

Scissors-and-Paste

The Scissors and Paste Database is a collection of manual transcriptions from British newspapers (1789-1850), alongside originals from colonial and American newspapers. It aims to be a central repository of reprinted news across the 19th-century Anglophone world. To facilitate the discovery of new reprints and reuses, the site also contains the Scissors-and-Paste-o-Meter, which allows users to view possible instances of reprints and reuses across multiple digitized newspaper databases. The SAP-o-Meter currently maps reprints with the British Library Newspaper Collection (JISC1, 1800-1900), The Times Digital Archive (1800-1900), The London Gazette (1800-1837) and the National Library of Australia’s Trove (1800-1837).

The Viral Texts Project

This site presents data, visualizations, interactive exhibits, and both computational and literary publications drawn from the Viral Texts project, which seeks to develop theoretical models that will help scholars better understand what qualities—both textual and thematic—helped particular news stories, short fiction, and poetry “go viral” in nineteenth-century newspapers and magazines.

Impresso Project

The objective of this project is to enable critical text mining of newspaper archives with the implementation of a technological framework to extract, process, link, and explore data from print media archives. Supported by an interdisciplinary team composed of computational linguists, digital humanists, designers, historians, librarians and archivists, Impresso will tackle the challenges of content enrichment and data representation, visualisation and analysis.

External Tools

DigitalNZ API Console

A tool for exploring the Papers Past API

DigitalNZ in the GLAM Workbench

This is a collection of Jupyter notebooks that help researchers use data from the DigitalNZ API. These include “Visualise a search in Papers Past” and “Harvest data from Papers Past”.

Passim

A tool for detecting and aligning similar passages in text.
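As a rough illustration of what detecting similar passages can involve, the sketch below compares two texts by the word n-grams ("shingles") they share. It is a simplified stand-in written for this page, not Passim's own code or interface; the function names `shingles` and `shingle_overlap` and the default n-gram length are assumptions made for the example.

```python
import re


def shingles(text, n=5):
    """Return the set of word n-grams in `text` (lowercased, punctuation dropped)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def shingle_overlap(a, b, n=5):
    """Jaccard overlap of the n-gram sets of two texts, between 0.0 and 1.0."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa or not sb:
        return 0.0  # a text shorter than n words has no shingles
    return len(sa & sb) / len(sa | sb)


# Illustrative sentences only, not drawn from any newspaper corpus.
original = "The quick brown fox jumps over the lazy dog"
reprint = "A quick brown fox jumps over a sleeping cat"
score = shingle_overlap(original, reprint, n=3)
```

A higher score suggests more shared wording. A fuller reprint-detection workflow would normally also align the matching passages and tolerate OCR errors, which this sketch does not attempt.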

QueryPic

This is a simple web-based tool that visualises searches in Trove and Papers Past’s digitised newspapers. Just enter keywords to see the number of matching articles per year. The charts can be saved, shared, and regenerated.

Random newspaper articles from Trove

This Jupyter notebook demonstrates a method for selecting a newspaper article at random from Trove’s digitised newspaper database. You can supply your own queries or specific facet values to limit the result set. This method is used in @TroveNewsBot and the remixable Trove Title Bot starter kit.

Scissors-and-Paste Console

A tool for processing manifests of reused newspaper material, allowing for the creation of likely dissemination chains and accounting for typical travel times.

Trove API Console

This tool helps you construct and test queries to the Trove API without the need for an API key.

Trove Newspaper Harvester

This tool harvests metadata (and optionally full text and PDFs) from searches in Trove’s digitised newspapers. It’s available both as a Python command-line tool, and embedded in easy-to-use Jupyter notebooks that can be run online without installing any software. The harvested metadata is saved in a CSV file. If requested, the OCRd text of every article (stripped of HTML tags) will be saved in a separate text file. This repository also includes examples of analysing the harvest results using Jupyter notebooks.

Trove Newspapers in the GLAM Workbench

This is a collection of Jupyter notebooks that help researchers find, use, and share data about Trove’s digitised newspapers. For example, there are notebooks that explain how to visualise searches using API facets: “Visualise Trove newspaper searches over time” and “Map Trove newspaper articles by place of publication”. Other notebooks provide high-level overviews of the contents of Trove’s database of digitised newspapers: “Visualise the total number of newspaper articles in Trove by year and state” and “Beyond the copyright cliff of death”. Some notebooks extend the functionality of the Trove web interface and API to extract additional data and connect it to other systems: “Save a Trove newspaper article as an image” and “Upload Trove newspaper articles to Omeka-S”.

Trove Places

This is a map-based interface to Trove’s digitised newspaper titles. Either click on the map to view the 10 newspapers published closest to that spot, or browse the publication places of all titles. The location data is available for download as a CSV file.
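A "closest newspapers" lookup of this kind can be approximated offline once location data is in hand. The sketch below ranks places by great-circle (haversine) distance from a chosen point. It is an assumption-laden illustration, not the Trove Places implementation: the record keys `title`, `lat` and `lon` are hypothetical and may not match the columns of the downloadable CSV, and the sample records are placeholders keyed to well-known city coordinates.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def nearest_titles(places, lat, lon, k=10):
    """Return the k records in `places` closest to (lat, lon), nearest first."""
    return sorted(
        places,
        key=lambda p: haversine_km(lat, lon, p["lat"], p["lon"]),
    )[:k]


# Placeholder records; the keys are hypothetical, not the real CSV schema.
places = [
    {"title": "Perth title", "lat": -31.95, "lon": 115.86},
    {"title": "Sydney title", "lat": -33.87, "lon": 151.21},
    {"title": "Melbourne title", "lat": -37.81, "lon": 144.96},
]

# Query point near Canberra.
closest = nearest_titles(places, -35.28, 149.13, k=2)
```

Sorting the whole list is adequate for a few thousand publication places; a spatial index would only be worth considering for much larger datasets.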