History of the Collection

Papers Past began in 2000 as a project to provide access to nineteenth-century New Zealand newspapers and periodicals, with an initial aim of making 250,000 pages available within a year. The website launched in 2001 with 300,000 digitised pages; these could be viewed and printed but not searched, as they were page images only. In 2005, the library ran a pilot project using OCR to generate full text and make the newspapers searchable, and in 2007 the website was relaunched with a new interface reflecting this search functionality. In 2016, Papers Past was relaunched a second time with three additional sections (magazines and journals; letters and diaries; and parliamentary papers), bringing together full-text content from several standalone websites, such as the one for the magazine Te Ao Hou. This redevelopment also introduced a mobile view.

Consulted Libraries

The Papers Past digital archive draws on the National Newspaper Collection, part of the Alexander Turnbull Library collections in Wellington, which contains newspapers from New Zealand, Australia, and the Pacific. The National Newspaper Collection maintains these newspapers in various formats, including paper, microfilm and digital media; its microfilm holdings have so far been the primary source medium for digitisation.

Several smaller projects have used other libraries’ collections in conjunction with those of the Turnbull. The early Auckland newspapers combined the holdings of the Auckland War Memorial Museum, the Auckland City Libraries and the Hocken Library, University of Otago with those of the Turnbull Library to achieve the most comprehensive coverage of three titles from the 1840s: the Auckland Chronicle and New Zealand Colonist, the Auckland Times, and the New Zealand Herald and Auckland Gazette. Likewise, Niupepa: the Māori Newspapers was digitised using digital files from the University of Waikato, which had originally been created from microfiche made from the Turnbull Library holdings. In some cases, where the Turnbull Library does not hold the master or intermediate microfilms, it borrows another institution’s intermediate reels and has a duplicate set made for digitisation; this was the case for the Samoan newspapers, which were digitised from film held by the State Library of New South Wales (Australia).

Microfilming Projects

The library has been microfilming newspapers since 1953. From 1953 to 1977, filming focused on regional and local papers and was intended to save space rather than for preservation; physical copies of the major metropolitan dailies were kept. Between 1977 and 1984, efforts shifted to microfilming older newspapers rather than current titles. The programme was reviewed in 1984, and the revised goal was to microfilm all New Zealand newspapers, current and historic. This retrospective filming initially covered only issues up to 1940, but in 1992 it was extended to 1977. In 1990, the microfilming of community newspapers was stopped; filming continues today, with a focus on the major contemporary newspapers.

Currently, microfilming is undertaken by an external vendor, and the master reels are maintained both for the long-term preservation of the collections and as an intermediate step before digitisation. However, microfilming of a number of current newspapers is still carried out by regional libraries, which also undertake some retrospective newspaper microfilming and contribute hard-copy holdings to microfilming projects organised by the National Library. The National Library also borrows physical newspapers from other institutions for filming when it does not hold a complete run.

Digitisation Projects

Newspaper digitisation at the National Library has three strands: the National Library programme itself, the Collaborative Digitisation Programme and a partnership programme. Scanning was initially contracted to one vendor and OCR conversion to a second; both are now carried out by a single vendor.

The Collaborative Digitisation Programme adds newspapers to Papers Past using the combined resources of community groups around New Zealand. Each year, the library invites applications for new additions, which are listed online. The programme started in 2010–2011 and focuses on small, local newspapers. Local libraries and community groups can apply to have microfilmed newspapers digitised, or to have newspapers microfilmed and then digitised in a subsequent year. The library handles all the microfilming, digitisation and uploading to Papers Past. Larger institutions can also partner with the library, normally over a three-year period; this partnership programme follows a similar model to the Collaborative Digitisation Programme but runs over multiple years and is generally used to digitise the regional dailies. Under both programmes, applicants contribute towards the direct costs of digitisation, generally 50% of the cost.

Selection

As New Zealand has a relatively short print history (the first newspaper was published in 1839), it was considered feasible to digitise an extensive range of the nation’s historical newspapers for public access while continuing to microfilm them for preservation. To avoid copyright restrictions, the library initially digitised only nineteenth-century New Zealand newspapers, though this range was later extended to titles published before 1920. The initial choice of titles was also largely based on how recently the microfilm had been produced, to ensure a high-quality transfer to digital media; now that the library is focused on filling geographical and temporal gaps in the digitised newspaper collection, the date of microfilming has become a less significant criterion for digitisation.

Currently, the scope of the newspaper programme for Papers Past is newspapers published in New Zealand or the Pacific up to the end of 1950, using the Library of Congress definition of a newspaper. In 2015, as part of this expanded remit, the library added four titles from Samoa, and it aims to add more Pacific titles in future, focusing on the areas of the Pacific covered by its comprehensive collecting policy: American and Western Samoa, the Cook Islands, Fiji, Nauru, Niue, Pitcairn Island, Tokelau and Tonga.

The ongoing selection process tracks suggested titles and focuses on expanding geographic and temporal coverage and addressing user demand, while also considering copyright status and the quality and availability of microfilm. It is managed by the library’s Digitisation Team, with selections approved by a small in-house committee consisting of the Digitisation Team, the Service Manager of Papers Past, the Curator of Newspapers and Serials, and the Microfilm Librarian. Digitisation has at times been prioritised for particular events; for example, the centenary of WWI led to the digitisation of materials from 1914–1918. Current priorities for selection are geographical gap filling, additional material in te reo Māori, extending existing titles up to 1950, and additional Pacific material. A record of user requests is kept and used to help prioritise titles for digitisation.

Preservation and Access

For each digitised title, a separate record is created in the library catalogue with a link to the title’s page on Papers Past. That page includes an essay on the history of the publication, drawing on material from the library catalogue and Te Puna National Bibliographic Database, as well as copyright information and acknowledgements. The digitised copy serves as the primary access copy, while the microfilm serves as the long-term preservation medium.

Composition of the Collection

Selection Available

The National Library of New Zealand has now made available 5,789,376 newspaper pages from 847,719 separate issues. Of the 147 titles, 80 have digitised issues published before 1880, 118 have digitised issues published between 1880 and 1920, and 93 have digitised issues published after 1920. The majority of individual articles (roughly 80%) date from between 1880 and 1920. Depending on one’s definition of an article, however, the share of pre-1880 content may be slightly larger than this figure suggests: automatic article zoning cannot always separate articles that lack headlines, a common practice in the mid-nineteenth century, and may therefore combine several items into a single digital object.
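The collection figures above can be re-derived with a few lines of Python. This minimal sketch only does arithmetic on the reported counts (it does not query Papers Past); the constant and function names are illustrative. It makes two points explicit: the average issue runs to just under seven pages, and the per-period title counts overlap, since one title can span several periods, so they sum to more than 147.

```python
# Counts as reported for the Papers Past newspaper collection.
TOTAL_PAGES = 5_789_376
TOTAL_ISSUES = 847_719
TOTAL_TITLES = 147

# Titles with at least one digitised issue in each period.
# A single title can appear in several periods, so these do not sum to 147.
TITLES_BY_PERIOD = {
    "before 1880": 80,
    "1880-1920": 118,
    "after 1920": 93,
}


def pages_per_issue(pages: int, issues: int) -> float:
    """Average number of digitised pages per issue."""
    return pages / issues


def share_of_titles(count: int, total: int = TOTAL_TITLES) -> float:
    """Percentage of all titles that have some content in a given period."""
    return 100 * count / total


print(f"{pages_per_issue(TOTAL_PAGES, TOTAL_ISSUES):.2f} pages per issue")  # 6.83 pages per issue
for period, count in TITLES_BY_PERIOD.items():
    print(f"{period}: {count} titles ({share_of_titles(count):.0f}%)")
```

Note that the roughly 80% figure in the text refers to articles, not titles; the share of titles with 1880–1920 content happens to be similar, but the two measures are unrelated.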

In 2015, the National Library added a collection of historical newspapers published primarily for a Māori audience between 1842 and 1935. This is based on the digital Niupepa Collection developed and made available in 2000 by the New Zealand Digital Library Project, at the Department of Computer Science, University of Waikato. The source material for this digital collection is “Niupepa 1842–1933”, a collection of newspapers published in Māori or for a Māori readership, microfilmed by the Alexander Turnbull Library in 1988. That same year, four titles from Samoa were also added: the Samoa Times and South Sea Advertiser (1888–1896), the Samoa Times and South Sea Gazette (1877–1881), the Samoa Weekly Herald (1892–1900) and the Samoanische Zeitung (1903–1930).

Data Quality

Text

There is currently no independent study of the OCR accuracy of the collection.

Image

The Papers Past newspapers were digitised from microfilm as 400 PPI bitonal images. Images were originally delivered to users as archival-quality TIFF files, which removed the need to reformat images or host duplicates, and were displayed through a proprietary viewer called Daeja ViewONE. This changed with the 2007 relaunch and the move to Greenstone (now Veridian). Images are now provided to users as embedded GIF files derived from the Modified Master files, but are still stored as 400 PPI TIFF files for preservation. Titles are also now scanned as either bitonal or greyscale images, depending on the quality of the original newspaper and of the filming.

Users can request a copy of the 400 PPI Preservation Master by email, and this will be supplied as long as there is no conflict with third-party copyright. Users can also download a page as an image file or as a PDF. The image files are large GIFs, while the PDF pages are 200 PPI. Downloaded GIFs report a resolution of 96 PPI, but their pixel dimensions are large: printed at 300 PPI, a page would measure approximately 23 x 29 inches.
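The apparent mismatch between "96 PPI" and a large printed size comes from the fact that a file's PPI value is only a tag; the pixel count is what fixes the image's real size. The sketch below works through that arithmetic. It assumes the approximate 23 x 29 inch at 300 PPI figure from the text and infers pixel dimensions from it, so the numbers are estimates, not measurements of an actual Papers Past GIF.

```python
def pixel_dimensions(width_in: float, height_in: float, ppi: int) -> tuple[int, int]:
    """Pixels needed to print an image at the given size and resolution."""
    return round(width_in * ppi), round(height_in * ppi)


def print_size(width_px: int, height_px: int, ppi: float) -> tuple[float, float]:
    """Physical size in inches of an image with the given pixels at a PPI setting."""
    return width_px / ppi, height_px / ppi


# Inferred from the approximate figure in the text: about 23 x 29 inches at 300 PPI.
w_px, h_px = pixel_dimensions(23, 29, 300)
print(w_px, h_px)  # 6900 8700

# The same pixels interpreted with the 96 PPI value reported by the downloaded GIF.
w_in, h_in = print_size(w_px, h_px, 96)
print(f"{w_in:.1f} x {h_in:.1f} in at 96 PPI")  # 71.9 x 90.6 in at 96 PPI
```

In practice, this means a user who wants a sensible print size should resample or retag the image to a higher PPI rather than assume the file is low resolution.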

Metadata Schema

The data uses several metadata schemas: METS XML for structural metadata, ALTO XML for the OCR content, and MIX for technical metadata. The library supplies basic metadata to the vendor, including title, date range and bibliographic ID; the rest is captured as part of the scanning and OCR process. Technical metadata embedded in the image files, such as the make and model of the scanner and the software used, is harvested during scanning. Headlines and illustration captions are also corrected manually as part of the digitisation process.
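To illustrate what the ALTO layer holds, here is a minimal sketch that pulls line-by-line OCR text out of an ALTO page file using only the Python standard library. It makes several assumptions. It follows the general ALTO structure of `TextLine` elements containing `String` elements whose `CONTENT` attribute holds each word, with `SP` marking spaces. It ignores namespaces because the ALTO version used by Papers Past is not stated here. The sample document is invented for illustration.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip any '{namespace}' prefix so the parser works across ALTO versions."""
    return tag.rsplit("}", 1)[-1]


def alto_lines(xml_text: str) -> list[str]:
    """Return the OCR text of an ALTO page, one string per TextLine."""
    root = ET.fromstring(xml_text)
    lines = []
    for elem in root.iter():
        if local_name(elem.tag) != "TextLine":
            continue
        # Keep only String children; SP (space) and HYP elements are skipped.
        words = [
            child.get("CONTENT", "")
            for child in elem
            if local_name(child.tag) == "String"
        ]
        lines.append(" ".join(words))
    return lines


# Invented sample page; real files will carry coordinates and many more attributes.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page ID="P1">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine><String CONTENT="SHIPPING"/></TextLine>
          <TextLine><String CONTENT="ARRIVED."/><SP/><String CONTENT="Wellington"/></TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""

print(alto_lines(SAMPLE))  # ['SHIPPING', 'ARRIVED. Wellington']
```

Matching on local names rather than a fixed namespace URI trades a little strictness for robustness if files from different digitisation batches use different ALTO versions.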

Backend Structure

The data for each issue is stored in multiple files within a directory structure that encodes title and date information, as follows: TitleAcronym/YYYY/TitleAcronym_YYYYMMDD. Within each issue directory there are four sub-directories: one containing preservation TIFFs; one containing modified master TIFFs, the METS file, and the text data as ALTO-encoded files, one per page, numbered with four digits (for example, 0001.xml); one containing the IE METS file needed for integration into the National Digital Heritage Archive; and one for page-level and issue-level PDFs.
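The naming convention above can be expressed as two small helper functions. This sketch builds the issue directory path and the four-digit page file name. The acronym "AS" and the date are placeholders, and the sub-directory names inside each issue directory are not given in the text, so the code deliberately does not guess them.

```python
from datetime import date
from pathlib import PurePosixPath


def issue_dir(title_acronym: str, issue_date: date) -> PurePosixPath:
    """Issue directory following TitleAcronym/YYYY/TitleAcronym_YYYYMMDD."""
    return PurePosixPath(
        title_acronym,
        f"{issue_date:%Y}",
        f"{title_acronym}_{issue_date:%Y%m%d}",
    )


def alto_filename(page_number: int) -> str:
    """Page-level ALTO file name, zero-padded to four digits (e.g. 0001.xml)."""
    if not 1 <= page_number <= 9999:
        raise ValueError("page number must fit in four digits")
    return f"{page_number:04d}.xml"


# Placeholder acronym and an illustrative date.
print(issue_dir("AS", date(1907, 11, 9)))  # AS/1907/AS_19071109
print(alto_filename(13))                   # 0013.xml
```

Using `PurePosixPath` keeps the forward-slash layout regardless of the operating system the code runs on, which suits paths that describe a delivery structure rather than local files.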

User Interface Structure

Web Interface

The newspaper web interface for Papers Past allows users to run a simple search across the underlying descriptive metadata and OCR text, or to browse images by date, region and title. Searches can be filtered by date, title, region or content type (as of December 2019, the content types are article, advertisement and illustration), and full-text queries can use standard Boolean operators. Once a result is selected, an image of the article with the search terms highlighted is displayed in an image viewer, which also lets users view the unhighlighted image or the OCR text. A breadcrumb navigation menu, such as

Newspapers > Auckland Star > 9 November 1907 > Page 13 > This article

allows users to navigate the issues or browse the title by date. Bibliographic and copyright information is also provided.

API

With the DigitalNZ API, it is possible to access and use a subset of the Papers Past metadata, including titles, dates and URLs. API data is currently limited to material digitised before 2013. An API key can be obtained by registering on the DigitalNZ website.
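As a rough illustration of how such a request might be assembled, the sketch below builds a search URL with the Python standard library. It makes no network call. The endpoint URL and the parameter names (`text`, `per_page`, and especially the `and[primary_collection][]` collection filter) are assumptions for illustration only and should be checked against the current DigitalNZ API documentation; `YOUR_API_KEY` stands for the key obtained by registering.

```python
from urllib.parse import urlencode

# ASSUMPTION: endpoint and parameter names are illustrative, not confirmed by
# the source; verify them against the DigitalNZ API documentation before use.
BASE_URL = "https://api.digitalnz.org/v3/records.json"


def build_query(api_key: str, text: str, collection: str = "Papers Past",
                per_page: int = 20) -> str:
    """Assemble a search URL for records in one collection (no request is sent)."""
    params = {
        "api_key": api_key,                       # key from DigitalNZ registration
        "text": text,                             # free-text search terms
        "and[primary_collection][]": collection,  # assumed collection filter name
        "per_page": per_page,                     # assumed page-size parameter
    }
    return f"{BASE_URL}?{urlencode(params)}"


url = build_query("YOUR_API_KEY", "Auckland Star shipping")
print(url)
```

Whatever the exact parameters turn out to be, results will only cover material digitised before 2013, and the data is for non-commercial use.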

Direct Download or Drives

The National Library of New Zealand is currently developing processes for making out-of-copyright data available to users in bulk format.

Rights and Usage

Web Interface

Newspaper material on the Papers Past website has been provided in good faith by the National Library on the basis that the nineteenth- and early twentieth-century newspapers it provides are out of copyright, that in most cases the digitised copies replace microfilm versions previously provided to the public, and that, for more recent newspapers, permission has been sought from the publisher to reproduce the material on the website. In many cases, the publisher has made the material available under a Creative Commons licence, most commonly CC-BY-NC-SA. Usage and copyright information for a title is available under “Using this item” next to the article.

API and Direct Download

Metadata available through the DigitalNZ API has been licensed for use by its owners, and API access comes with some conditions, such as not sharing API keys and identifying the source. Papers Past data accessed through the DigitalNZ API is for non-commercial use only.

Re-Publication

When republishing material from the website that is out of copyright, or for which you have obtained permission from the copyright holder, the library requests that you acknowledge the National Library of New Zealand as the source of the information. If the material is republished online, it also requests a link to the page where you found the information.

Suggested Citation

Beals, M. H. and Emily Bell, with contributions by Ryan Cordell, Paul Fyfe, Isabel Galina Russell, Tessa Hauswedell, Clemens Neudecker, Julianne Nyhan, Sebastian Padó, Miriam Peña Pimentel, Mila Oiva, Lara Rose, Hannu Salmi, Melissa Terras, and Lorella Viola. “Papers Past.” The Atlas of Digitised Newspapers and Metadata: Reports from Oceanic Exchanges. Loughborough: 2020. DOI: 10.6084/m9.figshare.11560059.