One of the challenges in building a wider newspaper collection, or in contributing to an existing one, is balancing different sets of metadata. Homogenising the metadata for a single platform involves a great deal of mapping and transformation. Despite these efforts, the quality and depth of contextualisation of digitised newspapers vary widely. In this blog post, we identify some approaches to drawing on contributions from academics, as well as independent users, to meet these challenges.

Structural Metadata: What Librarians Can Provide

Librarians and archivists have long worked with structured metadata, using common standards to make collections discoverable and interoperable. For digitised newspapers, this includes basic bibliographic information as well as fields that are specific to newspapers. Deciding which fields to prioritise and how to display them requires some understanding of user needs and a dialogue with researchers about what is feasible. This is clearly easier for collections with a defined use case and user group. It may therefore be necessary to open a fresh dialogue and develop additional metadata enhancements with each new project, which is a good argument for explicitly costing librarians into research grants! Where funds and staff time are more limited, there is a tension between providing metadata for access and easy searching within a particular project and providing the long-term preservation metadata that will allow additional user groups to use the collection in the future.

Metadata: Contextual Information

Users need to be aware of the strengths and limitations of any collection and of how best to interact with and search it. When accessing an online newspaper archive, not all users will be aware of its limitations, such as the accuracy of the underlying OCR or the completeness of what has been made available; it isn’t always clear whether the archive includes every issue from the original run. For newspaper metadata specifically, we might want to collect: the full newspaper title (and variants), the date on the masthead, the volume, the issue, the printer and publisher, the number of pages, whether it is a “printer’s” copy, the edition, the original typeface, and whether the newspaper was free or sold during its original distribution. It would also be helpful to identify the political view of the newspaper, the date range over which that view was expressed or avowed, and whether the newspaper typically contains illustrations or photographs. In determining the most useful contextual metadata for a particular collection, feedback from and interaction with potential users is essential, and existing library practices for identifying which users (of varying personality types) are likely to engage with the collections should be used as a guide for seeking this feedback.
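To make the wish list above more concrete, here is a minimal sketch of how those contextual fields might be modelled as a per-issue record. The class and field names are hypothetical and do not follow any particular metadata standard; the `missing_fields` helper is likewise an illustrative assumption, showing one way a platform could flag to users which contextual details are absent for a given issue.

```python
from __future__ import annotations

from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class PoliticalStance:
    # Hypothetical: a political view and the years over which it was expressed.
    label: str
    start_year: Optional[int] = None
    end_year: Optional[int] = None


@dataclass
class NewspaperIssueRecord:
    # Hypothetical field names mirroring the fields listed in the text.
    title: str
    title_variants: list[str] = field(default_factory=list)
    masthead_date: Optional[str] = None  # kept as text: masthead dates can be irregular
    volume: Optional[str] = None
    issue: Optional[str] = None
    printer: Optional[str] = None
    publisher: Optional[str] = None
    page_count: Optional[int] = None
    is_printers_copy: Optional[bool] = None
    edition: Optional[str] = None
    typeface: Optional[str] = None
    distribution: Optional[str] = None  # e.g. "free" or "sold"
    political_stances: list[PoliticalStance] = field(default_factory=list)
    has_illustrations: Optional[bool] = None
    has_photographs: Optional[bool] = None

    def missing_fields(self) -> list[str]:
        # Report fields that are unset (None) or empty lists, so the platform
        # can tell users which contextual details are unknown for this issue.
        return [f.name for f in fields(self) if getattr(self, f.name) in (None, [])]


# Usage with an invented example title:
record = NewspaperIssueRecord(title="Example Gazette", page_count=4, distribution="sold")
print(record.missing_fields())
```

Keeping unknown values as `None` rather than guessing defaults means that gaps in the record stay visible, which fits the point above about being transparent with users about a collection's limitations.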

Access: Supporting Different Users

Finally, there is a need for dialogue and compromise when considering the access needs of different users. Researchers have different levels of technical expertise, and libraries and archives frequently have the resources to support only one or two access methods. For example, users of a digitised newspaper collection may be interested in free-text searching, structured searches, a platform with analysis tools, or text and data mining via an API. In such cases, careful conversations with a range of potential users are needed to ensure that the collections remain accessible to all without stifling advanced research.
More contextual information would be extremely beneficial, but it is not clear how best to include it in library datasets: the technical infrastructure does not currently support this, and issues of funding and accessibility shape how such information can be integrated and made available. Moreover, incorporating user-contributed content requires some form of quality review, or an editorial board to curate and manage the inclusion of that information. Ultimately, there is potential for significant collaboration between the collection and user-management expertise of librarians and the technical and contextual expertise of users, but finding the right balance and form of collaboration will take honest, realistic compromise.

This blog post is the product of a 40-minute collaborative writing session, held via Zoom as part of the Exploring the Atlas of Digitised Newspapers and Metadata Workshop on 19 August 2020. The editors of this site would like to thank the authors for their contribution to our ongoing conversation around the future of digitised newspaper collections.