Democratising the Archive

One of our hopes for 2030 is to help empower users to unlock the full potential of digitised collections through a knowledge of—and confidence in using—a wider range of tools and approaches. We know that users come to these collections with different disciplinary backgrounds, interests, experiences, and skill sets. How can we improve digital literacy in this area? These four areas seem particularly important for future conversations:

Interfaces and Tools

One means of addressing many of the issues faced by researchers working with digitised newspaper collections would be to make them available in two formats:

a) for browsing online, with the full context of the newspaper page available, and b) for download of the texts.

The download option would enable researchers to use the texts with the software of their choice. This moves the onus for personalising the interface from the archive to the researchers. Metadata that would make the downloads easier to handle includes standardised markup of features like source and date. Additional features like page number, headline vs body, article type, author and so on would open up the data to different kinds of use.
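To make the metadata point concrete, here is a minimal sketch of what a downloaded article record could look like and how a researcher might filter on it. The class name `ArticleRecord`, the helper `select`, the field names, and the sample titles are all hypothetical illustrations, not an existing archive format.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ArticleRecord:
    """Hypothetical record for one downloaded article (not a real schema)."""
    # Core fields: the text names source and date as the baseline.
    source: str
    issue_date: date
    body: str
    # Additional features that would open the data to other kinds of use.
    page_number: Optional[int] = None
    headline: Optional[str] = None
    article_type: Optional[str] = None
    author: Optional[str] = None


def select(records, source=None, start=None, end=None, article_type=None):
    """Return the records that match every filter that was supplied."""
    matches = []
    for record in records:
        if source is not None and record.source != source:
            continue
        if start is not None and record.issue_date < start:
            continue
        if end is not None and record.issue_date > end:
            continue
        if article_type is not None and record.article_type != article_type:
            continue
        matches.append(record)
    return matches


# Invented sample data, for illustration only.
records = [
    ArticleRecord("Example Gazette", date(1850, 3, 1), "Body text one.",
                  page_number=2, headline="A Headline", article_type="news"),
    ArticleRecord("Example Gazette", date(1851, 6, 9), "Body text two.",
                  article_type="advertisement"),
    ArticleRecord("Sample Courier", date(1850, 7, 4), "Body text three.",
                  article_type="news"),
]

news_1850 = select(records, start=date(1850, 1, 1), end=date(1850, 12, 31),
                   article_type="news")
print([record.source for record in news_1850])
```

With source and date always present and the other fields optional, the same download can serve a researcher who only needs dated text as well as one who wants to compare article types or pages.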

Ownership and Attribution of Post-Digitisation Data

Often academic and family history researchers are interested in similar sources, even though their eventual use of them might be quite different. How do we design front-end access so that different kinds of use are open to all? How do we collaborate ethically across those areas? For example, I have seen professional family historians who had paid for access to particular documents, and then transcribed them, being understandably concerned that their work should be properly referenced and acknowledged. There are some quite complex ethical and intellectual property questions that are being worked out at individual levels in improvisational ways.

By 2030, it would be useful for genealogy websites to have recognised that researchers using their resources for more than family history are a market for which they can develop other resources, and to have created opportunities for sharing or collaborative research on their sites.

Layout Research

For conducting research on changes to layout, it would be very useful to be able to measure the density of the text in different ways, including density of type, length and complexity of sentences, and level of vocabulary. This last, of course, would likely be an offline corpus study with many different and contested measures; you will not get agreement among scholars on the best way to measure it. For representing graphical elements such as headlines and pictures, it would be helpful to have a way to measure size and placement and aggregate them into collective measures (range, mean). Finally, the overall number of pages and the page size need to be taken into account. All this could then be contextualised with a study of the ways people talk about reading newspapers in diaries, memoirs and fiction, in the hope of tracing the paths of what were most likely incomplete readings.
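As a rough illustration of the kinds of measures described above, the following sketch computes sentence lengths, a type-token ratio, and range and mean summaries of how much of a page a graphical element occupies. Every function name, the element dimensions, and the page size are assumptions made for the example. The type-token ratio in particular is only one of the many contested vocabulary measures, and the sentence splitter is deliberately naive.

```python
import re
from statistics import mean


def sentence_lengths(text):
    """Word count per sentence, splitting naively on ., ! and ?."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]


def type_token_ratio(text):
    """Distinct words divided by total words (one contested measure)."""
    words = re.findall(r"[a-z']+", text.lower())
    return len(set(words)) / len(words) if words else 0.0


def summarise(values):
    """Collective measures for a list of numbers: range and mean."""
    if not values:
        return None
    return {"min": min(values), "max": max(values), "mean": mean(values)}


def area_share(element, page_width, page_height):
    """Fraction of the page area covered by one graphical element."""
    return (element["width"] * element["height"]) / (page_width * page_height)


# Invented sample text and measurements (millimetres), for illustration only.
sample = "The ship arrived. The crew went ashore at once!"
headlines = [
    {"width": 200, "height": 30},
    {"width": 100, "height": 24},
]
page_width, page_height = 400, 600

print(summarise(sentence_lengths(sample)))
print(type_token_ratio(sample))
print(summarise([area_share(h, page_width, page_height) for h in headlines]))
```

Expressing element size as a share of the page, rather than in absolute units, is one way to take page size into account when comparing titles printed in different formats.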

Collaboration and Initiation

For new users of these resources, there is a tension between creating intuitive interfaces that do not require tutorials or documentation and empowering complex, nuanced research. The hosting of crowdsourced, cross-disciplinary annotated reading or resource lists, such as useful open-access articles, short videos, and how-to guides, would not reinvent the wheel but would gather together and signpost the excellent resources that already exist. Likewise, the development of a MOOC (Massive Open Online Course) and the presentation of model projects would allow users to develop confidence in cross-archival exploration through clear examples of good practice. Finally, framing guides to using collections with particular users in mind would provide multiple points of entry to a more limited range of access options, essentially mapping the potential user journey by starting with an initial questionnaire: what do you want to do with the collection, and how much experience do you have?

This blog is the product of a 40-minute collaborative writing session held via Zoom as part of the Exploring the Atlas of Digitised Newspapers and Metadata Workshop, held 19 August 2020. The editors of this site would like to thank the authors for their contribution to our ongoing conversation around the future of digitised newspaper collections.