Wrong Hierarchies; or, Appreciating the Value of Non-Authoritative Metadata for Digitised Newspaper Collections

Democratic Access and Social Curation

In our last, we discussed the possibilities and initial efforts made by librarians and other collection holders to provide access to existing newspaper indices, and in some cases even to integrate them into digitised newspaper collections. These historical ontologies (systems of categorisation and structures of knowledge) were constructed by individuals who maintained a certain claim to authority. This authority is usually characterised as being intrinsic by virtue of known, scientific or otherwise correct methodologies but is most often identified simply through association with official institutions, with the methods and practical execution of these categorisations left largely unquestioned by those who rely upon them. Recent debates on the decolonisation of the archive have highlighted the dangers of assuming that authoritative ontologies are equivalent to natural or correct structures of human knowledge. More important for this discussion, though, is the idea that categorisation is a fundamentally human endeavour and, as with most human endeavours, is prone to and benefits from infinite variety.

As preface to this discussion, I would draw your attention to two veins of research on access and interaction on the internet. The first argues that the creation of the World Wide Web presented a particular opportunity for heritage institutions to widen access to their collections, transcending not only the need to travel to the locations where artefacts are held but also the limitations of space in displaying and contextualising those materials. At the same time, this drive to provide access has raised concerns among providers about what materials should be made available. Certain artefacts and collections, including periodical material, were either historically obtained in ways that would not be deemed ethical in the 21st century or, in their original cultural context, would not have been made available to all, with restrictions based on ethnicity, religion and gender being commonly cited. Questions about democratic knowledge, and who has the right not only to access but also to limit access to historical materials, will likely continue to be a matter of debate as the technology for making vast heritage collections available online becomes more widespread and affordable.

The other debate centres not on permission but on the means for locating and interacting with those collections. In the early 21st century, two main forms of non-authoritative indexing have appeared in the digital realm. The first is what I would refer to as non-authoritative curation and the second as social bookmarking. From Facebook and Twitter, to YouTube and Instagram, to Reddit and the now largely forgotten Delicious, internet users quickly adopted centralised spaces, as distinct from individual websites, as a means of bringing together diverse materials from across the web into curated collections relevant to their particular interests or the interests of specific sub-cultures and communities. While early efforts usually copied or replicated materials into new virtual locations, centralised platforms now promote social bookmarking: linking to original materials, often held by authoritative institutions, and adding individual or community context through commentary or through curation alongside other objects that the community in question categorises as related. The practices involved in social curation are not, of course, so very different from the annotated bibliographies produced by groups such as the Research Society for Victorian Periodicals or the innumerable community newsletters and magazines produced by enthusiasts of all descriptions. What has changed is the unprecedented access people now have to these non-authoritative curators. In the past, offline curators developed reputations for accuracy or expertise that, while different from the authority bestowed upon institutional collection providers, were nonetheless very powerful. These reputations were based not upon qualifications or institutional longevity but on the practical fruits of curation and on democratic or market-based appraisals of the curators' competence.

As we have gained greater access to these non-authoritative curators, questions over the superiority of authoritative curation, or indexing, have naturally arisen, perhaps most popularly in discussion of “traditional” and “new” journalism.

Trove as an Example of Crowdsourced Ontologies

In the realm of digitised newspapers, the Oceanic Exchanges team learned that many providers found the prospect of non-authoritative curation and social bookmarking very alluring. Many had posited the idea of adding such systems to their own collections but, as of the writing of the Atlas, only Trove, by the National Library of Australia, had successfully implemented a full social curation system. Users wishing to engage with its collections can continue to do so in authoritative ways, through library subject headings and other cataloguing categories, but other means of navigating the collections, right down to the level of the individual newspaper article, have evolved over the past ten years. Registered users are able to tag and add comments to collection materials; these tags and comments are visible to and searchable by other users, registered or not, and machine-readable through the system’s API. Users can also create lists, or curated collections, of materials across the Trove database. What was surprising to, but warmly welcomed by, the Trove team was the level and rigour of engagement these tools engendered. Existing and new communities not only made extensive use of the centralised systems but created their own ontologies, their own systems for knowledge categorisation, that were consistent, intelligible, and well documented. Like offline non-authoritative guides, these lists and tagging systems can be judged democratically, based not on their correctness by virtue of institutional authority but on their practical use by those who wish to understand and engage with the heritage materials available. More importantly, unlike authoritative ontologies, these need not be destroyed or removed in order to be superseded by more useful and relevant ontologies in the future.

Future Possibilities of Integration and Exchange

Research into social bookmarking and tagging has produced mixed feelings about the value of non-authoritative ontologies. There is a gnawing sense amongst many that allowing users to create and share human-crafted ontologies should result in better, more intelligible and more robust systems of knowing than automated (or AI) web crawling. That has not been the case. To date, much of the literature considers social tagging and bookmarking to be a tremendously noisy process. Many tags are used only once, are misspelled, or bear no intelligible relation to the object to which they are attached. Other tags are over-used and genericised, well-worn general stand-ins for more specific and nuanced categories. Yet what these researchers only tacitly acknowledge, and what gives me great hope for the prospects of social curation for heritage collections, is the idea that, in the end, truth will out.
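
To make the two kinds of noise described above concrete, the following is a minimal sketch, in Python, of how one might profile a set of user tags. Everything in it is an assumption made for illustration: the sample tags are invented and not drawn from Trove or any real collection, the helper names `normalise` and `tag_profile` are hypothetical, and the 75 per cent threshold for calling a tag "generic" is arbitrary. It is not the method used in the literature cited here.

```python
from collections import Counter

# Hypothetical (article_id, tag) pairs, invented for illustration only.
taggings = [
    ("a1", "shipwreck"),
    ("a2", "shipwreck"),
    ("a3", "Shipwreck "),
    ("a1", "history"),
    ("a2", "history"),
    ("a3", "history"),
    ("a4", "history"),
    ("a4", "shipwrek"),
    ("a5", "smith family tree"),
]


def normalise(tag: str) -> str:
    """Lower-case a tag and collapse surrounding and repeated whitespace."""
    return " ".join(tag.lower().split())


def tag_profile(pairs, generic_share=0.75):
    """Summarise tag usage: counts, tags used once, and very widely used tags.

    generic_share is an arbitrary, assumed threshold: a tag attached to at
    least this share of all tagged items is reported as generic.
    """
    # Count each tag at most once per item.
    unique_pairs = {(item, normalise(tag)) for item, tag in pairs}
    counts = Counter(tag for _, tag in unique_pairs)
    items = {item for item, _ in unique_pairs}
    singletons = sorted(t for t, n in counts.items() if n == 1)
    generic = sorted(
        t for t, n in counts.items() if n / len(items) >= generic_share
    )
    return {"counts": counts, "singletons": singletons, "generic": generic}


profile = tag_profile(taggings)
print("Used once:", profile["singletons"])
print("Generic:", profile["generic"])
```

In this invented sample, the misspelling and the family-history tag both count as single-use noise, yet only one of them is an error; the other may be precisely the entry point a particular reader needs.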

Human beings are deeply flawed, biased and inconsistent creatures. I love that about human beings. Everyone who reads a newspaper article comes to it with a different personal ontology, a different way of understanding the world. Sometimes these overlap; sometimes we mistake overlaps where they don’t exist or miss them where they do. But slowly, over time, we have learned to communicate with each other, more or less. No system of non-authoritative subject headings will ever be correct, and most will be far noisier and less consistent than authoritative ones. Yet does that really matter? The research into social bookmarking suggests that it may not. Even though many of these ontologies were arguably less robust or consistent than mechanical ones, they had the virtue of producing a very different structure and selection of entry points. I would argue that the creation of doorways is just as important as the creation of rooms. There is a lot of inefficient overlap and noise in the web of connections that exist across the internet, but do the ten thousand subreddits I never visit preclude the value I find in the six I do? To those who wish to block out the noise of the fake and irrelevant, I ask you to have a bit more faith in your fellow man, academic or enthusiast, to understand the difference between “correct” and “useful” for a particular task.

The questions of access and organisation within authoritative collections and archives are likely to continue for some time yet, but in the realm of the digitised newspaper, I can only applaud the efforts and faith of the Trove team and hope that we can work together to find other sites of social curation.

Bibliography

Ayres, Marie-Louise. “‘Singing for their supper’: Trove, Australian newspapers, and the crowd.” Paper presented at: IFLA WLIC 2013 - Singapore - Future Libraries: Infinite Possibilities, Session 153 - Newspapers. 2013. http://library.ifla.org/id/eprint/245.

Noll, Michael G. and Christoph Meinel. “The Metadata Triumvirate: Social Annotations, Anchor Texts and Search Queries.” In Web Intelligence and Intelligent Agent Technology, IEEE/WIC/ACM International Conference on, 2008, pp. 640-647. DOI: 10.1109/WIIAT.2008.341.

Recker, Mimi M. and David A. Wiley. “A Non-authoritative Educational Metadata Ontology for Filtering and Recommending Learning Objects.” Interactive Learning Environments 9:3 (2001), 255-271. DOI: 10.1076/ilee.9.3.255.3568.

Taylor, Joel and Laura Kate Gibson. “Digitisation, Digital Interaction and Social Media: Embedded Barriers to Democratic Heritage.” International Journal of Heritage Studies 5 (2016), 408-420. DOI: 10.1080/13527258.2016.1171245.
