Exploring the Atlas of Digitised Newspapers and Metadata Workshop Opening Plenary

Video Transcription

Good morning, good afternoon and good evening to everybody, and welcome to the Exploring the Atlas of Digitized Newspapers and Metadata Workshop. We probably will be having a few more people coming in over the course of the next five minutes, but hopefully this will be a good place to start so that we can get the full two hours to really dive into our conversations. I would like to introduce myself to begin with. My name is Melodee Beals. I usually publish or have things online as M. H. Beals, so that’s the same person in case you’re wondering. I am a lecturer in digital history at the University of Loughborough, and I was one of the initial principal investigators for the Oceanic Exchanges project.

The Oceanic Exchanges project, as I mentioned in the introductory video, was a two-year collaborative project between six different nations on historical newspapers, and I was one of the principal investigators for the UK team as well as the work package leader for the ontologies programme. I will hand over to my colleague Emily, who will introduce herself.

I’m Emily Bell and I worked with Melodee at Loughborough as a research associate on that ontology strand, which is what led to the Atlas that we’ve created. In my background, I’m a literary and cultural history researcher, so I approached this very much as someone who’d been using collections and knew a little bit about metadata and then did this deep dive into how these collections work. For the last year I’ve been looking at ways of creating resources from the Atlas and from the work that we did, for schools and for other ways for people to get into the collections.

Very briefly, I’m going to elaborate on the videos which we sent you previously. The videos were meant to be a technical, or at least semi-technical, insight into how we approached the Atlas and where it came from. I want to reiterate that the Atlas was in no way part of our original remit for the Oceanic Exchanges project. It was something that really came out of necessity when we were trying to do the actual part of the grant, which was the mapping. So we had intended to map one set of metadata to the other and we found that we could not do that without the contextual information from a historical and a literary or periodical scholar point of view. We just couldn’t make sense of it without that extra context, so we tried to take all of that information that we gathered for personal understanding and package it in a way that would be useful to other people. So we didn’t think that the metadata really spoke for itself for any of these collections, and we really wanted to give individuals a head start on what we had at the beginning of our project.

So most of the material in the Atlas comes from internal documentation, trial and error, and speaking to librarians and digitization specialists, and it was pieced together, which is why it’s not completely footnoted. We do have an extended bibliography, but most of the information was pieced together from so many different sources, many of them oral, that we were just trying to put together a narrative that gave the information as easily and digestibly as possible. So what we wanted to do was put all of that context in one place so other people could use it, whether they were doing research across collections like we were, whether they were thinking about how to apply what we found in terms of questions to ask of other collections, or if they just wanted to use a database that was in a different language from their native language and they wanted a little bit more of the history or background of that, which maybe they didn’t have access to in a lingua franca such as English.

So I really want to talk about what the Atlas is and what it isn’t. It is a guide to collections that we used, but it is not a comprehensive look at every digitization program in the world or necessarily even representative of the majority of projects. The Transatlantic Partnership for Social Sciences and Humanities, who funded us, was specifically looking at transatlantic collaboration. So we did have scholars from North America and Europe working together, and we did very gratefully receive some assistance from the national libraries of Australia and New Zealand after the project began. But we really hope that the Atlas will now evolve to include far more projects that are not national libraries but state or district libraries, local research projects and, of course, big and small libraries in the global south, particularly South America and Africa, and also in Asia, which is not currently represented. So the Atlas project is coming to a conclusion at the end of this month in terms of our initial funding, but I will be continuing it onward as part of my normal lectureship duties. So we are hoping that it will continue to expand, and we are seeking collaboration and submissions in whatever format they can come, to really make it representative of digitized newspapers going forward.

So what we’re going to do now is dive head first into our discussion. Again, please feel free to raise your hand if you want to speak verbally, or just type your messages into the chat window and we’ll read those out and discuss them more widely. So the discussion question I really want to start with is this. Based on your experience (whether it is working in a library to develop these collections, whether it’s working with hard copy newspapers and only beginning to work in the digital sphere, whether it’s somebody who’s been working with these collections in a very intense data-centric way or somebody who is doing their PhD and using digital collections), and considering what digitized newspapers are like now, what do you think, or what do you hope, the digitized newspaper environment will look like 10 years from now, and what do you think we need to do as a community of users and providers to get us to that?

So I’ll start us off just while you’re all thinking and starting to type. There’s been a huge movement in digitized newspapers over the past 20 years. When we created our timeline on the Atlas it was shocking to me how rapidly everything really took place, and I think over the coming years it’s going to expand much further. But I think it also has the potential either to converge towards a single model of how newspapers are digitized and presented to the world, or to really expand into lots of different niche projects as people gain the ability to create their own small digital collections. Personally I really like the bespoke collections because they can speak to specific audiences, provide specific details and add interest. People are always going to be more passionate about a specific project than they are about a general one, but I really hope that moving forward we don’t splinter so much that the projects become incompatible, because the one thing we learned from Oceanic Exchanges is that in the 19th century, surely, but even today, these conversations do not happen in isolated groups, however you want to define them. The more we can cross-connect these and connect them with people who are specialists on the paradigms surrounding the collections, the richer our understanding is going to be. So I understand the urge for homogeneity and bringing everything together under one standard, but we have to do that in a really thoughtful way. I’ll hand it over to Emily to give her initial views and then we’ll start pulling comments from the group.

So I’d like to preface this by saying that the question itself was a prompt given at a workshop run by Impresso, which is a great project in Luxembourg that has done a similar thing in trying to make it possible to look at the various different newspapers held by regional libraries. They made us think about this question, and so I can just share the brief slide that I put together for that workshop on the thoughts that I had. This is very much shaped by my work on Oceanic Exchanges. We’re thinking about collections as data, which is a movement, and there’s a great report on this and on what it means to think of collections as data. Essentially it means you’re thinking about a collection not just as half a million newspaper pages but asking what the shape of this collection is, and how somebody could access and do computational analysis on a whole collection as data. As people have been mentioning in the chat, there’s lots to do with open access that becomes important there: you need to be able to access collections, and it’s very difficult to get copyright permissions for everything. I think there will be a greater expectation of transparency because of increased public digital literacy. We talk about digital natives now, but I think it’s not quite as true as we’d like it to be at the moment, though it’s getting more true. As Melodee said, increased integration between these collections, hopefully. I think there’ll be greater attention to accessibility issues, and I do think that that partly comes down to metadata, because if there’s information that’s just visual, like mastheads on newspapers, it’s not particularly accessible just to look at a scan of a newspaper page. So actually improving things like what’s recorded in the metadata makes it possible to have an interface or a website or a search function that can do more things in terms of accessibility. And then finally, crowdsourcing is a slightly controversial one. It’s a good way of creating new collections and enhancing what you’ve got, but we found that a lot of providers are quite hesitant about sharing control and giving people the ability to create new tags, to alter transcriptions and things like that. But there are lots of other projects outside of digitized newspapers that are getting a lot of people on board to transcribe manuscripts and things like that. So I think in the coming years there’ll be more of that, more expectation that users can be helpful in terms of how these collections become searchable for future users. So those are my original thoughts and I’ll stop sharing my screen now.

I really hope that digitized newspapers become much more integrated with the value-added data or the indexing or other things that people outside of the libraries are creating for their own uses, and that there’ll be some way of feeding that back into the collections, round-tripping the data. That’s my hope for the coming years.

I’m seeing quite a lot in the chat and I do really want to open it up for voice if anybody wants to, but I’ll just talk about a couple of the things that are in the chat first of all. A lot of things that I’ve seen are about open access, and we found that really interesting as we were looking through these collections. For those of you who are not really aware of the history of copyright law: copyright law obviously began in different nations at different times, but the first international Berne Convention to standardize copyright protections across the world, or in many countries, specifically excluded newspaper content. It was considered, along with scientific journals, ironically, as a public good, something that, once it was published, would be accessible to everyone. That exception got more and more constricted as the 20th century progressed, until newspapers became just as copyrightable and copyrighted as any other texts, really. That has presented a lot of difficulties now. There are a lot of librarians in the group so I won’t speak on your behalf, and you’re welcome to speak for yourselves, but our understanding is that in general there was a huge desire to make everything available open access, to make everything available for free, but there were a lot of moving parts, and there’s a lot of goodwill from some of these publishers in making them at least partially accessible online. That is something that needs to be mediated. So I wonder if maybe that’s a good place to start: does anyone have any thoughts on how to make things more open access, considering we don’t control the legislation and the copyright and intellectual property of the world?

The other thing that I thought was interesting that was coming up in the chat was the interfaces being more visually friendly. That was something I thought we noticed a lot as well. There were a lot of people who were commenting that there were lots of improvements going on to different interfaces but that they didn’t want to lose things as they were gaining more things. They didn’t like things being simplified necessarily. They liked having lots and lots of options. So if you want to add a little bit more to that: is it just about it being aesthetically pleasing or nice to look at, or is it about being navigable and being able to interact with it in that way?

There’s been some interesting points, as you say, about open access and that there are some collections that are still publicly available. I think you have a story about—is it Finland?—where somebody hugs? Do you want to tell that story?

So in one of the conference papers in the IFLA conference proceedings a Finnish librarian says that making the Finnish newspapers open access was so popular in Finland that librarians were actually being hugged on the streets by people who were using the collections. I thought that was really lovely because, you know, I think people find them so valuable, not just for their work but for their sense of community and their sense of family, that I think hugging a librarian (you know, so long as they’re okay with it) is a really appropriate response considering the amount of work that goes into making these collections available.

So there’s a comment about improved searchability, better quality scans, more user-friendly platforms, and being able to access published layouts as well as just text. So, as well as optical character recognition, we’ve got what is sometimes called optical layout recognition, which is not quite the same thing but does try and capture some elements of the layout. But with newspapers it’s so variable, and I keep sticking on this point that you can’t look at mastheads specifically. Mastheads tell you so much about the political leanings of a newspaper and the audience and all the kinds of things that come up there. You just have to look at a front page; you can’t actually compare how they change over time, which might show how audience changes over time or how political affiliation changes. So I agree the layout would be very useful to be able to actually pinpoint specific things.

Torsten says a special issue is small archives that often have relevant local newspapers but no resources for digitization processes; so could institutions who digitize larger collections support such small archives? That’s a really interesting point. We’ve spoken with some of the other larger institutions, so the national libraries of various countries, and there’s a difficulty there in terms of an idea of, I guess, imperialism, in a way. A lot of these smaller collections are very specialized; they often have community groups attached to them, and the idea of these big, well-resourced libraries coming in and providing access or providing digitization seems very sensible because that’s where the money is. But at the same time, it does create questions of ownership in terms of how the collections are going to be digitized and how the collections are going to be provided and integrated with these other resources. I think that that is something that really needs careful discussion, and I absolutely agree I would really love for there to be more sharing of resources, especially for smaller institutions. I think the National Library of New Zealand and the National Library of Australia both provide ways of, essentially, providing materials and a bit of the cost in order to publish (the National Library of Australia especially does provide that), so there are mechanisms available, but it is something that is difficult to navigate just because there’s a tension between centralization and the passion in the community that surrounds some of these smaller periodical collections.

I think we’ll just take one last round, but one comment I saw go by really quickly was the idea of a collaborative index, and I love this idea. When we spoke to Gale Cengage (and I don’t know if there’s anyone from Gale here, hello if there is) they talked about the Times Digital Archive and the fact that there is the Times index, which is one of the most well-known multi-volume indexes of newspapers. They didn’t think anybody would want a digitized version of it. They thought that once you had full-text searching, that was all that you needed. So I think it’s really important that researcher voices get heard; that actually that knowledge creation and knowledge categorization is really important. I think maybe you meant by index just where all the newspapers are, but the same thing applies: that organizational data is really valuable to researchers.

The other thing that was coming up a lot was the idea of easier access to downloads, or to mass downloads, and that’s something we definitely noticed a lot of. There are certain of our collections (so the National Library in the Netherlands, for example) where you can phone or email them and say this is what I want to do, and they can put together a collection for you. There are other ones where you can download directly from them. But this is something I would really love people to put in the chat or to give us as feedback: what is your use of these newspapers in terms of the web search or the API, if anyone uses APIs, to access them, or is the best way really just “can I please download everything into one giant file onto my computer”? Because that was my personal preference as a researcher, and it seemed the weirdest and the most foreign whenever I asked for it. So maybe just tell us how you would like to interact with these newspapers, the ones that are freely accessible anyway, and what accessibility you would like and why. I think saying “I just want it all on my computer” can sound a bit strange unless the reason why is fully explained first.

A couple that I saw that linked together were about accessibility issues, which is something that hasn’t really come up in any of our other workshops yet. So that was really interesting, the idea of audio transcriptions or generally better accessibility, and I think that’s really valuable. There’s a lot of concern, I think, with the way images of newspapers are displayed sometimes in Java or Flash viewers, and that’s not useful for assistive technologies. It makes it very difficult, and I think most people who interact with it don’t find it particularly nice either; they find it restrictive on getting the images they want. They just want to right-click and save the image. So I think developing better integration with assistive technologies and making things more accessible would be beneficial to everybody, really.

The other main one I’ll point to, before handing it back, is, again, the better idea of what’s out there. What are other libraries or collections doing, how would they put their stuff together. That’s something we were hoping to get kick-started with the Atlas. There was a really good sense among librarians that they were constantly copying each other or trying to learn best practice from each other, but that only seemed to be working through serendipitous connections at conferences or going to workshops at each other’s libraries and there wasn’t a central repository for “this is what we did and this is how well it worked, and this is where it didn’t work.” I think this is something that would be really helpful going forward, just some sense of shared conversation.

Some other issues people are bringing up: Beverly mentioned sustainability—projects disappearing and that’s something that you see a lot with sub-collections, where people put a lot of work into a sub-corpus of newspapers and then it disappears as the funding disappears. That’s a big problem.

Lara mentions greater linguistic diversity, so more non-European languages present and tagged in digitized corpora. As we mentioned in the video, metadata itself is often in English. The fields are in English, which I think is possibly a barrier to access for some collections or it might encourage people to use different standards.

Open access to collections to integrate with linked open data. Open access is a big one, I think, and some of the collections we’ve worked with are fully open access. So Trove, Delpher, the Finnish collection—where Melodee has a story about librarians being hugged in the street—

I just thought it was lovely that the Finnish librarians, the librarians of the National Library of Finland, because it was such a community driven project to digitize the newspapers to provide something back to the community for local identity and local heritage—librarians were recognized on the street and people would come up and hug them and thank them for digitizing their newspapers, which I thought was lovely and, so long as the librarians are okay with it, something we should consider.

In terms of integration, Eileen mentions integration with something like the Waterloo Directory, which has circulation figures, political affiliation, which would be very useful, and I’d like to mention again the Impresso project, which is the Luxembourg aggregator of various newspaper collections. They do have some things like political affiliation within the metadata depending on whether the library that they’re pulling from has that information. But it’s really interesting in terms of these things because the question then becomes how do you trace that change over time as the political affiliation changes as the circulation figures change, even as the title of the newspaper changes? That is, in the metadata. The British Library, particularly, has variant titles recorded quite well, but so many of these things like political affiliation do change over time and how do you capture that in the metadata?
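One way to think about the question of tracing change over time is to store each metadata value together with the date range it applied to, rather than as a single value for the whole publication. The following is only a minimal sketch under that assumption: the `DatedValue` class, the `value_on` function and the "Example Gazette" record are hypothetical illustrations, not fields or data from any collection discussed here.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Sequence


@dataclass(frozen=True)
class DatedValue:
    """A metadata value that only holds for part of a title's run (hypothetical)."""
    value: str
    start: date
    end: Optional[date] = None  # None means no known end date


def value_on(history: Sequence[DatedValue], day: date) -> Optional[str]:
    """Return the value that applied on `day`, or None if nothing is recorded."""
    for entry in history:
        if entry.start <= day and (entry.end is None or day <= entry.end):
            return entry.value
    return None


# Invented example: the title of one newspaper changing over time.
titles = [
    DatedValue("Example Gazette", date(1820, 1, 1), date(1834, 12, 31)),
    DatedValue("Example Gazette and Advertiser", date(1835, 1, 1)),
]

print(value_on(titles, date(1830, 6, 1)))
print(value_on(titles, date(1840, 6, 1)))
print(value_on(titles, date(1810, 6, 1)))
```

The same shape would work for political affiliation, editor or circulation figures. Returning `None` for dates with no record keeps "unknown" distinct from a value silently inherited from the publication-level record.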

There’s a couple notes as well from Megan and from Bob and a couple others, I think, about the level of detail in the metadata. That’s something we really struggled with because there was a lot of times we assumed that the metadata should provide information on, for example, like Megan says, the particular issue that was scanned. I thought maybe that metadata was just hidden somewhere and it’s not in any of the databases that we looked at. It’s very difficult to figure out, especially when collections were borrowed in order to complete runs. You can’t just assume that it’s the copy that that library has. There are lots of individual issues that are not individually provenanced. So that issue, provenance data being metadata, would be really helpful. And having clear metadata for the editor, for example, of that particular year or month because sometimes if it’s included in metadata in the MARC or the library catalogue record, it’s for the publication as a whole. So you just have to hope that that editor was still alive and working when it’s being attached to the newspaper you’re looking at or you can’t say anything about it.

Eileen mentions visualizations to allow users to get a quick initial sense of the coverage of collections. Some of them do this. Again, the Impresso project has literally a graph that shows what issues they have and what they’re missing. We’ve got an Impresso person [in the room]. To clarify, Maude says the political orientation was added manually by the historians of the Impresso team. Sorry for getting that wrong! And it’s Swiss-Luxembourg, not only Luxembourg! So thank you very much for coming, Maude; I’m going to evangelize the Impresso project.

Downloading search results and metadata as manipulatable data. I think that’s something you could do with an API, is download some of that, is that right?

So, for collections like Trove and for Europeana, as well, either now or in the near future you’ll be able to download batch collections of the metadata. You can also do that somewhat with the National Library of New Zealand, but it’s not always labelled the same way that it is in the back-end collection. So they have, I was going to say, user-friendly metadata fields in the API, so it’s not always clear what each one is referring to and the specificity of that information. So, yes, if I could just download the metadata as it is, in its completeness, from the back end, I think that is a great suggestion moving forward.

There’s also a historian in Britain—whose name is escaping me at the moment—but she actually went through the 18th century Burney Collection for one of the newspapers and did issue by issue what was digitized, what existed in the library, and had ever existed, to show the difference in date coverage. It was this really telling graph about the inconsistency about what was actually digitized and what it looked like was digitized based on the date ranges in the collection. So I think visualizations are so helpful and they’re not done enough.
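The kind of comparison described above, issue by issue, between what was published and what was digitized, can be sketched in a few lines. This is only an illustration under stated assumptions: the `coverage_by_year` function is hypothetical, the dates are invented, and it assumes you already have two lists of issue dates, which in practice is the hard part.

```python
from collections import Counter
from datetime import date
from typing import Dict, Iterable


def coverage_by_year(published: Iterable[date], digitized: Iterable[date]) -> Dict[int, float]:
    """Fraction of known published issues that were digitized, per year."""
    published_set = set(published)
    # Only count digitized issues that also appear in the published list.
    digitized_set = set(digitized) & published_set
    published_counts = Counter(d.year for d in published_set)
    digitized_counts = Counter(d.year for d in digitized_set)
    return {
        year: digitized_counts[year] / published_counts[year]
        for year in sorted(published_counts)
    }


# Invented example: two issues known for each of two years.
published = [date(1800, 1, 1), date(1800, 1, 8), date(1801, 1, 1), date(1801, 1, 8)]
digitized = [date(1800, 1, 1), date(1801, 1, 1), date(1801, 1, 8)]

for year, share in coverage_by_year(published, digitized).items():
    print(year, f"{share:.0%}")
```

A collection-level date range of 1800 to 1801 would look complete here, while the per-year figures show that half of the 1800 issues are missing, which is the gap such a graph makes visible.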

There’s been a few comments about persistent links and identifiers or things like standardized fields so that named entity recognition can be more consistent across collections. That’s again something that we’ve been trying to draw attention to, where the fields are used. It’s quite interesting with a lot of the collections that authors are not often identified. It’s not often in the metadata because unless it’s printed on the article, which it wasn’t for most articles, particularly in the early 19th century, the fields are just not there. So some of the collections had author where there was a pseudonym or a pen name, but most of them did not because there just wasn’t the data for that. So that again requires a lot of input from something like the Waterloo Directory, which has been doing that very well, but it’s just not there in the metadata at the moment.

I think I’d like to add one thing about the diversity of newspapers that have been digitized, especially for non-English language or non-“the language of the main archive” that’s holding the newspapers. One of the difficulties we found talking to libraries is that sometimes the non-main language newspapers were donated as particular collections by individuals and they have different rights and curatorships attached to them. So the libraries don’t often have the remit or the permissions to go ahead and digitize that as part of their normal digitization programme. In some places there have also been cultural sensitivities about it, where they’re concerned about a “dominant culture” going in and taking a smaller culture’s periodical material and digitizing it. Will it be seen as colonial or imperialistic? Therefore they’re very hesitant to do that without a lot of vocal support from the communities that they’re digitizing. So I think if minority language or minority culture groups are keen to have their newspapers digitized, it is important to make that clear to the librarians: yes, please, this is not seen as a colonial act, and we actually do want this to happen. Or to work with the libraries directly to make sure it doesn’t go down any weird avenues. That is something that came up in multiple institutions, this concern about cultural appropriation or digitizing things that had been, in the case of a couple of places, obtained through what would now no longer be considered ethical practices, things that had been inherited by the libraries from 19th century predecessors.

What I would like to say now is that this initial plenary session was just to get you thinking and get some ideas of different topics that you might want to discuss more in the heart of this workshop. And the heart of this workshop is going to be what is affectionately referred to at the Software Sustainability Institute as speed blogging.

Chat Transcription

From Workshop 1

  • For 2030, I am hoping for high OCR and NER quality standards, cross-language search capabilities, and widely accepted open licences, for a start ;-)
  • Open access, including wider and easier bulk download. Going through the web will never be as quick or flexible as doing things on my own computer.
  • More visual friendly. A lot of websites are too ugly and hard to deal with. More user friendly.
  • +1 on open access, also user-friendly front ends that immediately let people see what they can do with collections and how they can contribute (even just a big “help out checking OCRed text!” button; it’s so important to let people feel like they can participate/help with their own expertise)
  • Hopes for 2030 to be able to search through newspaper databases in a much easier format. Sometimes certain keywords bring up the newspapers required but it can take a while to find the ones needed for your research topic and can be very time-consuming.
  • Hopes for 2030: I agree we need a balance between standardisation and design for specific audiences. I think more open access would help here as the data could be made available to be used in different interfaces.
  • I hope there will be at least some standardised cross collection systems for comparing data. Even something as simple as an NGram facility would be very useful
  • 2030 - linkages across knowledge systems
  • Hopes for 2030: I agree on the need for balance. Personally, I hope for interoperable (Meta-)Datasets and Fulltexts to view in different, specialised Front-Ends
  • Also, automated OCR correction: especially as so many errors in c18th text are predictable due to long s and ligatures.
  • By 2030 (I can only speak for English Language Newspapers) I would want to see a collaborative “one online index” to all newspapers whether digital or non digital held by institutions and other holders with appropriate meta
  • More hopes: Also more detail on version changes as some collections change continuously and that can make ongoing projects problematic
  • Open access, some common metadata standards, users feeding back their enhancements to metadata at the end of projects
  • I wish for platforms which provide newspaper corpus composition and analysis, but much more independent from specific collections and their specific interfaces
  • By 2030: I hope to see improved OCR accuracy and search functionality.
  • I couldn’t do my work without the assistance of librarians. So more librarians - and better paid - in 2030!
  • I would like to see improved searchability, better quality scans and more user friendly platforms. I also want to be able to access published layouts as well as just text.
  • I would echo the desire to access published layouts as well as just text. For those of us who work within book and publishing history, seeing the actual artefact is essential.
  • I think there is an argument for multiple frontends to cater for different users needs
  • I would like to see more collaborative working together in areas such as workflows, processing pipelines, interfaces, etc so that we can see more standardisation of approaches (and Libraries not all having to develop unique solutions from scratch)
  • A special issue for me are small archives: they often have relevant local newspapers, but no resources for digitization process. Could institutions who digitize larger collections support such small archives?
  • Just finding relevant guidance is difficult, so even publishing guidance on best practice is valuable
  • Trove in Australia seems to have a good partnership model for smaller archives. The accessibility it opens up for remote communities is a real boon
  • Also, integrate machine translation in systems to help with multilingual access. MT has a lot of issues overall, but particular systems are getting remarkably good for some language pairs, or at least good enough to make projects substantially better. You mentioned in the videos that you had to rely on some project collaborators to do unpaid translation work; this is a perennial problem, and relying on human translators will never be enough.
  • This is what we do with the Texas Digital Newspaper Program–coordinating with local partners (public libraries and publishers together) to obtain grant funding to build open access to their newspaper collections. Working with publishers+libraries here enables us to obtain publisher permission for open access.
  • In my digital project I included the OCR coupled with a snapshot of the actual image, which can be expanded; I think users find this helpful for validating the text
  • We are working on downloadable textual dumps in various formats, including plain text and TSV, to enable research and for people to use in various analytic apps with no programming knowledge.
  • My team created an annotated index for a 40 year run of one magazine–invaluable, but a lot of work!
  • I use data from the hard drive of the TDA because we needed the full text for many years (a corpus linguistic approach). I use the interface to contextualise by browsing particular days / examples
  • Yes, download options would be amazing because then I could standardise the software used for analysis
  • I would like to be able to both search by text but also browse issues because sometimes I don’t have enough key words to find what I’m looking for.
  • Reason why bulk downloads: it’s quicker and more flexible doing things on your own computer rather than over the web via a restrictive interface.
  • APIs have been useful in the past for my research (on dictionaries, not newspapers though), but depends on response format!
  • I really like using Gale DSL but my University doesn’t have a subscription
  • I use the Trove API and find that it is extremely useful (and I’m only getting started).
  • Just an example of something that can only, or best, be done with the entire collection: word embedding
  • Keeping non-academic users in mind can be important for funding–persuading governments and other bodies that there is a wide interest in such resources. Family history might finance our research!
  • No data provider could ever foresee what the researchers would want to do with it, so bulk downloads are actually the easiest option to make everybody potentially happy :)
  • From an initial UK and Ireland perspective the Findmypast / BL collaboration has made much more content available

From Workshop 2

  • Not behind paywalls?
  • A way of searching across multiple collections
  • Better fulltext searching (OCR, hyphens etc!)
  • Ability to use the texts as a linguistic corpus
  • Ways of incorporating user-generated content (tags, transcription, etc)
  • Better integration with assistive technologies.
  • Greater diversity in news content that’s been digitized
  • Clear bibliographical data on original copy used for digitization: which exact copy, held at which library, scanned under which set of parameters (Were blank pages scanned? What was the method for dealing with foldouts?).
  • Greater awareness of what other institutions are doing, leading to less duplication of efforts
  • Access made quicker and easier.
  • A way of searching under editorial role, to be easily able to check who was editing at a certain time. Though practically this is just a wish list thing. It would be very labour intensive unless you plugged into something like the Waterloo Directory.
  • Integration with other full-text collections would also be amazing, e.g. search across multiple periodicals collections + Internet Archive + Hathi Trust
  • Easier to find what is out there and narrow down what you want to look at
  • I’d like to see much more detailed metadata about the content and genres within newspapers.
  • Use of standard identifiers for things like places or people, to enable easier connections across collections (especially in a Linked Open Data perspective)
  • Sustainability. Some excellent projects have disappeared. Accessibility in terms of open access is extremely important too, I think.
  • Datafying texts by Named Entity Recognition and then integrate it with other types of collections.
  • Integration with something like Waterloo Directory info (on circulation figures, political affiliation, etc.) would be very useful to help my students understand the context of what they find in their searches
  • I would like to see greater linguistic diversity: more non-European languages present and tagged in digitised corpora
  • Being able to not only tag, but link that tag to other documentation about an individual or place (e.g. census)
  • Using IIIF for presenting the images. That would enable reuse and combination.
  • Open access to collections to integrate with linked open data
  • Integration and translation, of course, but also more concern for user-interface for flexible ways of reading and searching: easier visual browsing “mode” to peruse layout / toggling to reading details, not having your browsing being tied to a search engine.
  • I would like to see proximity searching to achieve better hits.
  • Downloading search results and metadata as manipulable database (perhaps this is already happening and I’m not tapping into it.)
  • Visualisations that allow users to get a quick initial sense of the coverage of the collections
  • Hello everyone, just a small specification about Impresso: within Impresso the political orientation of newspapers was added manually by historians of the Impresso team; it was not available in the newspaper metadata provided by libraries. (plus Impresso is a Swiss-Luxembourg project, not only Luxembourg ;) )
  • An idea of coverage not just of the issues of a particular publication, but also the publications covered as opposed to what was printed: that’s how Waterloo would be most useful according to me
  • Persistent links and identifiers
  • Identifiers (e.g. for authors/writers) would be so useful, if standardised with those used elsewhere
  • Serialized articles and stories linked across issues
  • Would it maybe be possible by 2030 to have actual mapping (like GIS style) so you can see on a world or regional map where newspapers were, cross-referenced with where the article content was set (for news articles, like crimes etc)?
  • What about newspapers that are currently being published in both digital and print formats? It would be useful to have a sense of what is currently being preserved for future researchers, and perhaps the situation could be improved with input from the Atlas, i.e. best practices for going forward with newspaper preservation
  • Could you tell us something about the differences between using the ordinary user front-end, versus the back-end? e.g. how would I know if it was possible to get other access, and whether it would be worth trying to do?
  • Making it easier to search images. Tagged by theme or who it depicts so you can search the images themselves and not just find them through the articles they are attached to.
  • Provenance of the digitized copy. 19th c American papers often went through the mail and have the addressees’ names written on them.