Preserving and Analyzing Digital Texts: Symposium Recap (with Video)

I recently hosted an online symposium with support from the Andrew W. Mellon Society of Fellows in Critical Bibliography, in which we shared research concerning the preservation and analysis of digital texts.

Speakers included RBS-SoFCB Senior Fellow Ryan Cordell (University of Illinois Urbana-Champaign), Emily Maemura (University of Illinois Urbana-Champaign), and Benjamin Lee (University of Washington).

Maemura began the session with a project examining preservation standards for web archives. She highlighted the divergence between researchers' practices and the materials they want to preserve, on the one hand, and the WARC (Web ARChive) file format standard, on the other.

Cordell followed with a discussion of Large Language Models, the textual building blocks of contemporary Artificial Intelligence systems. He also highlighted the current lack of transparency about the textual sources used in AI training datasets and discussed what this means for historical research.

Lee gave the third presentation, reviewing his recent work creating searchable databases of imagery from historical newspapers. In the talk, Lee showed off new tools he recently developed in collaboration with the Library of Congress, which researchers can use today.

Taken together, these three presentations did an excellent job of representing current research on digital text technologies. They also point toward ways of applying bibliographic approaches in digital settings.

More than 80 participants logged in to attend the session. This level of attendance would likely not have been possible with an in-person format, and I was very pleased to bring together such a global audience for the speakers’ research.

James Hodges, May 5, 2023