Computational vs. Social Preservation: What do Algorithms Require?
Note: This essay was originally posted on the SJSU iSchool’s Center for Information Research and Innovation (CIRI) blog.
The digital technologies that we use every day are controlled by increasingly complex algorithmic systems. These run the gamut from the very banal (say, Netflix recommendations) to the very consequential (say, sentencing recommendations in the criminal justice system). As these systems become more widespread and impactful, there are more and more reasons to preserve them so that we can refer to them later. Perhaps you’re a software developer and you want to look back at how your product has changed over the years. Maybe you’re a politician or a lawyer and you would like to prove that an algorithmic system treated somebody unfairly. In these kinds of cases, it becomes imperative that algorithmic systems be preserved in a way that opens them up to future analysis.
Last summer, I began working with my colleague Ciaran Trace (University of Texas at Austin) to review the existing projects dedicated to preserving algorithmic systems. We both knew that several such projects already existed, but we were also struck by the fact that they didn’t seem to share any sort of unified approach. This began a research project that has produced a series of publications aimed at helping to cut through the current mass of disparate initiatives. The first publication from this project, “Preserving algorithmic systems: a synthesis of overlapping approaches, materialities and contexts,” was recently published in the Journal of Documentation.
In our new article, we review existing preservation projects to see how they differ and what they have in common. One of our main findings is that existing algorithm preservation projects fall into two broad categories: those that view algorithmic systems primarily in terms of their computational functionality, and those that view such systems primarily through the lens of their social impacts. This difference in emphasis can have a major effect on the kinds of artifacts that an initiative actually collects.
On the computational side, preservationists often collect technical artifacts like source code and compiled binary code, but of course these objects are only meaningful to people with the technical literacies to understand them. On the social side of things, many preservation initiatives focus on artifacts like oral histories and user-generated content. Unfortunately, these objects can only begin to explain the technical functionality of an algorithm, and often cannot do so with any degree of specificity. To help bridge the gap between these two approaches, we provide a large selection of examples from projects at many points on the spectrum between the totally technical on one end and the totally social on the other. Ideally, any algorithm preservation project will include both the complex technical objects that capture specific computational functionality and the more easy-to-understand objects that help to explain an algorithm’s place in the world.
Our hope is that by bringing together a selection of examples in this recent article, we can reduce the amount of legwork that preservationists have to do in the future when they’re looking for a model to follow for their own projects. If you’d like to learn more about these topics, please read the full article in the Journal of Documentation, or consider taking my INFO 256 – Archives and Manuscripts course, where I will discuss these topics in much greater depth.