Building and Extending Frameworks (Fall 2024 Research Update)

Frameworks have been a recurring theme for me over the last few months. In my current research practice, I’ve been building on the conceptual and methodological frameworks that I established while working with the late Ciaran Trace on preserving AI and other complex algorithmic systems. I’ve also been thrilled to see some other researchers picking up and building on the framework I introduced for studying online disinformation and conspiracy movements last year.

My most recent article, “Algorithmic futures: the intersection of algorithms and evidentiary work,” was cowritten with Ciaran and came out a few months ago in Information, Communication, and Society. It’s the second part of a longer series of publications we wrote exploring the ways that archival knowledge can help to make automated systems more transparent and accountable. The series will conclude with our next publication, “The Role of Paradata in Algorithmic Accountability,” in a book coming out via Springer sometime soon.

When Ciaran asked me to start this project, I was initially skeptical, because I had such distaste for the current wave of AI hype. Just a few weeks ago, the RAND Corporation published a report saying that 80% of commercial AI projects are basically failures, and that they have wasted billions of dollars in the process. This, combined with the ongoing environmental impact of computing-intensive AI systems, leaves plenty to be skeptical about. But Ciaran believed in the value of archivists’ knowledge when it comes to making these systems make sense. Whether we want to shape their impact now or look back at it accurately decades from now, archivists are uniquely positioned to help.

Our intervention in the AI hype was one of grounded practicality, which I try to bring to all my research: identify the material traces of an object or process and highlight often-overlooked ways to analyze them. This can be tough in a field like AI, which is simultaneously technically complicated and willfully obscured by many of its most prominent proponents.

Ciaran passed away in the spring, and it’s been sad, to say the least, working to finalize these publications without her. I owe her a debt of gratitude for bringing me onto the project and for acting as a mentor more broadly during my postdoc at UT Austin. She was extremely generous with her advice but always careful to let me chart my own course. It was an excellent balance that I hope to maintain in my own mentorship as I work with more and more grad students myself.

Vrije Universiteit Amsterdam, the host of 4S/EaSST 2024, under construction in 1970 (Public domain photograph by The Algemeen Nederlandsch Fotobureau)

The AI research I did with Ciaran also had downstream effects on my other research projects—namely, the work I started on environmental data curation last year. I didn’t realize it when I started, but that project has come to involve a bit of reverse-engineering on the predictive systems that major cities use to alert the public of adverse events like flash flooding and sewage overflow. I presented the second installment of that research at 4S/EaSST in Amsterdam this summer and I think we may finally see some of it in print this academic year. 

Building a framework and applying it to new cases is a recurring theme for me at the moment. In addition to the algorithmic systems work that Ciaran and I did, last year also saw publication of my enumerative-bibliography-informed approach to studying online disinformation movements. That study used early QAnon threads from 4chan as a case study, but this summer at SHARP, I presented a newer work that applied the same method to studying a much earlier online conspiracy movement—the Ong’s Hat urban legend/alternate reality game/conspiracy narrative.

More recently, Katie Greer and Stephanie Beene published an article with Frontiers in Communication that touches on this framework while analyzing the social media content generated by QAnon participants later on in the movement’s development. It’s wonderful to see that my methods and findings were of value to researchers who built on them to generate new findings in a new case study.

Over the next few months, I’ll be working to get the newest installments of these three projects out in journal articles, but I’ll also be on the lookout for what comes next. I first started all three of these research streams during my postdoc, which happened to coincide with the pandemic. Finishing my years-long dissertation work at the start of the pandemic, I had to start new projects during a time characterized by intense social upheaval, restricted access to traditional archives, and multiple cross-country moves. Now that I’m a bit more settled, and I’ve built some methodological frameworks that have proven applicable in a variety of settings, I’m eager to relate all this research back to the computer-history work that defined my dissertation. Stay tuned for more updates on how that goes.

James Hodges
On Writing Reviews

My review of McKenzie Wark’s book Raving recently went live on Popular Music, and it’s prompted me to reflect more deeply on the review as a discursive idiom than I ever did before. In fact, it even motivated me to start a music review blog, where I posted some thoughts on raving and electronic music that were outside the scope of my 1,000-word academic review.

I was motivated to review Wark’s book for two main reasons. First, it applies theoretical frameworks from media and gender studies to the contemporary New York City techno scene, which I am peripherally a part of, and I felt that my decade-plus experience with both media theory and NYC techno could make me a uniquely qualified reviewer. Second, I wanted to keep my review-writing experience fresh, because I’m the book reviews editor for a different academic journal, and I recently asked the students in my Archives & Manuscripts course to write book reviews of their own.

Needless to say, I wrote, edited, and graded a lot of reviews this semester. I believe in the review as an expressive form, despite its often-denigrated status. During my undergrad years I often used book reviews to save time on my coursework. I would read several book reviews before ever opening the book itself, then skim through quickly, paying close attention only to the sections that seemed most relevant according to the reviews. Was it the best way to read a book? Maybe not, but I got all my work done on time and avoided a lot of the anxiety that students get when facing down a mountain of reading.

Raving is so short that it hardly requires skimming, but I hope my review can still add value. In fact, for a really slim book, I think a review can help to build up a discursive network around the work itself in a productive way. Rather than saving a reader time, the review of a very brief work gives the reader more to engage with. I know that when I finish reading something that really piques my interest, I often go hunting for reviews that can add new context or insight.

For my students, I hope that writing a book review gets them to engage more closely with a work they’re really interested in, which I never could have selected on their behalf. In some cases, it might even help them get a publication out. If they can publish something they wrote in my class, all the better—what an effective use of time!

My review of Raving is up now at Popular Music, with additional thoughts on my new music blog.

James Hodges
Does Q Still Matter?

Recently I published a new research article analyzing the earliest discussion threads that gave rise to QAnon. The article shows that despite the clearly outlandish quality of most Q movement narratives, they’re not being pulled from thin air. Instead, they’re created (at least initially) by applying a specific and idiosyncratic interpretive lens to completely real and verifiable sources. I arrived at these conclusions by counting the number of sources linked to narrative claims in the movement’s first discussion thread, and then classifying them according to the specific narrative function they served. It’s a less technical methodology than I usually use, but it’s warranted because the movement’s engagement with outside sources has never been examined closely. By pointing out the Q movement’s engagement with authentic sources, I’m hoping to intervene against the condescension so rampant in mainstream discussion of both Q and conspiracy discourse more broadly.

At the most immediate level, I believe that my position within the disciplines of librarianship and archival studies dictates that I work towards meeting information seekers wherever they are, in an attempt to serve the pursuit of knowledge— even if some of them believe that Democrat politicians literally drink children’s blood. More broadly, I’m highly sympathetic towards people who question mainstream narratives. It’s widely accepted that the US government used false pretenses to justify its military engagements in Vietnam and Iraq, for example. Why should anyone trust the same “reliable” sources and mainstream media outlets that have proven so untrustworthy in the past? I don’t think we do ourselves any favors if we act condescendingly towards an information community that is motivated by somewhat justifiable skepticism towards mainstream narratives.

But that project is done now. And it seems like Q might be too. Disinformation and debates about the veracity of information, however, are as prominent as ever. Most recently, I noticed a nearly-Q-level explosion in claims, counterclaims, and attempted debunkings about who blew up the Al-Ahli Arab Hospital in Gaza City on Oct 17. Lots of independent researchers took it upon themselves to investigate the topic using social media and digital maps in the following weeks, producing convoluted evidence collages similar to those that were common in QAnon. I suspect some of these so-called “open source intelligence” efforts were probably disinformation efforts in their own right. My 2021 method for forensic image analysis might have helped wade through some of the noise, but evidence and narrative claims were changing so fast it was hardly worth trying, especially since my primary vector of written output takes the form of peer-reviewed articles that require months, if not years, of lead time.

Rather than trying to keep up with the pace of current developments in the conspiracy and disinformation spaces (a task literally designed to be maddening by its architects), I’m now trying to go back to the source. What are the specific information practices and design techniques that make a narrative spread effectively? Can we develop generalizable models and forms of literacy that help us to understand these phenomena as they unfold, without being baited into the exhausting analysis of deliberately shifting claims about minutiae?

I’ve already started thinking historically about these questions. My Q research actually began with research on Alternate Reality Games, or “ARGs”. I was looking into the origin and mechanics of Ong’s Hat, widely considered to be one of the first ARGs (perhaps the very first, actually), which nevertheless blurs lines around the definition of “games” and “play” such that it is also one of the earliest born-digital conspiracy movements more broadly. I outlined my findings first on Twitter, and then in a conference talk. Revisiting the historical origins of online conspiracy movements also gets me deeper into issues around archival preservation of digital media content. In this case, I expect that the research will entail a lot of work with PHP-based discussion forums, which were highly formative in my early online life, and which I have always wanted to look at more closely.

I’m excited about the future prospects for this stream of research and happy to bring it more in line with my broader interests in preservation and forensics. I actually never intended to do disinformation research, but I needed a new project to work on during my postdoc back when most of my usual research sites were closed due to COVID-19. As it turns out, the themes of manipulation and control running through this research were also quite prominent in my dissertation work on the interwoven histories of 20th century psychedelic and computing research. More broadly, all this disinformation work is also part of the general digital archives and digital forensics research I do verifying sources and examining provenance in digital environments. Closely reading the digital materials used to construct a historical narrative tells us more about it than we could learn by simply operating at the level of their contents.

If any of this sounds interesting to you, please check out my new Information & Culture article! Readers with academic credentials can grab it via UT Press, and others can read a free preprint version via ScholarWorks.

James Hodges
A Statistical Reflection on the Academic Job Hunt

People can be kind of squeamish about their experiences with the academic job market. I mean, everyone agrees that it’s bad, but when I was looking for a job I didn’t find as many gory details as I would have liked. There are a decent number of “why I’m leaving academia” blog posts floating around online, but not so many “how I stayed in academia” posts.

In a more perfect world, this would not be the case. Just like unionization starts with talking about salary, I think any realistic discussion about graduate education or the higher ed economy needs to start with a discussion about the numbers. So I’m going to share my stats. I hope they’re helpful to at least a handful of grad students facing the market, or prospective students deciding what to do with their future.

I’ll start with the rawest stats before I go any further: I was on the market for three academic years between fall 2019 and spring 2022. In that time, I applied to 60 jobs, then got 11 screening interviews, 7 final interviews, and 4 job offers. That means I got interviews from about 18% of my applications, and job offers from just shy of 7%. I’m currently employed as a tenure-track assistant professor at a large state university, where I recently started my second year. I spent two years before this in a research-focused postdoc at a flagship state university. Although it’s been anxiety-inducing at times, the overall experience has been pretty great. I hope it continues that way!

Now here’s where I’ll make some additional notes and start breaking down the data. I finished my PhD in the School of Communication and Information at Rutgers in 2020. I was an interdisciplinary student working in both Media Studies and Library/Information Science. These disciplines are institutionally linked at Rutgers, but this interdisciplinarity is the exception rather than the rule. I chose to go to Rutgers in part because I figured that straddling two disciplines would help me qualify for a wider swath of jobs. I think it worked, but I see so few people sharing stats from their job searches that I’m not really sure.

Anyway, let’s get more granular with the data. In Fall 2019, I was still neck-deep in the dissertation process. Nevertheless, I sent out 26 applications, then got 4 screening interviews and 3 on-campus interviews (one of which had no prior screening interview). These were varied in nature: all told, I interviewed for 1 postdoc, 1 teaching professor gig, 1 tenure-track assistant professor job at an R1 university, and 1 tenure-track assistant professor job at a liberal arts college. In the end, I received no offers.

Spring 2020 was the beginning of the COVID-19 pandemic. I only sent out 5 applications, because the job market cratered. In the end I still got 3 first-round interviews, 2 follow-up interviews, 1 offer based on a single interview (Visiting Assistant Professor at a public university), and 1 offer for a postdoc (which I took, at UT Austin). Not too bad considering the circumstances!

So for those keeping score at home, my year one total was 31 applications, 7 first-round interviews, 5 follow-up interviews, and 2 job offers (1 postdoc, 1 Visiting Assistant Professor). One search got suspended due to the pandemic. That’s just shy of a 23% success rate at getting to the first interview, and about a 6.5% rate of getting the final offer overall. Another way of looking at it is that about 28.6% of my first-round interviews eventually led to offers.

Year two was pretty bleak in terms of pandemic-related job market disruptions. There just weren’t any jobs to apply for. In Fall 2020 I sent out a grand total of two applications. I got zero interviews. Spring 2021 was not much better: 6 applications, 1 first-round interview. So the year two totals were 8 apps, 1 interview, and 0 offers. I don’t feel like I need to calculate any further on the stats for that run.

Things rebounded a bit in Year Three, the 2021-2022 school year. I think some of the “Great Resignation” finally started to hit by this time. In the fall I sent out 20 applications, then got 3 first-round interviews, 2 follow-up interviews, and 2 offers (one was my current job at SJSU, another was a different large public university). In the spring, I applied to one more job while waiting to hear from the others, but I didn’t have any interviews. So the 21-22 stats were 21 total apps, 3 screening interviews, and 2 offers. This means I had about a 14% success rate landing the initial interview, and a whopping 67% success rate at getting offers once I had my interview.

Typing this all out right now, it’s kind of interesting to see how my stats changed as time went on. My success landing a first-round interview actually declined (from 23% to 14%), but my success rate at landing an offer after I interviewed went way up (more than double, actually, from 28.6% to 67%).

To take a more zoomed-out view, during three years on the market I sent out 60 apps, then got 11 screening interviews, 7 final interviews, and 4 offers. That’s about 6.7% success from initial application to final offer and a 36% success rate moving from first interview to final offer.
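For anyone who wants to check the arithmetic, here is a minimal Python sketch that recomputes the conversion rates from the per-year counts reported above. The names (YEARS, pct, funnel, totals) are illustrative assumptions rather than anything from an original spreadsheet, and the year-three interview count uses the three first-round interviews listed in the fall breakdown.

```python
# Per-year counts taken from the breakdown in this post:
# (applications, first-round interviews, offers)
YEARS = {
    "2019-20": (31, 7, 2),
    "2020-21": (8, 1, 0),
    "2021-22": (21, 3, 2),
}


def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal (0.0 if whole is 0)."""
    return round(100 * part / whole, 1) if whole else 0.0


def funnel(apps, interviews, offers):
    """Conversion rates between the three stages of the job-search funnel."""
    return {
        "app_to_interview": pct(interviews, apps),
        "app_to_offer": pct(offers, apps),
        "interview_to_offer": pct(offers, interviews),
    }


def totals(years):
    """Sum each column (applications, interviews, offers) across all years."""
    return tuple(sum(column) for column in zip(*years.values()))


if __name__ == "__main__":
    for label, counts in YEARS.items():
        print(label, funnel(*counts))
    print("overall", totals(YEARS), funnel(*totals(YEARS)))
```

Running the script prints one line of rates per academic year plus an overall line, which makes it easy to compare the early and late years side by side.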

Is this a good outcome, an average outcome, or a bad one? It’s hard to say because not many other scholars share this kind of information. Maybe I’m foolish for doing so and I’ll delete it down the road. In the meantime, however, please drop me a line if you find this information helpful in any way. I realize my experience will be pretty much impossible to replicate, since it was thrown into total chaos by the pandemic, but I also know that when I was facing down the academic job market I would have been eager to read this kind of reflection from someone who had faced it before me. And by the way: if you notice any errors in my math, just blame it on the fact that I did all the calculations while I was waiting for my plane to Charlottesville for Rare Book School last month! I’m actually great at arithmetic, I promise.

James Hodges
Folding Room Divider report (22-23 Teaching Recap)

In my last post I reviewed some recent updates from my research, but of course my job also includes another major component: teaching. I think it’s worth reviewing my experiences here for two main reasons. First, I use writing as a memory aid for things I’d like to remind myself later on. Second, I hope that other newly-hired or aspiring tenure-track faculty will be able to learn from my experiences, as well as any grad students trying to build their teaching practice.

But first, some disclaimers: I do all of my teaching 100% online, and largely asynchronously. My academic department does everything remotely, and has done so since 2009. This means that any observations or tips I share here will be indelibly skewed by this somewhat atypical arrangement. It also means, on a more philosophical level, that my entire experience of the world is now influenced by the fact that I spend all my working hours on the internet. But that’s a different conversation.

I started this job in August 2022, when I was living in Austin, Texas. I could have stayed in Austin if I wanted to, but I don’t have any roots out there and I don’t love the existential climate angst I feel when blasting A/C to cope with 110ºF heat, so we elected to move back to New York, which is where we started before my postdoc.

Making a major move right before you start teaching is rough. There’s no way around it. If you record videos or take Zoom meetings (which I assume most faculty do nowadays, even if your job is mostly IRL), one major problem presented by moving is the lack of a ‘professional’ background. This is doubly true if you live in a small space, like most NYC apartments, or most grad student dwellings. For the first semester, I tried to cultivate some semblance of an authentic professional-looking backdrop. In practice, this meant moving clutter out of frame each time I turned the webcam on. I also used virtual backgrounds on Zoom a fair amount, but they often glitch around the edges if you move suddenly, revealing whatever’s actually behind you. They also don’t really work outside of Zoom (e.g., when recording course tutorial videos). For the second semester, I bought a folding room divider, and it made my life much easier. Just put it up and go. I should have done it sooner.

As for the actual content of my courses, I was fortunate to begin my teaching with two sections of the same well-developed core curriculum course. When asked what I wanted to teach last year, this was my first request: please give me something that will run every semester and that has been developed by existing faculty. The course was slightly outside my usual teaching area, but still well within the realm of material I learned during grad school. Any rustiness in my subject area expertise was completely eclipsed by the benefits I secured by teaching a course that was already thoroughly road-tested. I also got to join a group of other faculty members teaching the course concurrently—built-in mentorship. This greatly reduced the challenge of finding my feet at a new institution.

Another thing that was new to me was the California State University system’s approach to teaching. During our training last summer, leadership emphasized flexibility and compassion. We had workshops about meeting the needs of working students, first-generation students, immigrant students, students in crisis, etc. The end result, for me, was a new outlook, less preoccupied with competitive positioning against other schools than the outlook fostered by the institutional cultures I’ve experienced elsewhere. Don’t get me wrong, I see the value in competitive drive for certain purposes. I’m a product of fifteen years at competitive research universities, after all, and I think I got a pretty good education. But they’re not for everyone, and I am finding that my students reflect my attitudes back at me. When I’m flexible with them, they’re flexible with me. My kindness in evaluations of their coursework has, thus far, thankfully, been reflected back in the kindness they show in my course evaluations.

As I get ready for the 23-24 academic year, I’ll be trying to put these principles into action with another course. This time, I’m teaching a syllabus of my own design. I’m glad I learned to color within the lines, so to speak, by picking a well-established course last year. I’m also glad that I don’t have to worry about a Zoom glitch revealing my messy apartment during a lecture. Check back in a few months to see how it all works out.

James Hodges
2022-23 Research Roundup (Dolphins, Pollution, Botnets, etc)

MUCEM (Museum of European and Mediterranean Civilisations), site of the 2023 RESAW conference in Marseille.

As I put the finishing touches on my Fall 2023 syllabi, it finally feels safe to say that my first year as Assistant Professor at San José State is over. That means now is the time to put together a recap, before I forget all the details. This post will recap some recent research developments, and my next post will focus on teaching.

Just a few weeks ago I attended my first international conference since 2018: European Research infrastructure for the Study of Archived Web Materials (RESAW) in Marseille, France. I presented a paper using forensic analysis of physician/dolphin scientist/float tank inventor/psychonaut John C. Lilly’s early website to work up a sort of social graph for the counterculture/computing nexus circa 1999. Although I began this work a while ago, it feels different presenting it now, since I work for a Silicon Valley institution. This work should be appearing in an edited collection eventually, but these things happen slowly.

In other counterculture/computing news from the 22/23 academic year, I used my startup funds to hire an excellent research assistant, who has helped speed the Psychedelic Software manuscript along.

Balcony at MUCEM, Marseille FR

I’ve been impressed with my students at SJSU thus far, and I’ve already started a new collaborative project with one of them; we’re looking at the data curation practices of coastal water quality testing initiatives, with an eye toward proposing potential improvements. Our first conference paper in this stream has already been accepted as part of a panel at the Society for Social Studies of Science meeting in Honolulu this fall, which seems like a very appropriate place to discuss the health of coastal waterways.

This new ocean research complements my work on dolphin-lover John C. Lilly as well, which makes me want to think more deeply about connections between the internet and the ocean. It’s a theme that I first started investigating with an essay on Ecco the Dolphin in 2017, and I feel fortunate that I’ve had the intellectual space to slowly develop this sub-oeuvre within my broader research.

In other research news, my collaborator Mitch Chaiet recently used the BRISQUEt image compression analysis software that we worked on in 2021 to identify a Moldovan botnet. It’s always nice to see the tools and methods you worked on staying relevant!

View from the conference venue, RESAW 2023

Taken together, these developments show a continued shift in my research as I become more interested in social engagement. When I started the Psychedelic Software research for my dissertation way back in grad school, it was a way for me to grieve my recently deceased father. He was part of the original counterculture-to-computing pipeline in the 70s and 80s, and working on that topic made me feel a bit like I was still in conversation with him. His old office in New Brunswick, NJ was just a few miles from my own office at Rutgers. The computing history was also personal history.

Now, several years have passed and I still find that material as interesting as ever, but the state of our society and our planet seems to have deteriorated significantly. To answer these negative developments, I’m now applying the same digital forensics frameworks, but I’m pointing them at new problems, all within the broader context of capturing ephemeral digital culture for future analysis. There are a few areas where they intersect, like coastal water quality data curation. I look forward to seeing where this research takes me next.

James Hodges
Preserving and Analyzing Digital Texts: Symposium Recap (with Video)

I recently hosted an online symposium with support from the Andrew W. Mellon Society of Fellows in Critical Bibliography, in which we shared research concerning the preservation and analysis of digital texts.

Speakers included RBS-SoFCB Senior Fellow Ryan Cordell (University of Illinois Urbana-Champaign), Emily Maemura (University of Illinois Urbana-Champaign), and Benjamin Lee (University of Washington).

Maemura began the session with a project examining preservation standards for web archives, highlighting divergence between researchers’ practices and desired materials, on one hand, and the WARC standard for web archives, on the other.

Cordell followed with a discussion of Large Language Models, the textual building blocks of contemporary Artificial Intelligence systems. He also highlighted the current lack of transparency in terms of textual sourcing for AI training datasets and discussed the implications for historical research.

Lee presented third, reviewing his recent work creating searchable databases of imagery from historical newspapers. In the talk, Lee showed off new tools that he has recently developed in collaboration with the Library of Congress, which can be used by researchers today.

Taken together, these three presentations did an excellent job of representing current research in the area of digital text technologies. They also signal a way to apply bibliographic approaches in digital settings.

More than 80 participants logged in to attend the session. This level of attendance would likely not have been possible with an in-person format, and I was very pleased to bring together such a global audience for the speakers’ research.

James Hodges
PhD Writing Seminar Recap: Maintaining a Scholarly Voice

Recently I was invited back to my PhD alma mater to speak with a group of graduate students in an academic writing workshop. This blog post is a condensed summary and reflection on the content I covered with those students.

My appearance in the class was based on the following question, posed on behalf of PhD students who are anxious to get a few early publications under their belts:

“How do you maintain your voice when incorporating feedback from peer reviewers?”

Upon giving the question some thought, I decided I would like to sidestep the question slightly by differentiating between one’s voice and one’s ideas.

Your voice is yours and yours alone. You can’t possibly speak with anything but your voice. This is extremely true for literal speech, with the timbre of your voice determined by anatomical features and life history, but it’s somewhat true for written words as well. Research in computational stylometry shows that there are elements of our prose, like the frequency of word usage and prominence of certain two- or three-word clusters, that are indicative of a personal style, and rather difficult to fake. While many peer reviewers do make comments about our tone, word choice, and the like, they’re not usually talking about the fundamental distribution of words that defines our unique textual fingerprints.
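As a rough illustration of the kinds of features mentioned here, the sketch below counts word frequencies and two- or three-word clusters in a text sample. It is a toy example under simplifying assumptions (a naive regular-expression tokenizer and hypothetical function names), not a real stylometric attribution method.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (naive on purpose)."""
    return re.findall(r"[a-z']+", text.lower())


def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens (n=2 for pairs, n=3 for triples)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def style_profile(text, top=5):
    """Collect the most frequent words, word pairs, and word triples in a text."""
    tokens = tokenize(text)
    return {
        "words": Counter(tokens).most_common(top),
        "bigrams": ngram_counts(tokens, 2).most_common(top),
        "trigrams": ngram_counts(tokens, 3).most_common(top),
    }


if __name__ == "__main__":
    sample = "The cat sat on the mat. The cat slept."
    for feature, counts in style_profile(sample).items():
        print(feature, counts)
```

Comparing profiles like these across two long samples by the same writer, and then across samples by different writers, gives an informal feel for why such distributions are hard to fake.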

Ideas, on the other hand, are a bit more malleable. At some point in our lives many of us have probably taken a devil’s-advocate position on an issue that we don’t actually believe, just for the sake of discussion (or trolling). To me, the things to be most carefully attended to during the peer review process are not one’s voice, but rather one’s ideas.

Our ideas are precious, but they should not be completely exempt from reconsideration. If you care enough about an idea to submit it for peer review, one could assume that you care about it enough to reconsider it when faced with critique. That’s part of the whole academic enterprise, and if you didn’t want to participate in an open dialogue of ideas, you would probably just churn out polemics for an echo chamber audience on Substack.

There are times when a reviewer proposes meaningful critiques of your ideas, which have the potential to improve your work. These should be embraced. For example, when I got peer review feedback on early drafts of my forthcoming article on the information practices of conspiracy movement participants like QAnon (more info here), the reviewers encouraged me to make a clearer moral critique of some participants’ extreme behaviors, including such crimes as kidnapping and murder. I knew on a personal level that these actions were well-documented and totally abhorrent, but in the interest of maintaining an impartial academic voice, I hadn’t actually included much discussion about them. A good peer reviewer can prompt us to reconsider our work in productive ways. In this case, that meant setting aside my detached perspective and addressing really ugly subject matter.

On the other side of the coin, there are times when a reviewer poses a challenge to your ideas that doesn’t easily fold into the project you had in mind. If I get feedback like this, I assume that the journal simply isn’t a good fit. If my ideas are worth pursuing, I generally find that someone will be receptive. My recent Journal of Documentation article on preserving algorithms (cowritten with Ciaran Trace) is an example of this. It was initially submitted to a different journal, but the reviewers there didn’t find the ideas to be relevant to their editorial vision. So we moved along and found a more fitting venue.

What about when the reviewer does seem to be criticizing your voice? Sometimes these critiques are just petty, and they are used to cover up the reviewer’s real agenda, which is gatekeeping publication in a particular niche. But it’s tough to know unless you at least try to take their criticism seriously. In cases like this, I would recommend trying to have a colleague read your manuscript. Ideally, this would be a colleague whose writing you admire. Hopefully, they can be honest with you. Is the style getting in the way of the ideas? Are there prose-writing practices that could improve readability? Are the ideas coming through? Sometimes, a few mechanical tweaks can improve readability immensely. Other times, what seems to be a problem with your voice is actually a problem with your ideas. Muddled prose often masks muddled ideas.

As academics, our ideas are our main form of currency. We are not, on the whole, renowned for our gorgeous prose. This means that every sentence should serve a purpose. In our case as academic writers, our writing should prove a point. By focusing on the ideas, and reminding myself that some element of my “voice” will always shine through regardless, I find the review process becomes a lot less intimidating.

James Hodges
Computational vs. Social Preservation: What do Algorithms Require?

Note: This essay was originally posted on the SJSU iSchool’s Center for Information Research and Innovation (CIRI) blog. You can find the original post here.

The digital technologies that we use every day are controlled by increasingly complex algorithmic systems. These run the gamut from the very banal (say, Netflix recommendations), to the very consequential (say, sentencing recommendations in the criminal justice system). As these systems become more widespread and impactful, there are more and more reasons that we may want to preserve them and refer to them later. Perhaps you’re a software developer and you want to look back at how your product has changed over the years. Maybe you’re a politician or a lawyer and you would like to prove that an algorithmic system treated somebody unfairly. In these kinds of cases, it becomes imperative that algorithmic systems are preserved in a way that opens them up to future analysis.

Last summer, I began working with my colleague Ciaran Trace (University of Texas at Austin) to review the existing projects dedicated to preserving algorithmic systems. We both knew that several such projects already existed, but we were also struck by the fact that they didn’t seem to share any sort of unified approach. This began a research project that has produced a series of publications aimed at helping to cut through the current mass of disparate initiatives. The first publication from this project, “Preserving algorithmic systems: a synthesis of overlapping approaches, materialities and contexts,” was recently published in Journal of Documentation.

In our new article, we review existing preservation projects to see how they differ and what they have in common. One of our main findings is that existing algorithm preservation projects can be placed into two main categories: those that view algorithmic systems primarily in terms of their computational functionality, and those that view such systems primarily through the lens of their social impacts. This difference in emphasis can have a major effect on the kinds of artifacts that an initiative actually collects.

On the computational side, preservationists often collect technical artifacts like source code and compiled binary code—but of course, these objects are only meaningful to people with the technical literacies to understand them. On the social side of things, many preservation initiatives focus on artifacts like oral histories and user-generated content. Unfortunately, these objects can only begin to explain the technical functionality of an algorithm, and often are unable to do so with any degree of specificity. In order to try and bridge the gap between these two approaches, we provide a large selection of examples from projects at many locations on the spectrum between being totally technical on one end, and totally social on the other. Ideally, any algorithm preservation project will include both the complex technical objects related to specific computational functionality as well as the more easy-to-understand objects that help to explain an algorithm’s place in the world.

Our hope is that by centralizing a selection of examples in this recent article, we can reduce the amount of legwork that preservationists have to perform in the future when they’re looking for a model to follow for their own projects. If you’d like to learn more about these topics, please read the full article in Journal of Documentation, or consider taking my INFO 256 – Archives and Manuscripts course, where I will discuss these topics in much greater depth.

James Hodges
Teaching: Part of A Balanced Intellectual Diet

Fall 2022 wasn’t just the beginning of my new job at SJSU. It was also my return to teaching after two years doing research full-time during my postdoc at UT Austin. I often joked when I was at UT that doing research full time felt a bit like eating an entire meal of desserts. It’s fun, it’s a privilege, and it’s tasty, but it’s something most people don’t keep up forever. To extend that comparison a bit further, teaching feels a bit like eating one’s vegetables. It may not offer quite the same sweet thrill, but I truly believe that teaching is good for us, intellectually speaking.

Why is teaching good for us? For one thing, it gives us a direct line of contact to students working in the field. They have jobs and internships outside of our home departments and institutions. Since they’re ostensibly seeking training in Library and Information Science, this also means they’re closer to the job market. All this means that students are directly exposed to changing trends in the tools, techniques, and economics of information technology; this is knowledge that they bring into the classroom and share with us every day.

During my postdoc, I started a handful of new research projects that were responsive to emergent information issues like mis- and disinformation, AI ethics, and support for overburdened healthcare institutions. I’m still using the forensic approaches to studying digital materials that I honed during grad school, and I’m still working with archival collections of digital objects, but my focus has expanded beyond the twentieth-century history of computing context that defined my grad school research agenda. It felt like a bit of a pivot at the time, but I was committed to using my position and my expertise to face some of the daunting technological challenges that rose up like tsunamis in the past five years.

Now that I’m teaching students who work (or aspire to work) in fields like information ethics and health informatics, my new socially engaged projects are paying dividends, because they put me closer to the interests and needs of many students in the LIS field. From a more personal standpoint, they’ve also created a path for matching my research with the interests of my students. This means that my research benefits from what my students bring to the classroom when they talk about their experiences on the job.

This academic year, I’m teaching SJSU’s required “Information Retrieval Systems Design” class, which represents another productive expansion on my previous work. The course covers topics like database design, data structures, and controlled vocabularies, and it’s structured around a series of projects that generate deliverables like a working searchable database. I learned information retrieval in grad school, but it hasn’t necessarily been a focus in my research. Yet I’ve spent countless hours in extracurricular training sessions keeping technical skills like these up-to-date over the years, and it feels great to work some of that into my teaching. It keeps my skills sharp, even if they aren’t currently finding their way into my research, which means a lot less re-learning when I do call on those skills (like when I was invited to write a chapter on working with web crawlers over the summer, for example).

Another nice thing about my current teaching work is that it has room for customization. So while most of the course covers well-defined topics, there was also room for me to add a unit on archives (my specialty). Framing archives in terms of information retrieval systems is intellectually productive, even for students who don’t plan to work in the archives space, because it gives us a chance to think outside the computing box. In other words, it offers an entry point to thinking about information systems beyond the digital, including card catalogs, finding aids, museums, and more. This all feeds back to my enduring core research interest (and book manuscript) concerning 20th century computing, which was characterized by the rapid iteration of new conceptual models, often pulled from surprising places like new age psychology. That is something I have covered extensively in my research to date, and I plan to continue investigating it for a while to come.

Much like the elements of a balanced diet, teaching and research work together to create a holistic intellectual practice. My research informs my teaching, and my teaching informs my research. Students bring new dishes to the table every semester, so to speak, and it’s always exciting to see what we can cook up with the ingredients at hand.

James Hodges
Hello World

Welcome to my new blog. I just finished my first semester as a tenure-track Assistant Professor at the San José State University School of Information, and having a new job feels like an appropriate occasion to begin experimenting with a new form of scholarly communication. I played around with Substack last year, but all the ownership shakeups at Twitter recently have driven home the value of hosting my own content. I’ll still post on Twitter and Substack, at least for the near-term future, but I’ve never been one to put all my eggs in one basket.

The aforementioned Substack went pretty dormant following my offer from San José State last spring, because I became extremely busy wrapping up projects at UT Austin, as well as planning for my imminent cross-country move (the third major move in three years, as a matter of fact). Rest assured, dear reader: I’ve been plenty busy in the past 6 months, even if my Substack wasn’t getting updates. I’d like to write a post sharing what I learned during my job application, interview, and relocation processes eventually, because I think it could help other recent grads and postdocs. Stay tuned for that.

In the research department, I’m still finishing up a variety of projects that started while I was in my postdoc at UT. After finishing my dissertation at the height of early COVID, it seemed important that I use my post-Ph.D. years to engage with emerging needs in areas that were hit hard by recent political upheavals, including healthcare informatics, algorithmic fairness, and mis- and disinformation. Digital preservation and computer history are still my main concerns, but I’ve found ways to channel my knowledge about those topics into places where they can help meet present and emerging challenges.

In the algorithmic fairness area, UT Professor Ciaran Trace and I recently received a faculty fellowship from the Good Systems initiative that supported our development of two journal articles and a book chapter, all under review at the moment. I’ll share more details once the review dust has settled—I don’t want to say too much just yet.

I also have a paper that will be out in Information & Culture early next semester, which uses web archives to examine the information practices of early participants in the QAnon conspiracy movement. I’m prepared for this paper to potentially cause a small amount of controversy, because it offers conclusions that are unpopular among many in the disinformation research space: I show that conspiracy theories like QAnon, while outlandish and untrue in many regards, are still constructed via interaction with authentic sources. In other words, these conspiracies aren’t just invented from whole cloth. Instead, they emerge from idiosyncratic, if sometimes willfully false, interpretations of legitimate information. No full paper yet, but I covered a bit more about this project when I first announced it on Substack last November. Stay tuned.

Another few interesting research updates come in the form of invited talks and chapters. My digital forensics research got me invited to contribute to a methods volume at SAGE last spring, and I now have a chapter on using web crawlers to create a research corpus available in their Doing Research Online series. It felt nice to write something so clearly practical for a change, and I hope that some advanced undergraduates and early-stage grad students find it helpful. It might be useful for more advanced scholars who simply want to learn a new tool as well.

I was also invited last spring to speak at the Reconsidering John C. Lilly symposium, which allowed me to revisit some of the web archive analysis I did during my dissertation. What I did there was essentially peel apart the HTML code and ICANN records from a cluster of websites made by Lilly and his contemporaries in the 1990s psychedelic counterculture, revealing some of the contours that connect psychedelia and hippie culture with big digital technology businesses. That project is in the process of becoming a book chapter, and I’ll certainly have more to say about it soon.

Speaking of the future, I’m thrilled to be a part of the San José State community, because it offers me a deeper foothold in the Bay Area and Silicon Valley from which I can launch future research projects. I’m eager to build deeper connections with people at the local institutions that have helped me write my dissertation and test out ideas over the past several years, ranging from the Computer History Museum to the archives at Stanford. I work remotely from New York City most of the time, but I’ve already made a few research trips to the Bay and I’m always planning for more. If you’re in the area and you’d like to chat about possible synergies, please get in touch! Same goes for those of you reading from the New York metro area. Since I generally work remotely, I’m eager to build a bit of extra-institutional collegiality in my home city. And of course, working remotely also reminds me every day that I’m a citizen of the Internet. If you need help getting access to any of the articles I linked, please don’t hesitate to reach out.

James Hodges