Technical History

The PGP consists of four interlinked types of data:

  • Metadata records (including shelfmarks and descriptions)
  • Digitized, searchable transcriptions of geniza texts
  • Scans of index cards, transcriptions and translations from Goitein’s unpublished research materials, as well as some scans of published translations
  • Images of fragments, or links to images

Keeping the metadata accurate and well-structured is a labor-intensive team effort. Our team members also produce new descriptions and transcriptions of material, as do Princeton undergraduates and graduate students in classes with Eve Krakowski and Marina Rustow.

In 2020, at the behest of the Center for Digital Humanities, the PGL started deploying project managers (PMs) to track the moving parts of the enterprise. The PMs have been transformative not just for organizing our work, but also for creating a sense of camaraderie.

The team communicates via Slack and tracks tasks in Asana. Some of the team members meet at least twice weekly. Written agendas help us prepare for meetings and stay focused while they're happening, and we use icebreaker questions to help keep the fun quotient high. One of the animating spirits behind the way we now work was Rebecca Munson (1984–2021), Assistant Director for Interdisciplinary Education at the CDH.

The PGP is a digital humanities project with an unusually long history. The original PGP browser (PGP 1.0) was developed by Peter Batke at Princeton in the late 1990s, replacing a local terminal and floppy disks. PGP 2.0 was based on the TextGarden web application developed in 2005 by Rafael Alvarado, at the time Manager of Humanities Computing Research Applications at Princeton. The current browser (PGP 3.0 and updates) was developed and is managed by Ben Johnston, Educational Technology Consultant at Princeton’s McGraw Center for Teaching and Learning. The transcriptions are lightly encoded in TEI (Text Encoding Initiative) markup and stored in Bitbucket; the texts and browser website are managed in Drupal. Changes to the transcriptions and metadata are pushed out to the PGP site every 24 hours.

Between 2016 and 2021, the PGP team used Google Sheets to work with the metadata (descriptions, tags, and the like). In mid-2021, we moved our metadata to a bespoke database that we spent a year developing as part of a research partnership with the CDH in which Rebecca Sutton Koeser and Marina Rustow serve as PIs. Designing the database helped the PGP team reshape our entire back-end workflow, and forced us to turn our attention to nooks and crannies of our data that we'd ignored or hoped would improve on their own. We are currently working with the CDH on designing and developing a bespoke front end and a new system and workflow for managing and displaying transcriptions.