Can the Digital Humanities Make Us Better Humanists? A Case Study in Papyrology

There is a humorous scene in the 2006 film The Devil Wears Prada in which the fashion magazine executive Miranda Priestly (Meryl Streep) verbally eviscerates her new administrative assistant, a recent college graduate in journalism named Andy (Anne Hathaway). [1] What prompts this takedown is Andy’s cynical laugh when a staff member suggests that what look to Andy (and probably to most of us) like two very similar blue belts are in fact two completely different blue belts. Her laughter causes Miranda to lash out and critique Andy’s own wardrobe, in particular the blue sweater she is wearing. Miranda points out that the sweater is not actually blue, but cerulean, and she details events in the fashion industry that saw the color cerulean find its way into the clothing lines of large, mainstream retailers such as the one Andy likely bought her sweater from. Miranda’s point is that Andy mocks the very industry she ultimately and unwittingly depends on for her personal attire. The exchange illustrates well the common disconnect between innovation and consumption, a theme that I return to later in this article.
What I have to say here stems from recent reflections on the purpose of the Digital Humanities for the broader humanities community, with my examples drawn from a very specific and relatively narrow discipline, papyrology. The central question I pose is not, “What is the point of Digital Humanities for papyrology?” but rather, “What have papyrologists gained in scientific terms from Digital Humanities and what kind of relationship can they have with the Digital Humanities in the future?” This kind of question is one people are increasingly asking in some form about their own disciplines. [2] Whatever the reasons others have for reflecting on the subject, my thoughts have been prompted mainly by a sense of stagnation, or the nagging suspicion that we, the scholars-cum-consumers, to return to the opening fashion metaphor, are not thinking enough about what we are wearing and why we “bought” it in the first place.

What have our tools done for us?

Papyrologists have some very good tools. In fact, our digital instruments have transformed how we conduct both scholarship and teaching, even how we conceive of our discipline. Our field could not survive without them. At the same time, over the past several decades our relationship to them has changed in a crucial way, so it is not always clear what role they play, whether a secondary or primary one. Are they simply a convenient support of traditional scholarly method, a useful Hilfsmittel, cared for in distant research centers or do they drive and define new research and require our collective participation? This emerging and sometimes awkward uncertainty has surfaced a need, in my view, for reappraisal of how we engage with digital tools in the future.
In considering the question of how our tools have benefited us, I have tried to articulate specific ways in which they have improved our discipline, not only by making some tasks easier, such as accessing primary source material and secondary scholarship, but also by serving the more creative process of advancing research and answering scholarly questions. During this thought process I was assisted by a lecture given by John Unsworth at a symposium on humanities computing in London over twenty years ago. [3] In it, he enumerated seven so-called “primitives,” on analogy with the Aristotelian concept of first principles (archai): “discovering,” “annotating,” “comparing,” “referring,” “sampling,” “illustrating,” and “representing.” They constituted, in his words, “basic functions common to scholarly activity across disciplines, over time, and independent of theoretical orientation.” Unsworth’s overall purpose in enumerating them was to imagine a common architecture in which these functions could be operative across networked data.
I do not mean to dwell too much on the extent to which each of the primitives is involved in our work. Instead, I will look at how some of them, especially discovery and comparison, factor into papyrological method and are supported by our tools. In addition to these functions, I will adduce two further concepts relevant to this discussion. The first is that of accountability, which is at the heart of scholarly research and a necessary prerequisite for good exegesis. The other is that of the humanities as a forum not only for critical thought but also for specialized data curation.
Let me begin by surveying a selection of digital tools in light of Unsworth’s basic functions. I have divided these tools into three types: text-centric, metadata-centric, and image-centric. They are not equally sophisticated, and none is perfect. Furthermore, my sample is not meant to be exhaustive; on the contrary, for the sake of brevity it omits a number of very good initiatives. [4]

Text-Centric Tools

I will start with the text-centric tools. The most important papyrological text resource is the Duke Databank of Documentary Papyri (DDbDP), which began in the 1980s and currently contains over 65,000 transcriptions of mainly Greek but also Latin, Coptic, and a few Arabic texts. [5] It is the Thesaurus Linguae Graecae (TLG) of published papyrological documents, and by “documents” I mean everyday texts, such as petitions, contracts, receipts, private letters, etc., anything but literary and so-called subliterary texts. [6] The DDbDP is a relatively unfiltered corpus with much to offer not only the traditional papyrologist interested in Greco-Roman history (be it political, military, legal, religious, social, etc.), but also Greek philologists and linguists who care about the development of the language outside literary sources preserved in medieval manuscripts. Considered in Unsworth’s terms, the Duke Databank enables more than anything the functions of discovery and comparison, quite often discovery through comparison. Any given search across the databank has the potential to unite the trajectories of two acts: the original entry of the data and the quest of the researcher. The result may be the discovery of information not previously known to the user, but known to others, or the acquisition of new information. Acquisition of new information can take the form of the discovery of new evidence (a hitherto unrecognized fragment of a Roman will, for example), or the establishment of parallels and associations that enable the unraveling of a previously unsolved textual problem, or even the physical joining of two pieces of the same text (not infrequently a philological act confirmed by visual comparison of photos) (Figure 1).

Figure 1. The left part of this papyrus is housed at Columbia University in New York; the right part is in Milan.
The discovery of previously unseen connections or patterns is part of the slow process by which papyrologists gain control of their data. The examples I have referred to may sound trivial, but they are what advance our scholarship, and the process has been facilitated again and again by the Duke Databank. As a result, we have gained a more refined understanding of the textual source material, which serves in turn as the basic foundation of later interpretive exercises, in, for example, legal, historical, social, and economic studies.

The importance of the refining benefit of the Databank cannot be overstated, and I wish to linger on it for a moment. The renowned papyrologist Herbert Youtie wrote in 1963 that the papyrologist “knows that if he could guarantee the perfection of his transcriptions, he could hope to be forgiven even the total omission of all the rest,” meaning the commentary, general summary, etc. [7] What is implicit in Youtie’s statement is the fact that in a majority of cases, the papyrologist, no matter how good he or she is, cannot totally ensure the perfection of his or her transcription. Papyrus documents are fragmentary and lacunose, and the script can be highly cursive and therefore difficult to decipher. Things like orthography and syntax are usually below classical standards, sometimes far below, so it can be hard to understand what a text means. Throw into the mix the absence of context, poor word choice, and the occasional hapax legomenon, and one can appreciate the difficulty that goes into deciphering a papyrus document. Achieving full comprehension of any given witness can be a continual process involving more than one scholar over a long period of time. And many texts are never fully understood (Figure 2).

Figure 2. An enigmatic private letter apparently referring to a practice of exposing newborn girls; extensive bibliography available at;4;744.
The instinct to ensure the precision of the transcription has been an important part of the papyrologist’s classical, text-critical heritage, and the Duke Databank of Documentary Papyri has been the perfect enabler of this instinct. Easy access to parallels, which have always been a crucial tool in the classicist’s kit, could unlock a compelling reading in a lacunose witness. More than all other resources, the Databank has led to an explosion of editorial emendations to Greek documents. It has made it much easier to solve textual problems, many of which were first recognized as problems only against the background of the assembled parallels—a meeting of the two trajectories I mentioned earlier. Dieter Hagedorn, who championed the genre of emending Greek documents via his critical Bemerkungen zu Urkunden, has shown over the years how careful analysis of patterns observed in large data sets, first via the original Packard Humanities Institute CD-ROM, then with help of the Perseus instance of the Databank, and now through, can elucidate ambiguous readings and obscure concepts in individual Greek papyri. [8] These tools helped Hagedorn assure to the extent possible the precision of the transcription, to restate Youtie. And as precise as they are, Hagedorn’s Bemerkungen are never tedious or pedantic. His type of scholarly method can, however, in the hands of novices make for banal and downright wrong observations about ancient documents, if, for example, beginners look only for parallel expressions without trying really to understand the documents at hand.
In the 1990s, the DDbDP stopped accruing new transcriptions. The reasons for this were both technical and financial, and the result was that the pool of sources available for the basic functions of comparison, discovery, etc. grew static. A gap emerged between the number of Greek texts available in print and those in electronic format, and papyrologists became anxious. Editors who had come to rely on the Databank for easy access to parallels found their entire modus operandi in jeopardy. Among other things, the tool had transformed the discipline by allowing people to ask questions of the texts that could not be asked before, due to the time and labor needed to gather the data. [9] When the Databank stalled, papyrologists risked regressing to an earlier state. Imagine if Google no longer existed—the existential threat to papyrology was similar to this admittedly bigger threat. The Duke Databank had come to serve some basic need.

Metadata-Centric Tools

Metadata-centric tools developed along a parallel track to the Duke Databank. The Heidelberger Gesamtverzeichnis (HGV), which was started in the late 1980s, and the more recent Trismegistos (TM) Texts, are two important examples of this genre. [10] They might be imagined as the curricula vitae of papyrological manuscripts (Figure 3).

Figure 3. HGV and TM metadata concerning the famous “Letter of Claudius,”;6;1912.
These tools perform the basic function of referring, by pointing to principal scholarship about a given object and to related artifacts. If documents were people, the two repositories would perhaps be a cross between the local public records office and the National Security Agency (NSA). They identify the objects (you might call the TM id a textual artifact’s Social Security Number). They say when and where the documents were born, where they resided, and which scholars had contact with them. Because this information is often in flux, it is in constant need of curation, just as the transcriptions of the texts are. Recent discussions of the importance of humanities disciplines have emphasized their role in the curatorial process. [11] Whereas, in the past, humanities researchers were thought of as critics who explained their subjects (often in dogmatic fashion), they are increasingly seen as mediating agents who curate their subject matter and put people in touch with it. HGV and Trismegistos play an important part in curating metadata associated with papyrological artifacts, just as the DDbDP does for texts. In this way they too facilitate discovery, comparison, and other basic functions that give birth to higher order insights.

Image-Centric Tools

The third type of tool that has proved essential to our discipline is image-centric, and it has been the least exploited. Universities and museums have over the past two decades published thousands of photos online via a number of collection-based projects. Besides allowing papyrologists to verify readings or confirm physical connections between texts, which used to require significant time and cost, these projects have opened up new avenues for performing the same functions of discovery and comparison that text initiatives such as the Duke Databank have fostered. They have permitted researchers to start treating images like texts that can be mined and sorted. Sorting and comparing has been the goal of PapPal, for example, which gathers paleographical samples of dated documents (Figure 4). [12]

Figure 4. Screenshot of documentary hands from the years 300–301 CE.
One can imagine a future in which papyrologists start annotating images as well, thereby introducing more social-based engagement with the scripts. But there is even more potential here. Handwriting (like genealogies) is a gateway to identity. A hand has unique features, similar to a fingerprint, even if it is more susceptible to false interpretation. This “uniqueness” is more obvious with cursive scripts than with bookhands, although some cursive hands can also be confused. Image banks are just waiting for processes that will allow us, with a fair degree of certainty in many cases and on a much larger scale than by traditional methods, to say that certain groups of texts or certain passages within multi-authored works were penned by a single individual whose name in some instances will be known to us. This has the potential for important insights into areas of social and cultural history, administrative procedure, and the production of writing. It might also help establish the origin of previously unprovenanced texts. Perhaps more importantly, however, the creation of central image repositories will serve a long-term preservation need, which is especially urgent when the actual artifacts are located in museums, libraries, and storage facilities that are hard to gain access to, or when there is a chance that items might end up in private hands and no longer be available to the public. Thus, in addition to the primary functions they serve, our initiatives contribute in significant ways to the sustainability of our discipline by ensuring access to evidence for future generations.

From Scholarly Aid to Curatorial Environment

Having spoken about some of the primary functions that papyrology supports via its various digital resources, I wish to turn my attention now to the shift that occurred recently in how we engage with these tools. [13]
Resources such as the Duke Databank and HGV have traditionally acted as Hilfsmittel, or reference material, and have not been subject to disciplinary standards. They have supported our research, and for that reason could be forgiven typos, omissions, and other inaccuracies. The information that they delivered would never have been mistaken for scholarship (god forbid!). In fact, as students we learned always to check the printed edition when working on our documents and not rely solely on any electronic resource.
Our attitude towards electronic tools has changed. When Josh Sosin at Duke University and colleagues launched in 2010 with the intention of catching up on the backlog of printed texts that had not been digitized (see above) and of realizing a more sustainable data-entry model, the team set up a peer-sourcing mechanism. [14] This is not to be confused with a crowd-sourcing model, for the simple reason that peer-sourcing requires certain skills that most humans do not have, such as at least some knowledge of ancient Greek. Skilled volunteers would enter texts, mainly those that had been published previously in print form, and papyrologists would vet them. That was the idea, anyway. The reality has been slightly different: volunteers have shown up and done an extraordinary job entering new texts that previously appeared mainly in printed editions of papyri, but the specialists who were supposed to vet these entries have stayed largely on the sidelines, a situation that persists to this day.
In addition to digitizing print editions, has also allowed papyrologists to perform digital scholarship, such as making emendations to already digitized texts. This is not to be confused with the simple correction of encoding errors; rather, it refers to scientific improvements to the transcriptions of the ancient texts themselves. Here we have observed, in a digital setting, some of the curatorial behaviors that used to take place in print, as we saw with Hagedorn’s Bemerkungen. We have also witnessed the publication of a small number of new, full-scale digital editions of texts via [15] The irony is that students are now told to check for the most up-to-date information about a papyrus text and not necessarily rely on the printed edition, the opposite of the advice that we got as students.

I will show you what I mean by a superior digital text with the example of an ostracon from the Red Sea harbor town of Berenike in Egypt’s Eastern Desert, which preserves a receipt for water delivery. It was originally published in the second volume of ostraca from Berenike (O.Berenike 2.226) under Miscellaneous, because the editors had not recognized it as a water receipt. After members of the Berenike project found dozens of similar texts in 2009, which Roger Bagnall and I published in the third volume of Berenike ostraca under the heading of Water Archive (O.Berenike 3.274–455), we realized that this earlier text was of a similar type. [16] Moreover, we could confirm this on the infrared photos. We were thus able to improve on the text of O.Berenike 2.226 enough that we created a substantially better digital edition. Because of our revisions, the text received an entirely new publication number, ddbdp;2016;2 (Figure 5). [17]

Figure 5. New edition of O.Berenike 2.226,;2016;2.
With increased scientific activity occurring online, there has been a greater tendency in recent years for the papyrological community to accept electronic texts uncritically, sometimes without any reference to traditional print scholarship pertaining to them. Many users have bestowed on online editions primary status. This has resulted in various kinds of redundancies. For example, emendation proposals that have already been registered in print elsewhere are being presented as new discoveries online. The reasons for this appear to be twofold. First, people have greater access to papyri because of digital media than they have the papyrological training to deal with them or the papyrological libraries to consult regarding them. This is not a bad thing. It has always been a stated aim of those involved in digital papyrology to improve access to the discipline’s core textual evidence, thereby dislodging the field from the grip of the privileged few who have easy access to the collections. The second reason is that people have lost a critical filter (and this is not in all cases bad, either) that might have previously led them to exhaust all resources before making a claim: online scholarship is often deemed more authoritative, perhaps because it is more recent. Similarly, because this data is perceived as current, the belief is that it must have undergone all necessary quality controls. This attitude ignores the fact that the user community of papyrologists is the only body capable of ensuring quality, and if it does not assume its responsibility, it cannot expect texts and metadata to reflect current scholarly opinion. Some data is indeed current, but much of it dates back years, even decades.
The trust people put in the resources is therefore misplaced, but we have reached this state of affairs for one simple reason: we do not want to use electronic data only to enhance traditional scholarly method by, for example, pointing us to printed editions that may be worth consulting; we want to conduct papyrological research on and with our electronic data.
Now, the papyrological purist might not be happy about this, and the journal editor who has to deal with flawed submissions that rely on sub-standard electronic texts will not be happy, either, but it is clear we will not prevent people from viewing online texts as primary editions. Our tools have created a practice that cannot be stopped unless we eliminate the tools, but then we will face an even greater crisis than the one of the late 1990s, which I described above. The better approach, in my view, is for us to engage the scholarly community more and help them become more responsible data consumers. There is no way we are going to clean our data up systematically. The amount of labor it would take to do this on a century’s worth of scholarship makes it an unrealistic and counterproductive goal. Rather, there has to be a selective and iterative process of refinement driven by a sense of individual accountability. As scholars, we should treat our common textual data as scientists do their lab samples, trying to ensure its general integrity, in the hope it can then be used to draw valid scientific conclusions.
Many papyrologists, however, still do not relate to their digital data this way. What we get as a result is a gap between those on the DH side of the spectrum who are pursuing innovation, creating powerful tools but not always with much concern for the users, and those on the traditional papyrological side who are just grabbing whatever is output, often complaining about the resource if it is cumbersome to use or the data is flawed. This is the gap between innovation and consumption that I mentioned earlier, which both sides can play a constructive role in bridging. As a community, papyrologists could stop waiting for the few to show basic accountability. Established scholars could participate in the curating process by, for instance, volunteering time to vet online submissions, in order to ensure the accuracy of the transcriptions and metadata. On the other side, the Digital Humanities can assist not only by creating tools to extract more data, or to arrange it in more interesting ways (we want these, too; don’t get me wrong), but also by helping design and implement the interfaces that will support the digital scholarship we are all migrating towards. We need platforms that better facilitate collaborative born-digital research and publishing. And critical to this is more attention to user experience, [18] which heavily depends on good design. We underestimate the degree to which design contributes to the production of knowledge. [19] I doubt that the scholarly community as a whole (again, those who have been slower to engage in digital curation) will take responsibility for their digital environments unless we give them resources with lower barriers to entry, that is, tools that make it easier for them to participate in all aspects of the curatorial process.
If we do manage to give them more user-friendly resources, we will put a larger constituency of the humanities more squarely behind the digital and this might make us all think more seriously about the digital sweaters we wear. We might all come out dressing just a little better.


