D-scribes Project and Beyond: Building a Virtual Research Environment for the Digital Palaeography of Ancient Greek and Coptic Papyri

Isabelle Marthot-Santaniello, University of Basel
Thanks to the dryness of the climate, texts written on papyrus leaves were preserved in Egypt and tens of thousands have reached us by chance today.* By far the most numerous are those written in ancient Greek, covering a millennium from Alexander the Great to the Islamic conquest. [1] They illustrate the many facets of the use of writing in antiquity, including copies of literary masterpieces, drafts of personal letters, tax receipts, and accounts of daily expenses. The papyri are therefore a key source for philologists and ancient historians, who have to face the challenge of handling a scattered body of sources. Because a large majority of papyri have come to us through the antiquities market in the late 19th and early 20th centuries, we lack sound information on the archaeological context of their discoveries. [2] Texts that must have been found together, or fragments that used to belong to the same document, are now divided among various collections around the world, and it is only with significant effort and sometimes a bit of chance that papyrologists manage to gather and reconstruct some of them. [3] The digital turn has started to improve the situation. Various initiatives have led to the production of efficient online resources that allow searching through metadata (e.g., dates, mentioned people and places) and transcriptions of papyrological texts. Online catalogues provide access to more and more images. Yet, on the specific topic of palaeography, a domain that traditionally contributes to reassembling split documents, proper tools that take advantage of recent advances in computational analysis are still lacking. The D-scribes project aims to fill this gap by taking a first step toward a Virtual Research Environment that can be integrated into existing resources and that is devoted to the digital palaeography of Greek and Coptic papyri.

1. Papyrological and Palaeographical Resources

To begin to understand this massive body of evidence, papyrologists were among the earliest in the field of classics to participate in the digital turn. [4] Nicola Reggiani has recently produced an impressive overview of “digital papyrology,” which is not only a descriptive catalogue of online resources but a theoretical, methodological, and epistemological analysis of this emerging discipline. [5] The various directions in which papyrologists have engaged with the digital revolution are scrutinized, from the production of bibliographical resources, metadata catalogues, and word indexes to text encoding, quantitative analysis, and papyrus imaging. If the majority of these resources first appeared as standalone products of defined projects, the current trend is to work toward their integration. Two main platforms illustrate this phenomenon: Trismegistos and papyri.info.

1.1 Trismegistos

Trismegistos (TM) [6] started in 2005–2006 as a database that enabled the searching of metadata on all the published papyrological texts from Graeco-Roman Egypt. [7] It has since grown into a set of interrelated databases including, among others, sections on people, places, archives, and linguistic features like text irregularities and formulae. This “ever-expanding” platform, to quote the home page, hosts online publications and tools like a date converter. It also offers a series of display options (e.g., original or semi-regularized transcription of the text, highlight of personal names) for a more flexible user experience. However, as Nicola Reggiani underlines, “the utmost relevance of TM in the scenario of Digital Papyrology” is the creation of unique numerical identifiers for texts, places, persons: “By assigning a ‘TM number’ to each document recorded, it easily overcomes the bibliographical inconsistencies…and fosters cross-platform compatibility and integration between different digital representations of ancient texts, settling a universal, uniform and truly (etymologically) ‘digital’ standard.” [8]
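The integrating role of the TM number can be illustrated with a minimal sketch in Python. It assumes two hypothetical catalogues that cite the same papyrus in different bibliographical styles; the class TextRecord, the function merge_by_tm, and the catalogue names are illustrative only and do not reproduce any Trismegistos interface. Only the identifier itself (TM 21375 for P.Mich. XIII 662) is taken from this article.

```python
from dataclasses import dataclass, field


@dataclass
class TextRecord:
    """One ancient text, keyed by its TM number (hypothetical structure)."""
    tm_number: int
    # Maps a catalogue name to the label that catalogue uses for the text.
    labels: dict = field(default_factory=dict)


def merge_by_tm(*sources):
    """Merge records from several catalogues, joining on the TM number.

    Each source is a pair (catalogue_name, [(tm_number, label), ...]).
    """
    merged = {}
    for catalogue_name, records in sources:
        for tm_number, label in records:
            record = merged.setdefault(tm_number, TextRecord(tm_number))
            record.labels[catalogue_name] = label
    return merged


# Two illustrative catalogues citing the same text in different styles.
catalogue_a = ("catalogue_a", [(21375, "P.Mich. XIII 662")])
catalogue_b = ("catalogue_b", [(21375, "P. Mich. 13.662")])
merged = merge_by_tm(catalogue_a, catalogue_b)
```

Because the join relies on the numerical identifier rather than on the citation string, the two divergent labels end up attached to a single record.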

1.2 Papyri.info

The papyri.info website, launched in 2007, has two primary components defined on the home page as follows: “The Papyrological Navigator (PN) supports searching, browsing, and aggregation of ancient papyrological documents and related materials; the Papyrological Editor (PE) enables multi-author, version controlled, peer reviewed scholarly curation of papyrological texts, translations, commentary, scholarly metadata, institutional catalog records, bibliography, and images.” The Papyrological Navigator has gathered transcriptions of nearly all the Greek documentary texts already published (over 75,000), while the encoding of Coptic texts is ongoing. Its equivalent for literary and paraliterary texts, the Digital Corpus of Literary Papyrology (DCLP), is still in production. [9] Papyri.info was built upon a combination of material inherited from different entities: the encoded transcriptions of papyrological texts from the Duke Databank of Documentary Papyri (DDbDP), descriptive metadata from the Heidelberger Gesamtverzeichnis der griechischen Papyrusurkunden Ägyptens (HGV), and additional metadata, along with translations and images, from the Advanced Papyrological Information System (APIS). The latter initiative, which began in 1997, aimed to “create a collections-based repository of information about and images of papyrological materials (e.g., papyri, ostraca, wooden tablets, etc.) located in collections around the world; it was envisaged as a first stage in creating a comprehensive papyrological working environment online.” [10] To reach this goal, common standards were established for the metadata structures and the image formats.
High-resolution images were stored for further purposes beyond mere online consultation, as underlined by Ast and Davis: “As part of a new initiative, APIS has taken the first steps towards creating a repository of archival TIFF images, in order to ensure the long-term preservation of the project’s digital content and to allow future experimentation in innovative image display techniques.” [11] Thus far, the latter intention has not been further developed. Images inherited from APIS are indeed visible in papyri.info in a viewer embedded within the page, next to the text transcription (Figure 1). However, whereas Ast and Davis mentioned that 15,000 images were available in APIS, only about 4,600 show up in papyri.info when one chooses the search option “Show only records with images from” and checks the box “Papyri.info.” [12] The platform is therefore still far from being a Virtual Library presenting reproductions of all the papyri. In the vast majority of cases, to know whether digital images of a given papyrus are available, one has to search the papyri.info metadata, which contain, as exhaustively as possible, links to images hosted externally in the catalogues of the owning institutions.
Figure 1. Example of an APIS notice in Papyri.info (captured from http://papyri.info/ddbdp/p.bacch;;15). Lund University Library (P. Lund. inv. 299).

1.3 Catalogues of collections

With the massive effort worldwide to digitize cultural heritage, access to many papyrus images is indeed now possible as part of the online catalogues of their owning institutions (libraries, archives, and museums or LAMs). Since the institutions have various goals and means, and papyri are often a minor subset of their collections, there is no harmonized standard in the way metadata on papyri are provided in these catalogues, and images vary greatly in format, resolution, scale, colour profile, etc. Following the path of APIS, there have been attempts to gather several collections into portals [13] like the Italian PSIonline, [14] the Spanish Ductus, [15] and the German Papyrus Portal. [16] In short, as an obvious starting step, efforts have been focused so far on providing access to digital images of papyri, but not yet on their searchability, i.e., using them as a primary material for new research. This is where digital or computational palaeography comes onto the scene.

1.4 Palaeographical catalogues

For the palaeography of Greek and Coptic papyri, no dedicated online resource yet offers a clear introduction and definitions of scripts comparable to what Timothy Janz of the Vatican Apostolic Library has done for manuscripts on the website Greek Paleography, From Antiquity to Renaissance. [17] So far, two catalogues give an idea of the variety of scripts encountered in the papyri: PapPal and CDDGB. Launched in 2013, Papyrology / Palaeography (PapPal) [18] provides examples of Greek and Latin documentary texts from the 3rd c. BCE to the 8th c. CE that contain explicit dating information, thus realizing the long-standing wish for such a repertory of images of dated papyri. [19] The Collaborative Database of Dateable Greek Bookhands (CDDGB), [20] built by Grant Edwards in 2019, collects Greek texts from the 1st to the 9th c. that are written in a literary script (excluding minuscule) [21] and for which indirect evidence (dated documentary text on the other side, archaeological context, belonging to a known archive) can establish a reasonable dating.
Palaeographical analyses are often undertaken to provide an approximate date for the artefact when other evidence is lacking. This approach relies on the fact that handwriting styles evolve through time. Therefore, comparison with dated or dateable texts showing similar features is in many cases the only way to propose a date. However, dating based on palaeographical assessments is often criticized because of its subjectivity. [22] Palaeography is also important for material reconstruction, that is, finding joins or hands in order to reunite fragments belonging to the same document, or texts written by the same person, or by persons trained in the same school, place, or period. These two dimensions of palaeography can greatly benefit from the recent advances in the domain of Computer Vision applied to Document Analysis and Recognition.

2. D-scribes project

The project “Reuniting fragments, identifying scribes and characterizing scripts: The Digital Palaeography of Greek and Coptic Papyri (D-scribes)” began in Basel in 2018, funded by the Swiss National Science Foundation as an Ambizione Grant. The project is a first step toward building a virtual library, incorporated within a Virtual Research Environment, with tools allowing palaeographic comparison that can provide rationalized criteria for palaeographical decisions. The project will run for four years, and the team comprises the Principal Investigator, a technical assistant, and three student assistants from the fields of Computer Science and Ancient Civilizations.

2.1 Three case-studies

As a pilot project, D-scribes focuses on three complementary case studies: papyri bearing the Iliad of Homer, the Dioscorus archive, which is the largest archive from the Byzantine period coming from a single village, and the Papas archive, which includes hundreds of Greek and Coptic papyri broken into fragments that were found in a jar.
Because of their number—more than 1,500 items—and their chronological distribution covering the entire papyrological millennium, the Iliad papyri offer the opportunity to study the evolution of ancient Greek handwritings. The challenge is to classify the various images according to similarities in the shape of scripts and in layout (height of lines and interlinear spaces).
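The layout criteria mentioned above (height of lines and interlinear spaces) can be approximated, on an already binarized image, with a horizontal projection profile. The following Python sketch is only an illustration under strong assumptions: the image is given as rows of 0/1 values with 1 for ink, the lines are perfectly horizontal, and ascenders or descenders never bridge two lines. The function names are hypothetical and the sketch is not the method used by the project.

```python
def row_runs(binary_rows):
    """Group consecutive pixel rows into ('text' | 'gap', height) runs.

    binary_rows is a list of rows, each a list of 0/1 values (1 = ink).
    A row counts as 'text' as soon as it contains at least one ink pixel.
    """
    runs = []
    for row in binary_rows:
        kind = "text" if any(row) else "gap"
        if runs and runs[-1][0] == kind:
            runs[-1] = (kind, runs[-1][1] + 1)
        else:
            runs.append((kind, 1))
    return runs


def layout_features(binary_rows):
    """Return line count, mean line height and mean interlinear space (in pixels)."""
    runs = row_runs(binary_rows)
    # Discard the top and bottom margins, which are not interlinear spaces.
    while runs and runs[0][0] == "gap":
        runs.pop(0)
    while runs and runs[-1][0] == "gap":
        runs.pop()
    text_heights = [height for kind, height in runs if kind == "text"]
    gap_heights = [height for kind, height in runs if kind == "gap"]

    def mean(values):
        return sum(values) / len(values) if values else 0.0

    return {
        "lines": len(text_heights),
        "mean_line_height": mean(text_heights),
        "mean_interlinear_space": mean(gap_heights),
    }
```

The values are expressed in pixels, so comparing papyri from different collections would additionally require a known scale for each image, which many catalogues do not provide.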
Seven hundred texts from the Dioscorus archive in Greek and Coptic are already published. They date from the 6th and early 7th centuries, providing a unique chance to encounter several samples penned by the same, explicitly identified individuals. All the images have already been collected by Jean-Luc Fournet in his Banque des images des papyrus de l’Aphrodité byzantine. [23]
The Papas archive, dating after the Arab conquest, is challenging because of its numerous fragments of small size and its limited number of already published texts. However, the jar in which it was found presents a rare case of close archaeological context and it is now stored in one single place, the French Institute for Oriental Archaeology (IFAO) in Cairo, which simplifies the access to the original documents and the collection of digital images. This collection is therefore promising material for the task of finding joins among fragments in a massive jigsaw puzzle.

2.2 Infrastructure

In order to coordinate the teamwork, D-scribes benefits from several resources offered by the University of Basel (Figure 2). Gitlab [24] is used for organizing the tasks, dealing with issues, and documenting processes so that knowledge is preserved when the team changes, which is unavoidable since student assistants join the project for a limited period of time. Sharing the data and software prototypes is made possible thanks to two repositories: a folder accessible only to the team members within the University network and a Switchdrive [25] repository to share specific folders with external collaborators. An SQL database has been constructed in the scope of the project. It gathers metadata on each papyrus of the three case studies, uses the IIIF [26] standard, and includes a Mirador [27] viewer to annotate papyrus images. Metadata include links to Papyri.info and Trismegistos (along with TM identifiers for texts and persons) to facilitate future integration.
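How such a metadata table might look can be sketched with Python's built-in sqlite3 module. The actual schema of the D-scribes database is not described in this article, so every table name, column name, and helper function below is a hypothetical illustration; the sketch only shows the principle of storing, for each papyrus, its case study, its TM text identifier, and links to external resources.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE papyrus (
    id                INTEGER PRIMARY KEY,
    case_study        TEXT NOT NULL
                      CHECK (case_study IN ('iliad', 'dioscorus', 'papas')),
    tm_text_id        INTEGER UNIQUE,  -- Trismegistos text identifier
    papyri_info_url   TEXT,            -- link to the record on papyri.info
    iiif_manifest_url TEXT             -- IIIF manifest opened by the viewer
);
"""


def open_demo_db():
    """Create an in-memory database holding the sketched schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_papyrus(conn, case_study, tm_text_id,
                papyri_info_url=None, iiif_manifest_url=None):
    """Insert one papyrus and return its row id."""
    cursor = conn.execute(
        "INSERT INTO papyrus "
        "(case_study, tm_text_id, papyri_info_url, iiif_manifest_url) "
        "VALUES (?, ?, ?, ?)",
        (case_study, tm_text_id, papyri_info_url, iiif_manifest_url),
    )
    return cursor.lastrowid


def find_by_tm(conn, tm_text_id):
    """Look a papyrus up by its TM text identifier; None if absent."""
    return conn.execute(
        "SELECT id, case_study FROM papyrus WHERE tm_text_id = ?",
        (tm_text_id,),
    ).fetchone()
```

Declaring the TM identifier as unique is a design choice of this sketch: it makes the identifier usable as a join key when the data are later matched against Trismegistos or papyri.info.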

Figure 2. The infrastructure of the D-scribes project.

2.3 Showcase

A showcase is about to be launched that will enable the visualization of all the passages of contracts written by identified notaries in the Dioscorus archive. For each notary, three categories of samples will be displayed: the bodies of the documents that he penned, the signatures (subscriptions) which ascertain his identity, and pieces without signature but assigned to him by specialists based on palaeographical comparison. In a second step, data will be released concerning the other writers encountered in these notarial documents, namely the parties and witnesses (Figure 3). This showcase offers the possibility to catch a first glimpse of the sociology of the act of writing in an Egyptian village by showing samples of handwriting produced by priests, soldiers, and shepherds, among others. A second strand of the showcase will display Iliad papyri illustrating various handwriting styles.

Figure 3. Example of P.Mich. XIII 662 (TM 21375) annotated in D-scribes database. Seven areas have been delimited (from top to bottom): the body of the text penned by the notary, the subscriptions of the two parties, the subscriptions of the three witnesses, and the notary’s subscription. University of Michigan Library, Papyrology Collection.
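Since the database relies on the IIIF standard, each delimited area can in principle be requested as a cropped image through the IIIF Image API, whose URL pattern is {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}. The Python sketch below builds such URLs for labelled rectangles. The server address, the image identifier, and the pixel coordinates used in the example are hypothetical placeholders, not actual D-scribes data, and the function names are illustrative.

```python
from urllib.parse import quote


def iiif_region_url(image_base, identifier, box, size="max"):
    """Build a IIIF Image API URL for a rectangular region.

    box is (x, y, width, height) in pixels of the full image. The size
    value 'max' asks for the region at the largest size available.
    """
    x, y, width, height = box
    if width <= 0 or height <= 0:
        raise ValueError("region width and height must be positive")
    region = f"{x},{y},{width},{height}"
    # The identifier must be percent-encoded when placed in the URL path.
    encoded_id = quote(identifier, safe="")
    return (f"{image_base.rstrip('/')}/{encoded_id}"
            f"/{region}/{size}/0/default.jpg")


def area_urls(image_base, identifier, areas):
    """Map each labelled area (label -> box) to its cropped-image URL."""
    return {label: iiif_region_url(image_base, identifier, box)
            for label, box in areas.items()}


# Hypothetical server, identifier and coordinates, for illustration only.
example = area_urls(
    "https://iiif.example.org/image",
    "papyrus-sample",
    {"body": (120, 80, 1500, 900), "notary_subscription": (140, 1700, 1200, 110)},
)
```

Storing the annotated areas as coordinates rather than as separate image files keeps a single source image per papyrus and lets any IIIF-compatible viewer display the same regions.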

2.4 The project website

In the meantime, the website d-scribes.org not only presents the project’s goals, perspectives and regular updates, but also offers access to the list of publications by the team, resources, and outreach activities. As research outputs, three datasets have been published: the first one is the basis of the Competition on Document Image Binarization (DIBCO 2019) [28] as part of ICDAR 2019, and it includes ten images of Homeric papyri that were manually binarized by the team. The second, called GRK-papyri, is tailored for the task of Writer Identification and gathers fifty samples of ten writers from the Dioscorus archive. [29] It has already been the object of preliminary experimentation, applying state-of-the-art methods in the domain of Writer Identification. [30] The third, called PapyRow, is an extension of GRK-papyri of 122 images from 23 writers segmented into 6498 lines. [31] Another resource is the software Hierax that has been used to share findings on enhancement methods on papyrus images. [32] Also presented on the website are videos and posters from the conference Neo-Paleography: Analysing Ancient Handwritings in the Digital Age held in Basel in January 2020. [33] Outreach activities are also available, including the videos of the public reading of Homer held in March 2019 and the content of an exhibition on Homer in Basel held at the University library.
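For a Writer Identification dataset of this kind, in which each writer is represented by only a few samples, a common way to measure performance is a leave-one-out test: each sample in turn is compared with all the others, and the writer of its nearest neighbour is taken as the prediction. The Python sketch below illustrates this evaluation on arbitrary feature vectors. It is a generic illustration, not the protocol or the feature extraction of the publications cited above; the function name and the toy vectors in the usage lines are hypothetical.

```python
import math


def leave_one_out_top1(samples):
    """Top-1 accuracy of nearest-neighbour writer identification.

    samples is a list of (writer_label, feature_vector) pairs. Each sample
    is matched against all the others with the Euclidean distance, and the
    prediction is the writer of the closest one.
    """
    if len(samples) < 2:
        raise ValueError("at least two samples are required")
    correct = 0
    for i, (writer, vector) in enumerate(samples):
        best_label, best_distance = None, math.inf
        for j, (other_writer, other_vector) in enumerate(samples):
            if i == j:
                continue  # a sample is never compared with itself
            distance = math.dist(vector, other_vector)
            if distance < best_distance:
                best_label, best_distance = other_writer, distance
        if best_label == writer:
            correct += 1
    return correct / len(samples)


# Toy feature vectors standing in for real handwriting descriptors.
toy_samples = [
    ("writer_A", (0.0, 0.0)), ("writer_A", (0.0, 1.0)),
    ("writer_B", (10.0, 10.0)), ("writer_B", (10.0, 11.0)),
]
toy_accuracy = leave_one_out_top1(toy_samples)
```

With real data, the feature vectors would come from a handwriting descriptor computed on each image or line, and the quality of that descriptor, rather than the evaluation loop, determines the result.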


D-scribes lays the foundation for a future Virtual Research Environment dedicated to the digital palaeography of papyrological material. This VRE will be a place to display palaeographical features and discuss terminologies used to describe them along with computational methods allowing their rational measures. It will also include means to share resources (datasets, tools) and to open research to a large audience via introductory (pedagogical) content and outreach activities. It should also be included in a wider network of digital palaeography regardless of the scripts, languages, and writing materials. Several initiatives (a mailing list, workshops) have been launched in order to connect the emerging community of digital palaeographers for which sharing both research results and methodologies can be the key to major achievements in the future of manuscript studies.


Ast, R., and S. P. Davis. 2008. “The Advanced Papyrological Information System.” Storicamente 4, article 33. https://storicamente.org/ast-davis.
Ast, R., and H. Essler. 2018. “Anagnosis, Herculaneum, and the Digital Corpus of Literary Papyri.” In Digital Papyrology II: Case Studies on the Digital Edition of Ancient Greek Papyri, ed. N. Reggiani, 63–73. Berlin.
Bagnall, R. S., ed. 2009. The Oxford Handbook of Papyrology. Oxford.
Cilia, N. D., C. De Stefano, F. Fontanella, I. Marthot-Santaniello, and A. Scotto di Freca. 2021. “PapyRow: A Dataset of Row Images from Ancient Greek Papyri for Writers Identification.” In Pattern Recognition. ICPR International Workshops and Challenges. ICPR 2021, ed. A. Del Bimbo et al., 223–234. Heidelberg. https://doi.org/10.1007/978-3-030-68787-8_16.
Cuvigny, H. 2009. “The Finds of Papyri: the Archaeology of Papyrology.” In Bagnall 2009:30–58.
Delattre, A., and P. Heilporn. 2014. “Electronic Resources for Graeco-Roman and Christian Egypt: A Review of the State of the Net (March 2014).” Bibliotheca Orientalis 71:308–331.
Mohammed, H., I. Marthot-Santaniello, and V. Märgner. 2019. “GRK-Papyri: A Dataset of Greek Handwriting on Papyri for the Task of Writer Identification.” In 2019 International Conference on Document Analysis and Recognition (ICDAR), 726–731. Sydney.
Nasir, S., and I. Siddiqi. 2021. “Learning Features for Writer Identification from Handwriting on Papyri.” In Pattern Recognition and Artificial Intelligence. MedPRAI 2020, ed. C. Djeddi, Y. Kessentini, I. Siddiqi, M. Jmaiel, 229–241. Heidelberg. https://doi.org/10.1007/978-3-030-71804-6_17.
Orsini, P., and W. Clarysse. 2012. “Early New Testament Manuscripts and Their Dates, A Critique of Theological Palaeography.” Ephemerides Theologicae Lovanienses 88(4):443–474.
Pratikakis, I., K. Zagoris, X. Karagiannis, L. Tsochatzidis, T. Mondal, and I. Marthot-Santaniello. 2019. “ICDAR 2019 Competition on Document Image Binarization (DIBCO 2019).” In 2019 International Conference on Document Analysis and Recognition (ICDAR), 1547–1556. Sydney.
Reggiani, N. 2017. Digital Papyrology I: Methods, Tools and Trends. Berlin.
Vandorpe, K. 1994. “Museum Archaeology or How to Reconstruct Pathyris Archives.” Egitto e Vicino Oriente 17:289–300.


* This work was supported by the Swiss National Science Foundation as part of the Ambizione project PZ00P1_174149 “Reuniting fragments, identifying scribes and characterizing scripts: the Digital paleography of Greek and Coptic papyri.”
1. For an introduction to papyrology, see Bagnall 2009. For a numerical overview of the published texts by language and material as of 2014, see Delattre and Heilporn 2014:314.
2. For the archaeology of papyrus finds, see Cuvigny 2009.
3. An impressive example of Museum Archaeology is given by Vandorpe 1994.
4. Brice C. Jones conveniently gathers papyrological resources in his website; see https://www.bricecjones.com/papyrological-resources.html.
5. See Reggiani 2017.
6. http://www.trismegistos.org; see Reggiani 2017:56–73.
7. The originality was to include not only Greek and Latin but also Ancient Egyptian languages and both literary and documentary texts.
8. Reggiani 2017:57. It is to be noted, however, that since 1 January 2020, the use of several functionalities has been restricted to subscribers only, limiting TM’s accessibility.
9. See Ast and Essler 2018.
11. Ast and Davis 2008.
12. Figures last checked on 28.02.2021: 4603 images are hosted in papyri.info, of which 2011 have transcription. From this last group, almost all (1992) are labeled as Ancient Greek.
13. Reggiani 2017:98–102.
19. The idea, at the origin of metadata catalogs like HGV, can be traced back at least to the 1960s, see Reggiani 2017:39. See also the description of PapPal in Reggiani 2017:151–152.
21. For the difference between majuscule and minuscule bookhands, see https://spotlight.vatlib.it/greek-paleography/feature/1-majuscule-bookhands.
22. See Orsini and Clarysse 2012.
26. https://iiif.io/; for a catalogue of papyri using IIIF, see https://bodmerlab.unige.ch/fr/constellations/papyri.
27. https://projectmirador.org/; for an example of annotated manuscripts using Mirador, see https://spotlight.vatlib.it/overview.
28. Pratikakis, Zagoris, Karagiannis, Tsochatzidis, Mondal, and Marthot-Santaniello 2019.
29. Mohammed, Marthot-Santaniello, and Märgner 2019.
30. Nasir and Siddiqi 2021.
31. Cilia, De Stefano, Fontanella, Marthot-Santaniello, and Scotto di Freca 2021.
33. The proceedings are in preparation for a special issue of COMSt Bulletin.