A new project began in December 2020 entitled Titles of the New Testament: A New Approach to Manuscripts and the History of Interpretation (TiNT), led by Garrick V. Allen and funded by the European Research Council (ERC). [1] At first glance, this topic may appear too straightforward to warrant any scholarly attention whatsoever, let alone public funding. After all, we already know the titles of the books of the New Testament as they appear in modern vernacular Bibles. The Gospel of Matthew, Colossians, Hebrews, 1 Peter, the book of Revelation, and the titles of the other twenty-two works of the New Testament are familiar appellations that are attached to stable literary entities. [2] The problem is that these works have not always been known by these exact names. Titular drift is a feature, not a bug, of the New Testament’s transmission. And although the variation of the titles, at least as far as titles at the beginning (inscriptions) and end (subscriptions) of works are concerned, maneuvers within traditional structures, there remains a surprising variety in wording and aesthetics when it comes to the New Testament’s Greek manuscripts. A case in point is the book of Revelation, the least attested work within the New Testament (preserved in only 300 accessible manuscripts), which has fifty-three different inscription and subscription titles that create forty-four unique English glosses. [3] Moreover, this variation does not even take into account changes in the positioning, script, and aesthetics of titular formulations. The titular tradition of the Greek New Testament is rich in variables, requiring significant attention when it comes to working with and capturing this data in a virtual research environment (VRE). To this end, this article explores the dynamics of the TiNT project’s research agenda, the material it engages with, and the technical aspects and design of its digital workspaces necessitated by the project’s critical goals.
The project has only just officially begun, so much of the following conversation remains aspirational and contingent, especially the functionalities of its digital tools that are currently being developed in partnership with the ADAPT Centre and the Institut für Neutestamentliche Textforschung (INTF). Nonetheless, this article offers an overview of the project in the context of VREs.

1. New Testament Titles

The TiNT project seeks to aggregate the complexities of each of these titular forms for nearly every one of the ca. 3,500 non-lectionary manuscripts of the Greek New Testament in an effort to create a searchable body of data that enables us to research six larger disciplinary questions. Because the titles are diverse in their forms, wordings, layouts, and aesthetics, the collected data provides rich new evidence that allows us to advance discussions on the diachronic development of paratextual systems, the provenance of these works, the development of canonical ideologies, aesthetic developments in manuscript cultures, the relationship between scribes, readers, and titles, and traditions of segmentation. In order to achieve these critical goals, the TiNT project requires a suite of digital tools to gather, process, and query this large quantity of material.
But before we come to our digital tools, we should explain what we mean by “titles.” When we are asked to define what constitutes a title, most of us likely visualize the cover of a book or a title page, spaces containing formulations that may describe the content, subject, topic, genre, allusive network, or theme of the work, among other options. And these formulations on the covers of books and title pages (or the tops of websites) do indeed represent at least one form of the title in modern print cultures. TiNT defines titles as descriptive statements, summative formulations, or other textual labelling devices that are segregated in some way from the main text of a book or manuscript. They can occur at the beginning and end of works, on a cover page, title page, or in any other space that distinguishes the title from the main text. In addition to inscriptions and subscriptions, the ca. 3,500 non-lectionary Greek manuscripts that witness to the works of the New Testament contain a variety of titular items, including running titles, intertitles, titles of commentaries, tables of contents, and the various titular formulations of prefatory, indexical, and other literary items in a given codex. Together these features comprise part of the paratextual tapestry of the tradition, offering perspectives on authorship, provenance, imagined geographies, content, intertextual networks, and interpretive traditions of these works. [4] Titles of all forms impinge on reading experiences and influence engagements with the texts that they label, as both philological and cognitive scientific studies have shown. [5] Very few, if any, manuscripts will preserve each form of the title enumerated below, but nearly every non-fragmentary manuscript in the corpus will preserve some form of the title—titles are essential aspects of what constitutes literature going back to antiquity.
The flexibility and complexity of the titular tradition requires an equally flexible and complex set of digital tools that allow us to aggregate and analyze the nuances of the tradition. Working in a VRE is essential for this work because nearly all Greek New Testament manuscripts have already been digitized and are available to researchers in the New Testament Virtual Manuscript Room (NTVMR) managed by the INTF at Münster. This workspace allows for our markup program to contribute additional layers of metadata to the manuscript images available on the site by customizing and reconfiguring the markup tools that already exist there. The quantity of physical manuscripts of this tradition means that this type of editorial work must be undertaken primarily in a digital medium. We are currently creating an editorial tool within the NTVMR that allows us to capture the data we seek, and we plan to embed the resulting database within the suite of tools currently accessible on the site and to export this data to our own website to create an additional knowledge graph tool with complex searchability and recall features.
In order to understand TiNT’s tools, we must first be aware of the NTVMR’s history and functionalities. The current incarnation of the NTVMR was launched in June 2013 under the URL It was conceived and developed by Troy A. Griffitts and Ulrich B. Schmid at the INTF in Münster with the input and support of the colleagues that are working there on the edition of the Greek New Testament (Editio Critica Maior or ECM). [6] As such it was designed as a tool to facilitate the logistics when dealing with one of the largest text traditions known to us today. Hence, it consists of modules that address the tasks of (a) managing images and metadata of manuscripts, (b) gathering content information from the manuscripts by engaging with the images, (c) in particular creating diplomatic transcriptions of manuscripts, (d) running automatic collations of practically any number of manuscript transcriptions, (e) creating a philologically apt apparatus criticus by manipulating the collation process through interactively built filtering mechanisms, and (f) managing users and user groups that are working simultaneously on different New Testament works and tasks. [7] As these modules are part of a digital workspace that is server-based and accessed via the internet, it allows for robust backup mechanisms on the NTVMR owners’ side as well as ease of mind on the users’ side.
While the production of critical editions of the Greek New Testament is the current dominant use for the NTVMR, the modest beginnings of a precursor version as of September 2008 had its focus on making available digital images of manuscripts in a “virtual reading room,” as well as giving its users the ability to index the biblical content for the manuscript pages on display. [8] This first version of the NTVMR was developed by Martin Faßnacht and Ulrich B. Schmid while working at the INTF with the support of their colleagues on site. [9]
From the very start, both versions of the NTVMR tried to envisage research that extended beyond the mere collecting of textual data for the editing of the Greek New Testament. The current NTVMR’s concept of “features” that can be attached to images at every point proved absolutely crucial in this regard, since it allows users to annotate paratextual data found on the manuscript images with sets of “features” representing, e.g., art historical or liturgical concepts. These features can then be searched and exported via the web services provided by the NTVMR API. This promises to be an easy and powerful way to generate large datasets that all represent different scholarly views on the same objects, i.e., manuscript pages. And the TiNT project aims to leverage this flexibility as the primary mode for gathering data that is essential for the project’s critical aims.

But what exactly are we looking for when we speak of titles? Within the context of the NTVMR, the first forms of the title that the TiNT project seeks to gather data on are inscriptions and subscriptions. These constructions are the most common titular form in the New Testament’s Greek tradition. The inscription is readily familiar to those who read modern Bibles or other books. The inscription to 2 Corinthians, for example, in CBL BP II (Figure 1), perhaps the oldest extant copy of Paul’s letters, is προς κορινθιους β, a formulation that could easily be glossed in English as “To [the] Corinthians 2” or 2 Corinthians. It is situated here between the page number and the start of the text of the letter and is set apart from the main text by its slightly larger script size and the use of intermittent horizontal lines above and below the title. TiNT will aggregate not only the text of this formulation, but also the design features that distinguish the title aesthetically from the text of the work it heads.

Figure 1. Dublin, CBL BP II, fol. 61 (P46), inscription to 2 Corinthians (with page number above). ©The Trustees of the Chester Beatty Library, Dublin.
But some inscriptions are more complex, especially when they are entwined with prefatory traditions like the late antique Euthalian apparatus, comprised of prefaces (hypotheses) and segmentation systems. [10] For example, in Dublin, TCD MS 30 (GA 61; diktyon 13584), the sixteenth-century Codex Montfortianus, 2 Corinthians lacks an inscription altogether, both before its Euthalian hypothesis (249r) and the start of the text of the letter (249v). Only the running title (incorrectly attributed to κορινθιους α on 249r) identifies a shift in the literary work. 2 Corinthians also lacks a subscription in TCD MS 30, preserving only a short colophon at the end of the work (“Grace to God in Jesus Christ the Lord”; τω θεω χαρις εν χω ιυ τω κω).

Setting aside the peculiarities of TCD MS 30’s production, subscriptions are also common in Greek New Testament manuscripts, although less familiar to readers in modern print cultures. Not only do these items discriminate between literary works aggregated within a codex, but they also often reiterate, sometimes inexactly, the title of the inscription. For example, the inscription and subscription of Paris, BnF Coislin gr. 205 (GA 93; diktyon 49345) differ slightly in Revelation: “Apocalypse of John the Theologian” (ιωαννου του θεολογου αποκαλυψις) in the inscription and “Apocalypse of St John the Theologian” in the subscription (αποκαλυψις του αγιου ιωαννου του θεολογου). The difference in this instance is not earth-shattering, but it is part of a larger network of fungible titular formulations, which are pliable not only in terms of text, but of positioning, layout, and artistic emphasis. In some cases, the subscription also provides further information on the work, often drawn from commentary or other interpretive traditions. The subscription to Matthew in Dublin, CBL W 139 (GA 2604; diktyon 13571), an early twelfth-century Gospel codex, is one example (Figure 2).

Figure 2. Dublin, CBL W 139, 119v (GA 2604), subscription to Matthew. ©The Trustees of the Chester Beatty Library, Dublin.

τελος του κατα ματθαιον ευαγγελιου
εξεδοθ(η) εν ιλημ φωνη τη εβραιδι
συνεγραφη το κατα ματθαιον ευαγγελιστη
μετα χρονους οκτω της χυ αναληψεως

End of the Gospel according to Matthew
Published in Jerusalem in the Hebrew language
Written by Matthew the Evangelist eight years after Christ’s Assumption

At the end of this work, three declarative statements are set off from the main text by the use of gold ink, slightly larger script, and strings of non-alphabetic glyphs. The first reiterates the title of the work and signals that the work has ended, the second identifies its place and language of composition, and the third identifies its putative author (the governing voice of the canonical gospels is anonymous) and the date of composition. There is an ongoing debate in biblical studies about which parts of this formulation actually qualify as a title, with some distinguishing between the titulus finalis (the first line) and the subscription (lines 2–4), but we treat this entire formulation as a title because its aesthetics and positioning coalesce to signify a unified paratextual feature within the manuscript.
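The inscriptions and subscriptions discussed so far suggest what a captured title record might contain: a transcription plus the design features that set the title apart. The following is a minimal sketch under stated assumptions. The `TitleFeature` layout, its field names, and the `search` and `export_json` helpers are hypothetical illustrations, not the NTVMR's actual feature schema or API; only the manuscripts, transcriptions, and design notes are taken from the examples above.

```python
from dataclasses import dataclass, field, asdict
from typing import List, Optional
import json


@dataclass
class TitleFeature:
    """One titular annotation on a manuscript image (hypothetical layout)."""
    manuscript: str               # identifier as cited in the article
    work: str                     # New Testament work that the title labels
    kind: str                     # "inscription" or "subscription"
    text: str                     # transcription of the title
    folio: Optional[str] = None   # None where the article gives no folio
    design: List[str] = field(default_factory=list)  # aesthetic features


FEATURES = [
    TitleFeature("P46", "2 Corinthians", "inscription", "προς κορινθιους β",
                 folio="61",
                 design=["slightly larger script",
                         "horizontal lines above and below"]),
    TitleFeature("GA 93", "Revelation", "inscription",
                 "ιωαννου του θεολογου αποκαλυψις"),
    TitleFeature("GA 93", "Revelation", "subscription",
                 "αποκαλυψις του αγιου ιωαννου του θεολογου"),
    TitleFeature("GA 2604", "Matthew", "subscription",
                 "τελος του κατα ματθαιον ευαγγελιου", folio="119v",
                 design=["gold ink", "slightly larger script",
                         "non-alphabetic glyphs"]),
]


def search(features, **criteria):
    """Return the features whose fields equal every given criterion."""
    return [f for f in features
            if all(getattr(f, name) == value
                   for name, value in criteria.items())]


def export_json(features):
    """Serialize features to JSON, keeping the Greek text readable."""
    return json.dumps([asdict(f) for f in features],
                      ensure_ascii=False, indent=2)


subscriptions = search(FEATURES, kind="subscription")
print(export_json(subscriptions))
```

Keeping the design features in an open-ended list rather than fixed columns is one way to accommodate the variety of aesthetics described here; a controlled vocabulary would make searching more reliable at the cost of flexibility.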

In addition to appearing at the start and end of a work, titles also occasionally occur as running titles, usually in medieval and early modern Greek manuscripts, but also in late antique manuscripts. A good example of running titles is found again in Codex Montfortianus (TCD MS 30, GA 61). Its running titles are usually rubricated and split between the verso and recto of an opening. For example, the running title of Matthew on 24v reads ματ, while 25r preserves θαῖος (ματθαῖος). [11] As in modern book culture, running titles assist readers in navigating a codex and in swiftly identifying a work. And although they are usually located in the upper margins, their wording, aesthetics, and location differ from manuscript to manuscript. Sometimes they even do double duty as the only explicit signifier of the work at hand, as in the Corinthian correspondence in TCD MS 30. The fact that a single formulation is split over two folios provides another layer of complexity for digitally capturing this data in an editorial space that is organized by digital images of single manuscript folia.
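One way to handle a running title that is split across an opening is to record each half against its own folio image and to reassemble the full wording afterwards. The sketch below illustrates this with the Matthew example from Codex Montfortianus. The `RunningTitlePart` record and the `join_openings` helper are hypothetical names introduced for illustration, and the sketch assumes simple folio labels of the form number plus "r" or "v".

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class RunningTitlePart:
    """The part of a running title visible on one folio image."""
    folio: str   # e.g. "24v" or "25r"
    text: str


def parse_folio(folio):
    """Split a label such as '24v' into (24, 'v')."""
    match = re.fullmatch(r"(\d+)([rv])", folio)
    if match is None:
        raise ValueError(f"unrecognized folio label: {folio!r}")
    return int(match.group(1)), match.group(2)


def join_openings(parts):
    """Join each verso part with the part on the facing recto.

    An opening consists of the verso of folio n and the recto of folio n + 1.
    Versos without an annotated facing recto are left out of the result.
    """
    by_folio = {parse_folio(p.folio): p.text for p in parts}
    joined = {}
    for (number, side), text in sorted(by_folio.items()):
        if side != "v":
            continue
        facing = by_folio.get((number + 1, "r"))
        if facing is not None:
            joined[f"{number}v-{number + 1}r"] = text + facing
    return joined


parts = [RunningTitlePart("24v", "ματ"), RunningTitlePart("25r", "θαῖος")]
print(join_openings(parts))
```

Storing the halves separately keeps each annotation tied to a single image, while the joined form is what a search for the full title would need to match.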
Another common form of the title in the New Testament’s Greek tradition is what we are provisionally calling intertitles, referring to statements that entitle particular textual segments within the text of these works. [12] Intertitles are usually attached to larger commentary and segmentation systems. Each New Testament work accrued intertitle systems in late antiquity, including kephalaia systems in the Gospels, the Euthalian tradition in the Pauline Epistles and Praxapostolos, and the Andrew of Caesarea tradition in the book of Revelation. [13] These titular forms often appear in the margin, but sometimes interrupt the work’s main text.

For example, Athos, Pantokratoros, 44 (GA 051; diktyon 29603), a tenth-century copy of the Andrew of Caesarea commentary on Revelation, includes the titles for each of the seventy-two textual segments (kephalaia) into which Andrew divided the Apocalypse. The folio that preserves Rev 13:18, the famous number of the beast passage (15r, Figure 3) contains a number of paratexts, including a lengthy marginal comment connecting the identity of the beast to the figure Raiphan mentioned in OG Amos 5:26, an attempt to do the calculation of the number of the name in the top margin, a short note to “deny me” (ἀρνοῦ με) referring to the antichrist, and a page number added by a later hand. There’s a lot going on here, but the third line of the text is an intertitle that describes the perceived content or significance of the scriptural text (lines 4–6), which is written in an uncial script compared to the commentary that is written in minuscule. The title (line 3) reads “Regarding the Name of the Antichrist” (περι του ονοματος του αντιχριστου), a formulation that directly reflects Andrew’s interpretation of the beast as an eschatological antagonist. The term “antichrist” is not used in Revelation, and Andrew’s interpretation stands starkly against the conclusions of modern scholarship that see this figure and his paronomastic name as a cipher for a historical Roman emperor, perhaps Nero. [14] This titular formulation cues readers to adopt Andrew’s eschatological perspective and to continue to attempt to decode the name of the figure by adding the numeric value of Greek graphemes. Notably, this process continued into the late and post-Byzantine period, where some commentators identify Muhammad as the beast because the graphemes of one form of his Greek name (μοαμετις) and the spelling of Mecca in Greek (μαχκε) equate to 666. [15]
Intertitles influence the deployment of reading protocols and processes of interpretation, and they represent another complicating factor for the editing of this material. It may not ultimately be feasible to capture every intertitle in every manuscript, but we intend, at the very least, to focus our attention on intertitles for a particular set of New Testament works, perhaps the Catholic Epistles. [16]
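The calculation that these readers performed, adding the numeric values of Greek graphemes, can be made concrete with a short sketch. It uses the standard Greek alphabetic numeral values and treats final sigma (ς) as sigma (200), an assumption that matches the totals reported above. The function name `isopsephy` is chosen for illustration only.

```python
import unicodedata

# Standard Greek alphabetic numeral values. Final sigma is counted as sigma.
VALUES = {
    "α": 1, "β": 2, "γ": 3, "δ": 4, "ε": 5, "ϛ": 6, "ζ": 7, "η": 8, "θ": 9,
    "ι": 10, "κ": 20, "λ": 30, "μ": 40, "ν": 50, "ξ": 60, "ο": 70, "π": 80,
    "ϙ": 90, "ρ": 100, "σ": 200, "ς": 200, "τ": 300, "υ": 400, "φ": 500,
    "χ": 600, "ψ": 700, "ω": 800, "ϡ": 900,
}


def isopsephy(word):
    """Sum the numeric values of the Greek letters in a word.

    Accents and breathings are stripped first; characters without a
    numeric value (spaces, punctuation) are ignored.
    """
    decomposed = unicodedata.normalize("NFD", word.lower())
    return sum(VALUES[ch] for ch in decomposed
               if not unicodedata.combining(ch) and ch in VALUES)


print(isopsephy("μοαμετις"))  # 40 + 70 + 1 + 40 + 5 + 300 + 10 + 200 = 666
print(isopsephy("μαχκε"))     # 40 + 1 + 600 + 20 + 5 = 666
```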

Figure 3. Athos, Pantokratoros, 44 (GA 051), 15r. Comments on Rev 13:18 with intertitle. Public Domain: Library of Congress Collection of Manuscripts from the Monasteries of Mt Athos.

These kephalaia that sometimes segment the text of manuscripts are also regularly collected in tables of contents (pinakes). For example, each of the four gospels in CBL W 139 (GA 2604) is prefaced by a table of contents (and other paratextual prefatory texts) that enumerates the titles of each of the identified textual segments. The pinax that precedes Luke (178v–180r, see Figure 4) is bounded by a hypothesis to the gospel and a lexicon of Hebrew words (179r) and it contains the chapter number and title of each of Luke’s eighty-three chapters. The table itself is entitled “The Chapters of the Gospel according to Luke” (του κατα λουκαν ευαγγελιου τα κεφαλαια) and each entry in the table is a titular formulation that functions as such when it appears in the text. Because most of the titles that appear in the tables are fairly stable in terms of wording, we can build templates for most instantiations and edit them to match the text of the images when necessary. But this complexity of the titular tradition calls for greater flexibility in our editorial tools, data querying, and recall, particularly if we want to link the appearance of an intertitle in a pinax with its appearance among the main text.

Figure 4. CBL W 139 (GA 2604), 178v. Pinax to Luke. ©The Trustees of the Chester Beatty Library, Dublin.
The final titular tradition that we want to draw attention to in this context is the titles of commentary traditions. The Greek New Testament is often transmitted as part of larger commentary or catena traditions devised in late antiquity or the medieval period. Significant scholarly energy has been devoted to these traditions in recent years, especially the commentaries associated with the Pauline tradition and the book of Revelation, focusing on their text-critical and reception-historical value and their insight into the vectors of the New Testament’s transmission. [17] And when present, the titles of these works that are integrated with the transmission of the New Testament works provide further insight into particular contexts of reading and the development of interpretive traditions. For example, Athos, Panteleimonos, 110 (GA 1775, copied in 1847), the latest manuscript recorded in the Kurzgefasste Liste of Greek New Testament manuscripts, [18] preserves two titles back-to-back. The first (51v) is the most effusive title for the book of Revelation in the entirety of the tradition located under an image of John’s revelation, [19] and the second is the title associated with the Andrew of Caesarea commentary (52r). The title of the commentary in this instance is “Commentary on the Apocalypse of John the Theologian of Andrew the Second and Wise Archbishop of Cappadocian Caesarea” (Ανδρεου σοφου αρχιεπισκοπου Καισαρειας καππαδοκιας, η ετερων, ερμηνεια εις την αποκαλυψιν ιωαννου του θεολογου), a formulation that makes assertions about the authorship of the commentary and the authority of his interpretation. The choice of the titles of this work speaks both to the apostolic provenance of the Apocalypse’s author and to the wisdom and insight of the work’s late antique commentator. As this handful of examples has shown, the titular traditions of the New Testament provide new evidence for old disciplinary questions and afford the space for new questions to arise.

2. TiNT Project Digital Workspaces and Other VRE Models

Designing a digital workspace that allows scholars to develop research hypotheses and to explore the richness of the collected manuscripts presents a number of challenges. Researchers in ADAPT have addressed aspects of these challenges in previous digital humanities projects, a track record that informs our approach to developing digital editorial tools for TiNT. A good example is the CULTURA project. [20] CULTURA was a European Commission funded project that ran between 2011 and 2014. It employed personalisation to support the exploration of digital collections (primarily Irish archival manuscripts and other historical sources), the collaboration of users around these collections, and to understand users’ interests in external digital archives with similar content. The CULTURA VRE pioneered the development of next generation adaptive systems that provide new forms of multi-dimensional adaptivity, including:

  • personalised information retrieval and presentation which respond to models of user and contextual intent;
  • community-aware adaptivity which responds to wider research community activity, interest, contribution, and experience;
  • content-aware adaptivity which responds to the entities and relationships automatically identified within the artefacts and across collections;
  • personalised dynamic storylines which are generated across individual as well as entire collections of artefacts.

CULTURA advanced and integrated the following key technologies to meet these goals:

  • cutting edge natural language processing, which normalises ambiguities in noisy historical texts;
  • entity and relationship extraction, which highlights the key individuals, events, dates, and other entities and relationships within unstructured text;
  • social network analysis of the entities and relationships within the content, and also of the individuals and broader community of users engaging with the content;
  • multi-model adaptivity to support dynamic reconciliation of multiple dimensions of personalisation.

The CULTURA VRE offered users features that enabled them to move seamlessly between four phases of engagement with the materials available: Explore, Support, Guide, and Reflect. These phases have been discussed in detail in another context, [21] but we reiterate them here to show their relevance for the editorial and research agenda of TiNT.

  • Explore. In CULTURA, users were offered a range of knowledge-informed exploration features that enabled them to examine the entities and metadata features of manuscripts and to describe the facets relevant to their enquiry. For example, users could express what they were looking for through faceted search dialogues, which supported free and autocomplete text. In the context of TiNT, searchable features may include location on page, style, pigments chosen, and many others. These facets can be supplemented with keyword and entity search to allow users to refine their search protocols. Items discovered could be grouped into projects, allowing users to pursue several explorations concurrently and to gather the resulting documents for each. Importantly, the historical situation that led to their being gathered would also be captured.
  • Support. Recommendations for similar items were included in CULTURA. This allowed scholars to access metadata, textual, and entity similarities even if they did not explicitly search for this material, a process that may lead to new connections between objects. An example of this may be seen in Figure 5.
Figure 5. An example of the recommended content displayed to users in CULTURA.

In TiNT, additional metadata features such as aesthetic considerations could also be factored into such recommendations. Users in CULTURA could control the criteria for recommendations. For example, they could mark a recommended item as not applicable and set the reasons, allowing them to identify which metadata elements led to the match and why. This allowed users, in the scope of a project, to tailor how the system functioned. The various textual, aesthetic, and layout features captured in the TiNT project can function in similar ways.

  • Guide. The CULTURA project was designed with non-expert users in mind and allowed a set of manuscripts to be presented as a narrative sequence to users with expert commentary. This commentary could tie into specific entities and metadata, as well as highlighting parts of the text, enriching user guidance for searches. The vision was that such non-expert users could still leave the guidance to explore and be supported in whatever topics piqued their interests. This commentary was represented in a widget alongside the content the user was examining. Even though the data that the TiNT project produces is intended primarily for experts, it could also be arrayed in a narrative fashion. Or, alternatively, project metadata could be used in such a way as to orient users of the database toward the critical questions central to the project’s goals or other lines of enquiry. Part of the process of creating a public, searchable database for the project data is to experiment with these possibilities.
  • Reflect. Finally, Reflect offered users the ability to see what models CULTURA was constructing of their interests. Their interactions with manuscripts, including searches and clicking of entity links, were implicitly modelled to determine which metadata features and entities they were showing the most interest in. This model was available for users to scrutinize and to control their interactions with the data, pruning unwanted elements and adding others. This gave users the ability, within the context of a project, to further refine and control what was of specific interest to them, informing the system’s support recommendations and offering unique opportunities for self-reflection within the context of engagement with artefacts. Constructing a reflect tool within TiNT’s database will give users opportunities to refine their approach to the material. Of course, before this feature of the database can be built, the project must first gather the requisite data, which will be produced using the project editorial tool embedded in the New Testament Virtual Manuscript Room (see below).

CULTURA offers several examples of how different phases of research and hypothesis generation may be supported, particularly over complex and rich metadata-described content. This has direct relevance to TiNT as users explore the rich diversity of titles across a variety of collections.
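The Support phase described above, in which users see why an item was recommended and can switch off criteria they consider irrelevant, can be illustrated with a deliberately simple sketch. It ranks records by the number of metadata values they share with a target record. This is a plain overlap count chosen for illustration, not CULTURA's actual recommender, and the records, facet names, and the `recommend` function are all hypothetical.

```python
# Hypothetical title records described by three metadata facets.
RECORDS = {
    "MS-A": {"kind": "subscription", "ink": "gold", "script": "larger"},
    "MS-B": {"kind": "subscription", "ink": "gold", "script": "regular"},
    "MS-C": {"kind": "inscription", "ink": "red", "script": "larger"},
    "MS-D": {"kind": "running title", "ink": "red", "script": "regular"},
}


def recommend(target_id, records, excluded_facets=()):
    """Rank other records by how many facet values they share with the target.

    Each result carries the shared facets, so a user can see which metadata
    elements led to the match. Facets listed in excluded_facets are ignored,
    mimicking a user who marks a criterion as not applicable.
    """
    target = records[target_id]
    results = []
    for record_id, record in records.items():
        if record_id == target_id:
            continue
        shared = {facet: value for facet, value in target.items()
                  if facet not in excluded_facets
                  and record.get(facet) == value}
        if shared:
            results.append((record_id, shared))
    # Most shared facets first; ties broken by identifier for stable output.
    results.sort(key=lambda item: (-len(item[1]), item[0]))
    return results


for record_id, reasons in recommend("MS-A", RECORDS):
    print(record_id, reasons)
```

Returning the shared facets alongside each recommendation is the design choice that matters here: it is what allows a user to inspect and then adjust the criteria, as the CULTURA description suggests.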

Another ADAPT-supported project that showcases the power of new approaches to digital workspaces that facilitate scholarly activity is the Beyond 2022 project. [22] On 30 June 1922, at the beginning of the Irish Civil War, the western block of the Four Courts in Dublin was hit by a terrible explosion, ignited during a siege between anti-treaty IRA rebels and the Provisional Government of Ireland. The resulting fire destroyed the Public Record Office of Ireland (PROI) and, with it, seven centuries of Ireland’s collective memories.
Across the globe, more than seventy repositories hold substitute or duplicate materials that can replace the documents lost in the fire. The Beyond 2022 project aims to assemble a complete inventory of loss and survival of the contents with the goal of recreating virtual copies of materials in the PROI of the Four Courts. The project and the resulting virtual treasury will gather all the information it can about these substitute sources from archives and libraries in Ireland and beyond.
Within Beyond 2022, there are four pillars representing strands of scholarly activity involved in assembling the virtual record treasury: Discover, Digitize, Reconstruct, and Reveal. The Reveal strand incorporates research into techniques that facilitate information retrieval, discovery, and enhanced accessibility to the contents of the virtual record treasury.
As part of this strand, the project is developing a knowledge graph to facilitate information retrieval and discovery of the reconstructed items. The project decided to adopt Semantic Web technologies to support its distributed knowledge graph and reasoning.
The purpose of the knowledge graph is to model information contained within the reconstructed sources—people, places, etc.—not information about the reconstruction itself. Using the graph approach, we not only model metadata about the entities that appear in particular manuscripts, but we are also able to model the rich interconnection between those entities with other related entities across manuscripts. Modelling using a standards-based (rather than proprietary) semantic web technology approach means that this information can easily be added to and evolved over time, is easily published on the web for exploration and integration by other scholars, and also enables the interconnection and integration of scholarly work of others who work in this disciplinary area. Being standards based, more and more graph-based scalable storage solutions and graph exploration tools are emerging. Adopting such an approach to modelling entities within TiNT is expected to bring these benefits. Even if, for example, the TiNT team finds that fully capturing the data for all titles in all non-lectionary Greek manuscripts is too onerous a task for the scope of the project, further material could be incorporated into the data set at another time. And an approach along these lines means that tagged features of titular formulations, like place names, languages, and people, can be visualized in various ways, enabling researchers to make connections between manuscripts that would be unintuitive using only classical philological approaches.
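To illustrate how tagged features of titular formulations might connect manuscripts, the sketch below models a few of the titles discussed in this article as subject, predicate, object triples and queries them. It is a plain-Python stand-in for an RDF store, used only to show the shape of the idea. The `tint:` predicate names, the identifiers, and the helper functions are hypothetical and do not represent the project's ontology.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Entities drawn from titles discussed in this article; names are illustrative.
GRAPH = [
    Triple("title:GA2604-Matthew-subscription", "tint:inManuscript", "ms:GA2604"),
    Triple("title:GA2604-Matthew-subscription", "tint:mentionsPlace", "place:Jerusalem"),
    Triple("title:GA2604-Matthew-subscription", "tint:mentionsLanguage", "language:Hebrew"),
    Triple("title:GA2604-Matthew-subscription", "tint:mentionsPerson", "person:Matthew"),
    Triple("title:GA93-Revelation-inscription", "tint:inManuscript", "ms:GA93"),
    Triple("title:GA93-Revelation-inscription", "tint:mentionsPerson", "person:John the Theologian"),
    Triple("title:GA1775-commentary", "tint:inManuscript", "ms:GA1775"),
    Triple("title:GA1775-commentary", "tint:mentionsPerson", "person:John the Theologian"),
    Triple("title:GA1775-commentary", "tint:mentionsPerson", "person:Andrew of Caesarea"),
]


def match(graph, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [t for t in graph
            if (s is None or t.subject == s)
            and (p is None or t.predicate == p)
            and (o is None or t.obj == o)]


def manuscripts_mentioning(graph, predicate, entity):
    """Find manuscripts containing a title that mentions the given entity."""
    titles = {t.subject for t in match(graph, p=predicate, o=entity)}
    return sorted(t.obj for title in titles
                  for t in match(graph, s=title, p="tint:inManuscript"))


print(manuscripts_mentioning(GRAPH, "tint:mentionsPerson",
                             "person:John the Theologian"))
```

Because every statement has the same three-part shape, new kinds of entities or further manuscripts can be added later without restructuring what is already there, which is the extensibility benefit the paragraph above describes.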
In the Beyond 2022 project, information is added to the knowledge graph by two means: a manual aggregation of data performed by historians and an automatic extraction process performed by a Natural Language Processing (NLP) pipeline. Historians should be able to add data to the knowledge graph in a manner that is intuitive to them. We believe that we have the greatest chance of engaging historians to curate information in our knowledge graph if we enable them to work with the tools that are already familiar to them. This led us to design our data capture process around the use of prescribed spreadsheets, which set out the information that should be captured by a historian seeking to add information to the knowledge graph. Given the rapid progress the project is making, and the wealth of information being discovered in our partner institutions, it should be reasonably easy to facilitate the capture of new types of information. Again, the use of spreadsheets is helpful here. Expanding our data capture process to encompass new entity types is simply a matter of designing a new spreadsheet and developing a mapping between its columns and the knowledge graph. TiNT’s primary mode of data aggregation is through the use of a custom editorial tool nested within the NTVMR, an approach that is more labor intensive, yet more straightforward in terms of developing a knowledge graph and other interpretive tools. The data in TiNT is also highly vetted, and the scholar responsible for producing a titular profile for a particular manuscript is known to other researchers using the data.
For Beyond 2022, the information generated by an automated process should be clearly distinguished from manually aggregated, vetted information added by a researcher. This partitioning of information is facilitated by the design of a provenance model based on the PROV-O ontology. [23] Because the model is standards based and machine-readable, the provenance information (who added or changed a statement, when, and how) attached to information in the knowledge graph is easily integrated with discovery crawlers, tools, and environments from other LAM (Libraries, Archives, Museums) environments and scholarly initiatives. We intend to make as much provenance information as possible available to researchers, in part by giving credit to team members who produce metadata and manuscript profiles for the TiNT project.
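
The partitioning described above can be illustrated with a minimal sketch. It is not the Beyond 2022 provenance model itself: the `http://example.org/` namespace, the record and agent identifiers, and both function names are hypothetical, and triples are held as plain Python tuples rather than in a triple store. Only the PROV-O terms (`prov:Entity`, `prov:Person`, `prov:SoftwareAgent`, `prov:wasAttributedTo`, `prov:wasGeneratedBy`, `prov:generatedAtTime`) come from the ontology.

```python
PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # hypothetical namespace, for illustration only


def provenance_triples(entity, agent, agent_class, activity, timestamp):
    """Return PROV-O style triples recording who produced an entity, how, and when."""
    return [
        (entity, RDF_TYPE, PROV + "Entity"),
        (agent, RDF_TYPE, PROV + agent_class),
        (entity, PROV + "wasAttributedTo", agent),
        (entity, PROV + "wasGeneratedBy", activity),
        (entity, PROV + "generatedAtTime", timestamp),
    ]


def is_machine_generated(entity, graph):
    """True if any agent the entity is attributed to is typed prov:SoftwareAgent."""
    agents = {
        o for s, p, o in graph
        if s == entity and p == PROV + "wasAttributedTo"
    }
    return any((a, RDF_TYPE, PROV + "SoftwareAgent") in graph for a in agents)


# Two hypothetical records: one entered by a researcher, one extracted by software.
graph = set()
graph.update(provenance_triples(
    EX + "record/1", EX + "agent/researcher-1", "Person",
    EX + "activity/spreadsheet-upload", "2021-01-15T10:00:00Z"))
graph.update(provenance_triples(
    EX + "record/2", EX + "agent/nlp-pipeline", "SoftwareAgent",
    EX + "activity/nlp-extraction", "2021-01-15T11:00:00Z"))
```

With provenance recorded in this way, an interface could filter on `is_machine_generated` to show or hide automatically extracted statements, and could use the attribution triples to credit the team member responsible for a record.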

As obvious as it may seem, the structure of the knowledge graph itself should be such that it is possible to respond to the kinds of queries we anticipate will be executed over our triples. During these early stages of development, we have performed an information-gathering task by asking historians to submit “competency questions,” which they believed would be asked of the knowledge graph. Examples of competency questions elicited from the subject matter experts for Beyond 2022 include:

  • “Are there Birth/Death/Marriage Records in Dublin in 1954?”
  • “Who was the Bishop of Galway in 1882?”
  • “Were there any [surname] within 20 miles of [place] just before the famine?”
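
A competency question of the second kind ("who held office X in year Y?") can be checked against a graph structure with a sketch like the following. Everything in it is an assumption made for illustration: the `ex:` identifiers, the property names, and the years are invented placeholders rather than historical data or the project's actual CIDOC-CRM modelling, and simple pattern matching over tuples stands in for a SPARQL query.

```python
def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Invented placeholder data: two appointments to one hypothetical office.
TRIPLES = [
    ("ex:appointment/1", "ex:office", "ex:office/example"),
    ("ex:appointment/1", "ex:holder", "ex:person/A"),
    ("ex:appointment/1", "ex:fromYear", 1880),
    ("ex:appointment/1", "ex:toYear", 1885),
    ("ex:appointment/2", "ex:office", "ex:office/example"),
    ("ex:appointment/2", "ex:holder", "ex:person/B"),
    ("ex:appointment/2", "ex:fromYear", 1886),
    ("ex:appointment/2", "ex:toYear", 1890),
]


def holders_in_year(triples, office, year):
    """Answer 'who held this office in this year?' over the toy graph."""
    holders = []
    for appointment, _, _ in match(triples, p="ex:office", o=office):
        start = match(triples, s=appointment, p="ex:fromYear")[0][2]
        end = match(triples, s=appointment, p="ex:toYear")[0][2]
        if start <= year <= end:
            holders.extend(
                h for _, _, h in match(triples, s=appointment, p="ex:holder")
            )
    return holders
```

The point of the exercise is structural: if the graph did not record the span of an appointment, no query could answer the question, which is why competency questions are gathered before the model is fixed.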

For the knowledge graph, we adopted and extended the Web Ontology Language (OWL) implementation of the CIDOC Conceptual Reference Model (CRM) [24] because it was considered to be the most appropriate to meet the needs of the project. The historian-generated spreadsheets (e.g., for people and places) are stored as Comma Separated Values (CSV) files and transformed into a Resource Description Framework (RDF) graph representation via R2RML, [25] a W3C Recommendation for transforming relational data into RDF via a set of mappings. We avail of R2RML-F, [26] which allows us to access the CSV files as relational databases. The generated mappings prescribe how the data contained in those spreadsheets should be transformed into entities and relationships according to our CIDOC-CRM ontology. In later phases of the project, historians will gather and compile CSV files from other sources (e.g., historical census data), which need to be transformed into RDF according to the same ontologies. The use of R2RML thus allows for a scalable and declarative ingestion pipeline. If TiNT adopts a knowledge graph approach to model interconnections between entities (e.g., people and places) to aid scholarly exploration and knowledge discovery, then it can benefit from the experience of the Beyond 2022 project by adopting a similar approach, in which declarative, standards-based R2RML mappings manage and control the ingestion of information into the knowledge graph (perhaps by independent actors or NLP processes).
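
The declarative idea behind this pipeline can be sketched in a few lines. The sketch below is not R2RML or R2RML-F: it is a plain Python analogue in which a mapping (a subject template, a class, and a list of predicate and column pairs) is data rather than code, so that a new spreadsheet needs only a new mapping. The `http://example.org/` namespace, the column names, the sample rows, and the function name are assumptions made for illustration; `E21_Person` is the CIDOC-CRM class for persons.

```python
import csv
import io

CRM = "http://www.cidoc-crm.org/cidoc-crm/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Hypothetical mapping for a "people" spreadsheet with columns id and name.
PERSON_MAPPING = {
    "subject_template": "http://example.org/person/{id}",
    "class": CRM + "E21_Person",
    "predicate_object_maps": [
        {"predicate": RDFS_LABEL, "column": "name"},
    ],
}

# Placeholder spreadsheet content; the second row has an empty name cell.
PEOPLE_CSV = "id,name\n1,Person A\n2,\n"


def csv_to_triples(csv_text, mapping):
    """Apply a declarative mapping to every CSV row and return triples."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = mapping["subject_template"].format(**row)
        triples.append((subject, RDF_TYPE, mapping["class"]))
        for pom in mapping["predicate_object_maps"]:
            value = row[pom["column"]].strip()
            if value:  # an empty cell yields no triple, much as a NULL does in R2RML
                triples.append((subject, pom["predicate"], value))
    return triples


triples = csv_to_triples(PEOPLE_CSV, PERSON_MAPPING)
```

Because the transformation logic in `csv_to_triples` never changes, supporting a new entity type (places, for example) would mean writing a second mapping dictionary rather than new code, which is the property that makes the ingestion pipeline scalable.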

An important requirement for a knowledge graph approach to be accepted is that historians should be able to scrutinize the contents of the knowledge graph, assess the quality of its contents, and modify triples where they are found to be erroneous. Because humans are fallible, errors will undoubtedly occur in the spreadsheets (or in the editorial choices of the TiNT project team). It should be possible for historians to check the triples generated by the content they have uploaded in order to ensure that no mistakes have been made. The semantic technologies adopted in this project allow the team to use existing tooling that can process RDF. While the Semantic Web technology stack, including the SPARQL query language for RDF, might present too steep a learning curve for certain historians, tools such as Ontodia enable historians to visually explore the knowledge graph. [27]

Historians in Beyond 2022 have already seen how one can easily discover the people associated with certain offices (e.g., “chancellor of Ireland”) and the overlap between different offices using Ontodia. In Figure 6, we demonstrate how one can discover people who have filled several positions, with one person, a certain Alexandre Balscot, being associated with all three. [28] The URI of that person can then be used to retrieve a page with additional historical information. [29] Tools like these allow subject matter experts (and other users, for that matter) to discover information that would have taken much longer to find by going through the manuscripts.

Figure 6. Knowledge Graph for Alexandre Balscot in the Beyond 2022 Project.
The Beyond 2022 project’s adoption of a semantic web knowledge graph to model the people and places that appear in and across copies of the Public Record Office of Ireland (PROI) manuscripts destroyed in 1922, along with their interconnections, is already aiding the discovery of links and insights that were previously difficult to achieve for lack of tools. The project’s experience of developing, deploying, and maintaining the knowledge graph, and of the processes and tooling this requires, should provide useful input to TiNT as it sets out to design a digital workspace that suits its ambitions. This approach offers a new way to visualize information embedded in the New Testament’s manuscripts and to perceive complex connections within the tradition that traditional scholarly approaches have not been able to explore. In particular, Beyond 2022’s focus on using web standards and avoiding proprietary approaches allows it to publish its data in a form that scholars can easily explore and that other LAM systems and environments can easily consume and integrate.

3. The TiNT Project’s Editorial Tool

As previously outlined, many of the manuscripts of interest to the TiNT project have already been digitized and are readily available for enrichment with metadata. The addition of new layers of annotations could form the basis for a knowledge graph to enable deeper exploration of the manuscript images. The first step in providing a seedbed for a knowledge graph is to mark up the digitized manuscripts with annotations in a suitable VRE. The New Testament Virtual Manuscript Room (NTVMR), an open source and customizable VRE, is already an established workspace within the research community. As outlined above, the NTVMR provides a facility for teams to collaboratively research, edit, and analyze digitized manuscripts. As part of the TiNT project, the NTVMR will be amended to allow manuscript images to be further annotated with feature descriptors and other pertinent identifying information. These feature descriptors constitute our editorial tool, since they will be the primary way that we produce project data. This data will be stored in a serialized format ready to be incorporated into a knowledge graph, thus enabling the kind of deepened, intuitive data exploration only made possible by digital archives or VREs.

As part of the TiNT project, the ADAPT Centre has been tasked with enhancing the NTVMR editorial tool to better enable archivists to enrich digitized manuscripts in the ways this project specifically requires. The NTVMR already includes functionality to add new tags and labels directly by hardcoding new label fields. The TiNT project specifies eight high-level information labels with several further nested subcategories. For instance, Title Type includes options for Inscription and Final Title (titulus finalis), with further nested specifications within each option for prologue titles, subscription titles, etc.

Figure 7. Bespoke TiNT dataflow.
As outlined in Figure 7, in order to avoid wholesale repetition in the code, ADAPT is creating a lightweight JSON-based framework that will abstract away much of the repetition. The framework will declare the required annotation fields, then generate the user input fields so archivists can annotate aspects of manuscripts. When the manuscript images are annotated, the data will be persisted in the database, serialized as JSON, providing a foundation to facilitate later uplift to a knowledge graph. The intention of opting for a more dynamic implementation approach rather than hardcoding the labels is to make the tooling easier to maintain and more readily extensible for enhancement as part of future research activities in the TiNT project or by the wider research community.
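
Since the framework is still in development, the following is only a sketch of the general pattern, not ADAPT's implementation. The schema layout, the field names `title_type` and `note`, the `widget` key, the manuscript identifier, and both function names are hypothetical; only the Title Type label and its Inscription and Final Title options are taken from the description above. The sketch shows a JSON declaration of annotation fields, the generation of input field descriptors from that declaration, and the serialization of a validated annotation as JSON.

```python
import json

# Hypothetical declaration of annotation fields; the real schema may differ.
SCHEMA_JSON = """
{
  "fields": [
    {"name": "title_type", "label": "Title Type", "widget": "select",
     "options": ["Inscription", "Final Title"]},
    {"name": "note", "label": "Note", "widget": "text"}
  ]
}
"""


def build_form_fields(schema):
    """Derive one input field descriptor per declared annotation field."""
    return [
        {
            "id": "field-" + field["name"],
            "label": field["label"],
            "widget": field["widget"],
            "options": field.get("options", []),
        }
        for field in schema["fields"]
    ]


def serialize_annotation(schema, manuscript_id, values):
    """Validate values against the declaration and return them as a JSON string."""
    declared = {field["name"]: field for field in schema["fields"]}
    for name, value in values.items():
        if name not in declared:
            raise ValueError("undeclared annotation field: " + name)
        options = declared[name].get("options")
        if options and value not in options:
            raise ValueError("invalid option for " + name + ": " + str(value))
    return json.dumps(
        {"manuscript": manuscript_id, "annotations": values}, sort_keys=True
    )


schema = json.loads(SCHEMA_JSON)
form_fields = build_form_fields(schema)
payload = serialize_annotation(
    schema, "example-manuscript", {"title_type": "Inscription"}
)
```

Under this pattern, adding a new label or subcategory means editing the JSON declaration rather than the tool's code, which is the maintainability benefit the dynamic approach is intended to deliver.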

This editorial tool is key to the data gathering aspect of the TiNT project, and its relationship to the larger NTVMR VRE and the details of its information labels and their subcategories continue to develop. Nonetheless, this model for data aggregation and retrieval enables efficient editorial engagement with manuscript images and builds a unique set of information that can inform the critical studies of the TiNT project’s core team members. It is clear that for the TiNT project to meet its stated critical goals there must continue to be close cooperation between philologists, computer scientists, and developers, as well as between institutions and academic centers. (The shared authorship of this article is clear evidence of this reality.) To this end, we hope that this article has laid the groundwork for future collaboration, especially the further development of digital tools for TiNT and the continued growth of the NTVMR as a VRE devoted to the study of the New Testament’s Greek manuscripts in all their dimensions.

