Inseri as a Potential IT Framework for Research Projects in Humanities: The Sanskrit Manuscripts Project as a Test Case

Olga Serbaeva, with revisions by Hans Cools, Jan Clemens Stoffregen, and Roberta Padlina, Universities of Basel and Zurich
Dedicated to the memory of Hans Cools who suddenly passed away in April 2021 in the middle of his thorough and fascinating work on Semantic Web Technology
Some of the most tedious technical problems that a scholar in the digital humanities faces today are connecting multiple software solutions from various origins, supplementing the missing ones, and turning the whole into a consistent and stable workflow. [1] The subject of the present article, Inseri, is neither a specialised tool for manuscript transcription nor an easy-to-use TEI editor: it is a framework that holds such software pieces together in a precise way and helps the researcher with the dataflow from one step of the project to the next. The aim of the article is, on the one hand, to present the philosophy of Inseri and, on the other hand, to go through the typical stages of a manuscript-based critical edition project, following the flow and the transformation of the data.

1. Challenges of applying IT in the humanities

A scholar in the humanities goes through the literature without much IT support, formulates a hypothesis based upon what has been found in the secondary literature, and tries to advance the field by analysing, most often manually, a source or a group of sources. The output of the whole process is a published book, which, for every subsequent scholar, will be just one more book on the table to go through, meaning that the output is not essentially different from the input. The reuse of material is minimal and comes down to brief citations. There is no control of supposed common ontologies, [2] nor a possibility to influence the building of those ontologies.

To redress this situation, the humanities should start to follow the path of the “precise sciences,” [3] with their strong emphases: research is collaborative, so the main actor is not a lone scholar facing a source text but a team of scholars, including IT specialists, facing the totality of the available data, from which the most relevant is chosen; pieces of text are not retyped but databases are queried; and the goal is not to write one more “stone” but to redefine the discipline’s ontologies within global linked data (Figures 1 and 2):

Figure 1.
Figure 2.
In the humanities there is already a tendency towards Open and Linked Science, with available and verifiable sources, methods, calculations, visualisations, and output, all of it linked in real time, supported and updated on the IT side. In this model the reuse and valorisation of previous results is directly included, and science remains alive, albeit online. Nevertheless, in Switzerland, with some rather rare exceptions, IT training is lacking in the humanities, where data analysis is still mostly done by hand. The desired IT development is unsupported by the university IT services, at least in humanities departments. IT involvement requires separate funding and/or presentation of the project as “interdisciplinary.” But how does using R or Matlab for the most basic calculations and visualisations constitute true interdisciplinarity?
In the present model scholars cannot influence software development. The chosen software often cannot be integrated into the later development of the same project. The required analysis is often not done because of lack of money, lack of IT support, and data conversion issues. Another big issue is the treatment of the data: outdated formats cannot be reused by the same project, let alone shared with others after the end of the project. Around 90% of ongoing project data is non-public, [4] and it is not reused after the end of the project, even if the final publication covers only a part of the work. In the humanities, less than half of the data is digital. Existing digital data, including e-books, e-journals, and databases, cannot be easily linked, used, shared, or reused.
There is no culture that valorises preliminary work, and the mammoth task of cleaning up the primary sources is rarely documented. Important data stored on hard drives disappears together with the project members. It is possible to estimate that 90% of routine work, such as data preparation, will not be shared, and that the next scholar addressing the very same material will have to start anew. Imagine such a situation in mathematics, physics, medicine, or biology. In the lucky case when the project data takes the semi-standardised form of an SQL database or an XML file, the lack of documentation about how it was built and its encoding procedures leads to the following: the databases simply die, the data cannot be recovered, and a person who did not work on the project in its active phase will never be able to analyse the data.
There is, however, an even deeper issue concerning the integration of research results into their fields. There is no direct link between actual discoveries and the “discipline ontologies,” and this constitutes a real problem. The ontologies used in a project, i.e. how a scholar believes the discipline is constructed, must be spelt out, and the results of the project must come back to the original ontology and show what new knowledge has been added at this structural level.
To summarise, scholars in the humanities would benefit from IT support designed to help with technical issues for the duration of a project. [5] That support should be standard, stable, free of extra charge, and attentive to the projects’ needs. It should be able to evaluate the utility and cost of external software solutions, help with data conversion, preservation, and reuse, and it should receive government funding, because it helps scholars in the humanities to cross the IT gap as it currently exists. Besides this, IT courses should be offered in every humanities discipline on a regular basis in the near future.

2. Existing IT solutions for Digital Humanities in Switzerland and some successful projects

Many Swiss universities have incorporated various organs dealing with the Digital Humanities (DH). To mention just a few: LADHUL at the University of Lausanne [6] includes teaching from BA to MA level; EPFL has the DHLAB [7] and a Masters in Digital Humanities; [8] there are also DH Basel and DH Bern. The University of Zurich hosts many Digital Humanities research projects, in the Faculty of Arts (PhF) and Computational Linguistics in particular; there are also SARI [9] and DSI. [10] A large number of successful critical edition projects are supported by NIE-INE and the universities of Basel, Bern, and Zurich. [11] The university libraries in Switzerland have created and provide access to e-publications, creating digital access to virtually any item possible. The project called e-codices, [12] hosted by the University of Fribourg, makes manuscript materials of excellent quality available online free of charge. All the addresses above are a mine of information on digitally available resources, IT training, DH colloquiums, publications, and successful projects.
Despite this blooming development, so far only the long-term storage of scientific project data has been tackled, thanks to the RDF-based KNORA platform, [13] developed by DaSCH. [14] Otherwise, successful projects have to secure the funding for the needed IT solutions themselves, often mid-way, and data migration done within the timeframe of an ongoing project impedes the research proper. A new solution is clearly necessary at the national level, and there is already a team working in this direction.

3. NIE-INE and its product Inseri as an innovative response to the open challenges

Inseri, a scientific framework developed within the NIE-INE project, has proven its potential to resolve some complicated IT issues for a large variety of academic projects, now mostly dealing with academic digital editions. Inseri is open to both big projects, supported by various national mechanisms, and smaller ones. Inseri consists of software, links, and connections, but it also has a team dedicated to its development. [15]
Inseri started in November 2017, in response to the insight that projects without programming staff do not know how to publish their research output digitally and maintain a state-of-the-art online publication containing static and dynamic content. Swissuniversities [16] stated in the approval of the second project phase that NIE-INE should explore the suitability of Inseri for other branches of science. Inseri is now offered by the UZH (department s3IT) as a service. The technical and other documentation can be found on GitHub. [17]
Inseri is organised in an innovative way, oriented towards working with online resources within a single platform: it integrates various types of them with ease (IIIF, SPARQL queries, RDF) and can bring the project directly into the context of Linked Data. Each Inseri app (more than 30 have been developed so far) supports multiple inputs, for example a locally editable JSON and a JSON tree coming from a third-party online resource, and the user can control how the inputs are connected based on their needs. The same input can be addressed by multiple applications, for example for visualisations. The classical workflow in the humanities of copy-pasting, uploading, reformatting, and printing out is replaced by “live” queries. Users can access all their project data in one place, be these e-books or scanned manuscript pages, and comment on them. This is also the best approach to data traceability (not copying, but linking to the original), resolving potential legal issues.
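As an illustration of the multiple-input principle, here is a hypothetical sketch in Python (the function and variable names are invented and are not Inseri’s actual API): a locally edited JSON is combined with a JSON tree fetched from a third-party resource, with the remote values taking precedence, as a “live” query would.

```python
import json

def merge_inputs(local: dict, remote: dict) -> dict:
    """Combine a locally edited JSON document with a remote JSON tree.

    Remote values win on key collisions, mimicking a 'live' query
    that always reflects the upstream resource."""
    merged = dict(local)
    merged.update(remote)
    return merged

# Local notes kept by the scholar, and a (simulated) third-party JSON tree.
local_notes = {"folio": "44v", "comment": "damaged margin"}
remote_tree = json.loads('{"folio": "44v", "width": 2480, "height": 3508}')

print(merge_inputs(local_notes, remote_tree))
# {'folio': '44v', 'comment': 'damaged margin', 'width': 2480, 'height': 3508}
```

The same pattern generalises to any number of inputs; the point is that nothing is copied, only combined at display time.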
The community includes both IT specialists and scholars in the humanities, who create and upgrade IT solutions together, in constant communication with the research projects. The projects are kept updated on new IT solutions, and they get the necessary advice concerning the software that would best fit their needs. Project members can learn how to use those solutions in a supported way.
Inseri is an open framework, which projects can join and develop. The results, data, and code are valorised and conceived as reusable between participating and future projects, for example via published pages and templates. The choices made by the NIE-INE team tend towards independence from any commercial or proprietary software, yet virtually any data format can be integrated. The projects remain the masters of their data at every stage, have the last say on the choice of software and data format, and can change their wishes in the process.
As for the workflow, NIE-INE and Inseri allow new people to join quickly. The team has also found a variety of innovative solutions for data conversion, in particular from XML to RDF and from Word to XML. [18] Although Inseri is still a work in progress, and there are some bugs to be resolved in applications still under development, the potential of this framework is significant.
Another strong part of the NIE-INE project is the implementation of Semantic Web Technology (further SWT), mainly by Hans Cools, which merits an article of its own. [19] Inseri and SWT complement one another: SWT requires the project data to be in RDF, while Inseri can communicate with RDF data with ease. On the other hand, SWT opens a whole new world of data enrichment, data clean-up, data analysis, and resulting visualisations via Machine Reasoning sessions, which can best be presented in Inseri.

How does all this change the way people do DH? To demonstrate the data flow and reuse in the humanities, the following two pictures were prepared (Figures 3 and 4). The first shows the “traditional data flow” and the second a data flow within NIE-INE, based on Inseri and SWT. In the traditional mode (Figure 3), the final product of the scientific work is not different in nature from the original input, i.e. a text source.

Figure 3.
The whole palette of facilitators of the work, such as various databases, indexes, ontologies, and IT solutions, simply does not make it into the final product, the paper publication. These aspects can only live on if a contribution is also expected from the project at this level, and if the final product is not a brick of a book but a living project at a particular and well-defined stage of its development. To put it in simple language, a project in the humanities is a tree lovingly grown from a seed, and as soon as the team obtains the fruits, i.e. printed books or articles, it chops down the whole tree (the IT infrastructure and locally developed software). The change we propose consists in growing fruit continuously on the same tree, sometimes grafting on a branch from another tree (true interdisciplinarity), valorising and preserving every stage of the project’s growth, not just the final fruit.

This very philosophy has naturally grown within NIE-INE and Inseri (Figure 4), where the IT solutions and databases are reused. Each project contributes also to the IT development by providing the questions and the problems and commenting on the proposed solutions.

Figure 4.
As is evident from this model of data flow, the “final product,” in its new understanding, includes almost all parts of the tree, not just published outputs. Moreover, if the project is finished but its tree (long-term IT support) still exists, the project can be revived and upgraded quickly. Outside of the framework, hidden on some esoteric server, the project data is as good as non-existent, and it does not contribute to the development of its discipline.

4. Case Study: Stages of a manuscript-based academic project within the Inseri framework

The whole workflow is divided into seven distinct areas (Figure 5). A project might use all seven or only a few. These areas are conceived in such a way as to cover the totality of the academic work, from raw data collection through analysis to, finally, publication. What follows is described with a view to implementing Semantic Web Technology, which requires multiple preparatory steps, namely data conversion into RDF.

Figure 5.
The “Add” aspect covers the workflow from raw data upload or queries, from adding metadata and managing bibliography to the formulation of hypotheses and methods. Inseri here acts as an IIIF viewer with a comment function, and as a store comparable to Dropbox, Google Drive, or OneDrive. The “Admin” stage is about administering the project, including such tools as adding users, defining access, etc.
The “Enrich” part covers transcribing and various kinds of TEI/XML tagging and annotation of the raw data, which is transformed from images into texts and numbers. Inseri here can be used like TextEditor or MacDown, or any HTML editor and viewer.
The “Analyse” block consists of various tools made for human and computer-assisted analysis (for example, corpus linguistics methods or statistical analysis), such as the comparison of two or more images, text version comparison, finding textual parallels, and collating variants with a view to producing a diplomatic/critical edition. Inseri here integrates Python and R code via microservices.
The “Visual” stage is already well-developed in Inseri, which has more than twenty different visualisation applications, all of which can directly use JSON data. This process is comparable to PowerPoint, Keynote, or Prezi for slides and to Qlikview, R Shiny, or Kibana in terms of accessing databases visually.
The “SWT” module includes the first-step formalisation, that is, a local ontology [20] that then becomes part of a general ontology (second-step formalisation) in order to enable Machine Reasoning. [21] This step offers scholars an opportunity to formulate very complex queries that are beyond the capacities of the search queries in relational databases, for example. Inseri does not provide the environment to create the RDF, but it can query it by integrating SPARQL, which can then be used to visualise ontologies.
Finally, the “Publish” area provides various options for generating output that can be immediately used for producing dynamic webpages or presentations. Inseri here serves as a full-fledged dynamic front end, comparable to WordPress, Squarespace, and Wix. Compared to other VREs, Inseri is open to the integration of any desired components, but it lacks the specific features of manuscript-oriented VREs such as READ, which appears to be a good choice for manuscript transcription and tagging. [22] Ideally, the output of READ should become an input for Inseri, and more tests are to be done in that direction.

Sample procedure, A-Z in a few examples

Suppose that the scholars have collected the digital images of the necessary manuscripts (with permissions, etc.) and have compiled a bibliography with all available sources and literature, part of which is available in open access, while some books are online with university access and some items have been scanned for the project. The aim, timing, roles, and funding have been sorted out. The head of the project logs into Inseri, creates a project and the user groups. At the same time a webpage with the project description, aims, timeframe, and other general information is set up. In what follows I shall test Inseri with the real data of my research project on the Sanskrit manuscripts and the Jayadrathayāmala. [23] I will dwell only on those features that would be difficult to achieve for a scholar working without Inseri.

Step 1. Data collection

Inseri provides both query options, if the resources are available online, for example via IIIF or any other data source with RESTful APIs, such as ARKs, and simple upload from the user’s desktop. There are about 70 templates already available to query various IIIF institutions. The scholar can work, for instance, with millions of scanned manuscript pages directly from within the Inseri project by changing a single URL (Figure 6.1). [24]
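The single URL that drives such a query follows the IIIF Image API pattern {identifier}/{region}/{size}/{rotation}/{quality}.{format}. A minimal sketch in Python (the base URL and manuscript identifier below are invented for illustration):

```python
def iiif_image_url(base: str, identifier: str, region: str = "full",
                   size: str = "max", rotation: str = "0",
                   quality: str = "default", fmt: str = "jpg") -> str:
    """Build an image request URL following the IIIF Image API pattern:
    {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}"""
    return f"{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{fmt}"

# Hypothetical repository and manuscript identifier:
print(iiif_image_url("https://example.org/iiif", "ms-berlin-001"))
# https://example.org/iiif/ms-berlin-001/full/max/0/default.jpg
```

Switching to another page or manuscript then amounts to changing the identifier, which is exactly what the Inseri templates automate.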

Figure 6.1. © Public Domain Mark 1.0, Staatsbibliothek zu Berlin.

The environment is similar to that of IIIF viewers but can be fully customised. The user can navigate from page to page via the arrows in the bottom-left corner of the apps (Figure 6.2) or use an extended search to get an overview of all pages of the manuscript (Figure 6.3).

Figure 6.2. © Public Domain Mark 1.0, Staatsbibliothek zu Berlin.
Figure 6.3. © Public Domain Mark 1.0, Staatsbibliothek zu Berlin.
The user can comment on the content of a book and search for their own comments. The comments are linked to a precise page of a given resource, and clicking on a comment opens the corresponding page/folio of the resource (Figure 6.4).
Figure 6.4. © Public Domain Mark 1.0, Staatsbibliothek zu Berlin.

Step 2. Administration

This part has not yet been developed within Inseri beyond access control and groups, but it should include an address book of the project participants, a calendar and to-do lists linked to the DataControlApp, minutes of meetings, storage for administrative documents, a simple budget management application, and a password manager. Ideally, Inseri should also offer a secure communication tool, such as video conferencing (Zoom, Talky, Jitsi) or a project chat with the possibility of file exchange (Slack).

Step 3. Enrich Data

Here the “raw data” (images, for example) in various formats is transformed into stable and clean data (transcribed texts), ready for manual, semi-automatic, and automatic analysis. [25] For the transcript, the Text Editor app was used (Figure 7.1).

Figure 7.1. © Public Domain Mark 1.0, Staatsbibliothek zu Berlin.

The Plaintext Viewer app enables the immediate publication of the transcript on the Inseri page (Figure 7.2). In the enrichment part we can also tag, according to TEI standards, all geographical names, references to kings, time and place, religious traditions, names of people, and names of other texts, according to the specificity of the text and the planned analysis. [26] When those selected entries are indexed, they are linked to a widely accepted spelling of proper names and, whenever possible, to external data, for example the GND and other “authority databases,” including those compiled by the projects. All that directly contributes to Linked Data, besides helping to define with precision the date, provenance, and position of the text being edited.

Figure 7.2.
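To give a concrete flavour of such tagging, here is a hypothetical sketch: a TEI-style fragment with persName and placeName elements (the ref values are invented placeholders) is parsed with Python’s standard library, yielding the entries to be indexed and linked to authority data.

```python
import xml.etree.ElementTree as ET

# A TEI-style fragment; the ref attributes are invented placeholders
# that would point to authority records (e.g. the GND) in a real project.
fragment = (
    '<p>The king <persName ref="#jayadratha">Jayadratha</persName> '
    'ruled near the <placeName ref="#sindhu">Sindhu</placeName>.</p>'
)

root = ET.fromstring(fragment)

# Collect every tagged entry together with its authority reference.
entries = [(el.tag, el.text, el.get("ref")) for el in root]
print(entries)
# [('persName', 'Jayadratha', '#jayadratha'), ('placeName', 'Sindhu', '#sindhu')]
```

In a real project the tags would carry the TEI namespace and the references would resolve to external authority records, but the indexing principle is the same.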

Only once this resource preparation is done can we begin to speak of an edition proper. An edition is a process involving two aspects of working with the text: enrichment via tagging and refinement via analysis (Figure 7.3). Both processes lead to corrections, conjectures, and reformulations of the notes; thus the final text cannot be produced until the necessary number of upgrading rounds has been completed. [27]

Figure 7.3.

Step 4. Data Analysis

Inseri includes as microservices semi-automatic procedures belonging to the domain of corpus linguistics (implemented in Python, with R under development). By means of (Computational) Corpus Linguistics, we eliminate the remaining incoherencies from the text, basically transforming the text into a stable database. Inseri, however, does not impose any particular data format; therefore, RDF shall be mentioned as one of the possible options. In this case, every single element of data (line/word/text, depending on the project) has its own identifier (IRI), [28] and those IRIs can be used to search for and mark parallels and other traces of influence within other texts. The whole network of internal and external (to the text) connections that has been discovered is made openly available, so that the next project can build upon the networks already manifested and visualised. Potentially, the whole literature belonging to a single language can be mapped with its links and historical developments within a few years, and every new e-text coming up just enriches the structure and makes it more precise.
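The IRI idea can be sketched in a few lines of Python (the namespace and verse texts below are invented): each verse receives a stable IRI, and parallels between texts are then recorded as pairs of IRIs rather than as copied text.

```python
BASE = "https://example.org/skt/"  # invented namespace for illustration

def mint_iris(text_id: str, verses: list) -> dict:
    """Assign a stable IRI to every verse of a text."""
    return {f"{BASE}{text_id}/v{i}": v for i, v in enumerate(verses, start=1)}

# Two toy 'texts' sharing one verse:
jy = mint_iris("JY2", ["verse alpha", "verse beta", "verse gamma"])
other = mint_iris("TST", ["verse beta", "verse delta"])

# Exact-match parallels, expressed as pairs of IRIs.
parallels = [(a, b) for a, va in jy.items()
             for b, vb in other.items() if va == vb]
print(parallels)
# [('https://example.org/skt/JY2/v2', 'https://example.org/skt/TST/v1')]
```

Because the parallel is a pair of identifiers, it survives any later correction of the verse text itself.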

For the Sanskrit Manuscripts Project, I have tested the visualisation within Inseri of results obtained outside of it (with R). The results of a comparison of 29 Sanskrit texts with respect to their internal and external parallels, set in relation to their length and thus expressed as coefficients, were originally stored in Excel. This was converted into JSON and pasted into Inseri, immediately enabling us to obtain dynamic visualisations (Figures 8.1–5).

Figure 8.1.
Figure 8.2.
Figure 8.3.
Figure 8.4.
Figure 8.5.
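The Excel-to-JSON conversion step just described can be sketched as follows (the column names and coefficient values are invented): the spreadsheet is exported as CSV and converted into the JSON records that Inseri’s visualisation apps consume.

```python
import csv
import io
import json

# A toy CSV export of the coefficients table (invented values):
csv_data = """text,internal_coeff,external_coeff
JY1,0.42,0.17
JY2,0.38,0.21
"""

rows = csv.DictReader(io.StringIO(csv_data))
records = [{"text": r["text"],
            "internal": float(r["internal_coeff"]),
            "external": float(r["external_coeff"])} for r in rows]

print(json.dumps(records, indent=2))
```

The resulting JSON can be pasted into an Inseri app as a local input, exactly as described above.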
It is worth noting that all of these steps can be accomplished even by those without any serious IT training, and Inseri offers tutorials and templates to boot. IT support comes in handy when there are data conversion issues. To summarise, the steps above serve, in a certain sense, as preparation for the implementation of Semantic Web Technology, which, at present, can rarely be done by a lone scholar without expert assistance. But before going into SWT, we shall describe the importance of “Visualisations” and their place in the whole process.

Step 5. Visualisations

Visualisation is not, properly speaking, a step, but a transversal line of visual statistics that can be of use from the earliest stages of the project. For instance, already in the data collection and enrichment phase, the user can see, for example, that on a given date they have uploaded 99% of the planned files, tagged 53% of the pages uploaded, etc. Having this visual representation can be important not only for the scholars, but also for the funding bodies, who can thus easily track progress against the provided research plan.
We should not forget that for conferences and publications users often use and reuse the same images, schemas, graphs, etc. The multiple input options of Inseri, with online publishing of any selected page, allow the user to update those visualisations “live”; they do not need to be redrawn because of format incompatibility, for example. The visuals created and stored in Inseri are made only once; they can be modified and reused as many times as necessary and immediately published online, enabling interested scholars to access the up-to-date charts via a link rather than a printout of conference PowerPoint slides. Besides demonstrating what has already been done, the visualisation tools also enable the discovery of the deeper structure of the project, that is, its ontology. In a larger sense, visualisation is also a method for dealing with big data in the humanities. [29]

Let us provide two simple examples of visualisations (Figures 9.1.1–2 and 9.2.1–2): visual representations of the length of the chapters in verses and of the manuscript coverage of the same text, Jayadrathayāmala, ṣaṭka 2, both of which can be easily sorted by value.

Figure 9.1.1.
Figure 9.1.2.
Figure 9.2.1.
Figure 9.2.2.

Step 6. Semantic Web Technology and machine reasoning

The SWT is presented here, for historical reasons, as an independent step, but ideally it should, like visualisation, start from the very beginning of the project. Only projects that already have their data in RDF can be “admitted” to this step, and it is best if the results of preliminary analysis (e.g. CCL) can also be formalised. In this sense, SWT is the cherry on the cake for an edition project, one that still requires a lot of additional work and funding. But what does this additional step bring?
Suppose a project ends with RDF data in a triple store, incorporating the data model chosen by the scholar and a large part of the results of non-RDF-based analyses. Once the project ontology has been created, the participation of a professional ontologist is the best option for any further steps. The raw data model of a given project in RDF (first-step formalisation) is then transformed into a generic ontology, i.e. the data objects used in the project are linked to objects existing in the real world from within a widely accepted logical frame.
At the same time, it allows the scholar to ask very complicated questions that would require massive human and computer resources to answer without machine reasoning. With SWT one can bring the research project to a different level (compared to a non-RDF-based project) in every single step that we have listed for Inseri. Let us briefly explain this important point. Even data collection can be formalised from the start, that is, every piece of data, be it a manuscript, a verse, or a citation, can be named as such and receive an IRI that will accompany this information for the whole duration of the project and even beyond it. This solves the problem of infinite corrections and enrichment and the corresponding tracking of changes and versioning. The objects, i.e. the “boxes” or “files” to put the information in, can already be imported from the ontologies created by NIE-INE, and these can be enriched and customised. At the enrichment step, every tag, every colour, and every commentary can also be put in a fitting RDF “box” from the start. The earlier this is done within a project, the richer the links to be discovered. The whole data enrichment process gains a new meaning with SWT: the project data become linked not just internally, but to the whole world of already formalised data. The project becomes a piece linking data together and a part of the universal open RDF network.
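A minimal sketch of this “boxing” in Python (all IRIs and property names below are invented): the project data is held as subject-predicate-object triples, so every object is addressable by its IRI from the start.

```python
EX = "https://example.org/jy/"  # invented namespace for illustration

# RDF-style triples: (subject, predicate, object).
triples = [
    (EX + "ms1", EX + "isManuscriptOf", EX + "JY_satka2"),
    (EX + "ms1", EX + "hasFolio", EX + "ms1/f44v"),
    (EX + "JY_satka2", EX + "hasChapter", EX + "JY_satka2/ch17"),
]

def objects_of(subject: str, predicate: str) -> list:
    """All objects linked to a subject by a given predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]

print(objects_of(EX + "ms1", EX + "hasFolio"))
# ['https://example.org/jy/ms1/f44v']
```

A real project would store such triples in a triple store and type them against an ontology, but the addressability of every piece of data is already visible here.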
As an example, I have tried visualising a simplified ontology describing the classes of Sanskrit Tantric texts listed in Jayadrathayāmala, ṣaṭka 1, chapters 35–45. This ontology was first scripted in Protégé, then uploaded to GitHub, and used with an Inseri application called SPARQL-Visualizer. Only a tiny part of the visualised links fits the screen, giving an idea of the complexity of the interrelations of some 500 texts and traditions (Figure 10).
Figure 10.
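The kind of query such an ontology supports can be imitated with a naive triple-pattern matcher (the toy graph below uses real text names only as labels; the relations are invented for illustration):

```python
# Toy graph of text interrelations; the relations are illustrative only.
graph = [
    ("Jayadrathayamala", "mentions", "Brahmayamala"),
    ("Jayadrathayamala", "mentions", "Siddhayogesvarimata"),
    ("Brahmayamala", "belongsTo", "Vidyapitha"),
]

def match(pattern: tuple) -> list:
    """Match a (s, p, o) pattern; None plays the role of a SPARQL variable."""
    return [t for t in graph
            if all(x is None or x == y for x, y in zip(pattern, t))]

# Analogue of: SELECT ?o WHERE { :Jayadrathayamala :mentions ?o }
print([o for _, _, o in match(("Jayadrathayamala", "mentions", None))])
# ['Brahmayamala', 'Siddhayogesvarimata']
```

A real SPARQL endpoint does the same matching over millions of triples, with joins across several patterns at once.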
However, the strongest advantage of SWT is automated analysis with machine reasoning, which has proved to be of great help in automatic data cleaning and the elimination of incoherencies. [30] As for reasoning, it is possible, by taking into account a multitude of connections of a personal, historical, or geographical nature, to create a reasoning session about the likely date and provenance of a text. Or, by simultaneously taking into account various interlinked historical factors, we could suggest that the date of a given manuscript is actually expressed in a different calendar system than previously thought. Indeed, provided that every piece of data has an IRI, it may be even more effective to do CCL and statistical analysis via machine reasoning, which could, in the future, replace a good part of human-made and semi-automatic analysis. [31]
Although, from a non-specialist’s point of view, it might appear complicated to create a reasoning session for a specific project, it will be possible to create templates within Inseri, and the user will be able to run the sessions independently via a machine reasoning microservice. These sessions, the templates, and, of course, the results of the sessions done for the NIE-INE projects constitute cutting-edge DH approaches. They modify the way science in the humanities has been done so far, making it, at the same time, open and reusable.
But there are even deeper implications. [32] Potentially any project can have an impact on the discipline’s ontology, and the existence of this deep ontology link helps to ensure that such an impact will not get lost, but that the results might enrich or even change the discipline ontology. Yet with so many ontologies being created and maintained, [33] how many scholars in the humanities, who are supposed to contribute to the process as “domain specialists,” have ever heard of the “Semantic Web”? It is not surprising that even a project combining the humanities and IT cannot influence an ontology being created now. [34] This link simply does not exist for the humanities, and the implementation of Semantic Web Technology on a large scale is the only way for the humanities to have a direct impact upon the “universal” ontologies, and thus to be included in Linked Data as an authority.
On the other hand, it is very clear that humanities projects are unlikely to be able to reach the SWT step all by themselves; thus Inseri, if it integrates RDF from the start, contributes to the link between the humanities, SWT, and machine reasoning by assisting the projects in transforming their data up to the level of a formalised ontology.

Step 7. Sharing the project results, publishing

Inseri assists the user in generating various kinds of output, well beyond classical print books. The user can perform full data extraction, including the RDF; format and convert the data in preparation for the publication of a book or an e-book; extract data for long-term archiving; and publish web applications (e.g. Wordcount in Python) and all resulting data visualisations. Inseri provides an interactive training platform for new projects, with code running live, making scholars in the humanities more and more independent and informed in their choice of frameworks and software.
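As an indication of scale, the core of a word-count application of the kind mentioned above fits in a few lines; this is a generic sketch, not the actual Inseri Wordcount code.

```python
from collections import Counter

def wordcount(text: str, top: int = 5) -> list:
    """Return the most frequent whitespace-separated tokens."""
    return Counter(text.lower().split()).most_common(top)

print(wordcount("om namah sivaya om namah sivaya om"))
# [('om', 3), ('namah', 2), ('sivaya', 2)]
```

Wrapped as a microservice, such a function can be published as a live web application directly from an Inseri page.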

Here is an example of a complicated web page layout bringing together a query to IIIF (e-codices), the image (Figure 11), and a description of the image made by a researcher. [35] The globe sign in the top-right corner signifies that the page has been published. The scholar can thus combine “tiles” containing materials of different provenance and in various formats with the utmost ease.

Figure 11. Cologny, Fondation Martin Bodmer, Cod. Bodmer 708: Kedārakalpa, f. 44v (CC BY-NC).

5. A new academic paradigm in the humanities?

The humanities are becoming truly digital. Scholars must learn IT tools, and the Inseri framework is the best non-invasive way to do so. Inseri allows the direct reuse of existing digital data and facilitates the creation of new resources that correspond to up-to-date standards. Inseri is the most useful existing service for the humanities at the national level, with the potential for an important international academic impact. It is an example of living, long-term IT support, which is the only guarantee that project data can live, be used and reused, updated, and upgraded in real time. It is a place to teach DH projects practical IT applications so that they can make educated choices regarding software. It can be used as a sandbox for trying out various applications with real research material. Frameworks like Inseri can contribute to the formulation of a new research paradigm in the humanities, namely the shift from traditional knowledge to a Knowledge Management system. [36]

Figure 12 is a simplified representation of the Inseri framework, as of October 2020, adapted to edition projects; it can, however, accommodate virtually any research project regardless of discipline. Only minimal details and interconnections are shown: solid arrows stand for existing, working connections, while dotted and other incomplete lines represent areas for further development.

Figure 12.
On the left, the column "Step" outlines the most important steps of any research project, and the column "Content" lists selected tasks carried out within a given step. For clarity, the "Editorial process" is placed between "Enrichment of the primary sources" and "Analysis," reflecting the domain best developed within NIE-INE and thus within Inseri. Each step of the project provides material for generating statistical data and visualisations, which serve to monitor the data, the workflow, and the progress of the project. These visualisations can in turn be integrated directly into the final step, i.e. the various forms of "Publishing."
The steps are conceived as following one another from top to bottom, and each step produces an output ("Project output" column) that becomes the input for the next step. A corresponding IT development is matched to each project output, reflecting the state of affairs in October 2020.
As the schema makes evident, to reach a full-blown implementation of SWT and machine reasoning, the project data must undergo multiple steps; even RDF data require further formalisation. But going the extra mile in formalising project data yields a fruit very rare in the humanities: a direct influence upon the discipline-specific, and even the general, ontologies that are still being built.
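What such a formalisation step looks like in practice can be sketched in plain Python, serialising project statements as RDF triples in N-Triples syntax. The namespace, IRIs, and property names below are hypothetical illustrations, not the ontology terms actually used by the project.

```python
# Hypothetical project namespace; a real project would use stable,
# dereferenceable IRIs and terms from its published ontologies.
BASE = "https://example.org/project/"

def triple(subject: str, predicate: str, obj: str) -> str:
    """Serialise one RDF triple in N-Triples syntax (IRIs in angle brackets)."""
    return f"<{BASE}{subject}> <{BASE}{predicate}> <{BASE}{obj}> ."

triples = [
    triple("manuscript/JY-A", "isWitnessOf", "text/Jayadrathayamala"),
    triple("text/Jayadrathayamala", "belongsToGenre", "genre/tantra"),
]
print("\n".join(triples))
```

Once statements take this machine-readable form, they can be loaded into a triple store, queried, and submitted to a reasoner, which is precisely the further formalisation the schema calls for.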
The project output (the whole central column, consisting of four sub-columns) of a research project within the Inseri framework thus extends well beyond a published book, including the development of workflows, IT solutions, conversion tools, visualisations, ontologies, and even machine reasoning sessions. It is linked directly to everything else via "Authority Data," introduced at the project level already at the "Source Enrichment" step, and it can thus have a direct impact on domain and general ontologies and become part of Linked Data.
To summarise, the Inseri framework has the potential to let projects in the humanities create their own web application environments without forcing scholars to write code themselves. The framework provides a workflow that carries an academic project from the upload of primary sources through various kinds of analysis to digital publication. Compared to the traditional academic workflow in the humanities, Inseri offers control and a precise overview of every single step of the process, because it keeps all data in one place and documents what has been imported and what is a genuine discovery, thus leading to greater clarity and better data protection. The framework is an ideal place for collaborative research projects, where groups can work on the same data and the final project data can easily be reused by other academic projects. In brief, Inseri creates the missing link between the humanities and cutting-edge IT.
By providing a research frame in which all steps of the research process are valorised, documented, and interlinked, Inseri contributes to the change of research paradigms in the humanities from traditional forms of knowledge to Knowledge Management Systems.


Abecker, A., and L. van Elst. 2009. “Ontologies for Knowledge Management.” In Staab and Studer 2009:713–734.
Bühnemann, G., and M. Tachikawa. 1990. The Hindu deities illustrated according to the Pratiṣṭhālakṣaṇasārasamuccaya. Bibliotheca Codicum Asiaticorum 3. Tokyo.
Burghart, M., ed. 2017. Creating a Digital Scholarly Edition with Text Encoding Initiative. DEMM-2017.
Carlucci, D. 2019. “Beyond Lessons Learned: Opportunities and Challenges for Interplay Between Knowledge Management, Arts and Humanities in the Digital Age.” In Handzic and Carlucci 2019:241–252.
Handzic, M., and D. Carlucci, eds. 2019. Knowledge Management, Arts, and Humanities: Interdisciplinary Approaches and the Benefits of Collaboration. Heidelberg.
Staab, S., and R. Studer, eds. 2009. Handbook on Ontologies. 2nd ed. International Handbooks on Information Systems. Heidelberg.
Stefanowitsch, A. 2020. Corpus Linguistics: A Guide to Methodology. Textbooks in Language Sciences 7. Berlin.
Stevens, R., and Ph. Lord. 2009. “Application of Ontologies in Bioinformatics.” In Staab and Studer 2009:735–756.
Sure, Y., S. Staab, and R. Studer. 2009. “Ontology Engineering Methodology.” In Staab and Studer 2009:135–152.


[ back ] 1. This article is written from a perspective of a scholar in the humanities who adopts the IT methods, rather than from the point of view of an IT technician. Our original idea was to test Inseri as a frontend for a Sanskrit manuscript project, which is centered on the Jayadrathayāmala. The project is independent from Inseri, but the real project data were used for testing the framework. I would like to warmly thank Dr. Claire Clivaz and Dr. Garrick Allen for their thorough editorial work.
[ back ] 2. An ontology is a formalised representation of the knowledge of a discipline.
[ back ] 3. The main lines of knowledge construction in the bioinformatics domain can be transposed to the humanities; see the schemas in Stevens and Lord 2009:737–741.
[ back ] 4. Calculation based on author’s working experience for multiple research projects, 2015–2020.
[ back ] 5. Science IT departments of the UZH do it already, but it has not yet reached the humanities.
[ back ] 18. Developments by Reto Baumgartner applied to some of the NIE-INE projects.
[ back ] 19. In preparation by Olga Serbaeva. More on SWT in Step 6 below.
[ back ] 20. (extensive documentation on SWT and Machine Reasoning),
[ back ] 21. Available to the Inseri users through a dedicated microservice.
[ back ] 22. READ includes tools for palaeographical analysis, transcription, wordlists with links to dictionaries, tagging, and various exports, tested by the author in December 2020.
[ back ] 23. The project is described here:
[ back ] 25. The up-to-date theory and standards of scholarly editing, including these preliminary steps, are described in Burghart 2017.
[ back ] 27. See Stefanowitsch 2020:15, speaking about the edition problems in terms of “readability” and real world “validity.”
[ back ] 28. IRIs allow passing to the next step of the analysis, namely the Machine Reasoning, see step 6 on that.
[ back ] 29. On that see Carlucci 2019:242–246. This is also visible in the results of the Delille "life of citations" research project, which opted to use both Inseri and SWT.
[ back ] 30. Delille project within NIE-INE.
[ back ] 31. A separate article on machine reasoning applied to the Sanskrit manuscripts and the Jayadrathayāmala is in preparation.
[ back ] 32. Inspired by the orthogonal process depicted in Sure, Staab, and Studer 2009:137–138 in the context of describing the Knowledge Meta Process. The verification loops on p. 139 illustrate the advantages of applying SWT, while on p. 143 one finds an illustration of how the ontologies themselves can change under the influence of the results of academic research.
[ back ] 33. Within NIE-INE alone, about 50 general ontologies have been created, mainly by Hans Cools and Roberta Padlina.
[ back ] 36. More on that in Abecker and van Elst 2009:717.