1. Pliny the Elder and statistical tools
1.1 Pliny the Elder and the language of science
1.2 The L.A.S.L.A., HyperbaseWeb, and the second book of the Natural History
A recent study by Poudat and Landragin offers a complete description of the methods and instruments available for corpus-based research, showing the range of options open to every scholar and indicating which methods are preferable depending on the nature of the research. I therefore refer the reader to that work for a fuller account of corpus-based research, and I will focus here only on the tools that are useful for this specific study.
- The “Search” instrument allows users not only to find a specific form surrounded by a certain span of text (which can be selected by the user), but also to search for all the forms corresponding to a certain morphological analysis (for instance, all the substantives of the second declension) and all the forms deriving from a certain lemma. The user can also look for sequences combining forms, lemmata, codes, and unspecified words.
- Theme or specific co-occurrents
- This function allows the user to find the co-occurrents of a certain form, lemma, or morphological code. The user can indicate the span of text considered for co-occurrence: a paragraph might be chosen for thematic research, while a sentence might be more appropriate for a strictly linguistic analysis. The user can also choose whether the co-occurrence is calculated on the lemmata, the forms, or the codes of the words included in the span. Finally, the user can decide to filter the results so that the calculations take into account only some grammatical categories (for example, only the verbs or only the substantives). HyperbaseWeb also shows second-degree co-occurrents, i.e. words that co-occur with the co-occurrents being considered.
- The distribution tool combines different kinds of functions whose aim is to show how linguistic features are distributed in a corpus formed by several texts. In particular, the z-score makes it possible to show which grammatical or lexical features differentiate each part of the corpus. The results can be visualized as a histogram, which can display the z-score, the absolute frequency, or the relative frequency of a certain form, lemma, or code. The program can also generate a Correspondence Analysis (CA), which plots on a Cartesian graph the relative positions of texts and features (or terms) in order to highlight oppositions or, on the contrary, correlations among parts of the corpus, based on the words or grammatical categories chosen by the scholar. Another available graphical representation is the tree analysis, which organizes in branches either the texts of the group or the chosen categories according to their proximity to or distance from every other element. The number indicated on a node shows the priority in the grouping of elements. The length of the path between two elements (measured by following the branches) indicates the distance between the two.
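The z-score mentioned above can be illustrated with a minimal sketch. It assumes the usual normal approximation to a binomial model, in which the expected frequency of a feature in one part of the corpus is proportional to the size of that part. The exact formula implemented by HyperbaseWeb is not given here and may differ. The function name and the counts are invented for illustration.

```python
import math


def z_score(observed: int, part_size: int, corpus_freq: int, corpus_size: int) -> float:
    """Standardized deviation of a feature's frequency in one part of a corpus.

    Assumption: binomial model with normal approximation; this is not
    necessarily the computation performed by HyperbaseWeb.
    """
    p = corpus_freq / corpus_size          # probability of the feature in the whole corpus
    expected = part_size * p               # frequency expected in the part under study
    variance = part_size * p * (1 - p)     # binomial variance
    if variance == 0:
        raise ValueError("the feature must occur in some but not all tokens")
    return (observed - expected) / math.sqrt(variance)


# Invented counts: a feature occurring 500 times in a corpus of 100,000 tokens
# and 120 times in a part of 10,000 tokens (50 occurrences would be expected).
z = z_score(observed=120, part_size=10_000, corpus_freq=500, corpus_size=100_000)
print(round(z, 2))
```

A positive value indicates that the feature is over-represented in the part under study, a negative value that it is under-represented, and a value close to zero that its frequency is close to what the size of the part would predict.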
2. Seneca’s Natural Questions and Pliny’s Natural History
2.1 Statistical Data
Let us compare, as an example, two paragraphs in order to sketch out the differences between the two authors. In order to identify comparable sentences, we will choose a passage in which both authors deal with the movement of rapid winds. Even though both paragraphs treat the atmospheric phenomenon of “accidental winds,” the reasons why the subject is brought up in the text are distinct. While Pliny provides a systematic description of all the atmospheric phenomena, which includes typhoons, etc., Seneca focuses on the confutation of Epigenes’ theory of the origin of comets, which states that they might arise out of cyclones. Pliny insists on the actual description of the landscapes and the natural elements that cause the formation of storms; Seneca’s paragraph is, on the contrary, focused on the necessity of showing that the evolution of cyclones prevents the possibility that comets might originate from them:
We also find in Seneca a rhetorical question, which is clearly a way of convincing the reader. Another striking element is the absence, in Seneca, of relative clauses. Let us now see how Pliny deals with the subject:
The TLL proposes four categories to describe how the adverbial et can be used: additive, cumulative, iuncturae, singularia. Because the research tool enables searches for a specific lemma, we can directly find all occurrences of the adverbial et (LEM: ET_1) in Pliny’s text. Many examples of et follow a conjunction or an adverb. In this case the first particle determines the ‘role’ of the added element: for instance, in sed et (uero et) the added element contrasts somewhat with the previous one; quin et adds an element that emphasizes what has just been written; ideo et announces that the added element is a consequence of what was stated in the previous sentence. Unlike in Pliny, such combinations of words never occur in Seneca. The TLL considers iuncturae to include also the group ‘et + possessive pronoun/adjective’, which occurs in our text. This group is specialized in the expression of one concept, namely that an event or phenomenon also took place in contemporary times:
This schema is interesting because it reinforces the credibility and meaning of the notions just described: the fact that Pliny’s epoch also witnessed such events, on the one hand, stands as proof of what he says and, on the other, cues the reader to consider such ‘abstract’ material in the frame of his own experience.
Pliny always uses unde et to introduce an etymology, adding information (the name), then linking it to the previous phrase by means of the etymology. His intent is similar to what we have already seen: the name of the phenomenon (or the river) guarantees the validity of what has just been said and links the information to something familiar to the reader. This expression is typical of the Natural History as a whole, and therefore represents a “Plinian feature” (we found the expression unde et nomen at Natural History IV 65, V 73, VIII 218, etc., especially in the botanical books). While Pliny is the first to employ the expression unde et nomen to introduce an etymology, the expression would subsequently be used by different authors (Cyprian, Ambrose, Augustine, Cassiodorus), and it is regularly found in Isidorus’ Etymologiae. We see therefore how the high rate of adverbial et is partly explained by the use of some recurring expressions necessary to Pliny’s informative aim. The expressions become part of the technical language, a kind of formula for introducing certain information.
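The kind of query discussed above, which retrieves every occurrence of the lemma ET_1 and looks at the word that precedes it, can be sketched as follows. This is only an illustration of the principle on a lemmatized token list, not a description of how HyperbaseWeb works internally. The sample tokens, the function name, and every lemma code other than ET_1 (including ET_2 for the conjunction) are assumptions made for the example.

```python
from collections import Counter

# Toy lemmatized text as (form, lemma) pairs. Only the code ET_1 (adverbial et)
# comes from the discussion above; ET_2 and the sample itself are invented.
tokens = [
    ("unde", "VNDE"), ("et", "ET_1"), ("nomen", "NOMEN"),
    ("sed", "SED"), ("et", "ET_1"), ("uenti", "VENTVS"),
    ("terra", "TERRA"), ("et", "ET_2"), ("mare", "MARE"),
    ("unde", "VNDE"), ("et", "ET_1"), ("nomen", "NOMEN"),
]


def preceding_forms(tokens, lemma="ET_1"):
    """Count the forms that immediately precede each token with the given lemma."""
    counts = Counter()
    # Pair every token with its successor and keep the pairs whose second
    # member carries the requested lemma code.
    for (prev_form, _), (_, cur_lemma) in zip(tokens, tokens[1:]):
        if cur_lemma == lemma:
            counts[prev_form.lower()] += 1
    return counts


counts = preceding_forms(tokens)
print(counts.most_common())  # [('unde', 2), ('sed', 1)]
```

Because the search is keyed on the lemma code rather than on the form, the coordinating et in the invented sequence terra et mare is left out of the count.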
3. Conclusions: Natural History II, Natural Questions VII, and other literary genres
The second point implies the reconsideration of Pliny’s title Natural History. We have already mentioned the complexity behind the term Historia, but an interesting analysis carried out by P. Jal underlines how the title might hint at a new conception of history:
We have seen that the linguistic features we have analyzed are often explained by the necessity to convey as much information as possible, drawn from a wide range of sources; to provide the reader with non-hierarchical data, leaving its interpretation to the users; and to link ‘remote material’ to the readers’ present reality. These concerns were partly shared by historians; by contrast, both philosophers and rhetoricians are interested in convincing the audience through a precise process of reasoning, which brings strict organization to the speech or selection of material. This correspondence analysis (Figure 7), indeed, shows that the elements which characterize the oppositions along the principal axis are those mentioned before: for historical texts and the Natural Questions, substantives (which have an important weight in determining the first axis, 30%), adjectives, adverbs, coordinating particles; for philosophical and rhetorical texts, verbs and all the elements providing an interaction with the reader (interrogative particles, interjections, etc.).
Appendix: statistical data
1. Distribution (z-score) of parts of speech between Natural History II and Natural Questions VII (Figure 1).
2. Distribution (z-score) of noun categories between Natural History II and Natural Questions VII (Figure 2).
3. Distribution (absolute frequency) of noun cases between Natural History II and Natural Questions VII (Figure 3).
4. Distribution (z-score) of verbal categories between Natural History II and Natural Questions VII.
5. Distribution (z-score) of the 30 most frequent adverbial lemmata between Natural History II and Natural Questions VII (Figure 4).
6. Distribution (z-score) of adverbial et in the literary database.
|Text||z-score||Text||z-score|
|Plinius, Ep. I-II||0.97||Nepos, DVI||-9|
|Horatius, Ep. I-II||-2.44||Suetonius, Vitae Caesarum I-III||12.05|
|Curtius, Historiae IV||-1.24||Sallustius, De Coniuratione Catilinae||-4.43|
|Petronius, Satyricon||2.49||Sallustius, Bellum Iugurthinum||-6.54|
|Cicero Rhetor||-6.1||Tacitus, Agricola||-0.72|
|Cicero Philosophus||-5.33||Tacitus, Historiae I||0.85|
|Seneca, Consolationes||-0.88||Tacitus, Annales I||-2.08|
|Seneca Philosophus||-0.83||Tacitus, Germania||6.03|
|Livius, Ab Urbe Condita I||1.37||Cato, De Agri Cultura & Origines||-6.49|
|Caesar, Bellum Civile I-II||-6.18||Plinius, NH II||16.07|
|Caesar, Bellum Gallicum I-III||-6.33||Seneca, NQ VII||-1.23|
|Vergilius, Georgica I-IV||0.86||Lucretius, DRN V-VI||-5.13|