Making books digitally
Digital publication at CHS: Ideally, a streamlined process
Creative publishing arrangements and exploiting online technology
The Center for Hellenic Studies (CHS) has embarked on an aggressive publication program with explicit technical goals. As part of the intellectual mission of our editorial group, we are committed to unfettered, free online publication of all books that we publish and to print publication of some books. Our print/online publication process, which is in development, is based on TEI–XML source and uses open source tools to convert proprietary word–processing files to TEI–XML and publish the result. Since it is also our goal to validate and expedite online publication, we are developing a process and tools, described here, that others can adopt or modify to produce online and print books rapidly, beautifully, and accurately. CHS is also committed to educating scholars young and old about the technology of publication. In fact, our publication process requires their hearty participation to enhance the speed and lower the costs of publication. CHS is also committed to experimental uses of online publication to complement print publication as well as innovative arrangements with traditional academic publishers in the interest of generalizing its goals to the academic community and of making creative classical scholarship available to the widest possible audience.
Making books digitally
Over the last 15 years, digital technology has changed the way that books are produced. Book production has devolved from a highly capitalized large–scale industry requiring time, experience, artisans, and apprenticeship into a process that can be carried out by a small team of people with editorial, managerial, and some technical expertise, along with a technological infrastructure of relatively modest proportions. Where it once took years to make a book from its manuscript, whether typed or handwritten, now it can take weeks or months. This is due not simply to the arrival of computers on writers’ desktops, but also to the digitization of the editing, production, and printing processes.
At the same time, academic books are more expensive than ever, in part, to be sure, due to the astronomical increases in the cost of paper over the past few years. And although they require less time to produce, the difference in production time (as opposed to the time it takes to actually write an academic book or any other book, for that matter) has not been as dramatic in academic publishing as it has been in commercial publishing. Commercial publishing is driven to increase the speed of production by the huge value in coming to market first with a book on a “hot” subject, and it is now possible to produce a complex technical book, for instance, complete with illustrations and indexes, in a matter of weeks without greatly compromising the quality of the product. But commercial and non–profit academic publishers still work on a timetable of months, not weeks. As for the process of online book production, it has generally been undertaken hesitantly and on the model of print production in presentation (as pages in self–standing books), in technology (as a by–product of print production), and in sales (by subscription or piece). There is a deep fear of the effect online publication can have on the sales of printed books and on the control of intellectual property. In a world in which the market for academic books in general has been dwindling, such fears cannot but evoke some sympathy and some concern.
The Center for Hellenic Studies is in a position to experiment with solutions to these problems in ways that it hopes will be helpful to the academic community and traditional academic publishers. There is a need to develop models for cutting the costs of print and online publication without compromising the quality of the result. There is a need to face the fact that whatever the aesthetic and sensual advantages of printed books, the delivery of books online is bound to become routine, so that an appropriate use of online technology needs to be integrated into the process of book production. (For some major academic publishers, the production of online books is an offshore, aftermarket industry requiring the morally complex use of cheap labor in third–world countries; after print publication, it should be understood, most publishers do not even keep copies of the electronic files from which their “intellectual property” is printed.) There is also an unrealized need to exploit the value of online technology for its virtues rather than to consider it a way to convey knowledge that only competes with printed books. There is also the consideration that in fields like ours, concerns about the theft of intellectual property are misplaced. The very survival of our fields, to say nothing of academic publishers within them, requires easy access to the best scholarship that we can provide, not elaborate barriers, technical or financial, to prevent or incite theft.
Digital publication at CHS: Ideally, a streamlined process
Since CHS started its publishing program a year and a half ago with a staff of about a dozen part–time paid and volunteer workers, it has published four printed books. Four more are about to appear in the next four or five months. Our online publication process has yet to produce a single book, but that is about to change in the next few months as well. All eight of these books are in a form that can, we believe, be converted by software with a minimum of human intervention into TEI–XML. Our publication process, which has itself been taking shape as we have been producing books, is as follows.
Once a book is accepted for publication, its author is provided with the CHS style template for Microsoft Word and the CHS style guide (both are available on the CHS publications webpage). Any book that we publish must be tagged with the style tags in the template and written in adherence to our conventions. The style tags in the template have been developed for their correspondence to the structural and inline tagging of a TEI–XML document, and the Word template contains instructions and examples of the styles’ use on the assumption that most users of word processors use them as glorified typewriters.
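The correspondence between style tags and TEI elements can be pictured with a short Python sketch. It is illustrative only: the style names (`ChapterTitle`, `BodyText`, `BlockQuote`), the function name, and the choice of target elements are assumptions made for this example, not the names defined in the CHS template.

```python
import xml.etree.ElementTree as ET

# Hypothetical style names; the actual CHS template defines its own.
STYLE_TO_TEI = {
    "ChapterTitle": "head",
    "BodyText": "p",
    "BlockQuote": "quote",
}

def paragraphs_to_tei(paragraphs):
    """Wrap (style_name, text) pairs in the TEI element mapped to each style."""
    div = ET.Element("div", {"type": "chapter"})
    for style, text in paragraphs:
        # Unknown styles fall back to a plain paragraph rather than being dropped.
        element = ET.SubElement(div, STYLE_TO_TEI.get(style, "p"))
        element.text = text
    return div

sample = [
    ("ChapterTitle", "Making books digitally"),
    ("BodyText", "Over the last 15 years ..."),
]
print(ET.tostring(paragraphs_to_tei(sample), encoding="unicode"))
```

The point of the sketch is that a consistently styled manuscript already carries the structural information a TEI document needs, which is why untagged or inconsistently tagged text cannot be converted automatically.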
Once the style tagging of a book is complete, the author and editors must sign off definitively on its content, because once the next stage of the production process begins, there can be no further changes of any kind to the content of the book. If a book requires copyediting and/or proofreading, it must happen now, before the book is transmitted to CHS for production. This rule places a new and unaccustomed burden on the author: he or she must take responsibility for the correctness of the manuscript and not defer changes to the final stages of production. Such changes have become a costly and time–consuming habit for many authors and their publishers, and many authors are not even accustomed to hard and fast limits of any kind on their behavior. But we inform authors of this rule from the start of our relationship with them, and we stick to it. The basic reason for it is as follows: the Word document that is transmitted for print publication is also the canonical source file for online publication, so we need to have only one such file to maintain, not two. But we also do this to streamline the production process and avoid changes to already formatted pages that can ripple through a whole book and cost time and money.
Ideally, two things happen next: the process of print production begins, as does the process of online production. The tagged Word document can be imported into any of several typesetting programs for print publication, and we have, or are developing, templates for the production of print pages in those proprietary programs based on the Word style tags in the canonical source document. Those style tags (including index tags, although we have not yet implemented them) can be preserved on import and defined according to our specifications in standard typesetting programs like QuarkXPress, FrameMaker, and InDesign. This saves time in the page layout process. That process then goes forward in the hands of a professional. Admittedly, the use of proprietary programs like MS Word and QuarkXPress is not consistent with our goals, but the commonness of Word and the comfort level scholars have with it as a writing tool, along with the lack of a full–featured open source typesetting program (XSL–FO is apparently not yet a viable option), constitute realities that we are willing to live with for the moment. We are also developing additional versions of our style–tagging template that will work with open source word processors like OpenOffice or AbiWord.
The other thing that happens once the content of a book has been finalized by author and editors is that online production begins. Currently, only the Windows XP version of MS Word 2003 can save tagged Word files in a format that Microsoft deems to be XML. It is a flat XML with a host of presentational tags and not much in the way of structure, but it can be the basis for conversion into TEI–XML since it preserves the CHS style tags embedded in the document by the author. CHS commissioned Erik Ray, the lead author of the O’Reilly books “Learning XML” and “Perl and XML,” and Benn Salter, a freelance Perl programmer, to develop a conversion tool that uses Perl and XSLT (along with two Perl modules, XML::LibXML and XML::DOMHandler, and two open source C libraries, libxml2 and libxslt) to convert Word’s version of XML into TEI–XML. Since word processors like MS Word are not structured editors of content, there is no guarantee that the document produced by this converter will parse, so there is a need for a manual editing and parsing pass to correct any errors. But our experience is that the documents we are converting are simple enough in structure to make this process largely automatic. The final step is to publish these documents with a style sheet on the CHS website using Cocoon. For a book of normal length, this process should take a part–time worker at most two weeks.
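To make the two mechanical steps in this paragraph concrete, here is a minimal Python sketch. It is not the Perl and XSLT converter described above. It shows only, first, how paragraph style names can be read back out of Word 2003's XML and, second, how a converted file can be checked for well–formedness before the manual editing pass. The function names and the `ChapterTitle` style are hypothetical, and the sample document is a hand–made fragment, not real Word output.

```python
import xml.etree.ElementTree as ET

# Namespace used by Word 2003's XML format (WordprocessingML).
W = "http://schemas.microsoft.com/office/word/2003/wordml"

def styled_paragraphs(wordml):
    """Yield (style_name, text) for each w:p paragraph in a WordML string."""
    root = ET.fromstring(wordml)
    for para in root.iter(f"{{{W}}}p"):
        style = para.find(f"{{{W}}}pPr/{{{W}}}pStyle")
        name = style.get(f"{{{W}}}val") if style is not None else None
        # A paragraph's text is split across runs (w:r), each holding w:t nodes.
        text = "".join(t.text or "" for t in para.iter(f"{{{W}}}t"))
        yield name, text

def is_well_formed(xml_text):
    """Return True if xml_text parses as XML (well-formedness only, not TEI validity)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Hand-made fragment with a hypothetical style name, for illustration only.
sample = (
    f'<w:wordDocument xmlns:w="{W}"><w:body>'
    '<w:p><w:pPr><w:pStyle w:val="ChapterTitle"/></w:pPr>'
    '<w:r><w:t>Making books </w:t></w:r><w:r><w:t>digitally</w:t></w:r></w:p>'
    '</w:body></w:wordDocument>'
)
print(list(styled_paragraphs(sample)))
print(is_well_formed("<p>unclosed"))
```

A check like `is_well_formed` only confirms that the file parses. Validating the result against a TEI schema is a separate step that this sketch does not attempt.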
Once a book is formatted for typesetting, the author receives a printout of its pages to proof for format errors only and from which to prepare an index (unless index tags have already been embedded in the source; our plan is to have embedded indexes). Then the index is formatted and proofed, and PDF files of the book are sent to the printer. CHS has an arrangement with Harvard University Press to market and distribute the books that we produce to bookstores, and they fulfill sales of our books as well. We intend to market and distribute online books on our website, which is being redesigned with the kickoff of this publication program in mind. This whole process should take not more than three months and can take less.
Our financial goal in this process is only to recover our costs, and our experience so far, with some exceptions for books that were begun before our program was in place, is that we are able to do so. We generally design book covers in house, I should add, and we only print soft–cover books, since the price of binding hard–cover books is substantially higher. In other words, this is a low–cost and relatively quick way to publish attractive printed and online books, and it also works well for Harvard University Press.
Creative publishing arrangements and exploiting online technology
It should be clear that our arrangement with Harvard University Press is indispensable to this publishing methodology. The crucial part of book publishing that requires labor and investment and management beyond the abilities of our group is marketing and distributing printed books. But our ability to get this far with the process is an exportable model that other small, non–profit groups may find useful in cooperation with academic presses whether commercial or non–profit.
We are already entering into a similar arrangement with another academic press, and we are also making innovative arrangements of other sorts. Two major academic publishers are allowing us to produce online books from Word source files and to make them freely available, with the stipulation, for example, that the online publication takes place a year after the initial print publication. This gives these publishers some confidence that their print books will sell substantially enough to meet their financial needs and expectations. We are also looking into the possibility of taking a few treasured but now out–of–print or low–sales titles, obtaining the online publishing rights to them, and converting them to TEI–XML for online publication on our site. Among the virtues of online publication are that it is fast and relatively cheap once the infrastructure is in place, and that we can do the marketing and distribution for online titles on our own. For such projects we are not interested in recovering costs but in making creative scholarship accessible. That is an appropriate goal for a research center like ours.
Another strategy we wish to pursue is to devise and implement ways of presenting online books that are complementary to print books rather than in competition with them. For instance, we have the ability to pre–print online books and to permit and capture online interaction about their contents. This process can even serve an editorial function for us before a book is printed. Or we can use online windowing technology to display complex information in a way that is not possible in print. Here I am referring to the Homer Multitext project, where manuscripts, incunabula, texts with variants, scholia, translations, and commentaries will be simultaneously available to a scholarly reader, or at most a click or two away. Another idea we want to experiment with is to coordinate a book’s website with its print version to display, say, color images (which are ridiculously expensive to print) and supplementary illustrations that are referred to with explicit URLs in the printed version of a book. Finally, we are also interested in the online publication (true online publication, not the presentation of page images) of Classics journals. Stay tuned to this space for some news about our hosting a well–known one in the near future.
The publication program at CHS is a work–in–progress, and we hope to fill the intellectual space of a center by proposing, working through, and teaching others about effective solutions to some of the problems that beset academic publication. If we can make the transition to fully digital publication work well for our field, that will only be consistent with the paradoxically early and fond embrace of digital technology by progressive Classics scholars. It should also be noted that I have discussed here only CHS efforts with regard to the publication of scholarly monographs and collective volumes. Look elsewhere on this site for pathbreaking work, for example, in the publication of texts and for the technology of that effort, and for CHS’s efforts to educate scholars in the digital technology that supports it.
To refer to this please cite it in this way:
Lenny Muellner, “CHS Publishing Program and Goals,” C. Blackwell, R. Scaife, edd., Classics@ volume 2: C. Dué & M. Ebbott, executive editors, The Center for Hellenic Studies of Harvard University, edition of April 3, 2004.