Writing in 2012, Paul Goldman pointed to the problem of “original material” in illustration studies. “In simplest terms,” he writes, “one cannot study a subject if the raw materials are not available relatively easily and equally importantly internationally” (Goldman 2016, 20). This was only five years after one of the early scholarly digital illustration archives, the Database of Mid-Victorian Illustration (DMVI), was published online in 2007 (Thomas 2017, Ch 1, note 8). In a matter of years, an unprecedented number of illustrations began to pour onto the web. Nowviskie observes that “hand-crafted, boutique digitization by humanities scholars and archivists (in the intrepid, research-oriented, hypothesis-testing mode of the 1990s) was jarred and overwhelmed” (Nowviskie 2015, 386). As I would like to show in this article, with the increasing availability of large collections of illustrations from the 2010s onwards, the concern over the availability of original material has been replaced by the need to make the existing material useable and useful. Since the mid-2010s, millions of illustrations have been extracted from the scanned pages of books, periodicals, and newspapers from the fifteenth to the early twentieth century. Computer scientists have not only demonstrated the power of computers but also confirmed the importance and quasi-ubiquity of visual culture in previous centuries. It is now up to the (digital) humanists to tame this growing ocean of illustrations.
For a long time, the text monopolised attention in digital humanities. Text mining, concordance analysis, and Voyant Tools have become familiar terms in the field. This central place occupied by the text in digital humanities is largely due to the fact that it is the most widely available source, thanks to advances in Optical Character Recognition (OCR) technology. OCR relies on a machine learning model trained to recognize the characters (letters, numbers, and other symbols) on the photograph of a page. These characters are then added over the photograph as a layer, enabling us to see the same thing as the computer. It is no wonder that this technology, which prioritises the text over any other element on the page, promoted such an avid interest in the text. It would, however, be unfair to put the blame for the obliteration of illustrations entirely on OCR technology. Julia Thomas contends that modern editorial practices have imposed an imbalance between the text and the illustration by cutting out the visual content from literary works (Thomas 2017: 17-19). This created the illusion that the mid-Victorian period was an age dominated by the text when it was actually the golden age of illustration.1
Nevertheless, a visual turn has been underway in several fields, including digital humanities. In periodical studies2 and in the history of science,3 interest in visual elements has been reinvigorated. The availability and growing number of digital archives contribute to this renewed interest in illustration and photography on the printed page.4 The visual turn is most clearly observed in digital humanities, as is apparent from the Appendix, which lists about thirty projects.5 New projects tap into the experience of existing ones to develop their own answers to the challenges previous projects faced. Every project, on the other hand, differs in its aims, which inform the way the material is selected, treated, and presented, and the functionalities offered to the “users”,6 to such an extent that the same material might be treated differently in different projects. In other words, as Mussell contends, every digital humanities project is an editorial project which involves a considerable amount of “editorial practice” towards its aims.7 The aims, material treated, limitations, and challenges of most of these projects are recounted on their websites and, far less often, in monographs or journal articles. Julia Thomas’s Nineteenth-Century Illustration and the Digital: Studies in Word and Image stands out in this literature not only as an overview of the important questions in the creation of digital visual archives, but also as a guide to the field. As the Principal Investigator of two projects (DMVI and the Illustration Archive), Thomas provides insights into the relationship between the text and the image while detailing the making of digital archives. In less than a decade since the publication of this important work, however, advances in machine learning technology have given rise to another generation of digital archives, and thus to a need for a new review of the field. Besides incorporating recent projects in its analysis, this article seeks to understand how digital archives of illustrations have been constructed in the last decade. As a first step towards the construction of a digital database of nineteenth-century scientific illustration, I explore the landscape of visual projects. Inspired by Barman et al.’s (2021) paper investigating the user interfaces of existing digital newspaper archives, this article examines a selection of digital humanities projects which take visual content from publications as their source. By analysing the solutions these finalised or ongoing projects offer regarding the extraction of visual content, its presentation on a digital platform, and its discoverability, this article seeks to contribute both to visual studies and to digital humanities.
1. Detecting and extracting images
Despite its initial development with textual material in mind, the OCR engine has been shown to be an effective tool for detecting visual content in scanned pages. Digital archives of illustrations extracted from published material make use of OCR data to detect visual content in the photographs of printed pages. The OCR process produces an XML file which, Leetaru explains, “contains an enormous wealth of information normally invisible to users of the consumer version”.8 Most importantly for the purpose of extracting visual content from scanned books and periodicals, it contains information indicating the boundaries of non-textual content on the scanned page. While denoting the parts of the page which they omit due to the absence of text, OCR engines demarcate the boundaries of the regions where there might be visual content. The irony was not lost on scholars of illustration studies, who lament the undue supremacy of the text over the image: “The identification and isolation of the illustration in the digitised and OCRed… pages were enabled not so much because of their visual difference but because of their negative relation to the text: they were recognised as images because they were not words” (Thomas 2017: 35).
Early digital archives of visual content from historical books and periodicals relied exclusively on OCR data to detect and extract images. In recent projects, however, more advanced techniques incorporating custom machine learning models complement it. A decade ago, two large projects, each taking a different corpus of scanned documents as its source, brought this aspect of OCR into the limelight. The first of these was Ben O’Steen’s Mechanical Curator.9 Although unconventional in its logic of showing content “almost randomly” and not aiming at the discoverability of the items, this project required the extraction of images from scanned books.10 O’Steen used the Microsoft Books/BL collection, which comprised 25 million pages from 68,000 volumes and 65,000 titles scanned by Microsoft.11 In his quest to develop a cost-effective and simple method for extracting visual material from this vast corpus, he discovered that the regions of the pages which the OCR engine predicted to contain visual elements were marked with the annotation “GraphicalElement” in the XML file containing OCR data.12 The result is a corpus of about a million illustrations and photographs extracted from the Microsoft/BL collection, available for viewing on Flickr.13
The second project came a few months after O’Steen’s large batch was uploaded to Flickr. Taking a larger corpus as its source, Kalev Leetaru’s 500 Years of the Images of the World’s Books sought to extract visual content from 600 million digitised book pages available on the Internet Archive.14 Examining the OCR XML files which are provided for almost every scanned publication on the Internet Archive, Leetaru found that the regions of the page which the proprietary Abbyy OCR engine “thought” to include visual content were marked as “Picture” blocks.15 Like O’Steen a few months earlier, Leetaru developed software which detects the picture blocks on each page, if there are any, and then extracts the corresponding regions of the page from the photograph. The result is a corpus of visual material containing about five million images hosted on Flickr.16
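The logic of this approach can be reproduced with a short script. Below is a minimal sketch, assuming an Abbyy FineReader XML file of the kind the Internet Archive distributes alongside its scans and a matching page image; the namespace URI, the file names, and the single-page layout are assumptions to be checked against one’s own files.

```python
# A minimal sketch of Picture-block extraction from Abbyy OCR XML.
# The namespace below is the FineReader 6 schema; other versions of the
# engine use different URIs, so inspect the file's root element first.
import xml.etree.ElementTree as ET
from PIL import Image

ABBYY_NS = "{http://www.abbyy.com/FineReader_xml/FineReader6-schema-v1.xml}"

def extract_pictures(xml_path, scan_path, out_prefix):
    """Crop every region the OCR engine marked as a Picture block."""
    page_image = Image.open(scan_path)
    tree = ET.parse(xml_path)
    count = 0
    for block in tree.iter(f"{ABBYY_NS}block"):
        if block.get("blockType") != "Picture":
            continue
        # The l, t, r, b attributes give the block's bounding box in pixels.
        box = tuple(int(block.get(k)) for k in ("l", "t", "r", "b"))
        page_image.crop(box).save(f"{out_prefix}_{count:03d}.png")
        count += 1
    return count
```

O’Steen’s “GraphicalElement” annotation plays the same role in the Microsoft/BL XML; only the block label changes.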
Today, the process of using the XML file containing OCR data, available for the majority of items hosted on the Internet Archive, has been simplified. Anyone with a limited knowledge of the command line and the popular programming language Python can conduct their own projects (Krewson 2019). It must, however, be recognized that OCR does not always produce the best results. This is especially the case with hybrid formats such as newspapers and periodicals, in which the content is usually organised in several columns. To overcome this difficulty, more sophisticated solutions to visual content detection have recently been developed.
The recent availability of artificial intelligence chat tools for public use has drawn a great deal of attention to machine learning. As its name suggests, this technology consists in “teaching” the computer with a training dataset, from the analysis of which the computer creates an algorithm and can then analyse and classify data it has not “seen” before. Instead of OCR data, more recent projects rely on machine learning to detect and extract visual content from the scans of printed pages. This is especially the case with projects which involve the treatment of visual content from newspapers and periodicals. Due to their multi-column organisation, these hybrid formats tend to defy even the recent improvements in OCR engines. From many perspectives, machine learning technology and its application to visual digital humanities make enormous projects involving millions of images more viable than before. It does not, however, diminish the importance of human contribution. The success of a machine learning model depends on the quality of the annotations in the training dataset as well as the diversity of its content. The images in the training dataset need to be representative of the corpus on which the model is intended to run. If a training dataset is based largely on newspapers from the early twentieth century, it might not produce the best results when the machine learning model is fed newspapers from the nineteenth century. Likewise, a model which classifies images requires a training dataset which includes annotated images for each category in proportion to the scale of the corpus. Finally, the quality and precision of the image annotations play a vital role in the outcome. Therefore, while digital archives consisting of millions of images, more discoverable than ever, have become a possibility because human input is now needed only to annotate a small part of the corpus rather than its entirety, the need for expertise has become ever greater. Besides technological expertise, well-guided annotators and a good knowledge of the corpus, needed to select the documents or pages which represent it best, have become very important in the creation of the training dataset. In this sense, machine learning, even more than the previous decade’s large-scale projects which relied mainly on the expertise of computer scientists, brings the two sides of the digital humanities together.
One of the recent digital archives of newspaper illustrations, the Newspaper Navigator, based on images from the Chronicling America newspaper archive, adopts such an approach.17 To detect the visual content in 16 million pages of American newspapers published between 1770 and 1963, instead of relying on OCR data, Benjamin Lee developed a machine learning model trained on an augmented version of Beyond Words, a crowdsourced project in which volunteers were asked to identify a variety of content on scanned newspaper pages (Lee et al. 2020). Lee’s model not only detects visual content on the scanned pages but also classifies it. However, the Beyond Words dataset, containing 3,437 images with 6,732 verified annotations, did not, Lee found, proportionately represent the seven categories into which he wanted to classify the images. In order to increase the number of these underrepresented categories in the dataset, Lee added 32,424 annotations to pages including headlines, advertisements, and maps. Using this augmented version of the Beyond Words data as training data, Lee’s Newspaper Navigator employs machine learning to impose the following categories on the images: photograph, illustration, map, comics/cartoon, editorial cartoon, headline, advertisement. Unlike Leetaru’s project, which does not impose any classification on the illustrations, Newspaper Navigator classifies its images, a great asset for a project involving newspapers with their variety of visual content, and thus opens up new possibilities for the treatment of digitised pages. This project also provides important insights into the best practices in the creation of a training dataset. For instance, as the Beyond Words dataset only includes annotations on newspapers from the WWI era, Lee’s model does not perform as well on scanned pages of newspapers from the nineteenth century as it does on those from the twentieth (Lee et al. 2020: 3061). In its final form, the Newspaper Navigator includes about 1.5 million items of visual content from newspapers published between 1900 and 1963.18
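The general shape of such a detector can be sketched in a few lines. The snippet below uses torchvision’s generic fine-tuning recipe for Faster R-CNN rather than the project’s actual code, so the model choice and training details are assumptions; only the seven category names come from Newspaper Navigator.

```python
# A hedged sketch of a page-content detector in the spirit of Newspaper
# Navigator: a pretrained Faster R-CNN whose classification head is
# replaced to predict the seven categories named in the text.
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

CATEGORIES = ["photograph", "illustration", "map", "comics/cartoon",
              "editorial cartoon", "headline", "advertisement"]

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
in_features = model.roi_heads.box_predictor.cls_score.in_features
# One extra class is reserved for the background, as the API requires.
model.roi_heads.box_predictor = FastRCNNPredictor(in_features,
                                                  len(CATEGORIES) + 1)
# The model would then be fine-tuned on annotated page scans (here, the
# augmented Beyond Words data) and run over every page of the corpus.
```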
Newspaper Navigator exemplifies the current interest in machine learning models to detect and extract visual content from digital archives.19 Despite attempts to gamify and optimise citizen science platforms, the larger the datasets grow, the more tedious the tasks demanded of volunteers become. The difficulty of motivating volunteers for time-consuming tasks such as transcription has increased the need to develop methods which minimise human intervention while optimising the results. Only with machine learning models can we imagine an archive of big data whose contents can be discovered. In Newspaper Navigator, for instance, the ratio of the training dataset to the corpus is about 3%.20 Recent studies promise that the need for large training datasets, and hence for large-scale volunteer effort, might soon be a thing of the past with the development of new methods which minimise the size of the training dataset (Barman et al. 2021). This might further increase the importance of the selection of the items in the training datasets, and hence of the expertise of literary scholars and historians.
2. Platforms
Extraction of visual content from printed pages is now a relatively straightforward process, and it is rarely the ultimate aim of a visual archive project. The aspect of a project which determines its lifecycle, aims, and publics is the platform on which the images are made available (Tabak 2017). Some projects, like Science Gossip (see below), have a predetermined lifespan during which they aim at collecting data about their corpora of images through crowdsourcing. Their output is datasets. Others, usually with larger datasets, opt for an “in-progress” approach which allows both data collection and consultation for an undetermined period of time on a dedicated or generic platform. In this second case, the choice of platform and its interface determine the limitations and potential of a project. Here I first treat the issues regarding the interface and search parameters, and then focus on the questions related to permanence and re-usability.
2.1. Interfaces
One common feature of the large-scale projects which began around the mid-2010s was that they did not rely on custom interfaces for their repositories of images. Instead, both Kalev Leetaru and Ben O’Steen uploaded the output of their projects, five million and one million illustrations, respectively, to Flickr.21 The illustrations extracted from the Biodiversity Heritage Library (BHL) as part of the “Art of Life” project were also uploaded to Flickr.22 As a social platform optimised for sharing visual content, Flickr provides a venue where the output of visual digital humanities projects can be viewed, tagged, and shared. Yet although historical illustrations on Flickr theoretically have a better chance of being viewed by a larger public than if they had been deposited in a custom digital archive, Flickr poses problems from several perspectives.
The Illustration Archive, launched in 2015 by a group of researchers from Cardiff University, is one of the few examples, and probably the best, of an “in-progress” digital archive.23 The team led by Julia Thomas seeks to turn the one million illustrations extracted by O’Steen from the Microsoft/BL collection into a searchable illustration archive. As mentioned above, this corpus is already available on Flickr, and the existence of the Illustration Archive is a clear sign that the limited functionalities of Flickr do not meet the requirements of every research project, although the tags added to the items on Flickr are also fed into the Illustration Archive (Thomas 2017: Ch 1, note 43). On the Illustration Archive, visitors to the website can both contribute to the archive by annotating the illustrations through a guided questionnaire and consult the illustrations. In this part, I focus only on the interface and search functionality of the website, comparing it to Flickr; I treat its annotation interface below.
The bespoke search functionality of the Illustration Archive includes the following search fields: keyword, illustrator, author, book title, publisher, place of publication, and publication date by range or by decade.24 This is a significant improvement on the limited number of search fields which Flickr offers. First, Flickr’s search interface is not intuitive. In the British Library’s (BL) Flickr photostream, which largely includes the same items as the Illustration Archive, the user needs to click on the magnifying glass just above the images in the photostream, instead of the search bar at the top of the website, to run a query within the BL collection. Furthermore, to the frustration of illustration scholars, Flickr is biased against illustrations. Searching for “girl” within the BL photostream produces no results because the user first needs to click on “Advanced” search and tick “illustration/art” among the “content” types, which, unlike “photos” and “videos”, is unticked by default (Thomas 2017: 22). Secondly, Flickr’s search function is very limited and certainly not suitable for historical research. It can be difficult on the search interface to find illustrations from the same book unless they were put together in a collection by the owner of the account. Similarly, for searching between date ranges, Flickr offers only two options, “date taken” and “date uploaded”; there is no way to indicate such essential information as the date of publication. For these reasons, Flickr is far from meeting the basic requirements of scientific research on illustrations unless the user is adept at programming and can use the Flickr API.
2.2. Permanence of records and the reusability of data
Flickr has been one of the popular platforms for researchers working with visual material (Spyrou and Mylonas 2016). Launched in 2004, like Facebook, and purchased by Yahoo the following year, Flickr soon became one of the prodigious children of Web 2.0. The 2010s, nevertheless, saw the entry of new competitors such as Instagram into the market of visual content, and Flickr again changed ownership in 2017. During this period, several features, such as the free storage space offered and the search parameters, went through changes.25 When Flickr announced in 2018 that it was going to delete the photos exceeding its new freemium limit, it was met with criticism.26 Its audience is estimated to have thinned to “an old community of photographers,” and its prospects as a business are assessed to be bleak unless it “come[s] up with a new and revolutionary feature”.27 We have very recently witnessed the drastic changes Twitter went through after its sale, and it is always possible that any Web 2.0 platform might go through similar experiences. From the perspective of the permanence of records, too, Flickr is probably not the best solution.
Nevertheless, the problem of longevity is not limited to privately-owned generic content-hosting platforms. Custom digital platforms, too, run the risk of becoming obsolete in a short period of time, requiring partial and sometimes complete renovation, or abandonment before or after completion (Solberg et al. 2021: 23). Visitors to the Illustration Archive’s tagging interface might notice that the image for the “advertisement” option is missing in the initial questions of classification.28 Furthermore, after spending some time on the website, the user is asked if they want to fill out a feedback form; this form, however, turns out to be empty. More importantly, the server gives an error when the user tries to sign up. As a result, more advanced features of the platform, such as creating user collections, are currently not available. This lack of maintenance is all the more unfortunate as the Illustration Archive is an ongoing project still seeking to crowdsource annotations from volunteers.
These two examples show that platforms might falter, become outmoded, or remain unmaintained. To guard against such a situation and to save the efforts made throughout the lifecycle of the project, DH projects must make the data they collect, as well as their source code, available to the public. In this way, in the event that the interface is no longer available, the possibility of using the data remains intact. This also enables other researchers to use the data for purposes other than the original aim of a DH project, for instance, as a training dataset. Fortunately, the data from the majority of digital visual archives have been made available to the public. Recent projects almost by default share their source code as well as their outputs, and older projects follow their lead.29 The data hosted on Flickr, including tags and descriptions as well as the images themselves, can also be downloaded for more convenient manipulation using its API (Application Programming Interface).30
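By way of illustration, the following minimal sketch harvests tags and descriptions from a photostream with the third-party flickrapi package and the photos.search method of the Flickr API; the key, secret, and user NSID are placeholders to be replaced with one’s own credentials and the account under study.

```python
# A minimal sketch of downloading tags and descriptions via the Flickr API.
import flickrapi

API_KEY, API_SECRET = "your-api-key", "your-api-secret"  # placeholders
USER_ID = "photostream-nsid"  # e.g. the NSID of the photostream under study

flickr = flickrapi.FlickrAPI(API_KEY, API_SECRET, format="parsed-json")

page, records = 1, []
while True:
    resp = flickr.photos.search(user_id=USER_ID, per_page=500, page=page,
                                extras="tags,description")
    photos = resp["photos"]
    for p in photos["photo"]:
        records.append({"id": p["id"], "title": p["title"],
                        "tags": p["tags"].split(),
                        "description": p["description"]["_content"]})
    if page >= photos["pages"]:
        break
    page += 1

print(f"Harvested metadata for {len(records)} images.")
```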
Permanence and reusability have more recently become a bigger concern than interfaces. A new trend is afoot in digital visual archives: providing users only with the data and a script, in the form of a Jupyter Notebook.31 Besides reusability, this move also seems to serve environmental concerns. Lee, for instance, calculates the carbon emission resulting from the implementation of his Newspaper Navigator at about 380 kg of CO2, comparable to the carbon emission of one person flying from Washington, D.C. to Boston (Lee 2020: 29).
3. Making Images Discoverable
The difference between an archive and a pile is a catalogue or an index which helps us find what we are looking for. Once the visual content is detected and extracted from scanned files, how can those image files be made discoverable? The problem posed by discoverability far exceeds that of the extraction of images. Here again, there is no solution which fits all. The objectives of the project, the layout of the content on the pages, and the financial and human resources available all enter into the equation. Most of the early projects depended solely on the OCR output to discover the visual content extracted from publications. Relatively smaller projects which outlined a clear historical research perspective employed a labour-intensive, standardised, manual method of annotating visual content. Some projects, on the other hand, opted for continuous annotation by volunteers on their own bespoke platforms. In recent years, more sophisticated methods have been developed.
3.1. Metadata
Literally meaning data about data, metadata have enabled humans and computers to communicate through a standardized structure. In libraries, museums, archives, and repositories, metadata impose an order on the items, making them accessible to readers, visitors, and users (Gartner 2021: 1-6). According to Buckland, metadata has two main purposes. First, it describes the technical aspects (format, size, colours, etc.), administrative limitations (copyright information, etc.), and content (period, subject, author, etc.) of an item. This information is usually recorded in standard descriptive formats such as Dublin Core to facilitate storage and promote interoperability with other formats.32 The second purpose of metadata is to enable and facilitate the discovery of documents in the repository. In this case, the relationship between the data and metadata is reversed because the user has the option to “start with a query or with the description rather than the document—with the metadata rather than the data—when searching in an index” (Buckland 2017: 118). For instance, in a digital archive of illustrations from Buffon’s Histoire Naturelle, a user should be able to search for “insects” or “birds”, terms which should be included in the metadata attached to the individual files even if the captions of the illustrations might not include the common names of the species they depict. The metadata can also include the colours used in the illustration as well as the technique employed, such as wood engraving or photography. The inclusion of such information transforms the metadata from a secondary position into a primary source of information.
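What such a record might look like can be sketched as follows; the plate and all the values are hypothetical, but the field names follow the fifteen-element Dublin Core, and the record combines the descriptive, administrative, and technical facets just mentioned.

```python
# A hypothetical Dublin Core record for one plate from Buffon's Histoire
# Naturelle, expressed as a simple Python mapping for illustration.
record = {
    "dc:title": "Le Paon",                       # descriptive
    "dc:creator": "unidentified engraver",       # descriptive
    "dc:subject": ["birds", "Pavo cristatus"],   # enables a "birds" query
    "dc:description": "Hand-coloured engraving of a peacock",
    "dc:source": "Histoire Naturelle, vol. 3",   # descriptive
    "dc:date": "1770",                           # descriptive (publication)
    "dc:format": "image/png, 2400 x 3150 px",    # technical
    "dc:rights": "public domain",                # administrative
}
```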
In a DH project, probably the most important step is the design of the metadata fields and schemes. Drucker contends that this is “where the intellectual and conceptual modelling of research project takes place” (Drucker 2021: 53). What information to include in the metadata, and in what detail, is a crucial and potentially controversial question in project design. Although there are well-established metadata schemes and classification systems, they might not be suitable for a particular project. For instance, the classification system based on the twenty-six letters of the Roman alphabet used by the Library of Congress (LoC) reserves the letters E and F for the history of the Americas and only one letter, D, for the history of the rest of the world (Le Deuff 2018: Ch 8). It would probably not be very suitable for a library outside the US to follow the LoC classification exactly. For the classification of illustrations, however, we need to look elsewhere. Iconclass, “the most widely accepted scientific tool for the description and retrieval of subjects represented in images (works of art, book illustrations, reproductions, photographs, etc.),” can be considered one of the options.33 In spite of its detailed and wide-ranging definitions, however, Iconclass is considered scarcely adaptable to the description of illustrations. Invented in the mid-twentieth century by the art historian Henri van de Waal, Iconclass is an opinionated classification of visual art into “28,000 hierarchically structured definitions within 10 main divisions” (Thomas 2017: 57). Although it has gone through frequent modifications to accommodate such visuals as novel artistic imagery, it prioritises fine art over popular visual forms. “A browse through the classification system begins with the category ‘Religion and Magic’, while ‘The Bible’ and ‘Classical Mythology and Ancient History’ feature as part of the ten main iconographic categories” (Thomas 2017: 57).
For these reasons, Thomas explains that her team opted not to use Iconclass in the classification of the illustrations in the DMVI.34 Launched in 2004 and completed in 2007, making it the earliest digital archive of historical illustrations, the DMVI contains 868 “literary illustrations” extracted from books and periodicals published in 1862.35 This limitation is explained both by the coverage of the illustration collections cut by Victorian collectors, which the DMVI uses as its source, and by the desire to show the abundance and diversity of wood-engraved illustrations in their heyday in the 1860s.36 The DMVI divides its items into seven main categories (periods, geography, settings, people, activities, objects, themes) and then into 1,123 hierarchical categories. The category “holidays”, under “travel and tourism” under the main category “themes”, for instance, yields thirty-six illustrations, while “acrobatics”, under “physical motions” under “actions and speech” under the main category “activities”, produces only two results. Thomas explains that “The relatively small and uniform corpus of material in DMVI enabled us to develop the classification for the iconographic search by a prior analysis of the images rather than fitting the illustrations into preconceived categories.”37 Indeed, for a large dataset, such an approach, proceeding from the illustrations to the classification, is unfeasible. The growing number of historical illustrations extracted in the last decade has prompted the development of several methods with varying degrees of success.
3.2. Crowdsourcing
With the availability of large numbers of illustrations from the mid-2010s and the popularity of social media outlets such as Flickr as well as dedicated citizen science platforms such as Zooniverse, a new method of making images discoverable emerged: crowdsourcing.38 Two main practices of crowdsourcing can be discerned in the existing projects, based on what they demand from volunteers: folksonomy and citizen science. Defined as “A user-generated system of classifying and organizing online content into different categories by the use of metadata such as electronic tags,” folksonomy has quickly become one of the ways of dealing with large corpora of visual material.39 Although projects seeking to understand publics’ engagement with art have produced interesting results, folksonomy is far from providing a standard description of images (Thomas 2017: 71-4). When applied to the “million images” of the Microsoft/BL collection on Flickr, free tagging produced hardly useable results.
Citizen science projects, on the other hand, demand more specific tasks from volunteers, who are accordingly provided with a guide or training. Apart from the convenience of providing a ready-made platform for visual content, Flickr is also a social media platform, and projects have been developed to tap into this social aspect. The “Art of Life” project (2012-15), which uses the illustrations from the holdings of the Biodiversity Heritage Library (BHL), was one of the early projects to crowdsource the annotation of about 300,000 illustrations in its Flickr photostream.40 A nineteen-page guide to tagging BHL illustrations on Flickr was produced and is available on the BHL blog. In this guide, the volunteer is given detailed instructions on how to “read” a historical scientific illustration. Step three dwells on how to identify the name of the species; step four explains how to “machine tag” species in the form of “taxonomy:binomial="Genus species"” or with the name of the illustrator, for instance, “engraver:name="[Givenname Middlename Familyname]"”, so that these tags can be differentiated from folksonomic tags.41 At level three of the fourth step, volunteers are asked to “box tag” the illustrations if there are multiple species in the illustration. Further advice is provided in the document on finding the current taxonomic names of the species as well as their common names and entering this information as tags.42 As of September 2023, almost a decade after the completion of the Art of Life project, out of about 300,000 images, around 52,000 have been tagged with at least one species name, and 30,000 with the name of the artist.
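Because machine tags follow a predictable namespace:predicate="value" pattern, they can be separated from folksonomic tags programmatically. The sketch below assumes the pattern shown in the BHL guide; the sample tags are invented.

```python
# A small sketch of splitting BHL-style machine tags from free-form tags.
import re

MACHINE_TAG = re.compile(r'^(\w+):(\w+)="?(.+?)"?$')

def split_tags(tags):
    """Return ({namespace:predicate: [values]}, [folksonomic tags])."""
    machine, folk = {}, []
    for tag in tags:
        m = MACHINE_TAG.match(tag)
        if m:
            key = f"{m.group(1)}:{m.group(2)}"
            machine.setdefault(key, []).append(m.group(3))
        else:
            folk.append(tag)
    return machine, folk

print(split_tags(['taxonomy:binomial="Pavo cristatus"', "peacock", "bird"]))
# ({'taxonomy:binomial': ['Pavo cristatus']}, ['peacock', 'bird'])
```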
In contrast with the other collections treated in this article, some of the content in the BHL Flickr photostream is organised under collections. This is especially the case with periodicals and multi-volume works, which are organised in 227 collections.43 Volumes of periodicals like the well-known Curtis’s Botanical Magazine are organised further into sub-collections, one for each year.44 Because the search function of Flickr is not geared towards researchers’ needs, the organisation of content in collections is the only way to enable the user to see the entirety of the visual content in a volume.
From many perspectives, Flickr falls short of providing a platform for the scholarly presentation and crowdsourcing of visual content. As a result, DH projects with defined research objectives either use more adaptable platforms such as Zooniverse or create their own bespoke platform. One of the first Zooniverse projects to go viral, “Science Gossip”, part of the “Constructing Scientific Communities” project (2014-19), asked members of the public to annotate the scanned pages of nineteenth-century science periodicals on Zooniverse.45 The “Science Gossip” project produced 34,108 annotations for 10,535 pages including at least one illustration from sixteen periodicals.46 Although the project did not include the creation of an archival platform where users could consult this output, it constitutes an important source of training data for future projects on the visual content of nineteenth-century periodicals.
The Illustration Archive, on the other hand, has its own bespoke platform. Visitors to the website are invited to classify and tag illustrations shown to them in a guided manner. In the first step, the volunteer is asked to choose one of the ten pre-defined categories (advertisement, portrait, decoration, title page, location, map, scientific, literature, photograph, or none of these) which describes the illustration best. A second question follows to put the illustration in a sub-category. For instance, location is followed by a two-choice question, “by name” or “on a map”, while scientific is followed by eight sub-categories: geological, medical, engineering, botanical, zoological, archaeological, architectural, and none of these. After this two-step classification of the illustration, the user is asked to enter tags, and it is especially here that the Illustration Archive improves on existing crowdsourcing platforms. Its tagging interface is connected to WordNet, a large lexical database of English where “Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept”.47 When the user enters a tag, a list of choices comes up and the user is asked to choose the most suitable one. According to Thomas, the benefits of this link with WordNet are twofold. First, it improves the collected data by recording at the backend of the Archive not only the term entered but also its broader categories, up to three inherited hypernyms. In this way, when the user enters “ship”, it is recorded with the broader categories of “vessel, watercraft”, “vehicle”, and “transport”. This not only ensures a more precise description of the illustrations and links them semantically, but also facilitates the task of the tagger and reduces the amount of labour required for the description of images. The second benefit of this approach is to curtail the vocabulary of the user-generated tags and standardise them (Thomas 2017: 75-6). On the other hand, this approach has significant limitations, especially for projects which use more specialist illustrations as their source. First, the user is limited to one-word tags or phrases which exist in the WordNet database. Furthermore, WordNet is only suitable for more general descriptions. For a digital archive of scientific illustrations, for instance, a combination of the Encyclopaedia of Life (EoL) and WordNet would provide a better source.48 Finally, although the tagging process concludes with the typing of the caption of the illustration, it omits the artist.
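The hypernym lookup described here can be approximated with NLTK’s WordNet interface. The sketch below makes the simplifying assumption that the first noun sense of the entered tag is the intended one (the live Archive asks the user to choose), so the chain it returns for “ship” differs slightly from the article’s example.

```python
# A minimal sketch of recording a tag with its inherited hypernyms.
from nltk.corpus import wordnet as wn  # requires nltk.download("wordnet")

def hypernym_chain(word, depth=3):
    """Return up to `depth` inherited hypernyms of the word's first sense."""
    synsets = wn.synsets(word, pos=wn.NOUN)
    if not synsets:
        return []
    chain, current = [], synsets[0]
    for _ in range(depth):
        parents = current.hypernyms()
        if not parents:
            break
        current = parents[0]
        chain.append(current.lemma_names()[0])
    return chain

print(hypernym_chain("ship"))  # e.g. ['vessel', 'craft', 'vehicle']
```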
The final aspect to consider in the crowdsourcing of illustration tags is the validation of tags by several users. Thomas observes that “In Flickr, the inconsistency and instability of the ‘language of retrieval’, which comes about because the images are tagged differently by different contributors, means that relevant images will not always be retrieved” (Thomas 2017: 22). While platforms like Flickr, which are not specially geared towards scientific projects, do not provide such functionalities, validation is one of the defining features of bespoke crowdsourcing platforms and dedicated citizen science platforms. The final question in the Illustration Archive asks the user to confirm the existing tags added to the illustration by other users and to remove the incorrect ones. On Zooniverse, the most popular citizen science platform, validation by agreement is called “subject retirement” and functions as a project parameter to “Decide how many people you want to complete each task”. Zooniverse advises validation by five to ten volunteers for tagging, and by three to five volunteers for transcription tasks, before an illustration is retired, hence accepted as completed.49 On the other hand, although essential, validation by agreement increases the need for volunteer engagement and delays the completion of a project.
3.3. Text around the image
Assuming that the text around an illustration should describe it, some scholars have developed tools and methods to make the text sandwiching the illustration speak for it. One of the early big data projects on historical illustrations, Leetaru’s database offered a solution. Leetaru’s aim was to ‘make them all browseable and searchable (via both the metadata of the original book and the text surrounding each image), “reimagining” the world's books’.50 To enable users to search the millions of images extracted from the scans of printed pages on Flickr, alongside such information as the unique Internet Archive identifier for the item, the page number, and the URL, Leetaru added the 1,000-character chunks of text preceding and following each illustration to its description on Flickr.51 This method, however, raises a number of questions. Even if it is assumed that the 1,000 characters preceding and following the illustrations satisfactorily describe them, it is possible that these chunks of text will also include other “keywords,” giving rise to false positives. Furthermore, without other search parameters such as limitations on dates and publication titles, the user is simply bombarded with illustrations without any anchor points.
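The mechanics are simple enough to state in a few lines. The sketch below assumes the page’s OCR text is available as one string and that the illustration’s character offset within it is known; both assumptions hide most of the practical work.

```python
# A minimal sketch of Leetaru-style context capture: the 1,000 characters
# of OCR text on either side of an illustration's position become its
# searchable description.
def context_chunks(full_text: str, illus_offset: int, size: int = 1000):
    """Return the text chunks immediately before and after the offset."""
    before = full_text[max(0, illus_offset - size):illus_offset]
    after = full_text[illus_offset:illus_offset + size]
    return before, after
```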
Leetaru’s method of identifying illustrations by extracting the text around them has been used in other projects. SherlockNet, which became a finalist in the BL Labs Competition in 2016, combined Leetaru’s approach with machine learning to make the Microsoft/BL collection on Flickr discoverable. However, instead of the 2,000 characters sandwiching the illustration, SherlockNet utilises the text of the whole preceding page, the page of the illustration, and the following page to describe the illustration in a limited number of tags.52 The noun phrases extracted from these three pages with the Natural Language Toolkit (NLTK) are then compared with those extracted from the pages sandwiching the twenty other illustrations in the collection which the algorithms have determined to be the most similar.53 Currently, about 836,000 illustrations carry at least one “sherlocknet:tag=” on the BL Flickr photostream.54 A search for “cow” in the BL Flickr stream produces some cows but also a considerable number of false positives, for instance an illustration of zebras. On inspection of the SherlockNet tags, it can be observed that the automated tagging process attached such unrelated tags as port, bird, antelope, and stone, but none for zebra; the query for cow yields the illustration because its folksonomic tags include “cow” as well as “zebra”.55
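Noun-phrase extraction of the kind SherlockNet performed can be approximated with NLTK. The chunk grammar below is a common textbook pattern rather than SherlockNet’s own, and the sample sentence is invented.

```python
# A rough sketch of noun-phrase extraction with NLTK. Requires the
# "punkt" and "averaged_perceptron_tagger" data packages.
import nltk

GRAMMAR = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN.*>+}")

def noun_phrases(text):
    """Return the noun phrases matched by the chunk grammar."""
    tagged = nltk.pos_tag(nltk.word_tokenize(text))
    tree = GRAMMAR.parse(tagged)
    return [" ".join(word for word, _ in subtree.leaves())
            for subtree in tree.subtrees() if subtree.label() == "NP"]

print(noun_phrases("The old ship sailed past the stone harbour wall."))
# e.g. ['The old ship', 'the stone harbour wall']
```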
More advanced projects, employing state-of-the-art machine learning methods adapted to the source material, have emerged more recently. The Library of Congress (LoC) Newspaper Navigator, developed by Benjamin Charles Germain Lee in 2020, includes 1.56 million newspaper photos sourced from the LoC’s Chronicling America project, which includes 16.3 million pages from newspapers published between 1770 and 1963.56 To improve the searchability of the images, Lee leverages both the XML files resulting from the OCR process and the Beyond Words training dataset. In the Beyond Words dataset, volunteers were asked to annotate scanned pages to indicate the textual content in the boxes on the newspaper pages where images occur. Assuming that the textual content sharing the same box with the visual content describes the visual, as captions or titles do, Newspaper Navigator, like Leetaru’s project but more finely, relies on the text around the visual content to describe and retrieve it. However, as visual content in newspapers is likely to be described more precisely by the captions around it than illustrations in books on diverse subjects, published over a long period and in varying formats, the outcome is much more satisfactory.
There are two points to underline from these experiments. First, the tools and methods at our disposal to describe images and make them discoverable based on the text around them are not yet adequate. This is especially the case for collections like that of Microsoft/BL, eclectic both in terms of the formats of the publications from which the illustrations were culled and of their genres. Indeed, the relationship between the text and the image seems to vary with the genre, period, and format of publication. In literary illustration, where captions are rare, Thomas contends, relying on the text around the illustration does not always produce the expected results. The text referring to the illustration might appear several pages before or after it, or the illustration might describe the entirety of the work rather than a specific part of the text (Thomas 2017: 47-53). In scientific illustration, especially in botany, the illustration is also dependent on the text for description (Chansigaud 2009: 7, 10). Saunders, however, observes a change over time in the relationship between the text and the image in botanical illustration. With Linnaeus’s establishment of a standard, universal language of description in the eighteenth century, “many works of botanical theory dispensed with illustrations altogether, though they remained essential for books of a practical, descriptive, decorative or reference nature” (Saunders 1995: 8). Where botanical illustrations appear, however, they are likely to be accompanied by an adjacent caption or description.57 It can therefore be argued that in a digital archive, botanical illustrations, more than literary illustrations, might benefit from a combination of the methods mentioned above for the description of their contents.
Recent projects redefine the relationship between the text and the image by providing an option to search for images based on their similarity. Lee’s Newspaper Navigator enables users to compare images based solely on image embeddings, that is, numerical descriptions of the semantic content of the images in computer-readable format.58 Users can create their own collections by picking images and then use a collection as a training dataset to search for similar images in the database. The bigger the training dataset in the user’s collection, the better the results. Image similarity detection is a developing field which might open up new ways to study large collections of digitised visual content and might decrease, to a limited extent, our dependence on the text to describe the images, for in this case the images speak for themselves.
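The underlying technique can be sketched with off-the-shelf components. The snippet below assumes a pretrained ResNet-18 from torchvision as the embedding model and invented file names; Newspaper Navigator’s own pipeline differs in its choice of model and in scale, but the principle of ranking by embedding similarity is the same.

```python
# A minimal sketch of embedding-based image similarity: images are mapped
# to normalised feature vectors, and similarity is their dot product.
import torch
import torch.nn.functional as F
from torchvision import models
from PIL import Image

weights = models.ResNet18_Weights.DEFAULT
model = models.resnet18(weights=weights)
model.fc = torch.nn.Identity()  # keep the 512-d penultimate features
model.eval()
preprocess = weights.transforms()

def embed(path):
    """Return a unit-length embedding vector for one image file."""
    with torch.no_grad():
        x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
        return F.normalize(model(x), dim=1).squeeze(0)

# Rank a small collection by similarity to a query illustration.
query = embed("query.png")
scores = {p: float(query @ embed(p)) for p in ["a.png", "b.png", "c.png"]}
print(sorted(scores, key=scores.get, reverse=True))
```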
Conclusion
Digital archives challenge the way we have been studying illustrations, and for that reason alone they are invaluable sources. In what context could the relationship between text and image be better analysed than in the problem of describing digitised images? They encourage researchers to engage with the public in the framework of citizen science projects. But they are not only challenging. They also contribute to our understanding of publishing history by showing that visual culture was not the preserve of the twentieth and twenty-first centuries. A greater portion of the public, beyond the specialists, has a chance to see and be inspired by this heritage of visual culture. Digital archives bring illustrations to the comfort of our homes, which encourages research. They also encourage collaboration between experts from different disciplines. Looking back at the past decade of digital archives, it can be argued that the future promises larger and more useable archives.
Digital illustration archives have gone through four stages in the last three decades. The small-scale, hypothesis-oriented projects of the late twentieth and early twenty-first centuries were succeeded by large-scale, technology-dominated projects, only to invite human intervention again in the form of voluntary classifications in the 2010s. Today, human intervention is no less important, but its scale has been limited to the making of training datasets. Illustration studies is once again going through a drastic change, this time combining scale and usability, as well as the specialities of computer scientists and humanists.