
"aided by this chart" [SIGN]

Throughout the Canon, we encounter a variety of "grams": telegrams, programmes, cryptograms, monograms, and cablegrams; such was the nature of the written word in the Victorian age. Telegrams in particular play a crucial role in many Holmes adventures.

Now in the age of Google, we have another "gram" to consider: Google Ngrams. The Google Ngram Viewer is a handy tool that enables the user to see the frequency of words and/or phrases across the vast corpus of digitized books that Google has at its disposal. 

[Editor's note: click on any of the links above the images below for an interactive version of the graphs.]

Considering the multi-cultural appeal of Sherlock Holmes, it may come as no surprise to Sherlockians that the default example words and phrases that show up when a user logs on to the Ngram Viewer are as follows: Albert Einstein, Sherlock Holmes, Frankenstein. It did come as a surprise to me, however, that Frankenstein references outrank both Albert Einstein and Sherlock Holmes in the English corpus! Take a look:

Sherlock Holmes gains a bit of ground if we change the corpus to British English, as shown below:

And if we use the smoothing tool set to 50 (instead of the default 3), we see a result that is more comforting to the Sherlockian:
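(An aside for the curious on what that smoothing number does: as I understand the Viewer's documentation, a smoothing of 3 means each year's plotted value is an average of that year and the three years on either side. The sketch below is only my own illustration of that idea, not Google's code. The function name, the handling of the first and last years, and the sample numbers are all assumptions invented for the example.)

```python
def smooth(values, window):
    """Centered moving average.

    Each point becomes the mean of itself and up to `window` neighbours
    on either side. Near the ends, fewer neighbours exist, so the average
    simply uses whatever is available (an assumption about edge handling).
    """
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - window)
        hi = min(len(values), i + window + 1)
        chunk = values[lo:hi]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical yearly frequencies, made up purely for illustration.
raw = [1.0, 5.0, 1.0, 1.0, 9.0, 1.0, 1.0]

print(smooth(raw, 0))  # window of 0 leaves the data untouched
print(smooth(raw, 1))  # spikes are flattened
print(smooth(raw, 3))  # flatter still
```

The larger the window, the more a one-year spike is spread across its neighbours, which is why a setting of 50 turns jagged lines into gentle curves.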

Now I'd like to make it perfectly clear that I am by no means an expert in statistical analysis. What follows is mere conjecture from my own point of view, as a fan of both Sherlock Holmes and Tarzan. (The reader has been warned!) However, as I began to consider Ngrams and iconic characters, my mind went back to a discussion I once had with some fellow Sherlockians at a meeting here in Nashville, Tennessee, wherein we got talking about the popularity of Sherlock Holmes vs. the popularity of the famous Edgar Rice Burroughs character, Tarzan of the Apes. To my surprise and delight, several of the Sherlockians at that meeting were avid Burroughs fans, something we had in common beyond our interest in Holmes.

As I considered the Ngram results mentioned above I wondered, how would references to Sherlock Holmes compare in frequency with references to Tarzan? After all, both characters were iconic, and both characters were associated with their creators for the better part of their careers. So I decided to put it to the test: putting in a time range of 1910 (two years before Tarzan of the Apes was published) through 2008 (the most recent figures available on Google Ngram Viewer), I chose to search the English corpus. Here are the results:
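(For readers who would rather build such a search as a link than click through the form, here is a minimal sketch. It assumes the query parameter names that appear in the Viewer's address bar, namely `content`, `year_start`, `year_end`, `corpus`, and `smoothing`; these are not a documented API and may change. The helper name `ngram_url` is my own, and the corpus value is a placeholder, since the identifiers Google uses for each corpus vary between releases.)

```python
from urllib.parse import urlencode

# Address of the Viewer's graph page (assumed from the browser address bar).
BASE_URL = "https://books.google.com/ngrams/graph"


def ngram_url(phrases, year_start, year_end, corpus, smoothing=3):
    """Build an Ngram Viewer link for a comma-separated list of phrases.

    The parameter names are assumptions based on the URLs the Viewer
    itself produces; `corpus` must be whatever identifier the Viewer
    currently uses for the corpus you want.
    """
    params = {
        "content": ",".join(phrases),
        "year_start": year_start,
        "year_end": year_end,
        "corpus": corpus,
        "smoothing": smoothing,
    }
    return BASE_URL + "?" + urlencode(params)


# "CORPUS_ID_HERE" is a placeholder, not a real corpus identifier.
url = ngram_url(["Sherlock Holmes", "Tarzan"], 1910, 2008, corpus="CORPUS_ID_HERE")
print(url)
```

Swapping the corpus identifier or the smoothing value and reloading the link is the same experiment I describe in the paragraphs that follow.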

Well! We see quite a lead for Tarzan in the first several years, and then Sherlock Holmes gains some popularity, until the last several years of the search period. A very popular time for Sherlock Holmes references starts after 1930 (due in part, no doubt, to the publication of the Doubleday edition and then the on-screen popularity of Arthur Wontner and Basil Rathbone). Tweaking the search parameters a bit more by changing to the "American English" corpus, Tarzan gains a bit of ground in the early years of this century (although we still see that surge of Holmes hits in the 1930s), represented here:

And just as one might expect, changing the corpus to "British English" puts Sherlock Holmes firmly back on top:

[On a personal note, I find it interesting, if inconsequential, that there is a pretty sharp spike in the graph above in the early 80s, which just happens to be around the time I started reading Sherlock Holmes as a kid!]

Then I discovered the parameter entitled "English One Million." According to the guide to using the Ngram Viewer,
"The 'Google Million'...are in English with dates ranging from 1500 to 2008. No more than about 6000 books were chosen from any one year, which means that all of the scanned books from early years are present, and books from later years are randomly sampled. The random samplings reflect the subject distributions for the year..." 
That corpus yielded a slightly different result, with Sherlock leading the way for most of the time period, except for a huge spike for Tarzan in the 1990s! (Right between the release of the film Greystoke: The Legend of Tarzan, Lord of the Apes in 1984 and the Disney animated film Tarzan in 1999.)

One more parameter change I tried, which yielded a slightly different result, was changing the corpus to "English Fiction." The steady downturn that the graphs take through the decades indicates to me that, even though writing about the characters has continued to proliferate (as is clear from several of the graphs above), the actual publication of the fiction works featuring the two characters has decreased, as you can see here:

In my romp through the hills and valleys of the Ngram Viewer, I've been looking at written references to a couple of the more popular characters in the past century of English publishing. Both Lord Greystoke (Tarzan) and the Great Detective are almost synonymous with the names of the authors who created them. Although Conan Doyle's output may be seen by some critics as more "literary" than that of Burroughs, the Tarzan novels and the Sherlock Holmes novels and stories share a similar popularity among their respective fandoms. The sheer volume of subsequent reprints of the novels and stories about the two characters is almost impossible to calculate. Moreover, both characters have been largely in the public domain for some time, which has no doubt contributed to their continuing popularity in print. I would also postulate that the popularity of both characters has been increased immensely by their many appearances in a variety of film (and stage) adaptations.

As I pointed out in an earlier article here, authors such as Philip José Farmer and others have postulated links between the Holmes universe and the Tarzan universe. John Clayton, after all, is both the English name of Tarzan, aka Lord Greystoke, and the name of a very minor character in Conan Doyle's The Hound of the Baskervilles. Whether there is any actual link between these two iconic characters or not, their popularity in the written word has often been intertwined, at least as I read the graphs above.

I would love to hear thoughts (and even dissenting opinions) from readers who may be better versed than I in the analysis of Ngrams. Please share your opinions in the comment section below. And give the Google Ngram Viewer a try some time. (Warning, though: it can end up wasting scandalous amounts of your time!)