Shakespeare authorship

Musings on what we can know …



A scientific standard of evidence

by Dr Barry R. Clarke

Having been engaged in the Shakespeare authorship question for many years, and having witnessed the various arguments put forward for various contestants, I have to say that I have grown less interested in the answer to the question ‘Who contributed to this particular Shakespeare work?’ than in how that answer is to be established beyond reasonable doubt. In other words, how effective are our methods for establishing the degree of truth of this or that fact?

For example, did a group of contributors write the Sonnets, or only one person? My response to this question is: How are we going to decide? My best current answer is: By means of a rare phrase test using the Early English Books Online (EEBO) database. I call this method, to which I devoted considerable thought and time during my PhD work at Brunel University (2010-13), Rare Collocation Profiling (RCP). Although the method still needs much development, I cannot see a more convincing one available. After all, the rarer a phrase, the more it points to a single mind, and the procedure gives thousands of contemporary authors the opportunity to appear or not appear in the search results for rare correspondences. Crucially, it offers the chance to eliminate contestants, an important characteristic of the scientific method, and in my view it is far more reliable than traditional academic methods such as a stylometric word count of a document, which rests on the dubious assumption that a text is the uniform work of a single contributor.

The difficulty with a stylometric method is that it relies on an extended body of text to perform a count. To be informative, this text must have only one contributor, who is then represented by a set of numbers describing the word characteristics being counted (e.g. one characteristic could be words ending in ‘-ish’). Now, it is by no means certain that a text in an author’s corpus has not been revised by a different hand, and if it has, then this set of numbers would misrepresent the author. However, there is a larger objection: if a target text has more than one contributor, the method has little prospect of revealing its several hands. The usual assumption is that multiple contributors would have been allocated different scenes of a play, so that their contributions would be kept separate. However, a scene could easily have been revised at a later time by a different contributor. This impasse, which I call here the ‘problem of uniformity’, is the chief difficulty with stylometric methods and can easily result in misattribution. What is needed is a more forensic procedure that does not rely on an uncorrupted text and that, through the meaningfulness and rarity of the elements being processed (i.e. phrases and collocations), is more likely to point to particular personalities. The RCP method satisfies this criterion.

So far I have applied RCP to A Funerall Elegye by W.S. (identified matches: John Ford, Shakespeare canon), the Gesta Grayorum (identified matches: Francis Bacon), The Comedy of Errors (identified matches: Thomas Nashe, Thomas Heywood, Francis Bacon), Love’s Labour’s Lost (identified matches: Thomas Nashe, Thomas Heywood, Thomas Dekker, Francis Bacon), The Tempest (identified matches: Francis Bacon), and Twelfth Night (identified matches: Thomas Heywood, George Chapman, Francis Bacon). The Tempest is rich in rare correspondences shared with Bacon. It may come as a surprise that he appears in these lists, but he was heavily involved in the 1594-5 Gray’s Inn Christmas revels, where The Comedy of Errors and Love’s Labour’s Lost were intended for performance (see PhD thesis), and he was a prominent member of the Virginia Company, which has strong connections to The Tempest (see the peer-reviewed publication ‘The Virginia Company’s role in The Tempest’). Also, the method’s identification of John Ford as the major contributor to A Funerall Elegye has helped to settle a long-standing debate over whether ‘W.S.’ referred to William Shakespeare. It seems not, unless Ford’s name needed to be concealed because of the dangerous circumstances surrounding a murder.

However, what if the RCP test is applied to the Sonnets and the result is inconclusive, that is, no contestant turns out to have a particularly strong return? There are several reasons why this might be so; for example, the supposed sole contributor to the Sonnets may have no corpus of letters and prose works in the EEBO database. In that case he will not be identified, and I would say that we cannot then challenge Shakespeare’s default claim to the Sonnets (which is justified by his name, or a similar one, appearing on the work). In fact, this is precisely how one would conduct a statistical hypothesis test. First, set up the null hypothesis ‘Mr Shakespeare wrote the Sonnets alone’, then establish the alternative hypothesis ‘There are one or more different contributors’, and finally test the null hypothesis against the available data in EEBO. Of course, Mr Shakespeare could not participate in such a test himself, as there are no independent letters or prose works of his to test the Sonnets against. So, since he has no opportunity to fail the test, the conclusion has to be that he cannot be scientifically eliminated from contributing to the Sonnets.

There are those who might recoil at the suggestion that an attribution method they are currently employing is unsound. For example, someone might object that they have spent a lifetime researching Joe Soap and have found dozens, no … hundreds, of biographical correspondences between his life and the Sonnets. The trouble is, so have other researchers … for Fred Bloggs, Egbert Nobacon, Sid Snipe … What does this imply? It implies that this type of evidence can be collected for several candidates, and for this reason it establishes nothing. The notion that biographical connections are informative is an illusion, and they have no place in the science of authorship attribution. Their only use is to reinforce the views of those who have already decided on their favourite candidate, who collect information relating only to that person, and who reject any opportunity to test whether or not he/she was actually involved. This is certainly not a scientific attitude, yet the Shakespeare authorship question is heavily populated with such investigators, who have no intention of modifying their hypotheses as new evidence arises!

I can hear others objecting that there are some very odd things going on, such as the fact that no one at the time came out and said that Mr Shakespeare wrote the Sonnets, and I would agree that this is not what might reasonably be expected for a man whose name is on a collection of first-rate poetry. In fact, there are books devoted to lists of doubts, ranging from the non-existence of manuscripts to the absence of eulogies on Shakespeare’s death in 1616. Nevertheless, however well researched these works are, suspicion is not evidence. There must be some kind of scientific test that gives every contestant a reasonable chance of being either eliminated or suggested, and if there isn’t one, or if it is inconclusive, then I say we must be ready to admit that a scientific challenge to Shakespeare’s claim to this or that work is not possible. Fortunately, the RCP test seems to show promise, despite being limited to candidates with several works in the EEBO database.

I have often thought it unfortunate that the Shakespeare authorship question has attracted so many who trade with abandon in their favourite candidate’s biographical connections to the Shakespeare work. The attraction seems to lie in the thought that since little is known with any degree of certainty, one is free to fantasize whatever one likes, immune from critical analysis. After all, if no one knows very much, then one can be secure in the knowledge that there is no compelling evidence to deliver the pain of contradiction. As a trained scientist, I find this thought deeply depressing, because there is no easy route through the vast and intricate maze of connections that is the Shakespeare authorship question, and the construction of a convincing scientific argument takes considerable (and I mean considerable) research, care, and judgment.

So what I would like people to do is to think more carefully about the standard of evidence of a proposed method, and to consider ways in which a proposed scheme might allow other contestants to be ruled in or out. To me, any attribution method that does not have this characteristic is defective and biased. I also believe we should be more ready to declare that there are insufficient data to answer such and such a question, rather than succumb to the temptation to over-interpret circumstantial evidence. It’s a pleasurable game, but it is unscientific.

I hope that these conclusions don’t extinguish anyone’s enthusiasm for investigation in any way. All I ask for is more thought devoted to the standard of evidence and how far it eliminates other contestants.

Techniques of Shakespeare authorship attribution


by Dr Barry R. Clarke

It is currently a popular practice to seek out biographical connections between a particular candidate and a Shakespeare play. However, the fact that this can be carried out for several personalities is sufficient to undermine the method as an authorial determinant. The only secure way is to examine verbal correspondences between a candidate’s canon and a target text but, as we shall see, this too has its difficulties.

Statistical stylometry is one such method. It works by obtaining samples of textual data from several candidates for a test against a target text. Certain attributes of this data are kept constant across all candidates and the target text, for example, sample size (texts of equal size), genre (for a comedy under test, all samples are taken from comedy material), and chronology (all samples and the target are from the same period). A count of certain linguistic items is then performed for each sample, each item serving as an authorial marker. For example, the proportion of words ending in ‘-ish’, or of those beginning with ‘dis-’, might be taken as separate linguistic items. This count is carried out for many different items, and the results are then combined in a correlation calculation for each candidate against the target text to see how closely their counts match.
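The count-and-correlate procedure just described can be sketched in a few lines of Python. This is a toy illustration, not a reconstruction of any published stylometric study: the four marker features, the helper names (`profile`, `pearson`, `rank_candidates`) and the choice of a Pearson correlation as the ‘correlation calculation’ are all assumptions made for the example, and real studies use far more markers and much larger, carefully matched samples.

```python
import math
import re

# Hypothetical marker set: each test decides whether a word counts
# towards that linguistic item. Real studies use many more markers.
FEATURES = {
    "ends_ish": lambda w: w.endswith("ish"),
    "starts_dis": lambda w: w.startswith("dis"),
    "ends_eth": lambda w: w.endswith("eth"),
    "ends_ly": lambda w: w.endswith("ly"),
}


def tokenize(text):
    """Lower-case word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def profile(text):
    """Proportion of the sample's words that exhibit each marker."""
    words = tokenize(text)
    return [sum(1 for w in words if test(w)) / len(words)
            for test in FEATURES.values()]


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def rank_candidates(target, samples):
    """Correlate each candidate's marker profile with the target's, best first."""
    t = profile(target)
    scores = {name: pearson(profile(text), t) for name, text in samples.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Invented sentences standing in for equal-size, same-genre samples.
    target = "a foolish man did dislike the boyish knave, and he speaketh softly"
    samples = {
        "Candidate A": "a selfish lord did disdain the childish page, and he walketh slowly",
        "Candidate B": "the king went forth and spake plainly and kindly to his people",
    }
    for name, score in rank_candidates(target, samples):
        print(name, round(score, 3))
```

Candidate A shares none of the target’s content words but has the same marker proportions, so it correlates almost perfectly. The controls on sample size, genre and chronology are not modelled here; they are assumed to have been handled when the samples were chosen.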

The flaw in this statistical method when carried out on a Shakespeare play is the dubious assumption that the play is the work of only one hand. If a scribe, editor, compositor, or another dramatist has made any alterations to the words or spellings of the original text, then this assumption falls. In this case, there might be several contributors to a Shakespeare play, and what one is actually counting in the target text is the average effect of several unidentifiable hands. Clearly, a count of this kind, which relies on the entire sample of a Shakespeare text, could easily be corrupted.
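The averaging effect can be shown with a single marker. In this minimal sketch the two ‘hands’ are invented sentences and `ish_rate` and `weighted_mean_rate` are hypothetical helpers; the point is only that the rate for the combined text is the word-weighted mean of the contributors’ rates, so it matches neither of them.

```python
import re


def words(text):
    """Lower-case word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def ish_rate(text):
    """Proportion of words ending in '-ish' (one illustrative marker)."""
    ws = words(text)
    return sum(w.endswith("ish") for w in ws) / len(ws)


def weighted_mean_rate(parts):
    """Word-weighted mean of the per-part '-ish' rates."""
    total = sum(len(words(p)) for p in parts)
    return sum(len(words(p)) * ish_rate(p) for p in parts) / total


# Two invented 'hands' combined into one target text.
hand_a = "a foolish boyish knavish man"                        # 3 of 5 words
hand_b = "the king spake plainly to his people and went forth"  # 0 of 10 words
target = hand_a + " " + hand_b                                 # 3 of 15 words

print(ish_rate(hand_a), ish_rate(hand_b), ish_rate(target))  # prints 0.6 0.0 0.2
print(weighted_mean_rate([hand_a, hand_b]))
```

Neither 0.6 nor 0.0 can be recovered from the combined figure of 0.2 alone, which is why a whole-text count cannot separate the hands.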

A more reliable method is to compare phrases and collocations. Being more complex than single words, they are less vulnerable to editorial intervention. However, it is not enough to find parallels shared only by a certain candidate and the Shakespeare play, since this type of success can again be obtained for several candidates. Ideally, phrases and collocations in the target text are checked for rarity against a contemporary database of searchable texts, such as Early English Books Online (EEBO), and a record is made of which candidates used them. This kind of rarity actually eliminates candidates. To argue that a candidate contributed to a play, one needs a sufficient number of such rare matches from the candidate’s canon both before and after the assumed date of the play. If the candidate were not a contributor, matches on both sides of that date would imply mutual borrowing (the play borrowing from the candidate’s earlier work, and the candidate later borrowing from the play), which is so unusual that the argument can then be made that the candidate contributed these particular rare phrases or collocations. Of course, the more accurate the assessment of the target text’s date, the more assured is the claim for contribution.
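A toy version of the rare-phrase filter described above might look like the following. It is only a sketch under stated assumptions and not the RCP procedure as published: a small in-memory list of (author, year, text) records stands in for EEBO, ‘phrases’ are taken to be word trigrams, and a phrase is treated as rare if it occurs in the works of at most two distinct authors. All function names, texts and thresholds are hypothetical.

```python
import re
from collections import defaultdict


def ngrams(text, n=3):
    """Return the set of word n-grams (as tuples) occurring in a text."""
    words = re.findall(r"[a-z']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def rare_matches(target, corpus, n=3, max_authors=2):
    """Map each rare target n-gram to the (author, year) records containing it.

    corpus is a list of (author, year, text) tuples: a hypothetical
    stand-in for a searchable database such as EEBO. An n-gram counts
    as rare if it occurs in works by at most max_authors distinct authors.
    """
    target_grams = ngrams(target, n)
    hits = defaultdict(list)
    for author, year, text in corpus:
        for gram in target_grams & ngrams(text, n):
            hits[gram].append((author, year))
    return {gram: recs for gram, recs in hits.items()
            if len({author for author, _ in recs}) <= max_authors}


def before_and_after(matches, author, target_year):
    """Count an author's rare matches dated before and after the target."""
    before = after = 0
    for recs in matches.values():
        years = [year for a, year in recs if a == author]
        if any(year < target_year for year in years):
            before += 1
        if any(year > target_year for year in years):
            after += 1
    return before, after


if __name__ == "__main__":
    # Invented texts, for illustration only.
    target = "the quiet sea doth swallow pride and the green hour returns"
    corpus = [
        ("Author X", 1590, "they say the quiet sea doth swallow all"),
        ("Author X", 1620, "when the green hour returns we shall feast"),
        ("Author Y", 1600, "and the green hour returns again"),
        ("Author Z", 1605, "and the green hour"),
        ("Author W", 1602, "and the green hour"),
    ]
    matches = rare_matches(target, corpus)
    for author in ("Author X", "Author Y", "Author Z"):
        print(author, before_and_after(matches, author, target_year=1610))
```

Here ‘the green hour’ is dropped as too common, while Author X has rare matches both before and after the assumed date. Overlapping trigrams from a single phrase are counted separately in this sketch; a more careful version would merge them before counting.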

Here, there is a clear difference in emphasis between statistical stylometry and the rare collocation method. The former performs a count on an entire sample, which, if it is a Shakespeare play, might contain several unidentifiable hands. The latter, by contrast, makes claims only about particular phrases and collocations. It surgically separates these elements from the rest of the play, and the only assertion made is that a particular candidate contributed those particular elements.

Needless to say, if a candidate to be tested has too few words in EEBO, then no test is possible, and I suggest that no assertion about this person’s contribution to a Shakespeare play can ever be made.

For more information on the new method of Rare Collocation Profiling (RCP) see Barry R. Clarke, ‘A linguistic analysis of Francis Bacon’s contribution to three Shakespeare plays,’ PhD thesis, Brunel University, UK, 2013, which can be found online.

A summary of this work, ‘Developments in the Shakespeare authorship question’, is also available online.

Am I really this alone?

by Dr Barry R. Clarke

On 23 November 2014, I attended the Shakespeare Authorship Trust conference at the Globe Theatre. There were wonderful Shakespearean actors and a day of informative talks on Shakespeare and France, and then came the endgame: a round of five-minute speeches about the claims of different authorship candidates to have originated the entire Shakespeare work.

Two thoughts occurred to me during these expositions. The first was that the speakers all seemed to share an equal conviction that there was a single concealed author and that their man/woman was he/she. This was no superficial conviction either. It was a powerful one, based on years of gathering snippets of evidence in support of their chosen candidate. However, pause to consider that if any of them were correct, it could only be one of them, and in that case the rest must be labouring under a grave misapprehension. In other words, four of the five speakers must have been talking nonsense! … Forgive me! I’m over-simplifying it slightly … well, a lot actually … because I happen to know that one or two of the speakers actually believe that a group of authors were conspiring under the name of Shakespeare. So perhaps they could all have been right. Well, this brings me to my second thought, which struck me with far greater force than the first. How could they possibly know that their candidate was involved? Connections to the Earls of Montgomery and Pembroke, the dedicatees of Shakespeare’s First Folio (1623), autobiographical references in the plays and sonnets, known visits to France as background to certain plays: all interesting circumstantial evidence, but the trouble is that a case of this kind can be put together for several candidates, and it has been. Clearly this is not the type of evidence that excludes other possible suspects.

So it strikes me that there’s a problem with the standard of evidence, and here’s my suggestion. Unless one can show that a candidate’s personal literary idiosyncrasies, elements that reveal a uniqueness of thought, occur in the Shakespeare work, there isn’t much of a case. The suspect needs to have letters and prose works containing rare phrases and collocations that can be compared against the Shakespeare work for correspondence. Ideally, these would be unique matches between the candidate’s canon and the Shakespeare work under investigation. This is a crucial point, because it is the nature of scientific evidence that it strictly narrows the range of possibilities. That is, there needs to be a unique correspondence of some kind.

Of course, we could entertain the hope that the original Shakespeare manuscripts will turn up, written in a hand that is undeniably that of our favoured hero. Surely no one could be deluded enough to believe that this could happen. Think again. There is one investigator who has invested vast resources trying to prove that the manuscripts are buried down a waterlogged mine shaft on a remote Canadian island. Oh, dear! Have we really sunk this low?

As access to proper academic resources and techniques gradually reveals more about what actually happened four hundred years ago, some of these gratuitous ideas will slowly vanish into history; at least, that is my hope.

So, the evidence needs to be more scientific, but who will raise a [middle] finger to help in my quest as I say ‘UFO’ to the religiously entwined … I mean, is there anybody out there?! … Am I really this alone?! …

