by Dr Barry R. Clarke

Having been engaged in the Shakespeare authorship question for many years, and having witnessed the arguments put forward for various contestants, I have to say that I have become less interested in the answer to the question ‘Who contributed to this particular Shakespeare work?’ than in how the answer is to be established beyond reasonable doubt. In other words, how effective are our methods for establishing the degree of truth of this or that fact?

For example, did a group of contributors write the Sonnets or only one person? My response to this question is: How are we going to decide? My best current answer is: By means of a rare phrase test using the Early English Books Online (EEBO) database. This method, to which I devoted considerable thought and time during my PhD work at Brunel University (2010-13), I call Rare Collocation Profiling (RCP). Although it is still in need of much development, I cannot see a more convincing method available. After all, the rarer a phrase, the more it points to a single mind, and the procedure gives thousands of contemporary authors the opportunity to appear or not appear in the search results for rare correspondences. Crucially, it offers the chance to eliminate contestants, an important characteristic of the scientific method, and in my view it is far more reliable than traditional academic methods such as a stylometric word count of a document, which rests on the dubious assumption that a text is the work of a single contributor throughout.
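As a rough illustration of the idea only, and not the procedure used in the thesis, the following Python sketch profiles a target text against a toy corpus. Everything in it is an assumption made for the example: the `collocations` and `rare_collocation_profile` helpers, the three-word window, the rule that 'rare' means used by at most `max_authors` authors, and the invented authors and sentences. A real search would query EEBO rather than an in-memory dictionary.

```python
from collections import defaultdict


def collocations(text, window=3):
    """Return unordered word pairs that fall inside a span of `window` consecutive words."""
    words = [w.strip(".,;:!?'\"()").lower() for w in text.split()]
    words = [w for w in words if w]
    pairs = set()
    for i, w in enumerate(words):
        # Pair each word with the next (window - 1) words.
        for other in words[i + 1 : i + window]:
            if other != w:
                pairs.add(frozenset((w, other)))
    return pairs


def rare_collocation_profile(target_text, corpus, max_authors=1):
    """Count, per author, the target's collocations used by at most `max_authors` authors.

    `max_authors` is a hypothetical rarity threshold chosen for this sketch.
    """
    # Map each collocation to the set of authors whose texts contain it.
    users = defaultdict(set)
    for author, texts in corpus.items():
        for text in texts:
            for pair in collocations(text):
                users[pair].add(author)

    profile = defaultdict(int)
    for pair in collocations(target_text):
        authors = users.get(pair, set())
        # Keep only collocations that occur in the corpus and are rare there.
        if 0 < len(authors) <= max_authors:
            for author in authors:
                profile[author] += 1
    return dict(profile)


# Invented toy data, purely for illustration.
corpus = {
    "Author A": ["the gilded tempest roared at night"],
    "Author B": ["a quiet night and the calm sea"],
    "Author C": ["the calm sea at night"],
}
target = "a gilded tempest upon the calm sea"

print(rare_collocation_profile(target, corpus))
```

In this toy run the collocations around 'calm sea' are shared by two authors and so are discarded as too common, while those around 'gilded tempest' occur in only one author and count towards that author's profile. Authors with no rare matches simply do not appear in the result, which is the sense in which the method lets contestants drop out.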

The difficulty with a stylometric method is that it relies on an extended body of text to perform a count. To be informative, this text must have only one contributor, with whom is associated a set of numbers derived from the word characteristics being counted (e.g. one characteristic could be words ending in ‘-ish’). Now, it is by no means certain that a text in an author’s corpus has not been revised by a different hand, and if it has, then this set of numbers would be an inaccurate representation of this author. However, there is a larger objection. A target text with more than one contributor has little prospect of its several hands being suggested by the method. The assumption is usually made that multiple contributors would be allocated different scenes of a play text, and would therefore be separated in their contribution. However, it is entirely possible that a scene has been revised at a later time by a different contributor. This impasse, which I call here the ‘problem of uniformity’, is the chief difficulty with stylometric methods and can easily result in misattribution. What is needed is a more forensic procedure that does not rely on an uncorrupted text, and which through the meaningfulness and rarity of the elements being processed (i.e. phrases and collocations) is more likely to point to particular personalities. The RCP method satisfies this criterion.
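The problem of uniformity can be made concrete with a small Python sketch. It is a toy under stated assumptions: the `marker_rate` helper and both sentences are invented for the example, and the single ‘-ish’ marker stands in for the larger set of characteristics a real stylometric study would count.

```python
def marker_rate(text, suffix="ish"):
    """Return the fraction of words in `text` that end with `suffix`."""
    words = [w.strip(".,;:!?'\"()").lower() for w in text.split()]
    words = [w for w in words if w]
    if not words:
        return 0.0
    return sum(w.endswith(suffix) for w in words) / len(words)


# Invented passages: one hand favours '-ish' words, the other avoids them.
first_hand = "the foolish knavish boy did vanish"
second_hand = "the steady old man came home"

# A scene as it might survive after revision by the second hand.
revised_scene = first_hand + " " + second_hand

print(marker_rate(first_hand))     # rate for the first hand alone
print(marker_rate(second_hand))    # rate for the second hand alone
print(marker_rate(revised_scene))  # one blended rate for the whole scene
```

The count over the revised scene yields a single number that matches neither hand, and nothing in that number reveals that two contributors are present.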

So far I have applied RCP to A Funerall Elegye by W.S. (identified matches: John Ford, Shakespeare canon), the Gesta Grayorum (identified matches: Francis Bacon), The Comedy of Errors (identified matches: Thomas Nashe, Thomas Heywood, Francis Bacon), Love’s Labour’s Lost (identified matches: Thomas Nashe, Thomas Heywood, Thomas Dekker, Francis Bacon), The Tempest (identified matches: Francis Bacon), and Twelfth Night (identified matches: Thomas Heywood, George Chapman, Francis Bacon). The Tempest is strong in rare correspondences that Bacon shares, and although it may come as a surprise that he appears in these lists, he was heavily involved in the 1594-5 Gray’s Inn Christmas revels where The Comedy of Errors and Love’s Labour’s Lost were intended for performance (see PhD thesis) and he was a prominent member of the Virginia Company, which has strong connections to The Tempest (see peer-reviewed publication ‘The Virginia Company’s role in The Tempest’). Also, the method’s identification of John Ford as the major contributor to A Funerall Elegye has assisted in settling a long-standing debate as to whether or not ‘W.S.’ was referring to William Shakespeare. It seems not, at least not unless Ford’s name was in need of concealment in the dangerous circumstance of a murder having been committed.

However, what if the RCP test is applied to the Sonnets and the result is inconclusive, that is, no contestant turns out to have a particularly strong return? There are several reasons why this might be so. For example, the supposed sole Sonnets contributor might have no corpus of letters and prose work in the EEBO database. In that case, he won’t be identified and I say we cannot then challenge Shakespeare’s default claim to the Sonnets (which is justified by his name, or a similar one, appearing on the work). In fact, this is precisely how one would conduct a statistical hypothesis test. First, set up the null hypothesis ‘Mr Shakespeare wrote the Sonnets alone’, then establish the alternative hypothesis ‘There are one or more different contributors’, and finally test the null hypothesis using the available data in EEBO. Of course, Mr Shakespeare could not participate in such a test himself as there are no independent letters or prose works of his to test the Sonnets against. So since he has no opportunity to fail the test, the conclusion has to be that he cannot be scientifically eliminated from contributing to the Sonnets.
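The decision logic just described can be sketched in Python. This is not a statistical test in the formal sense and not a procedure taken from the thesis: the `assess` and `null_stands` helpers, the `strong_return` threshold of 10, and the match counts are all hypothetical, chosen only to show how 'untestable' and 'inconclusive' outcomes leave the default attribution in place.

```python
def assess(candidates, match_counts, has_corpus, strong_return=10):
    """Label each candidate 'untestable', 'suggested', or 'not suggested'.

    `strong_return` is a hypothetical cut-off for a 'particularly strong return'.
    """
    verdicts = {}
    for name in candidates:
        if not has_corpus.get(name, False):
            # No independent letters or prose to search: cannot pass or fail.
            verdicts[name] = "untestable"
        elif match_counts.get(name, 0) >= strong_return:
            verdicts[name] = "suggested"
        else:
            verdicts[name] = "not suggested"
    return verdicts


def null_stands(verdicts, default="Shakespeare"):
    """The null hypothesis stands unless some candidate other than `default` is suggested."""
    return not any(
        verdict == "suggested"
        for name, verdict in verdicts.items()
        if name != default
    )


# Invented figures, purely for illustration.
candidates = ["Shakespeare", "Joe Soap", "Fred Bloggs"]
has_corpus = {"Shakespeare": False, "Joe Soap": True, "Fred Bloggs": True}
match_counts = {"Joe Soap": 3, "Fred Bloggs": 1}

verdicts = assess(candidates, match_counts, has_corpus)
print(verdicts)
print(null_stands(verdicts))
```

With these invented figures no contestant reaches the threshold, so the result is inconclusive and the null hypothesis is retained; the default claimant is marked untestable rather than eliminated.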

There are those who might recoil at the suggestion that a current attribution method they are employing is unsound. For example, someone might object that they have spent a lifetime researching Joe Soap and have found dozens, no … hundreds, of biographical correspondences between his life and the Sonnets. The trouble is, so have other researchers … for Fred Bloggs, Egbert Nobacon, Sid Snipe … What does this imply? It implies that this type of evidence can be collected for several candidates and for this reason it establishes nothing. The notion that biographical connections are informative is an illusion and they have no place in the science of authorship attribution. Their only use is to reinforce the views of those who have already decided on their favourite candidate, who only collect information relating to that person, and who reject any opportunity to test whether or not he/she was actually involved. This is certainly not a scientific attitude yet the Shakespeare authorship question is heavily populated with such investigators who have no intention of modifying their hypotheses as new evidence arises!

I can hear others objecting that there are some very odd things going on, such as the fact that no one came out at the time and said Mr Shakespeare wrote the Sonnets, and I would agree that it is not what might reasonably be expected for a man whose name is on a collection of first-rate poetry. In fact, there are books devoted to lists of doubts which range from the non-existence of manuscripts to the absence of eulogies on Shakespeare’s death in 1616. Nevertheless, however well researched these works are, suspicion is not evidence. There must be some kind of scientific test that gives every contestant a reasonable chance of being either eliminated or suggested, and if there isn’t one, or if it is inconclusive, then I say we must be ready to admit that a scientific challenge to Shakespeare’s claim to this or that work is not possible. Fortunately, it seems that the RCP test shows promise despite being limited to those with several works in the EEBO database.

I have often thought it unfortunate that the Shakespeare authorship question has attracted so many who trade with abandon their favourite candidate’s biographical connections to the Shakespeare work. The attraction seems to lie in the thought that since little is known with any degree of certainty, one is free to fantasize whatever one likes, free from critical analysis. After all, if no one knows very much then one can be secure in the knowledge that there is no compelling evidence to deliver the pain of contradiction. As a trained scientist this thought depresses me greatly, because there is no easy route through the vast and intricate maze of connections that is the Shakespeare authorship question, and the construction of a convincing scientific argument takes considerable (and I mean considerable) research, care, and judgment.

So what I would like people to do is to think more carefully about the standard of evidence of a proposed method, and to consider ways in which a proposed scheme might allow the ruling in or out of other contestants. To me, any attribution method that does not have this characteristic is defective and biased. I also believe we should be more ready to declare that such and such a question has insufficient data to answer it, rather than succumb to the temptation to over-interpret circumstantial evidence. It’s a pleasurable game but it is unscientific.

I hope that these conclusions don’t extinguish anyone’s enthusiasm for investigation in any way. All I ask for is more thought devoted to the standard of evidence and how far it eliminates other contestants.