by Dr Barry R. Clarke

It is currently a popular practice to seek out biographical connections between a particular candidate and a Shakespeare play. However, the fact that such connections can be found for several personalities is sufficient to undermine the method as a means of determining authorship. The only secure approach is to examine verbal correspondences between a candidate’s canon and a target text, but, as we shall see, this too has its difficulties.

Statistical stylometry is one such method. It works by obtaining samples of textual data from several candidates and testing them against a target text. Certain attributes of this data are kept constant across all candidates and the target text: for example, sample size (texts of equal length), genre (if a comedy is under test, all samples are taken from comedy material), and chronology (all samples and the target come from the same period). A count of certain linguistic items is then performed for each sample, each item serving as an authorial marker. For example, the proportion of words ending in ‘-ish’ and the proportion beginning with ‘dis-’ might be taken as two separate linguistic items. The count is carried out for many different items, and the results are then combined in a correlation calculation for each candidate against the target text to see how closely their counts match.
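The count-and-correlate procedure just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of any published study: the four markers in `MARKERS`, the simple regular-expression tokeniser and the choice of Pearson's correlation coefficient are all illustrative, and the helper names (`profile`, `pearson`, `rank_candidates`) are hypothetical. The code does not enforce the controls on sample size, genre and chronology; it assumes the caller has already chosen comparable samples.

```python
import math
import re

# Hypothetical linguistic markers: each maps a name to a test applied to one word.
MARKERS = {
    "ends_ish": lambda w: w.endswith("ish"),
    "starts_dis": lambda w: w.startswith("dis"),
    "ends_eth": lambda w: w.endswith("eth"),
    "starts_un": lambda w: w.startswith("un"),
}


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def profile(text):
    """Return the proportion of words satisfying each marker, in MARKERS order."""
    words = tokenize(text)
    if not words:
        raise ValueError("sample contains no words")
    return [sum(1 for w in words if test(w)) / len(words) for test in MARKERS.values()]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("a profile is constant; correlation is undefined")
    return cov / (sx * sy)


def rank_candidates(target_text, candidate_samples):
    """Correlate each candidate's marker profile with the target's, best match first."""
    target = profile(target_text)
    scores = {name: pearson(profile(text), target)
              for name, text in candidate_samples.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

A call such as `rank_candidates(target_text, {"A": sample_a, "B": sample_b})` returns candidate names paired with their correlation scores, highest first. Note that the target's profile is computed over the whole text, which is precisely where the single-author assumption enters.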

The flaw in this statistical method, when it is applied to a Shakespeare play, is the dubious assumption that the play is the work of only one hand. If a scribe, editor, compositor, or a dramatist has made any alterations to the spelling of words in the original text, this assumption fails. In that case, there might be several contributors to a Shakespeare play, and what one is actually counting in the target text is the average effect of several unidentifiable hands. Clearly, a count of this kind, which relies on the entire sample of a Shakespeare text, could easily be corrupted.

A more reliable method is to compare phrases and collocations. Being more complex than single words, they are less vulnerable to editorial intervention. However, it is not enough to find parallels that only a certain candidate and the Shakespeare play share; again, this type of success can be obtained for several candidates. Ideally, phrases and collocations in the target text are checked for rarity against a contemporary database of searchable texts, such as Early English Books Online (EEBO), and a record is then made of which candidates used them. Because the phrases are rare, this step actually eliminates candidates. To argue that a candidate contributed to a play, there needs to be a sufficient number of such rare matches from the candidate’s canon both before and after the assumed date of the play. Such a pattern would otherwise imply mutual borrowing between the candidate and the play, which is so unusual that the argument can then be made that the candidate contributed these particular rare phrases or collocations. Of course, the more accurate the dating of the target text, the more assured is the claim for contribution.
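The filtering logic of this rarity test can be sketched as follows. It is a hypothetical illustration, not the Rare Collocation Profiling procedure itself: the `Doc` record stands in for dated entries in a searchable corpus such as EEBO (no real EEBO interface is modelled), the corpus is assumed to exclude the target play, matching is plain word-sequence matching with no allowance for early modern spelling variants, and the thresholds `max_docs` (what counts as rare) and `min_each` (what counts as a sufficient number of matches) are arbitrary placeholders.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical record for one dated text in a searchable corpus (EEBO stand-in)."""
    author: str
    year: int
    text: str


def normalise(s):
    """Lower-case, keep only letters and apostrophes, and join words with single spaces."""
    return " ".join(re.findall(r"[a-z']+", s.lower()))


def rare_phrases(phrases, corpus, max_docs=3):
    """Map each phrase found in 1..max_docs corpus documents to the documents using it.

    max_docs is an arbitrary placeholder threshold for 'rare'.
    """
    rare = {}
    for phrase in phrases:
        p = normalise(phrase)
        if not p:
            continue  # skip empty or punctuation-only phrases
        # Pad with spaces so only whole-word sequences match.
        hits = [doc for doc in corpus if f" {p} " in f" {normalise(doc.text)} "]
        if 0 < len(hits) <= max_docs:
            rare[p] = hits
    return rare


def profile_candidates(phrases, corpus, play_year, max_docs=3):
    """For each author, collect the rare phrases they used before and after play_year."""
    result = defaultdict(lambda: {"before": set(), "after": set()})
    for phrase, hits in rare_phrases(phrases, corpus, max_docs).items():
        for doc in hits:
            if doc.year < play_year:
                result[doc.author]["before"].add(phrase)
            elif doc.year > play_year:
                result[doc.author]["after"].add(phrase)
            # Texts dated to the play's own year are counted as neither.
    return dict(result)


def supported_candidates(profiles, min_each=2):
    """Authors with at least min_each rare matches both before and after the play date."""
    return sorted(author for author, d in profiles.items()
                  if len(d["before"]) >= min_each and len(d["after"]) >= min_each)
```

In this sketch, texts dated to the play's own year are deliberately counted on neither side, because the date alone cannot order them relative to the play. Any candidate who never uses a rare phrase simply does not appear in the result, which mirrors the eliminating effect of rarity described above.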

Here, there is a clear difference in emphasis between statistical stylometry and the rare collocation method. The former performs a count on an entire sample which, if it is a Shakespeare play, might contain several unidentifiable hands. The latter, by contrast, makes claims only about particular phrases and collocations. It surgically separates these elements from the rest of the play, and the only assertion made is that a particular candidate contributed those particular elements.

Needless to say, if a candidate has too few words in EEBO, no test can be carried out, and I suggest that no assertion about that person’s contribution to a Shakespeare play can ever be made.

For more information on the new method of Rare Collocation Profiling (RCP), see Barry R. Clarke, ‘A linguistic analysis of Francis Bacon’s contribution to three Shakespeare plays’, PhD thesis, Brunel University, UK, 2013, which can be found online.

A summary of this work, ‘Developments in the Shakespeare authorship question’, is also available online.