Sourcing Shakespeare in cyberspace: how online sleuthing leads to a manuscript likely used by the Bard. From the article:

[Dennis] McCarthy used decidedly modern techniques to marshal his evidence, employing WCopyfind, an open-source plagiarism software, which picked out common words and phrases in the manuscript and the plays…. “People don’t realize how rare these words actually are,” Mr. McCarthy said. “And he keeps hitting word after word. It’s like a lottery ticket. It’s easy to get one number out of six, but not to get every number.”…
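As an aside, the kind of matching described here can be illustrated with a toy example. The following is only a minimal sketch of shared-phrase detection using word n-grams; it is not WCopyfind's actual algorithm, and the function names (`tokenize`, `shared_phrases`) and the sample sentences are made up for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into words, keeping internal hyphens and apostrophes."""
    return re.findall(r"[a-z]+(?:[-'][a-z]+)*", text.lower())


def shared_phrases(text_a, text_b, n=3):
    """Return the set of n-word sequences that occur in both texts."""
    def ngrams(tokens):
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

    return ngrams(tokenize(text_a)) & ngrams(tokenize(text_b))


if __name__ == "__main__":
    # Invented sentences, not taken from the manuscript or the plays.
    a = "The quick brown fox jumps"
    b = "A quick brown fox sleeps"
    for phrase in sorted(shared_phrases(a, b, n=3)):
        print(" ".join(phrase))
```

A match list like this says nothing by itself about how unusual the shared phrases are, which is why the rarity check against a large reference corpus matters.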

Those techniques may only be the “icing on the cake,” said [Michael] Witmore, who briefly examined an advance copy. “At its core, this remains a literary argument, not a statistical one.”…

To make sure North and Shakespeare weren’t using common sources, Mr. McCarthy ran phrases through the database Early English Books Online, which contains 17 million pages from nearly every work published in English between 1473 and 1700. He found that almost no other works contained the same words in passages of the same length. Some words are especially rare; “trundle-tail” appears in only one other work before 1623.
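The rarity check can be sketched in the same spirit: given a collection of texts, list the works in which a set of words all fall within a passage of a given length. This is a hypothetical illustration only; it does not model Early English Books Online or its search interface, the corpus is a plain dictionary of invented strings, and the names `contains_all_within` and `works_matching` are assumptions.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into words, keeping internal hyphens and apostrophes."""
    return re.findall(r"[a-z]+(?:[-'][a-z]+)*", text.lower())


def contains_all_within(tokens, words, window):
    """Return True if some run of `window` consecutive tokens contains every word in `words`."""
    targets = set(words)
    for start in range(max(1, len(tokens) - window + 1)):
        if targets <= set(tokens[start:start + window]):
            return True
    return False


def works_matching(corpus, words, window):
    """Return the sorted titles of works where all `words` co-occur within `window` tokens."""
    wanted = [w.lower() for w in words]
    return sorted(
        title
        for title, text in corpus.items()
        if contains_all_within(tokenize(text), wanted, window)
    )


if __name__ == "__main__":
    # Toy corpus with invented text; a real check would run over a large historical collection.
    corpus = {
        "Work A": "the dog a trundle-tail and a mongrel grim",
        "Work B": "a trundle-tail ran far far far far far far away from the mongrel",
        "Work C": "nothing relevant here",
    }
    print(works_matching(corpus, ["trundle-tail", "mongrel"], window=5))
```

The fewer works such a search returns for a given cluster of words, the harder it is to explain the overlap by a common source, though how much weight to give that remains a judgment call.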

h/t Scott Newstok