Friday, June 10, 2016

How likely is the "five consecutive word rule" to detect "random," as opposed to intentional plagiarism?


I refer to the old fable that if you set enough monkeys at enough keyboards for a long enough period of time, they will, through random typing, reproduce the "Complete Works of Shakespeare," or any other tome.


Is it likely that someone will "copy" someone else's "five consecutive words" through a random process? Or is that a high enough bar that it takes some "doing" to copy it?



Answer



I'm completely sure that phrases like "as he walked up to", "he screwed his eyebrows and" or "as far as I know" will turn up all the time, but they don't constitute plagiarism because they are very common expressions.


Leave conjunctions, pronouns, particles and prepositions out of the "five word" count, and the matches you get will be genuine and exceptionally rare. Include these "generic words" and you'll get a ton of false positives.
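To make the difference concrete, here is a minimal sketch of the filtering idea above. It is an illustration under assumptions, not a real plagiarism checker: the names `STOP_WORDS`, `content_words` and `shared_runs` are made up for this example, and the stop list is deliberately tiny (a usable one would be far longer).

```python
import re

# Hypothetical, deliberately short list of "generic words"
# (conjunctions, pronouns, particles, prepositions, articles).
STOP_WORDS = {
    "a", "an", "the", "and", "or", "but", "as", "i", "he", "she", "it",
    "we", "they", "you", "to", "of", "in", "on", "at", "up", "by", "for",
    "from", "with", "his", "her", "their", "that", "this",
}


def content_words(text, skip_stop_words=True):
    """Lowercase the text, split it into words, optionally drop generic words."""
    words = re.findall(r"[a-z']+", text.lower())
    if skip_stop_words:
        words = [w for w in words if w not in STOP_WORDS]
    return words


def shared_runs(text_a, text_b, n=5, skip_stop_words=True):
    """Return the set of n-word runs that occur in both texts."""
    def runs(text):
        words = content_words(text, skip_stop_words)
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

    return runs(text_a) & runs(text_b)


a = "As far as I know, he walked up to the old house."
b = "As far as I know, she ran away from the new car."

# Counting every word, the stock phrase registers as a match.
print(shared_runs(a, b, skip_stop_words=False))
# Counting only content words, the two sentences share nothing.
print(shared_runs(a, b))
```

With generic words included, the only "match" between the two sample sentences is the stock phrase "as far as I know"; with them excluded, there is no match at all, which is the behaviour the answer argues for.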


(My experience comes from writing variations of the "dissociated press" program: find a sequence of words that repeats within the same text, cut the text there and continue from the other occurrence. The result reads smoothly as a sentence but is nonsense: a run-on story pieced together, in a grammatically correct manner, from random pieces of a different story. Finding a repeating sequence of three words within a 130k-word document was nearly impossible.)
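The search step of such a program can be sketched roughly as follows. This is only an assumed reconstruction for illustration, not the code described above: `repeated_runs` is a hypothetical name, and whether the original program skipped generic words is not stated, so this version counts every word.

```python
import re
from collections import defaultdict


def repeated_runs(text, n=3):
    """Map each n-word run that occurs more than once to its start positions."""
    words = re.findall(r"[a-z']+", text.lower())
    positions = defaultdict(list)
    for i in range(len(words) - n + 1):
        positions[tuple(words[i:i + n])].append(i)
    # Keep only runs seen at two or more places: these are the possible
    # "cut here, continue from there" points.
    return {run: where for run, where in positions.items() if len(where) > 1}


sample = "the cat sat on the mat and the cat sat on the sofa"
for run, where in repeated_runs(sample).items():
    print(" ".join(run), where)
```

Each entry gives a run of words and the word offsets where it starts; a "dissociated press" style program would jump from one offset to another to splice the text.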

