Spurious Investigations Arise From Flawed Statistics

By: Dennis Maynes, Chief Scientist, Caveon Test Security

This summer, many cheating investigations of schools based on forensics evidence have been inconclusive or weak. This is troubling because investigations take time and money, and they can be disruptive. In my experience, the statistics behind them are often questioned, and with good cause.

A famous economist is said to have remarked, “If you torture the data long enough, they will confess” (attributed to Ronald Coase). Forensics monitoring requires inspecting all the data and listing the extreme observations. Because this is done with the intent of FINDING anomalies, extreme care is required. Order statistics, the study of the largest, smallest, and other ranked values in a sample, teach how to analyze such data properly. Unfortunately, order statistics are not taught in basic statistics courses, so many analysts never learn them and end up torturing the data. It’s like saying my basketball team is taller than your basketball team using only the height of the tallest player on each team. Such a comparison ignores the starting five and the heights of all the other players on the team.

Let me illustrate the concern with an example. Journalists routinely analyze the score gains of schools within a state using regressions. Because students in basic statistics courses are taught that a regression residual exceeding three standard deviations IS an outlier, the newspaper will report all schools having gains of three standard deviations or more. The named schools will receive “extra attention” in the press and from the public. Such attention is not warranted because the fact that ALL schools were examined has been ignored. The three-standard-deviation rule is appropriate only if you examine one school, and one school only, chosen at random. But the analysis was not restricted to one school. When hundreds or thousands of schools are screened, some will exceed three standard deviations by chance alone, which means that schools were inappropriately named. The statistics guarantee it.
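To put rough numbers on this, here is a minimal sketch. It assumes the standardized residuals behave like independent standard normal draws and that only large gains (the upper tail) are flagged; real school data will not match these assumptions exactly. The function name chance_flags is hypothetical.

```python
from statistics import NormalDist


def chance_flags(num_schools: int, threshold_sd: float = 3.0):
    """Return (expected chance flags, probability of at least one chance flag).

    Assumes independent, standard normal residuals and an upper-tail rule.
    """
    # Chance that one school with no cheating still exceeds the threshold.
    tail = 1.0 - NormalDist().cdf(threshold_sd)
    expected = num_schools * tail
    # The largest of num_schools residuals stays below the threshold only if
    # every residual does, so P(at least one flag) = 1 - (1 - tail) ** num_schools.
    p_any = 1.0 - (1.0 - tail) ** num_schools
    return expected, p_any


if __name__ == "__main__":
    for m in (1, 100, 1000):
        expected, p_any = chance_flags(m)
        print(f"{m:>5} schools: expected flags {expected:.2f}, "
              f"P(at least one) {p_any:.3f}")
```

Under these assumptions, screening 1,000 schools at three standard deviations yields about 1.35 flagged schools on average, and roughly a three-in-four chance of flagging at least one, even if no school did anything wrong.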

Journalists are not the only group to make this error. Most forensics reports that I have read have ignored this issue. Using order statistics, statisticians have developed corrections to avoid overstating anomalies. For example, Bonferroni’s correction divides the target probability level by the number of comparisons, which here is the number of schools inspected (http://en.wikipedia.org/wiki/Bonferroni_correction). Using this correction, about 4.2 standard deviations, not 3, is the correct threshold to use if 1,000 schools were inspected; the exact value depends on the target probability level chosen.
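The corrected threshold can be computed directly. The sketch below assumes normally distributed residuals; the function name bonferroni_threshold and the target levels tried (0.05 and 0.01, one-sided and two-sided) are illustrative choices, since the level behind the 4.2 figure is not stated here.

```python
from statistics import NormalDist


def bonferroni_threshold(num_schools: int, family_alpha: float,
                         two_sided: bool = False) -> float:
    """Standard-deviation cutoff after a Bonferroni correction.

    family_alpha is the acceptable chance of flagging ANY school in error.
    Assumes standard normal residuals (an assumption, not a given).
    """
    per_school_alpha = family_alpha / num_schools  # Bonferroni: divide by count
    if two_sided:
        per_school_alpha /= 2.0  # split the probability across both tails
    # inv_cdf of a small lower-tail probability is negative; flip the sign
    # to get the matching upper-tail cutoff.
    return -NormalDist().inv_cdf(per_school_alpha)


if __name__ == "__main__":
    for alpha in (0.05, 0.01):
        for two_sided in (False, True):
            z = bonferroni_threshold(1000, alpha, two_sided)
            label = "two-sided" if two_sided else "one-sided"
            print(f"alpha={alpha}, {label}: threshold {z:.2f} SD")
```

With 1,000 schools and an assumed one-sided level of 0.01, the cutoff lands a little above 4.2 standard deviations. Stricter or two-sided levels push it higher and looser ones pull it lower, but in every case tried here it sits well above 3.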

The next time that you see a forensics report in the media, ask yourself, “Was the Bonferroni correction or a similar conservative threshold used?” If the answer is no, be very, very careful because the number of anomalies has probably been overstated. On the other hand, if the answer is yes, the investigations are probably warranted.

