Benefits payment cheater caught using statistics
The other day a woman in the
http://www.dailymail.co.uk/pages/live/articles/news/news.html?in_article_id=494261&in_page_id=1770
The article starts: “Any mother who has given birth to quadruplets needs all the help she can get. So benefits staff were happy to provide support for Victoria Young in raising babies Kier, Kie, Kyla and Conrad. There was just one problem – none of them existed. …”
The benefits staff became suspicious at the seventh child and investigated. By that time, Victoria Young “had swindled more than £40,000 in benefits payments with her bogus brood of seven babies in the space of 18 months.” (direct quote)
It’s natural to ask how data forensics techniques could be applied to this situation. We start with models that describe the population. To test the above claims we need to know about multiple birth probabilities, fertility rates, and birth spacing statistics. I found the needed statistics at a government website: http://www.statistics.gov.uk/downloads/theme_population/FM1_32/FM1no32.pdf
In 2003, only one set of quadruplets survived birth, which puts the probability of live quadruplets at approximately 1 in 600,000 (Table 6.4 of the government report; see also Multiple Births in Wikipedia: http://en.wikipedia.org/wiki/Multiple_Births). From the table of statistics, the probability of twins is about 9,001 in 615,787, of triplets about 127 in 615,787, and of quadruplets about 3 in 615,787. If we use these values and assume that birth multiplicity is independent from one maternity to the next, then we can test Victoria Young’s claims with the conditional probabilities in Table 1 (computed using standard convolution equations).
Table 1: Conditional probabilities of number of maternities given family size

Number of      Number of children
maternities    1            2            3            4            5            6            7
1              1.00000      0.014836942  0.000209     4.95E-06     *            *            *
2                           0.985163058  0.029234     0.000629     1.59E-05     1.88E-07     2.03976E-09
3                                        0.970557     0.043201     0.001251     3.57E-05     6.89055E-07
4                                                     0.956165     0.056747     0.002064     6.70445E-05
5                                                                  0.941987     0.069882     0.003059676
6                                                                               0.928019     0.082614512
7                                                                                            0.914258077
The conditional probabilities are read down the columns. (Asterisks indicate values that could not be estimated from the government statistics.) For example, the probability of three maternities given seven children is in row 3, column 7, and is equal to 6.89055E-07. (This number is in scientific notation and represents the value 0.00000068905507, or about one in 1,450,000.)
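As a rough check on Table 1, the convolution calculation can be sketched in a few lines of Python. This is only a sketch under stated assumptions, not necessarily the procedure used to build the table. It takes the maternity counts quoted above, treats every maternity not counted as twins, triplets or quadruplets as a single birth, sets the probability of quintuplets and higher to zero (the cells marked with asterisks), and normalizes each column so that it sums to one, which amounts to giving equal prior weight to each number of maternities. The names TOTAL_MATERNITIES, MULTIPLE_COUNTS, convolve and table are hypothetical and chosen for illustration.

```python
# Counts quoted in the article (twins, triplets, quadruplets out of all maternities).
TOTAL_MATERNITIES = 615_787
MULTIPLE_COUNTS = {2: 9_001, 3: 127, 4: 3}
MAX_SIZE = 7  # largest family size and number of maternities tabulated

# p[k] = probability that a single maternity yields k children.
# Assumption: everything that is not a twin/triplet/quadruplet maternity is a
# single birth, and quintuplets or more have probability zero.
p = [0.0] * (MAX_SIZE + 1)
for k, count in MULTIPLE_COUNTS.items():
    p[k] = count / TOTAL_MATERNITIES
p[1] = 1.0 - sum(p)


def convolve(a, b, size):
    """Discrete convolution of two distributions, truncated to `size` entries."""
    out = [0.0] * size
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if i + j < size:
                out[i + j] += x * y
    return out


# likelihood[m][n] = P(n children | m independent maternities),
# obtained as the m-fold convolution of p with itself.
likelihood = {}
dist = [1.0] + [0.0] * MAX_SIZE  # zero maternities -> zero children
for m in range(1, MAX_SIZE + 1):
    dist = convolve(dist, p, MAX_SIZE + 1)
    likelihood[m] = dist

# Assumption: normalize each column (fixed number of children) to sum to one.
table = {}
for n in range(1, MAX_SIZE + 1):
    column_sum = sum(likelihood[m][n] for m in range(1, MAX_SIZE + 1))
    for m in range(1, MAX_SIZE + 1):
        table[(m, n)] = likelihood[m][n] / column_sum

print(f"P(3 maternities | 7 children) = {table[(3, 7)]:.5e}")
```

Under these assumptions, the entry for three maternities and seven children should land close to the value in Table 1. Small differences in the last digits are possible, depending on how the single-birth probability is rounded.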
The claimed birth spacing is also very unusual.
We have found that it is always useful to combine the probability evidence. After all,
In data forensics work we proceed just as I have illustrated above. We create population models. We assume the data conform to the models (i.e., there is no cheating). We test the anomalous data against the model and eventually compute probabilities. It is nearly always the case that the data do not conform precisely to the model, but the models provide sufficient guidance that objective statements concerning the improbability of the extreme data may be made.