Archive for the 'Uncategorized' Category


Tracking your test booklets using RFID


Monday, March 24th, 2008

Last time I discussed where RFID (Radio Frequency Identification) chips are finding their way into schools. I promised I would write about RFID applications in testing next.

There are at least three areas where RFID technology could help testing program administrators maintain fair and accurate programs: (1) tracking and counting test booklets or answer sheets, (2) verifying that the correct information regarding the test taker and the test form has been recorded on the answer sheet, and (3) maintaining information about test results and test taking status.

Tracking or accounting for test booklets or answer sheets

RFID technology is widely used in materials tracking and handling control systems because individual items may be counted and inventoried quickly and accurately. The obvious need in large-scale testing is for the accurate tracking of thousands of test booklets and answer sheets. You can’t put a chip on a test booklet or answer sheet, can you? Recent innovations in RFID technology say that you can.

To illustrate the need, consider the following actual occurrences:

1. In 2005, approximately 27,000 TAKS test booklets were lost. These represented 0.22% of all the test booklets. This would not be serious except that at least some test questions are usually reused in a later year; if these booklets were copied and used as study materials, some students would gain an unfair advantage in that later year.

2. In 2003, 232 test booklets (3.3% of 7,000) were lost in New Mexico.

3. In 2005, a significant number of answer sheets were misplaced in Nevada. After a frantic scramble, the answer sheets were found and the affected seniors were awarded the scores they needed for graduation.

4. There appear to be a large number of situations where exam booklets or answer sheets are misplaced or even lost. For example, Edexcel in the UK lost exam papers, the “Sats” in 2003 in the UK were stolen and offered for sale on the Internet, and in Jamaica this year the sixth grade tests were leaked.

As an illustration of the requirement to properly track the test booklets, consider the situation with Colorado’s Student Assessment Program (CSAP). As reported in the news: “In preparation for the testing, administrators have spent hours counting and recounting exams, outlining strict rules for administering the exams to prevent anyone from getting an early peek, and aligning themselves with the proper procedures.”

Do we really want educators counting test booklets instead of teaching? RFID technology has the potential to handle this problem. If every test booklet is identified with an RFID tag and if every answer sheet has an RFID printed label affixed, an RFID reader could process an entire stack of test booklets and answer sheets in just a few minutes and determine if all the materials are present and which ones, if any, are missing.
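The reconciliation step described above can be sketched in a few lines. This is only an illustration, assuming the reader hands back a plain list of tag IDs; the function name `reconcile_inventory` and the tag IDs are hypothetical, not taken from any RFID product.

```python
def reconcile_inventory(expected_tags, scanned_tags):
    """Compare the tag IDs a reader reported against the shipping manifest."""
    expected = set(expected_tags)
    scanned = set(scanned_tags)  # a set absorbs duplicate reads of the same tag
    return {
        "missing": sorted(expected - scanned),     # on the manifest, never read
        "unexpected": sorted(scanned - expected),  # read, but not on the manifest
        "complete": expected <= scanned,           # True only if nothing is missing
    }


# Hypothetical manifest: three booklets (BK) and two answer sheets (AS).
manifest = ["BK-0001", "BK-0002", "BK-0003", "AS-0001", "AS-0002"]
# Hypothetical reader output: one booklet absent, one answer sheet read twice.
reads = ["BK-0001", "BK-0003", "AS-0001", "AS-0002", "AS-0001"]

report = reconcile_inventory(manifest, reads)
print(report["missing"])  # ['BK-0002']
```

Using sets means a tag that is read more than once is counted only once, so the report depends on which items are present rather than on how many reads occurred.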

I haven’t actually seen RFID chips used for test booklets, but I just renewed my US passport and it has an RFID tag embedded in it. I don’t know where the tag is, and I don’t think that I need to know. The key point is that most test booklets are similar to the passport. There is a cover and several pages that are stapled in the middle (at the binding). Affixing RFID labels onto answer sheets is potentially more difficult because the answer sheet needs to be processed through a scanner. However, RFID printing technology exists for affixing labels to documents. A search with Google using “RFID printing” brings up many links with vendors of solutions to create printable labels.

If RFID tags are embedded into the test booklets then the entrances to storage rooms can be fitted with RFID readers and any unauthorized removal of a test booklet may be detected. Similarly, if secure test materials are tagged with RFID devices when they are reviewed by standards setting teams we can be assured that none of the materials will leave the secured area in an unauthorized manner.

These are standard tracking and inventory control processes where RFID has demonstrated its value in other industries.

Verification of information

The other two main areas where RFID technologies might be applied in testing are in verifying that correct information is recorded concerning the test and in maintaining information about test taking status. I see a lot of testing data from public schools and other industries and nearly all of it contains errors. Often, test taker identifiers are recorded incorrectly. These errors require a lot of time to find and correct. But, if they are not corrected, individual students will be affected. RFID has the potential to help in this area, if we give students RFID badges. Even though we currently have systems for processing these data, I see enough of these errors in testing data to know that current technologies are not solving the problem.

Maintaining test results and status

If the students were issued a “smart card” (i.e., an RFID card that can be read and written), we could record on the smart card the student’s transcript and test taking status. Such a card could be beneficial in recording attendance during testing and in recording the test result. Although it’s not obvious that smart cards improve current testing practices in schools, I could see how the smart card might be beneficial in other scenarios, such as military testing. Smart cards are useful when you need to maintain information in a distributed, rather than a centralized, database.

Security concerns

Only time will tell whether RFID applications will bring improvements to testing, but regardless of the potential applications, security concerns persist. If the chips are used to record testing results or identify secure materials, they need to be secured against unauthorized tampering. If the chips are used to access secure test materials, they need to be secured against unauthorized duplication. If the chips are used to identify test takers (e.g., by containing biometric data), they need to be secured against unauthorized retrieval. If the chips are used to confirm tests are taken properly (i.e., the test taker’s identifying information is transferred to the test result), they need to be secured against inadvertent data loss.



‘Sabermetrics,’ baseball and steroids


Tuesday, January 8th, 2008

Predictions are that Mark McGwire will again not be inducted into Baseball’s Hall of Fame this year, because of suspected steroid use. Here is the URL to the article:

http://www.nationalpost.com/sports/story.html?id=221516

In 2005, McGwire ducked the direct question of whether he had used steroids or performance-enhancing drugs (PEDs). Many statisticians think that steroids do not improve performance, because “most baseball skills depend primarily on reaction times and judgments, factors unaffected (or even degraded) by these drugs.” Those who study the numbers, “sabermetricians” (coined from SABR – Society for American Baseball Research), “think the writers should set aside their biases and moral indignation and look at the facts: there’s simply no evidence steroids or other PEDs actually improve performance in baseball.”

One of the quotes in the article states, “While Bonds’ home run output rose significantly in the years after he supposedly started taking drugs, his profile is strikingly similar to Babe Ruth’s high performance level almost right until the [end] of his storied career, they say.” The actual data do not support this statement, as you can see in Figure 1, which compares Barry Bonds’ offensive performance against three of the other great hitters of the game: Babe Ruth, Ted Williams and Ty Cobb. I used http://www.baseball-reference.com/ as the source for my statistics.

Figure 1: Offensive performance comparison

[Image: Comparison of hitters]

The OPS+ statistic is a normalized statistic that is adjusted for opponents’ defensive strengths and ball park friendliness to hitters. A value of 100 is average performance. Figure 1 shows that Barry Bonds’ performance was below that of the compared hitters for the first 15 years of his career and then rose suddenly and dramatically for the remaining years, surpassing all of his prior years at a career stage when the offensive performance of the other hitters was clearly declining. Admittedly, this is arm-chair forensics, but the data suggest that steroid use did improve Barry Bonds’ performance.
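For readers who have not met OPS+ before, here is a minimal sketch of the commonly published form of the calculation, which compares a hitter’s on-base percentage (OBP) and slugging percentage (SLG) with the league averages. The park adjustment is deliberately left out, the function name `ops_plus` is my own, and the numbers in the example are made up for illustration rather than taken from any player’s record.

```python
def ops_plus(obp, slg, league_obp, league_slg):
    """Unadjusted OPS+: 100 means league-average, higher is better.

    Simplifying assumption: no park factor is applied, so this will not
    reproduce published OPS+ values exactly.
    """
    return 100.0 * (obp / league_obp + slg / league_slg - 1.0)


# Hypothetical league averages and a hypothetical above-average hitter.
league_obp, league_slg = 0.330, 0.420
print(ops_plus(0.330, 0.420, league_obp, league_slg))  # 100.0 for an average hitter
print(round(ops_plus(0.450, 0.700, league_obp, league_slg)))
```

Because both components are divided by the league average, values from different eras can be placed on the same chart, which is what makes a career-long comparison across players possible.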

Roger Clemens has emphatically denied that he took steroids. His trainer, Brian McNamee, is reported in the Mitchell Report as stating that he injected Clemens with steroids from 1998 to 2001. Clemens is scheduled to testify before Congress and there are allegations of defamation of character being “batted” around.

http://www.bloomberg.com/apps/news?pid=20601079&sid=a0z.L9DGg68A&refer=home

Figure 2 compares Roger Clemens’ ERA (earned run average) performance against that of three other great pitchers of their time.

Figure 2: ERA comparison

[Image: Comparison of pitchers]

The ERA+ statistic is a normalized earned-run-average statistic which has been adjusted for opponents’ strengths and other factors. A value of 100 is average. Clemens’ first year of baseball was 1984, so the four-year period of 1998 to 2001 corresponds to his 15th through 18th years of play. The data show that during this time, Clemens’ performance was average. However, these data are unusual because some of Roger Clemens’ best years came after he turned forty, an age when nearly all players have retired from baseball and several years after the alleged steroid use.

While I did not expect to arrive at a definitive answer concerning these two players, I found it intriguing to apply forensic thinking to the current allegations of cheating and doping that are being circulated.

 



The case of the befuddled answer copier


Saturday, January 5th, 2008

About a year ago, a university dean asked for our help. A professor in the college decided to use two versions of the test (each version had the same questions, but in different orders) for the final exam. While grading the exams, the professor found that one student had a very low score (22%), so the professor regraded the exam using the answer key for the other form of the test, which produced a much higher score (63%). Thinking that the exam was mislabeled, the professor rechecked the labeling of the answer sheet and the test booklet. The professor then asked the student to verify the test booklet and answer sheet, confirming that there was not a mislabeling error of the test form. At that point, the professor suspected the student had cheated. After considering the case carefully, the department faculty was of the opinion that the student had cheated and should be expelled. The dean asked us to provide statistical evidence for or against this allegation in order to support the faculty’s decision.

This was an interesting problem for me. It was the first time that I would analyze such a small data set (fewer than one hundred tests). It was also the first time that I would apply our probability and deduction methods in the analysis of cross-form answer copying. Most test administrators (i.e., teachers and instructors) ignore cross-form answer copying because the answer copier is naturally punished with a failing score. The answer copier will not dispute the low score because doing so requires admitting the fraudulent activity. Almost no academic research has been published on cross-form answer copying. I suspect there is not much research because there is no such thing as a count of identical incorrect responses (the staple statistic for nearly all answer-copying statistics) in cross-form answer copying analysis, even though the probability derivations are fairly straightforward.

After building the computational procedure, I found one extremely similar pair of tests (probability less than one in one trillion squared). I will denote this pair of tests as “#32” and “#121”. The score for test #32 was 22% (approximately equal to the guessing proportion). Scored with the alternate answer key, the same test earned 63%. For illustrative purposes I have aligned the two test responses below.

Table 1: Aligned responses for extremely similar tests

[Image: Aligned Responses]

You will notice that beginning with question #26, all the responses are identical (that’s 49 questions in a row!). A response is shown in bold if it is correct. Responses that are identical on both tests are highlighted in gold or tan, with one color marking identical correct responses and the other marking identical incorrect responses.

The statistical evidence confirmed all that the faculty suspected. We reported the result. The University decided to let the test score stand and not expel the student. However, it is almost certain that student #32 failed the course and, from this point on, would have to be very careful not to be caught again.

It is interesting to consider what the result might have been if test #32 were indeed mislabeled. In this situation, we have 65 identical answers with 20 that were incorrect and 45 that were correct. The probability of the similarity is less than one in one hundred billion. So, we have the same result (except the probability is not quite as extreme).
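To give a feel for why 65 matching answers out of 72 (65 identical plus the seven mismatches) is so improbable, here is a deliberately simplified sketch. It is not the procedure used in the analysis above: it assumes every question has the same chance of a match between two independent test takers, and the value 0.40 for that chance is purely hypothetical. A real answer-copying statistic would condition on each student’s ability and each question’s difficulty.

```python
from math import comb


def match_tail_probability(n_questions, n_matches, p_match):
    """P(at least n_matches identical answers) under a simple binomial model.

    Assumption: each question matches independently with the same
    probability p_match, which is a simplification for illustration.
    """
    return sum(
        comb(n_questions, k) * p_match**k * (1 - p_match) ** (n_questions - k)
        for k in range(n_matches, n_questions + 1)
    )


# 72 questions and 65 matches come from the case; 0.40 is a made-up match rate.
p = match_tail_probability(72, 65, 0.40)
print(f"{p:.3g}")
```

Even with a generous assumed match rate, the tail probability is vanishingly small, which is the qualitative point: this many identical answers is not something that independent work produces by chance.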

The seven mismatched questions above provide two very important clues. First, if test #32 were mislabeled, these seven non-matching questions on test #32 would be incorrect. When we compare this fact with the additional observation that all the same questions on test #121 were correct, we are left with an inference that student #32 is indeed the answer copier. Our source-copier analysis gives us odds of 4,486 to 1 that student #32 is the answer copier.

The second clue leads us to believe that the answer sheet was in fact labeled correctly. A class of statistics, known as person-fit statistics, assesses whether the test response pattern is consistent with expected test-taking behavior. We have developed one such statistic derived from item response theory, which measures score consistency. When we computed this statistic on test #32, we found the test to be aberrant with an extreme probability of 0.0001. In order to understand the nature of this extremeness, the statistical contribution to aberrance for each test question was computed and plotted (shown in Figure 1).

Figure 1: Illustration of aberrance for test #32 (in question order)

[Image: Aberrance for Test 32]

The values in the plot are approximate z-scores for the aberrance statistic. The responses that are least consistent with the score that was awarded on the test (assuming the test is mislabeled) correspond to the points having the largest z-scores. It just happens that these are the same questions where test #32 did not match test #121 (shown using orange squares). We conclude that test #32 does not conform to the expected test taking model, and that the non-conformance is the result of answering seven of seven questions correctly for form #1 where the two tests had mismatched answers. We conclude that student #32 did indeed have access to test form #1 while taking the test. Student #32 would have been better off doing his or her own work, rather than blithely copying from a neighbor.
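The statistic used above is our own and is not reproduced here. As a stand-in, the following sketch computes a widely known person-fit statistic, the standardized log-likelihood (often written l_z), under an assumed Rasch model, together with the per-question contributions. The ability value, the item difficulties, and the response patterns are all hypothetical. Note also that the sign convention differs from the plot described above: for l_z, large negative values indicate misfit.

```python
from math import exp, log, sqrt


def rasch_probability(theta, difficulty):
    """Probability of a correct answer under the Rasch model."""
    return 1.0 / (1.0 + exp(-(theta - difficulty)))


def lz_person_fit(responses, theta, difficulties):
    """Return (l_z, per-item contributions) for 0/1 scored responses.

    Each contribution is the item's log-likelihood minus its expected value;
    strongly negative contributions mark the most surprising responses.
    """
    observed = expected = variance = 0.0
    contributions = []
    for u, b in zip(responses, difficulties):
        p = rasch_probability(theta, b)
        item_ll = u * log(p) + (1 - u) * log(1 - p)
        item_expected = p * log(p) + (1 - p) * log(1 - p)
        observed += item_ll
        expected += item_expected
        variance += p * (1 - p) * log(p / (1 - p)) ** 2
        contributions.append(item_ll - item_expected)
    return (observed - expected) / sqrt(variance), contributions


# Hypothetical five-item test, ordered from easiest to hardest.
difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]
lz_consistent, _ = lz_person_fit([1, 1, 1, 0, 0], 0.0, difficulties)
lz_aberrant, parts = lz_person_fit([0, 0, 1, 1, 1], 0.0, difficulties)
print(round(lz_consistent, 2), round(lz_aberrant, 2))
```

In the aberrant pattern the test taker misses the easiest questions and answers the hardest ones correctly, so the first and last items carry the most negative contributions. Plotting those contributions in question order gives the same kind of picture as the aberrance plot described above.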



Forensics analysis moves to online games


Tuesday, December 11th, 2007

Cheating in MMO (Massively Multiplayer Online) games is on the rise, and “to fight back, game developers have taken a page from banks and credit card companies. They’re using fraud-detection software to analyze the rushing stream of events that occur in an ordinary MMO day, in search of something fishy.”

http://www.wired.com/gaming/virtualworlds/news/2007/11/mmo_cheats

The above article is interesting in the data forensics context for a few reasons:

  1. The principles of data forensics are stated clearly,
  2. There is a pervasive need for detection methodologies,
  3. We can learn from other disciplines in the fight against cheating,
  4. The distinction between “games” and “real life” is blurring, and
  5. Just as forensics methods are cross-disciplinary, so are cheating methods.

The gamers are modeling their detection software on that of the banking and credit card industries, “by creating a model of how players normally behave during a game.” The software then recognizes a deviation from the norm and flags it. This is the essence of forensics detection.

As an example of “normal test-taking” behavior, consider the histogram in Figure 1.

Figure 1: Histogram of test start times

[Image: Histogram 1]

In the figure above, most tests start between the hours of 7:00 am and 5:00 pm (17:00). However, a few tests begin between the hours of 12 midnight and 2:00 am. This seems very strange and unlike normal test taking behavior.

The forensics analyst recognizes that cheaters often repeat the same behavior and repeat the same mistakes. For example, in the above data, the distribution of “after hours” testing (i.e., when the test center is normally dark) was not random. Instead, there were just a few test sites where this behavior was occurring. As a consequence, those test sites could be detected. Data from one of the sites is shown below in Figure 2.

Figure 2: Anomalous test site with after-hours testing

[Image: Figure 2]

What is amazing from Figure 2 is that even for this anomalous test site, it is clear that the “after hours” tests were unusual. While I do not know what actually happened, it appears that an individual at the test center allowed late-night access for some test takers. There could have been a legitimate reason for these tests being taken at these times (e.g., special testing sessions were arranged). On the other hand, such strange data could easily be the result of test fraud (e.g., getting test-taking assistance late at night in order to avoid detection by proctors).

In the above example, I have illustrated how a “normal test-taking” model can be built and then used to detect unusual and anomalous data. After detection, the investigator then seeks an explanation. As Arthur Conan Doyle expressed through his detective, Sherlock Holmes, “Once you eliminate the impossible, whatever remains, no matter how improbable, must be the truth.”

http://thinkexist.com/quotation/once_you_eliminate_the_impossible-whatever/220272.html
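The detection idea in this post can be sketched as a small script. This is only an illustration under stated assumptions: the “normal” window of 7:00 to 17:00 is taken from the histogram discussion, while the record layout, the site names, the start times, and the threshold of two after-hours tests are all hypothetical.

```python
from collections import Counter
from datetime import datetime

NORMAL_START_HOUR = 7   # tests normally begin at or after 7:00
NORMAL_END_HOUR = 17    # and before 17:00 (5:00 pm)


def after_hours_by_site(records):
    """records: iterable of (site_id, ISO-format start time).

    Returns {site_id: (after_hours_count, total_count)}.
    """
    totals = Counter()
    after = Counter()
    for site, started in records:
        hour = datetime.fromisoformat(started).hour
        totals[site] += 1
        if not (NORMAL_START_HOUR <= hour < NORMAL_END_HOUR):
            after[site] += 1
    return {site: (after[site], totals[site]) for site in totals}


def flag_sites(summary, min_after_hours=2):
    """Sites with repeated after-hours starts; the threshold is an assumption."""
    return sorted(s for s, (n_after, _) in summary.items() if n_after >= min_after_hours)


# Hypothetical start times for two test sites.
records = [
    ("A", "2007-11-05T09:15:00"), ("A", "2007-11-05T13:40:00"),
    ("A", "2007-11-06T16:05:00"), ("B", "2007-11-05T10:30:00"),
    ("B", "2007-11-06T00:45:00"), ("B", "2007-11-07T01:20:00"),
]
summary = after_hours_by_site(records)
print(flag_sites(summary))  # ['B']
```

Requiring more than one after-hours test before flagging a site reflects the observation that cheaters tend to repeat the same behavior. A flag is only a prompt for investigation, since a legitimate special session would look the same in the data.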



Welcome


Wednesday, November 7th, 2007

Welcome to the Dennis on Data Forensics blog.


