Archive for the 'Cheating detection' Category


Eight Years of Improving Security


Friday, October 21st, 2011

By: Steve Addicott, Caveon Vice President

October is an important month for Caveon. Eight years ago, in October 2003, several assessment industry veterans formed a small consulting company focused solely on improving the security of our clients’ test programs. That company is Caveon Test Security!

Fast forward to 2011, and it’s gratifying to consider what this entrepreneurial group of test security zealots has accomplished.  Since that fateful October day, we have

  • conducted over 50 Security Audits of leading test organizations and vendors,
  • flagged and removed tens of thousands of internet-based risks, and
  • conducted statistical analyses of over 30,000,000 test instances for many of the largest, most important test programs in the world.

As I consider the number and breadth of these engagements, perhaps it is worth sharing a few of the core values under which we always operate:

Confidentiality

Throughout our years of operation, one fundamental operating principle has always applied: client confidentiality. We never reveal the details of our client engagements without the express approval of our clients. Our clients require and appreciate this sensitivity as we investigate security incidents and provide reports on our forensic analyses. This is not secrecy; this privacy stems from respect for our clients and for the right to privacy of individuals and organizations.

Innovation

We constantly strive to improve means and methods for strengthening exam security. We are always interested in sharing the nature of our work. Not only do we share our methods and science with clients, client stakeholders, TAC members, educational measurement researchers, and other appropriately interested parties, we are also committed to furthering the science around test security. We regularly present at conferences and webinars where we openly share our Caveon approach, theories, and methodologies. In fact, this last year we have presented at conferences in Phoenix, Orlando, Chicago, Seattle, Washington DC, Hong Kong, and Prague.

Conservative Recommendations

When we conduct an engagement, our approach is to focus on the situations and incidents that are most egregious, as evidenced in the data and the results that we analyze. We highlight those problems that are most readily identified, documented, and ideally, resolved. Dealing with these problems effectively will have the greatest positive impact on the overall validity and security of test results. This reasonable approach helps our clients, most of which suffer from ever-constrained budgets and resources, effectively concentrate their time, resources, and dollars where the likelihood of inappropriate test taking is highest.

Lastly, our growth and success are directly attributable to a few overarching principles: we always strive to exceed our clients’ expectations, comport ourselves honorably, provide valuable services, and share, as openly and honestly as we can, recommendations for improving the fairness and validity of our clients’ test programs. These principles result in proven, practical protection for our clients, and we intend to follow them for another eight years!




Beautiful Prague and a Very Successful ATP Conference


Friday, October 7th, 2011

By: John Fremer, President, Caveon Consulting Services

The third European ATP Conference concluded in the last week of September in the spectacularly lovely city of Prague in the Czech Republic, and it was a rousing success. Attendance was 225, besting the planners’ target of 200 attendees. The keynote talks were of exceptionally high quality and there were a large number of productive and well-attended sessions. The weather was also as good as one can imagine and the city welcomed us in every way.

I pay special attention to how the prevention of cheating and test piracy is addressed at any conference that I participate in, and I was struck by the substantial increase in attention to the topic from even a year ago, both in formal sessions and in conversations with other attendees. Part of the reason seems to be a high level of awareness of developments in the US, especially in our state assessment programs. There are also strong country-specific reasons, related to security breaches or cheating episodes in critical programs that have received extensive media coverage. As is the case in the US, once a cheating story breaks in the UK, the Netherlands, or other European countries, it tends to get a great deal of coverage that can last months or more.

There was a good deal of attention to “authentication,” improved ways of using biometrics, proctor training, and closer monitoring of the testing process to make it harder for test takers to invalidate our best efforts to ensure fair testing. The degree to which testing transcends borders within Europe and in the larger world was also emphasized. Steve Addicott of Caveon and Aimee Rhodes of the Chartered Financial Analysts spoke to a packed house on the international aspects of testing and security. Cheaters and pirates can be based in one country in the morning and another in the afternoon if you are resourceful enough to shut down or limit the place where their day started. The situation has been compared to the arcade game “Whack-a-mole,” where as soon as you hit one varmint, another pops up out of a different hole. I like that metaphor as it reflects my view of these unscrupulous enemies of fairness in testing.

Several of the keynotes really impressed me. Two were given by very successful entrepreneurs. Madan Padaki of Bangalore, India, CEO of MeritTrac, gave an address entitled “The 500 Million Dream: Building a Nation,” in which he described the progress India has made in raising 400 million people out of poverty. It is an astonishing story, as is the growth of MeritTrac into a major provider of testing services within ten years. Madan did not say his path had been easy. Rather, he indicated that he might not have made the attempt if he had realized the challenges that he would face.

Another extraordinary session was given by Lucian Tarnowski, the driving force behind “Brave New Talent,” a social-media-based way of nurturing and locating talent. Lucian is all of 27 years old and talks about digital immigrants, i.e., most of the people now working in assessment. We are “newly arrived” to a world with so many ways to be connected. It was not that way when most of us were in school or starting our careers. Digital natives, by contrast, grew up in this world and it is very familiar to them. Lucian describes his own dad, who has stayed with his typewriter as his way of composing and communicating, as a “digital refugee.” I emerged from that session convinced that I am way overdue on my promise to myself to use social networks wherever it will help me keep in touch with the colleagues, clients, and fellow professionals with whom I share interests and goals.

Another session, a plenary in the form of a debate, saw Cor Sluijter of CITO do a very fine job defending our efforts to produce high quality tests that serve valuable purposes against Donald Clark of Learn Direct. Clark pointed out a number of flaws in testing and argued that assessment is not “fit for purpose” in the 21st century world. Clark’s criticisms were thoughtful ones, but Sluijter held his own. All of us who attended welcomed the fact that both did their best, with the help of Eugene Burke of SHL, who moderated with élan, to show us different sides of a meaty and much-talked-about issue.

I have not captured all of the ATP Europe Conference, but I hope I have conveyed some of the substance and spirit. Next year it will be in Berlin in mid-September. The exact date has not yet been set. You will surely see a note with the date on this blog as well as in Caveon’s Newsletter “Cheating in the News.” If I get my own social media act together, you might get a tweet from me about it. I like to think this particular digital immigrant can still learn new tricks.



Empowering Schools to Use Data Forensics


Friday, September 30th, 2011

By: Dennis Maynes, Chief Scientist, Caveon Test Security

(The following is an excerpt from an invited talk that was presented to the US Department of Education, September 1, 2011.)

It was some time after we started Caveon that I realized the primary goal of conducting security analyses was the strengthening of exam security, not catching cheaters. This is a message that resonates very well with the testing program managers with whom I have interacted. They agree that the primary goal of security actions should be to obtain trustworthy test results, which occurs when the exams are administered securely and with integrity. Disciplining cheaters is important and supports this goal, but it is only a means to an end.

Exam security can be strengthened in two ways, and both should be used: (1) prevention of cheating, and (2) detection and discipline of cheaters, which will result in deterrence.

Prevention of cheating is gained by implementing effective security processes through policies and procedures. An important element of this effort is the periodic review of security processes and how well they have been implemented.

Detection and discipline of cheaters occurs through (1) performing regular forensic analysis, (2) qualifying the anomalies, and (3) imposing sanctions and invalidating scores.

Deterrence results when security actions and consequences for cheating are publicized.

It’s important to realize that security is a process, not a state. As an example, I have an alarm system at home. Installation of an alarm system does not mean that my home is secure. Only by arming and testing the alarm system can I be assured that it is functioning properly. Speaking of alarm systems, I am delighted when no one breaks into my home. The absence of break-ins does not lessen the value of the alarm system. I have had clients who felt that web patrolling and data forensics monitoring had no value because we did not detect security breaches. The non-existence of security breaches does not lessen the value of the security processes that have been implemented.

Except for some fraud laws, there are very few laws regulating cheating. Cheating is difficult to prove, and there is no physical evidence of material loss or harm. I often hear the phrase “Prove that I cheated.” In fact, I recently saw a headline in the papers expressing the same idea. It’s important to realize that state departments of education do not need absolute proof of cheating. They have an obligation to ensure that tests are administered securely and with integrity. In order to meet this obligation, states require a “preponderance of evidence” in order to act, not absolute proof. However, the departments of education must treat students and teachers fairly, and they must communicate policies clearly.

Because security is a process, it is important to have a ready-prepared security breach response plan, before the breach occurs. It’s not a matter of if the plan will be activated; it’s only a matter of when the plan will be activated. The planning process helps the department of education to have a focused and coordinated response for conducting investigations, imposing discipline and, of utmost importance, communicating with the public and the media.

Without such a plan, the department of education must create a response to the security breach in a potentially haphazard manner. The press is very good at uncovering haphazard and hastily prepared communications.

In summary, state departments of education are empowered to use data forensics wisely and effectively when they have implemented security policies, processes, and procedures which enable them to administer tests securely and with integrity. Regular data forensics monitoring allows states to measure and manage security risks that are inherent with all forms of high-stakes testing.



Courage Required


Tuesday, September 13th, 2011

By: Steve Addicott, Vice President of Sales, Caveon Test Security

Over the past few weeks, I have participated in several planning sessions with Caveon clients who contract with us to analyze test results. This service, Caveon Data Forensics™, represents a proactive means to better protect their programs by identifying statistical anomalies that may indicate cheating.

While each of these programs is different (state education, IT certification, medical licensure, construction certification, etc.), it’s interesting to me that they face common challenges in confronting test fraud.  For each of these programs, test results matter…they really matter…in making important decisions in the lives of test takers (and in education, teachers and principals).  Thus, the integrity of the test administration matters greatly, too.

My overarching impression?   Tackling test fraud head-on is not for the faint of heart.  It takes commitment—a genuine, unwavering commitment to fair and valid test results—to say “We know there is a subset of our test-taking population that is taking shortcuts, and we’re going to do something about it.”

These days, everyone is busy.   So, why would any sane test program leader willingly add to his/her workload?  The results of our Data Forensics analyses do just that.  We identify:

  • candidates/students that may require invalidations;
  • test centers/schools that merit investigations; and
  • items/exams that should be retired and/or revised.

The committed leaders we work with understand that this is what is required to be able to stand in front of stakeholders and proclaim that their test administrations are fair and valid.

Another important takeaway I’ve gained is that a successful Data Forensics program requires the cooperation and coordination of many  groups.  Last week, I met with a client, a large state department of education.  Its leadership, in order to ensure the data forensics program possessed real teeth, sought cooperation with several other state departments:

  • Legal, to ensure any sanctions resulting from the data forensics would hold up in court;
  • Communications, to ensure sound, consistent messaging to the media and public alike;
  • Inspector General, for conducting investigations; and
  • Professional Practices, in case sanctions might be brought against a state certified educator.

This sort of cross-organizational coordination is not easy to facilitate, but critically important in the fight for fair and valid testing.

If you’re considering how you can augment the security of your program, you might find one of our company webinars to be a help.   In “Don’t Shoot The Messenger”, three Caveon clients present the good, the bad, and the challenging in instituting program invalidations through data forensic analyses.  You can get a copy of the webinar slides here:  http://caveon.com/df_blog/ .



Trojan Items and Answer-key Arbitrage


Sunday, March 2nd, 2008

Today is the first day of the annual ATP Conference (Association of Test Publishers). This afternoon I will present a workshop titled, “Strategies and Tactics for Limiting Item Exposure.” We will be exploring innovative ideas for protecting tests and items from theft. It’s easy to understand why test publishers are concerned about test theft. High-quality items are expensive to produce and represent a substantial investment. Item development costs of $1,000 or higher per item are not unusual. In an afternoon, a thief can compromise an investment of $250,000 or more, easily. Most testing professionals will state that item theft is their number one security concern. I discussed this previously in: What is your top security concern?

I can’t share the entire workshop content with you in this short essay. But, I can share with you Gene Radwin’s (of EMC Corporation) intriguing idea of answer-key arbitrage and Trojan items. The idea was briefly mentioned in: Student outwits FCAT with secret pattern. Just as the Trojan horse was the Greeks’ surprise weapon for outwitting the people of Troy, we hope to outsmart users of brain-dump content using Trojan items.

The basic idea of the Trojan item, as developed and presented to me by Gene Radwin (email: radwin_gene at emc.com), is to place very easy items on the test which are miskeyed. If a test taker gives the miskeyed answers (and not the correct, easy answers), we have strong evidence that brain-dump content is being used. The fundamental principle is to create a test-within-a-test to detect test fraud. We booby-trap selected items by changing them so that a different answer choice is now correct, and the compromised answer is incorrect. Without knowing which items are booby-trapped, the brain-dump user proceeds in ignorance, until detected. Just to illustrate, consider a math item that I “borrowed” from the SAT practice test.

Table 1: Example of a Trojan item

We do not expect the brain-dump user who has memorized the “Exposed” item to notice the small change in the “Trojan” item. As a result, the cheater will give the originally correct, but now incorrect, answer “C,” and at the same time the honest test taker will give the correct answer “E.” The change in the answer key gives us a leverage or arbitrage point, creating a powerful difference in the statistical expectations.

In order to be effective, several Trojan items will be required on the exam. I haven’t done a rigorous analysis of the statistical power of the procedure, but my current intuition suggests that ten to twelve questions will be needed.
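One rough way to explore how many Trojan items might be needed is a binomial tail calculation. The sketch below is only an illustration under stated assumptions, not the analysis described in this post: it assumes honest test takers choose the old (exposed) answer on each Trojan item independently with the same small probability. The name `P_HONEST_PICKS_OLD_KEY` and its 5% value are hypothetical.

```python
from math import comb


def binomial_tail(n_items: int, n_hits: int, p_hit: float) -> float:
    """Return P(X >= n_hits) for X ~ Binomial(n_items, p_hit)."""
    return sum(
        comb(n_items, k) * p_hit**k * (1.0 - p_hit) ** (n_items - k)
        for k in range(n_hits, n_items + 1)
    )


# Hypothetical assumption: an honest test taker picks the old (exposed)
# answer on an easy Trojan item 5% of the time, independently per item.
P_HONEST_PICKS_OLD_KEY = 0.05

for n in (6, 10, 12):
    # Chance that an honest test taker gives the old answer on at least
    # n - 1 of n Trojan items, i.e. allowing the cheater one slip.
    p = binomial_tail(n, n - 1, P_HONEST_PICKS_OLD_KEY)
    print(f"{n} Trojan items, at least {n - 1} old answers: p = {p:.3g}")
```

The tail probability shrinks quickly as Trojan items are added, but how many are enough in practice depends on the real per-item rate for honest test takers and on how much of the exposed content a cheater actually memorized.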

We recently analyzed data where one individual was suspected of having prior access to the test content. Six miskeyed items were present on the exam and we found that the suspect gave the keyed answer (i.e., the answer matching the wrong answer key) on all six miskeyed items. Using item response models, we analyzed the “score” for the miskeyed items. (We do not use standard regression techniques because the data are not normally distributed, being highly constrained and skewed.) These data are shown in Figure 1.

Figure 1: Analysis of 6 miskeyed items

We see two extreme data points in Figure 1, corresponding to the suspected exam and another exam (they had probabilities of one in 5,000 and one in 1,000, respectively). The expected score on the miskeyed items was approximately two. We note that there is no correlation between the raw score on the test and the score on the miskeyed items.

In the above example, analysis of miskeyed items detected a potential testing irregularity. When Trojan items are specifically designed as described above, we expect to see a strong negative relationship between the Trojan items and the total score. In other words, high scoring individuals will provide the correct answer and not the original answer. This negative relationship improves our ability to detect users of brain-dump content.

In addition to my own analyses, one of our clients has told me of great success in using these techniques. For obvious reasons, the client does not want brain-dump users to know which tests are treated with Trojan items and how their cheating is being detected. When cheaters realize they are being punished for using brain-dump content, they will quit using the content. Then we will be satisfied. We just want test takers to do their own work and demonstrate their own ability when they take tests.



Are identical answers to exam questions proof of cheating on tests?


Monday, February 18th, 2008

When it comes to supporting an allegation of cheating on tests, there is rarely better statistical evidence than having two (or more) tests with identical sets of responses, or identical answers. Having a great interest in this topic, I have read carefully the abstracts of Rice University Honor Council meetings where these types of allegations are taken very seriously. In several instances of alleged academic fraud, the Honor Council has found the evidence of identical solutions and identical answers to be compelling.

“The Rice Honor System was created by students in 1916. That it has functioned so well for so long is a reflection of the trust and respect that Rice students show to one another and to the University. It is one of Rice’s most highly valued traditions and a vital part of your education–education in responsibility and integrity.” http://honor.rice.edu/

In one instance, the Council minutes read:

Witness 1, the professor for the class, stated that he believed the similarities between the True / False answers and the essay answers given by Student A and Student B to be strikingly similar. He … presented a statistical analysis of the probability of this occurring in certain situations.

In the above case, despite having a probability analysis, the Honor Council did not find that the honor code had been violated (i.e., cheating was not found).

In another instance, the Honor Council had a different finding:

Some members felt that the identical answers on some portions of the exam were beyond coincidence or having similar notes or studying together. Members were suspicious of the fact that these similarities would arise after the students used different sources of information when answering the questions. … Some members were not convinced by the explanations …

Despite denials of cheating in the above situation, both students were found in violation of the honor code.

Here’s a Google search link if you wish to read some of these abstracts.

It is evident from these two abstracts that the Honor Council attempts to find plausible explanations for identical answers and excessive similarities between test questions. It is also evident that the Honor Council may act without having definitive proof. As an example of the degree of “proof” or evidence that may be required to take action in a case of suspected cheating, consider this statement from the University of Western Ontario:

It is particularly important to understand that the conclusion that a student committed a scholastic offence does not have to be supported by evidence beyond a reasonable doubt. In an exam writing situation, that means that a decision maker may conclude that cheating took place, even if it is possible that two people got some identical answers by chance.

The observation that two tests have identical answers is very reliable evidence as defined by the criteria I proposed in my most recent post, because the observation is (1) factual, (2) objective, (3) credible, and (4) defensible. We require that the evidence have one additional attribute before believing that cheating probably occurred. The evidence must be strong.

In order to evaluate the strength of evidence of identical answers on tests, we require the probability of the observed responses. At Caveon, the probability for the observed item responses is estimated using item response theory. We compute this probability by multiplying together the probabilities of all the selected responses (we assume the selected responses are conditionally independent) and then normalizing the product by the marginal probability of the observed score. Formulas for computing exact probabilities are difficult to derive and program, which means that most practitioners who encounter these situations will rely upon judgment and intuition in the same way the Rice Honor Council does.
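The computation just described can be sketched in a few lines. This is a minimal illustration, not Caveon's implementation. It assumes the per-option selection probabilities for each item are already available (in practice they would come from a fitted item response model), and the function names are hypothetical. The marginal probability of the score is built up one item at a time under the same conditional independence assumption.

```python
def score_distribution(p_correct):
    """Distribution of the number-correct score for independent items.

    p_correct[i] is the probability of answering item i correctly.
    Returns a list where entry s is P(score == s).
    """
    dist = [1.0]
    for p in p_correct:
        new = [0.0] * (len(dist) + 1)
        for s, mass in enumerate(dist):
            new[s] += mass * (1.0 - p)   # item answered incorrectly
            new[s + 1] += mass * p       # item answered correctly
        dist = new
    return dist


def pattern_probability_given_score(option_probs, key, responses):
    """P(this exact response pattern | its number-correct score).

    option_probs[i][k]: assumed probability of selecting option k on item i.
    key[i]: index of the correct option; responses[i]: option selected.
    """
    joint = 1.0
    for probs, choice in zip(option_probs, responses):
        joint *= probs[choice]           # product of selected-response probabilities
    score = sum(1 for k, r in zip(key, responses) if k == r)
    p_correct = [probs[k] for probs, k in zip(option_probs, key)]
    return joint / score_distribution(p_correct)[score]


# Illustrative (made-up) option probabilities for a 3-item, 4-option test.
probs = [[0.7, 0.1, 0.1, 0.1]] * 3
key = [0, 0, 0]
print(pattern_probability_given_score(probs, key, [0, 0, 0]))  # all correct
print(pattern_probability_given_score(probs, key, [0, 0, 1]))  # one wrong
```

When every item is answered correctly, the product and the marginal score probability are the same quantity, so the conditional probability is one, as noted below for the 18-item example.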

I have pasted in a table of sampled probabilities for an 18 item test, below. The probabilities are calculated knowing the score that was obtained on the test. So, if we know a person answered all 18 items correctly the probability that another person who answered all 18 items correctly would match is equal to one. If the answer was correct, it is highlighted in gold in the table.

Probabilities of identical tests

Even though I routinely evaluate these types of probabilities, I have been surprised by some instances of identical response data. For example, the probability of an identical test when all items are answered correctly is 1 (as in the first row of the table). But, the probability of an identical test when all but one or two questions are answered correctly may be as high as .10 or .25 (see the second and fourth rows of the table). On the other hand, if several questions are answered incorrectly, the probability of an identical test may be 1 in 100 million or even smaller. The wide variation in these probabilities is a function of the number of correctly answered test questions and the selected responses.

If the probabilities of some test response patterns are sufficiently high (because the tests are easy or the examinees are very proficient) and if we have a large enough group, we might expect to see many identical tests. Probability computations for the number of observed identical tests can be very difficult. This is an instance of the “birthday problem” with unequal probabilities.
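A simplified version of this problem does have a short answer. If we assume, purely for illustration, that every examinee draws a response pattern independently from the same distribution, the expected number of identical pairs is the number of pairs times the chance that two examinees match. The sketch below computes that quantity. The function name is hypothetical, and real examinees do not share one distribution, which is part of why the full problem is hard.

```python
from math import comb


def expected_identical_pairs(pattern_probs, n_examinees):
    """Expected number of examinee pairs with identical response patterns.

    pattern_probs: probabilities of each possible pattern (summing to 1),
    assumed to be the same for every examinee, with independent draws.
    """
    # Chance that two independent examinees produce the same pattern.
    collision = sum(q * q for q in pattern_probs)
    return comb(n_examinees, 2) * collision


# Classic birthday setting: 365 equally likely "patterns", 23 people.
print(expected_identical_pairs([1 / 365] * 365, 23))

# Unequal probabilities (e.g. an easy test where one pattern dominates)
# produce more matches than equal probabilities over the same patterns.
print(expected_identical_pairs([0.5, 0.25, 0.25], 100))
print(expected_identical_pairs([1 / 3] * 3, 100))
```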

At the beginning of this discussion, it appeared that we had a relatively straightforward and simple problem. It often occurs with statistics that many apparently simple problems become very complex, very quickly. The analysis of identical answers for two exams is one of those problems. The answer to the question with which we began the discussion must be: We cannot prove that cheating occurred when we have identical answers for two test instances, but in many situations we can obtain very strong, reliable evidence leading us to conclude that cheating occurred and the conclusion would be right, nearly always.



Can you prove cheating on tests using statistics?


Monday, February 11th, 2008

There is a children’s game known variously as “Whisper,” “Secrets,” or “Gossip” where a secret is shared and passed from one player to the next. The last player hearing the secret says it aloud, often with hilarious results. These same distortions happen in the news media, as journalists cite other reports or each other. Such a misquote from the Star-Telegram concerning additional security announced by the TEA (Texas Education Agency) for the TAKS (Texas Assessment of Knowledge and Skills) caused me to pause and reflect about using statistical evidence to “prove” that someone cheated on a test.

The reporter wrote, “Among other security measures: … Scramble field test questions on tests to provide proof if someone is copying someone else’s answer sheet.” (Italics added.) http://www.star-telegram.com/news/story/433614.html. Being well aware of the controversy surrounding the use of statistics, alone, to prove cheating, I immediately doubted the accuracy of the above statement. Actually, on June 7, 2007, Shirley Neeley announced that “the Texas Education Agency today will immediately initiate the following: … analyze scrambled blocks of test questions to detect answer copying…” TEA later clarified that the scrambling would only involve field test items. The Dallas Morning News was quick to criticize the scrambling plan, but I applauded TEA’s intent to detect cheating behavior using statistics.

We naturally ask whether statistical evidence can be relied on to detect cheating. Many authors have expressed the opinion that statistical evidence must be corroborated by eye-witness accounts before making allegations of cheating. I can understand this position if the statistics are not reliable. In my opinion, reliable evidence must meet the following conditions:

  1. It must be factual,
  2. It must be objective,
  3. It must be credible, and
  4. It must be defensible.

If statistical evidence meets the above conditions, I believe that it can be relied upon, whether corroborating eye-witness accounts are available or not. Statistical evidence is

  1. factual when it is based on test result data (an actual record of the test event),
  2. objective when it provides a statistic with a probability statement,
  3. credible when the statistics have been shown to work because the models accurately depict actual test taking, and
  4. defensible when the underlying science withstands scrutiny.

An additional fifth criterion the evidence must meet for taking action on a suspected instance of cheating is that the evidence must be strong. Statistical evidence is strong when the calculated probabilities are so small that we no longer believe the observed data are the result of normal test taking. Statistics can provide guidance for determining how strong is strong enough to take action, but ultimately the establishment of a probability threshold (i.e., the strength of the statistic) is a matter of policy that must be answered by the testing program administrator.

It is important with any statistical investigation to choose statistics that are well-suited and designed for the task at hand. For example, if the concern is that answer sheets are being modified, then erasure counts should be analyzed. Having analyzed over one hundred data sets for a wide variety of clients including state Departments of Education, admissions tests, certification programs, and licensure exams, I can unequivocally state that answer copying is the predominant means of cheating on tests. Therefore, it is especially relevant in this discussion concerning the reliability of statistical evidence to discuss answer copying and statistics that are designed to detect answer copying.

As you reflect upon the principles that I have outlined, I would ask you to consider the data in Table 1. The table contains differing probability values that a testing program administrator might be asked to evaluate. These are sampled answer-copying statistics (i.e., counts of identical answers) from a test having 240 items. With this many items on the test, the central limit theorem will generally apply so I have included a Z-Score in the table, as a point of reference.

Table 1: Sampling of test similarity statistics

Number of identical answers   Expected number of identical answers   Standard Deviation   Z-Score   Probability Index
168                           81.3                                   7.2                  12.0      30.3
171                           102.3                                  7.4                  9.3       19.9
130                           76.4                                   7.1                  7.5       12.4
154                           107.7                                  7.4                  6.3       9.5
128                           87.9                                   7.3                  5.5       7.3
108                           74.3                                   7.1                  4.7       5.5
107                           75.1                                   7.1                  4.5       5.0
120                           89.4                                   7.3                  4.2       4.6
115                           86.1                                   7.3                  4.0       4.2
128                           103.9                                  7.4                  3.3       3.1

At Caveon we deal with extremely small probability values, so we typically express those using “an index” where the probability is one in 10 to the power of the index (p = 10^(-index)). The most extreme case in Table 1 has a probability of one in 10 to the thirtieth power. These data are definitely not due to normal test taking.
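For readers who want to check the arithmetic, the Z-Score column in Table 1 can be reproduced from the first three columns, and any probability can be converted to an index by taking the negative base-10 logarithm. The sketch below does both. The function names are hypothetical, and the normal-tail index it computes is only a point of reference: it does not reproduce the Probability Index column exactly, which presumably comes from a different probability model.

```python
from math import erfc, log10, sqrt


def z_score(observed: float, expected: float, sd: float) -> float:
    """Standardized distance of the observed count from its expectation."""
    return (observed - expected) / sd


def probability_index(p: float) -> float:
    """Index such that p = 10 ** (-index)."""
    return -log10(p)


def normal_tail_index(z: float) -> float:
    """Index of the upper-tail normal probability P(Z >= z).

    A reference approximation only; not the model behind Table 1.
    """
    return probability_index(0.5 * erfc(z / sqrt(2.0)))


# First and last rows of Table 1: (identical, expected, standard deviation).
for observed, expected, sd in [(168, 81.3, 7.2), (128, 103.9, 7.4)]:
    z = z_score(observed, expected, sd)
    print(f"Z = {z:.1f}, normal-tail index = {normal_tail_index(z):.1f}")
```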

Assuming that you accept the statistical evidence as being reliable, the decision needed by you, the testing program administrator, is how low in Table 1 should you go? Where do you set the cut point? These data illustrate that if you set the cut point too low, you might accuse some individuals of answer copying without having strong evidence. If you set the cut point too high, you might allow several individuals who have cheated to escape discipline.

I will elaborate more on this topic, next time. Until then, may your tests remain secure.



Trouble in Section K


Thursday, February 7th, 2008

Elf mistress Heloise quickly entered the office of Elvin, Head of Section K. “For the eighth week in a row, the reject rate from Section K is three times the rate from the previous twelve months,” she said, handing the weekly quality report to Elvin. She continued, “I was so impressed when your section scored higher on the elf proficiency exam than any other section in the Mechanical Doll Department nine weeks ago that I awarded your elves the assemblage of gears and levers, but this is unacceptable.” Heloise crossed her arms and waited for a reply.

Elvin wrinkled his brow and frowned ruefully. This was unwelcome, but not unexpected, news. He picked up a thick folder and opened it. He leafed through one report after another and muttered, “We have eliminated transportation, storage, tools, assembly, parts, fatigue, and sabotage as explanations. There’s only one conclusion. At least one, and maybe several, of the elves in Section K is incompetent. But how can that be? Is the proficiency exam flawed?”

“Let’s find out,” replied Heloise. And together, they visited the proficiency exam designer. After they explained the problem, the proficiency exam designer shook her head and said, “You need to see the data forensics analyst.” The data forensics analyst listened with deep concentration, scanned page after page of test results, whistled softly, and finally exclaimed, “It looks like elves in Section K have cheated on the elf proficiency exam. Now, how to prove it?” he said mysteriously, and then immersed himself in complex symbols and calculations. Heloise and Elvin excused themselves, but the data forensics analyst didn’t even turn his head as they left. Much later, the proficiency exam designer listened intently while the data forensics analyst described his plan for catching the cheaters in Section K.

Three weeks later, the schedule for the quarterly elf proficiency exam was posted throughout the Mechanical Doll Department. On the day of the test, elf examiners throughout Santa’s workshop reported to a different department than usual to conduct the examination. For example, elf examiners from Remote-Controlled Toys reported to the Games and Puzzles Department. It so happened that an elf examiner from each of the other departments reported to the Mechanical Doll Department. Some administered the elf proficiency exam, and others just watched and waited. All test responses were recorded meticulously. After a long and grueling day, all the elves had been tested.

The data forensics analyst worked all night, making calculations and graphs and charts. At the break of day, Heloise and Elvin knocked at his door. “Enter!” they heard. They stepped into a bizarre scene: scraps of paper were strewn about, charts with bars and circles were plastered on the walls, and a wizened elf was humming in the midst of chaos. “Done!” he shouted. “Oh, it’s you. Well, I have the answer,” he said with absent-minded aplomb.

Then noticing their impatient expressions, he said, “Oh, let me explain.”

“None of the examiners are involved. I know this because there are no patterns of inconsistent answering associated with the examiners. It was important that no examiner give the test to any elf with whom he or she normally associates.

“There were extremely similar test answers between four elves in Section K. It is almost certain that they did not take the tests independently,” the data forensics analyst concluded.

“But, how can that be?” queried Heloise. “They were all watched carefully. There was no way that they could have shared answers or communicated during the test!”

The data forensics analyst minutely explained, “I suspected this might be the case. So, I asked the proficiency exam designer to create two test forms. She very carefully changed a few of the questions between the first and second test forms, so that the correct answers would be close, but not the same. The master test booklet for the first form was locked away in test booklet storage. The proficiency exam designer kept the master test booklet for the second form with her at all times. Even though the elves in the Mechanical Doll Department were given the second form of the test, our four culprits answered all the changed questions with answers from the first form of the test. There is no doubt in my mind. They broke into test booklet storage and memorized the test answers!”

Elvin brought the four suspected cheaters into Heloise’s office. Each elf vigorously denied any wrongdoing. At that point, the data forensics analyst dimmed the lights. He splayed an infrared beam across the hands of each suspected cheater. All of their hands glowed eerily with a blotchy red hue. Then, using gloves to handle the master test booklet from storage, he shined the beam on the pages. They glowed red. He touched the booklet pages against his bare arm. When he shined the beam on his arm, it also glowed with a blotchy red hue. Heloise barked, “You are red-handed! Now stand still while I consider your punishment!”

“Tomorrow,” pronounced Heloise, “you will report to the master of the Quality Department for ‘R and R,’ where you will begin the repair and refurbishment of all toys in the Rejected Toy Warehouse. You will work there until all the broken toys are operating perfectly and to the satisfaction of the master of quality.”

“Elvin,” Heloise continued. “Section K can no longer be responsible for assemblage of gears and levers. Your section must repair its damaged reputation from producing so many rejected mechanical dolls. Even though you will not receive replacements for these culprits, your production quota will remain the same.”

Elvin wrinkled his brow and frowned ruefully. This was unwelcome, but not unexpected, news. He remembered another time, when he was an impetuous, lazy elf; and when he had cheated. The punishment seemed harsh, but he had learned his lesson and was glad that the cheaters had been apprehended.

Moral: Just as dishonesty betrays the cheater, it injures all who are around him.

Addendum: The cheating detection and prevention techniques described in this story are among best practices. I have described use of the data forensics methodologies in two actual cases we have analyzed at Caveon: The case of the waylaid answer key and The case of the befuddled answer copier.
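The two-form check from the story can be sketched in Python. This is a simplified illustration, not the analysis we actually run: the names, answer keys, and responses are all hypothetical, and a real analysis would also weigh how likely it is for an honest test taker to land on an old-form answer by mistake.

```python
def old_form_matches(responses, changed_items, old_key, new_key):
    """Count changed items answered with the old form's key instead of the new one."""
    return sum(
        1
        for item in changed_items
        if old_key[item] != new_key[item] and responses.get(item) == old_key[item]
    )


def find_suspects(all_responses, changed_items, old_key, new_key, min_matches):
    """Return the names of test takers who gave at least min_matches old-form answers."""
    return sorted(
        name
        for name, responses in all_responses.items()
        if old_form_matches(responses, changed_items, old_key, new_key) >= min_matches
    )


if __name__ == "__main__":
    # Hypothetical keys for the items that differ between the two forms.
    old_key = {"Q3": "B", "Q8": "D", "Q12": "A"}
    new_key = {"Q3": "C", "Q8": "A", "Q12": "D"}
    changed = ["Q3", "Q8", "Q12"]
    all_responses = {
        "elf_1": {"Q3": "B", "Q8": "D", "Q12": "A"},  # every answer is from the old form
        "elf_2": {"Q3": "C", "Q8": "A", "Q12": "D"},  # every answer is from the new form
        "elf_3": {"Q3": "C", "Q8": "D", "Q12": "B"},  # one old-form answer
    }
    print(find_suspects(all_responses, changed, old_key, new_key, min_matches=3))
```

Someone who took the second form honestly should seldom give the first form's answer to a changed question, so a test taker who does so on every changed question stands out.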

The State of Mississippi has put together a very nice PowerPoint presentation on test administration auditing and monitoring: www.mde.k12.ms.us/ACAD/osa/DTC_Test_Security_Fall_07.pps

If you are interested in learning more about these or other solutions to test fraud, please contact us at Caveon Test Security.



Moore’s law favors the cheater


Monday, January 21st, 2008

In 1965, Gordon Moore, who later co-founded Intel, observed that transistor densities were doubling roughly every year; in 1975 he revised the pace to roughly every two years. Since then the exponential trend toward faster, smaller and more powerful computational units has continued. Initially, the observation was a remarkable statement of trends. Later, it became an expectation. And, it is now considered an unrelenting challenge for high technology. http://en.wikipedia.org/wiki/Moore’s_law

The trend of faster, smaller and more powerful electronic devices has spilled over from computers into all forms and types of electronics. Notably, consumer electronics commonly used by cheaters on tests are no exception. While Internet-capable PDAs have been available for some time, it was in 2007 that Apple introduced the iPhone, a cellular phone integrated with a browser and digital camera. It would be surprising if iPhones and text-messaging are not replaced with even more sophisticated cheating technology within the next few years. Those who administer tests must anticipate the appearance of these newer, faster, and more easily concealed cheating devices.

Small, fast devices appeal to two broad classes of consumers: (1) persons who want mobile and wearable electronic devices, and (2) persons who have a need for spy gadgetry. Wearable computing (http://www.media.mit.edu/wearables/) trends are very interesting, including smaller keyboards (http://www.frogpad.com/), head-mounted displays (http://en.wikipedia.org/wiki/Head-mounted_display), USB watches (http://www.amazon.com/Timex-Data-Link-Watch-T5C291/dp/B000B545B4), and PDAs and ultra-small computers (examples are: Nokia’s Internet Tablet http://reviews.cnet.com/pdas/nokia-n800-internet-tablet/4505-3127_7-32309517.html and OQO’s Model 02 http://en.wikipedia.org/wiki/OQO).

Spy gadget shops sell tiny pin-hole cameras, but our research at Caveon indicates that the tiny digital cameras have insufficient resolution to capture high quality images of test questions. (See this review of the Casio WQV-1CR Wristwatch camera http://reviews.cnet.com/watches-and-wrist-devices/casio-wqv-1cr-wristwatch/4505-3512_7-2660570.html.) While we found that the pin-hole spy cameras did not have sufficient resolution to steal a high-quality image of a test, we did confirm that the hand-held scanner DocuPen (http://planon.com/) could be used very easily to steal a paper-and-pencil test. There is a clear trend for higher resolution digital cameras in smaller packages, such as the BenQ 8 megapixel camera, which is 4 inches by 2.5 inches by one-half inch thick (http://blogs.zdnet.com/digitalcameras/?p=151). We expect to see eight megapixel cameras in cell phones before long due to Samsung’s announcement of a CMOS package for cell phones (http://blogs.zdnet.com/ip-telephony/?p=2737).

In 2007, we saw the introduction of ExamEar, an earpiece with a radio that was specifically marketed to cheaters on tests. This caused a lot of concern in Great Britain (http://news.bbc.co.uk/1/hi/education/6951524.stm, see also http://www.engadget.com/2007/08/20/examear-helping-students-make-the-best-of-exam-day/) and the website owners decided to cease operations. The ExamEar domain is now for sale. But, it would be very surprising if this technology does not resurface. In fact, two Chinese students were recently caught cheating on a test when they couldn’t remove their earpieces and needed medical attention (http://www.chinadaily.com.cn/china/2007-12/31/content_6361740.htm). We don’t know where they obtained these earphones, but they may have been ExamEar models.

Cheaters are usually engaged in one of four behaviors which may be bolstered by technology. These are:

  1. Communicate with or copy from another (requires a miniature radio, cell phone, or other signaling device)
  2. Smuggle test taking aids into the testing event (requires a miniature high-capacity data retrieval device with visual display, such as a PDA, iPod, or DataLink wristwatch)
  3. Steal a copy of the test content (requires a miniature camera)
  4. Engage in impersonation (requires an ability to tamper with or defeat identification safeguards)

Many of the current devices used by cheaters (e.g., cell phones, DocuPens, and PDAs) can be easily slipped past most test administrators, because they are so small. One of the gadgets shown at the 2008 CES (Consumer Electronics Show) which may cause concern for test administrators is the Bug Labs do-it-yourself modular electronics kit (http://gizmodo.com/346789/bug-labs-store-launches-monday-minus-wi+fi). It seems that the device will not include Wi-Fi initially, but it has support for a wide range of other functions, including cameras and cell phones.

Another recent innovation is the Bionic Eye (http://www.msnbc.msn.com/id/22731631/). This is a contact lens that features LCD circuitry which allows projection of an image into the wearer’s field of view. Researchers at the University of Washington have tested it successfully on rabbits. These researchers are the same people who developed the virtual retinal display (http://en.wikipedia.org/wiki/Virtual_retinal_display). It will be some time before these contact lenses are used by people, but the technology is fascinating.

Another interesting product introduced in 2007 was the FlyPen, a pen-top computer. The company’s marketing literature states, “Meet the FLY Fusion Pentop Computer, the only pentop platform to offer a complete set of high-speed homework solutions and innovative note-taking applications for students of all ages. This next-generation FLY™ system harnesses the same sophisticated Anoto technology as its predecessor, enhanced by PC connectivity, four times the memory, on-the-go calculating functionality, and a 1,000-word Spanish dictionary. Best of all, students can upload handwritten notes and drafts, digitizing them instantly into Microsoft Word documents or emails.” (See http://www.flyworld.com/presskit.pdf.) It will be interesting to see if students use this device for stealing test content.

Because consumer electronics are changing and adapting so quickly, it is very important that testing program administrators review current policies, procedures, and practices to ensure that these devices are not used by cheaters to gain an unfair advantage.



Improving your odds at winning the lottery


Friday, December 28th, 2007

Beginning New Year’s Day 2008, lottery ticket retailers in Ontario will have a new set of rules to follow if they are to continue selling lottery tickets. “Most of the changes are the result of Ontario ombudsman Andre Marin and his scathing investigation of the province’s lottery corporation.”

http://canadianpress.google.com/article/ALeqM5jEvfDbJoJ7C3KoaNxekmT8DuUDNA

The previous set of rules allowed lottery ticket retailers to steal lottery winnings from those to whom they sold the tickets. An example of the scam is described in this story where after three years, bilked lottery ticket purchasers were finally awarded their prize.

http://www.ctv.ca/servlet/ArticleNews/story/CTVNews/20071219/opp_lottery_071219/20071219?hub=CTVNewsAt11

In the above situation, the retailer apparently exchanged a non-winning ticket for the winning ticket when the purchasers presented the ticket to claim their prize. The problem is that the retailer is in a position to game the system because two functions are performed: selling the tickets and verifying the tickets. A clever and practiced cheater can manipulate such a situation.

This “man-in-the-middle” attack illustrates an obvious weakness in most paper-and-pencil testing scenarios. An answer sheet may be misdirected or even falsified by an adult who is acting in a trusted test administration position.

For example, it is common practice in elementary schools for teachers to review students’ answer sheets and make sure that the marked answers are dark, legible, and between the lines on the scan sheet. This practice allows a teacher to not only “clean up stray marks” but also to tamper with the answer sheet. An example of the procedure is described in this document from Dallas Independent School District: http://www.window.state.tx.us/tspr/dallas/ch02h.htm

Another example is more blatant. A teacher could very easily fill out blank answer sheets for students and then replace the students’ answer sheets with the prepared answer sheets. Erasure or light marks analyses are routinely performed on answer sheets that are scored, but it is unlikely that “fouled” answer sheets (which would also be returned) are subjected to the same analysis.

As a variation of the above exploit, it is well-known that a certification exam can be manipulated by a proxy test taker in a similar manner. The test taker and the proxy test taker both appear at the test site. They have both registered to take the test, and both will take the test. They switch names on the answer sheets (e.g., the proxy test taker puts the name of his or her employer on the answer sheet). If the answer sheets are controlled by document identifiers, the two can breach the security by exchanging answer sheets if they are together when they receive their test materials.

The above vulnerabilities (and others that use the same theme) may be addressed with revised procedures, just as procedures are being revised for the Ontario lottery. For example, instead of stray marks being cleaned up at the school they may be cleaned up at the processing center (where those reviewing the answer sheets do not have a motive for tampering). All returned answer sheets could be scanned, allowing for any fouled answer sheets to be detected. If the answer sheets have document control numbers provided using a readable encoding (such as a bar code), then every control number should be accounted for and none should be duplicated (prevents unauthorized destruction of fouled answer sheets).
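The control-number accounting described above can be sketched as a simple reconciliation in Python. The control numbers and the function name are hypothetical; the sketch assumes that the processing center has a list of every number issued and a list of every number read from the returned sheets.

```python
from collections import Counter


def audit_control_numbers(issued, returned):
    """Compare issued control numbers against those scanned on returned sheets."""
    counts = Counter(returned)
    issued_set = set(issued)
    returned_set = set(counts)
    return {
        # Issued but never returned: possible unauthorized destruction.
        "missing": sorted(issued_set - returned_set),
        # Returned more than once: possible copied or substituted sheets.
        "duplicated": sorted(num for num, n in counts.items() if n > 1),
        # Returned but never issued: sheets from an unknown source.
        "unexpected": sorted(returned_set - issued_set),
    }


if __name__ == "__main__":
    issued = ["A001", "A002", "A003", "A004"]
    returned = ["A001", "A002", "A002", "A999"]
    print(audit_control_numbers(issued, returned))
```

Any non-empty list in the result is a reason to pull the physical sheets and look more closely; an empty result by itself does not prove that nothing was tampered with.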

To prevent document exchange (such as in the above scenario with the proxy test taker), a digital scan of the test taker signature on the answer sheet may be preserved. This allows for verification of the signature on the answer sheet with the signature on the application. Another way to prevent document exchange between two test takers is to distribute test taking materials to candidates after all are seated, and to collect testing materials from candidates before any leave their seats at the end of the testing session.

While preventative measures are usually the best, analysis of the data may detect these types of attacks. For example, analysis of lottery wins by retailers should have detected there was a problem long before the complaints started to pile up. In the same way, it is very difficult for a person who is tampering with the test results to conceal the effect of their work.
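As a rough illustration of the kind of analysis meant here, the Python sketch below asks how likely it is that a ticket holder wins at least a given number of times by luck alone. It assumes that every ticket wins independently with the same known probability, which is a simplification, and the figures in the example are made up rather than drawn from the Ontario case.

```python
import math


def binomial_tail(n, k, p):
    """Return P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(
        math.comb(n, i) * p**i * (1.0 - p) ** (n - i)
        for i in range(k, n + 1)
    )


if __name__ == "__main__":
    # Hypothetical retailer: 1,000 tickets played, each with a
    # 1-in-1,000 chance of a major prize, and 5 major prizes claimed.
    tail = binomial_tail(n=1000, k=5, p=0.001)
    print("chance of 5 or more wins by luck:", tail)
    # The same value on a negative log10 scale, which is easier to read
    # when the probability is very small.
    print("negative log10 of that chance:", -math.log10(tail))
```

A very small tail probability does not prove fraud, but it tells the administrator where to look first.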

In summary, every aspect of a test administration system and procedure should be carefully reviewed under the assumption that some individual will attempt to exploit that system, and then reasonable security measures should be taken.


