Archive for the ‘exam security’ Category

Can we slow the flow of money to test thieves?

Friday, October 28th, 2011

By: Dennis Maynes, Chief Scientist, Caveon Test Security

This week, Julian Assange, founder of WikiLeaks, announced that his organization is running out of money and may be forced to cease operations by the end of 2011. On October 24, 2011, Reuters reported: “WikiLeaks says ‘blockade’ threatens its existence.” The blockade occurred when the major financial processing firms suspended their agreements with WikiLeaks after WikiLeaks released thousands of secret US diplomatic cables in December 2010 and threatened Bank of America with the release of internal documents, a threat that resulted in a 3% decrease in Bank of America’s share price.

Assange claims the blockade is illegal and has filed antitrust lawsuits against Visa and MasterCard. On the day before the blockade, WikiLeaks received $135,000. Currently, WikiLeaks receives less than $10,000 per month. The net effect of the blockade on WikiLeaks has been the loss of 95% of its operating cash.

Whether you agree with WikiLeaks’ goals or not, it is clear that WikiLeaks has routinely infringed upon the rights of copyright holders by distributing information and documents without authorization. If it is not obvious why this story has important test security ramifications, let me make it clear: (1) many websites, operated by pirates and thieves, infringe upon the copyrights of secured exam content, (2) it has been very difficult to shut down this activity effectively, and it is costing testing organizations millions of dollars per year in lost test development expenditures, and (3) if payment processors would agree to cease providing services to these thieves and pirates, many of them would fold. The WikiLeaks story demonstrates that copyright infringers will have a difficult time remaining in business without the support of payment processors.

At Caveon, we have been very successful in removing copyrighted exam materials from the Internet. Often our success is based upon respectful and courteous requests to unintentional copyright infringers. However, respect and courtesy do not work against pirates and thieves. At that point, potentially expensive legal action must be commenced.

An alternative to expensive legal proceedings is to work with payment processors to protect their brands. For example, Visa does not want any transaction to bring disrepute upon its brand. If we, as an industry, can convince the payment processors that the sale and distribution of pilfered exam content is disreputable, we may be able to slow the flow of money to the test thieves and protect valuable exam content.

What do you think? How can we help payment processors understand that their services facilitate the distribution of stolen exam content? Should ATP (Association of Test Publishers) contact the payment processors, on behalf of its members?

Several months ago, Ben Mannes, Test Security Director at ABIM, expressed this thought: “ATP should be trying to get a meeting with Victoria Espinel [White House intellectual property czar], bring 1-2 industry security experts, and state the case as to why exam content is a vital component to our nation’s infrastructure requiring heightened public sector IP enforcement.”



Eight Years of Improving Security

Friday, October 21st, 2011

By: Steve Addicott, Caveon Vice President

October is an important month for Caveon. Eight years ago, in October 2003, several assessment industry veterans formed a small consulting company focused solely on improving the security of our clients’ test programs. That company is Caveon Test Security!

Fast forward to 2011, and it’s gratifying to consider what this entrepreneurial group of test security zealots has accomplished. Since that fateful October day, we have

  • conducted over 50 Security Audits of leading test organizations and vendors,
  • flagged and removed tens of thousands of internet-based risks, and
  • conducted statistical analyses of over 30,000,000 test instances for many of the largest, most important test programs in the world.

As I consider the number and breadth of these engagements, perhaps it is worth sharing a few of the core values under which we always operate:


Throughout our years of operation, one fundamental operating principle has always applied: client confidentiality. We never reveal the details of our client engagements without the express approval of our clients. Our clients require and appreciate this sensitivity as we investigate security incidents and provide reports on our forensic analyses. This is not secrecy; this privacy stems from respect for our clients and for the right to privacy of individuals and organizations.


We constantly strive to improve means and methods for strengthening exam security. We are always interested in sharing the nature of our work. Not only do we share our methods and science with clients, client stakeholders, TAC members, educational measurement researchers, and other appropriately interested parties; we are also committed to furthering the science around test security. We regularly present at conferences and webinars where we openly share our Caveon approach, theories, and methodologies. In fact, this past year we have presented at conferences in Phoenix, Orlando, Chicago, Seattle, Washington DC, Hong Kong, and Prague.

Conservative Recommendations

When we conduct an engagement, our approach is to focus on the situations and incidents that are most egregious, as evidenced in the data and the results that we analyze. We highlight those problems that are most readily identified, documented, and, ideally, resolved. Dealing with these problems effectively will have the greatest positive impact on the overall validity and security of test results. This reasonable approach helps our clients, most of which suffer from ever-constrained budgets and resources, effectively concentrate their time, resources, and dollars where the likelihood of inappropriate test taking is highest.

Lastly, our growth and success are directly attributable to a few overarching principles: we always strive to exceed our clients’ expectations, comport ourselves honorably, provide valuable services, and share, as openly and honestly as we can, recommendations for improving the fairness and validity of our clients’ test programs. These principles result in proven, practical protection for our clients, and we intend to follow them for another eight years!


Item Exposure Is Not the Problem — Poor Security Is

Friday, October 14th, 2011

By: David Foster, CEO, Caveon Test Security

Item exposure during an exam is often viewed in the testing world as a bad thing, because it seems obvious that item exposure leads to item over-use, which in turn leads to item compromise. It is common for psychometricians to limit item exposure, defining it as either a too-high absolute number of presentations of the items in a test, or a too-high rate at which the items are presented on tests. Unfortunately, there is no scientific research, nor are there even unscientific guidelines or reasonable casual suggestions, about how many exposures are too many or which rate of exposure is too high.

It does not follow that item exposure is the same as item compromise. In fact, I’ve seen items compromised with an extremely small number of presentations. Some items have even been compromised prior to the first test being administered!

In my opinion, the notion that item compromise results from item exposure, as defined above, leads to improper conclusions and decisions and to ineffective procedures. I have a few reasons for this opinion, a couple of which I’ll give here. First, item exposure is absolutely necessary. It is obvious that no test can be effective unless its items are exposed during the exam. Test designers even let examinees view an item multiple times, encouraging them to return to and review previous items again and again. Second, item compromise has very little to do with the definitions of item exposure given above. Consider this simple example: Suppose that an item was shown to one million test takers and was presented on every exam administered. This would be considered a very high number of exposures along with a 100% exposure rate. But suppose that none of those examinees were able to share the item with others. In this simple example, the item remains uncompromised and perfectly secure, and can continue to be used on the exam.
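To make the two definitions of exposure concrete, here is a minimal sketch in Python. The function name, the item labels, and the tiny set of administrations are all hypothetical, invented only for illustration. Note that neither number it produces says anything about whether an item was actually shared or stolen.

```python
from collections import Counter

def exposure_stats(administrations):
    """For each item, return (number of presentations, exposure rate).

    `administrations` is a list of exams, each given as the list of item
    identifiers presented on that exam (hypothetical data layout).
    """
    counts = Counter()
    for items in administrations:
        counts.update(set(items))      # count an item once per exam
    total = len(administrations)
    return {item: (n, n / total) for item, n in counts.items()}

# Four hypothetical exams; item Q1 appears on every one of them.
administrations = [["Q1", "Q2"], ["Q1", "Q3"], ["Q1", "Q2"], ["Q1", "Q4"]]
stats = exposure_stats(administrations)
print(stats["Q1"])   # absolute count and rate for Q1
print(stats["Q2"])   # absolute count and rate for Q2
```

In this toy data, Q1 has the highest count and a 100% exposure rate, yet the calculation contains no information about compromise, which is the point of the example above.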

If we wish to reduce item compromise, the example illustrates that limiting the number of presentations or rate of presentations of an item is not as important as the methods used to secure the items, to protect them from theft, and to keep them from being used for cheating. For this reason we need improved item security, which means better ways to keep items from being stolen and used for cheating on subsequent exams. We need methods to detect when an item is truly compromised and then to take it out of service immediately. Instead, we often see stubborn adherence to a century-old model of relatively unsecure test administration, and the belief that keeping an item from being presented on a test is a sensible way to secure it.

It is certainly possible to improve the way we secure items. As examples, there are protective item and test designs available, and certainly better test monitoring procedures, that we can use. And perhaps we can learn a little from other industries as well. Consider the problem with the theft of music over the Internet. No one would suggest that music is stolen because it was listened to by too many people. Instead, we see serious efforts to protect the music, to keep it from being stolen, to detect when it is stolen, and to punish those that are responsible. We should be doing the same.


Empowering Schools to Use Data Forensics

Friday, September 30th, 2011

By: Dennis Maynes, Chief Scientist, Caveon Test Security

(The following is an excerpt from an invited talk that was presented to the US Department of Education, September 1, 2011.)

Some time after we started Caveon, I realized that the primary goal of conducting security analyses was the strengthening of exam security, not catching cheaters. This is a message that resonates very well with the testing program managers with whom I have interacted. They agree that the primary goal of security actions should be to obtain trustworthy test results, which occurs when the exams are administered securely and with integrity. Disciplining cheaters is important and supports this goal, but it is only a means to an end.

Exam security can be strengthened in two ways, and both should be used: (1) Prevention of cheating, and (2) Detection and discipline of cheaters which will result in deterrence.

Prevention of cheating is gained by implementing effective security processes through policies and procedures. An important element of this effort is the periodic review of security processes and how well they have been implemented.

Detection and discipline of cheaters occurs through (1) performing regular forensic analysis, (2) qualifying the anomalies, and (3) imposing sanctions and invalidating scores.

Deterrence results when security actions and consequences for cheating are publicized.

It’s important to realize that security is a process, not a state. As an example, I have an alarm system at home. Installation of an alarm system does not mean that my home is secure. Only by arming and testing the alarm system can I be assured that it is functioning properly. Speaking of alarm systems, I am delighted when no one breaks into my home. The absence of break-ins does not lessen the value of the alarm system. I have had clients who felt that web patrolling and data forensics monitoring had no value because we did not detect security breaches. The non-existence of security breaches does not lessen the value of the security processes that have been implemented.

Except for some fraud laws, there are very few laws regulating cheating. Cheating is difficult to prove, and there is no physical evidence of material loss or harm. I often hear the phrase “Prove that I cheated.” In fact, I recently saw a headline in the papers expressing the same idea. It’s important to realize that state departments of education do not need absolute proof of cheating. They have an obligation to ensure that tests are administered securely and with integrity. In order to meet this obligation, states require a “preponderance of evidence” in order to act, not absolute proof. However, the departments of education must treat students and teachers fairly, and they must communicate policies clearly.

Because security is a process, it is important to have a ready-prepared security breach response plan, before the breach occurs. It’s not a matter of if the plan will be activated; it’s only a matter of when the plan will be activated. The planning process helps the department of education to have a focused and coordinated response for conducting investigations, imposing discipline and, of utmost importance, communicating with the public and the media.

Without such a plan, the department of education must create a response to the security breach in a potentially haphazard manner. The press is very good at uncovering haphazard and hastily prepared communications.

In summary, state departments of education are empowered to use data forensics wisely and effectively when they have implemented security policies, processes, and procedures which enable them to administer tests securely and with integrity. Regular data forensics monitoring allows states to measure and manage security risks that are inherent with all forms of high-stakes testing.

Hindsight is 20-20: Introducing the security breach post mortem

Monday, April 7th, 2008

Hindsight: Perfect understanding of an event after it has happened; – a term usually used with sarcasm in response to criticism of one’s decision, implying that the critic is unfairly judging the wisdom of the decision in light of information that was not available when the decision was made.

After every single airplane crash or incident, the FAA routinely conducts exhaustive investigations to determine the cause of the crash. The purpose of the investigation is “to identify safety deficiencies and unsafe conditions which are then referred to the responsible FAA office for evaluation and corrective action.” The amazing air safety statistics in this country are primarily the result of these extensive analyses. Setting all sarcasm aside, the FAA has learned that hindsight is 20-20. A perfect understanding of the event is often attainable. And from that understanding, air safety has improved.

I believe that all testing programs can learn from this example. If each program conducts a “security breach post mortem,” security processes can be improved. A good practice in security is learning from your own mistakes. A better practice is learning from the mistakes of others. A best practice is creating processes so that those mistakes are never repeated.

As an example of what might be possible with a security breach post mortem, consider two recent news stories. Recent news from the UK suggests that many immigrants are being coached to pass the spoken language and listening portions of the citizenship tests, even though they cannot speak English. The BBC went undercover and filmed “an appraisal” which the undercover reporter understood to be the process for passing the language test. The reporter didn’t even need to speak or listen in English. The video is extremely fascinating. In other news, the results of Boston’s promotion exams for firefighters are being discarded and all the candidates will be required to retest, following a security breach in November 2007 when cell phones were used to cheat. The retesting is required because the investigation was inconclusive and the cheaters were not uncovered.

It is likely that both of the above breaches would have been prevented if proper security safeguards were in place. The purpose of the post mortem is to learn the security strengths and weaknesses of the testing program, so that security may be improved and strengthened. In my experience, we generally do not obtain all the information possible from a security breach investigation. For example, in Boston the investigation was conducted to determine who cheated. While some improvements to security should happen as a result of the investigation, I believe that a serious post mortem would reveal even more information in order to prevent similar breaches in the future. The post mortem allows us to learn from our mistakes.

In an earlier essay, I suggested that testing programs should, “Read stories of cheating in the news to learn how the media might portray your cheating incident negatively.” This is one form of learning from the mistakes of others. In addition to studying security breaches in the media, several other methods exist for learning best security practices and processes from others. Some of these are (1) attending presentations where security breaches are discussed, (2) talking directly with program personnel who have been involved in security breaches, and (3) working with experts who study and analyze security breaches and best security practices. At Caveon, we are doing our best to expand our expertise so that we may effectively assist all testing programs in their efforts to strengthen their test security.

If you have never conducted a security breach post mortem, you are probably wondering how you might start.

The first step determines the extent and nature of the security breach. When the breach involves cheating during the test or tampering with the test results, a data forensics analysis is invaluable in making this assessment. When the breach involves the distribution and sale of protected test content, an Internet investigation or Caveon Web Patrol can determine the scope and size of the breach. When the breach involves a breakdown of security procedures and processes, a post-mortem security audit will be needed. Some security breaches may require all three information-gathering activities.

The second step performs a cause-and-effect flow analysis or a fault tree analysis. This analysis establishes where the test security vulnerabilities exist and how those vulnerabilities were exploited by the miscreants.

The third step identifies necessary changes in the testing program’s security processes. These changes should be first considered as suggestions or recommendations. They should be prioritized. They should be assessed for effectiveness using security threat models. They should be evaluated against required resource allocations so that their practicality can be measured in terms of the program’s budget and expertise.

Finally, proposed recommendations are presented to the executive management team with an implementation roadmap. The executive report should clearly state that the purpose of the post mortem is to improve and strengthen test security. A post mortem analysis is not conducted with the purpose of apprehending cheaters and imposing discipline upon test frauds. These actions may result from the investigations. But, the post mortem provides the tactical and strategic initiatives to prevent test fraud in the future.
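The fault tree analysis named in the second step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Node` class, the `occurs` function, and every event name in the tree are hypothetical and are not drawn from any actual investigation. The idea is that a top-level breach is decomposed into contributing events joined by AND and OR gates, so that one can see which combinations of basic failures are sufficient to produce it.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    gate: str = "BASIC"               # "AND", "OR", or "BASIC" (a leaf event)
    children: list = field(default_factory=list)

def occurs(node, observed):
    """Return True if this node's event follows from the observed basic events."""
    if node.gate == "BASIC":
        return node.name in observed
    results = [occurs(child, observed) for child in node.children]
    return all(results) if node.gate == "AND" else any(results)

# Hypothetical tree: a booklet is stolen only if there is BOTH an
# opportunity and a person in the room who is willing to exploit it.
opportunity = Node("booklet accessible", "OR", [
    Node("materials not collected during break"),
    Node("booklet left unattended"),
])
actor = Node("thief in the room", "OR", [
    Node("impostor admitted with stolen ID"),
    Node("staff collusion"),
])
top = Node("test booklet stolen", "AND", [opportunity, actor])

breach = occurs(top, {"materials not collected during break",
                      "impostor admitted with stolen ID"})
no_breach = occurs(top, {"impostor admitted with stolen ID"})
print(breach, no_breach)
```

Walking the tree this way shows where a single procedural fix (for example, collecting materials at every break) removes an entire branch, which feeds directly into the prioritized recommendations of the third step.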

Caveon is willing and able to assist you in these efforts. We wish you the best as you consider how to learn from your own mistakes and the mistakes of others.

Wise men profit more from fools than fools from wise men; for the wise men shun the mistakes of fools, but fools do not imitate the successes of the wise. – Cato the Elder

Hindsight is indeed 20-20 and is not to be scoffed at when we use it in order to improve.

The incident of the pilfered test booklet

Monday, March 31st, 2008

Georgia bit her lip nervously as she peered out the rear-view mirror of her car. She had already been idling 10 minutes longer than allowed and campus security would be returning shortly. Then, she saw them, exiting the library. Ignacio was detained by a man in uniform. Vincenzo broke into a run, sprinted to the car, and hopped in. “Step on it,” he said. Georgia sped away. “What about Ignacio?” she asked. “Don’t worry. I have it right here,” he replied as he slipped a digital camera from beneath his jacket, extracted a memory card and handed it to Georgia. She grinned. Now, she would be able to pass the test and become an intern at Waldo & Cramer Industries. Once inside W & C and with her computer skills, her current employers would soon be very, very happy.

The above fictionalized account is based upon an incident which Caveon was asked to investigate in 2004. Our client wrote,

“We had an incident over the weekend concerning the XYZ exam …. The examiner contacted our office during the 3rd section of the examination. Two examinees were acting suspiciously throughout the exam. They had questions about how long the breaks were and what would happen if they returned late from the break. During the break, the proctor noticed that one of the test booklets was not on the applicant’s desk.

The proctors noticed that the two examinees went to their car and came back late from the break. When addressed about the booklet, they said they did not have the booklet and then dropped it from their jacket and said, ‘there it is’. They were allowed to continue, although the proctor told them their scores would be invalidated. They were addressed by the proctor and campus police after the exam and questioned. One of the examinees was released as he stated he had nothing to do with the incident. The other fled the scene in a car that was waiting for him, as he was being escorted to check his car to see if there were images on his cell phone of the test booklet. The names of the suspects are Inigo and Vinny.” (Actual names have been changed.)

Results of Investigation

Caveon conducted an investigation into this incident and we discovered that the two individuals, Inigo and Vinny, were enrolled at a nearby university but they were not enrolled in courses of study or college majors that would be consistent with taking the admissions test connected with this incident. Furthermore, we determined that one of these students had lost his passport during the summer and the other had his driver’s license stolen. The information was corroborated and led us to infer that both of these students were victims of identity theft. Some other individuals committed test fraud in their names.

We also discovered that the test thieves were given the opportunity to steal the test because the test site administrator had not collected testing materials during breaks or the lunch period, as per test administration policy and procedures. One of these individuals, “Inigo,” had taken and failed the test approximately six weeks earlier. We presume that this individual determined that an opportunity existed to sneak the test booklet out of the testing site at that time.

In our report, we concluded that the imposters (or identity thieves) took the exam with the intent of exposing the exam content for one or more of the following purposes: for themselves, on behalf of another individual(s), for mass distribution, or for financial gain. We also suggested that, with suitable revision to the test administration policies and procedures, the likelihood of a security breach could be reduced.

Forensics analysis

Another phase of the analysis was to statistically analyze the test responses. It is difficult to infer “intent to steal” from data analysis, but the data are revealing. One of the statistics that we use in Caveon Data Forensics™ is known as the bimodality statistic. With this statistic, we assume that most individuals answer the test questions consistently according to the observed performance (or a single level of ability). However, we allow the possibility for some individuals to answer the test questions according to two levels of ability (or in two different modes, hence the name bimodality). Using these statistics we found that Vinny’s test was somewhat aberrant (at the probability level of one in 2,000) and that Inigo’s test was extremely aberrant (at the probability level of one in 200 million). These data, along with comparative “normal” data at the same ability levels, are shown in Figures 1 and 2.

Figure 1: Comparison of Vinny’s test with a normal test

Figure 2: Comparison of Inigo’s test with a normal test

The data confirm that both of these individuals took the exam at two levels of ability. The probability of the high level is shown using the yellow line. The probability of the selected response using the low and high levels is shown using the blue and pink lines, respectively. We infer that Inigo demonstrated more information and knowledge about the test content than Vinny, but both of them appeared to be answering the test questions for some other purpose than obtaining a score and an actual measure of their knowledge of this content area. It appears likely that these individuals were connected with the content area being tested.
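The idea of answering at "two levels of ability" can be illustrated with a rough sketch. This is not the Caveon bimodality statistic, whose formula is not given here. It is a generic likelihood comparison under assumptions chosen purely for illustration: a Rasch (one-parameter logistic) response model, a coarse grid search over abilities, and made-up item difficulties and response patterns. It compares how well one ability level explains a response pattern against how well a mixture of a low and a high ability level explains it.

```python
import math
from itertools import product

def p_correct(theta, b):
    """Rasch model: chance of a correct answer at ability theta, difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def log_lik(responses, difficulties, prob_fn):
    """Log-likelihood of 0/1 responses given a per-item probability function."""
    total = 0.0
    for x, b in zip(responses, difficulties):
        p = prob_fn(b)
        total += math.log(p if x else 1.0 - p)
    return total

def two_mode_statistic(responses, difficulties):
    """2 * (best two-ability log-likelihood - best one-ability log-likelihood)."""
    grid = [t / 2 for t in range(-12, 13)]       # abilities from -6 to 6
    weights = [0.25, 0.5, 0.75]                  # share of items at the high level
    one = max(log_lik(responses, difficulties, lambda b, t=t: p_correct(t, b))
              for t in grid)
    two = one                                    # one mode is a special case of two
    for lo, hi, w in product(grid, grid, weights):
        if lo < hi:
            ll = log_lik(responses, difficulties,
                         lambda b: w * p_correct(hi, b) + (1 - w) * p_correct(lo, b))
            two = max(two, ll)
    return 2.0 * (two - one)

# Hypothetical 20-item test with difficulties from -1.9 to 1.9.
difficulties = [(i - 9.5) / 5 for i in range(20)]
consistent = [1 if b < 0 else 0 for b in difficulties]   # easy right, hard wrong
aberrant = [1 if i % 2 == 0 else 0 for i in range(20)]   # right/wrong unrelated to difficulty

stat_consistent = two_mode_statistic(consistent, difficulties)
stat_aberrant = two_mode_statistic(aberrant, difficulties)
print(stat_consistent, stat_aberrant)
```

A pattern in which correctness tracks difficulty gains little from the second ability level, while a pattern in which correctness is unrelated to difficulty is explained noticeably better by two levels. Turning such a statistic into a probability like "one in 2,000" would require a reference distribution, which this sketch does not attempt.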

This incident is extremely instructive. It illustrates that not all test takers are as they appear and that an unfair advantage may be gained in many ways. I had always wondered whether there would be a motive to steal an identity for the purpose of taking a test and now I know.

Security insights from ATP 2008

Monday, March 10th, 2008

The ATP (Association of Test Publishers) conference this year did everything a good conference should do. We networked. We shared industry information. We discussed best practices. We met with clients and vendors. And we created, renewed, and strengthened friendships. Rather than discuss those things, let me share a few observations relating to test security.

Exam security was a hot topic, with many sessions and many serious conversations around test security. Wayne Camara of the College Board asked me, “Was the emphasis on security due to Caveon?” I replied, “I think it is partly due to our outreach effort, and more programs are dealing with security issues.” I think there are deeper reasons.

There were more stories describing successful security efforts this year than I remember in the past. Just to name a few: the FSBPT discussed their breach and resolution in the Philippines, the GMAC caught a proxy test taker in the very act, EMC presented successful risk management cases, and the Mississippi Department of Education has effectively addressed cheating in schools. We celebrate these successes, because they give us confidence that these problems can be solved.

There is deep concern about test and exam piracy. In the past, this concern was primarily expressed by IT (Information Technology) companies. This year many other organizations had the same concern. I heard several instances of exams being stolen from within computer-based testing centers. I have no reason to doubt these reports.

Theft vulnerabilities had been voiced privately in the past, but the discussions were more open this year. I attribute this to at least three reasons: (1) there were new attendees who wanted to expressly discuss security and stayed for the Test Security Summit, (2) the Boston Globe article “Job Exam Piracy Rising,” dated December 26, 2007, gave the topic national prominence, and (3) some presenters disclosed that their entire item banks, including answer keys and digital representations, had been stolen. In the session, “Cheater, Cheater, Pumpkin Eater,” EMC Corporation reported great success in detecting and shutting down test sites where exams are being stolen. Test pirates refused to resell test content because their test sites were shut down immediately after they stole the tests.

To the best of my recollection, there were more lawyers present at ATP this year than any other year. Representatives from at least four different firms had been invited to attend by conference organizers or conference presenters. I have paraphrased some of their very instructive comments below:

“Gather all your evidence in preparation to litigate, but only litigate as a last resort.”

“You can use statistics to invalidate scores and to take other security actions if you can demonstrate that your actions and decisions are made in good faith. The courts are interpreting these actions using contract law and it’s important that your agreements and contracts support your intended actions.”

“All test items are copyrighted, but you must register the copyrights before the items are stolen. Registered copyrights provide stronger protection than unregistered copyrights. There is a special provision in copyright law to protect secure tests for this purpose.”

GMAC and Pearson VUE described initiatives for preventing and detecting imposters. GMAC verifies a candidate’s current photo against the candidate’s registration photo. They attach the photo to the score report. (I call this “testing event authentication.”) Pearson VUE demonstrated Fujitsu’s PalmSecure biometric authentication technology. The readers are priced at around $700, which is not cheap but is within reach for secure testing applications.

Gene Radwin and Liz Burns of EMC Corporation captured our imagination. Gene shared his success in detecting users of braindump content using Trojan items. Liz Burns described her security efforts. She visualizes a triangle. At the base of the triangle are honest people who will not lie and will not cheat. At the top of the triangle are those who will cheat if at all possible. In the middle of the triangle are individuals who may cheat depending upon the circumstances. The “at risk group” is where Liz concentrates her efforts.

The Education Division meeting had an interesting discussion concerning the image of testing in education. I think that a positive image of testing is critical. As an example of how an incorrect image of testing can be damaging, consider the report that South Africa has effectively banned unproctored Internet testing, because these tests, not being secure, are thought to be unfair (reported by Hennie Kriek, President of SHL, USA).

Finally, if you believe that test publishers are cold and dispassionate, let me disabuse you of that notion. I saw a lot of passion and emotion at this conference. Testing professionals are very concerned that tests are administered securely. As an example, Cindy Simmons, State Assessment Director of Mississippi, showed great forthrightness and passion as she described her state’s initiatives to address cheating on the Subject Area Tests.

It’s true there is much work to do. But members of ATP are committed to fairness and integrity in testing. They comprise “the intelligent voice of testing.”

The case of the waylaid answer key

Thursday, January 17th, 2008

Recently there have been many reports of lost databases, stolen computers, and misplaced documents. Is it any wonder that tests and exams are also experiencing the same problems? For example, last November in New Zealand the home of an employee of the Qualification Authority was burglarized and a laptop containing math items for the National Certificate of Educational Achievement was stolen. Despite assurances of password protection, the Qualification Authority revised and reprinted 150,000 test booklets.

As another example, the completed answer sheets from an exam for the Arkansas State Board of Cosmetology were lost or misaddressed in the FedEx shipment to the scoring agency. Ninety candidates will have to retake the exam.

Two years ago Caveon’s assistance was sought in dealing with a similar situation. The car of an employee of a major test publisher was stolen. In the car were secured test materials, including an answer key to an upcoming state-wide public school examination. When the car was recovered the answer key was missing. There was not enough time to revise the test. The exam would be administered as scheduled. Our client wanted to know if the answer key was being distributed and if the integrity of the test administration had been compromised.

As we discussed the situation with the client, I was confident that we could detect a widespread breach. But could we detect a situation in which just a few students used the lost answer key? There was no doubt in my mind that, if the thief knew the market value of the answer key, it would be sold on the Internet. I knew this from first-hand experience. While I was teaching at the University, a dual-campus administration of a test, coupled with a time lag between administrations, led to the answer key being disclosed. Three of my students obtained the answer key to the exam through a Yahoo chat room. They scored 100% on all the questions, except the essay question, which they refused to answer.

The client gave us the following details about the test. There were 54 questions on the exam with 10 field test items and 44 core items. There were about 2 dozen different forms of the test. The forms all contained the same core items in the same locations, with form differences due to different sets of field test items. Slowly an analysis plan began to emerge. Because the answer key for only one of the forms was lost, we could score the field test items for all the other forms using the waylaid answer key. Scores on the field test items would be the keystone of the analysis.
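The scoring step described above can be sketched in a few lines of Python. This is only an illustration, not the code used in the actual analysis: the function name, the six-item key, and the field-test positions are all hypothetical, chosen to keep the example small (the real exam had 54 items, 10 of them field test items).

```python
def field_test_matches(responses, waylaid_key, field_positions):
    """Count the field test answers that agree with the waylaid key."""
    return sum(
        1 for pos in field_positions
        if responses[pos] == waylaid_key[pos]
    )

# Hypothetical six-item illustration; positions are zero-based.
waylaid_key = "ABCDAB"
field_positions = [1, 4]      # assumed locations of the field test items
student = "ACCDAB"            # answers from a student who took a different form

print(field_test_matches(student, waylaid_key, field_positions))
```

A student on a different form who answers honestly should agree with the waylaid key on the field test positions only by chance, which is what makes this count useful as a signal.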

We assumed that any student using the stolen answer key would not know which items were field test items and which were core items. We also assumed that the student would answer all the items (with potentially a few mistakes) using the stolen answer key. It was easy to determine that a widespread dissemination of the answer key had not occurred. Statistical methodology dictates that statistical tests are performed assuming the null hypothesis (i.e., the answer key was not in play) is true. Under this assumption we found that fewer than 2% of the tests had “high scores” (i.e., scores above the 95th percentile of the distribution), when 5% were expected. This was very good news. There was no widespread dissemination of the answer key.
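One way to picture the “high score” check is the sketch below. It rests on an assumption that is not stated in the analysis itself: that a student who never saw the key matches it on each of the 10 field test items independently with probability 0.25 (a guess among four options), so the number of matches follows a binomial distribution. The function names are hypothetical.

```python
from math import comb

def upper_tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

def high_score_cutoff(n, p, alpha=0.05):
    """Smallest score k whose upper-tail probability is at most alpha."""
    for k in range(n + 2):            # k = n + 1 always has tail probability 0
        if upper_tail(n, p, k) <= alpha:
            return k

def flagged_fraction(scores, cutoff):
    """Share of tests whose field test score is at or above the cutoff."""
    return sum(s >= cutoff for s in scores) / len(scores)

cutoff = high_score_cutoff(10, 0.25)  # assumed null model: 10 items, p = 0.25
```

The observed fraction from `flagged_fraction` would then be compared with the tail probability expected under the null hypothesis. A fraction well above that expectation would suggest the key was in circulation.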

Next, we hypothesized that a few teachers or school administrators might have received and used the stolen answer key. Using a probability inversion formula, we rank ordered the schools by the proportion of tests where more than six correct answers on the field test items (using the stolen answer key) were found. We found that the proportion of schools in the upper tail (above 10%) was less than 7% when 10% were expected. This was good news. It meant that if the answer key was disseminated, it was not likely to have occurred through teachers or administrators. (We also visually inspected the 30 most extreme schools for “perfect” scores of 10 on the field test items for all the other forms except the one associated with the lost answer key. Nothing untoward was found in any of those schools.)
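The school-level ranking can be sketched as follows. The probability inversion formula mentioned above is not specified here, so this example ranks schools by the raw proportion of tests with more than six matches. The school names and scores are made up for illustration.

```python
def rank_schools(scores_by_school, threshold=6):
    """Return (school, proportion) pairs, highest proportion first.

    The proportion is the share of a school's tests with more than
    `threshold` field test answers matching the stolen key.
    """
    ranked = []
    for school, scores in scores_by_school.items():
        flagged = sum(1 for s in scores if s > threshold)
        ranked.append((school, flagged / len(scores)))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)

# Made-up data: each list holds field test match counts for one school.
example = {"School A": [7, 2, 3, 8], "School B": [1, 2, 3, 4]}
print(rank_schools(example))
```

Schools at the top of such a ranking are the natural candidates for the kind of visual inspection described in the parenthetical above.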

Finally, searching for the proverbial needle in the haystack, we hypothesized that a few isolated students may have been able to receive the answer key through personal contact with the thief on the Internet. In order to attack this problem we created a Bayesian probability model, where we estimated the probability that the stolen answer key was used by a particular student conditional upon the test score. Using this model we inferred a 95% upper bound on the proportion of students who used the answer key to be less than .09% (or nine in ten thousand). The five most extreme tests were visually inspected, and not one of them had a “perfect” score on the field test items, using the lost answer key.
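The idea of a probability of key use conditional on the score can be illustrated with Bayes’ rule. The sketch below is not the model we actually built. It assumes binomial likelihoods, a per-item match rate of 0.95 for a student copying from the key (allowing a few mistakes), a rate of 0.25 for everyone else, and a prior proportion of key users. All of these numbers and names are hypothetical.

```python
from math import comb

def binom_pmf(n, p, k):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def prob_key_used(score, prior, n_items=10, p_with_key=0.95, p_without_key=0.25):
    """Posterior probability that a student used the key, given the score.

    `prior` is the assumed proportion of students who had the key.
    """
    with_key = prior * binom_pmf(n_items, p_with_key, score)
    without_key = (1 - prior) * binom_pmf(n_items, p_without_key, score)
    return with_key / (with_key + without_key)

# Compare a perfect field test score with a typical one, prior of 1 in 1,000.
print(prob_key_used(10, 0.001), prob_key_used(3, 0.001))
```

Under these assumptions a perfect score of 10 is strong evidence of key use even when the prior is tiny, while a middling score is almost no evidence at all. That is why the most extreme tests are the ones worth inspecting by hand.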

The results of the analysis gave our client sufficient confidence to trust the integrity of the test administration. In order to place perspective on these statistical estimates, we note that the estimated bound (i.e., .09%) on answer key compromise is much, much lower than the actual proportion of students who copy from each other in the normal test taking situation. While we could not prove that the stolen answer key had not been used, we concluded the following:

If any students have gained access to the answer key, the data indicate the answer key has not been shared with friends. And, if the answer key was used, its use was isolated.

With 95% confidence, no more than .09% of students used the compromised answer key. It is very likely, in fact, that no student actually used the compromised answer key.

The above situations illustrate the importance of properly securing test materials. They also illustrate that by using innovative and defensible statistical analyses, testing program administrators may know the degree of security risk that is present. The analysis of the waylaid answer key illustrates the power of data forensics in protecting and maintaining exam and test security.

No-Fly List shenanigans

Monday, January 14th, 2008

Just last week a five-year-old boy was detained by TSA (Transportation Security Administration) because his name was similar to that of a suspected terrorist on the no-fly list. The reporter wrote, “A five-year-old boy was taken into custody and thoroughly searched at Sea-Tac because his name is similar to a possible terrorist alias. As the Consumerist reports, ‘When his mother went to pick him up and hug him and comfort him during the proceedings, she was told not to touch him because he was a national security risk. They also had to frisk her again to make sure the little Dillinger hadn’t passed anything dangerous weapons or materials to his mother when she hugged him.'”

On the other hand, 13 News in Indianapolis interviewed a woman, Lisa Skaggs, who described an incident two rows in front of her, where a man occupied the same seat that was assigned to another passenger. The man refused to produce his ID, only showing his boarding pass with the same seat number. The plane was finally evacuated in order to remove the recalcitrant passenger.

A United Airlines representative confirmed that the passenger’s name did not match the boarding pass. In my opinion, the most shocking statement about this incident came from a TSA official. “TSA’s Christopher White believes the system worked. ‘The fact that one of two million may not have a boarding pass that does not match and I.D., does not overly concern us when they’re exposed to all these other layers of security,’ said White.”

It’s not illegal to fly without having an ID. In fact TSA’s regulations explicitly allow for passengers to board an aircraft without an ID. You might find the experience and perspective of Joby Weeks to be interesting in this context:

The fact that boarding passes are an element of TSA’s security and that boarding passes may be printed from home represents a security hole in TSA’s security rules and regulations. This was documented by Senator Charles Schumer of New York, who vividly described how “Joe Terrorist” circumvents the no-fly list, in a letter dated February 11, 2005 to TSA officials.

The insecurity of “print-from-home” boarding passes was demonstrated convincingly a year ago by Christopher Soghoian, a Ph.D. student in Computer Science at Indiana University. “The FBI raided the home of Indiana University grad student Christopher Soghoian, who created a Web site that lets users forge their own airline boarding passes. Soghoian said he intended to call attention to an airport security loophole.” See Christopher’s description of the FBI raid here:

There are several security principles that are illustrated in the above scenario:

  1. If security is not implemented properly and has glaring security weaknesses, your organization may receive intense negative attention.
  2. If security is not designed into the overall system, but it is added in after the fact, security holes will be present that will be difficult to patch.
  3. A proper view of security requires understanding the true risk that is represented by anomalous and unusual behaviors (such as understanding what a one-in-one-million anomaly potentially represents).
  4. Simple lists and blindly following ad-hoc rules (such as detaining five-year-olds) can make your organization look ridiculous.
  5. When you use elements in your security system that were not designed to provide security (such as print-from-home boarding passes), you are likely to have security holes.

We don’t know why the passenger without the ID refused to present his identification documents. Here are some possible scenarios.

  1. He could have learned how to hack United Airlines’ reservation system.
  2. He could be an actual wanted fugitive who paid for or fabricated a false boarding pass.
  3. He could be a terrorist who was probing airline security in order to learn how to board an airplane without presenting an ID and without drawing attention to himself.

All of these possibilities show the inanity of the TSA comment: “The fact that one of two million may not have a boarding pass that does not match and I.D., does not overly concern us when they’re exposed to all these other layers of security.” We have learned at Caveon that the unusual circumstance is that which requires the greatest care and scrutiny.

A few years ago a large number of test booklets were lost. Even though the large number of lost booklets was a very small percent of the total number of printed booklets, the fact remained that those lost test booklets represented a substantial security risk to the testing program. It only takes one lost booklet to compromise an entire exam. It only takes one or two terrorists out of a million flyers to represent a significant security risk to the public safety.

Caveon Data Forensics is based on the premise that unusual and extremely anomalous data are those that should receive the greatest scrutiny. We are extremely concerned when test takers go outside the country to take tests. We are especially vigilant when tests are extremely similar, even when or especially when they represent a very small proportion of the total tests administered. From my view, the unusual and the anomalous data are those that should receive our highest attention. The comment from the TSA official suggested that such data do not represent a significant worry. In my opinion, such an attitude is short-sighted and imprudent.

The word for today is: steganography

Thursday, January 10th, 2008

Holmes handed Watson a note and said, “This is the message which struck Justice of the Peace Trevor dead with horror when he read it.”

The supply of game for London is going steadily up. Head-keeper Hudson, we believe, has been now told to receive all orders for fly-paper and for preservation of your hen-pheasant’s life.

Justice of the Peace Trevor was struck dead because the note contained a secret message. This is the essence of steganography. “Steganography is the art and science of writing hidden messages in such a way that no one apart from the sender and intended recipient even realizes there is a hidden message.”
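In the Conan Doyle story the hidden message is read by taking every third word of the note, starting with the first, and counting each half of a hyphenated compound as its own word. A short Python sketch of that decoding (the variable names are mine):

```python
import re

note = ("The supply of game for London is going steadily up. "
        "Head-keeper Hudson, we believe, has been now told to receive all "
        "orders for fly-paper and for preservation of your hen-pheasant's life.")

# Keep runs of letters and apostrophes, so "fly-paper" counts as two words.
words = re.findall(r"[A-Za-z']+", note)

# Every third word, starting with the first, carries the secret.
hidden = " ".join(words[::3])
print(hidden)  # The game is up Hudson has told all fly for your life
```

To anyone who does not know the rule, the note reads as an odd but harmless message about game and fly-paper. That is what separates steganography from ordinary encryption, where the existence of a secret is obvious.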

In our modern world, we usually use the term “digital watermarking” for steganography when we refer to electronic files that contain hidden secrets. Digital watermarking techniques are being used to verify the illicit distribution of copyrighted photos, with the intent that the copyright holder may receive royalties for the unauthorized distribution. The recording industry has been using digital watermarking in the form of Digital Rights Management (DRM) in order to prevent digital theft of copyrighted movies and recordings.

The big news from the digital music industry is that during the past year, the four largest digital music labels have all dropped DRM from MP3s that are being distributed online. Sony BMG was the last hold-out and recently announced that DRM would no longer be used for MP3s.

The fact is that album sales have declined. As one article states, “In short, downloads are up, physical sales are down, and downloads are not picking up the slack of lost sales.” In other words, if the future of music sales is in downloads, then the recording houses have very little choice except to remove DRM from downloaded music. The DRM software is distasteful enough to consumers that they will go elsewhere for music.

Professionals in the testing industry have been talking about “digital watermarking” for some time as a means of protecting tests. However, the term “digital watermarking” is a misnomer because true digital watermarking involves bit twiddling within the electronic content. You can’t twiddle the bits of text files and expect the modification to remain hidden.

Steganography can be used to protect tests by hiding information in the test so that when a stolen copy of the test is acquired (e.g., purchased from a braindump site) the exact copy of the test that was stolen may be verified. In other words, this becomes a means of detecting where and when a test is stolen. This information is used to identify the weak link in the chain of custody, so that the person responsible for the security of the exam when the test was stolen may be identified. The information cannot identify the thief, but it can identify the individuals who were entrusted with the custody of the test at the time of theft.

At Caveon we have engaged in a few projects of this nature on a limited basis. At the request of one of our clients we injected small editorial changes into the text content of selected items and then compared the various versions of the test with stolen content purchased from the Internet. We determined to the client’s satisfaction that the test theft did not occur inside their test development organization. Instead, it occurred after the test was published.

The above work was labor intensive and could only be performed on a small scale. Our research indicates that the point of risk for test theft is at the test delivery sites, which number in the thousands. An effective steganographic system will require encoding hidden information into the test content in order to detect the points of theft in the test delivery channel. A steganographic system capable of providing this kind of detection must be automated and it must be implemented on a wide scale. This means that potentially thousands of test versions must be generated and the decoding system must be able to reliably determine which test version was stolen.
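To make the idea of many automatically generated versions concrete, here is a toy sketch in Python. It is not our algorithm. It simply assumes that each marked item has two interchangeable wordings and uses the choice of wording as one bit of a version number. The variant table and function names are hypothetical.

```python
# Hypothetical variant table: two interchangeable wordings per marked item.
VARIANTS = [
    ("Which of the following", "Which one of the following"),
    ("is used to", "is utilized to"),
    ("Select the best answer", "Choose the best answer"),
]

def encode(version_id):
    """Pick one wording per marked item from the bits of version_id."""
    return [pair[(version_id >> i) & 1] for i, pair in enumerate(VARIANTS)]

def decode(recovered_text):
    """Recover the version number from the wordings found in stolen content."""
    version_id = 0
    for i, (zero, one) in enumerate(VARIANTS):
        if one in recovered_text:
            version_id |= 1 << i
        elif zero not in recovered_text:
            raise ValueError(f"marked item {i} not found in recovered text")
    return version_id

stolen = " ... ".join(encode(5))   # stand-in for content bought from a braindump
print(decode(stolen))
```

Three marked items distinguish only eight versions, but the count doubles with each added item, so a dozen marked items would cover more than four thousand delivery sites. A real system would also have to survive retyping and paraphrasing by the thief, and the wording changes must not alter item difficulty.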

We have been conducting research and developing algorithms for such a steganographic system. Whatever method is used for hiding information, it cannot affect the performance of the test. It must be truly unobtrusive. This is a big challenge, because modifications to the item text can potentially change the difficulty of the test questions.

If you are interested in this topic you might look at these websites: