Archive for December, 2007

What is your top security concern?

Saturday, December 29th, 2007

The number one security concern of testing professionals is exam theft and piracy, according to a survey that Caveon conducted at NOCA in 2005. We asked the question: “Which of the following are security concerns for you? (Please check as many as apply).” One hundred participants responded in the following manner:


Number Responding

Proxy test taking


Lax proctoring at test sites


Stealing items, pools, or tests


Posting of secure items on the Internet


Attempts to hack into your item banks


Use of your secure items by training programs or coaching schools


Leakage of items by item writers, reviewers, or other contributors


Given the news article from the Boston Globe, “Job exam piracy rising,” published December 26, 2007, it would be interesting to repeat the above survey.

This is a very important article because while data are not provided to support the headline that exam piracy is really on the rise, it strongly illustrates the impact of exam piracy on the testing industry and the fact that current remedies cannot effectively counter many instances of test theft. This is particularly true for information technology certifications.

I have been studying the problem of exam piracy for a long time, and can offer a few insights. First, the asset that must be protected by exam security is the integrity of the examination process and the credibility of the test result, not the item bank or the test form. Second, the correct perspective of the relationship between certifying authority and test thief is a host-parasite relationship. The exam pirates live and draw from the vitality of the certification, devaluing it with their success. Lastly, a year ago we analyzed the data forensics analyses that we had performed for more than 20 certification programs. We determined that three main factors were directly related to exam piracy: (1) the mission and role of the certification, (2) the test administration model, and (3) the security of the test administration channel.

Protecting the integrity of the examination process – Current legal protections against exam piracy involve copyright and trade-secrecy statutes. Unfortunately, these can only be invoked after the integrity of the test is breached. They usually involve protracted investigations followed by even lengthier legal proceedings. In the meantime, the test is compromised and keeping it in service further erodes credibility in the examination process. The DMCA (Digital Millennium Copyright Act) provides some assistance when the stolen content is accessed through a US-based ISP. But legal remedies are few. In fact, legal jurisdiction over crimes committed over the Internet is at times very unclear, compounding the problem.

Host-parasite relationship – A certifying authority such as the FSBPT (Federation of State Boards of Physical Therapy) derives its existence from maintaining and administering the exam. An attack on the integrity of the exam is an attack against its very existence and must be countered. On the other hand, a company such as Microsoft provides certifications in support of its business. The vitality of such a company is derived from product sales and service, not from the certifications. Thus, as long as attacks on the exam do not adversely affect the core business of the company, it may be able to withstand parasitical infestations. In either case, the parasitical exam pirate bears no goodwill toward the certifying authority and has no compunction in destroying it.

Mission and role of certification – Resources within any organization are deployed according to its core mission or function. In the context of exam security this means that operational budgets and legal expenditures are prioritized accordingly. For example, the lawyers for an organization such as FSBPT will be more willing to tackle exam security issues than will lawyers for the typical IT company. This is because the lawyers for IT companies are involved in patent protection, maintaining business contracts, and other core business functions.

Test administration model – Most high-stakes testing programs administer tests according to pre-determined testing events. A new test (which may use previously administered items) is constructed for each event, thus decreasing the chance that stolen test items will be present on the new test. This practice means that it is more difficult for the exam pirate to profit from the testing program. On the other hand, when the same test forms are kept in service for a protracted length of time, the exam pirate has a distinct advantage in stealing and selling the test content.

Security of the test administration channel – The article from the Boston Globe states, “Technology companies in particular have accepted lower levels of security in order to have testing centers in distant corners of the globe.” The lower levels of security involve contracting the test administrations with third parties who may never have had a background check, who may be operating cheat sites, or who don’t care exactly how they make money. A rogue test site administrator can very easily steal a test by merely recording every testing session (e.g., with a video camera) and then transcribing it. I believe that some of these individuals have discovered how to actually pilfer the test content electronically, avoiding the need for transcription.

Hopefully, thinking about the above observations will help you understand why exam piracy is not going to be solved easily. Some testing organizations are being seriously affected by exam piracy. Only time will tell whether they will be able to successfully ward off the pirates, or not.

Improving your odds at winning the lottery

Friday, December 28th, 2007

Beginning New Year’s Day 2008, lottery ticket retailers in Ontario will have a new set of rules to follow if they want to continue selling lottery tickets. “Most of the changes are the result of Ontario ombudsman Andre Marin and his scathing investigation of the province’s lottery corporation.”

The previous set of rules allowed lottery ticket retailers to steal lottery winnings from those to whom they sold the tickets. An example of the scam is described in this story where after three years, bilked lottery ticket purchasers were finally awarded their prize.

In the above situation, the retailer apparently exchanged a non-winning ticket for the winning ticket when the purchasers presented the ticket to claim their prize. The problem is that the retailer is in a position to game the system because the retailer performs two functions: selling the tickets and verifying the tickets. A clever and practiced cheater can manipulate such a situation.

This “man-in-the-middle” attack illustrates an obvious weakness in most paper-and-pencil testing scenarios. An answer sheet may be misdirected or even falsified by an adult who is acting in a trusted test administration position.

For example, it is common practice in elementary schools for teachers to review the students’ answer sheets and make sure that the marked answers are dark, legible, and between the lines on the scan sheet. This practice allows a teacher to not only “clean up stray marks” but also to tamper with the answer sheet. An example of the procedure is described in this document from Dallas Independent School District:

Another example is more blatant. A teacher could very easily fill out blank answer sheets for students and then replace the students’ answer sheets with the prepared answer sheets. Erasure or light marks analyses are routinely performed on answer sheets that are scored, but it is unlikely that “fouled” answer sheets (which would also be returned) are subjected to the same analysis.

As a variation of the above exploit, it is well-known that a certification exam can be manipulated by a proxy test taker in a similar manner. The test taker and the proxy test taker both appear at the test site. They have both registered to take the test, and both will take the test. They switch names on the answer sheets (e.g., the proxy test taker puts the name of his or her employer on the answer sheet). If the answer sheets are controlled by document identifiers, the two can breach the security by exchanging answer sheets if they are together when they receive their test materials.

The above vulnerabilities (and others that use the same theme) may be addressed with revised procedures, just as procedures are being revised for the Ontario lottery. For example, instead of stray marks being cleaned up at the school they may be cleaned up at the processing center (where those reviewing the answer sheets do not have a motive for tampering). All returned answer sheets could be scanned, allowing for any fouled answer sheets to be detected. If the answer sheets have document control numbers provided using a readable encoding (such as a bar code), then every control number should be accounted for and none should be duplicated (prevents unauthorized destruction of fouled answer sheets).
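The control-number check described above can be sketched in a few lines of Python. This is only an illustration under assumed inputs: the function name, the report keys, and the sample control numbers are hypothetical, and a real processing center would read the numbers from scanned bar codes.

```python
from collections import Counter

def reconcile_control_numbers(issued, returned):
    """Compare issued control numbers with those scanned on return.

    Hypothetical helper: 'issued' and 'returned' are lists of control
    numbers (strings). Nothing here reflects a specific vendor's system.
    """
    counts = Counter(returned)
    issued_set = set(issued)
    return {
        # Issued but never scanned: possible unauthorized destruction.
        "missing": sorted(issued_set - set(counts)),
        # Scanned more than once: possible copied or substituted sheet.
        "duplicated": sorted(n for n, c in counts.items() if c > 1),
        # Scanned but never issued: possible foreign sheet.
        "unknown": sorted(set(counts) - issued_set),
    }

report = reconcile_control_numbers(
    issued=["A001", "A002", "A003", "A004"],
    returned=["A001", "A002", "A002", "A009"],
)
print(report)
```

Any non-empty list in the report is a reason to investigate, not proof of tampering.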

To prevent document exchange (such as in the above scenario with the proxy test taker), a digital scan of the test taker signature on the answer sheet may be preserved. This allows for verification of the signature on the answer sheet with the signature on the application. Another way to prevent document exchange between two test takers is to distribute test taking materials to candidates after all are seated, and to collect testing materials from candidates before any leave their seats at the end of the testing session.

While preventative measures are usually the best, analysis of the data may detect these types of attacks. For example, analysis of lottery wins by retailers should have detected there was a problem long before the complaints started to pile up. In the same way, it is very difficult for a person who is tampering with the test results to conceal the effect of their work.
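As a rough sketch of what such an analysis of retailer wins might look like, the Python below asks how surprising a retailer's count of major wins is if wins follow a Poisson model. The Poisson assumption, the expected count of 0.5, and the observed count of 6 are all hypothetical numbers chosen for illustration; they are not taken from the Ontario investigation.

```python
import math

def poisson_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed from the complement."""
    below = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return 1.0 - below

# Hypothetical retailer: 0.5 major wins expected by chance, 6 observed.
expected_wins = 0.5
observed_wins = 6
p_value = poisson_tail(observed_wins, expected_wins)
print(f"P(at least {observed_wins} wins by chance) = {p_value:.2e}")
```

A very small probability does not identify the cause, but it does tell the analyst where to look first.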

In summary, every aspect of a test administration system and procedure should be carefully reviewed under the assumption that some individual will attempt to exploit that system, and then reasonable security measures should be taken.

Testing Event Authentication – Is it right for you?

Friday, December 21st, 2007

Cisco now “requires all exam takers to provide digital photos and digital signatures” when candidates are admitted to take a test.

Cisco states, “This new layer of identity authentication will help to ensure candidate identity and result in increased assurance that individuals are presenting accurate certification records in the marketplace.” In my opinion, it is very important to understand why Cisco felt that the current identity authentication mechanism (presenting a photo id along with the exam registration code) needed to be strengthened.

First, the former system relied upon a proctor at the test site to verify the validity of the identity documents that were presented. It is well known that forgers are able to create false identity documents which are undetectable by all except the most sophisticated verification systems. It is also well known that even trained people do not perform this authentication task with great accuracy. After the candidate is admitted to the test site, the identity documents are no longer needed. This one-time authentication method relies upon having honest and astute proctors. Besides the fact that the candidate was admitted to the test site, no permanent record is made of the authentication. The act of authentication is not subject to review.

Second, the new system presumably captures a digital photo and signature of the test taker (as opposed to having the test taker bring the digital photo and signature to the test site). This biometric information can now be permanently stored with the test result. It can be recalled on demand. Questions concerning the identity of the individual who actually took the exam and whether that individual is the same as the person presenting the credential derived from the exam can be answered immediately. This new capability would be more properly named “transaction authentication” (borrowing a term from information systems). In other words, the testing event itself is being authenticated, which is stronger than merely authenticating the test taker. Unless the proctor is dishonest, the capture of the digital photo is outside the control of the candidate, meaning that the photo cannot be falsified.

The above article discusses braindumps and cheating, but the primary purpose of the initiative is to authenticate the identity of the test taker. In other words, Cisco is trying to keep proxy test takers or “hired gunmen” from taking tests. There are websites that proclaim that for a few dollars you can “obtain your certification at home without entering the testing site.” They say, “Why waste your valuable time? We can take the test for you.” Through the above initiative, Cisco is taking preventative measures against these people.

Proxy test takers are a potential problem for all testing organizations. It may not be feasible to capture digital photos for your organization, but you should be able to employ some measures for authenticating the testing event. The testing event is authenticated when permanent, verifiable, non-counterfeitable information is stored with the test result. This would typically be biometric information, but non-biometric information may also be used. For example, the British government has implemented “authentication by interview” as a method of passport authentication.

If you are interested in the above topic, you might check out other authentication techniques. I have linked to a few below:

PassFaces (strong passwords):

BioPassword (authentication by typing):

Several biometrics are listed on this page:

Here’s an interesting article on “voice risk analysis” or “lie detector by phone”:

The above techniques are interesting and they are gaining momentum, but in order to authenticate the testing event you need permanent, verifiable, non-counterfeitable information. Some of these techniques do not provide that kind of information. In my opinion, Cisco’s initiative is very good. It will be interesting to see future advances in testing event authentication.

Can unproctored online assessments be trusted?

Wednesday, December 19th, 2007

As more and more online courses are developed and offered, instructors of online courses need to consider the potential for cheating on the assessments. The following article describes some measures being implemented by FGCU (Florida Gulf Coast University):

One of the measures is to track IP addresses and determine if more than one test is being submitted from the same computer. Other measures include randomization of answer choices and random selection of items from an item bank. The software also prevents the test questions from being printed. Kathleen Davey, Dean of Academic Technology, said, “You can’t prevent everything from happening. You must rely on the integrity of the individual students up to a certain point.”

Ultimately, the above statement is true. If a test taker is sufficiently determined he or she will be able to successfully cheat on the test or steal the test content.
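The IP-address check mentioned above is simple to sketch. The Python below is a minimal illustration, not the software FGCU uses; the function name and the sample submissions are hypothetical.

```python
from collections import defaultdict

def shared_ip_groups(submissions):
    """Return {ip: sorted student ids} for IPs used by more than one student.

    'submissions' is an iterable of (student_id, ip_address) pairs.
    """
    by_ip = defaultdict(set)
    for student_id, ip in submissions:
        by_ip[ip].add(student_id)
    return {ip: sorted(ids) for ip, ids in by_ip.items() if len(ids) > 1}

flagged = shared_ip_groups([
    ("s1", "10.0.0.5"),
    ("s2", "10.0.0.7"),
    ("s3", "10.0.0.5"),
    ("s1", "10.0.0.5"),  # same student again: not a new match
])
print(flagged)
```

A shared address is only a flag. Students in the same computer lab or behind the same router will also share an address, so each match still needs a human explanation.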

I have been very interested lately in the security of online assessments. They are becoming more prevalent and indications are that they will become a dominant technology in testing if security concerns can be adequately addressed. The problem is that most online assessments are essentially unproctored assessments. Until unproctored Internet tests can be delivered securely, they should not be used for high-stakes exams. By definition, an exam has high stakes if passing or failing the exam has significant life consequences for the test taker. Usually this means getting a job, getting licensed in a profession, getting admitted to a school, getting a diploma, etc.

Recently, the Boston Globe released an investigative report concerning Army Correspondence Courses. Yesterday, Senator Edward M. Kennedy, Chairman of the Armed Services Committee, reacted strongly to the report, writing, “I was shocked to read of one website that provides answer keys and boasts that ‘[w]ith cheap prices and fast service, you can be wearing that E-5 [sergeant] rank before you know it.’”

The essential problem is that the assessments being used for the correspondence courses are unproctored Internet tests.

I remember taking unproctored tests as a student at the university. We called them “take home” tests. Our take-home tests had implicit security built into them:

  1. They were really hard. You couldn’t just find the answer to the questions in the university library.
  2. You might find someone to take the test for you or help you out, but eventually you would take a few in-class tests (where you couldn’t use your friend).
  3. The tests were written in your own handwriting, which was easily compared with prior copies of your handwritten assignments.

Later, when I was an instructor at the university, we added another twist to take-home tests: Every student got the same problems but with different data and different answers.

The above simple principles highlight the issues that must be addressed to administer a test securely online in an unproctored setting:

  1. Biometrics should be used to authenticate test taker identity.
  2. The questions must not be answerable using simple “Google” searches.
  3. A verification process needs to be in place that allows the unproctored test result to be trusted.
  4. Other security measures may assist with authenticating that the test taker actually did his or her own work.
  5. Algorithms that produce item clones or variants can reduce the ability of test takers to share test content or profit from another’s answers.
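The last principle, giving every test taker the same problem with different data, can be sketched as follows. This is a toy illustration only: the problem type (computing a mean), the function name, and the seeding scheme are assumptions made for the example, not a description of any production item-cloning algorithm.

```python
import hashlib
import random

def item_variant(student_id, item_id):
    """Build a per-student variant of a simple 'compute the mean' item.

    The seed is derived from the student and item identifiers, so the
    same student always sees the same data and the key can be rebuilt
    at scoring time without storing it.
    """
    digest = hashlib.sha256(f"{student_id}:{item_id}".encode()).hexdigest()
    rng = random.Random(int(digest, 16))
    data = [rng.randint(10, 99) for _ in range(5)]
    answer = sum(data) / len(data)
    question = f"Compute the mean of {data}."
    return question, data, answer

question, data, answer = item_variant("student-001", "stats-q1")
print(question)
```

Because each test taker's data differ, a copied answer is wrong for the person who copies it, which is the point of the exercise.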

I remember the day that I took my oral exams. There was no faking. There was no cheating. I was in a room, face-to-face, with three professors. Each of them had taught me in at least one course. Of course, it is not realistic to do this for every single individual being certified in a profession or being admitted into the university. But it demonstrates the importance of having several observations which together confirm that the candidate does indeed possess the requisite competence.

There has been interesting progress in the area of secure administrations of unproctored Internet tests. I will mention just a few items that I can recall readily:

  1. Kryterion is using data forensics and biometrics to establish that a test is being taken properly.
  2. SHL is using an initial unproctored test followed by a verification test in a proctored setting to ensure that the test results can be trusted.
  3. An instructor named Simon at the School of DCIT, University of Newcastle, used an innovative detection system with online unproctored tests that relied on font colors in Word documents to detect cheaters:

At this URL: you will find a paper that is very interesting in this context.

Two things are clear: (1) online assessment is here to stay, and (2) ubiquitous security solutions are needed if online assessments are to be trusted.

Student outwits FCAT with secret pattern

Friday, December 14th, 2007

A senior from Manatee High School passed the FCAT (Florida Comprehensive Assessment Test) in ten minutes by using a “secret pattern” after flunking the test three times. His score was invalidated. Apparently the test score was not invalidated because he used a pattern. Carla Frazier told the news, “FCAT rules do not prohibit students completing the test using any patterns, nor does the test have a minimum time requirement.”

We don’t know why the principal invalidated the score. We don’t know what “secret pattern” was used by the student. But, I have an idea what it might have been: “a-n-s-w-e-r-k-e-y.” Ok, I admit to being a cynic and a skeptic at times. This is one of those times.

Consider the facts, and then decide for yourself if you believe the student’s story.

  1. Test publishers are very careful to make answer keys as unpredictable as possible. They are well aware of the guesser’s adage, “If you don’t know, choose ‘C’.”
  2. Item writers and item reviewers are careful in writing distractors and answer choices to prevent guessers from gaming the test and gaining an advantage. They know that guessers will attempt to deduce the correct answer by analyzing the answer choice lengths and details.
  3. Having analyzed a lot of high school exit exam data, I know that pass rates go down with every makeup test. Students who fail three times are very lazy, easily confused, or just not proficient. Passing the test in ten minutes is not consistent with any of these.
  4. Cheaters are often very creative liars and they prey on our gullibility. The news reporter was gullible in writing the story and, for some reason, expects us to be equally gullible.

There are a lot of ways to detect cheating. In this particular case we might have seen any of the following:

  1. An extremely high score after having flunked three times previously would be a clear warning sign to the principal.
  2. The FCAT, according to the district FCAT coordinator, often contains pilot questions. If the student did very well on all the questions, except the pilot questions, and the answers to those questions matched the answer key from a different form of the test, then the principal would definitely have a “smoking gun.”
  3. Sometimes the answer sheet can be modified after the fact. With the right inducement, an insider may be persuaded to change the answers. Erasure analysis would detect this kind of tampering. Perhaps the principal was suspicious and saw a lot of erasures on the answer sheet.
  4. It is often the case that the cheaters boast of their exploits and in this case the principal may have gotten wind of the boasting.

Being a student of statistics, I imagine that the student could have finally gotten lucky and passed the test. Distribution theory states that the maximum observed value in a distribution has a much higher mean than the distribution from which the value was drawn. In this case, we have repeated scores on the FCAT for the student. Just by chance alone, if the student’s expected score is reasonably close to passing, after repeatedly taking the test a passing score will be observed eventually.
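The "lucky retake" argument can be made concrete with a small calculation. The sketch below assumes each attempt is independent and has the same chance of passing, which is a simplification, and the 20% pass probability is a hypothetical figure, not an FCAT statistic.

```python
def prob_pass_at_least_once(p, attempts):
    """Chance of at least one pass in 'attempts' independent tries,
    each with pass probability p: 1 - (1 - p) ** attempts."""
    return 1.0 - (1.0 - p) ** attempts

# Hypothetical student with a 20% chance of passing on any one attempt.
for n in range(1, 5):
    print(n, round(prob_pass_at_least_once(0.2, n), 4))
```

Under these assumptions a student who is close to the passing line has a better than even chance of passing at least once in four attempts, so a pass on the fourth try is not, by itself, evidence of cheating.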

But suppose that in my skepticism I am correct. Suppose the student did have the answer key. How would the forensics analyst detect that an answer key had been stolen and used? I have seen three answer-key arbitrage techniques used for exam security purposes, any of which could be used in similar situations.

  1. The FCAT coordinator disclosed that pilot questions are often used on the exam. Scoring the pilot questions with alternate keys could provide probability evidence that an answer key was in play.
  2. I know of a situation where items were intentionally miskeyed and left unscored with the goal of determining whether the answer key had been stolen and used.
  3. In another situation, the exam contained a few poorly written questions where the provided answer was ambiguous (This often happens on exams). These questions were exploited in a similar manner to compute probability evidence that an answer key was stolen and used.
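The techniques above share one idea: count how often a test taker's responses agree with a key that an honest test taker would have no reason to match, then ask how likely that agreement is by chance. The Python below is a minimal sketch of that calculation. The binomial model, the chance-match probability of 0.25, and the ten "trap" items are assumptions for illustration and do not describe any particular program's procedure.

```python
import math

def count_matches(responses, key):
    """Number of positions where the response equals the key."""
    return sum(1 for r, k in zip(responses, key) if r == k)

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i)
               for i in range(k, n + 1))

# Hypothetical data: ten unscored items with deliberately wrong keys.
trap_key = list("BDACABDCAB")
responses = list("BDACABDCAC")  # agrees with the wrong key on 9 of 10

k = count_matches(responses, trap_key)
p_value = binomial_tail(k, len(trap_key), 0.25)
print(k, f"{p_value:.2e}")
```

In practice the chance-match probability should be estimated from how often ordinary test takers choose each keyed option, rather than fixed at one value for every item.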

The test publisher has many tools and techniques that can be used to trap the unsuspecting cheater. Answer-key arbitrage is one of those.

Forensics analysis moves to online games

Tuesday, December 11th, 2007

Cheating in MMO (Massively Multiplayer Online) games is on the rise, and “to fight back, game developers have taken a page from banks and credit card companies. They’re using fraud-detection software to analyze the rushing stream of events that occur in an ordinary MMO day, in search of something fishy.” The above article is interesting in the data forensics context for a few reasons:

  1. The principles of data forensics are stated clearly,
  2. There is a pervasive need for detection methodologies,
  3. We can learn from other disciplines in the fight against cheating,
  4. The distinction between “games” and “real life” is blurring, and
  5. Just as forensics methods are cross-disciplinary, so are cheating methods.

The gamers are modeling their detection software on that of the banking and credit card industries, “by creating a model of how players normally behave during a game.” The software then recognizes a deviation from the norm and flags it. This is the essence of forensics detection.

As an example of “normal test-taking” behavior, consider the histogram in Figure 1.

Figure 1: Histogram of test start times


In the figure above, most tests start between the hours of 7:00 am and 5:00 pm (17:00). However, there are a few tests that begin between the hours of 12 midnight and 2:00 am. This seems very strange and unlike normal test-taking behavior.

The forensics analyst recognizes that cheaters often repeat the same behavior and repeat the same mistakes. For example, in the above data, the distribution of “after hours” testing (i.e., when the test center is normally dark) was not random. Instead there were just a few test sites where this behavior was occurring. As a consequence, those test sites could be detected. Data from one of the sites is shown below in Figure 2.

Figure 2: Anomalous test site with after-hours testing


What is amazing from Figure 2 is that even for this anomalous test site, it is clear that the “after hours” tests were unusual. While I do not know what actually happened, it appears that an individual at the test center allowed late-night access for some test takers. There could have been a legitimate reason for these tests being taken at these times (e.g., special testing sessions were arranged). On the other hand, such strange data could easily be the result of test fraud (e.g., getting test-taking assistance late at night in order to avoid detection by proctors).

In the above example, I have illustrated how a “normal test-taking” model can be built and then used to detect unusual and anomalous data. After detection, the investigator then seeks an explanation. As Arthur Conan Doyle expressed through his detective, Sherlock Holmes, “Once you eliminate the impossible, whatever remains, no matter how improbable, must be the truth.”
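A very simple version of the start-time check can be sketched in Python. The 7:00 to 17:00 window comes from the description of Figure 1; everything else (the function name, the site labels, and the sample timestamps) is hypothetical, and a fuller model would estimate the normal window from the data rather than hard-code it.

```python
from collections import Counter
from datetime import datetime

def after_hours_counts(test_starts, open_hour=7, close_hour=17):
    """Count, per site, tests that started outside [open_hour, close_hour)."""
    counts = Counter()
    for site, started in test_starts:
        if not (open_hour <= started.hour < close_hour):
            counts[site] += 1
    return counts

# Hypothetical records: (site id, test start time).
records = [
    ("site_A", datetime(2007, 11, 5, 9, 30)),
    ("site_A", datetime(2007, 11, 5, 14, 0)),
    ("site_B", datetime(2007, 11, 6, 0, 45)),
    ("site_B", datetime(2007, 11, 7, 1, 15)),
    ("site_B", datetime(2007, 11, 7, 10, 0)),
]
counts = after_hours_counts(records)
print(counts.most_common())
```

Sites with after-hours starts rise to the top of the list, which gives the investigator a short set of places to ask questions.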

Anatomy of the meltdown of a forensic procedure

Friday, December 7th, 2007

The CBS News program “60 Minutes” and the Washington Post aired an investigative report on November 16 criticizing the FBI for failing to notify relevant jurisdictions that hundreds of inmates have been jailed using a flawed forensic methodology. Despite discontinuing the use of “bullet lead” analysis in 2005 because of validity concerns, the FBI had taken no action to inform the courts that some defendants were potentially innocent and wrongfully imprisoned.

Bullet lead analysis was first used in the investigation of the assassination of JFK, and was routinely used in the 1980’s when bullets were so misshapen that ballistic evidence was unobtainable. The essential idea is that trace elements in lead vary naturally and that bullets could be “matched” as coming from the same source (i.e., the same box of bullets) by comparing the compositions of these trace elements. In the 2005 press release, the FBI stated, “One factor significantly influenced the Laboratory’s decision to no longer conduct the examination of bullet lead: neither scientists nor bullet manufacturers are able to definitively attest to the significance of an association made between bullets in the course of a bullet lead examination.”

We naturally ask, “How is it possible that a procedure could be trusted for 40 years, be invoked in 2,500 investigations, be used as testimony in about 500 of those cases, and then be discredited?” The FBI commissioned an independent review of the procedure in 2002 by the National Research Council. Their report is very fascinating to read, is very comprehensive, and was completed in 2004. A copy may be purchased at the following URL: The findings of this report convinced the FBI to discontinue the bullet lead analysis.

After browsing through this report and reading the findings and recommendations, it is clear that the FBI procedure devised in the 1960’s could not withstand public scrutiny. From my perspective, the most troubling aspect of the analysis was that it was (and is) unknown how many compositionally similar bullets were produced and where they were distributed. This means that a probability statement concerning the likelihood of a false positive (i.e., saying the bullets came from the same box when they didn’t) was impossible. Without such a statement the forensic examiner cannot state with any reliability or objectivity that the bullet found at the crime scene came from the same box as bullets found in the possession of the suspect.

The NRC also indicated that the method of computing the statistical match should be revised. From my perspective this is because the FBI’s computational procedure was not based on a statistic. It was computed using statistical ideas, but not supported with statistical distribution theory. This procedure falls into the realm of “ad-hoc analytics.” It seemed good at the time. There wasn’t a better idea. But, there was no way to determine error rates and probabilities associated with the procedure. I have seen a lot of ad-hoc statistical procedures in my day and they nearly always fail eventually because they are based on some statistical idea but they have no statistical theory that supports them. In the long run, the queen of statistics (i.e., natural variability) overwhelms all procedures that do not estimate probability models from empirical data.

I have a good friend who often quoted the maxim, “Models before algorithms.” By this he meant that you should analyze the processes that generate the data and the variability associated with the data before you build detection methodologies. I have tried to follow this rule assiduously in devising detection methodologies for Caveon Data Forensics. Without the guidance of reasonable probability models, statistical interpretations of the data are subjective and indefensible.