

SailPoint used SmartItem technology exclusively for its two certification exams published in 2018, attracted primarily by the security and cost-saving benefits. From the outset, the team learned that creating an exam with SmartItems was more of a design effort than a writing one. To support this effort, item writers were divided into small teams, each including a coder.
During the item development process, they created a significant number of SmartItems. Code, along with other pieces of item content, was reviewed continually and fixed when problems were discovered. Field tests of the items for the two exams went well, and a cutscore was set based on the collected data. Statistical analyses revealed the SmartItems to be high-quality items that ranged in difficulty and other properties. The large majority met the usual psychometric quality control criteria required to serve on the exams.
Since the tests went operational, the SmartItem technology has performed well, showing the variability in item quality typical of certification exams made up of traditional items. Test-level reliability and validity evidence supported the use of SmartItems to produce test scores for high-stakes certification decisions.
A job analysis was conducted by interviewing SailPoint subject matter experts. From these interviews, a list of objectives was created and then weighted for their perceived importance. There were 62 total objectives.
A total of 64 SmartItems were created and vetted for the objectives for the IdentityIQ Engineer exam.² The SmartItems used a mixture of item formats. Figure 1 shows the breakdown of item types used as SmartItems. To avoid the validity issues associated with testwiseness, no traditional MC items were used.
Each SmartItem was developed by a team of two or three SMEs, plus an individual with coding skills. (At the time, Caveon’s GUI-based tool was still in development, so coders were necessary to develop SmartItems.) The coder provided input to the SMEs on whether the design of their item was both functional and easy to code, thereby maximizing efficiency. The coder then implemented the SMEs’ design in Scorpion, Caveon’s exam development and delivery platform.
Figure 1. The breakdown of item types used as SmartItems for Case Study #1.
Once a draft of a SmartItem was complete, it was “previewed”⁴ a sufficient number of times to determine whether it was functioning as designed or needed revision. This review process differed from what is typically done for traditional items because SmartItems appear differently each time they are rendered; it resembled a software quality assurance check more than a conventional item review. SmartItems that failed this quality check were discarded, or revised and previewed again. Those that passed were compiled together for a field test.
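Because each rendering differs, a preview pass looks less like proofreading and more like exercising a function over many inputs. The sketch below is a minimal illustration of that idea; the stem, variables, and render() helper are invented for this example, and the actual SmartItems were built and previewed in Scorpion, not in code like this.

```python
import random

# Hypothetical stand-in for a SmartItem: a stem template plus the
# variables and values it can draw from. The real items lived in
# Scorpion; this sketch only illustrates the preview loop.
SMART_ITEM = {
    "stem": "Which {component} should be configured to support {task}?",
    "variables": {
        "component": ["connector", "workflow", "certification campaign"],
        "task": ["account aggregation", "access review", "provisioning"],
    },
}

def render(item, rng):
    """Produce one rendering by sampling a value for each variable."""
    values = {name: rng.choice(options)
              for name, options in item["variables"].items()}
    return item["stem"].format(**values)

def preview(item, n=50, seed=0):
    """Render the item n times so reviewers can inspect each variant,
    much as a software QA pass exercises many inputs."""
    rng = random.Random(seed)
    return [render(item, rng) for _ in range(n)]

for variant in preview(SMART_ITEM, n=5):
    print(variant)
```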
The field test, often referred to as a “beta test” or an item “pre-test,” was conducted to obtain an empirical measure of SmartItem quality. Were the SmartItems too easy or too difficult? Did they discriminate well among test takers?
Several dozen individuals were recruited to take all 64 SmartItems that made up the field test exam. The participants were recruited to span a range of ability in order to:
Obtain stable estimates of item performance metrics, and
Help determine the pass/fail standard for the final exam.
Based on field test results, 49 out of the 64 SmartItems were deemed to be of sufficient psychometric quality to serve on the actual certification exam. The issues with the remaining SmartItems were typical of those found in all test development projects. Some SmartItems were too difficult or too easy. A few had low correlations with the total test score.
All 15 remaining SmartItems were carefully reviewed, revised, and then included in the certification exam as unscored questions to collect data on the new versions. It was hoped that these repaired SmartItems might eventually perform well enough to be included on the scored portion of the exam.
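For readers who want to see what this screening involves, the sketch below computes the two classical statistics mentioned here, proportion correct (difficulty) and the corrected item-total correlation (discrimination), for a 0/1 scored response matrix. The flagging thresholds are placeholders; the report does not publish the actual criteria used.

```python
import numpy as np

def item_statistics(responses):
    """Classical item analysis for a 0/1 scored matrix
    (rows = examinees, columns = items)."""
    difficulty = responses.mean(axis=0)  # proportion correct per item
    n_items = responses.shape[1]
    discrimination = np.empty(n_items)
    for j in range(n_items):
        rest = responses.sum(axis=1) - responses[:, j]  # total minus item j
        discrimination[j] = np.corrcoef(responses[:, j], rest)[0, 1]
    return difficulty, discrimination

def flag_items(difficulty, discrimination, p_min=0.20, p_max=0.95, r_min=0.15):
    """Flag items that are too easy, too hard, or weakly discriminating.
    The thresholds here are illustrative placeholders only."""
    return [j for j in range(len(difficulty))
            if not (p_min <= difficulty[j] <= p_max) or discrimination[j] < r_min]
```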
Figure 2. The survey question and its validity coefficient for the Field Test of the IdentityIQ Exam. Survey question: “Which choice best describes your experience in the SailPoint IdentityIQ Engineer Role?” Validity coefficient: r = 0.304 (df = 248, p = .00004).
The field test also included survey questions. One question asked the candidates to indicate their proficiency in the SailPoint skills covered by the exam. Their responses were then correlated with their test scores, providing empirical evidence of validity. Figure 2 shows the survey question and its validity coefficient.
The Cronbach’s Alpha reliability coefficient, calculated using only the data from the 49 scored SmartItems, was α = .75.
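Both statistics reported here are straightforward to compute from the scored response matrix. The sketch below shows the standard formulas, assuming dichotomously scored items and a numerically coded survey response; it is an illustration, not the analysis code actually used.

```python
import numpy as np
from scipy import stats

def cronbach_alpha(responses):
    """Cronbach's alpha for a 0/1 scored matrix (rows = examinees,
    columns = items): k/(k-1) * (1 - sum of item variances / total variance)."""
    k = responses.shape[1]
    item_var = responses.var(axis=0, ddof=1).sum()
    total_var = responses.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

def validity_coefficient(total_scores, survey_codes):
    """Pearson correlation between total scores and the numerically
    coded survey responses, as in Figure 2."""
    return stats.pearsonr(total_scores, survey_codes)  # returns (r, p)
```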
The Contrasting Groups method (Zieky, 2001) was used to set the cutscore for this exam, with the field test providing the necessary data. Field test participants were pre-sorted based on judgments by the SailPoint management team and various other supervisors regarding each participant’s capability. Based on these judgments, the field test participants were divided into three groups:
Those who were expected to pass the exam,
Those whose pass/fail status was unclear, and
Those who were not expected to pass the exam.
After the SmartItems were evaluated and the final 49 SmartItems selected, the performance of the individuals was re-scored based on those 49 items. The score distributions for the three groups were plotted and compared. A cutscore that minimized classification errors was set.
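A minimal sketch of that final step follows, assuming the “expected to pass” and “should not pass” groups are compared directly (with the unclear middle group set aside, as is common with the Contrasting Groups method). The error rule and the search over number-correct scores are illustrative.

```python
import numpy as np

def contrasting_groups_cutscore(pass_scores, fail_scores):
    """Return the cutscore that minimizes total classification errors:
    'expected to pass' examinees falling below the cut plus
    'should not pass' examinees falling at or above it."""
    pass_scores = np.asarray(pass_scores)
    fail_scores = np.asarray(fail_scores)
    lo = min(pass_scores.min(), fail_scores.min())
    hi = max(pass_scores.max(), fail_scores.max())
    candidates = np.arange(lo, hi + 1)  # candidate number-correct cutscores
    errors = [np.sum(pass_scores < c) + np.sum(fail_scores >= c)
              for c in candidates]
    return candidates[int(np.argmin(errors))]
```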
Field test participants were told in advance that their score, based on the qualified SmartItems, would be evaluated against an empirically derived cutscore following the field test. It was assumed that their motivation to obtain the certification would produce test-taking behavior similar to that of future candidates taking the operational exam.
One year after the exam had been released and made available to candidates, there still had not been any public disclosures of exam content on the Internet. This was confirmed by extensive web patrolling efforts throughout the life of the exam. This result is very unusual for IT-based certification programs, as test content is usually stolen and disclosed within days.
With over 375 tests administered as of this writing, the SailPoint IdentityIQ Engineer Exam continues to perform well. As in the field test, the items show the typically wide range of variability in item difficulty and item discrimination.
Like the field test, the operational test also included survey questions. One of those questions asked the candidates to indicate their proficiency in the SailPoint skills covered by the exam. Their responses were then correlated with their test scores, providing initial empirical evidence of validity. Figure 3 shows the survey question and its validity coefficient.
The Cronbach’s Alpha reliability coefficient, calculated using the data from the 55 scored SmartItems used on the operational exam, was α = .74.
Figure 3. The survey question and its validity coefficient for the Operational Test of the IdentityIQ Exam. Survey question: “Which choice best describes your experience in the SailPoint IdentityIQ Engineer Role?” Validity coefficient: r = 0.24.
Case Study #2 describes the second exam published by SailPoint, this one titled the “IdentityIQ Architect Exam.” The experience, including the positive outcomes, was similar to the first exam except for one main difference: for this exam, the SmartItem review process was changed to improve the ability to evaluate each SmartItem.
A new “mapping” phase was added to the development process. In this phase, the SMEs created the item variables and values⁵ for a SmartItem and indicated the relationships between them. This helped later reviewers understand how the SmartItem functioned and evaluate whether it was working properly.
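The report does not reproduce an actual map, but the structure can be pictured as a table of variables, their allowable values, and the rules tying combinations of values to the correct answer. The sketch below is entirely hypothetical; the variable names, values, and key rules are invented to show the shape of such a map.

```python
# Hypothetical mapping document for one SmartItem. Everything here is
# invented for illustration: the variables, values, and key rules a
# reviewer would receive alongside the item.
ITEM_MAP = {
    "variables": {
        "source_type": ["Active Directory", "LDAP", "JDBC"],
        "operation": ["aggregation", "provisioning"],
    },
    # Relationships: which combination of values determines the key.
    "key_rules": {
        ("Active Directory", "aggregation"): "run an account aggregation task",
        ("Active Directory", "provisioning"): "define a provisioning policy",
        ("LDAP", "aggregation"): "run an account aggregation task",
        ("LDAP", "provisioning"): "define a provisioning policy",
        ("JDBC", "aggregation"): "run an account aggregation task",
        ("JDBC", "provisioning"): "write a provisioning rule",
    },
}

def key_for(source_type, operation):
    """Let a reviewer look up the intended correct answer for any
    combination of variable values, to verify the item's logic."""
    return ITEM_MAP["key_rules"][(source_type, operation)]
```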
A job analysis was conducted in a manner similar to Case Study #1. A total of 67 objectives were identified.
Based on the objectives of the IdentityIQ Architect Exam, 67 SmartItems were eventually created, using a mixture of item formats. Figure 4 shows the breakdown of item types used as SmartItems. As with the original exam, no multiple-choice (MC) items were used. Each SmartItem was developed by a team of two or three SMEs who designed and crafted it, with a coder assigned to each team to build the items in Scorpion, Caveon’s exam development and delivery platform. As described above, the writing process also included the new “mapping” phase, in which SMEs documented the variables and logic behind each SmartItem so subsequent reviewers could understand its intended function.
Figure 4. The breakdown of item types used as SmartItems for Case Study #2.
As with the first case study, a field test of the 67 items was conducted. Participants whose abilities spanned a range of competency in the content of the IdentityIQ Architect Exam were recruited. As before, the field test not only evaluated item performance empirically but also provided the data to set the cutscore. Based on field test results, 61 of the 67 total SmartItems were deemed to be of sufficient psychometric quality to use on the actual certification exam. The remaining six SmartItems were revised and then included on the certification exam as unscored items in order to collect new data.
Figure 5. The survey questions and validity coefficients for Case Study #2. “Have you previously taken any IdentityIQ Implementer courses offered by SailPoint?” (r = 0.49); “Which choice best describes your experience in the SailPoint IdentityIQ Architect Role?” (r = 0.51).
The operational test also included survey items. Two of those survey items asked the candidates questions related to their proficiency in the content of the exam. Their responses were then correlated with their test scores, providing empirical evidence of validity. Figure 5 shows the survey questions and validity coefficients.
Cronbach’s Alpha reliability coefficient, calculated using only the data from the 61 scored SmartItems, was α = .82.
As with Case Study #1, the Contrasting Groups method was used to set the cutscore for this exam. Field test participants were pre-sorted based on the SailPoint management team’s experience with participants. They were then divided into two groups:
Those who were expected to pass the exam, and
Those who were not expected to pass the exam.
After the SmartItems were evaluated and the 61 final SmartItems were selected, the performance of the individuals was re-scored based on those 61 items. The score distributions for the two groups were plotted and compared. A cutscore was set that minimized classification errors.
What did we learn from the two case studies? What are some of the initial insights from using SmartItem technology on actual high-stakes information technology certification exams? Here are some initial conclusions:
Using SmartItem technology reduces initial item development costs.⁶ It is also likely to reduce long-term maintenance costs, because SmartItems do not need to be replaced.
With the addition of a unique step or two,⁷ SmartItems can be created easily in a typical item development workshop environment; developing them takes no more time than a traditional item-writing workshop.
Multiple-choice (MC) items, whether traditional MC items or MC-based SmartItems, are generally unnecessary. DOMC is a preferred substitute because it performs as well as or better than MC statistically and removes the testwiseness advantage held by some test takers. (A simplified sketch of the DOMC flow appears after this list.)
Not all SmartItems require coding. The SuperDOMC format⁸ was sufficient and appropriate for almost half of the skills on the SailPoint exams.
Once created, SmartItems can be effectively reviewed for accuracy, bias, and other qualities.
SmartItems can use several varieties of item formats: DOMC, SuperDOMC, Build List, Matching, and Short Answer.
DOMC: Some DOMC items employed code to display the item content.
SuperDOMC: No code was used. Instead, a large number of options were produced.
Build List: Examinees selected elements from a large list and dragged them to build a new list, then arranged that list according to the item’s instructions.
Matching: Examinees matched the entries of two lists using drop-down menus.
Short Answer: Examinees typed their answer into the response box provided.
SmartItems performed well statistically in both the field tests and the operational tests.
SmartItems, like traditional items, may be constructed poorly. Weak items are identified through field testing and statistical analysis.
A test completely comprised of SmartItem technology can produce strong evidence of reliability and validity.
SmartItems can be used as part of the process to set cutscores using the Contrasting Groups method.
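As promised above, here is a simplified sketch of the DOMC flow: options are presented one at a time in random order, and the examinee answers yes or no to each. Real DOMC scoring rules have more variations than the single stopping rule shown here; the respond callback stands in for the examinee.

```python
import random

def administer_domc(options, respond, rng=None):
    """Simplified DOMC flow. `options` is a list of (text, is_key) pairs;
    `respond(text)` returns True for a YES answer. Under this simplified
    stopping rule, accepting a key scores correct, while accepting a
    distractor or rejecting a key scores incorrect."""
    rng = rng or random.Random()
    for text, is_key in rng.sample(options, len(options)):
        if respond(text):      # examinee says YES to this option
            return is_key      # correct only if the option was a key
        if is_key:             # examinee said NO to a key
            return False
    return False               # no option was ever accepted

# Example: an examinee who accepts only options mentioning "token"
options = [("Use an OAuth token", True),
           ("Restart the service", False),
           ("Clear the cache", False)]
print(administer_domc(options, lambda text: "token" in text))  # True
```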




