A Comprehensive Bibliography:
Randomly Parallel Tests & SmartItems
This bibliography provides a comprehensive collection of research and studies on Randomly Parallel Tests & SmartItem technology. This resource is designed to support researchers, practitioners, and students seeking a deeper understanding of this approach to testing, offering easy access to seminal papers, key findings, and recent developments.
Whether you are new to the concept or looking to expand your knowledge, this page serves as a curated guide to the most relevant and impactful literature in the field.
Research Bibliography:
Foster, D. (2020). The SmartItem™: Stop Test Fraud, Improve Fairness, and Upgrade the Way You Test. Caveon, LLC.
In this book, Foster introduces the concept of the SmartItem. It is the primary source for the SmartItem concept referenced throughout this bibliography.
Araneda, S., Lee, D., Lewis, J., O’Riordan, M., Sireci, S. G., & Zenisky, A. L. (2023). Incorporating SmartItem Technology on a Multistage-Adaptive Test [White paper]. Submitted for publication.
In this paper, the authors explore the challenges encountered when implementing SmartItems. The team created items at several levels of the Massachusetts Adult Proficiency Test (MAPT). Their main conclusions include the following:
- Flexibility in Item Design: Developing SmartItems requires considering item stems and options flexibly to allow for broad replication. This means creating item stems that work for a range of values and ensuring the options are systematically conceptualized for consistency across multiple variations.
- Complexity in Creating Distractors: Writing appropriate distractors for SmartItems is challenging, similar to traditional multiple-choice items. The complexity arises in ensuring distractors are plausible and do not repeat as correct answers across variations.
- Efficiency in Item Bank Development: SmartItems provide an efficient way to bulk up item banks, especially in contexts where security and item exposure are significant concerns. This efficiency comes from the ability to generate numerous item variations from a single template (a minimal template sketch follows this list).
- Content Quality and Alignment: The quality and alignment of all item variations depend on the quality of the source item template. This requires careful parametrization to ensure each variation accurately measures the intended construct.
- Technical Challenges: There were several technical issues encountered, such as varying certain elements (e.g., images, equations in LaTeX format, rounding issues) and generating images dynamically. These challenges required proactive solutions from the item authoring vendor.
- Adaptation to Different Content Domains: It was easier to write SmartItems for some math content domains than for others. For example, quantitative contexts were more amenable to automated item generation than domains relying heavily on visualizations, like geometry.
- Personalized Assessments: One benefit of SmartItems is the potential for creating more personalized assessments. For example, word problems could include scenarios randomly selected to be relatable to all test-takers, although this posed challenges in ensuring language appropriateness for diverse learners.
- Cultural and Linguistic Simplification: The process of creating SmartItems can benefit from cultural and linguistic simplification, making item frames generalizable for parameterization. This supports culturally sustaining and anti-racist assessment practices by allowing context features to be manipulated to match test-taker characteristics.
- Diagnostic Potential: SmartItems have the potential for improving diagnostic assessment by precisely specifying and manipulating distractors to identify patterns that may signify misconceptions.
- Hybrid Approach: Some content standards may benefit more from parameterization than others. A hybrid approach using both SmartItems and traditional items may be best for covering all necessary content standards effectively.
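As a concrete illustration of the template-based rendering described above, the following minimal Python sketch renders one parameterized stem into many variants, each with a computed key and distractors that never collide with the key. The stem, value ranges, and distractor rules are invented for illustration; they are not drawn from the MAPT items or from Caveon's SmartItem engine.

```python
import random

# Hypothetical template-based rendering (illustrative only): one stem with
# parameter slots is rendered into many variants, each with a computed key
# and distractors that never collide with the key.

STEM = "A bus travels {speed} km/h for {hours} hours. How far does it travel?"

def render_variant(rng):
    speed = rng.choice(range(40, 95, 5))   # constrained ranges keep renderings
    hours = rng.choice([2, 3, 4, 5])       # at a roughly similar difficulty
    key = speed * hours
    # Distractors built from plausible miscalculations; any that equal the key
    # are dropped so the correct answer never repeats among the options.
    candidates = {speed + hours, speed * (hours + 1), speed * hours + 10}
    distractors = sorted(d for d in candidates if d != key)
    options = distractors + [key]
    rng.shuffle(options)
    return {"stem": STEM.format(speed=speed, hours=hours),
            "options": options, "key": key}

if __name__ == "__main__":
    rng = random.Random(42)
    for _ in range(3):
        v = render_variant(rng)
        print(v["stem"], v["options"], "key =", v["key"])
```

Rendering with a fixed seed makes the variant stream reproducible, which is convenient when reviewing a sample of renderings for quality and alignment.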
Foster, D., & Foster, N. (2024). Systems and methods for testing skills capability using technologically-enhanced questions in a computerized environment (U.S. Patent No. 11,961,416). U.S. Patent and Trademark Office.
In this patent, the authors describe, in general terms, how technology-enhanced tests, and specifically SmartItems, can be used as an approach to test development.
Cronbach, L. J., Rajaratnam, N., & Gleser, G. C. (1963). Theory of generalizability: A liberalization of reliability theory. British Journal of Statistical Psychology, 16(2), 137-163.
Abstract:
“Reliability theory” is reinterpreted as a theory regarding the adequacy with which one can generalize from one observation to a universe of observations. If the observation is randomly sampled from the universe (whether or not the universe consists of equivalent observations), the intraclass correlation provides an approximate lower bound to the expected value of the desired coefficient of generalizability.
In “Theory of Generalizability: A Liberalization of Reliability Theory,” Cronbach, Rajaratnam, and Gleser propose a comprehensive framework that extends classical reliability theory to better account for the complexities of psychological and educational measurement. The authors argue that traditional reliability theory, centered on the concept of parallel measures, often falls short in practical applications due to the difficulty in creating truly equivalent measures. To address this, they introduce generalizability theory, which allows for the assessment of measurement reliability across a broader range of conditions and facets.
The key innovation of generalizability theory is its ability to separate different sources of measurement error and to estimate the reliability of scores under various conditions of measurement. This approach involves defining a universe of admissible observations and then assessing the consistency of measurements within this universe. The theory employs analysis of variance (ANOVA) techniques to estimate components of variance attributable to different sources, such as persons, items, and occasions, thus providing a more nuanced understanding of measurement reliability.
The authors provide detailed mathematical formulations and examples to illustrate the application of generalizability theory. They show how this framework can accommodate various experimental designs, including both matched and unmatched conditions, and how it can be used to derive reliability estimates that are less dependent on the stringent assumptions of classical theory.
Overall, the paper offers a significant advancement in the field of psychometrics, providing researchers with a more flexible and robust tool for evaluating the reliability of their measurements across diverse settings and conditions. This work has important implications for the development and validation of psychological and educational assessments, enabling more accurate and generalizable conclusions about test scores. [Annotation created with ChatGPT.]
How does this paper relate to SmartItems?
The paper “Theory of Generalizability: A Liberalization of Reliability Theory” by Cronbach, Rajaratnam, and Gleser extends the concept of randomly parallel tests by introducing generalizability theory, which assesses measurement reliability across diverse conditions. Unlike classical reliability theory that assumes strict equivalence between test forms, generalizability theory uses ANOVA to estimate variance components and intraclass correlations, allowing for the evaluation of reliability in more practical scenarios where test forms are treated as random samples from a broader universe. This approach provides a flexible and comprehensive framework for understanding the dependability of measurements.
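To make the variance-decomposition idea concrete, the sketch below estimates person, item, and residual variance components for the simplest fully crossed persons-by-items design and computes a generalizability coefficient for a randomly parallel form of n′ items. The data are simulated and the design is the one-facet case only; this is an illustration of the approach, not an analysis from the paper.

```python
import numpy as np

# One-facet generalizability sketch: persons crossed with items, one score per
# cell. Variance components are estimated from ANOVA expected mean squares on
# simulated data (the true effects below are arbitrary choices).

rng = np.random.default_rng(0)
n_p, n_i = 200, 20
person = rng.normal(0.0, 1.0, size=(n_p, 1))                    # person effects
item = rng.normal(0.0, 0.5, size=(1, n_i))                      # item effects
scores = person + item + rng.normal(0.0, 0.8, size=(n_p, n_i))  # interaction/error

grand = scores.mean()
p_means = scores.mean(axis=1, keepdims=True)
i_means = scores.mean(axis=0, keepdims=True)

ms_p = n_i * ((p_means - grand) ** 2).sum() / (n_p - 1)
ms_i = n_p * ((i_means - grand) ** 2).sum() / (n_i - 1)
ms_res = ((scores - p_means - i_means + grand) ** 2).sum() / ((n_p - 1) * (n_i - 1))

var_p = (ms_p - ms_res) / n_i    # universe-score (person) variance
var_i = (ms_i - ms_res) / n_p    # item variance
var_res = ms_res                 # person-by-item interaction plus error

# Generalizability coefficient for a randomly parallel form of n_prime items.
n_prime = 20
g_coef = var_p / (var_p + var_res / n_prime)
print(round(var_p, 3), round(var_i, 3), round(var_res, 3), round(g_coef, 3))
```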
Hively, W., Patterson, H. L., & Page, S. H. (1968). A “universe-defined” system of arithmetic achievement tests. Journal of Educational Measurement, 5(4), 275-290.
In this paper, the authors present results from applying G-theory to items defined as universes over specific content domains. This relates to SmartItems as an early attempt at the kind of domain-referenced testing envisioned for SmartItems.
Abstract:
The paper “A ‘Universe-Defined’ System of Arithmetic Achievement Tests” by Wells Hively II, Harry L. Patterson, and Sara H. Page, presents a novel approach to arithmetic achievement testing, integrating the principles of educational behaviorism and generalizability theory. The authors propose a system where achievement tests are developed based on well-defined behavioral classes, allowing for precise measurement and diagnosis of arithmetic skills. The methodology involves generating a universe of item forms representing different arithmetic tasks and employing random sampling to create test forms. This system is tested within the context of the Federal Job Corps Program’s basic education curriculum, focusing on the arithmetic skills of young men. Results demonstrate the reliability and generalizability of the tests, showing consistent variance patterns across different test families. The findings suggest that while item forms provide valuable stratification, further refinement is needed to ensure homogeneity within item forms. The study underscores the potential of universe-defined tests to enhance diagnostic precision and instructional effectiveness in educational settings.
Snowballing:
Cronbach, L. J., Rajaratnam, N., & Gleser, G. C. (1963). Theory of generalizability: A liberalization of reliability theory. British Journal of Statistical Psychology, 16(2), 137-163.
Millman, J. (1977). Creating Domain-Referenced Tests by Computer.
Abstract:
A unique system is described for creating tests by computer. It is unique because, instead of storing items in the computer, item algorithms similar to Hively’s notion of item forms are banked. Every item, and thus every test, represents a sample from domains consisting of thousands of items. The paper contains a discussion of the special practical applications of such tests, a description of the easy-to-learn user language in which item algorithms are written, and the results of using the tests in a college course taught by the mastery learning instructional strategy.
Embretson, S. E. (1984). A general multicomponent latent trait model for response processes. Psychometrika, 49, 175-186.
Abstract:
The purpose of the current paper is to propose a general multicomponent latent trait model (GLTM) for response processes. The proposed model combines the linear logistic latent trait (LLTM) with the multicomponent latent trait model (MLTM). As with both LLTM and MLTM, the general multicomponent latent trait model can be used to (1) test hypotheses about the theoretical variables that underlie response difficulty and (2) estimate parameters that describe test items by basic substantive properties. However, GLTM contains both component outcomes and complexity factors in a single model and may be applied to data that neither LLTM nor MLTM can handle. Joint maximum likelihood estimators are presented for the parameters of GLTM and an application to cognitive test items is described.
Intentional Variations: For items with intentional variations designed to test different aspects of a skill or knowledge area, GLTM can model the impact of these variations on item difficulty and test-taker performance, ensuring a more comprehensive and accurate assessment.
In summary, the GLTM appears to be capable of scoring tests that utilize smart items, item templates, or items with intentional variations, thanks to its comprehensive approach to modeling both complexity factors and component outcomes.
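For reference, a simplified statement of the GLTM's core structure (omitting the strategy and guessing parameters the full model inherits from the MLTM):

\[
P(X_{ijT}=1 \mid \boldsymbol{\theta}_j) \;=\; \prod_{k} \frac{\exp(\theta_{jk} - b_{ik})}{1 + \exp(\theta_{jk} - b_{ik})},
\qquad
b_{ik} \;=\; \sum_{m} \eta_{km}\, q_{ikm},
\]

where \(\theta_{jk}\) is examinee j's ability on processing component k, \(q_{ikm}\) is item i's score on complexity factor m for component k, and \(\eta_{km}\) is that factor's estimated weight. The component difficulties \(b_{ik}\) are the quantities that intentional variations in an item template would be expected to shift.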
Embretson, S.E. (1985). Test design: Developments in psychology and psychometrics. New York: Academic Press.
Test Design: Developments in Psychology and Psychometrics is a collection of papers that deals with the diverse developments contributing to the psychometrics of test design. Part I is a review of test design including practices being used in test development. Part II deals with design variables from a psychological theory that includes implications of verbal comprehension theories in the role of intelligence and the effects of these implications on goals, design, scoring, and validation of tests. Part III discusses the latent trait models for test design that have numerous advantages in problems involving item banking, test equating, and computerized adaptive testing. One paper explains the use of the linear exponential model for psychometric models in speed test construction. The book discusses the traditional psychometric approach; the Hunt, Frost, and Lunneborg theory; and the single-latency distribution model. Part IV examines test designs from the perspective of test developments in the future, integrating technology, cognitive science, and psychometric theories. Psychologists, psychometricians, educators, and researchers in the field of human development studies will value this book.
Hornke, L. F., & Habon, M. W. (1986). Rule-based item bank construction and evaluation within the linear logistic framework. Applied Psychological Measurement, 10, 369-380.
Abstract:
In cognition research, item writing rules are considered a necessary prerequisite of item banking. A set of 636 items was constructed using prespecified cognitive operations. An evaluation of test data from some 7,400 examinees revealed 446 homogeneous items. Some items had to be discarded because of printing flaws, and others because of operation complexity or other well-describable reasons. However, cognitive operations explained item difficulty parameters quite well; further cross-validation research may contribute to an item writing approach which attempts to bring psychological theory and psychometric models closer together. This will eventually free item construction from item writer idiosyncrasies.
Quote:
“The present undertaking attempted to define a bank of items by means of construction rules based on substantive cognitive theory. Further development of this approach might produce items which need not be tried out empirically prior to administration. Their cognitive operation composition would suffice to moderate their difficulty and interpretability. In the area of individual personnel evaluation, the major benefit would be that items could be assembled solely according to personnel decision criteria. The achievement of this goal would require intensive basic cognition research and item analysis as well as cross-validation of the results.”
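The linear logistic framework referenced here decomposes Rasch item difficulty into contributions from prespecified cognitive operations, roughly b_i = Σ_k w_ik η_k. The short sketch below fits that decomposition by least squares on simulated data; the operation counts and weights are invented for illustration, not taken from the 636-item bank.

```python
import numpy as np

# Illustrative decomposition of item difficulty into cognitive-operation
# weights, in the spirit of the linear logistic test model. Operation counts
# and weights are simulated, not taken from the study's item bank.

rng = np.random.default_rng(1)
n_items, n_ops = 120, 4
W = rng.integers(0, 3, size=(n_items, n_ops)).astype(float)  # operation counts per item
eta_true = np.array([0.6, -0.4, 1.1, 0.2])                   # "true" operation weights
b = W @ eta_true + rng.normal(0.0, 0.3, n_items)             # calibrated difficulties

eta_hat, *_ = np.linalg.lstsq(W, b, rcond=None)              # estimated weights
r2 = 1 - ((b - W @ eta_hat) ** 2).sum() / ((b - b.mean()) ** 2).sum()
print(np.round(eta_hat, 2), round(r2, 2))                    # weights and variance explained
```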
LaDuca, A., Staples, W. I., Templeton, B., & Holzman, G. B. (1986). Item modeling procedure for constructing content-equivalent multiple-choice questions. Medical Education, 20(1), 53-56.
Recent research on multiple-choice questions has identified deficiencies of inadequate content‐equivalence and item‐writer bias. Systematic methods of writing multiple-choice questions are being advocated as effective responses. This article describes the preliminary development of a new item‐writing method. Details of the procedure, called item modeling, are provided.
Haladyna, T., & Shindoll, R. (1989). Item shells: A method for writing effective multiple-choice test items. Evaluation and the Health Professions, 12, 97-106.
Abstract:
Writing multiple-choice test items has been typically characterized as more of an art than a science. Textbooks commonly offer advice on how to write items, but most inexperienced item writers, despite having expertise in a content area, have difficulty phrasing the stem. A technique is described that has been successfully used in several testing programs in the health professions. This technique, the item shell, provides a basis for getting item writers started in the difficult process of writing an effective multiple-choice item.
Mislevy, R. J., Sheehan, K. M., & Wingersky, M. (1993). How to equate tests with little or no data. Journal of Educational Measurement, 30, 55-76.
This paper presents a technique for dealing with uncertainty in item parameters, which bears on how IRT might be used with SmartItems. It appears to include a predicted information function.
Standard procedures for equating tests, including those based on item response theory (IRT), require item responses from large numbers of examinees. Such data may not be forthcoming for reasons theoretical, political, or practical. Information about items’ operating characteristics may be available from other sources, however, such as content and format specifications, expert opinion, or psychological theories about the skills and strategies required to solve them. This article shows how, in the IRT framework, collateral information about items can be exploited to augment or even replace examinee responses when linking or equating new tests to established scales. The procedures are illustrated with data from the Pre-Professional Skills Test.
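One way to picture the use of collateral information is to treat theory- or expert-based item parameters as priors and average the item information over that uncertainty rather than plugging in point estimates. The sketch below does this for a single 2PL item; the prior means and standard deviations are invented, and the procedure is a generic stand-in for, not a reproduction of, the authors' method.

```python
import numpy as np

# Generic illustration of exploiting collateral information: instead of point
# estimates, item parameters carry prior uncertainty (means and sds invented
# here), and the 2PL item information is averaged over draws from that prior.

rng = np.random.default_rng(7)

def expected_info(theta, a_mean, a_sd, b_mean, b_sd, n_draws=2000):
    a = np.abs(rng.normal(a_mean, a_sd, n_draws))  # discrimination kept positive
    b = rng.normal(b_mean, b_sd, n_draws)
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return float(np.mean(a ** 2 * p * (1.0 - p)))  # averaged Fisher information

for theta in np.linspace(-3, 3, 7):
    print(round(float(theta), 1), round(expected_info(theta, 1.0, 0.2, 0.5, 0.4), 3))
```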
Shye, S., Elizur, D., & Hoffman, M. (1994). Introduction to facet theory. Thousand Oaks, CA: Sage Publications.
Abstract:
Through the use of detailed examples, Shye introduces readers to the use of facet theory as a method for integrating content design with data analysis. He shows how facet theory provides a strategy for conceptualizing a study, for formulating the study’s variables in terms of its purposes, for systematic sampling of the variables, and for formulating hypotheses. The book is organized into 2 parts: Part [I] introduces the reader to mapping with a specific emphasis on mapping sentences, and Part [II] explores procedures for processing multivariate data. The book concludes with a discussion on the nature of scientific inquiry and the difference between a research question and observational questions.
Bejar, I. I. (1996). Generative response modeling: Leveraging the computer as a test delivery medium (ETS RR-96-13). Princeton, NJ: ETS.
Abstract:
Generative response modeling is an approach to test development and response modeling that calls for the creation of items in such a way that the parameters of the items on some response model can be anticipated through knowledge of the psychological processes and knowledge required to respond to the item. That is, the computer would not merely retrieve an item from a database, as is the case in adaptive testing, but would compose it, or assist in doing so, according to desired specifications. This approach to assessment has implications for both the economics and validity of computer-administered tests. To illustrate the concept, a system for measuring writing skills will be outlined where the examinee is expected to rewrite sentences, rather than just recognize errors in a sentence, using a multiple-choice format. The possibility of estimating the psychometric parameters of items based on a psychological analysis of the response process will then be examined and shown to be feasible. Such estimates are less precise than estimates based on large samples of test takers. A Monte Carlo study is presented to investigate the possibility of compensating for that imprecision when estimating ability or proficiency. The paper concludes that a generative approach is feasible and can be a mechanism for taking advantage of the considerable investment required for computer-based testing.
Embretson, S. E. (1998). A cognitive design system approach to generating valid tests: Application to abstract reasoning. Psychological Methods, 3, 380-396.
In this paper, the author proposes a principled procedure for creating what at Caveon we would call a SmartItem. The procedure is based on a cognitive model, and the paper studies the variation in item parameters across renderings generated from the created template.
Abstract:
The actual impact of cognitive theory on testing contrasts sharply with its potential impact, which suggests some deep incompatibilities between the areas. This article describes and illustrates a cognitive design system approach that centralizes cognitive theory in developing valid tests. To resolve incompatibilities between cognitive psychology and testing, the cognitive design system approach includes both conceptual and procedural frameworks. To illustrate the cognitive design approach, an item bank for measuring abstract reasoning was generated from cognitive theory (i.e., P. A. Carpenter, M. A. Just, & P. Shell’s, 1990, processing theory). The construct validity of the generated item bank was strongly supported by several studies from the cognitive design system approach. (PsycINFO Database Record (c) 2016 APA, all rights reserved)
Quote:
“However, especially intriguing is the possibility of constructing items on-line for the first time as the person takes the test. That is, the test will be a set of generating principles, with known relationships to item performance, rather than an existing item bank with calibrated characteristics. This possibility obviously requires an extremely strong cognitive model. Although the ART model in the current studies is strong, still greater accuracy is required. However, perhaps by adding some perceptual features to the cognitive model, the required level of prediction may be obtainable. In any case, future research should explore this potential more fully for both its theoretical and practical advantages.”
Embretson, S. E. (1999). Generating items during testing: Psychometric issues and models. Psychometrika, 64, 407-433.
In this paper, the author explores the psychometric challenges of using item templates and families of items that may or may not be “clones.”
Abstract:
On-line item generation is becoming increasingly feasible for many cognitive tests. Item generation seemingly conflicts with the well-established principle of measuring persons from items with known psychometric properties. This paper examines psychometric principles and models required for measurement from on-line item generation. Three psychometric issues are elaborated for item generation. First, design principles to generate items are considered. A cognitive design system approach is elaborated and then illustrated with an application to a test of abstract reasoning. Second, psychometric models for calibrating generating principles, rather than specific items, are required. Existing item response theory (IRT) models are reviewed and a new IRT model that includes the impact on item discrimination, as well as difficulty, is developed. Third, the impact of item parameter uncertainty on person estimates is considered. Results from both fixed content and adaptive testing are presented.
Quote:
“In the face of such rapid change, it seems impossible to foresee very far into the next century but the results from the current studies support on-line item generation as the next development. I will venture two predictions. First, on-line adaptive item generation for non-verbal items will be readily available by 2005. Second, on-line adaptive item generation for verbal items, a much harder task due to the nature of language, will be partially available by 2010. It will be interesting to revisit these predictions in ten years.”
Embretson, S., & Gorin, J. (2001). Improving construct validity with cognitive psychology principles. Journal of Educational Measurement, 38(4), 343-368.
Abstract:
Cognitive psychology principles have been heralded as possibly central to construct validity. In this paper, testing practices are examined in three stages: (a) the past, in which the traditional testing research paradigm left little role for cognitive psychology principles, (b) the present, in which testing research is enhanced by cognitive psychology principles, and (c) the future, for which we predict that cognitive psychology’s potential will be fully realized through item design. An extended example of item design by cognitive theory is given to illustrate the principles. A spatial ability test that consists of an object assembly task highlights how cognitive design principles can lead to item generation.
Bejar, I. I., Lawless, R. R., Morley, M. E., Wagner, M. E., Bennett, R. E., & Revuelta, J. (2002). A feasibility study of on‐the‐fly item generation in adaptive testing. ETS Research Report Series, 2002(2), i-44.
Abstract:
The goal of this study was to assess the feasibility of an approach to adaptive testing based on item models. A simulation study was designed to explore the effects of item modeling on score precision and bias, and two experimental tests were administered — an experimental, on-the-fly, adaptive quantitative-reasoning test as well as a linear test. Results of the simulation study showed that under different levels of isomorphicity, there was no bias, but precision of measurement was eroded, especially in the middle range of the true-score scale. However, the comparison of adaptive test scores with operational Graduate Record Examinations (GRE) test scores matched the test-retest correlation observed under operational conditions. Analyses of item functioning on linear forms suggested a high level of isomorphicity across items within models. The current study provides a promising first step toward significant cost and theoretical improvement in test creation methodology for educational assessment.
Singley, M. K., & Bennett, R. E. (2002). Item generation and beyond: Applications of schema theory to mathematics assessment. In S. H. Irvine & P. C. Kyllonen (Eds.), Item generation for test development (pp. 361-384). Mahwah, NJ: Erlbaum.
Abstract:
In this chapter, we attempt to show how schema theory can be applied to the automatic generation of items, the scoring of more complex test responses, and to intelligent tutoring. When attempting to develop algorithms for automatic item generation in mathematics, one necessarily learns a lot about the underlying structure of the problems. This knowledge of problem structure is indispensable to making progress in item generation, but it may also be useful for other applications. In particular, it may be useful for either assigning partial credit scores or offering remediation in instructional settings. We report here on a particular theory of problem structure, schema theory. Schema theory is applied to the automatic generation and variation of items, the analysis of multiple-line solutions, and the delivery of instruction.
Williamson, D. M., Johnson, M. S., Sinharay, S., & Bejar, I. I. (2002). Applying Hierarchical Model Calibration to Automatically Generated Items.
Abstract:
This study explored the application of hierarchical model calibration to reduce or eliminate the need for pretesting automatically generated items from a common item model before their operational use. The ultimate goal is to develop automatic item generation (AIG) systems capable of producing items with highly similar statistical properties, which may enable the implementation of adaptive on-the-fly testing. The study applied the related siblings model to mathematics item data from an experimental administration associated with a national testing program, aiming to calibrate operational data incorporating multiple items generated from both AIG and manual item generation. The sample consisted of 3,793 examinees in grade 8, distributed among four test forms. Results suggest that including AIG-generated items in item families tends to result in item characteristic curves that are somewhat more variable than those for item families consisting of the same item under repeated administration. However, this increased variability is neither assured nor particularly pronounced in most cases. (Contains 2 figures, 1 table, and 13 references.)
Williamson, D. M., Johnson, M. S., Sinharay, S., & Bejar, I. I. (2002). Hierarchical IRT Examination of Isomorphic Equivalence of Complex Constructed Response Tasks.
Abstract:
This paper explores the application of a technique for hierarchical item response theory (IRT) calibration of complex constructed response tasks, which holds promise both as a calibration tool and as a means of evaluating the isomorphic equivalence of these tasks. Isomorphic tasks are explicitly and rigorously designed to be highly similar in domain-relevant characteristics and evaluation standards. A related task model was used in which each item was modeled with a separate item response function, but the isomorphic tasks were related through a hierarchical model. The model was implemented in software that conducted Bayesian Markov Chain Monte Carlo (MCMC) estimation to estimate the joint posterior of all model parameters by integrating over the posterior distribution of model parameters given the data. The study analyzed operational data from a high-stakes assessment consisting of a number of complex constructed response tasks. The MCMC estimation procedure was conducted through 100,000 iterations. The item characteristic curves (ICCs) for the six isomorphic families were determined. In general, the families of isomorphic tasks showed considerable similarity in the item response functions for their respective members as well as for the family response function for the isomorphic set. Results suggest that efforts to construct complex constructed response tasks that are isomorphic equivalent tasks can range somewhat in their degree of success, with some being consistently equivalent, some being more variable, and others being largely consistent but with notable deviations.
Sinharay, S., Johnson, M. S., & Williamson, D. M. (2003). An application of a Bayesian hierarchical model for item family calibration. ETS Research Report Series, 2003(1), i-41.
Abstract:
Item families, which are groups of items related to each other in some way, are increasingly used in complex educational assessments. For example, in automatic item generation (AIG) systems, a test may consist of multiple items generated from a number of item models. Item calibration or scoring for such an assessment requires fitting models that can take into account the dependence structure inherent among the items that belong to the same item family. Glas and van der Linden (2001) suggest a Bayesian hierarchical model to analyze data involving item families. We fit that hierarchical model using the Markov chain Monte Carlo (MCMC) algorithm. Formulating the MCMC algorithm provides little additional difficulty compared to fitting a simple item response theory model, even with the additional complexity in the form of hierarchy in the model. We show that the model can take into account the dependence structure inherent among the items and hence is an improvement over the models currently used in similar situations. We introduce the notion of the family expected response function (FERF) as a way to summarize the probability of a correct response to an item randomly generated from an item family and suggest a way to estimate the FERFs. Our work is a step towards creating a tool that can save a significant amount of resources for tests with item families, because calibrating only the item families might be enough rather than calibrating each item in the families separately.
Sinharay, S., Johnson, M. S., & Williamson, D. M. (2003). Calibrating item families and summarizing the results using family expected response functions. Journal of Educational and Behavioral Statistics, 28(4), 295-313.
Item families, which are groups of related items, are becoming increasingly popular in complex educational assessments. For example, in automatic item generation (AIG) systems, a test may consist of multiple items generated from each of a number of item models. Item calibration or scoring for such an assessment requires fitting models that can take into account the dependence structure inherent among the items that belong to the same item family. Glas and van der Linden (2001) suggest a Bayesian hierarchical model to analyze data involving item families with multiple-choice items. We fit the model using the Markov Chain Monte Carlo (MCMC) algorithm, introduce the family expected response function (FERF) as a way to summarize the probability of a correct response to an item randomly generated from an item family, and suggest a way to estimate the FERFs. This work is thus a step towards creating a tool that can save significant amount of resources in educational testing, by allowing proper analysis and summarization of data from tests involving item families.
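The FERF idea can be sketched directly: draw sibling item parameters from the family-level distribution, compute each sibling's item characteristic curve, and average across draws. The hyperparameters below are invented placeholders, not estimates from these studies.

```python
import numpy as np

# Family expected response function (FERF) sketch: sample sibling 2PL
# parameters from a family-level distribution (hyperparameters invented),
# compute each sibling's ICC, and average over siblings.

rng = np.random.default_rng(3)
theta = np.linspace(-3, 3, 61)

a_mu, a_sd = 1.1, 0.15   # family-level discrimination distribution
b_mu, b_sd = 0.2, 0.30   # family-level difficulty distribution

draws = 5000
a = np.abs(rng.normal(a_mu, a_sd, size=(draws, 1)))
b = rng.normal(b_mu, b_sd, size=(draws, 1))
icc = 1.0 / (1.0 + np.exp(-a * (theta[None, :] - b)))  # one ICC per sampled sibling

ferf = icc.mean(axis=0)                # FERF evaluated on the theta grid
print(np.round(ferf[::10], 3))
```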
Rizavi, S., Way, W. D., Davey, T., & Herbert, E. (2004). Tolerable Variation in Item Parameter Estimates for Linear and Adaptive Computer-Based Testing. Research Report No. 04-28. Educational Testing Service.
Abstract:
Item parameter estimates vary for a variety of reasons, including estimation error, characteristics of the examinee samples, and context effects (e.g., item location effects, section location effects, etc.). Although we expect variation based on theory, there is reason to believe that observed variation in item parameter estimates exceeds what theory would predict. This study examined both items that were administered linearly in a fixed order each time that they were used and items that had appeared in different adaptive testing item pools. The study looked at both the magnitude of variation in the item parameter estimates and the impact of this variation on the estimation of test-taker scores. The results showed that the linearly administered items exhibited remarkably small variation in parameter estimates over repeated calibrations. Similar findings with adaptively administered items in another high stakes testing program were also found when initial adaptively based item parameter estimates were compared with estimates from repeated use. The results of this study also indicated that context effects played a more significant role in adaptive item parameters when the comparisons were made to the parameters that were initially obtained from linear paper-and-pencil testing.
Sinharay, S., & Johnson, M. (2005). Analysis of data from an admissions test with item models. ETS Research Report Series, 2005(1), i-32.
Abstract:
Item models (LaDuca, Staples, Templeton, & Holzman, 1986) are classes from which it is possible to generate items that are equivalent or isomorphic to other items from the same model (e.g., Bejar, 1996; Bejar, 2002). They have the potential to produce large numbers of high-quality items at reduced cost. This paper introduces data from the first known application of items automatically generated from item models in a large-scale assessment and addresses several research questions associated with the data. We begin by reviewing calibration techniques for the analysis of data involving item models; one method assumes that the items are isomorphic, while the other treats items generated from the same item model as distinct but related. A major question for this type of data is whether these items are isomorphic, that is, if they behave the same psychometrically. This paper describes several rough diagnostic measures and a rigorous statistical diagnostic to assess the extent of isomorphicity in the items generated from an item model. Finally, this paper discusses the issue of scoring, an area that needs more research, with data involving item models.
Deane, P., Graf, E. A., Higgins, D., Futagi, Y., & Lawless, R. (2006). Model analysis and model creation: Capturing the task‐model structure of quantitative item domains. ETS Research Report Series, 2006(1), i-63.
This study focuses on the relationship between item modeling and evidence-centered design (ECD) and considers how an appropriately generalized item modeling software tool can support the systematic identification and exploitation of task-model variables. It then examines the feasibility of this goal using linear-equation items as a test case. The first half of the study examines task-model structures for linear equations and their relevance to item difficulty within ECD. The second half of the study presents prototype software, a Model Creator system for pure math items, designed to partially automate the creation of variant item models reflecting different combinations of task-model variables. The prototype is applied to linear equations but is designed to generalize over a range of pure mathematical content types.
Solano-Flores, G. (2008). Who is given tests in what language by whom, when, and where? The need for probabilistic views of language in the testing of English language learners. Educational Researcher, 37(4), 189-199.
Abstract:
The testing of English language learners (ELLs) is, to a large extent, a random process because of poor implementation and factors that are uncertain or beyond control. Yet current testing practices and policies appear to be based on deterministic views of language and linguistic groups and erroneous assumptions about the capacity of assessment systems to serve ELLs. The question Who is given tests in what language by whom, when, and where? provides a conceptual framework for examining testing as a communication process between assessment systems and ELLs. Probabilistic approaches based on generalizability theory—a psychometric theory of measurement error— allow examination of the extent to which assessment systems’ inability to effectively communicate with ELLs affects the dependability of academic achievement measures.
Sinharay, S., & Johnson, M. S. (2008). Use of item models in a large-scale admissions test: A case study. International Journal of Testing, 8(3), 209-236.
Item models (LaDuca, Staples, Templeton, & Holzman, 1986) are classes from which it is possible to generate items that are equivalent/isomorphic to other items from the same model (e.g., Bejar, 1996, 2002). They have the potential to produce large numbers of high-quality items at reduced cost. This article introduces data from an application of item models for the generation of items for a large-scale assessment and investigates several research questions associated with the data. We begin by reviewing calibration techniques for the analysis of data involving item models; one method assumes that the items are isomorphic, while the other treats items generated from the same item model as distinct but related. A major question for these types of data is whether these items are isomorphic; that is, if they behave the same psychometrically. This article describes a number of rough diagnostic measures and a statistical diagnostic to assess the extent of isomorphicity in the items generated from an item model. Finally, this article discusses the issue of scoring—an area that needs more research—with data involving item models.
De Boeck, P. (2008). Random item IRT models. Psychometrika, 73, 533-559.
Abstract:
It is common practice in IRT to consider items as fixed and persons as random. Both continuous and categorical person parameters are most often random variables, whereas for items only continuous parameters are used and they are commonly of the fixed type, although exceptions occur. It is shown in the present article that random item parameters make sense theoretically, and that in practice the random item approach is promising to handle several issues, such as the measurement of persons, the explanation of item difficulties, and troubleshooting with respect to DIF. In correspondence with these issues, three parts are included. All three rely on the Rasch model as the simplest model to study, and the same data set is used for all applications. First, it is shown that the Rasch model with fixed persons and random items is an interesting measurement model, both in theory and for its goodness of fit. Second, the linear logistic test model with an error term is introduced so that the explanation of the item difficulties based on the item properties does not need to be perfect. Finally, two more models are presented: the random item profile model (RIP) and the random item mixture model (RIM). In the RIP, DIF is not considered a discrete phenomenon, and when a robust regression approach based on the RIP difficulties is applied, quite good DIF identification results are obtained. In the RIM, no prior anchor sets are defined, but instead, a latent DIF class of items is used, so that posterior anchoring is realized (anchoring based on the item mixture). It is shown that both approaches are promising for the identification of DIF.
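In its simplest form, the random-item Rasch model discussed here treats both persons and items as random draws:

\[
\operatorname{logit}\, P(X_{pi}=1) = \theta_p - \beta_i,
\qquad \theta_p \sim N(0, \sigma_\theta^2),
\qquad \beta_i \sim N(\mu_\beta, \sigma_\beta^2),
\]

which maps naturally onto SmartItem renderings: each rendering can be viewed as an item sampled from the distribution induced by its template.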
Wendt, A., Kao, S., Gorham, J., & Woo, A. (2009). Developing item variants: An empirical study. In D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing.
Abstract:
Large-scale standardized tests have been widely used for educational and licensure testing. In computerized adaptive testing (CAT), one of the practical concerns for maintaining large-scale assessments is to ensure adequate numbers of high-quality items required for item pool functioning. Developing items at specific difficulty levels and for certain areas of test plans is a well-known challenge. The purpose of this study was to investigate strategies for varying items that can effectively generate items at targeted difficulty levels and specific test plan areas. Each variant item generation model was developed by decomposing selected source items possessing ideal measurement properties and targeting the desirable content domains. A total of 341 variant items were generated from 72 source items. Data were collected from six pretest periods, and items were calibrated using the Rasch model. Initial results indicate that variant items showed desirable measurement properties. Additionally, compared to an average of approximately 60% of the items passing pretest criteria, an average of 84% of the variant items passed the pretest criteria.
Embretson, S. E. (2010). Measuring psychological constructs with model-based approaches: An introduction.
Abstract:
The purpose of this volume, in part, is to present a broad spectrum of model-based measurement approaches that remove some of the constraints. Typical test development practices under both CTT and IRT require several assumptions that do not necessarily interface well with psychological constructs as conceptualized theoretically. That is, the test developer must assume that (a) the same construct can characterize responses of all persons, (b) items have identical psychometric properties when administered to different persons, (c) items are fixed entities with known stimulus content, (d) items are calibrated prior to test scoring, (e) item response probabilities are monotonically related to the trait to be measured, and (f) internal consistency between items on a test indicates adequate assessment of a trait. (PsycInfo Database Record (c) 2023 APA, all rights reserved)
Geerlings, H., Glas, C. A., & Van Der Linden, W. J. (2011). Modeling rule-based item generation. Psychometrika, 76, 337-359.
Abstract:
An application of a hierarchical IRT model for items in families generated through the application of different combinations of design rules is discussed. Within the families, the items are assumed to differ only in surface features. The parameters of the model are estimated in a Bayesian framework, using a data-augmented Gibbs sampler. An obvious application of the model is computerized algorithmic item generation. Such algorithms have the potential to increase the cost-effectiveness of item generation as well as the flexibility of item administration. The model is applied to data from a non-verbal intelligence test created using design rules. In addition, results from a simulation study conducted to evaluate parameter recovery are presented.
Sinharay, S., & Johnson, M. S. (2012). Statistical modeling of automatically generated items. In Automatic item generation (pp. 183-195). Routledge.
A large pool of high-quality items is important for the efficient operation of any large-scale testing program, especially those with flexible administration times, to address concerns regarding item exposure and potential disclosure. In an attempt to produce high-quality items at reduced expense, there is an increasing interest in generating items automatically. [summary from google scholar]
Gierl, M. J., & Lai, H. (2012). The role of item models in automatic item generation. International Journal of Testing, 12(3), 273-298.
Abstract:
Automatic item generation represents a relatively new but rapidly evolving research area where cognitive and psychometric theories are used to produce tests that include items generated using computer technology. Automatic item generation requires two steps. First, test development specialists create item models, which are comparable to templates or prototypes, that highlight the features or elements in the assessment task that must be manipulated. Second, these item model elements are manipulated to generate new items with the aid of computer-based algorithms. With this two-step process, hundreds or even thousands of new items can be created from a single item model. The purpose of our article is to describe seven different but related topics that are central to the development and use of item models for automatic item generation. We start by defining item model and highlighting some related concepts; we describe how item models are developed; we present an item model taxonomy; we illustrate how item models can be used for automatic item generation; we outline some benefits of using item models; we introduce the idea of an item model bank; and finally, we demonstrate how statistical procedures can be used to estimate the parameters of the generated items without the need for extensive field or pilot testing.
Gierl, M. J., Lai, H., & Breithaupt, K. (2012, March). Methods for creating and evaluating the item model structure used in automatic item generation. In annual meeting of the National Council on Measurement in Education, Vancouver, BC, Canada.
Abstract:
This study presents an application of a hierarchical IRT model for items in families generated through the application of different combinations of design rules, where the items within the families are assumed to differ only in surface features. The parameters of this model are estimated in a Bayesian framework using a data-augmented Gibbs sampler, with a particular emphasis on computerized algorithmic item generation. The hierarchical model offers the potential to enhance both the cost-effectiveness and flexibility of item generation and administration. Data from a non-verbal intelligence test created using design rules were used to illustrate the application of this model, alongside results from a simulation study conducted to evaluate parameter recovery. This approach underscores the theoretical and practical viability of random item parameters for addressing issues such as the measurement of persons, the explanation of item difficulties, and trouble-shooting differential item functioning (DIF), highlighting the Rasch model’s fit and the introduction of the linear logistic test model with an error term, as well as the random item profile model (RIP) and the random item mixture model (RIM). These models demonstrate promising results for DIF identification through robust regression approaches and posterior anchoring without predefined anchor sets. This study thus contributes valuable insights into enhancing item generation methodologies for large-scale standardized assessments. [Created using ChatGPT]
Attali, Y. (2018). Automatic item generation unleashed: An evaluation of a large-scale deployment of item models. In Artificial Intelligence in Education: 19th International Conference, AIED 2018, London, UK, June 27–30, 2018, Proceedings, Part I 19 (pp. 17-29). Springer International Publishing.
Abstract:
Automatic item generation represents a potential solution to the increased item development demands in this era of continuous testing. However, the use of test items that are automatically generated on-the-fly poses significant psychometric challenges for item calibration. The solution that has been suggested by a small but growing number of authors is to replace item calibration with item model (or family) calibration and to adopt a multilevel approach where items are nested within item models. Past research on the feasibility of this approach was limited to simulations or small-scale illustrations of its potential. The purpose of this study was to evaluate the results of a large-scale deployment of automatic item generation in a low-stakes adaptive testing context, with a large number of item models, and a very large number of randomly generated item instances.
Fay, D. M., Levy, R., & Mehta, V. (2018). Investigating psychometric isomorphism for traditional and performance‐based assessment. Journal of Educational Measurement, 55(1), 52-77.
Abstract:
A common practice in educational assessment is to construct multiple forms of an assessment that consists of tasks with similar psychometric properties. This study utilizes a Bayesian multilevel item response model and descriptive graphical representations to evaluate the psychometric similarity of variations of the same task. These approaches for describing the psychometric similarity of task variants were applied to two different types of assessments (one traditional assessment and one performance-based assessment) with markedly different response formats. Due to the general nature of the multilevel item response model and graphical approaches that were utilized, the methods used for this work can readily be applied to many assessment contexts for the purposes of evaluating the psychometric similarity of tasks.
Loe, B. S., Sun, L., Simonfy, F., & Doebler, P. (2018). Evaluating an automated number series item generator using linear logistic test models. Journal of Intelligence, 6(2), 20.
Abstract:
This study investigates the item properties of a newly developed Automatic Number Series Item Generator (ANSIG). The foundation of the ANSIG is based on five hypothesised cognitive operators. Thirteen item models were developed using the numGen R package and eleven were evaluated in this study. The 16-item ICAR (International Cognitive Ability Resource) short form ability test was used to evaluate construct validity. The Rasch Model and two Linear Logistic Test Model(s) (LLTM) were employed to estimate and predict the item parameters. Results indicate that a single factor determines the performance on tests composed of items generated by the ANSIG. Under the LLTM approach, all the cognitive operators were significant predictors of item difficulty. Moderate to high correlations were evident between the number series items and the ICAR test scores, with high correlation found for the ICAR Letter-Numeric-Series type items, suggesting adequate nomothetic span. Extended cognitive research is, nevertheless, essential for the automatic generation of an item pool with predictable psychometric properties.
Pandarova, I., Schmidt, T., Hartig, J., Boubekki, A., Jones, R. D., & Brefeld, U. (2019). Predicting the difficulty of exercise items for dynamic difficulty adaptation in adaptive language tutoring. International Journal of Artificial Intelligence in Education, 29, 342-367.
Abstract:
Advances in computer technology and artificial intelligence create opportunities for developing adaptive language learning technologies which are sensitive to individual learner characteristics. This paper focuses on one form of adaptivity in which the difficulty of learning content is dynamically adjusted to the learner’s evolving language ability. A pilot study is presented which aims to advance the (semi-)automatic difficulty scoring of grammar exercise items to be used in dynamic difficulty adaptation in an intelligent language tutoring system for practicing English tenses. In it, methods from item response theory and machine learning are combined with linguistic item analysis in order to calibrate the difficulty of an initial exercise pool of cued gap-filling items (CGFIs) and isolate CGFI features predictive of item difficulty. Multiple item features at the gap, context and CGFI levels are tested and relevant predictors are identified at all three levels. Our pilot regression models reach encouraging prediction accuracy levels which could, pending additional validation, enable the dynamic selection of newly generated items ranging from moderately easy to moderately difficult. The paper highlights further applications of the proposed methodology in the area of adapting language tutoring, item design and second language acquisition, and sketches out issues for future research.
Bai, Y. (2019). Cognitive Diagnostic Models-based Automatic Item Generation: Item Feature Exploration and Calibration Model Selection. Columbia University.
Abstract:
One of the most significant challenges for test developers is the creation and production of effective test items. Automatic Item Generation (AIG) presents a highly-efficient approach to developing items at a relatively low cost. Research is conducted on the AIG system to explore item characteristics (or features) that impact item parameters, and to develop the appropriate calibration models for the items generated. Current research has focused on developing the AIG system within a framework of Item Response Theory. However, there may be additional benefits to developing an AIG system based on Cognitive Diagnostic Models (CDM), since both AIG and CDM development start with developing cognitive models. It remains to be seen, however, to what extent the cognitive model of CDMs (Q-matrix) may be helpful to the AIG system.
Embretson, S. (2019). Explanatory Item Response Theory Models: Impact on Validity and Test Development?. In Quantitative Psychology: 83rd Annual Meeting of the Psychometric Society, New York, NY 2018 (pp. 1-11). Springer International Publishing.
Abstract:
Many explanatory item response theory (IRT) models have been developed since Fischer’s (Acta Psychologica 37:359–374, 1973) linear logistic test model was published. However, despite their applicability to typical test data, actual impact on test development and validation has been limited. The purpose of this chapter is to explicate the importance of explanatory IRT models in the context of a framework that interrelates the five aspects of validity (Embretson in Educ Meas Issues Pract 35, 6–22, 2016). In this framework, the response processes aspect of validity impacts other aspects. Studies on a fluid intelligence test are presented to illustrate the relevancy of explanatory IRT models to validity, as well as to test development.
Luecht, R. & Burke, M. (2020). Reconceptualizing items: From clones and automatic item generation to task model families.
Abstract:
Large-scale testing operations in education and the professions are increasingly facing enormous demands for high-quality test items to support more frequent testing. This is especially true for computer-based testing (CBT) and computerized adaptive testing (CAT), and especially where the limited capacity of secure test centers often requires extending test administration windows to weeks or even months. From a traditional assessment perspective, test items are unique entities that have static statistical characteristics in a population and conditions of use that seldom if ever change. That perspective is being challenged by modern principled assessment design and automatic item generation methods that reconceptualize items as instantiated units within larger task-and item-model families created from either tightly controlled, template driven item writing or computer algorithms that employ item-cloning templates or item shells/models. This chapter presents the genesis of these new developments with implications for the generalizability of scores and decisions. It introduces two important evaluative criteria—substantive isomorphism and statistical isomorphism—as a means of judging the quality of these task-model families. The chapter also discusses the serious need for ongoing, strong quality control mechanisms. (PsycInfo Database Record (c) 2022 APA, all rights reserved)
Kunze, K. L., Levy, R., & Mehta, V. (2020). Leveraging psychometric isomorphism in assessment development. International Journal of Quantitative Research in Education, 5(1), 1-15.
Abstract:
Two studies were conducted to examine ways in which isomorph item families can aid in the creation of exam forms and the assessment of student learning. Methods for selecting isomorph item families for specific uses are described. Study 1 examined the use of isomorphs on high-stakes final exam forms. Study 2 explored using isomorphs for lower-stakes comparisons between pre-tests and post-tests. Results of this work highlight the benefits of using isomorph item families and provide implications for both operational assessments in the Cisco Networking Academy Program, where this work takes place, and for the assessment community at large.
Luecht, R. M. (2020). Generating Performance‐Level Descriptors under a Principled Assessment Design Paradigm: An Example for Assessments under the Next‐Generation Science Standards. Educational Measurement: Issues and Practice, 39(4), 105-115.
Abstract:
The educational testing landscape is changing in many significant ways as evidence-based, principled assessment design (PAD) approaches are formally adopted. This article discusses the challenges and presents some score scale- and task-focused strategies for developing useful performance-level descriptors (PLDs) under a PAD approach. Details of the strategies are illustrated using an example based on the Next-Generation Science Standards (NGSS).
Wolfe, J. H. (2020). Real-Time Generative Adaptive Digit Span Testing.
Abstract:
This paper explores the subject of generative adaptive testing using the digit span test as an example. A large-sample study of computer-generated and administered digit-span items on Navy recruits showed an almost perfect correlation (.98-.99) between digit span length and IRT difficulty. Predicted IRT parameters can be used for adaptive testing using items generated in real-time. Our results suggest that the best research strategy for developing generative adaptive tests may be to start with the most elementary cognitive tasks and then build toward more complete psychometric models of complex mental tasks. The results of this study are sufficiently encouraging that the same research approach should be tried with other forms of memory span tests and more complex working memory tests, including tests for figures, colors, and words. The paper advances the conjecture that the test information function of a generative CAT system has a mathematical relationship to the model fit and the distribution of the model-specified item parameters, independent of the content domain of the test.
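The reported near-perfect correlation between span length and IRT difficulty suggests a very simple generation rule. The sketch below assumes an illustrative linear mapping (the slope and intercept are not from the study) to pick the span length whose predicted difficulty best matches the current ability estimate and then generate the digits in real time.

```python
# Hedged sketch of real-time generative adaptive digit span testing.
# The linear link from span length to difficulty is assumed for
# illustration; the study only reports a .98-.99 correlation.
import random

def predicted_difficulty(length, slope=0.6, intercept=-4.0):
    """Assumed linear link between digit span length and IRT difficulty."""
    return slope * length + intercept

def next_item(theta, min_len=3, max_len=12):
    """Generate a digit list whose predicted difficulty is closest to theta."""
    best_len = min(range(min_len, max_len + 1),
                   key=lambda n: abs(predicted_difficulty(n) - theta))
    return [random.randint(0, 9) for _ in range(best_len)]

print(next_item(theta=0.5))   # roughly a 7- or 8-digit item under these assumptions
```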
Jiang, S., Xiao, J., & Wang, C. (2023). On-the-fly parameter estimation based on item response theory in item-based adaptive learning systems. Behavior Research Methods, 55(6), 3260-3280.
Abstract:
Online learning systems are able to offer customized content catered to individual learners’ needs, and have seen growing interest from industry and academia alike in recent years. In contrast to the traditional computerized adaptive testing setting, which has a well-calibrated item bank with new items added periodically, the online learning system has two unique features: (1) the number of items is large, and they have likely not gone through costly field testing for item calibration; and (2) the individual’s ability may change as a result of learning. The Elo rating system has been recognized as an effective method for fast updating of item and person parameters in online learning systems to enable personalized learning. However, the updating parameter in Elo has to be tuned post hoc, and Elo is only suitable for the Rasch model. In this paper, we propose the use of a moment-matching Bayesian update algorithm to estimate item and person parameters on the fly. With sequentially updated item and person parameters, a modified maximum posterior weighted information criterion (MPWI) is proposed to adaptively assign items to individuals. The Bayesian update algorithm along with MPWI is validated in a simulated multiple-session online learning setting, and the results show that the new combination can achieve fast and reasonably accurate parameter estimates that are comparable to random selection, match-difficulty selection, and traditional online calibration. Moreover, the combination can still function reasonably well with as low as 20% of items being pre-calibrated in the item bank.
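As a point of reference for the baseline this paper improves on, the Elo-style joint update under a Rasch-type model can be sketched in a few lines. The tuning constant K is the parameter the abstract notes must be chosen post hoc; the starting values below are arbitrary.

```python
# Minimal sketch of the Elo-style baseline: learner ability and item
# difficulty are nudged after each scored response under a Rasch-type
# success probability. K is the post-hoc tuning constant.
import math

def rasch_prob(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def elo_update(theta, b, correct, k=0.4):
    """Return updated (ability, difficulty) after one scored response."""
    p = rasch_prob(theta, b)
    residual = (1.0 if correct else 0.0) - p
    return theta + k * residual, b - k * residual

theta, b = 0.0, 0.3          # provisional learner and item estimates
theta, b = elo_update(theta, b, correct=True)
print(round(theta, 3), round(b, 3))
```

The paper's moment-matching Bayesian update replaces this fixed-K rule with sequential posterior updates, which removes the post-hoc tuning and extends beyond the Rasch model.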
Falcão, F., Pereira, D. M., Gonçalves, N., De Champlain, A., Costa, P., & Pêgo, J. M. (2023). A suggestive approach for assessing item quality, usability and validity of Automatic Item Generation. Advances in Health Sciences Education, 28(5), 1441-1465.
Abstract:
Automatic Item Generation (AIG) refers to the process of using cognitive models to generate test items using computer modules. It is a new but rapidly evolving research area where cognitive and psychometric theory are combined into a digital framework. However, assessment of the item quality, usability, and validity of AIG relative to traditional item development methods lacks clarification. This paper takes a top-down strong theory approach to evaluate AIG in medical education. Two studies were conducted. In Study I, participants with different levels of clinical knowledge and item writing experience developed medical test items both manually and through AIG, and the two item types were compared in terms of quality and usability (efficiency and learnability). In Study II, automatically generated items were included in a summative exam in the content area of surgery, and a psychometric analysis based on Item Response Theory inspected the validity and quality of the AIG items. Items generated by AIG showed good quality and evidence of validity, and were adequate for testing students’ knowledge. The time spent developing the content for item generation (cognitive models) and the number of items generated did not vary with the participants’ item writing experience or clinical knowledge. AIG produces numerous high-quality items in a fast, economical, and easy-to-learn process, even for inexperienced item writers without clinical training. Medical schools may benefit from a substantial improvement in cost-efficiency in developing test items by using AIG. Item writing flaws can be significantly reduced thanks to the application of AIG’s models, thus generating test items capable of accurately gauging students’ knowledge.
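The core mechanic referenced in this abstract, a single item model yielding many concrete items, can be sketched as follows. The stem, variables, and clinical content below are invented for illustration and are not taken from the study.

```python
# Hypothetical illustration of template-based automatic item generation:
# one item model (a stem plus constrained variables) yields many concrete
# items. All content here is invented.
import itertools
import random

STEM = ("A {age}-year-old patient presents with {symptom}. "
        "Which initial investigation is most appropriate?")

VARIABLES = {
    "age": [25, 45, 70],
    "symptom": ["acute chest pain", "sudden shortness of breath"],
}

def generate_items(stem, variables):
    keys = list(variables)
    for combo in itertools.product(*(variables[k] for k in keys)):
        yield stem.format(**dict(zip(keys, combo)))

items = list(generate_items(STEM, VARIABLES))
print(f"{len(items)} items generated from one item model")
print(random.choice(items))
```

In practice, the cognitive model also constrains which variable combinations are clinically sensible and which options serve as plausible distractors, which is where most of the authoring effort goes.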
Tian, C., & Choi, J. (2023). The Impact of Item Model Parameter Variations on Person Parameter Estimation in Computerized Adaptive Testing With Automatically Generated Items. Applied Psychological Measurement, 47(4), 275-290.
Abstract:
Sibling items developed through automatic item generation share similar but not identical psychometric properties. However, considering sibling item variations may bring huge computation difficulties and little improvement on scoring. Assuming identical characteristics among siblings, this study explores the impact of item model parameter variations (i.e., within-family variation between siblings) on person parameter estimation in linear tests and Computerized Adaptive Testing (CAT). Specifically, we explore (1) what if small, medium, or large within-family variance is ignored, (2) if the effect of larger within-model variance can be compensated by greater test length, (3) if the item model pool properties affect the impact of within-family variance on scoring, and (4) if the issues in (1) and (2) are different in linear vs. adaptive testing. A related sibling model is used for data generation and an identical sibling model is assumed for scoring. Manipulated factors include test length, the size of within-model variation, and item model pool characteristics. Results show that as within-family variance increases, the standard error of scores remains at similar levels. For correlations between true and estimated scores and RMSE, the effect of the larger within-model variance was compensated by test length. For bias, scores are biased towards the center, and bias was not compensated by test length. Despite the within-family variation being random in current simulations, to yield less biased ability estimates, the item model pool should provide balanced opportunities such that “fake-easy” and “fake-difficult” item instances cancel their effects. The results of CAT are similar to that of linear tests, except for higher efficiency.
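The central manipulation in this study, ignoring within-family variation when scoring, can be sketched with a small simulation. The Rasch model, test length, and variance values below are arbitrary stand-ins for the paper's actual sibling model and design.

```python
# Hedged sketch of the simulation idea: sibling difficulties vary around a
# family mean, responses come from the true sibling parameters, but scoring
# assumes identical siblings (the family-level parameter). Values arbitrary.
import numpy as np

rng = np.random.default_rng(1)
true_theta = 0.5
n_items = 40
family_b = rng.normal(0, 1, size=n_items)          # item-model difficulties
within_sd = 0.3                                     # within-family variation ignored at scoring
sibling_b = family_b + rng.normal(0, within_sd, size=n_items)

# Responses generated from the actual sibling parameters.
p = 1 / (1 + np.exp(-(true_theta - sibling_b)))
responses = rng.binomial(1, p)

# Maximum-likelihood scoring on a grid, (wrongly) using family parameters.
grid = np.linspace(-4, 4, 801)
loglik = (responses[:, None] * -np.log1p(np.exp(-(grid - family_b[:, None])))
          + (1 - responses[:, None]) * -np.log1p(np.exp(grid - family_b[:, None]))).sum(axis=0)
theta_hat = grid[np.argmax(loglik)]
print(f"true theta = {true_theta}, estimate ignoring sibling variance = {theta_hat:.2f}")
```

Repeating this over many examinees and conditions is what lets the study separate the effects of within-family variance, test length, and item model pool characteristics on bias and RMSE.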
Laverghetta Jr, A., & Licato, J. (2023, July). Generating better items for cognitive assessments using large language models. In Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023) (pp. 414-428).
Abstract:
Writing high-quality test questions (items) is critical to building educational measures but has traditionally been a time-consuming process. One promising avenue for alleviating this is automated item generation, whereby methods from artificial intelligence (AI) are used to generate new items with minimal human intervention. Researchers have explored using large language models (LLMs) to generate new items with equivalent psychometric properties to human-written ones. But can LLMs generate items with improved psychometric properties, even when existing items have poor validity evidence? We investigate this using items from a natural language inference (NLI) dataset. We develop a novel prompting strategy based on selecting items with both the best and worst properties to use in the prompt and use GPT-3 to generate new NLI items. We find that the GPT-3 items show improved psychometric properties in many cases, whilst also possessing good content, convergent, and discriminant validity evidence. Collectively, our results demonstrate the potential of employing LLMs to ease the item development process and suggest that the careful use of prompting may allow for iterative improvement of item quality.
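The best/worst prompting strategy described above can be sketched by assembling positive and negative exemplars into a single prompt. The example items, discrimination values, and the placeholder model call are all invented for illustration; no real LLM API is invoked here.

```python
# Hedged sketch of best/worst exemplar prompting: items with strong and
# weak psychometric properties become positive and negative examples in
# the prompt. All items and statistics are invented.
item_stats = [
    {"text": "The lawyer questioned the witness. -> The witness was questioned.", "disc": 1.4},
    {"text": "The dog barked. -> The cat barked.", "disc": 0.1},
    {"text": "Every student passed. -> Some students passed.", "disc": 1.1},
    {"text": "It rained. -> It rained loudly.", "disc": 0.2},
]

ranked = sorted(item_stats, key=lambda d: d["disc"], reverse=True)
best, worst = ranked[:2], ranked[-2:]

prompt = (
    "Write new natural language inference items.\n\n"
    "Good items (keep these qualities):\n"
    + "\n".join(f"- {it['text']}" for it in best)
    + "\n\nPoor items (avoid these qualities):\n"
    + "\n".join(f"- {it['text']}" for it in worst)
    + "\n\nNew item:"
)
print(prompt)
# new_items = call_llm(prompt)   # placeholder for the actual model call
```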
Conclusion
At Caveon, our interest in SmartItem technology and randomly parallel tests stems from our dedication to providing the most effective and user-friendly exam security solutions. The research collected in this bibliography, spanning automatic item generation, item calibration, and adaptive testing, highlights both the promise of this approach and the practical questions that continue to drive new work.
As the field of exam security continues to evolve, Caveon remains committed to developing and implementing innovative strategies that ensure the integrity and fairness of assessments. By combining advanced technology with a comprehensive approach to security, we aim to support our clients and our industry in maintaining and promoting the highest standards of exam integrity.