Equipercentile equating software testing

Unlike polytomous IRT models, the TRT model yielded quite stable equating results across the different equating methods investigated in their study. Kernel equating (KE) is a powerful, modern, and unified approach to test equating. Equipercentile equating with equal interval scores, Brad Hanson, February 10, 1993, revised 5295: let X and Y be discrete random variables representing the distribution of scores on two forms of a test, labeled Form X and Form Y, respectively, in some population. Finally, examples of currently available software will be inventoried. An analytical procedure for the equipercentile method of equating tests. Hypothesis testing of equating differences in the KE. The major testing companies of course have the software they need for scaling and equating, but the software available to researchers and graduate students is very limited. The general form of the Levine function will soon be available in KE software at. All of them can be accomplished with our industry-leading software Xcalibre, though conversion equating requires an additional program called IRTEQ. Effect of item response theory (IRT) model selection on. Computer programs, College of Education, University of Iowa. Conducts linear and equipercentile equating under the common-item nonequivalent groups design.

The scores on these multiple forms were equated using the equipercentile equating method (the legally sanctioned format), and the merit list was created. He is the inventor of jMetrik, an open source psychometric software program. The proposed procedure requires (a) approximating the empirical score distributions of the two forms by means of the first terms of an infinite series, and (b) contrasting the results obtained when only the first two moments are used i. An analytical procedure for the equipercentile method of. Annual meeting of the National Council on Measurement in Education. Equating determines, for each score on the new form, the corresponding score on the reference form. For practitioners, the book provides a splendid introduction to the topics considered. In the case of the common pupils design, NFER developed its own software to. Approach 1 includes a common-item linking strategy using item response theory (IRT), with external anchor sets embedded in the new test administration. The reference form is the test form to which we are equating the new form.

Comparison of approaches for equating different versions of. The assumptions and the formulas for the chain equipercentile equating function. The table below shows how the test equating process works. In addition to statistical procedures, successful equating, scaling, and linking involve many aspects of testing, including procedures to develop tests, to administer and score tests, and to interpret. The most complete coverage of the entire field of score equating, and of score linking in general, has been provided by Kolen and Brennan (2004).

Patrick Meyer is an assistant professor in the Curry School of Education at the University of Virginia. Several other studies, including a generalizability study and an equipercentile equating study, were conducted to determine the equivalency between the two forms. This booklet grew out of a half-day class on equating that author Samuel Livingston teaches for new statistical staff at Educational Testing Service (ETS). An equipercentile version of the Levine linear observed-score. Methods and Practices (Statistics for Social and Behavioral Sciences), Kolen, Michael J. Statistical equating with measures of oral reading fluency. Equating in small-scale language testing programs, Geoffrey. Test score equating is used to make scores from different test forms comparable. In observed-score equating, the characteristics of the score distributions are set equal for a specified population of examinees (Angoff, 1971). Fair and equitable measurement of student learning in MOOCs. Considering that IRT data simulation might unequally favor IRT equating methods, pseudo-tests and pseudo-groups were also constructed to make equating results comparable with those. References of noncommercial software for IRT analyses, Nina.
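The statement that observed-score equating sets characteristics of the score distributions equal can be illustrated with linear equating, which matches only the mean and standard deviation of the two forms. The following is a minimal sketch, assuming a random groups design with raw score samples from each form; the function name linear_equate and the sample data are hypothetical and are not taken from any of the programs mentioned here.

```python
import numpy as np

def linear_equate(x, scores_x, scores_y):
    """Map Form X score(s) x onto the Form Y scale by matching mean and SD."""
    scores_x = np.asarray(scores_x, dtype=float)
    scores_y = np.asarray(scores_y, dtype=float)
    mu_x, mu_y = scores_x.mean(), scores_y.mean()
    sd_x, sd_y = scores_x.std(), scores_y.std()
    # Scores the same number of SDs from the mean are treated as equivalent.
    return mu_y + (sd_y / sd_x) * (np.asarray(x, dtype=float) - mu_x)

# Hypothetical samples: Form Y has a higher mean and twice the spread.
form_x = [10, 12, 14, 16, 18]
form_y = [20, 24, 28, 32, 36]
equated = linear_equate([10, 14, 18], form_x, form_y)
```

Because only two moments are matched, this function is a straight line; equipercentile equating relaxes that restriction by matching the whole distribution.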

Equating, UNL Digital Commons, University of Nebraska-Lincoln. Two local methods for observed-score equating are applied to the problem of equating an adaptive test to a linear test. PDF: Equating in small-scale language testing programs. A didactic approach to the use of IRT true-score equating. Equipercentile equating produced scores on a 30-point scale in all studies. It is based on a flexible family of equipercentile-like equating functions and contains the linear equating function as a special case. Web table 2 provides score equivalencies for each equated point-total version of the MMSE. A comparison of linear, equipercentile, and FIPC equating. The class is a nonmathematical introduction to the topic. A common approach is known as equipercentile equating.
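The remark that a family of equipercentile-like functions contains linear equating as a special case can be sketched with a Gaussian-kernel continuization of the kind commonly used in kernel equating. This is an illustration under stated assumptions, not the code of any KE package: the function names, the bisection solver, and the bandwidth values are all hypothetical choices. Each discrete score distribution is smoothed into a continuous CDF that keeps its mean and variance, and the equated score solves G_h(y) = F_h(x).

```python
import math
import numpy as np

def kernel_cdf(x, scores, probs, h):
    """Gaussian-kernel continuized CDF of a discrete score distribution."""
    scores = np.asarray(scores, dtype=float)
    probs = np.asarray(probs, dtype=float)
    mu = float(np.dot(scores, probs))
    var = float(np.dot((scores - mu) ** 2, probs))
    a = math.sqrt(var / (var + h * h))  # shrinkage that preserves the variance
    z = (x - a * scores - (1.0 - a) * mu) / (a * h)
    phi = np.array([0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in z])
    return float(np.dot(probs, phi))

def kernel_equate(x, scores_x, probs_x, scores_y, probs_y, h_x, h_y):
    """Form Y equivalent of x, found by bisection on the continuized Y CDF."""
    target = kernel_cdf(x, scores_x, probs_x, h_x)
    sy = np.asarray(scores_y, dtype=float)
    py = np.asarray(probs_y, dtype=float)
    mu_y = float(np.dot(sy, py))
    sd_y = math.sqrt(float(np.dot((sy - mu_y) ** 2, py)))
    lo, hi = mu_y - 10.0 * sd_y, mu_y + 10.0 * sd_y  # assumed search bracket
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if kernel_cdf(mid, sy, py, h_y) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical score probabilities on a 0..4 scale.
s = np.arange(5)
px = np.array([0.1, 0.2, 0.4, 0.2, 0.1])
py = np.array([0.2, 0.3, 0.3, 0.1, 0.1])
small_h = kernel_equate(3.0, s, px, s, py, 0.6, 0.6)        # equipercentile-like
large_h = kernel_equate(3.0, s, px, s, py, 1000.0, 1000.0)  # close to linear
```

With a very large bandwidth both continuized distributions approach normal distributions with the original means and variances, so the result approaches the linear equating value; with a small bandwidth the function follows the discrete distributions more closely.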

Impact of group differences on equating accuracy and the adequacy of equating assumptions. Equating scores from adaptive to linear tests, IACAT. You can equate forms with classical test theory (CTT) or item response theory (IRT). The new edition of Test Equating, Scaling, and Linking. An investigation into the test equating methods used during 2006. In large-scale testing programs, various equating methods are available to ensure the. Kernel and traditional equipercentile equating with degrees of. The book is appealing to anyone interested in the topic of equating, scaling, and linking. Equipercentile equating is typically done by computer, though it is relatively easily done by. While equating methods research has flourished because of the need for technically sound designs and analyses, software development has been limited. Prior use of the equipercentile method of test equating was based on a graphic procedure, which is tedious, subject to smoothing errors, and nonanalytical.

The primary purpose of the Center for Advanced Studies in Measurement and Assessment (CASMA) is to pursue research-based initiatives that lead to advancements in the methodology and practice of educational measurement and assessment. The kernel Levine equipercentile observed-score equating. A new procedure for comparing results of linear and equipercentile equating methods is presented and illustrated. CTT methods include Tucker, Levine, and equipercentile. Center for Advanced Studies in Measurement and Assessment. Unlike with item response theory, equating based on classical test theory is somewhat distinct from scaling. The class consists of illustrated lectures, interspersed with self-tests for the participants. The computer programs listed below can be used to conduct many of the equating analyses described in Kolen and Brennan (2004). Under the equipercentile equating property (EEP), the converted scores on Form X have the same distribution as scores on Form Y.

The results show that the 2PL and the TRT approaches produce comparable results that agree more closely with the results of the equipercentile method than the GRM does. The class is a nonmathematical introduction to the topic, emphasizing conceptual understanding and practical applications. Equipercentile equating defines the equating relationship so that equivalent scores have the same percentile rank on either form. This booklet grew out of a half-day class on equating that I teach for new statistical staff at Educational Testing Service (ETS). PDF: Language programs need multiple test forms for secure administrations and effective placement decisions, but can they have confidence that scores. An equipercentile version of the Levine linear observed. Linking two assessment systems using common-item IRT. This book provides an introduction to test equating, scaling, and linking, including those concepts and practical issues that are critical for developers and all other testing professionals. Methods of equating use functions to transform scores on two or more versions of a test so that they can be compared and used interchangeably. Frequently asked questions: equating of scores on multiple forms. A SAS program for calculating equivalent scores using the equipercentile method. There are three general approaches to IRT equating.
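The percentile-rank definition of equipercentile equating can be sketched for integer scores 0..K. This is a minimal illustration, assuming unsmoothed frequency counts from equivalent groups and the usual convention that the percentile rank of a score counts everyone below it plus half of those at it; the function names and frequency data are hypothetical, and the SAS, R, and Iowa programs mentioned in this text may differ in detail.

```python
import numpy as np

def percentile_ranks(freq):
    """Percentile rank (0-100) at each integer score 0..K."""
    f = np.asarray(freq, dtype=float)
    f = f / f.sum()                # relative frequencies
    F = np.cumsum(f)               # cumulative proportions
    return 100.0 * (F - f / 2.0)   # proportion below plus half of those at the score

def equipercentile(freq_x, freq_y):
    """Form Y equivalents of the integer Form X scores 0..K."""
    p = percentile_ranks(freq_x) / 100.0
    g = np.asarray(freq_y, dtype=float)
    g = g / g.sum()
    G = np.cumsum(g)
    top = len(g) - 1
    equated = np.empty(len(p))
    for i, pr in enumerate(p):
        if pr >= G[-1]:
            # Percentile rank at or above the top of Form Y: use the upper limit.
            equated[i] = top + 0.5
            continue
        # Smallest Form Y score whose cumulative proportion exceeds pr.
        y = int(np.searchsorted(G, pr, side="right"))
        below = G[y - 1] if y > 0 else 0.0
        # Interpolate within the interval [y - 0.5, y + 0.5].
        equated[i] = y - 0.5 + (pr - below) / g[y]
    return equated

# Hypothetical frequencies: Form Y scores run one point higher than Form X.
freq_x = [1, 2, 3, 2, 1, 0]
freq_y = [0, 1, 2, 3, 2, 1]
shifted = equipercentile(freq_x, freq_y)
same = equipercentile([1, 2, 3, 2, 1], [1, 2, 3, 2, 1])
```

When the two distributions are identical every score maps to itself, and when Form Y is shifted up by one point each Form X score maps to the score one point higher. In practice the frequencies are often smoothed before this step, which the sketch omits.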

Statistical methods for test equating: computer software manual. The R package equate (Albano, 2014) is free, open-source software for conducting observed-score linking and equating under single-group, equivalent-groups, and nonequivalent-groups designs with one. The merit list is the combination of different batches, each taking different test forms and coming from different districts. Impact of group differences on equating accuracy and the. In the descriptions that follow, forms are referred to as X and Y, where scores on X will be equated to the scale of Y. Equating is an important step in the process of collecting, analyzing, and reporting test scores in any program of assessment. Software enabling complex machine learning has become widely. An R package for observed-score linking and equating.

However, one of the reasons that IRT was invented was that equating with CTT was very weak. The advantages and disadvantages of each equating method are discussed, along with the conditions conducive to satisfactory equating. A comparison of IRT observed-score kernel equating and. Aug, 2014: the traditional equipercentile method was used as an evaluation baseline. ABD in applied mathematics and computational science, 2008. Estimates bootstrap standard errors of linear equating and equipercentile equating under the random groups design. An equipercentile version of the Levine linear observed-score equating function using the methods of kernel equating, Alina A. Frequently asked questions: equating of scores on multiple. Any equipercentile equating method has five steps or parts. Equipercentile equating involves finding the percentile rank of every score on each form; the equated scores from all forms are then combined to generate a merit list.
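The bootstrap standard errors mentioned above for the random groups design can be sketched as follows, using linear equating as the equating function. This is an illustration under assumptions, not the procedure of any named program: the function name, the number of replications, and the simulated score samples are hypothetical. Each replication resamples examinees with replacement from both groups independently, recomputes the equated score, and the standard deviation of those values estimates the standard error.

```python
import numpy as np

def bootstrap_se_linear(x, scores_x, scores_y, n_boot=500, seed=0):
    """Bootstrap SE of the linear-equated Form Y equivalent of score x."""
    rng = np.random.default_rng(seed)
    scores_x = np.asarray(scores_x, dtype=float)
    scores_y = np.asarray(scores_y, dtype=float)
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        # Random groups design: the two samples are resampled independently.
        bx = rng.choice(scores_x, size=scores_x.size, replace=True)
        by = rng.choice(scores_y, size=scores_y.size, replace=True)
        estimates[b] = by.mean() + (by.std() / bx.std()) * (x - bx.mean())
    return float(estimates.std(ddof=1))

# Simulated number-correct scores on two 40-item forms (illustrative only).
data_rng = np.random.default_rng(1)
sample_x = data_rng.binomial(40, 0.60, size=300)
sample_y = data_rng.binomial(40, 0.55, size=300)
se_at_24 = bootstrap_se_linear(24, sample_x, sample_y, n_boot=200, seed=42)
```

The same loop applies to equipercentile equating by swapping the line that computes each estimate for an equipercentile function; standard errors are usually reported at every score point rather than at a single score.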

A query was sent seeking the district-wise merit list. Dementia Rating Scale-2 (DRS-2) publisher, online testing. Equipercentile equating was not necessary for the European continent, because all contributing studies administered versions with 30-point totals. Methods and Practices is a welcome update to a book which has become a classic in equating and linking. Several methods have been developed to conduct equating. IR provides unlimited scoring and report generation after hand-entry of DRS-2 and DRS-2. Approach 2 is an equipercentile linking method (Kolen and Brennan, 2004) in which the scale scores from the new administration are linked to those of the old administration through percentile ranks.

PIE for PC console, PIE for PC GUI, PIE for Mac OS9, PIE for Mac OS10: conducts IRT true-score and observed-score equating for dichotomously scored tests. A handful of statistical packages are available for linking and equating test forms. Item response theory (IRT) observed-score kernel equating was evaluated and compared with equipercentile equating, IRT observed-score equating, and kernel equating methods by varying the sample size and test length. Equating in small-scale language testing programs, SAGE Journals. Bayesian nonparametric estimation of test equating functions. Therefore, the construction and administration of alternate forms of the same test is a necessary requirement for operating these testing programs (Cook, 2007). A comparison of linear, equipercentile, and FIPC equating methods across multidimensional test forms for nonequivalent groups. This article presents a SAS program that uses equipercentile equating to derive equated scores on two. We give the assumptions for the two methods in order to emphasize that all equating methods require some nontestable assumptions to be fulfilled. IRTEQ: Windows application that implements IRT scaling and. Test equating is the statistical process that accounts for differences in test difficulty and then adjusts the scale of the current test administration so that the same criterion standard can be used.
