Equating high-stakes educational measurements: A study of design and consequences
The practice of equating educational and psychological tests to produce comparable and interchangeable scores is increasingly appealing to testing and credentialing agencies. However, the Malawi National Examinations Board (MANEB) and many other testing organizations in Africa and Europe do not equate their tests, and the consequences of not equating have not been clearly documented. Furthermore, some agencies have no suitable equating design available to them, because they administer tests annually to different examinee populations and disclose all items after each administration. Therefore, the purposes of this study were to: (1) determine whether it was necessary to equate MANEB tests; (2) investigate the consequences of not equating educational tests; and (3) explore the possibility of equating scores with an external anchor test that is administered separately from the target tests.
The study used 2003, 2004, and 2005 Primary School Leaving Certificate Examination (PSLCE) Mathematics scores for two randomly equivalent groups of eighth-grade examinees drawn from 12 primary schools in the Zomba district of Malawi. In the first administration, group A took the 2004 form while group B took the 2003 form. In the second administration, both groups took an external anchor test and, five weeks later, both took the 2005 test. Data were analyzed using identity and log-linear methods, t-tests, decision consistency analyses, and classification consistency analyses, and by computing reduction-in-uncertainty and root mean square difference (RMSD) indices. Both linear and post-smoothed equipercentile methods were used to equate test scores.
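To make the two equating methods named above concrete, here is a minimal Python sketch for a random groups design, assuming integer raw scores from 0 to a known maximum and using population standard deviations. The function names are hypothetical. The equipercentile function omits the post-smoothing step the study applied; it uses the conventional continuized percentile rank, in which each integer score is spread uniformly over half a point on either side.

```python
from statistics import mean, pstdev

def linear_equate(x, x_scores, y_scores):
    """Map score x on form X to the form-Y scale by matching means and SDs."""
    mu_x, mu_y = mean(x_scores), mean(y_scores)
    sd_x, sd_y = pstdev(x_scores), pstdev(y_scores)
    return mu_y + (sd_y / sd_x) * (x - mu_x)

def cumulative_props(scores, max_score):
    """Cumulative proportions F(0), ..., F(max_score) for integer scores."""
    counts = [0] * (max_score + 1)
    for s in scores:
        counts[s] += 1
    cum, total = [], 0
    for c in counts:
        total += c
        cum.append(total / len(scores))
    return cum

def percentile_rank(x, cum):
    """Percentile rank of integer score x, spreading x uniformly over [x - 0.5, x + 0.5]."""
    below = cum[x - 1] if x > 0 else 0.0
    return 100 * (below + 0.5 * (cum[x] - below))

def equipercentile_equate(x, x_scores, y_scores, max_x, max_y):
    """Unsmoothed equipercentile equivalent of x on the form-Y scale."""
    p = percentile_rank(x, cumulative_props(x_scores, max_x)) / 100
    cum_y = cumulative_props(y_scores, max_y)
    # Find the smallest integer y whose cumulative proportion exceeds p,
    # then interpolate linearly within [y - 0.5, y + 0.5].
    for y, f in enumerate(cum_y):
        if f > p:
            below = cum_y[y - 1] if y > 0 else 0.0
            return (p - below) / (f - below) + (y - 0.5)
    return max_y + 0.5
```

For example, if form Y scores are exactly form X scores shifted up by two points (form X: 10, 12, 14, 16, 18; form Y: 12, 14, 16, 18, 20), both functions map a form X score of 14 to about 16 on the form Y scale.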
The study revealed that: (1) score distributions and test difficulties were dissimilar across test forms, signifying that equating is necessary; (2) the classification of students into grade categories differed across forms before equating but was similar after equating; and (3) the external anchor test design performed in the same way as the random groups design.
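The classification consistency comparison in finding (2) can be illustrated with a short sketch that assigns grade categories from cut scores and reports the proportion of examinees placed in the same category on two forms, along with Cohen's kappa as a chance-corrected agreement index. The cut scores are hypothetical (not MANEB's actual grade boundaries), and kappa is a standard agreement statistic chosen for illustration; the study's exact consistency indices may differ.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical cut scores in ascending order; not MANEB's actual grade boundaries.
CUTS = [40, 55, 70]

def classify(score, cuts=CUTS):
    """Grade category index: 0 below the first cut, len(cuts) at or above the last cut."""
    return bisect_right(cuts, score)

def classification_agreement(scores_a, scores_b, cuts=CUTS):
    """Same-category proportion and Cohen's kappa for the same examinees on two forms."""
    cats_a = [classify(s, cuts) for s in scores_a]
    cats_b = [classify(s, cuts) for s in scores_b]
    n = len(cats_a)
    p_observed = sum(a == b for a, b in zip(cats_a, cats_b)) / n
    freq_a, freq_b = Counter(cats_a), Counter(cats_b)
    # Agreement expected by chance from the two marginal category distributions.
    p_chance = sum(freq_a[k] * freq_b[k] for k in freq_a) / n ** 2
    kappa = (p_observed - p_chance) / (1 - p_chance) if p_chance < 1 else 1.0
    return p_observed, kappa
```

Computing these indices from raw scores, and again after converting one form's scores to the other form's scale, gives a before-and-after comparison of the kind summarized in finding (2).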
The results suggest that MANEB should equate test scores to improve the consistency of decisions and to match score distributions and difficulty levels across forms. Given the current policy of exam disclosure, equating scores through an external anchor test that is administered separately from the operational form is recommended.
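One common way to use an external anchor V when forms X and Y are taken by different groups is chained linear equating: form X is linked to V in the group that took both, and V is then linked to form Y in the other group. The abstract does not say which anchor-based method the study used, so the following Python sketch only illustrates that general idea, with hypothetical function names and population standard deviations.

```python
from statistics import mean, pstdev

def linear_link(scores_from, scores_to):
    """Return a function mapping the 'from' scale to the 'to' scale by matching
    means and SDs, both computed in the same group of examinees."""
    mu_from, mu_to = mean(scores_from), mean(scores_to)
    slope = pstdev(scores_to) / pstdev(scores_from)
    return lambda s: mu_to + slope * (s - mu_from)

def chained_linear(x, x_g1, v_g1, y_g2, v_g2):
    """Chained linear equating of form-X score x to the form-Y scale via anchor V."""
    x_to_v = linear_link(x_g1, v_g1)  # group 1 took form X and anchor V
    v_to_y = linear_link(v_g2, y_g2)  # group 2 took anchor V and form Y
    return v_to_y(x_to_v(x))
```

For instance, if every form X score in one group is exactly 5 points above the matching anchor score and every form Y score in the other group is 3 points above it, a form X score of 20 maps to 18 on the form Y scale.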