Evaluating the consistency and accuracy of proficiency classifications using item response theory

2006

Abstract (summary)

In response to the No Child Left Behind (NCLB) legislation, state-mandated testing has increased dramatically, and almost all of these tests report examinees' performance in terms of several ordered proficiency categories. Like licensure exams, these assessments often carry high-stakes consequences, such as graduation requirements and school accountability. Such tests must therefore be of high quality, and that quality can be assessed, in part, through decision accuracy (DA) and decision consistency (DC) indices.

With the popularization of item response theory (IRT), an increasing number of testing programs are adopting IRT for test development, test score equating, and other data analyses, which naturally calls for approaches to evaluating DA and DC within the IRT framework. However, it is still common to carry out all data analyses in IRT while reporting DA and DC indices derived within the framework of classical test theory (CTT). This mismatch points to the need to explore ways of quantifying DA and DC under IRT.

The current project addressed several possible methods for estimating DA and DC within the IRT framework, with a specific focus on tests that include both dichotomous and polytomous items. It consisted of several simulation studies in which all of the IRT methods introduced were evaluated with simulated data; the methods were also applied to real data to demonstrate their use in practice. Overall, the results supported the use of the three IRT methods introduced in this project for estimating DA and DC indices in most of the simulated situations. In most cases the three IRT methods produced results that were close to the "true" DA and DC values and were consistent with (sometimes even better than) those from the commonly used Livingston and Lewis (L&L) method. The IRT methods appeared to be more robust to distribution shape than to test length. Implications for educational measurement and some directions for future studies in this area were also discussed.
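To make the two indices concrete, the following is a minimal simulation sketch and not one of the three IRT methods evaluated in the dissertation. It assumes a two-parameter logistic (2PL) model with dichotomous items only, whereas the project itself covers mixed dichotomous and polytomous tests. The item parameters, cut scores, sample size, and all function names are hypothetical choices made for illustration. DA is taken as the proportion of simulated examinees whose observed-score category matches the category of their model-implied true score, and DC as the proportion classified identically on two independent administrations of the same items.

```python
import math
import random


def p_correct(theta, a, b):
    """2PL probability of a correct response (a: discrimination, b: difficulty)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def true_score(theta, items):
    """Expected number-correct score at ability theta."""
    return sum(p_correct(theta, a, b) for a, b in items)


def observed_score(theta, items, rng):
    """One simulated number-correct score at ability theta."""
    return sum(1 for a, b in items if rng.random() < p_correct(theta, a, b))


def classify(score, cuts):
    """Proficiency category index: the number of cut scores at or below the score."""
    return sum(score >= c for c in cuts)


def simulate_da_dc(n_examinees=5000, n_items=40, cuts=(15, 25), seed=0):
    """Return (DA, DC) estimated by simulation under assumed, made-up parameters."""
    rng = random.Random(seed)
    # Hypothetical item pool: discriminations in [0.8, 2.0], difficulties ~ N(0, 1).
    items = [(rng.uniform(0.8, 2.0), rng.gauss(0.0, 1.0)) for _ in range(n_items)]
    accurate = 0
    consistent = 0
    for _ in range(n_examinees):
        theta = rng.gauss(0.0, 1.0)  # assumed standard normal ability distribution
        true_cat = classify(true_score(theta, items), cuts)
        cat_first = classify(observed_score(theta, items, rng), cuts)
        cat_second = classify(observed_score(theta, items, rng), cuts)
        accurate += cat_first == true_cat      # agreement with the true category
        consistent += cat_first == cat_second  # agreement across two administrations
    return accurate / n_examinees, consistent / n_examinees


da, dc = simulate_da_dc()
print(f"DA = {da:.3f}, DC = {dc:.3f}")
```

In a simulation the ability values are known, which is what makes "true" DA and DC values available as a benchmark; with real data they must be estimated from a single administration, which is the problem the IRT and L&L methods address.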

Indexing (details)

Subject
Educational evaluation
Classification
0288: Educational evaluation
Identifier / keyword
Education; High-stakes testing; Item response; Proficiency classifications
Title
Evaluating the consistency and accuracy of proficiency classifications using item response theory
Author
Li, Shuhong
Number of pages
Publication year
Degree date
School code
Source
DAI-A 67/04, Dissertation Abstracts International
Place of publication
Ann Arbor
Country of publication
United States
Advisor
Sireci, Stephen G.
University/institution
University of Massachusetts Amherst
University location
United States -- Massachusetts
Source type
Dissertations & Theses
Document type
Dissertation/thesis number
ProQuest document ID
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Document URL