A higher-order item response model: Development and application
This study was designed to address the issue of ability estimation in large-scale assessment settings. Tests consisting of different domains measuring specific content or skill objectives are common in many large-scale assessments. Although multidimensional in nature, these tests are assumed to be unidimensional, either as a whole or within specific domains, for implementation of conventional unidimensional item response theory (CU-IRT) models. The resulting overall ability estimate may be inappropriate if the unidimensionality assumption is violated, or the domain ability estimates may not be sufficiently reliable when the number of items in a domain is small. Through formulation of a one-factor higher-order IRT (HO-IRT) model, the current study specified the overall ability and multiple domain-specific abilities in the same model. The HO-IRT model is a general framework that subsumes CU-IRT estimation as a special case. HO-IRT estimation differs, however, in that it provides an overall ability estimate that does not assume unidimensionality and domain ability estimates that are more reliable. Using Markov chain Monte Carlo (MCMC) techniques in a hierarchical Bayesian framework, the overall and domain-specific abilities, and their correlations, can be estimated simultaneously. The feasibility and effectiveness of the proposed HO-IRT model were investigated under varied conditions in a simulation study. The HO-IRT overall ability estimate is similar to the CU-IRT estimate when domain abilities are correlated, but is more accurate when domain abilities are uncorrelated. When abilities are correlated, the HO-IRT domain ability estimates are more efficient than the CU-IRT estimates.
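The one-factor higher-order structure described above can be sketched as follows. This is an illustrative formulation under assumed notation (the abstract itself gives no equations): a two-parameter logistic item response function within each domain, with each domain ability written as a linear function of a single overall ability.

```latex
% Sketch of a one-factor HO-IRT structure (notation assumed for illustration).
% First order: 2PL response function for item i in domain d, examinee j,
% with discrimination a_{id} and difficulty b_{id}.
\[
  P\bigl(X_{ijd} = 1 \mid \theta_{jd}\bigr)
    = \frac{\exp\{a_{id}(\theta_{jd} - b_{id})\}}
           {1 + \exp\{a_{id}(\theta_{jd} - b_{id})\}}
\]
% Higher order: domain ability \theta_{jd} loads on the overall ability
% \theta_j through \lambda_d; the residual variance keeps Var(\theta_{jd}) = 1.
\[
  \theta_{jd} = \lambda_d\,\theta_j + \varepsilon_{jd},
  \qquad
  \varepsilon_{jd} \sim \mathcal{N}\bigl(0,\; 1 - \lambda_d^{2}\bigr)
\]
```

Under this sketch, the correlation between any two domain abilities is the product of their loadings, and setting all loadings to one collapses the model to a single common ability, which is how CU-IRT estimation arises as a special case.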
In addition, the usefulness of the proposed HO-IRT model was examined through its application to the Trends in International Mathematics and Science Study (TIMSS) 2003 eighth-grade mathematics test. Using the U.S. and Japan data, hierarchical linear modeling analyses of teacher quality and student achievement were conducted using both the HO-IRT estimates and the TIMSS reported scores. Analyses using the domain scores revealed dynamics between teacher quality and student achievement that differ from those obtained using the total test score. Due to differences in their scoring algorithms, the HO-IRT estimates and the TIMSS plausible values yielded associations of different magnitudes between teacher quality and student mathematics achievement.
0288: Educational evaluation
0525: Educational psychology