Impact of item parameter drift on test equating and proficiency estimates
When test equating is implemented, item parameter drift (IPD), especially in the linking items of an anchor test design, is expected to introduce error into the measurement. However, an important question that has not yet been examined is how much IPD can be tolerated before its effect becomes consequential. To answer this overarching question, three Monte Carlo simulation studies were conducted.
In the first study, titled 'Impact of unidirectional IPD on test equating and proficiency estimates,' the indirect effect of IPD on proficiency estimates (through its effect on test equating designs that use linking items containing IPD) was examined. The results, presented as regression-line-like plots, provided a comprehensive illustration of the relationship between IPD and its consequences, which practitioners can use as a guideline when IPD is expected in testing.
In the second study, titled 'Impact of multidirectional IPD on test equating and proficiency estimates,' the impact of different combinations of linking items with various multidirectional IPD on the test equating procedure was investigated for three popular scaling methods (the mean-mean, mean-sigma, and test characteristic curve (TCC) methods). It was hypothesized that multidirectional IPD would influence the amount of random error observed in the linking, while its effect on systematic error would be minimal. The results confirmed the hypothesis and also showed that different combinations of multidirectional IPD result in different levels of impact, even with the same total amount of IPD.
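As a rough illustration of two of the scaling methods named above, the following minimal sketch computes mean-sigma and mean-mean linking constants from linking-item estimates, using the standard textbook definitions with the transformation b* = A·b + B and a* = a / A. The function names and the toy parameter values are hypothetical and are not taken from the simulation studies; the TCC method, which requires numerical optimization, is not shown.

```python
from statistics import mean, pstdev

def mean_sigma(b_source, b_target):
    """Mean-sigma linking constants from linking-item difficulties."""
    A = pstdev(b_target) / pstdev(b_source)   # ratio of difficulty SDs
    B = mean(b_target) - A * mean(b_source)   # aligns the difficulty means
    return A, B

def mean_mean(a_source, a_target, b_source, b_target):
    """Mean-mean linking constants from discriminations and difficulties."""
    A = mean(a_source) / mean(a_target)       # ratio of mean discriminations
    B = mean(b_target) - A * mean(b_source)
    return A, B

def rescale(a, b, A, B):
    """Place source-scale a- and b-parameters on the target scale."""
    return [ai / A for ai in a], [A * bi + B for bi in b]

# Toy linking items (hypothetical values); the true transformation is A=2, B=0.5.
a_source = [1.0, 1.2, 0.8]
b_source = [-1.0, 0.0, 1.0]
a_target = [0.5, 0.6, 0.4]
b_target = [-1.5, 0.5, 2.5]

# Multidirectional drift of equal size in opposite directions on two items:
# the mean difficulty is unchanged, but the spread of difficulties is not.
b_drifted = [-1.1, 0.5, 2.1]

ms_clean = mean_sigma(b_source, b_target)
mm_clean = mean_mean(a_source, a_target, b_source, b_target)
ms_drift = mean_sigma(b_source, b_drifted)
mm_drift = mean_mean(a_source, a_target, b_source, b_drifted)
```

In this toy case the drift cancels in the mean, so the mean-mean constants are unaffected, whereas the mean-sigma slope changes because it depends on the standard deviation of the difficulties. This is only one hand-picked configuration and says nothing about the patterns observed in the simulations.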
The third study, titled 'Impact of IPD on pseudo-guessing parameter estimates and test equating,' examined how serious the consequences are if c-parameters are not transformed in the test equating procedure when IPD exists. Three new item calibration strategies for placing c-parameter estimates on the same scale across tests were proposed. The results indicated that the consequences of IPD did not differ substantially across calibration strategies and scaling methods when the external linking design was used, but that the choice of calibration strategy and scaling method could lead to different outcomes when the internal linking design and/or different cut scores were used.
Monte Carlo simulation