An empirical comparison of the Bookmark and modified Angoff standard setting methods and the impact on student classification
The No Child Left Behind Act has increased the importance of correctly classifying students into performance level categories because of the ramifications of failing to make Adequate Yearly Progress (AYP). States may create their own standards and conduct their own standard setting sessions. Two of the more popular methods are the Angoff method and the Bookmark method. Reckase (2005) simulated both methods and found that the Bookmark method produced negatively biased cutscores, while the Angoff method produced no bias. This study simulated the Angoff and Bookmark methods as in Reckase (2005) and added a second simulated Bookmark method intended to represent the Bookmark procedure more accurately. The study included six independent variables: standard setting method, cutscores, central tendency, number of panelists, item density, and bookmark placement. The second part of the study applied the simulated cutscores to real data to determine the impact on student classification under the different conditions.
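To make the simulation design concrete, the following is a minimal sketch of how an Angoff and a Bookmark panelist might be simulated under a three-parameter logistic (3PL) item response model. It is not the study's actual code. The function names, the response probability of 0.67, the error model (normal noise on item ratings for Angoff, normal noise on the perceived cut for Bookmark), and the convention of taking the location of the last item before the bookmark as the cutscore are all assumptions made for illustration.

```python
import math
import random
import statistics

def p_correct(theta, a, b, c):
    """3PL probability of a correct response at ability theta."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def rp_location(a, b, c, rp=0.67):
    """Theta at which P(correct) equals rp (requires c < rp < 1)."""
    q = (rp - c) / (1.0 - c)
    return b + math.log(q / (1.0 - q)) / a

def tcc_inverse(target, items, lo=-6.0, hi=6.0):
    """Bisection for the theta whose expected number-correct equals target."""
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if sum(p_correct(mid, *it) for it in items) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def angoff_panelist(theta_cut, items, sd, rng):
    """Assumed Angoff model: rate each item as P(correct) at the intended
    cut plus noise, clip to [0, 1], sum, and map the sum back to theta."""
    ratings = [
        min(1.0, max(0.0, p_correct(theta_cut, *it) + rng.gauss(0.0, sd)))
        for it in items
    ]
    return tcc_inverse(sum(ratings), items)

def bookmark_panelist(theta_cut, items, sd, rng):
    """Assumed Bookmark model: order items by RP location and return the
    location of the last item at or below the panelist's perceived cut."""
    locations = sorted(rp_location(*it) for it in items)
    perceived = theta_cut + rng.gauss(0.0, sd)
    below = [x for x in locations if x <= perceived]
    return below[-1] if below else locations[0]

def simulate(method, theta_cut, items, n_panelists, sd,
             center=statistics.mean, seed=0):
    """Aggregate panelist cutscores with the chosen central tendency."""
    rng = random.Random(seed)
    cuts = [method(theta_cut, items, sd, rng) for _ in range(n_panelists)]
    return center(cuts)

if __name__ == "__main__":
    # Hypothetical item pool: (a, b, c) with evenly spaced difficulties.
    items = [(1.0, -3.0 + 0.25 * i, 0.2) for i in range(25)]
    for method in (angoff_panelist, bookmark_panelist):
        est = simulate(method, 0.5, items, n_panelists=15, sd=0.1,
                       center=statistics.median)
        print(method.__name__, round(est - 0.5, 3))  # estimated bias
```

Under these assumptions, an error-free Bookmark panelist can never return a cutscore above the intended one whenever at least one item lies below it, which is one way to see where a negative bias could come from. A sparser item pool widens the gaps between adjacent item locations, which is why item density is a natural factor to vary in such a design.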
Overall, the simulation results indicated that the simulated Angoff method recovered the parameters extremely well, and that the second simulated Bookmark method recovered the item parameters better than the original simulated Bookmark method. In certain conditions, however, the second simulated Bookmark method recovered the item parameters as well as the Angoff method did.
The simulated cutscores were then used to place students into performance level categories, and the resulting classifications were examined by students' ethnicity, gender, socioeconomic status, and the interactions among these variables. The results indicated that the simulated Angoff method and the second simulated Bookmark method were most similar when the median was used as the measure of central tendency for the Bookmark method and the panelists' error was large.
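As an illustration of the classification step, the sketch below places scale scores into performance levels given a set of cutscores and reports how often two sets of cutscores classify the same student identically. The level labels, function names, and the rule that a score equal to a cutscore falls in the higher level are assumptions for illustration, not details taken from the study.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical performance level labels, lowest to highest.
LEVELS = ["Below Basic", "Basic", "Proficient", "Advanced"]

def classify(score, cutscores):
    """Return the level index: the number of cutscores at or below score."""
    return bisect_right(sorted(cutscores), score)

def level_counts(scores, cutscores):
    """Count students in each performance level."""
    return Counter(LEVELS[classify(s, cutscores)] for s in scores)

def agreement(scores, cuts_a, cuts_b):
    """Proportion of students placed in the same level by both cut sets."""
    same = sum(classify(s, cuts_a) == classify(s, cuts_b) for s in scores)
    return same / len(scores)
```

Running `level_counts` separately for each demographic subgroup, once per simulated condition, would give the kind of comparison described above.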
The simulated Angoff method was more robust than either of the two simulated Bookmark methods. Implications and suggestions for future research are discussed.