Specification searches in multilevel structural equation modeling: A Monte Carlo investigation
Sample data obtained via cluster sampling, rather than simple random sampling, require specialized multilevel statistical analysis techniques, such as multilevel structural equation modeling, to model within-cluster and between-cluster variation appropriately. Properly modeling both sources of variation could be of substantive interest in numerous applied research settings. However, applied researchers typically test only a within-cluster (i.e., individual-difference) theory; specifying a between-cluster model in the absence of theory involves a specification search.
Consistent with previous specification search studies, this dissertation manipulated three independent variables: start model, search method, and method of Type-I error control. Consistent with previous multilevel research, it also manipulated the number of clusters, cluster sample size, and intraclass correlation magnitude as independent variables. The main dependent variable of interest was which combination of start model, search method, and method of Type-I error control best recovered the population between-cluster model. Additional dependent variables were examined to assess the precision of specification search efforts.
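The abstract does not spell out the search procedures themselves. As a point of reference only, one common form of univariate specification search frees one fixed parameter at a time, choosing the parameter with the largest 1-df modification index and stopping when that index is no longer significant; a Bonferroni-adjusted alpha is one possible form of Type-I error control. The sketch below assumes that forward procedure; the function names, the toy modification indices, and the Bonferroni choice are hypothetical illustrations, not the dissertation's method, and no model is actually fit (a search from a "saturated" start model would instead proceed backward, removing parameters).

```python
import math
from typing import Callable, Iterable


def chi2_1df_pvalue(stat: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 df (stat = z**2)."""
    return math.erfc(math.sqrt(stat / 2.0))


def forward_univariate_search(
    fixed: Iterable[str],
    mod_index: Callable[[frozenset, str], float],
    alpha: float = 0.05,
    bonferroni: bool = False,
) -> list[str]:
    """Hypothetical forward univariate search.

    mod_index(freed, param) is an assumed stand-in for refitting the model
    with the parameters in `freed` and returning the 1-df modification index
    of `param`. Returns the parameters freed, in order.
    """
    freed: list[str] = []
    remaining = set(fixed)
    while remaining:
        current = frozenset(freed)
        scores = {p: mod_index(current, p) for p in remaining}
        # Largest modification index; ties broken by name for determinism.
        best = max(scores, key=lambda p: (scores[p], p))
        # Optional Bonferroni control: divide alpha by the number of tests.
        level = alpha / len(remaining) if bonferroni else alpha
        if chi2_1df_pvalue(scores[best]) >= level:
            break  # no remaining parameter is significant: stop searching
        freed.append(best)
        remaining.remove(best)
    return freed


def toy_mi(freed: frozenset, param: str) -> float:
    """Made-up modification indices; freeing "a" lowers the index of "b"."""
    base = {"a": 12.0, "b": 6.0, "c": 1.0}
    if param == "b" and "a" in freed:
        return 4.5
    return base[param]


if __name__ == "__main__":
    print(forward_univariate_search(["a", "b", "c"], toy_mi))
    print(forward_univariate_search(["a", "b", "c"], toy_mi, bonferroni=True))
```

With these toy values the uncorrected search frees "a" and then "b", while the Bonferroni-controlled search stops after "a", illustrating how stricter Type-I error control tends to yield sparser models.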
Results showed that a "saturated" start model, univariate specification search, and no Type-I error control best recovered the population between-cluster model. However, even this combination recovered the population model in fewer than one in five attempts at the largest sample size. A majority of the specification searches recovered the population model in less than five percent of all attempts, and the remaining specification searches failed to recover the population model under any condition. Overall, specification searches were more likely to produce a notably misspecified model with biased parameter estimates, an under-identified model, or an inadmissible solution than to recover the population model.
Model complexity, non-normally distributed data, and within-cluster model misspecification were not manipulated as independent variables in this dissertation. Further, the current results were based on a multilevel path model that may or may not generalize to other multilevel designs, such as confirmatory factor analyses and full structural equation models. Model complexity, non-normally distributed data, within-cluster model misspecification, and advanced analysis designs could be incorporated in future multilevel specification search studies by adapting the models used in previous non-multilevel specification search investigations.
0525: Educational psychology