Inference of haplotype effects in case-control studies using unphased genotype and environmental data
A retrospective likelihood-based approach was proposed to test and estimate the effect of haplotypes on disease risk using unphased genotype data with adjustment for environmental covariates. The method extends the approach of Epstein and Satten for testing and estimating haplotype effects. We further extended the approach to incorporate gene-environment interaction. Three models were developed: the ACI model for qualitative environmental covariates, the NACI model for data in which haplotypes are correlated with environmental covariates, and the QC model for quantitative covariates.
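The core quantity in a retrospective likelihood for unphased data is the probability of an observed multilocus genotype given disease status, obtained by summing over every haplotype pair (diplotype) consistent with that genotype. The sketch below is a minimal illustration, not the authors' implementation. It assumes biallelic SNPs coded 0/1/2, Hardy-Weinberg equilibrium, a rare disease, and a hypothetical log-additive model in which each copy of a single risk haplotype multiplies the odds by exp(beta). Under those assumptions, case haplotypes behave like independent draws from frequencies tilted by exp(beta), so beta = 0 gives the control distribution. Environmental covariates and interaction terms are omitted, and the function names are hypothetical.

```python
from itertools import product
from math import exp


def compatible_diplotypes(genotype):
    """Unordered haplotype pairs consistent with an unphased genotype.

    genotype: allele counts (0, 1, 2) at each biallelic SNP.
    Haplotypes are tuples of 0/1 alleles.
    """
    het_sites = [i for i, g in enumerate(genotype) if g == 1]
    base = [1 if g == 2 else 0 for g in genotype]  # homozygous sites are fixed
    pairs = set()
    for phase in product((0, 1), repeat=len(het_sites)):
        h1, h2 = list(base), list(base)
        for site, allele in zip(het_sites, phase):
            h1[site] = allele
            h2[site] = 1 - allele
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


def genotype_prob(genotype, freqs, beta=0.0, risk_hap=None):
    """P(genotype | disease status) under HWE and a hypothetical log-additive model.

    beta = 0 gives the control (population) probability. For cases, the
    frequency of risk_hap is multiplied by exp(beta) and all frequencies are
    renormalised (rare-disease approximation).
    """
    weights = {h: p * (exp(beta) if h == risk_hap else 1.0) for h, p in freqs.items()}
    total = sum(weights.values())
    q = {h: w / total for h, w in weights.items()}
    prob = 0.0
    for h1, h2 in compatible_diplotypes(genotype):
        multiplicity = 1.0 if h1 == h2 else 2.0  # two phase orders for distinct haplotypes
        prob += multiplicity * q.get(h1, 0.0) * q.get(h2, 0.0)
    return prob


freqs = {(0, 0): 0.5, (1, 1): 0.3, (0, 1): 0.1, (1, 0): 0.1}
print(compatible_diplotypes((1, 1)))
print(genotype_prob((1, 1), freqs))  # control probability
print(genotype_prob((1, 1), freqs, beta=0.7, risk_hap=(1, 1)))  # case probability
```

For example, the double heterozygote (1, 1) is compatible with the diplotypes {00, 11} and {01, 10}, so with the frequencies above its control probability is 2(0.5)(0.3) + 2(0.1)(0.1) = 0.32.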
Likelihood ratio tests were constructed to test the effects of haplotypes and gene-environment interaction. Model parameters, such as haplotype effect sizes, were estimated using the Expectation/Conditional Maximization (ECM) algorithm proposed by Meng and Rubin. Model-based variance estimates were derived from the observed information matrix. Simulation studies were conducted at three sample sizes (400, 800, and 1600) under three genetic effect models: dominant, recessive, and additive. The simulations also assessed model performance under non-ideal conditions, including departure from the Hardy-Weinberg equilibrium (HWE) assumption and correlation between haplotypes and environmental covariates. Model robustness was further evaluated by comparing long versus short haplotypes and common versus rare disease haplotypes.
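Because phase is unobserved, haplotype frequencies must be estimated jointly with the effect parameters. The sketch below shows only the frequency part of that problem: a plain EM algorithm (not the full ECM) that estimates haplotype frequencies from unphased genotypes in a single sample, assuming HWE and ignoring disease status and covariates. In the ECM algorithm named above, the M-step is replaced by a sequence of conditional maximization steps, so that frequencies and regression parameters such as β can be updated in turn; that part is not shown. All names here are hypothetical.

```python
from collections import defaultdict
from itertools import product


def compatible_diplotypes(genotype):
    """Unordered haplotype pairs consistent with an unphased 0/1/2 genotype."""
    het_sites = [i for i, g in enumerate(genotype) if g == 1]
    base = [1 if g == 2 else 0 for g in genotype]
    pairs = set()
    for phase in product((0, 1), repeat=len(het_sites)):
        h1, h2 = list(base), list(base)
        for site, allele in zip(het_sites, phase):
            h1[site] = allele
            h2[site] = 1 - allele
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


def em_haplotype_freqs(genotypes, max_iter=500, tol=1e-10):
    """EM estimate of haplotype frequencies from unphased genotypes under HWE."""
    diplotypes = [compatible_diplotypes(g) for g in genotypes]
    haps = sorted({h for pairs in diplotypes for pair in pairs for h in pair})
    freqs = {h: 1.0 / len(haps) for h in haps}  # uniform starting values
    n = len(genotypes)
    for _ in range(max_iter):
        counts = defaultdict(float)
        for pairs in diplotypes:
            # E-step: posterior probability of each compatible diplotype
            w = [(1.0 if a == b else 2.0) * freqs[a] * freqs[b] for a, b in pairs]
            total = sum(w)
            for (a, b), wi in zip(pairs, w):
                counts[a] += wi / total
                counts[b] += wi / total
        # M-step: expected haplotype counts divided by 2n chromosomes
        new = {h: counts[h] / (2 * n) for h in haps}
        converged = max(abs(new[h] - freqs[h]) for h in haps) < tol
        freqs = new
        if converged:
            break
    return freqs


genotypes = [(0, 0)] * 5 + [(2, 2)] * 3 + [(1, 1)] * 2
for hap, f in sorted(em_haplotype_freqs(genotypes).items()):
    print(hap, round(f, 4))
```

When every genotype is unambiguous (at most one heterozygous site), the first iteration already returns the simple counting estimates; the iterations matter only for multiply heterozygous individuals, whose diplotype weights are shared out according to the current frequencies.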
The models yielded unbiased parameter estimates, type I error rates close to the nominal 0.05, and coverage probabilities for the true β close to 95% under the recessive, dominant, and additive effect models. The models performed well with small and large sample sizes, short and long haplotypes, and rare and common disease haplotypes. They were robust to moderate departure from the HWE assumption. When haplotypes were correlated with environmental covariates, the NACI model outperformed the ACI model in both type I error and accuracy of the parameter estimates.
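The performance measures reported above can be computed from simulation replicates roughly as follows. This is a hypothetical sketch, assuming each replicate yields null and alternative log-likelihoods for a 1-df test, a point estimate of β, and a model-based standard error (for example, from the inverse observed information). Type I error is the rejection rate when data are simulated under the null; coverage is the proportion of Wald 95% intervals, estimate ± 1.96·SE, that contain the true β. Tests with more than one degree of freedom would need a general chi-square survival function instead of the closed form used here.

```python
from math import erfc, sqrt

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal


def lrt_pvalue_1df(loglik_null, loglik_alt):
    """P-value of a 1-df likelihood ratio test; 2*(l_alt - l_null) ~ chi-square(1) under H0."""
    stat = max(0.0, 2.0 * (loglik_alt - loglik_null))
    return erfc(sqrt(stat / 2.0))  # chi-square(1) survival function


def empirical_type1_error(pvalues, alpha=0.05):
    """Proportion of null replicates rejected at level alpha."""
    return sum(p < alpha for p in pvalues) / len(pvalues)


def wald_coverage(estimates, std_errors, beta_true, z=Z_975):
    """Proportion of intervals estimate +/- z*SE that contain beta_true."""
    hits = sum(abs(b - beta_true) <= z * se for b, se in zip(estimates, std_errors))
    return hits / len(estimates)


def mean_bias(estimates, beta_true):
    """Average estimate minus the true value."""
    return sum(estimates) / len(estimates) - beta_true


print(lrt_pvalue_1df(-1000.0, -998.0))
print(empirical_type1_error([0.01, 0.20, 0.04, 0.50]))
print(wald_coverage([0.9, 1.1, 2.0], [0.1, 0.1, 0.1], beta_true=1.0))
```

With, say, 1,000 replicates, the Monte Carlo standard error of an empirical type I error near 0.05 is about sqrt(0.05 × 0.95 / 1000) ≈ 0.007, which gives a yardstick for judging whether an observed rate is "close to" the nominal level.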