Statistical models for removing microarray batch effects and analyzing genome tiling microarrays

2007

Abstract (summary)

This work presents novel statistical methods for preprocessing and downstream analysis of microarray data. One topic is a method for adjusting microarray data for non-biological variation, or batch effects, which are commonly observed across multiple batches of microarray experiments. The ability to combine microarray data sets lets researchers increase statistical power in studies where logistical considerations restrict sample size or require the sequential hybridization of arrays. Parametric and nonparametric empirical Bayes frameworks that are robust to outliers in small sample sizes are presented for adjusting data for batch effects. The method is illustrated on example data sets, which show that it is justifiable and useful in practice.

The other focus of this work is the development of methods for preprocessing and analyzing data from one- and two-color genome tiling microarrays. Commercial tiling array platforms have been developed that tile the non-repetitive genomes of many organisms. These tiling array experiments produce massive, correlated data sets that are full of experimental artifacts, presenting researchers with many challenges that require innovative analysis methods and efficient computational algorithms. This work presents a two-step model-based approach for analyzing tiling microarray data from one- and two-color platforms. In the first step, the data are preprocessed using a method for single-array normalization and background adjustment, called standardization, that uses probe sequence to remove a large portion of the variation in the data attributable to sample or probe bias. The second step, the localization of active transcripts or protein binding regions, is accomplished using either moving-window scan statistics or a doubly stochastic latent variable Bayesian analysis method. The latter uses a continuous-time Hidden Markov Model that accounts for genomic distance between probes and is robust to cross-hybridized and non-responsive probes. These methods are illustrated on simulated and real data examples, showing that they are very powerful and can be used on a single sample without control experiments, thus defraying some of the tremendous overhead cost of conducting experiments on tiling arrays.

Indexing (details)

Classification: 0308: Biostatistics; 0715: Bioinformatics
Identifier / keyword: Biological sciences; Batch effects; Tiling microarrays
Title: Statistical models for removing microarray batch effects and analyzing genome tiling microarrays
Author: Johnson, William Evan
Source: DAI-B 68/05, Dissertation Abstracts International
Place of publication: Ann Arbor
Country of publication: United States
Advisor: Liu, Jun S.; Liu, X. Shirley
University/institution: Harvard University
University location: United States -- Massachusetts
Source type: Dissertations & Theses