
Machine learning approaches to understanding the genetic basis of complex traits




Abstract (summary)

Humans differ in many observable qualities, termed 'phenotypes', ranging from appearance to disease susceptibility. Many phenotypes are largely determined by each individual's specific 'genotype', stored in the 3.2 billion bases of his or her genome sequence. Deciphering the genome sequence by identifying which sequence variations affect a given phenotype would have a great impact on human life. The recent advent of high-throughput genotyping methods has made it possible to obtain an individual's sequence information on a genome-wide scale. Classical approaches have focused on finding, in the genotype and phenotype data, a significant correlation between a sequence variation S and a particular phenotype P. However, it is difficult to infer such causal relationships between S and P directly from limited data, because of (1) the complexity of the cellular mechanisms through which S causes P, and (2) environmental factors that are not necessarily measurable.
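As an illustration of the classical single-variant approach described above, the following is a minimal sketch, not the method developed in this dissertation. It assumes a single biallelic variant S coded as allele dosage (0, 1 or 2 copies) and a quantitative phenotype P, and it uses a generic permutation test on the Pearson correlation. The function names `pearson_r` and `association_test` are hypothetical. Real genome-wide studies test many variants at once and must also correct for multiple testing.

```python
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / (sxx * syy) ** 0.5


def association_test(genotype, phenotype, n_perm=10_000, seed=0):
    """Hypothetical single-variant test: correlate allele dosage (0/1/2)
    with a quantitative phenotype and estimate a two-sided p-value by
    shuffling the phenotype labels (which breaks any S-P association)."""
    observed = pearson_r(genotype, phenotype)
    rng = random.Random(seed)  # fixed seed for reproducibility
    shuffled = list(phenotype)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(genotype, shuffled)) >= abs(observed):
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (extreme + 1) / (n_perm + 1)
    return observed, p_value


if __name__ == "__main__":
    # Toy, made-up data for ten individuals (not from the dissertation).
    genotype = [0, 0, 1, 1, 2, 2, 0, 1, 2, 1]
    phenotype = [1.0, 1.2, 2.1, 1.9, 3.2, 2.8, 0.9, 2.0, 3.1, 2.2]
    r, p = association_test(genotype, phenotype)
    print(f"r = {r:.3f}, permutation p = {p:.4f}")
```

A small p-value here only indicates that S and P are correlated. It says nothing about the cellular mechanism linking them or about unmeasured environmental factors, which is exactly the limitation the abstract points out.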

In this dissertation, we present machine learning approaches that address these challenges by explicitly modeling an intermediate process between the genotype and phenotype. More specifically, we model the genetic regulatory mechanisms that are induced by sequence variations and that lead to the phenotype, and we learn the model from genome-wide mRNA expression measurements. Using the learned model, we aim to generate a finer-grained hypothesis such as: a sequence variation S induces regulatory interactions R, which lead to changes in the phenotype P.

To achieve this goal, our approach uses sophisticated machine learning techniques that can robustly select relevant biological interactions from among a large number of possible interactions and can efficiently solve the resulting optimization problems on large amounts of data. For example, our 'meta-prior algorithm' can learn the regulatory potential of each sequence variation from its intrinsic characteristics, and this improvement helps to identify a true causal sequence variation among the many variations in the same chromosomal region. Our approaches have led to novel insights into sequence variations, and some of the resulting hypotheses have been validated through biological experiments. Some of the machine learning techniques developed for these biological problems are generally applicable to a wide range of other applications, such as collaborative filtering and natural language processing.

Indexing (details)


Subject
Bioinformatics;
Artificial intelligence;
Computer science
Classification
0715: Bioinformatics
0800: Artificial intelligence
0984: Computer science
Identifier / keyword
Applied sciences; Biological sciences; Complex traits; Computational biology; Gene regulation; Machine learning; Sequence variation
Title
Machine learning approaches to understanding the genetic basis of complex traits
Author
Lee, Su-In
Number of pages
191
Publication year
2009
Degree date
2009
School code
0212
Source
DAI-B 70/01, Dissertation Abstracts International
Place of publication
Ann Arbor
Country of publication
United States
ISBN
9780549989769
University/institution
Stanford University
University location
United States -- California
Degree
Ph.D.
Source type
Dissertations & Theses
Language
English
Document type
Dissertation/Thesis
Dissertation/thesis number
3343653
ProQuest document ID
305015565
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Document URL
http://search.proquest.com/docview/305015565