
Finding conserved patterns in biological sequences, networks and genomes


2007


Abstract (summary)

Biological patterns are widely used for identifying biologically interesting regions within macromolecules, classifying biological objects, predicting functions and studying evolution. Good pattern finding algorithms will help biologists to formulate and validate hypotheses in an attempt to obtain important insights into the complex mechanisms of living things.

In this dissertation, we develop and improve algorithms for five biological pattern finding problems. For the multiple sequence alignment problem, we propose an alternative formulation in which the final alignment is obtained by preserving the pairwise alignments specified by the edges of a given tree. In contrast with traditional NP-hard formulations, our preserving alignment formulation can be solved in polynomial time without using a heuristic, while achieving very good accuracy.
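
As a rough illustration of how the pairwise alignments on tree edges can be preserved in a single multiple alignment, the Python sketch below merges one child sequence into an existing alignment, given the pairwise alignment for the tree edge (parent, child). The function name `merge_child`, the dict-of-strings representation, and the tie-breaking rule (existing gap columns are emitted before new insertion columns) are assumptions made for illustration; they are not taken from the dissertation, whose formulation presumably chooses among such merges more carefully.

```python
def merge_child(msa, parent, child, p_aln, c_aln):
    """Add `child` to `msa` so that the pairwise alignment (p_aln, c_aln)
    between parent and child is preserved.

    msa   : dict name -> gapped string, all of equal length ('-' is a gap)
    p_aln : gapped parent row of the pairwise alignment
    c_aln : gapped child row of the pairwise alignment
    Assumes p_aln and msa[parent] spell the same ungapped sequence.
    """
    rows = {name: [] for name in msa}
    rows[child] = []
    i, j = 0, 0                      # i: column in msa, j: column in pair
    n, m = len(msa[parent]), len(p_aln)
    while i < n or j < m:
        if i < n and msa[parent][i] == '-':
            # Existing column where the parent has a gap: keep it as is.
            for name in msa:
                rows[name].append(msa[name][i])
            rows[child].append('-')
            i += 1
        elif j < m and p_aln[j] == '-':
            # Child residue with no parent partner: open a new column.
            for name in msa:
                rows[name].append('-')
            rows[child].append(c_aln[j])
            j += 1
        else:
            # Same parent residue in both alignments: align the child here.
            for name in msa:
                rows[name].append(msa[name][i])
            rows[child].append(c_aln[j])
            i += 1
            j += 1
    return {name: ''.join(chars) for name, chars in rows.items()}


def project(msa, a, b):
    """Pairwise alignment of rows a and b induced by msa (drop gap-gap columns)."""
    cols = [(x, y) for x, y in zip(msa[a], msa[b]) if (x, y) != ('-', '-')]
    return ''.join(x for x, _ in cols), ''.join(y for _, y in cols)
```

Calling `merge_child` once per tree edge, visiting each edge after its parent has been placed, takes time proportional to the alignment length per sequence. Because existing columns are never altered, every previously merged edge alignment can still be recovered with `project`.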

For the path matching problem, we take advantage of the linearity of the query path to reduce the problem to finding a longest weighted path in a directed acyclic graph. This allows us to find the k highest-scoring paths in a network that match the query path in polynomial time. Because many biological pathways are not linear, our graph matching approach also allows a non-linear graph to be given as the query. Our graph matching formulation overcomes a common weakness of previous approaches, namely that they offer no guarantee on the quality of the results.
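
The reduction to a longest weighted path in a directed acyclic graph can be sketched as follows. In this simplified Python version, node (i, v) of the DAG means "query position i is matched to network vertex v", and there is an arc from (i-1, u) to (i, v) whenever u -> v is a network edge. The function name `best_path_match`, the `sim` score table, and the simplifications (no gaps or mismatches, only the single best path rather than the top k, and no check that a vertex is used only once) are assumptions for illustration and are not the dissertation's algorithm.

```python
def best_path_match(query, network, sim):
    """Longest weighted path in a layered DAG by dynamic programming.

    query   : list of query node labels, in path order
    network : dict vertex -> set of successor vertices
    sim     : dict (query label, vertex) -> similarity score; missing
              pairs are treated as not matchable (score -inf)
    Returns (score, list of matched network vertices).
    """
    NEG = float("-inf")
    vertices = list(network)
    # best[i][v] = (best score of a path ending at (i, v), predecessor vertex)
    best = [{v: (sim.get((query[0], v), NEG), None) for v in vertices}]
    for i in range(1, len(query)):
        layer = {}
        for v in vertices:
            s = sim.get((query[i], v), NEG)
            # Predecessors of (i, v) are (i-1, u) for each network edge u -> v.
            cands = [(best[i - 1][u][0], u) for u in vertices if v in network[u]]
            prev_score, prev = max(cands, default=(NEG, None))
            layer[v] = (prev_score + s, prev)
        best.append(layer)
    # Pick the best end node in the last layer and trace the path back.
    end = max(vertices, key=lambda v: best[-1][v][0])
    score = best[-1][end][0]
    path = [end]
    for i in range(len(query) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    path.reverse()
    return score, path
```

The layers are processed in order of query position, which is a topological order of the DAG, so each node is finalized before any of its successors is examined. A returned score of negative infinity means that no path in the network matches the whole query.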

For the gene cluster finding problem, we investigate a formulation based on constraining the overall size of a cluster and develop statistical significance estimates that allow direct comparisons of clusters of different sizes. We explore both a restricted version, which requires that orthologous genes be strictly ordered within each cluster, and the unrestricted problem, which allows paralogous genes within a genome and clusters that do not appear in every genome. We solve the restricted problem in polynomial time and develop practical exact algorithms for the unrestricted one.
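
To make the idea of constraining the overall size of a cluster concrete, the Python sketch below finds, within a single genome, the shortest contiguous window that contains at least one gene from each family of interest. A caller can then accept the window only if its length is within a chosen size bound. The function name `smallest_window`, the list-of-family-labels genome model, and the restriction to one genome are assumptions for illustration; the sketch does not reproduce the dissertation's multi-genome formulation or its significance estimates.

```python
from collections import Counter


def smallest_window(genome, families):
    """Shortest window of `genome` covering every family in `families`.

    genome   : list of gene family labels in chromosomal order
               (paralogs appear as repeated labels)
    families : iterable of family labels that must all be present
    Returns inclusive (start, end) indices, or None if no window exists.
    """
    need = set(families)
    if not need:
        return None
    count = Counter()   # occurrences of each needed family in the window
    have = 0            # number of needed families currently covered
    best = None
    left = 0
    for right, gene in enumerate(genome):
        if gene in need:
            count[gene] += 1
            if count[gene] == 1:
                have += 1
        # Shrink from the left while the window still covers every family.
        while have == len(need):
            if best is None or right - left < best[1] - best[0]:
                best = (left, right)
            dropped = genome[left]
            if dropped in need:
                count[dropped] -= 1
                if count[dropped] == 0:
                    have -= 1
            left += 1
    return best
```

Each gene enters and leaves the window at most once, so the scan is linear in the genome length. Intervening genes from other families are tolerated, and they count toward the window size.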

For the gene cluster querying problem, we propose an efficient query-based approach for investigating, for a given gene cluster, the clustering of related genes across multiple genomes. By analyzing gene clustering in 400 bacterial genomes, we show that our algorithm is efficient enough to study gene clusters across hundreds of genomes.

Indexing (details)


Subject
Bioinformatics;
Computer science
Classification
0715: Bioinformatics
0984: Computer science
Identifier / keyword
Applied sciences; Biological sciences; Biological networks; Biological patterns; Gene clusters; Graph matching; Multiple sequence alignment; Path matching
Title
Finding conserved patterns in biological sequences, networks and genomes
Author
Yang, Qingwu
Number of pages
124
Publication year
2007
Degree date
2007
School code
0803
Source
DAI-B 69/01, Dissertation Abstracts International
Place of publication
Ann Arbor
Country of publication
United States
ISBN
9780549424123
Advisor
Sze, Sing-Hoi
University/institution
Texas A&M University
University location
United States -- Texas
Degree
Ph.D.
Source type
Dissertations & Theses
Language
English
Document type
Dissertation/Thesis
Dissertation/thesis number
3296593
ProQuest document ID
304731265
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Document URL
http://search.proquest.com/docview/304731265