The statistical and molecular logic of gene expression patterns in <i>Caenorhabditis elegans</i>
Gene regulation relies on transcriptional control systems whose molecular logic we seek to understand. Genome-scale sequence and expression data increasingly make it possible to use genomic patterns in sequences and gene expression levels to reveal the logic of transcriptional regulation. In this dissertation, two approaches to understanding transcriptional regulation are developed and applied. First, we describe a novel method for identifying phylogenetic conservation in genomic transcriptional patterns. We use this method to identify gene expression programs in aging, development, and mRNA degradation that are shared by organisms as diverse as the nematode Caenorhabditis elegans, the fruit fly Drosophila melanogaster, the yeast Saccharomyces cerevisiae, and humans (Homo sapiens). We also use it to search databases of gene expression patterns and identify relationships among the physiological programs of diverse organisms. Second, we use a statistical approach, probabilistic segmentation, to identify candidate transcriptional control sequences in the promoters of a large gene family, the chemosensory receptor genes of C. elegans. We identify many new candidate transcriptional control sequences and show that one of these is a novel E-box motif that confers expression in the ADL chemosensory neurons.