In silico bacterial gene regulatory network reconstruction from sequence
DNA sequencing techniques have evolved to the point where one can sequence millions of bases per minute, while our capacity to use this information has lagged behind. One particularly notorious example is the study of gene regulatory networks. A molecular study of gene regulation proceeds one protein at a time, requiring months of bench work to purify transcription factors and perform DNA footprinting studies. High-throughput options such as ChIP-Seq and microarrays are a step up, but still require considerable manpower and materials. While computational biologists have developed methods to predict protein function, gene locations, and even metabolic networks from sequence, regulatory network reconstruction from sequence remains virtually untouched. Part of the reason is that the components of a regulatory interaction, such as transcription factors (TFs) and binding sites, are difficult to detect. The other, more prominent reason is that there exists no "recognition code" to determine which transcription factors will bind which sites. I have created a pipeline to reconstruct regulatory networks starting from an unannotated complete genomic sequence of a prokaryotic organism. The pipeline predicts the necessary information, such as gene locations and transcription factor sequences, using custom tools and third-party software. The core step is to estimate the likelihood of interaction between a TF and a binding site using a black-box recognition code developed by applying machine learning methods to databases of prokaryotic regulatory interactions. I show how one can use this pipeline to reconstruct the virtually unknown regulatory network of Bacillus anthracis.
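The core step described above, scoring every candidate TF and binding-site pair and keeping the likely interactions, can be sketched roughly as follows. This is only an illustration under stated assumptions, not the pipeline's actual code: the function name `build_network`, the `score` callable standing in for the learned recognition code, and the 0.5 threshold are all hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

# An edge is (TF name, binding-site name, estimated interaction likelihood).
Edge = Tuple[str, str, float]


def build_network(
    tfs: Dict[str, str],
    sites: Dict[str, str],
    score: Callable[[str, str], float],
    threshold: float = 0.5,  # hypothetical cutoff, not taken from the source
) -> List[Edge]:
    """Score every TF/site pair and keep those at or above the threshold.

    `tfs` maps TF names to protein sequences and `sites` maps site names to
    DNA sequences. `score` is a placeholder for the black-box recognition
    code: any callable returning a likelihood in [0, 1].
    """
    edges: List[Edge] = []
    for (tf_name, tf_seq), (site_name, site_seq) in product(tfs.items(), sites.items()):
        likelihood = score(tf_seq, site_seq)
        if likelihood >= threshold:
            edges.append((tf_name, site_name, likelihood))
    # Report the most confident predicted interactions first.
    edges.sort(key=lambda edge: edge[2], reverse=True)
    return edges
```

Keeping the scorer as a plain callable means that any trained model could be substituted without changing how the network is assembled; the exhaustive pairing is quadratic in the number of TFs and sites, which is usually tolerable at bacterial genome scale.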
0541: Biomedical engineering