Shotgun optical mapping: A comprehensive statistical and computational analysis
Shotgun Optical Mapping is a whole-genome, high-throughput restriction mapping technology in which restriction maps of single DNA molecules are collected using high-magnification digital microscopy. Optical Mapping has a wide spectrum of genomic applications and is therefore an important subject for analysis. This thesis concerns the statistical and computational aspects of Optical Mapping data. Specifically, we address optical map alignment, whole-genome de novo assembly of optical maps, and the application of optical maps to the analysis of genomic differences.
We start with statistical modelling of Optical Mapping measurements and validate that our models fit real data accurately. The measurement distributions are then used to derive a probabilistic alignment score, which we use to calculate optical-to-optical and optical-to-reference map alignments. The advantage of our approach is that it guarantees maximal discrimination between spurious and true alignments and does not require an ad hoc choice of scoring parameters.
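To illustrate the general idea of a probabilistic alignment score, the following is a minimal sketch of a log-likelihood ratio computed over matched fragment pairs. It is not the scoring function derived in the thesis. The Gaussian sizing-error model with variance proportional to fragment length, the exponential null model for fragment lengths, and the parameter names and values `sigma2` and `mean_fragment_kb` are all assumptions made for illustration.

```python
import math


def fragment_llr(observed_kb, reference_kb, sigma2=0.3, mean_fragment_kb=20.0):
    """Log-likelihood ratio for one matched fragment pair (illustrative only).

    Assumed match model: observed ~ Normal(reference, sigma2 * reference).
    Assumed null model:  observed ~ Exponential(mean = mean_fragment_kb).
    Both models and both default parameter values are hypothetical.
    """
    var = sigma2 * reference_kb
    log_match = (-0.5 * math.log(2.0 * math.pi * var)
                 - (observed_kb - reference_kb) ** 2 / (2.0 * var))
    log_random = -math.log(mean_fragment_kb) - observed_kb / mean_fragment_kb
    return log_match - log_random


def alignment_score(observed, reference, **params):
    """Sum the per-fragment log-likelihood ratios of a gap-free alignment.

    Missing cuts, false cuts and end effects are ignored in this sketch.
    """
    if len(observed) != len(reference):
        raise ValueError("fragment lists must have equal length")
    return sum(fragment_llr(o, r, **params) for o, r in zip(observed, reference))


if __name__ == "__main__":
    optical = [10.2, 25.1, 14.8]
    print(alignment_score(optical, [10.0, 25.0, 15.0]))  # plausible alignment
    print(alignment_score(optical, [30.0, 5.0, 40.0]))   # spurious alignment
```

Under these assumptions a positive total favours a true alignment and a negative total favours a spurious one, so the decision threshold follows from the measurement models rather than from hand-tuned match and mismatch weights.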
We then present an efficient method for the whole-genome assembly of optical maps that produces accurate restriction maps of the relevant genomes in feasible time. Our assembly method follows the Overlap-Layout-Consensus approach, which has been demonstrated to be very effective in sequence assembly problems. We also present a dedicated error correction method that we use to eliminate spurious overlaps and chimeric maps. Applying our assembler to several optical map datasets demonstrates that it is capable of handling mammalian-sized genomes and yields accurate restriction maps, provided genomic coverage is sufficient.
We also demonstrate how Optical Mapping data can be used to identify certain classes of differences between genomes, specifically insertions and deletions exceeding 5000 base pairs and restriction fragment length polymorphisms. We develop a statistical framework for the analysis of these differences based on a hypothesis testing approach and show how the differences can be assessed statistically.
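As a simple illustration of assessing a fragment length difference by hypothesis testing, the sketch below applies a two-sided z-test to repeated optical measurements of one restriction fragment. It is not the framework developed in the thesis. The Gaussian sizing-error model, the variance parameter `sigma2`, and the function name `fragment_length_z_test` are assumptions made for illustration.

```python
import math


def fragment_length_z_test(measurements_kb, reference_kb, sigma2=0.3):
    """Two-sided z-test of H0: true fragment length equals the reference.

    Assumes each measurement is Normal(reference, sigma2 * reference) under
    H0; this error model and the default sigma2 are hypothetical.
    Returns the z statistic and the two-sided p-value.
    """
    n = len(measurements_kb)
    if n == 0:
        raise ValueError("at least one measurement is required")
    mean = sum(measurements_kb) / n
    standard_error = math.sqrt(sigma2 * reference_kb / n)
    z = (mean - reference_kb) / standard_error
    # Two-sided normal tail probability: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


if __name__ == "__main__":
    # Six optical measurements of a fragment whose reference length is 20 kb.
    z, p = fragment_length_z_test([26.1, 24.7, 25.9, 26.4, 25.3, 26.0], 20.0)
    print(f"z = {z:.2f}, p = {p:.3g}")
```

A small p-value indicates that the measured lengths are unlikely under the reference length, which is consistent with an insertion, a deletion, or a restriction fragment length polymorphism at that locus. In practice, multiple-testing correction across all fragments would also be needed.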