HAPPY mapping – a tool for genome finishing
Alan T. Bankier, Helen F. Spriggs, Bernard A. Konfortov, Justin A. Pachebat
and Paul H. Dear
The Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH
ABSTRACT
The majority of small genome sequences, including
those of most parasites, are assembled using a shotgun strategy. However,
shotgun methods alone are scarcely ever capable of producing a complete and
finished genome sequence: cloning biases, sequencing problems and repetitive
regions leave many gaps and potential mis-assemblies. For these reasons, a
genome that has only been shotgunned will be left as a large number of
unconnected contigs.
A collection of contigs is sufficient for finding many of the genes in the
organism, but is inadequate for other purposes. No reliable estimate of the
degree of completeness of the sequence can be made, and many genes lying at (or
beyond) contig ends will have been missed or mis-predicted. It becomes
impossible to comprehensively catalogue the organism's genes, or to infer the
absence of a specific orthologue simply from its absence in the contig set. Nor
can any conclusions about genome organisation or long-range synteny be drawn. In
short, a shotgun project tends to produce a pile of unbound pages rather than a
genomic atlas.
Genome finishing (ordering and joining contigs to produce a complete sequence)
is therefore one of the most challenging but important aspects of sequencing. It
is usually frustrating because the resources used for finishing – additional
sequencing templates and larger-insert clones – are essentially similar to
those used in the shotgun, and tend to have the same limitations.
HAPPY mapping is a rapid in vitro technique that can be used to make extremely
accurate maps, which allow shotgun contigs to be located precisely in the genome.
Once this has been done, it becomes far simpler to close the remaining sequence
gaps. Even where gaps cannot be closed (for example, because a short region is
unsequenceable), the map provides a framework within which the sizes and
locations of the gaps are known.
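As an illustration of how a map fixes the sizes and locations of gaps, the following is a minimal sketch, not the authors' software. It assumes each contig has already been assigned a start coordinate on the map; the names PlacedContig, map_start and find_gaps, and the coordinates used, are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass
class PlacedContig:
    name: str
    map_start: int  # assumed map coordinate (bp) of the contig's first base
    length: int     # contig length in bp


def find_gaps(contigs):
    """Return (left_name, right_name, gap_start, gap_size) for each gap
    between neighbouring contigs once they are ordered along the map."""
    ordered = sorted(contigs, key=lambda c: c.map_start)
    gaps = []
    for left, right in zip(ordered, ordered[1:]):
        left_end = left.map_start + left.length
        gap_size = right.map_start - left_end
        if gap_size > 0:  # abutting or overlapping contigs leave no gap
            gaps.append((left.name, right.name, left_end, gap_size))
    return gaps


# Made-up placements, for illustration only.
contigs = [
    PlacedContig("contig_B", 52_000, 30_000),
    PlacedContig("contig_A", 0, 40_000),
    PlacedContig("contig_C", 82_000, 15_000),
]

for left, right, start, size in find_gaps(contigs):
    print(f"gap of {size} bp between {left} and {right}, starting at {start}")
```

With these invented placements, the only gap reported lies between contig_A and contig_B; contig_B and contig_C abut exactly, so no gap is listed between them.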
HAPPY mapping works by the direct analysis of genomic DNA, and does not involve
cloning or the creation of any prior resources. Its freedom from 'biological'
steps means that it is applicable to any genome, and is not adversely influenced
by 'difficult' sequences, repetitive regions or other peculiarities of the
genome. The cost and time required to make an accurate HAPPY map of a genome are
normally a fraction of those expended on shotgun sequencing, and the map enables
rapid completion of the genome sequence.
The presentation will give details of HAPPY mapping, and of how it has been
applied successfully to a range of genomes including Dictyostelium,
Cryptosporidium, Eimeria and others.