Contributed Papers: Oral Presentations
Molecular biology and Biochemistry
THE SATELLITE DNA OF Eimeria tenella: A QUANTITATIVE AND QUALITATIVE STUDY
T.J.P. Sobreira1,*, A. Gruber1,**, A.M. Durham2 & The Eimeria tenella Genome Consortium3
1Faculty of Veterinary Medicine and Zootechny, 2Institute of Mathematics and Statistics, University of São Paulo, Brazil;
3A full list of authors is available at the web address
http://www.lbm.fmvz.usp.br/Eimeria/consortium/members.html
*CNPq fellow; **argruber@usp.br
The Eimeria tenella genome has a complexity of 58 Mb, distributed across 14
chromosomes ranging from 1 to more than 7 Mb. One
of the most interesting features of this genome is
the very high tandem repeat content, with the triplet
(GCA) and heptamer (TTTAGGG) repeats constituting
the predominant repetitive units. Genome
sequencing started in 2002, and a set of circa 800,000
shotgun reads was generated at the Wellcome Trust
Sanger Institute, and made available on the internet
at the address ftp://ftp.sanger.ac.uk/pub/pathogens/Eimeria/tenella.
To characterize and quantify the whole
satellite content of the E. tenella genome, our group
in Brazil developed TRAP, the Tandem Repeat Analysis
Program (Sobreira, T.J.P.; Durham, A.M. & Gruber,
A., manuscript in preparation). TRAP is a companion
tool for Tandem Repeats Finder (TRF; Benson, 1999), a
widely used application for ab initio tandem repeat
finding. TRAP provides a unified set of analyses
for the selection, classification and quantification
of tandemly repeated sequences. The E. tenella genome
assembly file (version of May 24, 2005) was downloaded
from the Sanger Institute FTP site and processed with
TRF version 3.21. TRF output files were analyzed with
TRAP, selecting repeat loci with at least two repeat
units, a minimum repeat period of 2 bp and a maximum
period of 1,000 bp. The repetitive content of the
genome was calculated at different identity percentages
(id%), where id% is the overall percentage of matches
between adjacent repeat units. The whole-genome
satellite content varied from 1.9% at 100% identity
to 16.8% at 70% identity. Within the latter figure,
9.0% of the genome corresponded to microsatellite (repeat period
size of 2-6 bp), 7.5% to minisatellite (period size
of 7-100 bp) and 0.3% to satellite (period size longer
than 100 bp) sequences. The five most prevalent repeat
units were GCA, TTTAGGG, TAAA, GCTA and AAATT, with
the first two units together accounting for 8.8% of the
genome. A complete catalogue of the repeat
units and statistics will be publicly available on
the internet upon publication of the corresponding
paper. In the meantime, the authors can provide a
username and password upon request.
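
The selection, classification and quantification steps described above can be illustrated with a minimal Python sketch. It is not the TRAP implementation, which is unpublished. It assumes the standard 15-column record layout of TRF .dat output files (start, end, period size, copy number, consensus size, percent matches, percent indels, score, four base percentages, entropy, consensus, sequence), and it uses the TRF "percent matches" column as a stand-in for id%. All names (RepeatLocus, parse_trf_line, classify, select, content_by_class) are hypothetical. Overlapping loci are not merged here, so real data may need an extra step to avoid counting bases twice.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RepeatLocus:
    start: int              # 1-based start position on the sequence
    end: int                # 1-based end position (inclusive)
    period: int             # repeat period size in bp
    copies: float           # number of repeat units in the locus
    percent_matches: float  # TRF "percent matches" (assumed proxy for id%)
    consensus: str          # consensus repeat unit


def parse_trf_line(line):
    """Parse one TRF .dat record; return None for header or blank lines."""
    fields = line.split()
    if len(fields) < 15 or not fields[0].isdigit():
        return None
    return RepeatLocus(int(fields[0]), int(fields[1]), int(fields[2]),
                       float(fields[3]), float(fields[5]), fields[13])


def classify(period):
    """Class boundaries as given in the abstract (period size in bp)."""
    if period <= 6:
        return "microsatellite"   # 2-6 bp
    if period <= 100:
        return "minisatellite"    # 7-100 bp
    return "satellite"            # longer than 100 bp


def select(loci, min_identity, min_copies=2.0, min_period=2, max_period=1000):
    """Keep loci meeting the copy-number, period and identity criteria."""
    return [locus for locus in loci
            if locus.copies >= min_copies
            and min_period <= locus.period <= max_period
            and locus.percent_matches >= min_identity]


def content_by_class(loci, genome_size):
    """Fraction of the genome covered by each repeat class (no overlap merging)."""
    totals = defaultdict(int)
    for locus in loci:
        totals[classify(locus.period)] += locus.end - locus.start + 1
    return {name: bases / genome_size for name, bases in totals.items()}
```

Running select at several min_identity values and summing the fractions returned by content_by_class gives a total per threshold. The class fractions add up to that total, in the same way that the reported 9.0%, 7.5% and 0.3% add up to 16.8% at 70% identity.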