The IXth International Coccidiosis Conference

Contributed Papers: Oral Presentations
Molecular biology and Biochemistry

THE SATELLITE DNA OF Eimeria tenella: A QUANTITATIVE AND QUALITATIVE STUDY

T.J.P. Sobreira1,*, A. Gruber1,**, A.M. Durham2 & the Eimeria tenella Genome Consortium3
1Faculty of Veterinary Medicine and Zootechny and 2Institute of Mathematics and Statistics, University of São Paulo, Brazil;
3A full list of consortium members is available at http://www.lbm.fmvz.usp.br/Eimeria/consortium/members.html
*CNPq fellow; **argruber@usp.br

The Eimeria tenella genome has a complexity of 58 Mb, distributed among 14 chromosomes ranging in size from 1 Mb to more than 7 Mb. One of its most striking features is a very high tandem repeat content, in which the triplet (GCA) and heptamer (TTTAGGG) repeats are the predominant repeat units. Genome sequencing began in 2002, and a set of approximately 800,000 shotgun reads was generated at the Wellcome Trust Sanger Institute and made available at ftp://ftp.sanger.ac.uk/pub/pathogens/Eimeria/tenella. To characterize and quantify the complete satellite content of the E. tenella genome, our group in Brazil developed TRAP, the Tandem Repeat Analysis Program (Sobreira, T.J.P.; Durham, A.M. & Gruber, A., manuscript in preparation). TRAP is a companion tool for Tandem Repeats Finder (TRF; Benson, 1999), a widely used application for ab initio tandem repeat detection, and provides a unified set of analyses for the selection, classification and quantification of tandemly repeated sequences. The E. tenella genome assembly file (version of May 24, 2005) was downloaded from the Sanger Institute FTP site and processed with TRF version 3.21. The TRF output files were then analyzed with TRAP, selecting repeat loci with at least two repeat units and a repeat period between 2 and 1,000 bp. The repetitive content of the genome was calculated at different identity percentage (id%) cut-offs, where id% is the overall percentage of matches between adjacent repeat units. The satellite content of the whole genome ranged from 1.9% at 100% identity to 16.8% at 70% identity. Of this 16.8%, 9.0% of the genome corresponded to microsatellites (repeat period of 2-6 bp), 7.5% to minisatellites (7-100 bp) and 0.3% to satellites (longer than 100 bp). The five most prevalent repeat units were GCA, TTTAGGG, TAAA, GCTA and AAATT; the first two alone accounted for 8.8% of the genome.
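TRAP's own code is not described in this abstract, so the Python sketch below only illustrates the kind of selection, classification and quantification steps outlined above. It assumes TRF's tabular .dat layout: a `Sequence:` line per input sequence, followed by one line per locus whose columns are start, end, period size, copy number, consensus size, percent matches, percent indels, score, four base percentages, entropy, consensus unit and repeat sequence. The function names, the merging of overlapping loci (so no base is counted twice), and the rotation/reverse-complement normalization used to rank repeat units are hypothetical choices made for illustration, not TRAP's documented behavior.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class RepeatLocus:
    seq: str          # sequence (contig) the locus lies on
    start: int        # 1-based start, inclusive
    end: int          # 1-based end, inclusive
    period: int       # repeat period size (bp)
    copies: float     # number of copies aligned with the consensus
    identity: float   # id%: percent matches between adjacent copies
    consensus: str    # consensus repeat unit


def parse_trf_dat(lines: Iterable[str]) -> List[RepeatLocus]:
    """Parse TRF .dat-style tabular output (column order assumed, see above)."""
    loci, seq = [], None
    for line in lines:
        f = line.split()
        if f and f[0] == "Sequence:":
            seq = f[1] if len(f) > 1 else ""
        elif seq is not None and len(f) >= 14 and f[0].isdigit():
            loci.append(RepeatLocus(seq, int(f[0]), int(f[1]), int(f[2]),
                                    float(f[3]), float(f[5]), f[13]))
    return loci


def select(loci, min_identity, min_copies=2.0, min_period=2, max_period=1000):
    """Keep loci meeting the copy-number, period and id% thresholds."""
    return [loc for loc in loci
            if loc.copies >= min_copies
            and min_period <= loc.period <= max_period
            and loc.identity >= min_identity]


def period_class(period: int) -> str:
    """Size classes used in the abstract (period 2-6, 7-100, >100 bp)."""
    if period < 2:
        raise ValueError("periods below 2 bp should have been filtered out")
    if period <= 6:
        return "microsatellite"
    if period <= 100:
        return "minisatellite"
    return "satellite"


def covered_bp(loci: Iterable[RepeatLocus]) -> int:
    """Bases covered by the loci, merging overlapping intervals per sequence."""
    by_seq: Dict[str, List[Tuple[int, int]]] = {}
    for loc in loci:
        by_seq.setdefault(loc.seq, []).append((loc.start, loc.end))
    total = 0
    for intervals in by_seq.values():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for s, e in intervals[1:]:
            if s <= cur_end:              # overlap: extend the current run
                cur_end = max(cur_end, e)
            else:                         # gap: close the current run
                total += cur_end - cur_start + 1
                cur_start, cur_end = s, e
        total += cur_end - cur_start + 1
    return total


def satellite_content(loci, genome_length, min_identity) -> Dict[str, float]:
    """Percent of the genome covered by selected loci, overall and per class."""
    kept = select(loci, min_identity)
    report = {"total": 100.0 * covered_bp(kept) / genome_length}
    for cls in ("microsatellite", "minisatellite", "satellite"):
        members = [loc for loc in kept if period_class(loc.period) == cls]
        report[cls] = 100.0 * covered_bp(members) / genome_length
    return report


def canonical_unit(unit: str) -> str:
    """Smallest rotation of the unit or of its reverse complement."""
    u = unit.upper()
    rc = u.translate(str.maketrans("ACGT", "TGCA"))[::-1]
    return min(x[i:] + x[:i] for x in (u, rc) for i in range(len(x)))


def top_units(loci, n=5):
    """Rank canonical units by summed locus length (overlaps not merged)."""
    bp = Counter()
    for loc in loci:
        bp[canonical_unit(loc.consensus)] += loc.end - loc.start + 1
    return bp.most_common(n)
```

Calling `satellite_content(parse_trf_dat(open(path)), genome_length, min_identity)` once per cut-off (for example 100 and 70) gives a table of the same shape as the one reported above, where `path` and `genome_length` (the total assembly length) are supplied by the user. Because each class is merged only within itself, the per-class percentages can sum to more than the total when loci of different classes overlap; a real tool might instead assign each base to a single class. Under the assumed normalization, GCA, CAG, AGC and their reverse complements TGC, GCT and CTG are all counted under the single label AGC.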
A complete catalogue of the repeat units and their statistics will be made publicly available online upon publication of the corresponding paper. In the meantime, the authors can provide a username and password upon request.