SAGE Suite

A complete suite for SAGE data manipulation
SAGE Suite is a free suite of Perl scripts for SAGE data analysis and in silico data generation.

Institutions

USP - Universidade de São Paulo (University of São Paulo)

ICB-USP - Instituto de Ciências Biomédicas, Universidade de São Paulo (Institute of Biomedical Sciences, University of São Paulo)

IME-USP - Instituto de Matemática e Estatística, Universidade de São Paulo (Institute of Mathematics and Statistics, University of São Paulo)

The Project

SAGE Suite is a suite of Perl scripts developed for the manipulation of SAGE data. It is composed of two main modules: SAGE GenSuite and SAGE Analysis. SAGE Analysis comprises three scripts for complete SAGE tag extraction and counting. SAGE GenSuite is a module for in silico generation of SAGE data: it models the different steps involved in SAGE library synthesis and creates different populations of tags according to user-defined options.

Both modules have a graphical interface built with the Perl/Tk module. Through this interface, the user can follow the results of the extraction and generation steps during execution and adjust them as the run progresses.

The SAGE Suite project is developed jointly by research groups at the Institute of Biomedical Sciences and the Institute of Mathematics and Statistics. Its main goal is to provide the scientific community with a tool for generating virtual SAGE libraries for modeling and validation purposes, and a set of programs for reliable tag extraction and counting. A third module, for statistical analysis, is under development.

The SAGE Analysis

This module performs SAGE tag extraction and counting. It can be used with both conventional SAGE and LongSAGE data, and offers a set of parameters that can be adjusted to the user's requirements.
The module is composed of two independent components, plus an integration program that runs a complete analysis of the SAGE data and generates a full report. The programs are briefly described below:

  • extract_ditags.pl - Takes as input a multi-sequence FASTA file containing the concatamers and extracts the ditags within a user-chosen size range, according to the SAGE protocol employed (conventional or LongSAGE).
  • extract_tags.pl - Extracts the tags from the ditag file produced by extract_ditags.pl, according to a set of user-defined parameters.
  • sage_analysis.pl - Runs and integrates both scripts described above and generates a set of reports listing all processing steps and tag counts.
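The extraction and counting steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the SAGE Suite Perl code: it assumes that ditags in each concatamer are separated by the standard NlaIII anchoring site (CATG), that the two end fragments of every sequence are discarded, and that the second tag of each ditag is the reverse complement of its last bases (tags are ligated tail to tail). The function names, the demo sequence and the 20 to 26 bp size window are hypothetical; the tag length is 10 bp for conventional SAGE and 17 bp for LongSAGE, anchor site excluded.

```python
from collections import Counter

# Assumptions (not taken from SAGE Suite itself): NlaIII anchoring site CATG,
# concatamer ends not flanked by two anchor sites are discarded, and the
# ditag size range and tag length are supplied by the caller.
ANCHOR = "CATG"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def extract_ditags(concatamer, min_len, max_len):
    """Split at anchor sites; keep inner fragments whose length is within [min_len, max_len]."""
    fragments = concatamer.split(ANCHOR)
    # The first and last fragments are bounded by only one anchor site, so they are dropped.
    return [f for f in fragments[1:-1] if min_len <= len(f) <= max_len]


def extract_tags(ditag, tag_len):
    """Return both tags of a ditag (anchor site omitted)."""
    left = ditag[:tag_len]
    # The second tag was ligated tail to tail, so it is read from the other strand.
    right = reverse_complement(ditag[-tag_len:])
    return left, right


def count_tags(fasta_lines, min_len, max_len, tag_len):
    """Count every tag from every in-range ditag of every FASTA record."""
    counts = Counter()
    for _header, seq in read_fasta(fasta_lines):
        for ditag in extract_ditags(seq, min_len, max_len):
            counts.update(extract_tags(ditag, tag_len))
    return counts


# Illustrative input: two 22-bp ditags plus a 4-bp fragment outside the size range.
DEMO_SEQ = ("TTTCATG" "ACGTACGTACGGTGGGCCCAAA" "CATG"
            "ACGTACGTACTTGAAATTTCCC" "CATGACGTCATGGGG")
DEMO_FASTA = [">clone_1", DEMO_SEQ[:30], DEMO_SEQ[30:]]

if __name__ == "__main__":
    # 10-bp tags (conventional SAGE); LongSAGE would use tag_len=17.
    print(count_tags(DEMO_FASTA, min_len=20, max_len=26, tag_len=10))
```

Standard SAGE analyses also commonly discard ditags that occur more than once, since repeated ditags are likely PCR artifacts; this sketch counts every ditag and would need an extra deduplication step for that.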


The SAGE GenSuite

This module consists of four independent scripts that model the steps of SAGE library synthesis.
The output of each program can be used as the input of the next one, allowing full data interchange and task integration. The scripts are described below:

  • tag_count_generator.pl - Generates a list of random tags with counts in an easily parseable format.
  • trim_tags.pl - Prunes the tags, either cutting their ends randomly or making all tags the same size.
  • counts2ditags.pl - Given a list of tag counts in [tag=count] format, generates a file containing a list of ditags with different tag-to-tag combinations.
  • ditags2concatamers.pl - Given a file containing a list of ditags, generates a multi-sequence FASTA file of concatamer sequences.
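A toy version of the four-step GenSuite chain can be sketched in Python. It is only an illustration of the idea, not the behavior or options of the Perl scripts: it assumes tags are written one per line as `TAG=COUNT` (surrounding brackets tolerated), that random trimming removes bases from the 3' end, that ditags are formed by pairing shuffled tag copies tail to tail (second tag reverse-complemented), and that concatamers join a random number of ditags with CATG anchor sites. All function names and parameters are hypothetical stand-ins.

```python
import random

# Assumptions (not taken from SAGE Suite itself): NlaIII anchoring site CATG,
# "TAG=COUNT" lines for the [tag=count] format, trimming from the 3' end,
# and uniform random choices at every step.
ANCHOR = "CATG"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def generate_tag_counts(n_tags, tag_len, max_count, rng):
    """Step 1 (cf. tag_count_generator.pl): distinct random tags with random counts."""
    counts = {}
    while len(counts) < n_tags:
        tag = "".join(rng.choice("ACGT") for _ in range(tag_len))
        if ANCHOR not in tag:  # an internal anchor site would split the tag on extraction
            counts[tag] = rng.randint(1, max_count)
    return counts


def format_counts(counts):
    return "\n".join(f"{tag}={n}" for tag, n in counts.items())


def parse_counts(text):
    """Read 'TAG=COUNT' lines; surrounding brackets, if present, are ignored."""
    counts = {}
    for line in text.splitlines():
        line = line.strip().strip("[]")
        if line:
            tag, n = line.split("=")
            counts[tag] = counts.get(tag, 0) + int(n)
    return counts


def trim_tags(counts, rng, max_trim=0, fixed_len=None):
    """Step 2 (cf. trim_tags.pl): drop 0..max_trim trailing bases, or cut every tag to fixed_len."""
    trimmed = {}
    for tag, n in counts.items():
        if fixed_len is not None:
            keep = fixed_len
        else:
            keep = max(1, len(tag) - rng.randint(0, max_trim))
        new = tag[:keep]
        trimmed[new] = trimmed.get(new, 0) + n  # tags that become identical are merged
    return trimmed


def counts_to_ditags(counts, rng):
    """Step 3 (cf. counts2ditags.pl): pair shuffled tag copies tail to tail."""
    pool = [tag for tag, n in counts.items() for _ in range(n)]
    rng.shuffle(pool)
    # The partner tag lies on the opposite strand, hence the reverse complement.
    # An odd leftover copy is dropped. A CATG can still arise by chance at the
    # junction between two tags; this sketch does not prevent it.
    return [pool[i] + reverse_complement(pool[i + 1]) for i in range(0, len(pool) - 1, 2)]


def ditags_to_fasta(ditags, rng, min_per=2, max_per=5):
    """Step 4 (cf. ditags2concatamers.pl): join random runs of ditags with anchor sites."""
    ditags = list(ditags)
    rng.shuffle(ditags)
    records, i = [], 0
    while i < len(ditags):
        k = max(1, rng.randint(min_per, max_per))
        seq = ANCHOR + ANCHOR.join(ditags[i:i + k]) + ANCHOR
        records.append(f">concatamer_{len(records) + 1}\n{seq}")
        i += k
    return "\n".join(records) + "\n"


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed for a reproducible toy library
    counts = generate_tag_counts(n_tags=5, tag_len=10, max_count=4, rng=rng)
    counts = parse_counts(format_counts(trim_tags(counts, rng, max_trim=2)))
    ditags = counts_to_ditags(counts, rng)
    print(ditags_to_fasta(ditags, rng), end="")
```

Each function consumes the previous one's output, which mirrors the script-to-script compatibility the module relies on; passing a seeded `random.Random` makes a simulated library reproducible.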