Electronic Supplementary Material accompanying the paper entitled "EGene: a configurable pipeline generation system for automated sequence analysis" by Durham, A.M., Kashiwabara, A.I., Matsunaga, F.T.G., Ahagon, P.H., Rainone, F., Varuzza, L. and Gruber, A. Bioinformatics 21(12): 2812-2813, 2005.
Table 1. Components developed for the EGene pipeline generation system. By convention, component names have two parts: a prefix stating the component's function and, when appropriate, a suffix stating the third-party software it uses.
Component | Third party software required | Function
assemble_cap3.pl | CAP3 | Creates a directory structure, runs CAP3 and analyzes redundancy
assemble_phrap.pl | Phrap | Creates a directory structure, runs Phrap and analyzes redundancy
bigou.pl | - | Initializes and runs the pipeline
filter_blast.pl | BLAST | Marks as invalid the sequences with a significant alignment block obtained with BLAST against a specified database
filter_cross_match.pl | Cross_match | Marks as invalid the sequences with a significant alignment block obtained with Cross_match
filter_quality.pl | - | Marks as invalid the sequences not attaining a set of minimum quality criteria
filter_size.pl | - | Marks as invalid the sequences below a threshold size
mask_cross_match.pl | Cross_match | Masks sequence blocks with significant Cross_match alignments against a database
outsave.pl | - | Produces a snapshot of all sequences in multi-PHD or XML files
report_bases.pl | - | Creates an HTML file reporting on masked, trimmed and good bases for each sequence and the respective averages for a pipeline run
report_filtering.pl | - | Produces an HTML report of the filtering performed on the sequences
report_graphic_complete.pl | - | Generates a detailed graphic report of the quality assignment, vector/primer masking and trimming in multiple HTML files
report_graphic_simple.pl | - | Generates a graphic report of the quality assignment, vector/primer masking and trimming in a single HTML file
snoop_filtered.pl | - | Produces a FASTA, XML or PHD file with either the valid sequences or the sequences invalidated by specified filters
trimming.pl | - | Trims sequences based on quality and masking
upload_fasta.pl | - | Uploads sequences from a multi-FASTA file
upload_fasta_STDIN.pl | - | Uploads multiple FASTA formatted sequences from standard input [a]
upload_phd_dir.pl | - | Uploads PHD files contained in a directory
upload_phd.pl | - | Uploads multiple PHDs from a single concatenated file
upload_traces_phred.pl | Phred | Uploads trace files using Phred for base-calling and quality assignment
upload_seq_names_db.pl | PostgreSQL | Uploads sequences from the database based on their names. Names can be Perl regular expressions. Runs only in database mode.
upload_sql.pl | PostgreSQL | Uploads sequences from the database based on an SQL query. The query should return sequence identifiers. Runs only in database mode.
upload_xml.pl | - | Uploads sequences from an XML file
[a] This program can be used to feed FASTA output from other UNIX software into a pipeline (e.g. using UNIX pipes).
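
The naming convention stated in the caption of Table 1 (a prefix for the function, an optional suffix for the third-party software) can be illustrated with a minimal Python sketch. This is not part of EGene: the function name `split_component` and the `KNOWN_SOFTWARE` mapping are hypothetical, and the mapping is simply read off the rows of Table 1.

```python
from typing import Optional, Tuple

# Suffix -> third-party software, read off the rows of Table 1.
# This mapping is an illustrative assumption, not an EGene data structure.
KNOWN_SOFTWARE = {
    "cap3": "CAP3",
    "phrap": "Phrap",
    "blast": "BLAST",
    "cross_match": "Cross_match",
    "phred": "Phred",
}


def split_component(filename: str) -> Tuple[str, Optional[str]]:
    """Return (function prefix, third-party software or None) for a component name."""
    # Drop the Perl extension if present.
    stem = filename[:-3] if filename.endswith(".pl") else filename
    # The prefix is everything before the first underscore.
    prefix = stem.split("_", 1)[0]
    # The software suffix, when present, closes the name.
    for suffix, software in KNOWN_SOFTWARE.items():
        if stem.endswith("_" + suffix):
            return prefix, software
    return prefix, None


if __name__ == "__main__":
    for name in ("assemble_cap3.pl", "filter_cross_match.pl", "filter_size.pl"):
        print(name, split_component(name))
```

Note that `upload_seq_names_db.pl` and `upload_sql.pl` require PostgreSQL according to Table 1, but their names do not end in a software suffix, so this sketch reports no software for them.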