High throughput functional annotation and data mining with the Blast2GO suite
last update 22.08.2008
We reviewed the currently available resources for automatic Gene Ontology (GO) annotation and drew up a map of existing tools, indicating their features, strengths and limitations. To be included in this survey, a tool had to at least assign GO terms to new sequence data and be freely available. A detailed listing of the 10 features (input data type, short description, annotation types, evidence codes, manual GO curation, GO tree visualization, GO graph visualization, availability, publications, high-throughput functionality) reviewed for 13 tools (AutoFact, Blast2GO, GOanna/AgBase, GOAnno, Goblet, GoFigure with GoDel, GoPET, Gotcha, HT-GO-FAT, InterProScan, JAFA, OntoBlast, PFP) is given in the Table above.
The majority of the tools can be accessed via the web and accept input data as FASTA-formatted protein or DNA sequences. Most tools limit themselves to GO term assignment and take only one sequence at a time. A few tools provide DAG (GoFigure, Blast2GO) or tree visualization (Gotcha, Goblet, HT-GO-FAT) of GO terms, and others offer annotation with alternative vocabularies such as enzyme codes, KEGG or protein domains (e.g. InterPro). What stood out while analyzing the available tools was that nearly all of them lack high-throughput capabilities and offer few functions to summarize or manipulate the generated information.
Goblet, a well-known tool available via the web, assigns GO term annotations with probability scores but does not take GO evidence codes into account. Goblet offers user-friendly visualization of results through a Java applet that permits browsing of the GO tree structure. It provides only limited high-throughput functionality (~150 sequences at a time), although improved throughput has been announced as a public web service for the near future.
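A per-submission limit such as the ~150 sequences mentioned above is usually worked around by splitting the input into batches. The following is a minimal sketch of that idea, not part of any of the reviewed tools: the function names `read_fasta`, `batches` and `to_fasta` are hypothetical, and the default batch size of 150 is simply taken from the figure quoted for Goblet.

```python
from typing import Iterable, Iterator, List, Tuple

# (header without the leading '>', concatenated sequence)
FastaRecord = Tuple[str, str]


def read_fasta(lines: Iterable[str]) -> Iterator[FastaRecord]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


def batches(records: Iterable[FastaRecord], size: int = 150) -> Iterator[List[FastaRecord]]:
    """Group records into lists of at most `size` entries (assumed limit)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[FastaRecord] = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly smaller batch


def to_fasta(batch: List[FastaRecord]) -> str:
    """Serialize one batch back to FASTA text, ready for submission."""
    return "".join(f">{header}\n{seq}\n" for header, seq in batch)
```

Each batch produced by `batches(read_fasta(open_file))` can then be serialized with `to_fasta` and submitted separately; how the submission itself is done depends on the tool and is deliberately left out here.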
AutoFACT proposes an interesting annotation strategy, but it is discouraging to use because it requires elaborate installation tasks (Perl scripts, database installations) and has no direct graphical interface. GoPET is another interesting approach: it predicts GO term annotations from homology searches combined with Support Vector Machines (SVM), which are used for the prediction and the assignment of confidence values. Its web interface only allows the analysis of a single sequence at a time.
HT-GO-FAT, available on personal request, claims to provide good high-throughput functionality for various annotation types but lacks result processing and is platform-dependent (.NET). HT-GO-FAT retrieves annotations through simple sequence similarity searches against a custom GO-annotated sequence database, without any further term selection procedure or scoring of the retrieved terms.
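The kind of unscored, similarity-based transfer described above can be illustrated with a short sketch. This is not HT-GO-FAT's code: the `Hit` structure, the `transfer_go_terms` function, the `subject_go` mapping and the E-value cutoff are all assumptions made for illustration. Every GO term attached to any sufficiently similar database sequence is simply copied to the query, with no ranking or selection among terms.

```python
from typing import Dict, Iterable, NamedTuple, Set


class Hit(NamedTuple):
    """One similarity-search hit (hypothetical, simplified structure)."""
    query_id: str
    subject_id: str
    evalue: float


def transfer_go_terms(
    hits: Iterable[Hit],
    subject_go: Dict[str, Set[str]],
    max_evalue: float = 1e-5,  # assumed cutoff, not taken from any tool
) -> Dict[str, Set[str]]:
    """Copy all GO terms of passing hits to the query, without scoring."""
    annotations: Dict[str, Set[str]] = {}
    for hit in hits:
        if hit.evalue > max_evalue:
            continue  # hit not similar enough, ignore it
        terms = subject_go.get(hit.subject_id, set())
        annotations.setdefault(hit.query_id, set()).update(terms)
    return annotations


# Toy example with real GO identifiers but invented hits.
hits = [
    Hit("q1", "s1", 1e-30),
    Hit("q1", "s2", 1e-8),
    Hit("q1", "s3", 0.1),   # fails the cutoff
    Hit("q2", "s3", 0.5),   # fails the cutoff, so q2 stays unannotated
]
subject_go = {
    "s1": {"GO:0003677"},
    "s2": {"GO:0005634", "GO:0003677"},
    "s3": {"GO:0016020"},
}
result = transfer_go_terms(hits, subject_go)
```

Because every term from every passing hit is kept, the result grows with the number of hits and mixes reliable and unreliable annotations, which is the limitation noted in the text.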
Other tools available via a web interface (e.g. GOAnno, GOtcha, GoFigure) only allow the processing of one sequence at a time and are therefore not directly suitable for large-scale sequence analysis.
Altogether, we found that valuable conceptual work has been done on assigning GO terms to novel sequence data, but few solutions address the current need for high-throughput, fast, versatile and reliable automatic GO function prediction. Blast2GO, to our knowledge, is the only tool that combines these features within one integrated, biologist-oriented and easy-to-start application. Blast2GO permits the researcher to quickly transfer, combine and modulate functional information from various sources, gain insight into the functional composition of a dataset, and directly apply the generated annotation to the statistical assessment of the functional meaning of experimental data.
|Listing of tools|
|Tool||Description (URL)||Reference|
|AutoFact||An Automatic Functional Annotation and Classification Tool: sequences are classified into 6 different annotation categories, blast-based. (http://www.bch.umontreal.ca/Software/AutoFACT.htm)||Koski et al., 2005|
|Blast2GO||A universal Gene Ontology annotation, visualization and analysis tool for functional genomics research: Combination of similarity-search-based (BLAST), domain-based (InterProScan) and data-mining-based (ANNEX) annotation. (http://www.blast2go.org)||Conesa et al., 2005|
|GOanna/AgBase||GOanna is used to find annotations for proteins using similarity search. The resulting file contains GO annotations of the top BLAST hits. Sequence alignments are provided so the user can use these to assess the quality of the match. (http://agbase.msstate.edu/GOAnna.html)||McCarthy et al., 2006, 2007|
|GOAnno||GO annotation based on multiple alignment: Evolutionary information in multiple alignments organized hierarchically into functional subfamilies. (http://bips.u-strasbg.fr/GOAnno)||Chalmel et al., 2005|
|GOblet||A platform for Gene Ontology annotation of anonymous sequence data: Returns the GO terms of all blast hits providing probability scoring. (http://goblet.molgen.mpg.de)||Hennig et al., 2003 and Groth et al., 2004|
|GoFigure + GoDel (discontinued)||BLAST to predict Gene Ontology annotation: GoFigure maps GO terms to BLAST hits by minimum covering graph construction and GoDel filters terms by E-value and evidence codes. (http://udgenome.ags.udel.edu/gofigure)||Khan et al., 2003|
|GoPET||GoPet is a complete automated tool for assigning molecular function or biological process terms to cDNA or protein sequences utilising Gene Ontology for annotation terms, GO-mapped protein databases for performing homology searches, and Support Vector Machines for the prediction and the assignment of confidence values. (http://genome.dkfz-heidelberg.de/menu/biounit/open-husar/)||Vinayagam et al., 2006|
|GOtcha||Associates GO terms with sequence data based on sequence similarity (BLAST) searches. All terms are ranked by probability scores and are displayed graphically on a subtree of the Gene Ontology. (http://www.compbio.dundee.ac.uk/gotcha/gotcha.php/)||Martin et al., 2004|
|HT-GO-FAT (discontinued)||High Throughput Gene Ontology Functional Annotation Toolkit (HT-GO-FAT), used for animal and plant sequences, based on sequence similarities against a custom BLAST DB. (http://genome4.ars.usda.gov/farm/dload.php)||na|
|InterProScan||Domain searches against: BlastProDom, FPrintScan, HMMPIR, HMMPfam, HMMSmart, HMMTigr, ProfileScan, ScanRegExp, SuperFamily, SignalPHMM, TMHMM, HMMPanther, Gene3D (mapping of InterPro IDs to GO provided). (http://www.ebi.ac.uk/InterProScan/)||Zdobnov et al., 2001, Quevillon et al., 2005, Mulder et al., 2007|
|JAFA||Joined Assembly of Function Annotations (InterProScan, GOtcha, GoFigure, Goblet, Phydbac) (http://jafa.burnham.org)||Friedberg et al., 2006|
|OntoBlast||OntoBlast function: From sequence similarities directly to potential functional annotations by ontology terms. (http://functionalgenomics.de/ontogate)||Zehetner, 2003|
|PFP||PSI-BLAST coupled with curated association matrices to increase sensitivity and specificity of predictions. (http://dragon.bio.purdue.edu/pfp)||Hawkins et al., 2006|
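Several entries in the listing mention filtering candidate terms by E-value and GO evidence code (as described for GoDel). A minimal sketch of such a filter is given here under stated assumptions: `CandidateTerm`, `filter_terms`, the E-value threshold and the choice of excluded evidence codes (IEA and ND) are illustrative only and do not reproduce the rules of GoDel or any other tool.

```python
from typing import FrozenSet, Iterable, List, NamedTuple


class CandidateTerm(NamedTuple):
    """A GO term proposed for a query via one hit (hypothetical structure)."""
    go_id: str
    evidence_code: str  # GO evidence code of the source annotation, e.g. "IDA"
    evalue: float       # E-value of the hit that contributed the term


def filter_terms(
    candidates: Iterable[CandidateTerm],
    max_evalue: float = 1e-10,  # assumed threshold
    excluded_codes: FrozenSet[str] = frozenset({"IEA", "ND"}),  # assumed choice
) -> List[str]:
    """Keep terms from strong hits whose evidence code is not excluded."""
    kept: List[str] = []
    seen = set()
    for cand in candidates:
        if cand.evalue > max_evalue:
            continue  # hit too weak
        if cand.evidence_code in excluded_codes:
            continue  # e.g. electronically inferred annotation
        if cand.go_id not in seen:
            seen.add(cand.go_id)
            kept.append(cand.go_id)  # preserve first-seen order
    return kept


candidates = [
    CandidateTerm("GO:0003677", "IDA", 1e-40),
    CandidateTerm("GO:0005634", "IEA", 1e-40),  # dropped: excluded code
    CandidateTerm("GO:0016020", "ISS", 1e-3),   # dropped: weak hit
    CandidateTerm("GO:0003677", "TAS", 1e-20),  # duplicate term
]
selected = filter_terms(candidates)
```

Excluding electronically inferred annotations (IEA) is a common way to avoid propagating predictions that were themselves never curated, at the cost of lower coverage.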
Bioinformatics and Genomics Department
Centro de Investigación Príncipe Felipe