Blast2GO (B2G) is a tool designed to enable Gene Ontology (GO) based data mining on sequence data for which no GO annotation is yet available. It does this by assigning putative functions to sequences on the basis of sequence homology and by providing tools for statistical and visual analysis of the resulting annotation. The aim of this evaluation is to identify optimal parameters for correct annotation and to assess the overall performance of the methodology. Our strategy was to use a sequence data set from a model organism for which a true functional annotation is available. These sequences were processed with Blast2GO against a GenBank nr database depleted of sequences from the test species. Comparing the inferred annotation with the original one allowed us to evaluate the performance of the tool as a function of the annotation parameters. Our results show that Blast2GO has good annotation accuracy, typical of automatic annotation methods, and, more importantly, that the tool successfully extracts relevant functional features of these sequences from this annotation.
In this evaluation we used the sequences represented on the AMT microarray, originally designed in Dr Amtmann’s laboratory. This oligonucleotide microarray represents 1090 Arabidopsis transporter genes and has been used to study the transporter transcriptome in roots under different salt stress conditions (Maathuis et al.). GO annotations are available for these sequences, as well as a specific functional classification made by the authors (see supplementary material). This data set is ideal for evaluating Blast2GO because it represents a typical scenario in which Blast2GO is likely to be used, both for sequence annotation and as a function-based data mining tool.
The results of the evaluation of the annotation procedure and its parameters are given in Table 1 and summarized in Figures 1 and 2. As expected, raising the annotation cut-off increased the quality of the annotation but decreased the number of annotated genes. For the tested GO weights, the number of positive annotations increased with the value of the weight, indicating that abstraction can be an adequate route to valid GO annotation. Setting all EC weights to 1 (no EC weight) also increased the number of positive annotations. However, the default EC weights resulted in much lower annotation coverage, suggesting that the lower performance of this option results more from failing to annotate than from annotating in a different branch. In general, good annotation results (up to 65% identical annotation and 70% annotation in the same branch) were obtained for some values of the annotation parameters, which is similar to the performance reported for other automatic annotation systems (e.g. Martin et al., 2004; Khan et al., 2003). In addition, Blast2GO offers a graphical environment for functional annotation. Evaluation of annotation in other biological systems (Saccharomyces, Plasmodium…) shows similar behaviour of the annotation parameters, although absolute values may vary slightly.
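The comparison between inferred and reference annotation can be illustrated with a minimal Python sketch. It is not the procedure used in this evaluation: the GO fragment, the function names and the reading of "same branch" as "one term is an ancestor of the other" are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Set, Tuple

# Toy GO fragment: term -> set of parent terms.
# Hypothetical structure for illustration only, not real GO data.
PARENTS: Dict[str, Set[str]] = {
    "transport": set(),
    "ion transport": {"transport"},
    "cation transport": {"ion transport"},
    "multidrug transport": {"transport"},
}


def ancestors(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Return every term reachable from `term` by following parent links."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def classify(inferred: Optional[str], reference: str,
             parents: Dict[str, Set[str]]) -> str:
    """Compare one inferred term with the reference term of the same sequence."""
    if inferred is None:
        return "not annotated"
    if inferred == reference:
        return "identical"
    # Assumption: "same branch" means one term is an ancestor of the other.
    if (inferred in ancestors(reference, parents)
            or reference in ancestors(inferred, parents)):
        return "same branch"
    return "other branch"


def summarize(pairs: Iterable[Tuple[Optional[str], str]],
              parents: Dict[str, Set[str]]) -> Counter:
    """Count outcomes over (inferred, reference) pairs."""
    return Counter(classify(inf, ref, parents) for inf, ref in pairs)


# Hypothetical (inferred, reference) pairs, one per sequence.
example_pairs = [
    ("cation transport", "cation transport"),
    ("transport", "cation transport"),
    ("multidrug transport", "cation transport"),
    (None, "ion transport"),
]
counts = summarize(example_pairs, PARENTS)
```

Separating the "not annotated" outcome from "other branch" mirrors the distinction drawn above between failing to annotate and annotating in a different branch.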
From the results of the annotation evaluation we took suitable values of the annotation parameters for the functional evaluation (annot.cutoff=50, GOweight=10). Comparison of the Combined Graph results with the functional annotation provided by the authors (Funcional_analysis_AMT.xls, worksheet 1) showed that the B2G visualization tool successfully displays the most relevant biological aspects of this data set. The terms Transport (BP) and Transport activity (MF) clearly appear as the most heavily colored terms in their graphs (Figs. 3 and 4). Other terms are also highlighted in the corresponding graphs: cation transport, ion transport and multidrug transport for the Biological Process category (Fig. 3); ATPase activity coupled to the transmembrane movement of substances, ATP binding and antiporter activity for the Molecular Function category (Fig. 4); and integral to membrane and intracellular membrane-bound organelle for the Cellular Component category (Fig. 5). For the second aspect of the Functional Genomics evaluation, the AMT_specific_Na&Ca&K data subset was used. This data set shows a significant enrichment of the category aquaporin when a Fisher's Exact Test is performed using the functional classification provided by the authors (Funcional_analysis_AMT.xls, worksheets 2 and 3). B2G Enrichment Analysis of this data set successfully detected a significant enrichment for the same functional category (Fig. 6).
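The enrichment test mentioned above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the B2G implementation: it computes a one-sided (over-representation) Fisher's exact p-value from the hypergeometric distribution, and the function name and all counts in the example are hypothetical.

```python
from math import comb


def fisher_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided Fisher's exact p-value for over-representation.

    N: sequences in the reference set
    K: reference sequences annotated with the category
    n: sequences in the test subset (drawn from the reference set)
    k: test-subset sequences annotated with the category

    Returns P(X >= k) for X ~ Hypergeometric(N, K, n).
    """
    upper = min(n, K)
    total = comb(N, n)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible tables contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical counts: 3 of 10 reference sequences carry the category,
# and all 3 sequences of the test subset carry it.
p_value = fisher_enrichment_p(k=3, n=3, K=3, N=10)
```

With these toy counts only one of the 120 possible subsets of size 3 contains all three annotated sequences, so the p-value is 1/120. A two-sided test, or a correction for testing many categories at once, would need additional steps not shown here.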
This example illustrates the validity of Blast2GO as a research tool in Functional Genomics studies. Its ideal application is the functional analysis of non-annotated sequence data in non-model organisms. Annotation accuracy using default parameters reached 65-70%, a typical value for automatic annotation methods. In addition, Blast2GO offers a versatile and user-friendly graphical environment for functional annotation, combining functionality that has so far been available only in separate implementations. Our results show that Blast2GO is a valuable tool for gathering functional information on otherwise uncharacterized sequences. Our approach can be useful in guiding the interpretation of experimental results in genomics approaches such as gene expression studies, EST projects etc.
Bioinformatics and Genomics Department
Centro de Investigación Príncipe Felipe