Evaluation of Blast2GO

Summary

Blast2GO (B2G) is a tool designed to enable Gene Ontology (GO) based data mining on sequence data for which no GO annotation is yet available. It does this by assigning putative functions to sequences on the basis of sequence homology and by providing tools for the statistical and visual analysis of the resulting annotation. The aim of this evaluation is to identify optimal parameters for correct annotation and to assess the overall performance of the methodology. Our strategy has been to use a sequence data set from a model organism for which a true functional annotation is available. These sequences were processed with Blast2GO against an NCBI nr database depleted of sequences from the test species. Comparing the inferred annotation with the original one allowed us to evaluate the performance of the tool as a function of the annotation parameters. Our results show that Blast2GO achieves good annotation accuracy, typical of automatic annotation methods, and, more importantly, that the tool successfully extracts relevant functional features of these sequences from this annotation.
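
This report does not say how the test species was removed from nr. The following is a minimal sketch of one way to do it. It assumes nr-style FASTA headers in which the organism name appears in square brackets, for example "[Arabidopsis thaliana]", and that merged nr entries join several titles with Ctrl-A (\x01). The function names are hypothetical.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Record = Tuple[str, str]  # (header without the leading '>', sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header: Optional[str] = None
    seq_parts: List[str] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq_parts)
            header, seq_parts = line[1:], []
        elif line and header is not None:
            seq_parts.append(line.strip())
    if header is not None:
        yield header, "".join(seq_parts)


def deplete_species(records: Iterable[Record], species: str) -> Iterator[Record]:
    """Drop every record whose header carries the '[species]' tag.

    Assumption: merged nr entries join several titles with Ctrl-A (\\x01).
    Searching the whole header drops the record if ANY title names the
    species, which is the conservative choice for a depletion test.
    """
    tag = f"[{species}]"
    for header, seq in records:
        if tag not in header:
            yield header, seq
```

You would stream the nr FASTA file through `read_fasta`, pass the records to `deplete_species(records, "Arabidopsis thaliana")`, and write out what remains. Matching on the bracketed organism tag is only a heuristic. A taxonomy-ID based filter would be more robust, but no such filter is described here.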

Material and Methods

In this evaluation we used the sequences represented on the AMT microarray, originally designed in Dr Amtmann's laboratory. This oligonucleotide microarray represents 1090 Arabidopsis transporter genes and has been used to study the transporter transcriptome in roots under different salt stress conditions (Maathuis et al., 2003). GO annotations are available for these sequences, as well as a specific functional classification made by the authors (see supplementary material). This data set is ideal for evaluating Blast2GO because it represents a typical scenario in which Blast2GO is likely to be used, both for sequence annotation and as a function-based data mining tool.

Schema of the evaluation procedure

  1. First, a filtered NCBI nr database was generated from which all Arabidopsis sequences were removed (nr-ATH).
  2. The AMT set was analysed with Blast2GO:
    1. BLAST was run against nr-ATH using WWWBlast and the application's default BLAST parameters.
    2. Sequences were mapped (GO terms associated with the BLAST hits were retrieved).
    3. Annotation was performed for different values of the annotation parameters, following a factorial design of 3 factors with the following levels:
      1. GO weight: 0, 5 and 10
      2. Annotation cut-off: 0, 30, 35, 40, 45, 50, 55, 60
      3. EC weights: B2G default, all set to 1
  3. Annotation results were compared with the True GO annotation of these sequences obtained from the TAIR site (www.tair.org). Each B2G-annotated GO term was scored as follows (identical, general and specific annotations together make up the annotations in the same branch as the True annotation):
    1. Identical: the B2G-annotated GO term is present among the True annotations of the sequence.
    2. General: the B2G-annotated GO term is a parent of one of the True annotations of the sequence.
    3. Specific: the B2G-annotated GO term is a child of one of the True annotations of the sequence.
    4. Other branch: no True annotation term lies in the same DAG branch as the B2G-annotated GO term.
  4. Combined graphs were generated for the whole data set for each of the three main branches of the Gene Ontology, and the highlighted nodes were compared with the functional annotation provided by the authors. To control the size of the graphs, they were generated with a Seq Filter value of 50 (i.e. only nodes with more than 50 associated sequences are shown).
  5. Finally, we took all the significant gene lists provided in the Westernhuis paper and, for each list, computed Fisher's exact tests to evaluate functional category enrichment using the functional classification provided by the authors (see the supplementary material of this publication and .txt). We selected a gene list with significantly enriched categories at a multiple-testing-corrected p-value of 0.01, performed a B2G Enrichment Analysis of this list using the B2G annotation, and compared the results.
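
Steps 2.3 and 3 can be sketched in code, leaving out the Blast2GO run itself. The sketch enumerates the 48 parameter combinations of the factorial design and classifies each B2G-annotated term against the True terms of its sequence. The text does not say whether "parent" and "child" mean direct or any-level relationships. This sketch assumes any ancestor or descendant, which is consistent with the "same branch" wording. The function names and the placeholder GO IDs are hypothetical.

```python
from collections import Counter
from itertools import product
from typing import Dict, Set

# Factor levels of the factorial design (step 2.3).
GO_WEIGHTS = [0, 5, 10]
ANNOTATION_CUTOFFS = [0, 30, 35, 40, 45, 50, 55, 60]
EC_WEIGHT_SCHEMES = ["b2g_default", "all_one"]  # labels only

# Full factorial design: one annotation run per combination (3 * 8 * 2 = 48).
DESIGN = list(product(GO_WEIGHTS, ANNOTATION_CUTOFFS, EC_WEIGHT_SCHEMES))

Dag = Dict[str, Set[str]]  # GO term -> set of its direct parents


def ancestors(term: str, parents: Dag) -> Set[str]:
    """Every term reachable upwards from `term` (the term itself excluded)."""
    seen: Set[str] = set()
    stack = list(parents.get(term, ()))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen


def score_term(b2g_term: str, true_terms: Set[str], parents: Dag) -> str:
    """Classify one B2G-annotated term against a sequence's True terms."""
    if b2g_term in true_terms:
        return "identical"
    if any(b2g_term in ancestors(t, parents) for t in true_terms):
        return "general"   # B2G term lies above a True term
    if any(t in ancestors(b2g_term, parents) for t in true_terms):
        return "specific"  # B2G term lies below a True term
    return "other branch"


def tally(predicted: Dict[str, Set[str]], truth: Dict[str, Set[str]],
          parents: Dag) -> Counter:
    """Count categories over all B2G terms of all sequences for one run."""
    counts: Counter = Counter()
    for seq_id, terms in predicted.items():
        for term in terms:
            counts[score_term(term, truth.get(seq_id, set()), parents)] += 1
    return counts
```

You can derive percentages such as "identical" or "same branch" (identical + general + specific) from these counts for each of the 48 runs. The exact denominator used in Table 1 is not stated here.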

Results and discussion

Evaluation of the annotating function

The results of the evaluation of the annotation procedure and its parameters are given in Table 1 and summarized in Figures 1 and 2. As expected, raising the annotation cut-off increased the quality of the annotation but decreased the number of annotated genes. For the tested GO weights, we observed an increase in positive annotations as the value was increased, indicating that abstraction (annotation to a parent term) can be an adequate route to valid GO annotation. Setting all EC weights to 1 (no EC weighting) also resulted in an increase in positive annotations. However, the default EC weights gave much lower annotation coverage, suggesting that the lower performance of this option results more from failing to annotate than from annotating in another branch. Overall, good annotation results (up to 65% identical annotations and 70% annotations in the same branch) were obtained for some values of the annotation parameters, which is similar to the performance reported for other automatic annotation systems (e.g. Martin et al., 2004; Khan et al., 2003). In addition, Blast2GO offers a graphical environment for functional annotation. Evaluation of annotation in other biological systems (Saccharomyces, Plasmodium…) shows similar behaviour of the annotation parameters, although absolute values may vary slightly.
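
A toy annotation rule can illustrate these parameter trends. This document does not give B2G's actual scoring formula, so the following are all assumptions, not the Blast2GO implementation:

* **Direct term:** the best hit similarity × EC weight at a node or its descendants.
* **Abstraction term:** GO weight × (number of candidate terms merged at the node − 1).
* **Selection:** keep only the most specific term per branch that reaches the cut-off.

The EC weight of 0.7 for IEA used in the example is likewise hypothetical.

```python
from typing import Dict, List, Set, Tuple

Dag = Dict[str, Set[str]]            # GO term -> direct parents
Candidate = Tuple[str, float, str]   # (GO term, hit similarity in %, evidence code)


def ancestors(term: str, parents: Dag) -> Set[str]:
    """Every term reachable upwards from `term` (the term itself excluded)."""
    seen: Set[str] = set()
    stack = list(parents.get(term, ()))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen


def annotate(candidates: List[Candidate], parents: Dag, ec_weights: Dict[str, float],
             go_weight: float, cutoff: float) -> Set[str]:
    """Toy annotation rule; evidence codes missing from ec_weights weigh 1.0."""
    # Direct term: best similarity x EC weight for each candidate GO term.
    direct: Dict[str, float] = {}
    for term, similarity, ec in candidates:
        weighted = similarity * ec_weights.get(ec, 1.0)
        direct[term] = max(direct.get(term, 0.0), weighted)
    anc = {t: ancestors(t, parents) for t in direct}
    nodes = set(direct).union(*anc.values())
    scores: Dict[str, float] = {}
    for node in nodes:
        # Candidate terms merged at this node: the node itself or its descendants.
        merged = [t for t in direct if t == node or node in anc[t]]
        dt = max(direct[t] for t in merged)
        at = go_weight * (len(merged) - 1)  # abstraction term
        scores[node] = dt + at
    selected = {n for n, s in scores.items() if s >= cutoff}
    # Keep only the most specific selected term on each branch.
    return {n for n in selected
            if not any(n in ancestors(m, parents) for m in selected if m != n)}
```

Take two sibling candidates with similarities 40 and 42 and a cut-off of 50. With GO weight 0, nothing is annotated. With GO weight 10, their shared parent scores 52 and is annotated (abstraction). Lowering an EC weight below 1 shrinks every score, which reduces coverage in the way described for the default EC weights.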

Evaluation as Functional Genomics Data Mining Tool

From the results of the annotation evaluation, we chose suitable values of the annotation parameters for the functional evaluation (annot.cutoff=50, GOweight=10). Comparing the Combined Graph results with the functional annotation provided by the authors (Funcional_analysis_AMT.xls, worksheet 1) showed that the B2G visualization tool successfully highlights the most relevant biological aspects of this data set. The terms Transport (BP) and Transport activity (MF) clearly appear as the most intensely colored nodes in their graphs (Figs. 3 and 4). Other terms are also highlighted in the corresponding graphs:

* **Biological Process** (Fig. 3): cation transport, ion transport and multidrug transport.
* **Molecular Function** (Fig. 4): ATPase activity coupled to the transmembrane movement of substances, ATP binding and antiporter activity.
* **Cellular Component** (Fig. 5): integral to membrane and intracellular membrane-bound organelle.

For the second aspect of the functional genomics evaluation, we used the AMT_specific_Na&Ca&K data subset. This data set shows significant enrichment of the aquaporin category when a Fisher's exact test is performed using the functional classification provided by the authors (Funcional_analysis_AMT.xls, worksheets 2 and 3). B2G Enrichment Analysis of this data set successfully detected significant enrichment of the same functional category (Fig. 6).
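
A minimal sketch of the category enrichment test follows. It uses a one-sided Fisher's exact test (the hypergeometric upper tail) for each category, comparing a gene list with a reference universe. It then applies a multiple testing correction at 0.01. The text does not name the correction, so Benjamini-Hochberg FDR is an assumption. Using the full array as the reference universe is also an assumption. The function names, gene IDs and categories in the example are made up.

```python
from math import comb
from typing import Dict, Iterable, Set


def fisher_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided Fisher's exact test (hypergeometric upper tail).

    k: list genes in the category       n: genes in the list
    K: universe genes in the category   N: genes in the universe
    Returns P(X >= k) for X ~ Hypergeometric(N, K, n).
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def benjamini_hochberg(pvalues: Dict[str, float]) -> Dict[str, float]:
    """Benjamini-Hochberg FDR-adjusted p-values (step-up, kept monotone)."""
    m = len(pvalues)
    ordered = sorted(pvalues.items(), key=lambda kv: kv[1])
    adjusted: Dict[str, float] = {}
    running = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        name, p = ordered[rank - 1]
        running = min(running, p * m / rank)
        adjusted[name] = running
    return adjusted


def enrichment(test: Iterable[str], universe: Set[str],
               categories: Dict[str, Set[str]], alpha: float = 0.01) -> Dict[str, float]:
    """Return categories whose BH-adjusted p-value is <= alpha."""
    test_set = set(test) & universe
    n, N = len(test_set), len(universe)
    raw: Dict[str, float] = {}
    for name, members in categories.items():
        members = members & universe
        raw[name] = fisher_enrichment_p(len(members & test_set), n, len(members), N)
    adjusted = benjamini_hochberg(raw)
    return {name: q for name, q in adjusted.items() if q <= alpha}
```

Consider a list of 10 genes drawn from a 100-gene universe, where the list contains all 5 members of one category. That category comes out with an adjusted p-value far below 0.01. A category that no list gene belongs to gets p = 1.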

Conclusions

This example illustrates the validity of Blast2GO as a research tool in functional genomics studies. Its ideal application is the functional analysis of non-annotated sequence data from non-model organisms. Annotation accuracy using default parameters reached 65-70%, which is typical of automatic annotation methods. In addition, Blast2GO offers a versatile and user-friendly graphical environment for functional annotation. It combines functionality that has so far been spread across different implementations. Our results show that Blast2GO is a valuable tool for gathering functional information on otherwise uncharacterized sequences. Our approach can help guide the interpretation of experimental results in genomics approaches such as gene expression studies, EST projects, etc.

References

  • Khan,S., Situ,G., Decker,K. and Schmidt,C.J. (2003) GoFigure: automated Gene Ontology™ annotation. Bioinformatics 19, 2484-2485.
  • Maathuis,F.J.M., Filatov,V., Herzyk,P., Krijger,G.C., Axelsen,K.B., Chen,S., Green,B.J., Li,Y., Madagan,K.L., Sánchez-Fernández,R., Forde,B.G., Palmgren,M.G., Rea,P.A., Williams,L.E., Sanders,D. and Amtmann,A. (2003) Transcriptome analysis of root transporters reveals participation of multiple gene families in the response to cation stress. The Plant Journal 35, 675-692.
  • Martin,D., Berriman,M. and Barton,G. (2004) GOtcha: a new method for prediction of protein function assessed by the annotation of seven genomes. BMC Bioinformatics 5, 178.

Tables and Figures

evaluation.txt · Last modified: 2010/06/07 11:45 by sgoetz
Bioinformatics and Genomics Department
Centro de Investigación Príncipe Felipe
Valencia, SPAIN