Format and BLAST a FASTA file and import the results into Blast2GO

Tips for formatting a database so that the BLAST results can later be imported into Blast2GO without problems. The important thing is to adapt the FASTA sequence descriptions so that the IDs can be parsed correctly. Additionally, you have to run "formatdb" with the "-o T" and "-a F" options.

1. Download a fasta file of your choice and unzip it.

2. Look at the IDs and make sure that the IDs you want to use for the GO mapping are in the second position, like this: ">ref|myid|seq definition".

3. For example, if you want to BLAST against the GO-lite database, you could use one of the following commands, depending on which IDs you want to keep:

RefSeq IDs

sed -e 's/^.*RefSeq:/>ref|/' -e 's/[[:blank:]].*$//' go_20091108-seqdb.fasta > go_20091108-seqdb_refseq.fasta

Uniprot IDs

sed -e 's/^.*Uniprot:/>ref|/' -e 's/[[:blank:]].*$//' go_20091108-seqdb.fasta > go_20091108-seqdb_refseq.fasta
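Headers that carry no RefSeq: (or Uniprot:) tag pass through the sed commands unchanged except that everything after their first blank is cut off, so they will not have the ">ref|ID" form. A quick way to check the converted file is sketched below; the function name check_headers is hypothetical, and it assumes the ">ref|" prefix used in the commands above.

```shell
# Hypothetical helper: count FASTA headers and report those that do not
# start with ">ref|ID" (the prefix written by the sed commands above).
check_headers() {
    fasta="$1"
    total=$(grep -c '^>' "$fasta")
    good=$(grep -c '^>ref|[^|[:blank:]]' "$fasta")
    echo "headers: $total, converted: $good, unconverted: $((total - good))"
    # Show up to five unconverted headers so they can be inspected by hand
    grep '^>' "$fasta" | grep -v '^>ref|[^|[:blank:]]' | head -n 5
}
```

Usage: check_headers go_20091108-seqdb_refseq.fasta. If many headers are unconverted, sequences without the chosen ID type will end up in the database without a usable accession.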

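4. Then format your FASTA file for use by the BLAST programs. Pay special attention to the -o and -a flags: -o T parses the sequence IDs and creates indexes for them, and -a F tells formatdb that the input is FASTA rather than ASN.1. The -p T flag marks the database as protein.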

./formatdb -p T -o T -a F -i go_20091108-seqdb_refseq.fasta -n go_refseqids
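For a protein database, formatdb writes .phr, .pin and .psq files, and with -o T it should also write the .psd and .psi ID index files. The sketch below checks that all five exist and are non-empty; the function name is hypothetical, and it assumes a single-volume database (very large inputs may be split into numbered volumes instead).

```shell
# Hypothetical helper: confirm that "formatdb -p T -o T" produced its files.
# .phr/.pin/.psq hold headers, index and sequences; .psd/.psi are the
# sequence-ID indexes that only appear when -o T is given.
check_formatdb_output() {
    base="$1"
    status=0
    for ext in phr pin psq psd psi; do
        if [ ! -s "$base.$ext" ]; then
            echo "missing or empty: $base.$ext"
            status=1
        fi
    done
    return $status
}
```

Usage: check_formatdb_output go_refseqids. With the legacy toolkit, fastacmd -d go_refseqids -s followed by one of your IDs should also retrieve that sequence, which relies on the ID indexes built by -o T.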

5. Now you can BLAST against this database. These parameters mimic the Blast2GO default BLAST parameters; -m 7 writes the results as XML, which is the format imported in the next step. We use this command on a dual-core machine (-a 2).

./blastall -a 2 -b 20 -v 20 -p blastx -e 0.001 -m 7 -d go_refseqids -i data_example.fasta -o results.xml
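If you run this search repeatedly, the call can be wrapped in a small shell function with each flag documented. This is only a sketch: run_blastx, BLASTALL and BLAST_THREADS are hypothetical names, and it assumes the legacy blastall binary sits in the current directory as in the command above.

```shell
# Hypothetical wrapper around the blastall call above.
# Flags: -a threads, -b alignments shown (20), -v one-line descriptions (20),
# -p blastx (translated nucleotide query vs. protein database),
# -e E-value cutoff, -m 7 XML output for Blast2GO.
run_blastx() {
    query="$1"   # nucleotide query FASTA
    db="$2"      # database base name given to formatdb -n
    out="$3"     # XML file to import into Blast2GO
    "${BLASTALL:-./blastall}" -a "${BLAST_THREADS:-2}" -b 20 -v 20 \
        -p blastx -e 0.001 -m 7 -d "$db" -i "$query" -o "$out"
}
```

Usage: run_blastx data_example.fasta go_refseqids results.xml, or BLAST_THREADS=4 run_blastx ... on a quad-core machine.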

6. Finally, import your XML file (results.xml) into Blast2GO and visualize several BLAST results to check that the accessions appear in the right place.

7. Now you can proceed to the mapping step as usual.
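Before importing, the accessions can also be checked on the command line. The sketch below reads the Hit_id and Hit_accession elements of the BLAST XML output, counts each accession, and warns if it finds "gnl|BL_ORD_ID|" IDs, which typically mean the database was formatted without -o T. The function name is hypothetical, and it assumes a grep that supports -o (GNU or BSD).

```shell
# Hypothetical helper: summarize subject accessions in a blastall -m 7 file.
summarize_hits() {
    xml="$1"
    # BL_ORD_ID hit IDs are internal ordinal numbers, not your own IDs
    if grep -q 'BL_ORD_ID' "$xml"; then
        echo "warning: BL_ORD_ID found; re-run formatdb with -o T" >&2
    fi
    # Count how often each accession occurs, most frequent first
    grep -o '<Hit_accession>[^<]*</Hit_accession>' "$xml" |
        sed -e 's/<Hit_accession>//' -e 's#</Hit_accession>##' |
        sort | uniq -c | sort -rn
}
```

Usage: summarize_hits results.xml | head. The listed accessions should look like the IDs you placed after ">ref|" in step 3.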

tut.txt · Last modified: 2010/01/07 13:45 by sgoetz
Bioinformatics and Genomics Department
Centro de Investigación Príncipe Felipe
Valencia, SPAIN