Tips for formatting a database so that the BLAST results can later be imported into Blast2GO without problems. The important thing is to adapt the FASTA sequence descriptions so that they can be parsed reliably. Additionally, you will have to run "formatdb" with the "-o T" and "-a F" options.
1. Download a fasta file of your choice and unzip it.
2. Have a look at the IDs and make sure that the ID you want to use for the GO mapping is in the second position, like this: ">ref|myid|seq definition".
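3. For example, if you want to BLAST against the GO-lite database, you could use one of the following commands. The first keeps the RefSeq IDs, the second the UniProt IDs. Use one or the other: both write to the same output file, so running the second overwrites the result of the first.
cat go_20091108-seqdb.fasta | sed 's/^.*RefSeq:/>ref|/' | sed 's/[[:blank:]].*$//g' > go_20091108-seqdb_refseq.fasta
cat go_20091108-seqdb.fasta | sed 's/^.*Uniprot:/>ref|/' | sed 's/[[:blank:]].*$//g' > go_20091108-seqdb_refseq.fasta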
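4. Then you will have to format your FASTA file so that it can be used by the BLAST programs. Pay special attention to the -o and -a flags: -o T makes formatdb parse the sequence IDs and build an index, and -a F declares that the input is FASTA rather than ASN.1.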
./formatdb -p T -o T -a F -i go_20091108-seqdb_refseq.fasta -n go_refseqids
5. Now you can BLAST against this database. The following parameters mimic the Blast2GO default parameters. We run this command on a dual-core machine (-a 2).
./blastall -a 2 -b 20 -v 20 -p blastx -e 0.001 -m 7 -d go_refseqids -i data_example.fasta -o results.xml
6. Finally, import your XML file (results.xml) into Blast2GO and inspect several BLAST results to check that the accessions appear in the right place.
7. Now you can proceed to the mapping step as usual.
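The checks in steps 2 and 6 can also be done from the command line before and after the BLAST run. The following is only a minimal sketch: the function names bad_headers and hit_ids are made up for this example and are not part of BLAST or Blast2GO, and the second function assumes that the XML report keeps each <Hit_id> element on a single line.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of BLAST or Blast2GO.

# Print every FASTA header whose second "|"-separated field is empty,
# i.e. headers that do not follow the ">ref|myid|..." layout from step 2.
# No output means that every header carries an ID in the second position.
bad_headers() {
    awk -F'|' '/^>/ && $2 == "" { print }' "$1"
}

# List the distinct <Hit_id> values of a BLAST XML report (-m 7)
# together with the number of times each one occurs.
# Assumption: each <Hit_id>...</Hit_id> element sits on one line.
hit_ids() {
    sed -n 's:.*<Hit_id>\(.*\)</Hit_id>.*:\1:p' "$1" | sort | uniq -c
}
```

With these functions loaded into the shell, "bad_headers go_20091108-seqdb_refseq.fasta" could be run before step 4, and "hit_ids results.xml | head" after step 5. The ID chosen in step 3 should be visible in the listed values. If it is not, the headers were probably not rewritten as intended.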
Bioinformatics and Genomics Department
Centro de Investigación Príncipe Felipe