[Biopython] blast to go annotation

Peter Cock p.j.a.cock at googlemail.com
Thu Dec 6 11:17:54 UTC 2012

On Thu, Dec 6, 2012 at 11:09 AM, Fernando <fpiston at gmail.com> wrote:
> Hello everybody,
> I am a beginner in Python programming and I do not know if I did this
> well. I wrote a script to do the following tasks:
> - BLAST my sequences against the uniprot_sprot (UniProtKB/Swiss-Prot)
> - Take the Swiss-Prot accession of the best match
> - Take the GO terms associated with that Swiss-Prot accession
> - Make a file with my sequence id, the best match Swiss-Prot accession,
> and the associated GO terms. I am making this file to use with topGO
> in Bioconductor.
> I have some questions:
> - The 'NCBIXML.parse' step has a problem. The function does not take the
> first accession of the .xml file. I need to insert a fake fasta sequence
> at the beginning of the multifasta file to get the BLAST results for all
> of my sequences.

Do you mean it is ignoring the first (1st) set of results in the XML file?
That is because you skipped the first BLAST results - try removing this
line before your for loop:

blast_record = blast_records.next()
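
A minimal sketch of why that line drops the first result. The generator
below is a hypothetical stand-in for the iterator that NCBIXML.parse
returns, and the query names are made up; next(records) is the
Python 3 spelling of records.next().

```python
def fake_blast_records():
    """Hypothetical stand-in for the iterator returned by NCBIXML.parse."""
    for query_id in ["seq1", "seq2", "seq3"]:
        yield query_id

# Pattern from the script: one record is consumed before the loop starts,
# so the loop never sees it.
records = fake_blast_records()
skipped = next(records)  # same effect as records.next() in Python 2
seen_with_extra_next = [record for record in records]

# Without the extra call, the loop starts at the first record.
seen_without_extra_next = [record for record in fake_blast_records()]

print("skipped:", skipped)
print("with extra next():", seen_with_extra_next)
print("without extra next():", seen_without_extra_next)
```

With the extra call removed, the fake sequence at the start of the
multifasta file should no longer be needed.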

> - In general, is the script correct? And can I improve it?

It would be worth reading the Blast2GO paper for some of the technical
issues and how to weight evidence in assigning GO terms based on
BLAST matches. Note Blast2GO has a command line variant called
"Blast2GO for pipelines" (b2g4pipe).
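
For the output file itself, here is a rough sketch assuming the layout
that topGO's readMappings function reads: one line per gene, with the
gene id, a tab, then a comma-separated list of GO ids. The function
name write_go_mapping and the example data are hypothetical, and the
Swiss-Prot accession column is left out here because that layout has
only two fields.

```python
import io

def write_go_mapping(handle, annotations):
    """Write one 'id<TAB>GO1, GO2, ...' line per annotated sequence.

    annotations is an iterable of (sequence_id, list_of_go_ids) pairs.
    Sequences with no GO ids are skipped (an assumption of this sketch).
    """
    for seq_id, go_ids in annotations:
        if not go_ids:
            continue
        handle.write("%s\t%s\n" % (seq_id, ", ".join(go_ids)))

# Illustrative data only; these are not results from a real BLAST search.
example = [
    ("seq1", ["GO:0005524", "GO:0016301"]),
    ("seq2", []),
]

output = io.StringIO()
write_go_mapping(output, example)
mapping_text = output.getvalue()
print(mapping_text, end="")
```

Passing an open file handle instead of the StringIO object would write
the same lines to disk.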

