[Biopython] blast to go annotation

Peter Cock p.j.a.cock at googlemail.com
Thu Dec 6 11:17:54 UTC 2012


On Thu, Dec 6, 2012 at 11:09 AM, Fernando <fpiston at gmail.com> wrote:
> Hello everybody,
> I am a beginner in Python programming and I do not know if I did this well.
> I wrote a script to do the following tasks:
> - BLAST my sequences against uniprot_sprot (UniProtKB/Swiss-Prot)
> - Take the best-match Swiss-Prot accession
> - Take the GO terms associated with that Swiss-Prot accession
> - Make a file with my sequence ID, the best-match Swiss-Prot accession,
> and the associated GO terms. I am making this file to use with topGO in Bioconductor.
>
> I have some questions:
> - The 'NCBIXML.parse' step has a problem. The function does not return the
> first record of the .xml file. I need to insert a fake FASTA sequence
> at the beginning of the multi-FASTA file to get all the BLAST results for my
> sequences.

Do you mean it is ignoring the first (1st) set of results in the XML file?
That is because you skipped the first BLAST results - try removing this
line before your for loop:

blast_record = blast_records.next()
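A minimal sketch of the corrected loop, assuming each record exposes `.query`, `.query_id` and `.alignments` as `NCBIXML` records do, and taking the first alignment as the best hit. `NCBIXML.parse()` returns an iterator, so any `next()` call before the loop consumes the first query's results. (The `blast_records.next()` form is Python 2 syntax; in Python 3 it would be `next(blast_records)`, with the same effect.) The function names `best_hits` and `best_hits_from_xml` are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple


def best_hits(blast_records: Iterable) -> List[Tuple[str, Optional[str]]]:
    """Return (query_name, best_hit_accession) for every BLAST record.

    Works on any iterable of records exposing .query, .query_id and
    .alignments, such as the iterator returned by NCBIXML.parse().
    """
    results = []
    # Iterate the records directly. Calling next() on the iterator before
    # this loop would consume, and silently drop, the first query's results.
    for record in blast_records:
        # First word of the query FASTA description, falling back to BLAST's id.
        name = record.query.split()[0] if record.query.strip() else record.query_id
        if record.alignments:
            # Alignments come back in BLAST's ranked order, so [0] is the top hit.
            results.append((name, record.alignments[0].accession))
        else:
            results.append((name, None))  # no hit for this query
    return results


def best_hits_from_xml(xml_path: str) -> List[Tuple[str, Optional[str]]]:
    """Parse a BLAST XML file and return its best hits (requires Biopython)."""
    from Bio.Blast import NCBIXML  # imported here so best_hits() runs without Biopython

    with open(xml_path) as handle:
        # best_hits() builds a list, so every record is read before the file closes.
        return best_hits(NCBIXML.parse(handle))
```

Keeping `best_hits()` independent of the parser makes it easy to check with stand-in records, and it shows directly that skipping one item of the iterator loses the first query.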

> - In general, is the script correct, and can I improve it?
>

It would be worth reading the Blast2GO paper for some of the technical
issues and how to weight evidence in assigning GO terms based on
BLAST matches. Note Blast2GO has a command line variant called
"Blast2GO for pipelines" (b2g4pipe).
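For comparison, the remaining steps from the question (GO lookup and the topGO file) can be sketched with plain best-hit transfer, which is the naive approach that Blast2GO refines by weighting evidence. This is a minimal sketch under assumptions: GO terms are read from the Swiss-Prot flat file (e.g. `uniprot_sprot.dat`) with `Bio.SwissProt`; the accession reported in the BLAST XML matches a Swiss-Prot accession (this depends on how the BLAST database was formatted); and the output uses the `gene_ID<TAB>GO_ID1, GO_ID2` layout that topGO's `readMappings()` reads with its default separators. The function names are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def go_terms_from_swissprot(dat_path: str) -> Dict[str, List[str]]:
    """Map every accession in a Swiss-Prot flat file to its GO IDs (requires Biopython)."""
    from Bio import SwissProt  # imported here so topgo_lines() runs without Biopython

    mapping: Dict[str, List[str]] = {}
    with open(dat_path) as handle:
        for record in SwissProt.parse(handle):
            # GO cross-references look like ("GO", "GO:0005524", "F:ATP binding", ...)
            go_ids = [xref[1] for xref in record.cross_references if xref[0] == "GO"]
            for accession in record.accessions:
                mapping[accession] = go_ids
    return mapping


def topgo_lines(
    hits: Iterable[Tuple[str, Optional[str]]],
    go_by_accession: Dict[str, List[str]],
) -> List[str]:
    """Build one 'query<TAB>GO:1, GO:2' line per query that has GO terms."""
    lines = []
    for query_id, accession in hits:
        go_ids = go_by_accession.get(accession, []) if accession else []
        if not go_ids:
            continue  # skip queries with no hit or no GO annotation
        unique = list(dict.fromkeys(go_ids))  # drop duplicate GO IDs, keep order
        lines.append(f"{query_id}\t{', '.join(unique)}")
    return lines


def write_topgo_mapping(path: str, hits, go_by_accession) -> None:
    """Write the mapping lines to a file for topGO."""
    with open(path, "w") as out:
        for line in topgo_lines(hits, go_by_accession):
            out.write(line + "\n")
```

For example, `write_topgo_mapping("gene2go.txt", hits, go_terms_from_swissprot("uniprot_sprot.dat"))`, where `hits` is a list of `(query_id, accession)` pairs taken from the BLAST results.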

Peter


