[Biopython] blast to go annotation

Fernando fpiston at gmail.com
Thu Dec 6 11:09:53 UTC 2012


Hello everybody,
I am a beginner in python programming and I do not know if did well.
I had wrote a script to do the following task:
- BLAST my sequences against the uniprot_sprot (UniProtKB/Swiss-Prot)
- Take the best match swiss-prot accession
- Take the GOs associated to the swiss-prot accession
- Make a file with the my sequence id, best match swiss-prot accession,
GOs associated.I am doing this file to use with topGO in bioconductor.

I have some question:
- The 'NCBIXML.parse' step has a problem. The function does not take the
firth accession of the .xml file. I need to insert a fake fasta sequence
at the beginning of the multifasta file to have all blast result of my
sequences.
- En general. It is correct the script? and, can I improve it?


Here is the code:

from Bio.Blast.Applications import NcbiblastxCommandline
blastx_cline = NcbiblastxCommandline(query='/home/fpiston/Desktop/test/test2.fasta', db='uniprot_sprot', out='/home/fpiston/Desktop/test/test.xml', evalue='0.001', outfmt='5', best_hit_overhang='0.1', best_hit_score_edge='0.05', max_target_seqs='1')
stdout, stderr = blastx_cline()
result_handle = open("/home/fpiston/Desktop/test/test.xml")
from Bio.Blast import NCBIXML
from Bio import SeqIO
import re
from Bio import SwissProt 
q_dict =  SeqIO.to_dict(SeqIO.parse(open("/home/fpiston/Desktop/test/test2.fasta"), "fasta"))
blast_records = NCBIXML.parse(result_handle)
save_file = open("/home/fpiston/Desktop/test/test.out", 'w')
blast_record = blast_records.next()
hits = []

for blast_record in blast_records:
    if blast_record.alignments:
        list = (blast_record.query).split()
        if re.match('ENA|\w*|\w*', list[0]) != None:
            list2 = list[0].split("|")
            save_file.write('\n%s\t' % list2[1])
        else:
            save_file.write('\n%s\t' % list[0])
        for alignment in blast_record.alignments:
            for hsp in alignment.hsps:
                list = (alignment.hit_def).split()
                list2 = list[0].split("|")
                save_file.write('%s\t' % list2[2])
                for record in SwissProt.parse(open('/home/db/uniprot_sprot.dat')):
                    if record.entry_name in list2[2]:
                      for cross_reference in record.cross_references:
                          for item in cross_reference:
                              if 'GO:' in item:
                                  save_file.write('%s\t' % item)
        hits.append(blast_record.query.split()[0])
misses = set(q_dict.keys()) - set(hits)
for item in misses:
     save_file.write('\n%s\t' % item)
     save_file.write('%s' % 'no_match')

save_file.close() 



 Fernando
 -- 



More information about the Biopython mailing list