[Biopython] blast to go annotation
Fernando
fpiston at gmail.com
Thu Dec 6 11:09:53 UTC 2012
Hello everybody,
I am a beginner in python programming and I do not know if did well.
I had wrote a script to do the following task:
- BLAST my sequences against the uniprot_sprot (UniProtKB/Swiss-Prot)
- Take the best match swiss-prot accession
- Take the GOs associated to the swiss-prot accession
- Make a file with the my sequence id, best match swiss-prot accession,
GOs associated.I am doing this file to use with topGO in bioconductor.
I have some question:
- The 'NCBIXML.parse' step has a problem. The function does not take the
firth accession of the .xml file. I need to insert a fake fasta sequence
at the beginning of the multifasta file to have all blast result of my
sequences.
- En general. It is correct the script? and, can I improve it?
Here is the code:
from Bio.Blast.Applications import NcbiblastxCommandline
blastx_cline = NcbiblastxCommandline(query='/home/fpiston/Desktop/test/test2.fasta', db='uniprot_sprot', out='/home/fpiston/Desktop/test/test.xml', evalue='0.001', outfmt='5', best_hit_overhang='0.1', best_hit_score_edge='0.05', max_target_seqs='1')
stdout, stderr = blastx_cline()
result_handle = open("/home/fpiston/Desktop/test/test.xml")
from Bio.Blast import NCBIXML
from Bio import SeqIO
import re
from Bio import SwissProt
q_dict = SeqIO.to_dict(SeqIO.parse(open("/home/fpiston/Desktop/test/test2.fasta"), "fasta"))
blast_records = NCBIXML.parse(result_handle)
save_file = open("/home/fpiston/Desktop/test/test.out", 'w')
blast_record = blast_records.next()
hits = []
for blast_record in blast_records:
if blast_record.alignments:
list = (blast_record.query).split()
if re.match('ENA|\w*|\w*', list[0]) != None:
list2 = list[0].split("|")
save_file.write('\n%s\t' % list2[1])
else:
save_file.write('\n%s\t' % list[0])
for alignment in blast_record.alignments:
for hsp in alignment.hsps:
list = (alignment.hit_def).split()
list2 = list[0].split("|")
save_file.write('%s\t' % list2[2])
for record in SwissProt.parse(open('/home/db/uniprot_sprot.dat')):
if record.entry_name in list2[2]:
for cross_reference in record.cross_references:
for item in cross_reference:
if 'GO:' in item:
save_file.write('%s\t' % item)
hits.append(blast_record.query.split()[0])
misses = set(q_dict.keys()) - set(hits)
for item in misses:
save_file.write('\n%s\t' % item)
save_file.write('%s' % 'no_match')
save_file.close()
Fernando
--
More information about the Biopython
mailing list