[Biopython] multiple sequence blast
Dilara Ally
dilara.ally at gmail.com
Wed Jun 29 22:55:35 UTC 2011
Hi All,
I'm new to Biopython and Python. I have 1000 files, each with 100
contigs, and I want to BLAST every one of those contigs. I can take a
single file containing multiple sequences, BLAST each sequence, and
write the output. The problem comes when I read the files from a
loop in the first place. Thanks in advance for the help. If I don't
use the loop but instead assign fname=allfiles[1], then it works.
Does it have something to do with lists vs. SeqRecords?
Cheers, Dilara
Here is the code:
from Bio import SeqIO
from Bio.Blast import NCBIWWW
import time
import os

allfiles=os.listdir("/Users/dally/Desktop/NextGenData/Python_Scripts/pract_input/")
for fname in allfiles:
    print fname
    handle = open(fname, "rU")  # <== it doesn't recognize the file, just the name?
    contigs = list(SeqIO.parse(handle, "fasta"))
    handle.close()
    i = 0
    start=time.time()
    for seq_record in contigs:
        print seq_record.id
        print seq_record.seq
        result_handle=NCBIWWW.qblast("blastn", "nr",
                                     seq_record.format("fasta"), hitlist_size=10)
        filename = "contig_%i.xml" % (i+1)
        print filename
        save_file = open(filename, "w")
        save_file.write(result_handle.read())
        save_file.close()
        result_handle.close()
    end=time.clock()
    elapsed=end-start
    min=elapsed/60 #CONVERT TO MINUTE
    print "Your stuff took", elapsed, "seconds to run, which is the same as ", min, "minutes"
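A likely cause of the failure at the open(fname, ...) line is that os.listdir() returns bare file names, not full paths, so open() looks for each name in the current working directory rather than in pract_input. The sketch below shows one way to build full paths with os.path.join(). It is written for Python 3 (print is a function), it uses a temporary directory in place of the real data, and the helper names input_paths and output_name are hypothetical, chosen only for this illustration. It does not call BLAST.

```python
import os
import tempfile


def input_paths(directory):
    """Return full paths for the regular files in `directory`, sorted by name."""
    paths = []
    for name in sorted(os.listdir(directory)):
        # os.listdir() yields bare names such as "a.fasta", so open(name)
        # would search the current working directory, not `directory`.
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            paths.append(path)
    return paths


def output_name(input_path, contig_number):
    """Build an output name that is unique per input file and per contig."""
    stem = os.path.splitext(os.path.basename(input_path))[0]
    return "%s_contig_%i.xml" % (stem, contig_number)


# Demonstration with a temporary directory standing in for the real input.
demo_dir = tempfile.mkdtemp()
for name in ("b.fasta", "a.fasta"):
    with open(os.path.join(demo_dir, name), "w") as handle:
        handle.write(">contig1\nACGT\n")

demo_paths = input_paths(demo_dir)
for path in demo_paths:
    with open(path) as handle:  # opens correctly from any working directory
        first_line = handle.readline().strip()
    print(path, first_line, output_name(path, 1))
```

In the script above, the same idea would mean opening os.path.join(directory, fname) instead of fname. Two further points, offered as suggestions rather than a diagnosis: the counter i is never incremented and every input file reuses the contig_%i.xml names, so results overwrite each other (enumerate(contigs, 1) together with a per-file prefix such as output_name avoids that), and the timing mixes time.time() with time.clock(), which read different clocks, so using time.time() for both start and end should give a meaningful elapsed time.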