[Biopython] multiple sequence blast

Dilara Ally dilara.ally at gmail.com
Wed Jun 29 22:55:35 UTC 2011


Hi All

I'm new to Biopython and Python.  I have 1000 files, each with 100 
contigs, and I want to BLAST every one of those contigs.  I can already 
take a single file with multiple sequences, BLAST each sequence in it, 
and write the output.  The problem comes when I read the files inside a 
loop.  If I skip the loop and assign fname=allfiles[1] instead, it 
works.  Does it have something to do with lists vs. SeqRecords?  Thanks 
in advance for the help.
Cheers, Dilara

Here is the code:

from Bio import SeqIO
from Bio.Blast import NCBIWWW
import time
import os

allfiles=os.listdir("/Users/dally/Desktop/NextGenData/Python_Scripts/pract_input/")
for fname in allfiles:
     print fname
     # <== it doesn't recognize the file, just the name?
     handle = open(fname, "rU")
     contigs =list(SeqIO.parse(handle,"fasta"))
     handle.close()
     start=time.time()
     # enumerate advances i for each contig, so every output file gets its own number
     for i, seq_record in enumerate(contigs):
         print seq_record.id
         print seq_record.seq
         result_handle=NCBIWWW.qblast("blastn", "nr",
                                      seq_record.format("fasta"),
                                      hitlist_size=10)
         filename = "contig_%i.xml" % (i+1)
         print filename
         save_file = open(filename, "w")
         save_file.write(result_handle.read())
         save_file.close()
         result_handle.close()
         end=time.time() # use the same clock as start
         elapsed=end-start
         minutes=elapsed/60 # convert to minutes
         print "Your stuff took", elapsed, "seconds to run, which is the same as", \
               minutes, "minutes"
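
One likely cause of the open() failure, sketched under assumptions: os.listdir returns bare file names rather than full paths, so open(fname) looks in the current working directory and not in pract_input. The helper names input_paths and result_filename below are hypothetical and not part of Biopython. The second helper is an assumed naming scheme that keeps results from different input files from overwriting each other, since "contig_%i.xml" restarts at 1 for every file.

```python
import os


def input_paths(directory):
    """Return the full paths of the regular files in `directory`, sorted by name."""
    paths = []
    for name in sorted(os.listdir(directory)):
        # os.listdir yields bare names, so join each one with its directory
        full_path = os.path.join(directory, name)
        if os.path.isfile(full_path):  # skip subdirectories
            paths.append(full_path)
    return paths


def result_filename(input_path, contig_number):
    """Build an output name from the input file's base name and a contig number.

    Hypothetical naming scheme, chosen only so that names are unique
    per input file and per contig.
    """
    base = os.path.splitext(os.path.basename(input_path))[0]
    return "%s_contig_%i.xml" % (base, contig_number)
```

In the loop above, this would mean iterating over input_paths(...) with the pract_input directory, passing each returned path to open(), and writing each result to result_filename(path, i+1).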


