[BioPython] Parsing BLAST

Alex Garbino agarbino at gmail.com
Thu Aug 28 18:51:47 UTC 2008


Thanks for all the help!
I'm now almost done. My script is to take a fasta file, run blast, and
output a comma-separated-values list in the following format:
AccessionID, Source, Length, FASTA sequence.

I have one last issue: How do I get the fasta sequence out? I can
easily get the raw sequence, but I need it in fasta format. I left a
couple of things I've tried from tutorials commented out at the
bottom, in case it helps.
My csv output may also need help, depending on how the Fasta output
behaves in a csv...


from Bio import SeqIO
from Bio.Blast import NCBIWWW
from Bio.Blast import NCBIXML
from Bio import Entrez
#Open file to blast
file = "protein.txt"  #In fasta format

#Blast, save copy
record = SeqIO.read(open(file), format="fasta")
result_handle = NCBIWWW.qblast("blastp", "nr", record.seq.tostring(),
hitlist_size=1) #Don't hit the servers hard until ready

blast_results = result_handle.read()
save_file = open(file[:-4]+".xml", "w")
save_file.write(blast_results)
save_file.close()

result_handle = open(file[:-4]+".xml")
#Load the blast record
blast_records = NCBIXML.parse(result_handle)
blast_record = blast_records.next()
output = {}
for x in blast_record.alignments:
    output[x.accession] = [x.length]
for x in output:
    handle = Entrez.efetch(db="protein", id=x, rettype="genbank")
    record = SeqIO.parse(handle, "genbank")
    recurd = record.next()
    output[x].insert(0, recurd.id)
    output[x].insert(1, recurd.annotations["source"])

    #SeqIO.write(recurd, output[x].extend, "fasta")

    """
    handle2 = Entrez.efetch(db="protein", id=x, rettype="fasta")
    recurd2 = SeqIO.read(handle2, "fasta")
    output[x].extend = [recurd2.seq.tostring()]
    """
print output
save_file = open(file[:-4]+".csv", "w")
#Generate CSV
for item in output:
    save_file.write('%s,%s,%s\n' %
(output[item][0],output[item][1],output[item][2]))
    #save_file.write('%s,%s,%s\n' %
(output[item][0],output[item][1],output[item][2],output[item][3])
(When Fasta works)
save_file.close()

--------------------------
Thanks!
Alex



More information about the Biopython mailing list