[Biopython] Having a hard time getting a handle on handles
Dilara Ally
dilara.ally at gmail.com
Fri Jul 1 01:21:47 UTC 2011
Hi All
I have ~700,000 contigs that I would like to blast search and then from
the blast record parse out particular pieces of information from the
BLAST report. I can get my code to pull in files and then loop over
seq_records, blast, and then write a BLAST report. But since I don't
want to have 700,000 BLAST reports, I'd like to parse particular pieces
of information from the report and store it in a table. This is the
error I get from the code I have pasted below:
/Users/dally/Desktop/NextGenData/Python_Scripts/batchedfastafiles/group_1.fasta
1
0
GTCTTCGGCGTTGCACCGGCGATGAAGAACCAGTACGAGGCGTCTGGCGAGAGTAACAACGCTG
Traceback (most recent call last):
File "<stdin>", line 13, in <module>
NameError: name 'NCBIXML' is not defined
Do i have to close the result_handle and then reopen it? If so why?
Thanks in advance for your help.
>
> from Bio import SeqIO
> from Bio.Blast import NCBIWWW
> import time
> import os
> import os.path
>
> dirname1="/Users/dally/Desktop/NextGenData/Python_Scripts/batchedfastafiles/"
> allfiles=os.listdir(dirname1)
> fanddir=[os.path.join(dirname1,fname) for fname in allfiles]
> i = 0
> for f in fanddir:
> print f
> handle = open(f, "rU")
> contigs =list(SeqIO.parse(handle,"fasta"))
> handle.close()
> start=time.time()
> for seq_record in contigs:
> i=i+1
> print seq_record.id
> print seq_record.seq
> result_handle=NCBIWWW.qblast("blastn", "nr",
> seq_record.format("fasta"),hitlist_size=10)
> blast_record=list(NCBIXML.read(result_handle)) <== HERE IS THE
> PROBLEM
> E_VALUE_THRESH = 0.000004
> countr=0
> for alignment in blast_record.alignments:
> countr=countr+1
> for hsp in alignment.hsps:
> if hsp.expect < E_VALUE_THRESH:
> print '****Alignment****'
> print 'sequence:', alignment.title
> print 'length:', alignment.length
> print 'e value:', hsp.expect
> print hsp.query[0:75] + '...'
> print hsp.match[0:75] + '...'
> print hsp.sbjct[0:75] + '...'
More information about the Biopython
mailing list