[Biopython] Having a hard time getting a handle on handles

Dilara Ally dilara.ally at gmail.com
Fri Jul 1 01:21:47 UTC 2011


Hi All

I have ~700,000 contigs that I would like to blast search and then from 
the blast record parse out particular pieces of information from the 
BLAST report.  I can get my code to pull in files and then loop over 
seq_records, blast, and then write a BLAST report.  But since I don't 
want to have 700,000 BLAST reports, I'd like to parse particular pieces 
of information from the report and store it in a table.  This is the 
error I get from the code I have pasted below:

/Users/dally/Desktop/NextGenData/Python_Scripts/batchedfastafiles/group_1.fasta
1
0
GTCTTCGGCGTTGCACCGGCGATGAAGAACCAGTACGAGGCGTCTGGCGAGAGTAACAACGCTG
Traceback (most recent call last):
   File "<stdin>", line 13, in <module>
NameError: name 'NCBIXML' is not defined

Do i have to close the result_handle and then reopen it?  If so why?  
Thanks in advance for your help.
>
> from Bio import SeqIO
> from Bio.Blast import NCBIWWW
> import time
> import os
> import os.path
>
> dirname1="/Users/dally/Desktop/NextGenData/Python_Scripts/batchedfastafiles/"
> allfiles=os.listdir(dirname1)
> fanddir=[os.path.join(dirname1,fname) for fname in allfiles]
> i = 0
> for f in fanddir:
>     print f
>     handle = open(f, "rU")
>     contigs =list(SeqIO.parse(handle,"fasta"))
>     handle.close()
>     start=time.time()
>     for seq_record in contigs:
>         i=i+1
>         print seq_record.id
>         print seq_record.seq
>         result_handle=NCBIWWW.qblast("blastn", "nr", 
> seq_record.format("fasta"),hitlist_size=10)
>         blast_record=list(NCBIXML.read(result_handle)) <== HERE IS THE 
> PROBLEM
>         E_VALUE_THRESH = 0.000004
>         countr=0
>         for alignment in blast_record.alignments:
>             countr=countr+1
>             for hsp in alignment.hsps:
>                 if hsp.expect < E_VALUE_THRESH:
>                     print '****Alignment****'
>                     print 'sequence:', alignment.title
>                     print 'length:', alignment.length
>                     print 'e value:', hsp.expect
>                     print hsp.query[0:75] + '...'
>                     print hsp.match[0:75] + '...'
>                     print hsp.sbjct[0:75] + '...'




More information about the Biopython mailing list