[BioPython] How can I get a more explicit error

Brad Chapman chapmanb at 50mail.com
Wed Mar 18 17:20:07 EDT 2009


Hi Yvan;

> I am trying to get to grips with Biopython and followed chapter 6 from the
> tutorial (http://www.biopython.org/DIST/docs/tutorial/Tutorial.html)
> 
> I run this script:
[...]
> blast_results = result_handle.read()
[...]
> [yvans at lundalm BEE]$ python bioblast.py s_1_2_eland_extended.8000000.fta
> Traceback (most recent call last):
>    File "bioblast.py", line 16, in <module>
>      blast_results = result_handle.read()
> SystemError: Objects/stringobject.c:4271: bad argument to internal function
> 
> if the number of sequences blasted against the db is greater than 500000.
> The sequences are small reads from a Solexa sequencing project.

The result_handle.read() line pulls the entire BLAST result into memory
as a single string. With output this large you run out of memory, which
leads to the error you are seeing.

To avoid the problem, run BLAST at the command line first, and then
process the resulting XML file with the BLAST parser as described here:

http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc56

The parser iterates over one record at a time, which avoids the memory issue.
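
As a rough sketch of what that looks like, assuming the BLAST output was
saved as XML: the function names, the file path argument and the 0.001
e-value cutoff below are illustrative choices of mine, not anything from
your script. NCBIXML.parse is the Biopython call the tutorial section
describes.

```python
def significant_hits(blast_records, e_value_thresh=0.001):
    """Lazily yield (query, hit title, e-value) for HSPs under the cutoff.

    blast_records can be any iterable of BLAST records, so nothing here
    forces the whole report into memory. The 0.001 default is only an
    illustrative cutoff.
    """
    for record in blast_records:
        for alignment in record.alignments:
            for hsp in alignment.hsps:
                if hsp.expect < e_value_thresh:
                    yield record.query, alignment.title, hsp.expect


def report(xml_path, e_value_thresh=0.001):
    """Print the significant hits from a BLAST XML file (hypothetical helper)."""
    # Imported here so significant_hits() can be used without Biopython.
    from Bio.Blast import NCBIXML

    with open(xml_path) as result_handle:
        # NCBIXML.parse returns an iterator over records, in contrast to
        # result_handle.read(), which loads the whole file as one string.
        records = NCBIXML.parse(result_handle)
        for query, title, expect in significant_hits(records, e_value_thresh):
            print(query, title, expect)
```

You would call report() with the path to whatever XML file your
command-line BLAST run produced.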

However, you should be using a short-read aligner to map these reads
to the genome. BLAST is not the right tool for this particular
application; massive BLAST report files are going to be one of many
problems you will run into analyzing the data. Here are a couple of
popular aligners designed for the exact problem you are tackling:

Bowtie: http://bowtie-bio.sourceforge.net/index.shtml
Maq: http://maq.sourceforge.net/

Hope this helps,
Brad

