[Biopython] SeqIO.index for csfasta files memory issues

Kevin Lam aboulia at gmail.com
Tue Jan 19 08:31:43 UTC 2010


What are the memory limitations of SeqIO.index?
I am trying to create an index for a 4.5 GB csfasta file
(~60 million reads),
but the script crashes at about 5 GB of RAM usage,
even though the machine has 31 GB of RAM.


#!/usr/bin/python
from Bio import SeqIO

data = SeqIO.index("Sample3.csfasta", "fasta")
print data.keys()[:3]
print data["853_15_296_F3"].seq



Resource usage summary:

    CPU time   :    381.24 sec.
    Max Memory :      5103 MB
    Max Swap   :      5347 MB

    Max Processes  :         4
    Max Threads    :         5

Traceback (most recent call last):
  File "./extractfasta.py", line 7, in ?
    data = SeqIO.index("Sample3.csfasta", "fasta")
  File "/home//biopython-1.53/build/lib.linux-x86_64-2.4/Bio/SeqIO/__init__.py", line 703, in index
    return indexer(filename, alphabet, key_function)
  File "/home//biopython-1.53/build/lib.linux-x86_64-2.4/Bio/SeqIO/_index.py", line 209, in __init__
    "fasta", ">")
  File "/home//biopython-1.53/build/lib.linux-x86_64-2.4/Bio/SeqIO/_index.py", line 203, in __init__
    self._record_key(line[marker_offset:].strip().split(None,1)[0], offset)
  File "/home//biopython-1.53/build/lib.linux-x86_64-2.4/Bio/SeqIO/_index.py", line 86, in _record_key
    dict.__setitem__(self, key, seek_position)
MemoryError
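
For a rough sense of scale: the traceback shows the index is a plain dict
mapping each read name to a file offset, so its size grows with the number
of reads rather than with the file size. The sketch below is not Biopython
code. It builds a small dict of the same shape and extrapolates to 60 million
entries. The function name, the read-name pattern, and the 75-byte record size
(roughly 4.5 GB / 60 million) are assumptions for illustration, and the numbers
depend on the Python version and platform.

```python
import sys


def estimate_index_bytes(n_sample=100000, n_total=60000000):
    """Extrapolate the memory of a {read name: file offset} dict from a sample.

    Hypothetical helper, not part of Biopython.
    """
    index = {}
    offset = 0
    for i in range(n_sample):
        # Assumed read-name pattern, loosely modelled on "853_15_296_F3".
        key = "%d_%d_%d_F3" % (i % 2000, i % 50, i)
        index[key] = offset
        offset += 75  # assumed average record size in bytes

    # Hash table itself, plus every key string and offset integer.
    # sys.getsizeof is shallow and ignores allocator overhead, so this
    # is only an approximation.
    total = sys.getsizeof(index)
    total += sum(sys.getsizeof(k) + sys.getsizeof(v) for k, v in index.items())

    per_entry = total / float(n_sample)
    return per_entry, per_entry * n_total


per_entry, projected = estimate_index_bytes()
print("about %.0f bytes per entry, about %.1f GB for 60 million reads"
      % (per_entry, projected / 1e9))
```

If the projected figure comes out well below the machine's physical RAM, a
per-process or batch-queue memory limit may be worth checking, but that is a
guess and not something the output above confirms.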


