[Biopython] SeqIO.index for csfasta files memory issues
Kevin Lam
aboulia at gmail.com
Tue Jan 19 03:31:43 EST 2010
What are the memory limitations for SeqIO.index?
I am trying to create an index for a 4.5 GB csfasta file
(~60 million reads),
but the script crashes at about 5 GB of RAM usage,
even though the machine has 31 GB of RAM.
#!/usr/bin/python
from Bio import SeqIO
data = SeqIO.index("Sample3.csfasta", "fasta")
print data.keys()[:3]
print data["853_15_296_F3"].seq
Resource usage summary:
CPU time : 381.24 sec.
Max Memory : 5103 MB
Max Swap : 5347 MB
Max Processes : 4
Max Threads : 5
Traceback (most recent call last):
  File "./extractfasta.py", line 7, in ?
    data = SeqIO.index("Sample3.csfasta", "fasta")
  File "/home//biopython-1.53/build/lib.linux-x86_64-2.4/Bio/SeqIO/__init__.py", line 703, in index
    return indexer(filename, alphabet, key_function)
  File "/home//biopython-1.53/build/lib.linux-x86_64-2.4/Bio/SeqIO/_index.py", line 209, in __init__
    "fasta", ">")
  File "/home//biopython-1.53/build/lib.linux-x86_64-2.4/Bio/SeqIO/_index.py", line 203, in __init__
    self._record_key(line[marker_offset:].strip().split(None,1)[0], offset)
  File "/home//biopython-1.53/build/lib.linux-x86_64-2.4/Bio/SeqIO/_index.py", line 86, in _record_key
    dict.__setitem__(self, key, seek_position)
MemoryError
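The last frame of the traceback stores one dict entry per read (key -> seek position), so memory use should grow roughly linearly with the number of reads. Here is a rough sketch for estimating that footprint by building a small dict and extrapolating. The helper name estimate_index_bytes, the synthetic keys modelled on "853_15_296_F3", and the fixed 60-byte record length are all made up for illustration; sys.getsizeof is CPython-specific and the result is only a ballpark figure, not a measurement of what SeqIO.index really allocates.

```python
import sys


def estimate_index_bytes(sample_keys, total_records):
    """Extrapolate the memory needed for a dict of key -> file offset.

    Builds a dict from sample_keys, measures it with sys.getsizeof,
    and scales the average cost per entry up to total_records.
    """
    index = {}
    offset = 0
    for key in sample_keys:
        index[key] = offset
        offset += 60  # hypothetical record length in bytes

    if not index:
        return 0

    # Size of the hash table itself plus every key and value object.
    measured = sys.getsizeof(index)
    for key, value in index.items():
        measured += sys.getsizeof(key) + sys.getsizeof(value)

    per_entry = measured / float(len(index))
    return int(per_entry * total_records)


# Synthetic read names shaped like "853_15_296_F3" (an assumption).
sample = ["%d_%d_%d_F3" % (i, i % 2048, i % 2048) for i in range(1, 100001)]
estimate = estimate_index_bytes(sample, 60 * 10**6)
```

Comparing the estimate (in bytes) with the "Max Memory" figure in the resource summary gives a quick sanity check on whether the dict alone could plausibly account for the usage.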