[Biopython] Retrieving fasta seqs

Tue Feb 2 15:54:35 UTC 2010

Kevin and Peter;

> On Tue, Feb 2, 2010 at 3:30 PM, Kevin Lam <aboulia at gmail.com> wrote:
> > Traceback (most recent call last):
> >  File "test.py", line 22, in ?
> >    ids.add(recordf3)
> > # Then add each line to .ids.
> > MemoryError
> 
> OK, so it fails way before you do anything with Biopython - the
> problem is simply building a very large set of strings in memory.
> You could try using a list instead of a set (trivial code change),
> which I would expect to use less memory but run slower.
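A minimal sketch of that trivial change, written for Python 3 and assuming the IDs arrive one per line. The function name `load_ids` and the `use_set` flag are hypothetical, just for illustration. Only the container changes; the membership test later on looks the same either way.

```python
def load_ids(lines, use_set=True):
    """Collect one ID per line into a set (fast lookup) or a list (less memory).

    Hypothetical helper: note that a list keeps duplicate IDs, a set does not.
    """
    ids = set() if use_set else []
    # Pick the matching "add" method once, outside the loop
    add = ids.add if use_set else ids.append
    for line in lines:
        record_id = line.strip()
        if record_id:  # skip blank lines
            add(record_id)
    return ids


# Usage: "in" works on either container (a linear scan for a list)
ids = load_ids(["seq1\n", "seq2\n", "\n"], use_set=False)
print("seq2" in ids)
```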

This is a nice discussion on Stack Overflow of the trade-off between
lookup time and memory use for lists versus sets/dictionaries:

http://stackoverflow.com/questions/513882/python-list-vs-dict-for-look-up-table
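To see the lookup side of that trade-off yourself, here is a rough timing sketch (Python 3). The synthetic IDs, the count of 100,000 and the repeat count are arbitrary assumptions, and absolute timings will vary by machine. Looking up an ID that is not present is the worst case for a list, since it has to scan every element, while a set does an average constant-time hash lookup.

```python
import timeit

n = 100_000
id_list = [f"id{i}" for i in range(n)]  # synthetic IDs, not real data
id_set = set(id_list)
missing = "not_present"  # worst case for a list: every element is compared

# list membership is O(n); set membership is O(1) on average (hash lookup)
t_list = timeit.timeit(lambda: missing in id_list, number=100)
t_set = timeit.timeit(lambda: missing in id_set, number=100)
print(f"list: {t_list:.4f}s  set: {t_set:.6f}s")
```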

My guess is that building the hash table for the string IDs is what
gets memory-expensive.
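A quick way to check that guess on the memory side, sketched for CPython 3 with made-up IDs: `sys.getsizeof` reports only the container itself (the list's array of references, or the set's hash table), not the strings, which both containers share here. In CPython the set's table keeps spare empty slots so lookups stay fast, which is why it comes out noticeably larger than the list for the same IDs.

```python
import sys

as_list = [f"id{i}" for i in range(100_000)]  # synthetic IDs, not real data
as_set = set(as_list)

# Container overhead only: the string objects themselves are not counted
print("list:", sys.getsizeof(as_list), "bytes")
print("set: ", sys.getsizeof(as_set), "bytes")
```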

Brad