[Biopython] Retrieving fasta seqs

Tue Feb 2 14:19:43 UTC 2010

On Tue, Feb 2, 2010 at 2:13 PM, Kevin <aboulia at gmail.com> wrote:
>
> My version uses a set to store the IDs. It fails with too many records (60
> million) on a 64-bit CentOS machine with 31 GB of RAM and Python 2.4, and I
> can't figure out why. But it works well with 1 million IDs.

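Using a set rather than a list should be faster, since checking whether
an ID is in a set takes roughly constant time on average, while a list
has to be scanned from the start.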
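
As a rough sketch of the set-based approach (this is not Kevin's actual
script, and the function name filter_fasta is made up for illustration),
the filtering can be done with the standard library alone, one line at a
time, so that only the set of wanted IDs is held in memory. It assumes
the record ID is the first whitespace-delimited word after the ">". With
Biopython you would loop over Bio.SeqIO.parse(handle, "fasta") and test
record.id against the set instead.

```python
def filter_fasta(in_handle, out_handle, wanted_ids):
    """Copy FASTA records whose ID is in wanted_ids; return how many were kept.

    Hypothetical helper for illustration. Assumes the ID is the first
    whitespace-delimited word on the header line.
    """
    wanted = set(wanted_ids)  # set membership tests are fast on average
    keep = False              # True while inside a record we want
    kept = 0
    for line in in_handle:
        if line.startswith(">"):
            fields = line[1:].split()
            ident = fields[0] if fields else ""
            keep = ident in wanted
            if keep:
                kept += 1
        if keep:
            # Write the header and every sequence line of a wanted record.
            out_handle.write(line)
    return kept
```

Because the input is streamed, the size of the FASTA file should not
matter much here; the memory cost is dominated by the set of IDs.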

How does it fail on your large dataset - a memory error?
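
If memory does turn out to be the suspect, one way to get a very rough
feel for the numbers is to measure a small sample of IDs and scale up.
The sketch below is only that: the name estimate_id_set_bytes is made
up, it relies on sys.getsizeof (which does not exist in Python 2.4, so
it would have to be run on a newer Python), and it ignores that a set's
hash table grows in steps. It does not diagnose the failure reported
above.

```python
import sys


def estimate_id_set_bytes(sample_ids, total_ids):
    """Extrapolate the memory used by a set of ID strings from a sample.

    Hypothetical helper for illustration; the result is a crude estimate,
    not a measurement.
    """
    sample = set(sample_ids)
    if not sample:
        raise ValueError("need at least one sample identifier")
    # Size of the set's own table plus the string objects it refers to.
    sample_bytes = sys.getsizeof(sample) + sum(sys.getsizeof(s) for s in sample)
    per_id = sample_bytes / len(sample)
    return int(per_id * total_ids)
```

For example, calling it with a few thousand real IDs and
total_ids=60000000 gives a ballpark figure to compare against the RAM
available.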

> Can I propose this be part of the tutorial? It seems to be quite a popular
> request. I was going to post it on my blog, but I think more people will
> benefit if it's on the wiki.
> I don't mind contributing the code and lessons.
>
> Kevin

I was also thinking we should turn this into an example, either as a
wiki cookbook or just as an example in the tutorial.

Peter