[Biopython] Retrieving fasta seqs
Kevin Lam
aboulia at gmail.com
Tue Feb 2 14:29:04 UTC 2010
Yes, I got a "memory error" when the job died.
The uncompressed IDs file is about 680 MB. Storing the IDs in a set probably
takes more space than the file itself, but
I assumed that it would still fit comfortably in 4 GB of RAM even if there is a
32-bit limit.
It's a mystery I am dying to solve if I have more time.
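
A rough way to test that assumption is to measure how much larger a set of ID
strings is in memory than the same IDs in a text file, and to check whether the
interpreter itself is a 32-bit build (a 32-bit process is limited to a few GB
of address space no matter how much RAM is installed). This is only a sketch:
the sample IDs are made up, the helper name estimate_set_bytes is hypothetical,
and the per-object overhead reported by sys.getsizeof is specific to the Python
build in use (sys.getsizeof needs Python 2.6 or later, so it will not run on
2.4 as is).

```python
import struct
import sys


def estimate_set_bytes(ids):
    """Rough footprint of a set of strings: hash table plus each string object."""
    id_set = set(ids)
    table_bytes = sys.getsizeof(id_set)  # the set's own table, not its elements
    element_bytes = sum(sys.getsizeof(s) for s in id_set)
    return table_bytes + element_bytes


# Pointer size tells us whether this interpreter is a 32-bit or 64-bit build.
pointer_bits = struct.calcsize("P") * 8

# Made-up IDs for illustration only; real IDs will differ in length.
sample_ids = ["read_%07d" % i for i in range(100000)]

on_disk = sum(len(s) + 1 for s in sample_ids)  # one ID per line, plus newline
in_memory = estimate_set_bytes(sample_ids)

print("interpreter build: %d-bit" % pointer_bits)
print("on disk: %d bytes, in a set: %d bytes (%.1fx)"
      % (on_disk, in_memory, in_memory / float(on_disk)))
```

Multiplying the printed ratio by the size of the real IDs file gives a ballpark
figure for the set, which can then be compared against the address space the
interpreter build allows.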
I do not have the code with me right now; I will post it soon, but it is almost
the same as the list method.
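
For readers following the thread, here is a minimal sketch of the general
set-based approach. It is not the script discussed above. It assumes one ID per
line in the IDs file and that a record's ID is the first word after ">" in the
FASTA header. It uses plain Python rather than Bio.SeqIO so that it runs
without Biopython installed; the function names read_ids and filter_fasta are
made up for this example. The FASTA file is streamed line by line, so only the
set of IDs has to fit in memory.

```python
import io


def read_ids(handle):
    """Collect one ID per line into a set, skipping blank lines."""
    return set(line.strip() for line in handle if line.strip())


def filter_fasta(fasta_handle, wanted, out_handle):
    """Copy records whose ID is in `wanted`; return the number of records kept."""
    keep = False
    count = 0
    for line in fasta_handle:
        if line.startswith(">"):
            # The ID is taken to be the first whitespace-delimited word.
            fields = line[1:].split(None, 1)
            keep = bool(fields) and fields[0] in wanted  # O(1) set lookup
            if keep:
                count += 1
        if keep:
            out_handle.write(line)
    return count


# Tiny in-memory demo; real use would pass open file handles instead.
ids = read_ids(io.StringIO(u"seq1\nseq3\n"))
fasta = io.StringIO(u">seq1 first\nACGT\n>seq2\nGGGG\n>seq3\nTT\nAA\n")
out = io.StringIO()
n = filter_fasta(fasta, ids, out)
print(n)
print(out.getvalue())
```

The membership test against a set takes roughly constant time per record,
whereas the same test against a list scans the list each time, which is why the
set version should be faster for large ID collections.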
On Tue, Feb 2, 2010 at 10:19 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Tue, Feb 2, 2010 at 2:13 PM, Kevin <aboulia at gmail.com> wrote:
> >
> > My version uses a set to store the IDs. It fails with too many records
> > (60 million) on 31 GB RAM, 64-bit CentOS, Python 2.4; I can't figure out
> > why. But it works well with 1 million IDs.
>
> Using sets rather than a list should be faster.
>
> How does it fail on your large dataset - a memory error?
>
> > Can I propose this be part of the tutorial? It seems to be quite a popular
> > request. I was going to post it on my blog, but I think more people will
> > benefit if it's on the wiki.
> > I don't mind contributing the code and lessons.
> >
> > Kevin
>
> I was also thinking we should turn this into an example, either as a
> wiki cookbook or just as an example in the tutorial.
>
> Peter
>