[Biopython] [Biopython-dev] Lazy-loading parsers, was: Biopython GSoC 2013 applications via NESCent
Zhigang Wu
zhigangwu.bgi at gmail.com
Thu May 2 02:11:38 UTC 2013
Thanks so much Alex.
I definitely will take a look at it. Thanks again for making your code available.
Zhigang
On May 1, 2013, at 3:56 PM, "Alex Leach" <albl500 at york.ac.uk> wrote:
> Dear all,
>
> I also left some minor comments on the proposal; I hope they're helpful and I wish you every success!
>
> You should focus on the proposal for now, but I thought I'd share a more presentable version of the FASTA lazy-loader I wrote a couple of years ago. The aim at the time was to minimise memory usage and speed up random access to FASTA-formatted sequences stored on disk. Only sequence accessions and file locations are held in memory (in a dict). Once the index has been populated, the dictionary can be pickled to a file on disk for later re-use.
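
[Editorial sketch, not taken from Alex's module.] The indexing idea described above can be illustrated roughly as follows. This is a minimal sketch under stated assumptions: the function names (`build_index`, `fetch`, `save_index`, `load_index`) are hypothetical, the accession is assumed to be the first whitespace-delimited token of each header line, and the "file location" is assumed to be the byte offset of that header.

```python
import pickle


def build_index(fasta_path):
    """Map each accession to the byte offset of its '>' header line."""
    index = {}
    with open(fasta_path, "rb") as handle:
        while True:
            offset = handle.tell()  # position of the line about to be read
            line = handle.readline()
            if not line:
                break
            if line.startswith(b">"):
                fields = line[1:].split()
                if fields:
                    # Assumption: the accession is the first token of the header.
                    index[fields[0].decode()] = offset
    return index


def fetch(fasta_path, index, accession):
    """Seek to one record and read only that record's sequence lines."""
    chunks = []
    with open(fasta_path, "rb") as handle:
        handle.seek(index[accession])
        handle.readline()  # skip the header line itself
        for line in handle:
            if line.startswith(b">"):
                break  # reached the next record
            chunks.append(line.strip())
    return b"".join(chunks).decode()


def save_index(index, index_path):
    """Pickle the accession -> offset dict for later re-use."""
    with open(index_path, "wb") as out:
        pickle.dump(index, out)


def load_index(index_path):
    """Reload a previously pickled index."""
    with open(index_path, "rb") as inp:
        return pickle.load(inp)
```

Only the dict of offsets stays in memory; each `fetch` call reads a single record from disk. A reloaded index is only valid while the FASTA file itself is unchanged, since the offsets are byte positions.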
>
> It doesn't exactly fulfill all of your needs, but I hope it might point you in the right direction.
>
> Also, were there plans for making the lazy loader thread-safe? I've done it in the past by passing a `multiprocessing.Pipe` instance to a method (`pipe_sequences`) of the lazy loader. If I were redesigning the code, I'd try to implement a callback scheme, but passing a Pipe did the job. Maybe it's outside the current scope of the project, but anyway, I put the module up on GitHub if you want to check it out [1].
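
[Editorial sketch, not taken from Alex's module.] One way the Pipe approach mentioned above might look is shown below. Only the method name `pipe_sequences` and the use of `multiprocessing.Pipe` come from the message; the `LazyLoader` class, the `start_loader` helper, the in-memory `records` dict (standing in for on-disk lookups), the `None` shutdown sentinel, and the choice of a worker thread are all assumptions made for illustration.

```python
import threading
from multiprocessing import Pipe


class LazyLoader:
    """Toy stand-in for a lazy loader; `records` replaces on-disk lookups."""

    def __init__(self, records):
        self._records = records

    def pipe_sequences(self, conn):
        """Answer accession requests arriving on `conn` until None is received."""
        while True:
            accession = conn.recv()
            if accession is None:  # hypothetical shutdown sentinel
                break
            # Reply with the sequence, or None if the accession is unknown.
            conn.send(self._records.get(accession))
        conn.close()


def start_loader(records):
    """Start one worker that owns the loader; return the caller's pipe end."""
    parent_conn, child_conn = Pipe()
    loader = LazyLoader(records)
    worker = threading.Thread(target=loader.pipe_sequences, args=(child_conn,))
    worker.start()
    return parent_conn, worker
```

A caller would send an accession with `parent_conn.send("seq1")` and read the reply with `parent_conn.recv()`. Because a single worker services all requests, the loader's state is never touched concurrently. Note that one end of a pipe should not be used by several threads at once, so each client would need its own pipe or a lock around the send/recv pair.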
>
>
> Cheers,
> Alex
>
>
> [1] - https://github.com/alexleach/fasta_lazy_loader/blob/master/fasta_lazy_loader.py
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev