[Biopython] [Biopython-dev] Lazy-loading parsers, was: Biopython GSoC 2013 applications via NESCent

Zhigang Wu zhigangwu.bgi at gmail.com
Thu May 2 02:11:38 UTC 2013


Thanks so much, Alex.
I will definitely take a look at it. Thanks again for making your code available.

Zhigang



On May 1, 2013, at 3:56 PM, "Alex Leach" <albl500 at york.ac.uk> wrote:

> Dear all,
> 
> I also left some minor comments on the proposal; I hope they're helpful and I wish you every success!
> 
> You should focus on the proposal for now, but I thought I'd share a more presentable version of the FASTA lazy-loader I wrote a couple of years ago. The aim at the time was to minimise memory usage and speed up random access to FASTA-formatted sequences stored on disk. Only sequence accessions and file locations are held in memory (in a dict). Once the index has been populated, the dictionary can be pickled to a file on disk for later re-use.
> 
> It doesn't exactly fulfill all of your needs, but I hope it helps point you in the right direction.
> 
> Also, were there plans for making the lazy loader thread-safe? I've done it in the past by passing a `multiprocessing.Pipe` instance to a method (`pipe_sequences`) of the lazy loader. If I were redesigning the code, I'd try to implement a callback scheme, but passing a Pipe did the job. Maybe it's outside the current scope of the project, but anyway, I've put the module up on GitHub if you want to check it out [1].
> 
> 
> Cheers,
> Alex
> 
> 
> [1] - https://github.com/alexleach/fasta_lazy_loader/blob/master/fasta_lazy_loader.py
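
For readers following the thread, the indexing idea described in the quoted message (accessions and file locations kept in a dict, with the dict pickled for later re-use) might look roughly like the sketch below. This is not the code from the linked module: the function names `build_index`, `fetch_sequence`, `save_index` and `load_index` are hypothetical, and the sketch assumes a plain ASCII FASTA file in which the accession is the first whitespace-delimited token of each header line.

```python
import pickle


def build_index(fasta_path):
    """Map each accession to the byte offset of its '>' header line.

    Hypothetical helper: only accessions and offsets are held in memory.
    """
    index = {}
    with open(fasta_path, "rb") as handle:
        while True:
            offset = handle.tell()
            line = handle.readline()
            if not line:
                break
            if line.startswith(b">"):
                fields = line[1:].split()
                if fields:
                    index[fields[0].decode("ascii")] = offset
    return index


def fetch_sequence(fasta_path, index, accession):
    """Seek straight to one record and return its sequence as a string."""
    parts = []
    with open(fasta_path, "rb") as handle:
        handle.seek(index[accession])
        handle.readline()  # skip the header line
        for line in handle:
            if line.startswith(b">"):
                break  # reached the next record
            parts.append(line.strip())
    return b"".join(parts).decode("ascii")


def save_index(index, index_path):
    """Pickle the index dict so a later run can skip the scan."""
    with open(index_path, "wb") as out:
        pickle.dump(index, out, protocol=pickle.HIGHEST_PROTOCOL)


def load_index(index_path):
    """Load a previously pickled index dict."""
    with open(index_path, "rb") as inp:
        return pickle.load(inp)
```

The file is opened in binary mode so that the offsets recorded by `tell()` are plain byte positions that `seek()` can reuse directly. A pickled index is only valid while the FASTA file is unchanged, and pickle files should only be loaded from trusted sources.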
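
The Pipe-based hand-off mentioned in the quoted message could be sketched along these lines. Only the name `pipe_sequences` comes from the message; its signature here, the `None` sentinel, and the helpers `receive_sequences` and `collect` are assumptions made for illustration. The producer runs in a thread to keep the example short; the same function could equally be the target of a `multiprocessing.Process`.

```python
import threading
from multiprocessing import Pipe


def pipe_sequences(records, conn):
    """Send (accession, sequence) pairs down the connection.

    Assumed signature; a None sentinel marks the end of the stream.
    """
    try:
        for accession, sequence in records:
            conn.send((accession, sequence))
        conn.send(None)
    finally:
        conn.close()


def receive_sequences(conn):
    """Yield records from the connection until the sentinel arrives."""
    while True:
        item = conn.recv()
        if item is None:
            break
        yield item


def collect(records):
    """Run the producer in a thread and gather what the consumer receives."""
    recv_end, send_end = Pipe(duplex=False)  # (read-only end, write-only end)
    producer = threading.Thread(target=pipe_sequences, args=(records, send_end))
    producer.start()
    received = list(receive_sequences(recv_end))
    producer.join()
    recv_end.close()
    return received
```

With this arrangement only one thread ever touches the file handle, and consumers see complete records, which sidesteps the need for locking around reads.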



