[Biopython-dev] Lazy-loading parsers, was: Biopython GSoC 2013 applications via NESCent

Alex Leach albl500 at york.ac.uk
Wed May 1 18:56:12 EDT 2013


Dear all,

I also left some minor comments on the proposal; I hope they're helpful  
and I wish you every success!

You should focus on the proposal for now, but I thought I'd share a more
presentable version of the FASTA lazy-loader I wrote a couple of years
ago. The focus at the time was to minimise memory usage and speed up
random access to FASTA-formatted sequences stored on disk. Only the
sequence accessions and their file locations are held in memory (in a dict).
Once the index has been populated, the loader can pickle the dictionary to a
file on disk for later re-use.
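
As a rough illustration of the indexing idea described above (a minimal
sketch, not the code from the linked module), the index can be a plain dict
mapping each accession to the byte offset of its header line. The function
names `build_index`, `fetch_sequence`, `save_index` and `load_index` are
hypothetical, and the sketch assumes the accession is the first
whitespace-delimited token after `>`.

```python
import pickle


def build_index(fasta_path):
    """Map each accession to the byte offset of its '>' header line."""
    index = {}
    # Binary mode keeps tell()/seek() offsets exact byte positions.
    with open(fasta_path, "rb") as handle:
        while True:
            offset = handle.tell()
            line = handle.readline()
            if not line:
                break
            if line.startswith(b">"):
                fields = line[1:].split()
                if fields:
                    # Assumption: accession = first token of the header.
                    index[fields[0].decode()] = offset
    return index


def fetch_sequence(fasta_path, index, accession):
    """Seek straight to one record and return its sequence as a str."""
    with open(fasta_path, "rb") as handle:
        handle.seek(index[accession])
        handle.readline()  # skip the header line
        chunks = []
        for line in handle:
            if line.startswith(b">"):
                break  # reached the next record
            chunks.append(line.strip())
    return b"".join(chunks).decode()


def save_index(index, index_path):
    """Pickle the accession -> offset dict for later re-use."""
    with open(index_path, "wb") as out:
        pickle.dump(index, out, protocol=pickle.HIGHEST_PROTOCOL)


def load_index(index_path):
    """Reload a previously pickled index."""
    with open(index_path, "rb") as inp:
        return pickle.load(inp)
```

A saved index is only valid while the FASTA file is unchanged, so a real
implementation would probably also want to record the file size or
modification time alongside the offsets.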

It doesn't exactly fulfill all of your needs, but I hope it might help point
you in the right direction.

Also, were there plans for making the lazy loader thread-safe? I've done
it in the past by passing a `multiprocessing.Pipe` instance to a method
(`pipe_sequences`) of the lazy loader. If I were redesigning the code, I'd
try to implement a callback scheme, but passing a Pipe did the job. Maybe
it's outside the current scope of the project, but anyway, I put the module
up on GitHub if you want to check it out[1].
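
To make the Pipe idea above concrete, here is a minimal sketch of how it
could look. Only the name `pipe_sequences` comes from the module; the
signature used here, the helpers `read_record`, `receive_sequences` and
`fetch_in_subprocess`, and the `None` end-of-stream sentinel are all
assumptions made for illustration. The index is assumed to be a dict of
accession -> byte offset of the header line.

```python
from multiprocessing import Pipe, Process


def read_record(handle, offset):
    """Read the sequence of the record whose header starts at offset."""
    handle.seek(offset)
    handle.readline()  # skip the header line
    chunks = []
    for line in handle:
        if line.startswith(b">"):
            break
        chunks.append(line.strip())
    return b"".join(chunks).decode()


def pipe_sequences(conn, fasta_path, index, accessions):
    """Send (accession, sequence) pairs down conn, then None as a sentinel."""
    try:
        # Each caller opens its own handle, so no file position is shared.
        with open(fasta_path, "rb") as handle:
            for accession in accessions:
                conn.send((accession, read_record(handle, index[accession])))
    finally:
        conn.send(None)  # tell the receiver that no more records follow
        conn.close()


def receive_sequences(conn):
    """Yield records from conn until the None sentinel arrives."""
    while True:
        item = conn.recv()
        if item is None:
            break
        yield item


def fetch_in_subprocess(fasta_path, index, accessions):
    """Run pipe_sequences in a child process and collect its output."""
    recv_conn, send_conn = Pipe(duplex=False)
    worker = Process(target=pipe_sequences,
                     args=(send_conn, fasta_path, index, accessions))
    worker.start()
    send_conn.close()  # the parent only reads
    records = list(receive_sequences(recv_conn))
    worker.join()
    return records
```

Because every worker opens the file itself and only the small index dict is
passed around, nothing mutable is shared between processes. On platforms
that start child processes by spawning a fresh interpreter,
`fetch_in_subprocess` has to be called from under an
`if __name__ == "__main__":` guard.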


Cheers,
Alex


[1] -  
https://github.com/alexleach/fasta_lazy_loader/blob/master/fasta_lazy_loader.py

