[Biopython] Parsing GB seq files with BioPython into BioSQL

Peter Cock p.j.a.cock at googlemail.com
Tue Mar 26 13:50:42 UTC 2013


On Tue, Mar 26, 2013 at 1:08 PM, Shyam Saladi <saladi at caltech.edu> wrote:
> Hi,
>
> I am parsing GenBank genome files for microbial genomes and loading the
> sequence and annotations into a BioSQL database.
>
> The program I have is quite simple (same as given online:
> http://biopython.org/wiki/BioSQL#Loading_Sequences_into_a_database).
>
> The issue is that each record when loaded into memory is huge. Some genomes
> take up the entire 32 GB RAM + 32 GB swap.
>
> Does anyone have suggestions on how to make this process more efficient?

Could you show us your code and/or give some examples where
you find a single microbial genome is taking that much RAM?
It does seem more likely that something else is happening,
like keeping old records in memory (possibly as simple as
failing to commit the data to the database regularly). Which
database are you using? How are you doing the commits?
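
To illustrate what I mean by committing regularly, here is a
minimal sketch, not a tested loader. The helper names
load_in_batches and load_genbank_file are made up for this
example, and I am assuming you already have a BioSQL server
connection (from BioSeqDatabase.open_database) and a
sub-database object (from server.new_database or server[name])
as on the wiki page:

```python
from itertools import islice


def load_in_batches(records, load, commit, batch_size=1):
    """Pass records to load() in small batches, calling commit() after each.

    Only one batch is held in memory at a time, so earlier records
    can be garbage collected once they have been loaded.
    Returns the total number of records handed to load().
    """
    records = iter(records)
    total = 0
    while True:
        # Pull at most batch_size records from the (lazy) iterator.
        batch = list(islice(records, batch_size))
        if not batch:
            break
        load(batch)
        commit()
        total += len(batch)
    return total


def load_genbank_file(path, server, db):
    """Hypothetical wiring for BioSQL: server and db are assumed to be
    an open BioSQL server connection and one of its sub-databases."""
    # Imported here so load_in_batches can be used without Biopython.
    from Bio import SeqIO

    # SeqIO.parse yields records one at a time rather than reading
    # the whole file up front.
    return load_in_batches(SeqIO.parse(path, "genbank"), db.load, server.commit)
```

With batch_size=1 this commits after every record, which is
the safest choice for memory. A larger batch size trades some
memory for fewer commits.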

Thanks,

Peter


