[Bioperl-l] Out of memory errors running Bio::ASN1::EntrezGene against latest Homo_sapiens.ags file

Kevin Brown Kevin.M.Brown at asu.edu
Fri Oct 12 18:19:48 UTC 2007


> I downloaded the latest ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/
> ASN_BINARY/Mammalia/Homo_sapiens.ags.gz and ran gene2xml on 
> it to generate Homo_sapiens.xml which is 5821420628 bytes.  I 
> cannot parse this file with Bio::ASN1::EntrezGene, even on a 
> machine with 256GB of memory.  I get a simple "Out of memory" 
> output even with the following code:
> 
> #!/usr/bin/perl
> use strict;
> use Bio::ASN1::EntrezGene;
>    my $parser = Bio::ASN1::EntrezGene->new('file' => 
> "Homo_sapiens.xml");
>    while(my $result = $parser->next_seq)
>    {
>    }

I think most systems impose a per-process memory limit (either hardcoded
in the OS or configurable, depending on the OS), so a single process can
run out of memory well before the machine does.  Also, IIRC most of the
BioPerl IO handlers load the entire file into memory before processing
it.  Some of the IO parsers have been changed recently (a new one was
added for BLAST) so that they pull into memory only as much as they need
to process the next result, rather than the whole file in one go.
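
To see whether a per-process limit applies on a given machine, the
shell's ulimit builtin will report it.  A minimal sketch, assuming a
bash- or dash-like shell where the -v and -d options are available
(POSIX itself only requires -f); the variable names are just for
illustration:

```shell
#!/bin/sh
# Report the per-process limits that most often lie behind "Out of memory".
# Each value is either a size in kilobytes or the word "unlimited".
# Assumption: the shell supports "ulimit -v" and "ulimit -d" (bash and dash do).
vmem_limit=$(ulimit -v)   # maximum virtual memory (address space) per process
data_limit=$(ulimit -d)   # maximum data segment size per process

echo "virtual memory limit: $vmem_limit"
echo "data segment limit:   $data_limit"
```

If either value is a number rather than "unlimited", a perl process
started from that shell is capped at that many kilobytes no matter how
much RAM the machine has.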



