[Bioperl-l] Out of memory errors running Bio::ASN1::EntrezGene against latest Homo_sapiens.ags file

Stefan Kirov stefan.kirov at bms.com
Fri Oct 12 14:34:49 EDT 2007


Kevin Brown wrote:
>> I downloaded the latest
>> ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/ASN_BINARY/Mammalia/Homo_sapiens.ags.gz
>> and ran gene2xml on it to generate Homo_sapiens.xml, which is
>> 5821420628 bytes.  I cannot parse this file with
>> Bio::ASN1::EntrezGene, even on a machine with 256GB of memory.  I get
>> a simple "Out of memory" output even with the following code:
>>
>> #!/usr/bin/perl
>> use strict;
>> use Bio::ASN1::EntrezGene;
>>    my $parser = Bio::ASN1::EntrezGene->new('file' => 
>> "Homo_sapiens.xml");
>>    while(my $result = $parser->next_seq)
>>    {
>>    }
>>     
>
> I think most systems have a per process memory limit (either hardcoded
> in the OS or configured depending on the OS) and IIRC most of the IO
> handlers for BioPerl load entire file contents into memory to process
> them.  Some of the IO parsers have been changed recently (a new one
> added for blast) so that it only pulls into memory as much as it needs
> to process the next result rather than the whole file in one shebang.
>   
The file is approx. 6GB, so on a 256GB machine its size alone should
not be a problem, even if the whole file were read into memory. I think
this might be a deep, poorly controlled recursion problem.
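
One way to test the recursion hypothesis is to let Perl report it: with
warnings enabled, perl emits a "Deep recursion on subroutine" warning once
a subroutine is nested 100 calls deep. The following is only a minimal,
self-contained sketch of trapping that warning together with a stack
trace. The depth_of() walker and the nested test data are hypothetical
stand-ins for illustration, not code from Bio::ASN1::EntrezGene.

```perl
#!/usr/bin/perl
use strict;
use warnings;              # enables the 'recursion' warning category in this file
use Carp qw(longmess);

# Trap "Deep recursion" warnings and keep a stack trace for each one.
# All other warnings are passed through unchanged.
my @deep;
local $SIG{__WARN__} = sub {
    my ($msg) = @_;
    if ($msg =~ /^Deep recursion/) {
        push @deep, longmess($msg);
    }
    else {
        warn $msg;
    }
};

# Hypothetical stand-in for a recursive parser routine:
# returns the nesting depth of an array-of-arrays structure.
sub depth_of {
    my ($node) = @_;
    return 0 unless ref $node eq 'ARRAY';
    my $max = 0;
    for my $child (@$node) {
        my $d = depth_of($child);
        $max = $d if $d > $max;
    }
    return 1 + $max;
}

# Build a structure nested deeply enough to pass the 100-call threshold.
my $nested = [];
$nested = [$nested] for 1 .. 150;

my $depth = depth_of($nested);
printf "nesting depth: %d, deep-recursion warnings trapped: %d\n",
    $depth, scalar @deep;
```

To try this on the real parser, the same $SIG{__WARN__} handler could be
installed before the while loop in the script quoted above. Note that
"use warnings" is lexically scoped and does not reach into the module's
own file, so the script would probably need to be run with "perl -w" as
well. Whether the parser actually triggers the warning on this input is
an assumption I have not tested.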
Stefan