[Bioperl-l] Memory requirements for conversion from embl to genbank

Chris Fields cjfields at uiuc.edu
Thu Aug 31 13:32:59 UTC 2006


Martin,

Do you get the same issue using SeqIO?

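#!/usr/bin/perl -w

use strict;
use warnings;
use Bio::SeqIO;

# Input (EMBL) and output (GenBank) file names
my $file_in  = '5UTR.Vrl_nr.dat';
my $file_out = '5UTR.Vrl_nr.gb';

# Open the EMBL file for reading
my $seqin = Bio::SeqIO->new(-format => 'embl',
                            -file   => "<$file_in");

# Open the GenBank file for writing
my $seqout = Bio::SeqIO->new(-format => 'genbank',
                             -file   => ">$file_out");

# Convert one record at a time; printing the accession shows
# which record is being processed if memory starts to climb
while (my $seq = $seqin->next_seq) {
     print "Acc:",$seq->accession,"\n";
     $seqout->write_seq($seq);
}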
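
On whether parsed records stay in memory: Perl frees an object as soon
as its last reference goes away, so in a loop like the one above each
$seq should be released at the end of its iteration unless something
else still points at it (a reference cycle, for instance). Here is a
minimal sketch in plain Perl, no BioPerl involved. The Record class and
its fields are made up for illustration and say nothing about what the
parsers actually do internally:

```perl
use strict;
use warnings;

# Hypothetical stand-in for a parsed record (NOT a BioPerl class).
package Record;
our $destroyed = 0;    # counts how many objects have been freed
sub new     { my ($class, %args) = @_; return bless {%args}, $class }
sub DESTROY { $destroyed++ }

package main;

# $rec is lexical to the loop body, so its reference count drops to
# zero at the end of every iteration and DESTROY runs immediately.
for my $i (1 .. 3) {
    my $rec = Record->new(id => $i);
}
my $freed_plain = $Record::destroyed;

# A record that refers to itself keeps its count above zero, so it is
# not freed when the lexical goes out of scope.
$Record::destroyed = 0;
for my $i (1 .. 3) {
    my $rec = Record->new(id => $i);
    $rec->{self} = $rec;    # reference cycle
}
my $freed_cyclic = $Record::destroyed;

print "freed without cycles: $freed_plain\n";
print "freed with cycles:    $freed_cyclic\n";
```

If the SeqIO loop shows steadily growing memory, something along the
lines of the second case may be going on somewhere, though that is only
a guess until someone looks at it. Scalar::Util's weaken() is the usual
way to break such cycles.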


Chris


On Aug 31, 2006, at 7:44 AM, Martin MOKREJŠ wrote:

> Hi,
>   I use bp_sreformat.pl to convert a file from embl format
> to genbank. I use current cvs HEAD version and cannot parse
> two files. Each record is small and I don't understand why
> there is such a huge memory requirement. The machine has 1GB
> RAM and is running a recent Linux kernel. Moreover, I could
> parse the same file with bioperl-1.5.1 once I had manually
> fixed some missing quotes in the file.
>
>   With current changes to the embl & genbank parsing (bug #2077)
> I no longer can parse the file.
>
>   Here is the memory status at the moment when the machine ran
> out of memory and linux kernel killed the application:
>
>  1  0 803212  20936      8   2184    0    0     0     0 1062   38 99  1  0  0
>  1  0 803208  19944      8   2184    0    0     0     0 1062   38 100  0  0  0
>  1  0 803208  18828      8   2184    0    0     0     0 1061   37 100  0  0  0
>  1  0 803204  17836      8   2184    0    0     0     0 1062   40 100  0  0  0
>  1  0 803204  16844      8   2184    0    0     0     0 1062   48 100  0  0  0
>  1  0 803200  15728      8   2184   32    0    32     0 1063   41 100  0  0  0
>  1  0 803200  14736      8   2184    0    0     0     0 1062   41 99  1  0  0
>  1  0 803196  13744      8   2184    0    0     0     0 1061   38 100  0  0  0
>  1  0 803240  13640      8   2184    0   48     0    48 1063   68 99  1  0  0
>  1  1 803240  12920      8   1984    0   40     0    40 1065  136 100  0  0  0
>  1  1 803240  13192      8   1872    0 1056     0  1056 1114  326 96  4  0  0
>  1  1 803240  14448      8   1336    0   20     0    20 1081  192 90 10  0  0
>  1  1 803240  13656      8   1232    0   28     0    28 1070  104 87 13  0  0
>  1  1 803240  12892      8   1260   32    4   176     4 1069  113 86 14  0  0
>  0  4 803240  12144      8   1344  192   24   612    24 1088  185 44 16  0 40
>  0  7 803240  11952      8   1180   32   32   508    32 1113  591 46 23  0 32
>  0  3 803240  11948      8   1336 1120  500 10816   500 4390 1397  2 31  0 66
>  2  6 803240  12056      8   1788  752  136  9412   136 6101 1795  0 27  0 73
>  0  7 803240  12176      8   1748   12    0  2180     0 1132  326  0 20  0 80
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
>  0  5 803240  12492      8   1508  136   32  7508    32 2610  865  4 45  0 51
>  0  6 803240  12056      8   2004   64    8  1456     8 1138  312  9 18  0 73
>  1  6 803240  12668      8   1452   96   28 14856    28 2434  658  0 31  0 69
>  0  7 803240  13240      8    564    0    0  3112     0 4602 1492  4 38  0 58
>  0 10 803240  12768      8    688   36 15272  6000 15272 2026  431 26 39  0 35
>  0  2  81780 966512      8   5692  108    0  2904     0 2204  372  0 11  0 89
>  0  3  81780 966204      8   6056  128    0   488     3 1155   82  1  0  0 99
>  0  1  81780 965460      8   6260  492    0   696     0 1150  161  0  1 13 86
>  0  1  81732 963652      8   7860    8    0  1608     0 1147  199  1  2 42 55
>  0  1  81732 962052      8   8560    4    0   704     0 1129  177  6  1 43 50
>  0  1  81732 960120      8   9128    0    0   568     0 1124  161 12  2 57 29
>  0  1  81732 957512      8   9840    4    0   716     0 1137  191 13  2 27 58
>  1  0  81732 954992      8  10640   32    0   832     0 1135  191 14  1 47 38
>  1  0  81732 952824      8  11016    0    0   340     0 1096  128 64  1 18 16
>  1  0  81732 952152      8  11092    0    0     0     0 1062   80 99  1  0  0
>  1  0  81732 951424      8  11196    0    0     0     0 1062  105 99  1  0  0
>  1  0  81732 950808      8  11264    0    0     0     0 1062   74 99  1  0  0
>
>
> $ bp_sreformat.pl -if embl -of genbank -i 5UTR.Vrl_nr.dat -o 5UTR.Vrl_nr.gb
> Killed
> $
>
> The file can be obtained from
> ftp://bighost.ba.itb.cnr.it-fixed/pub/Embnet/Database/UTR/data/
>
> I am not a Perl guru, nor am I familiar with the BioPerl code. Does
> someone know whether the parsed records are held in memory or not?
> It seems so. I guess the objects could be freed by dereferencing
> them immediately after they have been written out in the new format.
> Or the garbage collector does not work well in Perl 5.8.8.
>
> Thanks for any help.
> Martin
>
> -- 
> Dr. Martin Mokrejs
> Faculty of Science, Charles University
> Vinicna 5, 128 43 Prague, Czech Republic
> http://www.iresite.org
> http://www.iresite.org/~mmokrejs
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign
