[BioRuby] Parse big PDB use up all memory

Tue Dec 18 13:55:57 UTC 2007

Hi,

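Objects inside Bio::PDB often refer to other objects
in the same Bio::PDB object, and this might cause
infinite recursion in Bio::PDB#inspect. irb calls #inspect
on every result it prints, which would explain why the
problem appears in irb but not in a script.

Defining a customized Bio::PDB#inspect seems to prevent
the memory exhaustion problem.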

  class Bio::PDB
    # returns a string containing human-readable representation
    # of this object.
    def inspect
      "#<#{self.class.to_s} entry_id=#{entry_id.inspect}>"
    end
  end

I also defined Bio::PDB::(Model|Chain|Residue)#inspect
in the same way, and committed them to CVS.
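
For illustration only, here is a minimal sketch of the effect that
does not need BioRuby at all. ToyAtom, ToyStructure and
CompactStructure are hypothetical stand-ins, not BioRuby classes, and
the atom count is arbitrary. It only shows that the default #inspect
text grows with the number of contained objects, while a customized
#inspect like the one above stays short.

```ruby
# Hypothetical stand-ins for Bio::PDB and its child objects; not BioRuby code.
class ToyAtom
  def initialize(serial, structure)
    @serial = serial
    @structure = structure   # back-reference to the owning structure
  end
end

class ToyStructure
  attr_reader :entry_id, :atoms

  def initialize(entry_id, n_atoms)
    @entry_id = entry_id
    @atoms = Array.new(n_atoms) { |i| ToyAtom.new(i + 1, self) }
  end
end

class CompactStructure < ToyStructure
  # Same idea as the Bio::PDB#inspect above: report only the identifier.
  def inspect
    "#<#{self.class} entry_id=#{entry_id.inspect}>"
  end
end

default_text = ToyStructure.new('1w6k', 10_000).inspect
compact_text = CompactStructure.new('1w6k', 10_000).inspect

puts default_text.size   # grows with the number of atoms
puts compact_text
```

In irb, the first form is what would be built and printed after each
statement, so keeping #inspect small matters most there.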

Naohisa Goto
ng at bioruby.org / ngoto at gen-info.osaka-u.ac.jp

On Thu, 13 Dec 2007 14:22:59 +0900
Alex Gutteridge <alexg at kuicr.kyoto-u.ac.jp> wrote:

> Yup, I see the same behavior on linux and osx. Bio::PDB.new kills irb  
> but runs fine in a script. Thanks for the bug report. I'll see if I  
> can identify what's going on.
> 
> AlexG
> 
> On 13 Dec 2007, at 14:11, Yen-Ju Chen wrote:
> 
> > I did a quick test and found the problem is that I ran it in irb.
> > If I run it in script, like 'ruby test.rb', then it works fine.
> >
> > Yen-Ju
> >
> > On Dec 12, 2007 8:50 PM, Yen-Ju Chen <yjchenx at gmail.com> wrote:
> >> Thank you for the hint about retrieving only the header.
> >>
> >> I am using the default Ruby on Mac OS X 10.5.
> >> Here is the output of 'ruby -v'
> >>
> >> ruby 1.8.6 (2007-06-07 patchlevel 36) [universal-darwin9.0]
> >>
> >> And bioruby is 1.1.0 from gems.
> >>
> >> I will test it on Linux and see.
> >>
> >> Yen-Ju
> >>
> >>
> >> On Dec 12, 2007 7:49 PM, Alex Gutteridge <alexg at kuicr.kyoto-u.ac.jp> wrote:
> >>> Hi,
> >>>
> >>> Could you give some more details on what system and ruby/bioruby
> >>> version you are running? The same script uses less than 20MB on my
> >>> machine (ruby 1.8.6 / bioruby 1.1.0 / ubuntu linux), which doesn't
> >>> seem so bad. Also 1w6k is biggish, but there are certainly bigger
> >>> PDB files out there so if you're having trouble with this one then
> >>> others will certainly be a problem.
> >>>
> >>> In answer to your second question, yes you should be able to just
> >>> extract the header (everything up to the ATOM records). But if
> >>> you're really running out of memory just parsing that file then I
> >>> suspect you have deeper issues. Anyway, the sample below works for
> >>> me for parsing the header from 1w6k:
> >>>
> >>> require 'bio'
> >>>
> >>> serv = Bio::Fetch.new
> >>> entry = serv.fetch('pdb','1w6k')
> >>>
> >>> header = ''
> >>> entry.each do |l|
> >>>   break if l.match(/^ATOM/)
> >>>   header << l
> >>> end
> >>>
> >>> pdb = Bio::PDB.new(header)
> >>> p pdb.accession
> >>>
> >>>
> >>> On 13 Dec 2007, at 10:54, Yen-Ju Chen wrote:
> >>>
> >>>> This is what I did:
> >>>>
> >>>> require 'bio'
> >>>> serv = Bio::Fetch.new()
> >>>> entry = serv.fetch('pdb', '1w6k')
> >>>> pdb = Bio::PDB.new(entry)
> >>>>
> >>>> The last step uses up all memory and quits.
> >>>> The pdb file is quite big and I only need the information from
> >>>> the header.
> >>>> Is it possible to do something like this?
> >>>>
> >>>> pdb = Bio::PDB.new(entry[0...40000])
> >>>>
> >>>> Thanx for the help
> >>>> _______________________________________________
> >>>> BioRuby mailing list
> >>>> BioRuby at lists.open-bio.org
> >>>> http://lists.open-bio.org/mailman/listinfo/bioruby
> >>>>
> >>>
> >>> Alex Gutteridge
> >>>
> >>> Bioinformatics Center
> >>> Kyoto University
> >>>
> >>>
> >>>
> >>
> >
> 
> Alex Gutteridge
> 
> Bioinformatics Center
> Kyoto University