[BioRuby] Parse big PDB use up all memory

Alex Gutteridge alexg at kuicr.kyoto-u.ac.jp
Thu Dec 13 05:22:59 UTC 2007


Yup, I see the same behavior on Linux and OS X. Bio::PDB.new kills irb
but runs fine in a script. Thanks for the bug report. I'll see if I
can identify what's going on.

AlexG

On 13 Dec 2007, at 14:11, Yen-Ju Chen wrote:

> I did a quick test and found that the problem only occurs when I run it in irb.
> If I run it as a script, like 'ruby test.rb', then it works fine.
>
> Yen-Ju
>
> On Dec 12, 2007 8:50 PM, Yen-Ju Chen <yjchenx at gmail.com> wrote:
>> Thank you for the hint on retrieving only the header.
>>
>> I am using the default Ruby on Mac OS X 10.5.
>> Here is the output of 'ruby -v'
>>
>> ruby 1.8.6 (2007-06-07 patchlevel 36) [universal-darwin9.0]
>>
>> And bioruby is 1.1.0 from gems.
>>
>> I will test it on Linux and see.
>>
>> Yen-Ju
>>
>>
>> On Dec 12, 2007 7:49 PM, Alex Gutteridge <alexg at kuicr.kyoto-u.ac.jp> wrote:
>>> Hi,
>>>
>>> Could you give some more details on what system and ruby/bioruby
>>> version you are running? The same script uses less than 20MB on my
>>> machine (ruby 1.8.6 / bioruby 1.1.0 / ubuntu linux), which doesn't
>>> seem so bad. Also 1w6k is biggish, but there are certainly bigger PDB
>>> files out there so if you're having trouble with this one then others
>>> will certainly be a problem.
>>>
>>> In answer to your second question, yes you should be able to just
>>> extract the header (everything up to the ATOM records). But if you're
>>> really running out of memory just parsing that file then I suspect you
>>> have deeper issues. Anyway, the sample below works for me for parsing
>>> the header from 1w6k:
>>>
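>>> require 'bio'
>>>
>>> # Fetch the full 1w6k entry as a single string
>>> serv = Bio::Fetch.new
>>> entry = serv.fetch('pdb','1w6k')
>>>
>>> # Keep only the lines before the first ATOM record
>>> header = ''
>>> entry.each_line do |l|
>>>   break if l.match(/^ATOM/)
>>>   header << l
>>> end
>>>
>>> # Parse just the header
>>> pdb = Bio::PDB.new(header)
>>> p pdb.accession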
>>>
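As a self-contained sketch of the same header-splitting idea, the Ruby below wraps the loop in a helper and runs it on a tiny made-up PDB-style string, so it needs neither the bio gem nor a network connection. The name pdb_header and the sample records are assumptions for illustration; they are not part of BioRuby and are not taken from 1w6k.

```ruby
# Hypothetical helper (not part of BioRuby): returns every line of a
# PDB-format string that comes before the first ATOM record.
def pdb_header(text)
  header = String.new
  text.each_line do |line|
    # Stop at the first coordinate record; the rest is not needed.
    break if line.start_with?('ATOM')
    header << line
  end
  header
end

# Made-up sample records, for illustration only.
sample = <<~PDB
  HEADER    EXAMPLE ENTRY
  TITLE     MADE-UP RECORDS FOR ILLUSTRATION
  ATOM      1  N   MET A   1       0.000   0.000   0.000
  ATOM      2  CA  MET A   1       1.000   1.000   1.000
PDB

header = pdb_header(sample)
puts header
```

With a real entry, the returned string could be passed to Bio::PDB.new in the same way as the header variable in the sample above.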
>>>
>>> On 13 Dec 2007, at 10:54, Yen-Ju Chen wrote:
>>>
>>>> This is what I did:
>>>>
>>>> require 'bio'
>>>> serv = Bio::Fetch.new()
>>>> entry = serv.fetch('pdb', '1w6k')
>>>> pdb = Bio::PDB.new(entry)
>>>>
>>>> The last step uses up all memory and quits.
>>>> The PDB file is quite big and I only need the information from the header.
>>>> Is it possible to do something like this?
>>>>
>>>> pdb = Bio::PDB.new(entry[0...40000])
>>>>
>>>> Thanks for the help
>>>> _______________________________________________
>>>> BioRuby mailing list
>>>> BioRuby at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioruby
>>>>
>>>
>>> Alex Gutteridge
>>>
>>> Bioinformatics Center
>>> Kyoto University
>>>
>>>
>>>
>>
>

Alex Gutteridge

Bioinformatics Center
Kyoto University