[BioRuby] Parse big PDB use up all memory

Thu Dec 13 03:49:04 UTC 2007

Hi,

Could you give some more details on which system and ruby/bioruby
version you are running? The same script uses less than 20MB on my
machine (ruby 1.8.6 / bioruby 1.1.0 / ubuntu linux), which doesn't
seem so bad. Also, 1w6k is biggish, but there are certainly bigger PDB
files out there, so if you're having trouble with this one, others
will be a problem too.

In answer to your second question, yes, you should be able to just
extract the header (everything up to the ATOM records). But if you're
really running out of memory just parsing that file, I suspect you
have deeper issues. Anyway, the sample below works for me for parsing
the header from 1w6k:

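require 'bio'

# Fetch the raw PDB flat file as a single String
serv = Bio::Fetch.new
entry = serv.fetch('pdb', '1w6k')

# Keep every line before the first coordinate record (ATOM, plus
# HETATM/MODEL in case an entry starts its coordinates with one of those).
# each_line works on both old and new Rubies; String#each is gone in 1.9+.
header = ''.dup
entry.each_line do |l|
  break if l =~ /\A(?:ATOM|HETATM|MODEL)\b/
  header << l
end

# Parse just the header, which is much smaller than the full entry
pdb = Bio::PDB.new(header)
p pdb.accession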
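If even holding the whole fetched String is a problem, the same idea
can be applied while reading a locally saved file, so that only the
header lines are ever kept in memory. This is a minimal sketch using
only the Ruby standard library; the helper name read_pdb_header and
the file name 1w6k.pdb are hypothetical, and the demo runs on a
made-up snippet rather than a real entry:

```ruby
require 'stringio'

# Record names that mark the start of the coordinate section.
COORD_RECORD = /\A(?:ATOM|HETATM|MODEL)\b/

# Hypothetical helper: read lines from any IO-like object until the
# first coordinate record and return everything before it as a String.
# IO#each_line reads lazily, so the coordinates are never loaded.
def read_pdb_header(io)
  header = ''.dup
  io.each_line do |line|
    break if line =~ COORD_RECORD
    header << line
  end
  header
end

# Usage with a local file (assumed to be saved as 1w6k.pdb):
#   header = File.open('1w6k.pdb') { |f| read_pdb_header(f) }
#   pdb = Bio::PDB.new(header)

# Self-contained demo on made-up lines (not a real PDB entry).
sample = StringIO.new(<<~PDB)
  HEADER    TOY EXAMPLE, NOT A REAL ENTRY
  REMARK   1 MADE-UP LINES FOR ILLUSTRATION
  ATOM      1  N   MET A   1       0.000   0.000   0.000  1.00  0.00           N
  ATOM      2  CA  MET A   1       1.458   0.000   0.000  1.00  0.00           C
PDB
header = read_pdb_header(sample)
puts header.lines.size # prints 2
```

A fixed-size slice such as entry[0, 40000] would also give a rough
cut, but it can end in the middle of a record; stopping at a record
boundary avoids that. (Note that entry[0-40000] as written evaluates
0-40000 to -40000 and indexes a single position, not a slice.)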

On 13 Dec 2007, at 10:54, Yen-Ju Chen wrote:

> This is what I did:
>
> require 'bio'
> serv = Bio::Fetch.new()
> entry = serv.fetch('pdb', '1w6k')
> pdb = Bio::PDB.new(entry)
>
> The last step use up all memory and quit.
> The pdb file is quite big and I only need the information from header.
> Is it possible to do something like this ?
>
> pdb = Bio::PDB.new(entry[0-40000])
>
> Thanx for the help
> _______________________________________________
> BioRuby mailing list
> BioRuby at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioruby
>

Alex Gutteridge

Bioinformatics Center
Kyoto University