[BioRuby] Preparing for 1.1 release

Tue Jul 10 14:58:48 UTC 2007

On Tue, 2007-10-07 at 19:40 +0900, Naohisa GOTO wrote:
> Hi,
> 
> On Mon, 09 Jul 2007 16:00:47 -0400
> Mikael Borg <mikael.borg at utoronto.ca> wrote:
> 
> > There are still a few bugs in the pdb parser. I have tried to correct
> > the ones I've found (see below), but as I find the original code
> > difficult to understand, I might have introduced new bugs. Maybe you can
> > have a look and either use my suggested changes, or come up with other
> > solutions?
> > 
> > Cheers,
> > 
> > Mikael
> > 
> > 1. Empty records cause the parser to crash in
> > Bio::PDB::Record.Pdb_LString(nil).
> > Solution: for an empty record, create an empty string with String.new('').
> 
> Thank you for the bug report.
> I changed "str" to "str.to_s" to fix the bug.
> 
> > 2. Calling the sheet method (Bio::PDB) on a Bio::PDB structure that
> > doesn't contain any sheets crashes the parser.
> > Solution: return nil if there are no sheets in the structure.
> 
> The same or a similar error could also occur for REMARK (remark),
> JRNL (jrnl), HELIX (helix), TURN (turn), SHEET (sheet),
> SSBOND (ssbond), SEQRES (seqres), DBREF (dbref), KEYWDS (keywords),
> AUTHOR (authors), HEADER (entry_id, accession, classification),
> TITLE (definition), and REVDAT (version) records (methods).
> 
> This is mostly caused by the Bio::PDB#record method which
> returned nil when the specified record did not exist.
> I changed it to return an empty array for nonexistent records.
> 
> All of the above bugs are now fixed and committed into CVS.
> For your convenience, the patch is attached below.
> 
> Thanks,
> 
> Naohisa Goto
> ngoto at gen-info.osaka-u.ac.jp / ngoto at bioruby.org
> 
> -------------------------------------------------------------------
> --- lib/bio/db/pdb/pdb.rb       19 Apr 2007 13:59:29 -0000      1.22
> +++ lib/bio/db/pdb/pdb.rb       10 Jul 2007 10:17:38 -0000
> @@ -119,7 +119,7 @@
>            m
>          end
>          def self.new(str)
> -          String.new(str)
> +          String.new(str.to_s)
>          end
>        end
> 
> @@ -1674,7 +1674,7 @@
>      # p pdb.record['HETATM']
>      #
>      def record(name = nil)
> -      name ? @hash[name] : @hash
> +      name ? (@hash[name] || []) : @hash
>      end
> 
>      #--
> @@ -1837,12 +1837,13 @@
> 
>      # Classification in "HEADER".
>      def classification
> -      self.record('HEADER').first.classification
> +      f = self.record('HEADER').first
> +      f ? f.classification : nil
>      end
> 
>      # Get authors in "AUTHOR".
>      def authors
> -      self.record('AUTHOR').first.authorList
> +      self.record('AUTHOR').collect { |f| f.authorList }.flatten
>      end
> 
>      #--
> @@ -1851,7 +1852,10 @@
> 
>      # PDB identifier written in "HEADER". (e.g. 1A00)
>      def entry_id
> -      @id = self.record('HEADER').first.idCode unless @id
> +      unless @id
> +        f = self.record('HEADER').first
> +        @id = f ? f.idCode : nil
> +      end
>        @id
>      end
> 
> @@ -1862,12 +1866,14 @@
> 
>      # Title of this entry in "TITLE".
>      def definition
> -      self.record('TITLE').first.title
> +      f = self.record('TITLE').first
> +      f ? f.title : nil
>      end
> 
>      # Current modification number in "REVDAT".
>      def version
> -      self.record('REVDAT').first.modNum
> +      f = self.record('REVDAT').first
> +      f ? f.modNum : nil
>      end
> 
>    end #class PDB
> -------------------------------------------------------------------

Thank you for taking care of this so fast, great job!
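
To check that I read the patch correctly, here is a minimal stand-alone
sketch of the new behaviour. MiniPDB and HeaderRecord are made-up stand-ins
(not BioRuby classes) that only copy the record lookup and the nil guards
from the diff above, so it runs without the library:

```ruby
# Hypothetical stand-in for a parsed HEADER record; only the two
# fields used below are modelled.
HeaderRecord = Struct.new(:idCode, :classification)

# Hypothetical reduction of Bio::PDB to the patched lookup logic.
class MiniPDB
  def initialize(hash)
    @hash = hash # record name => array of record objects
  end

  # As in the patch: an unknown record name yields [] instead of nil.
  def record(name = nil)
    name ? (@hash[name] || []) : @hash
  end

  # As in the patch: guard against a missing HEADER record.
  def classification
    f = record('HEADER').first
    f ? f.classification : nil
  end

  def entry_id
    unless @id
      f = record('HEADER').first
      @id = f ? f.idCode : nil
    end
    @id
  end
end

empty = MiniPDB.new({})
p empty.record('SHEET')   # => []
p empty.entry_id          # => nil

full = MiniPDB.new('HEADER' => [HeaderRecord.new('1A00', 'OXYGEN TRANSPORT')])
p full.entry_id           # => "1A00"
```

If that matches the intent, callers can iterate over record('SHEET') and
friends without checking for nil first, and only the single-value accessors
(entry_id, classification, definition, version) can return nil.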

Have you considered adding an optional argument to Bio::PDB.new that
would make it possible to skip parts of the pdb info during parsing, e.g.
remarks, hydrogen atoms, or water molecules? The parser uses a lot of
memory, especially when inspect is called on a Bio::PDB object, which
forces every record to be parsed. Maybe something for the next version,
after 1.1 is done?
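
Just to illustrate the kind of option I have in mind, here is a rough
sketch that filters the raw text before it ever reaches the parser.
filter_pdb_lines and its option names (:skip_records, :skip_water,
:skip_hydrogen) are hypothetical and not part of BioRuby. It assumes the
fixed-column PDB layout: record name in columns 1-6, residue name in
columns 18-20, element symbol in columns 77-78 (older files may leave the
element columns blank, in which case hydrogens are not detected):

```ruby
# Hypothetical pre-filter; not a BioRuby API.
# opts[:skip_records]  - array of record names to drop, e.g. ['REMARK']
# opts[:skip_water]    - drop ATOM/HETATM lines whose residue name is HOH
# opts[:skip_hydrogen] - drop ATOM/HETATM lines whose element symbol is H
def filter_pdb_lines(text, opts = {})
  skip_records = opts[:skip_records] || []
  kept = text.each_line.reject do |line|
    name = line[0, 6].to_s.strip                  # columns 1-6
    atom = (name == 'ATOM' || name == 'HETATM')
    skip_records.include?(name) ||
      (opts[:skip_water] && atom && line[17, 3] == 'HOH') ||           # columns 18-20
      (opts[:skip_hydrogen] && atom && line[76, 2].to_s.strip == 'H')  # columns 77-78
  end
  kept.join
end
```

The filtered string could then be handed to Bio::PDB.new in place of the
full entry. Doing the filtering inside the parser would of course be
cleaner, since it already knows the record types.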

/Mikael