From ktym at hgc.jp Mon Jul 9 07:57:18 2007
From: ktym at hgc.jp (Toshiaki Katayama)
Date: Mon, 9 Jul 2007 20:57:18 +0900
Subject: [BioRuby] Preparing for 1.1 release
Message-ID: <2F9C761F-52B9-4F8A-9314-91024DE1B71A@hgc.jp>

Hi all,

Finally, I'm preparing for the BioRuby 1.1 release.

Developers, are you ready for the next release?

* If you have modules still working on, please let me know ASAP.
  - Which module should be excluded in the next release?
  - When will you finish and commit the final version?

* If you have not filled the ChangeLog file, please document it now.

I hope to pack this weekend.

Regards,
Toshiaki

From mikael.borg at utoronto.ca Mon Jul 9 16:00:47 2007
From: mikael.borg at utoronto.ca (Mikael Borg)
Date: Mon, 09 Jul 2007 16:00:47 -0400
Subject: [BioRuby] Preparing for 1.1 release
In-Reply-To: <2F9C761F-52B9-4F8A-9314-91024DE1B71A@hgc.jp>
References: <2F9C761F-52B9-4F8A-9314-91024DE1B71A@hgc.jp>
Message-ID: <1184011247.19558.44.camel@localhost.localdomain>

On Mon, 2007-09-07 at 20:57 +0900, Toshiaki Katayama wrote:
> Hi all,
>
> Finally, I'm preparing for the BioRuby 1.1 release.
>
> Developers, are you ready for the next release?
>
> * If you have modules still working on, please let me know ASAP.
>   - Which module should be excluded in the next release?
>   - When will you finish and commit the final version?
>
> * If you have not filled the ChangeLog file, please document it now.
>
> I hope to pack this weekend.
>
> Regards,
> Toshiaki
>
>
> _______________________________________________
> BioRuby mailing list
> BioRuby at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioruby

There are still a few bugs in the pdb parser. I have tried to correct
the ones I've found (see below), but as I find the original code
difficult to understand, I might have introduced new bugs. Maybe you can
have a look and either use my suggested changes, or come up with other
solutions?

Cheers,

Mikael

1. empty records causes parser to crash through
   Bio::PDB::Record.Pdb_LString(nil).
   Solution: if empty record, make empty string String.new('').

2. if calling method sheet (Bio::PDB) for a Bio::PDB structure that
   doesn't contain any sheets, the parser crashes.
   Solution: return nil if there are no sheets in structure

# diff -u ~mborg/tmp/bioruby-1.1.0-pre4/lib/bio/db/pdb/pdb.rb pdb.rb
--- /home/mborg/tmp/bioruby-1.1.0-pre4/lib/bio/db/pdb/pdb.rb 2007-04-19 09:59:29.000000000 -0400
+++ pdb.rb 2007-07-09 14:44:01.000000000 -0400
@@ -119,7 +119,11 @@
           m
         end
         def self.new(str)
-          String.new(str)
+          if str.nil?
+            String.new('')
+          else
+            String.new(str)
+          end
         end
       end
 
@@ -1755,6 +1759,7 @@
     # If sheetID is given, it returns an array of
     # Bio::PDB::Record::SHEET instances.
     def sheet(sheetID = nil)
+      return nil unless @sheet
       unless defined?(@sheet)
         @sheet = make_grouping(self.record('SHEET'), :sheetID)
       end

From ngoto at gen-info.osaka-u.ac.jp Tue Jul 10 06:40:10 2007
From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO)
Date: Tue, 10 Jul 2007 19:40:10 +0900
Subject: [BioRuby] Preparing for 1.1 release
In-Reply-To: <1184011247.19558.44.camel@localhost.localdomain>
References: <2F9C761F-52B9-4F8A-9314-91024DE1B71A@hgc.jp> <1184011247.19558.44.camel@localhost.localdomain>
Message-ID: <20070710104012.C36341CBC4F7@idnmail.gen-info.osaka-u.ac.jp>

Hi,

On Mon, 09 Jul 2007 16:00:47 -0400
Mikael Borg wrote:

> There are still a few bugs in the pdb parser. I have tried to correct
> the ones I've found (see below), but as I find the original code
> difficult to understand, I might have introduced new bugs. Maybe you can
> have a look and either use my suggested changes, or come up with other
> solutions?
>
> Cheers,
>
> Mikael
>
> 1. empty records causes parser to crash through
> Bio::PDB::Record.Pdb_LString(nil).
> Solution: if empty record, make empty string String.new('').

Thank you for bug report.
I changed "str" to "str.to_s" to fix the bug.

> 2. if calling method sheet (Bio::PDB) for a Bio::PDB structure that
> doesn't contain any sheets, the parser crashes.
> Solution: return nil if there are no sheets in structure

The same or similar error could also be occurred for REMARK (remark),
JRNL (jrnl), HELIX (helix), TURN (turn), SHEET (sheet),
SSBOND (ssbond), SEQRES (seqres), DBREF (dbref), KEYWDS (keywords),
AUTHOR (authors), HEADER (entry_id, accession, classification),
TITLE (definition), and REVDAT (version) records (methods).

This is mostly caused by the Bio::PDB#record method which
returned nil when the specified record did not exist.
I changed it to return an empty array for nonexistent records.

All of the above bugs are now fixed and committed into CVS.
For your convenience, patch is attached below.

Thanks,

Naohisa Goto
ngoto at gen-info.osaka-u.ac.jp / ngoto at bioruby.org

-------------------------------------------------------------------
--- lib/bio/db/pdb/pdb.rb 19 Apr 2007 13:59:29 -0000 1.22
+++ lib/bio/db/pdb/pdb.rb 10 Jul 2007 10:17:38 -0000
@@ -119,7 +119,7 @@
           m
         end
         def self.new(str)
-          String.new(str)
+          String.new(str.to_s)
         end
       end
 
@@ -1674,7 +1674,7 @@
     #   p pdb.record['HETATM']
     #
     def record(name = nil)
-      name ? @hash[name] : @hash
+      name ? (@hash[name] || []) : @hash
     end
 
     #--
@@ -1837,12 +1837,13 @@
 
     # Classification in "HEADER".
     def classification
-      self.record('HEADER').first.classification
+      f = self.record('HEADER').first
+      f ? f.classification : nil
     end
 
     # Get authors in "AUTHOR".
     def authors
-      self.record('AUTHOR').first.authorList
+      self.record('AUTHOR').collect { |f| f.authorList }.flatten
     end
 
     #--
@@ -1851,7 +1852,10 @@
 
     # PDB identifier written in "HEADER". (e.g. 1A00)
     def entry_id
-      @id = self.record('HEADER').first.idCode unless @id
+      unless @id
+        f = self.record('HEADER').first
+        @id = f ? f.idCode : nil
+      end
       @id
     end
 
@@ -1862,12 +1866,14 @@
 
     # Title of this entry in "TITLE".
     def definition
-      self.record('TITLE').first.title
+      f = self.record('TITLE').first
+      f ? f.title : nil
     end
 
     # Current modification number in "REVDAT".
     def version
-      self.record('REVDAT').first.modNum
+      f = self.record('REVDAT').first
+      f ? f.modNum : nil
     end
 
   end #class PDB
-------------------------------------------------------------------

From mikael.borg at utoronto.ca Tue Jul 10 10:58:48 2007
From: mikael.borg at utoronto.ca (Mikael Borg)
Date: Tue, 10 Jul 2007 10:58:48 -0400
Subject: [BioRuby] Preparing for 1.1 release
In-Reply-To: <20070710104012.C36341CBC4F7@idnmail.gen-info.osaka-u.ac.jp>
References: <2F9C761F-52B9-4F8A-9314-91024DE1B71A@hgc.jp> <1184011247.19558.44.camel@localhost.localdomain> <20070710104012.C36341CBC4F7@idnmail.gen-info.osaka-u.ac.jp>
Message-ID: <1184079528.16555.18.camel@localhost.localdomain>

On Tue, 2007-10-07 at 19:40 +0900, Naohisa GOTO wrote:
> Hi,
>
> On Mon, 09 Jul 2007 16:00:47 -0400
> Mikael Borg wrote:
>
> > There are still a few bugs in the pdb parser. I have tried to correct
> > the ones I've found (see below), but as I find the original code
> > difficult to understand, I might have introduced new bugs. Maybe you can
> > have a look and either use my suggested changes, or come up with other
> > solutions?
> >
> > Cheers,
> >
> > Mikael
> >
> > 1. empty records causes parser to crash through
> > Bio::PDB::Record.Pdb_LString(nil).
> > Solution: if empty record, make empty string String.new('').
>
> Thank you for bug report.
> I changed "str" to "str.to_s" to fix the bug.
>
> > 2. if calling method sheet (Bio::PDB) for a Bio::PDB structure that
> > doesn't contain any sheets, the parser crashes.
> > Solution: return nil if there are no sheets in structure
>
> The same or similar error could also be occurred for REMARK (remark),
> JRNL (jrnl), HELIX (helix), TURN (turn), SHEET (sheet),
> SSBOND (ssbond), SEQRES (seqres), DBREF (dbref), KEYWDS (keywords),
> AUTHOR (authors), HEADER (entry_id, accession, classification),
> TITLE (definition), and REVDAT (version) records (methods).
>
> This is mostly caused by the Bio::PDB#record method which
> returned nil when the specified record did not exist.
> I changed it to return an empty array for nonexistent records.
>
> All of the above bugs are now fixed and committed into CVS.
> For your convenience, patch is attached below.
>
> Thanks,
>
> Naohisa Goto
> ngoto at gen-info.osaka-u.ac.jp / ngoto at bioruby.org
>
> -------------------------------------------------------------------
> --- lib/bio/db/pdb/pdb.rb 19 Apr 2007 13:59:29 -0000 1.22
> +++ lib/bio/db/pdb/pdb.rb 10 Jul 2007 10:17:38 -0000
> @@ -119,7 +119,7 @@
>            m
>          end
>          def self.new(str)
> -          String.new(str)
> +          String.new(str.to_s)
>          end
>        end
>
> @@ -1674,7 +1674,7 @@
>      #   p pdb.record['HETATM']
>      #
>      def record(name = nil)
> -      name ? @hash[name] : @hash
> +      name ? (@hash[name] || []) : @hash
>      end
>
>      #--
> @@ -1837,12 +1837,13 @@
>
>      # Classification in "HEADER".
>      def classification
> -      self.record('HEADER').first.classification
> +      f = self.record('HEADER').first
> +      f ? f.classification : nil
>      end
>
>      # Get authors in "AUTHOR".
>      def authors
> -      self.record('AUTHOR').first.authorList
> +      self.record('AUTHOR').collect { |f| f.authorList }.flatten
>      end
>
>      #--
> @@ -1851,7 +1852,10 @@
>
>      # PDB identifier written in "HEADER". (e.g. 1A00)
>      def entry_id
> -      @id = self.record('HEADER').first.idCode unless @id
> +      unless @id
> +        f = self.record('HEADER').first
> +        @id = f ? f.idCode : nil
> +      end
>        @id
>      end
>
> @@ -1862,12 +1866,14 @@
>
>      # Title of this entry in "TITLE".
>      def definition
> -      self.record('TITLE').first.title
> +      f = self.record('TITLE').first
> +      f ? f.title : nil
>      end
>
>      # Current modification number in "REVDAT".
>      def version
> -      self.record('REVDAT').first.modNum
> +      f = self.record('REVDAT').first
> +      f ? f.modNum : nil
>      end
>
>    end #class PDB
> -------------------------------------------------------------------

Thank you for taking care of this so fast, great job!
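
For readers following the patch, the pattern it applies can be sketched outside BioRuby. This is only an illustration under stated assumptions: `TinyStructure` and `Header` are hypothetical stand-ins, not the real Bio::PDB classes, and only the two guards quoted in the patch (`@hash[name] || []` and `str.to_s`) are taken from the thread.

```ruby
# Hypothetical stand-ins for illustration only; not the real Bio::PDB API.
Header = Struct.new(:classification)

class TinyStructure
  def initialize(records)
    @hash = records # record name => array of record objects
  end

  # As in the patch: a missing record type yields [] instead of nil.
  def record(name = nil)
    name ? (@hash[name] || []) : @hash
  end

  # [].first is nil, so guard before calling the accessor.
  def classification
    f = record('HEADER').first
    f ? f.classification : nil
  end
end

empty = TinyStructure.new({})
full  = TinyStructure.new('HEADER' => [Header.new('OXYGEN TRANSPORT')])

p empty.record('SHEET')   # no SHEET records: an empty array, not nil
p empty.classification    # no HEADER record: nil, no NoMethodError
p full.classification

# The first fix: nil.to_s is "", so String.new no longer receives nil.
p String.new(nil.to_s)
```

Returning an empty array keeps callers that iterate (`each`, `collect`) working unchanged, which is probably why the patch only needed explicit guards on the `.first` accessors.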
Have you considered adding an optional argument to Bio::PDB.new, so that
it would be possible to prevent parsing parts of the pdb info, e.g.
remarks/hydrogen atoms/water molecules? The parser is using a lot of
memory, especially when calling Bio::PDB.inspect so that every record is
parsed. Maybe something for the next version, after 1.1 is done?

/Mikael

From ktym at hgc.jp Mon Jul 16 14:14:34 2007
From: ktym at hgc.jp (Toshiaki Katayama)
Date: Tue, 17 Jul 2007 03:14:34 +0900
Subject: [BioRuby] A couple of changes to DAS...
In-Reply-To: <466732A5.3060802@cs.man.ac.uk>
References: <466732A5.3060802@cs.man.ac.uk>
Message-ID:

Hi Dave,

I'm very sorry that I have missed your contribution.
I appreciate your fixes and congratulations to your Python version.

On 2007/06/07, at 7:18, Dave Thorne wrote:

> I have just spent a successful couple of hours porting the latest bio/io/das.rb file to Python (their DAS support is rather meagre). During the process I found a couple of lines in the original ruby module that I think contain mistakes. I have attached an appropriate diff file. The two small changes are as follows:
>
> line 71:
>     dsn.mapmaster = e.name
> should be (?):
>     dsn.mapmaster = e.text

I've just committed this.

> line 97:
>     segment.stop = e.attributes['orientation']
> should be:
>     segment.orientation = e.attributes['orientation']

This had already been fixed in the repository.

Thank you!

Regards,
Toshiaki Katayama

From trevor at corevx.com Thu Jul 19 17:21:26 2007
From: trevor at corevx.com (Trevor Wennblom)
Date: Thu, 19 Jul 2007 16:21:26 -0500
Subject: [BioRuby] v1.1
Message-ID:

Hey guys,

Good job on getting version 1.1.0 out there!

What's the most difficult part of getting these releases ready? Is
there a way that we could automate it to make life easier? How
difficult would it be to have regular minor-revision releases? (say
1.1.1, 1.1.2, etc)

Trevor

From ktym at hgc.jp Thu Jul 19 12:07:31 2007
From: ktym at hgc.jp (Toshiaki Katayama)
Date: Fri, 20 Jul 2007 01:07:31 +0900
Subject: [BioRuby] BioRuby 1.1 released in BOSC2007 presentation
Message-ID:

Hi all,

I have finally released the BioRuby 1.1 at

  http://bioruby.org/archive/bioruby-1.1.0.tar.gz

and gem package is also available at

  http://rubyforge.org/projects/bioruby/

I also put my presentation of BOSC 2007 held today

  http://bioruby.org/archive/doc/BR070719-bosc.pdf

Enjoy!

Toshiaki Katayama

From ktym at hgc.jp Fri Jul 20 04:01:18 2007
From: ktym at hgc.jp (Toshiaki Katayama)
Date: Fri, 20 Jul 2007 17:01:18 +0900
Subject: [BioRuby] v1.1
In-Reply-To:
References:
Message-ID:

Hi,

This time, we have challenged difficult tasks including

 - changing license
 - rdoc formatting
 - phyloinformatics
 - attempt for rails integration

etc. and these might lead the delay of the release as estimating
how long they may take to be stabilized was not predictable.

Several other reasons I guess:

* Targeting priority

  We have a lot of items in our todo list, but what should be done
  before the next release is not easily decided.

* Time for development

  To spare dedicated span of time for development is getting difficult
  for core developers as the project is running as a volunteer bases
  and they have their own jobs (not students with unlimited time any more...)

Anyway, I'll try to release more often!

* I will release 1.1.1, 1.1.2, ... as soon as the critical bugs are found and fixed.
* We need to fix goals (todo items) for the 1.2 release.

Thanks,
Toshiaki from conference room of BOSC2007 day2

On 2007/07/20, at 6:21, Trevor Wennblom wrote:

> Hey guys,
>
> Good job on getting version 1.1.0 out there!
>
> What's the most difficult part of getting these releases ready? Is
> there a way that we could automate it to make life easier? How
> difficult would it be to have regular minor-revision releases? (say
> 1.1.1, 1.1.2, etc)
>
> Trevor
> _______________________________________________
> BioRuby mailing list
> BioRuby at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioruby

From aidanfindlater at gmail.com Fri Jul 20 14:54:43 2007
From: aidanfindlater at gmail.com (Aidan Findlater)
Date: Fri, 20 Jul 2007 14:54:43 -0400
Subject: [BioRuby] BioRuby's Bio::FlatFileIndex compatibility with BioPerl's Bio::DB::Flat
Message-ID:

*Summary:* Attached is a diff that allows Bio::FlatFileIndex to access BDB
flatfile databases created by BioPerl. I have not changed the way BioRuby
creates its databases, so this likely breaks access to BioRuby-created
flatfiles.

*Description:* I have some flatfile databases that were created with
BioPerl, but it seems that BioRuby does things a little differently.
Specifically, BioRuby tries to get config and fileid information from BDB
databases; BioPerl stores this information in config.dat.

As well, it returns sequences shifted one character to the right (the '>'
from my FASTA file was at the end of the returned sequence, and none was at
the beginning).

I've hacked it up so that it works for me. If anyone else is having this
problem, the diff from my changes is attached below.

Sample usage:

  Bio::FlatFileIndex.open('/path/to/the/database/directory') do |db|
    p db.search("SPAC11H11.06") # My favourite pombe gene!
  end

Now I just have to figure out what to do with the
Bio::FlatFileIndex::Results mess that is returned...

Aidan Findlater

Index: bioruby/lib/bio/io/flatfile/index.rb
===================================================================
RCS file: /home/repository/bioruby/bioruby/lib/bio/io/flatfile/index.rb,v
retrieving revision 1.19
diff -r1.19 index.rb
561c561
<       seek(pos, IO::SEEK_SET)
---
>       seek(pos-1, IO::SEEK_SET)
1147,1148c1147,1148
<       @config = BDBwrapper.new(@dbname, 'config')
<       @bdb_fileids = BDBwrapper.new(@dbname, 'fileids')
---
>       @config = hash.reject{|k,v| k.include?("fileid_") }
>       @bdb_fileids = hash.reject{|k,v| !k.include?("fileid_") }
1196,1199d1195
<       @config.close
<       @config.open(*bdbarg)
<       @bdb_fileids.close
<       @bdb_fileids.open(*bdbarg)
1229,1232d1224
<       if @bdb then
<         @config.close
<         @bdb_fileids.close
<       end
1287c1279
<       @fileids = FileIDs.new('', @bdb_fileids)
---
>       @fileids = FileIDs.new('fileid_', @bdb_fileids)

From ngoto at gen-info.osaka-u.ac.jp Sun Jul 22 06:25:00 2007
From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO)
Date: Sun, 22 Jul 2007 19:25:00 +0900
Subject: [BioRuby] BioRuby's Bio::FlatFileIndex compatibility with BioPerl's Bio::DB::Flat
In-Reply-To:
References:
Message-ID: <20070722102500.DC62E1CBC412@idnmail.gen-info.osaka-u.ac.jp>

Hello,

I'm a maintainer of Bio::FlatFileIndex in bioruby.

On Fri, 20 Jul 2007 14:54:43 -0400
"Aidan Findlater" wrote:

> *Summary:* Attached is a diff that allows Bio::FlatFileIndex to access BDB
> flatfile databases created by BioPerl. I have not changed the way BioRuby
> creates its databases, so this likely breaks access to BioRuby-created
> flatfiles.
>
>
> *Description:* I have some flatfile databases that were created with
> BioPerl, but it seems that BioRuby does things a little differently.
> Specifically, BioRuby tries to get config and fileid information from BDB
> databases; BioPerl stores this information in config.dat.

The OBDA flat-file indexing specification (*1) says that
configuration data is stored in the BDB database, not config.dat.

(excerpted from indexing.txt (*1))
| 2) The subdirectory contains a file named "config.dat" containing tab
| separated key/value pairs. The first line contains the key "index"
| and value "index\tBerkeleyDB/1". This means the first few characters
| of the config.dat file is "index\tBerkeleyDB/1\n".
|
| There is no other data in this file.
|
| 3) Global configuration data is stored in the database named "config".

The specification text was last modified 5 years ago, and it might
have been changed somewhere I don't know. Does someone know
changes of specifications, or how to get new specification text?

*1
http://cvs.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/obda-specs/flatfile/indexing.txt?rev=1.3&cvsroot=obf-common&content-type=text/vnd.viewcvs-markup

> As well, it returns sequences shifted one character to the right (the '>'
> from my FASTA file was at the end of the returned sequence, and none was at
> the beginning).

I suppose this is BioPerl's indexer's issue.

I prepared the file /tmp/flat/tmp.fst as below.
-----------------------------------------------------------
>TEST00001 EOL
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
>TEST00002 EOL
ccccccccccccccccccccccccccccccccccccccccccccccccc
>TEST00003 EOL
ggggggggggggggggggggggggggggggggggggggggggggggggg
>TEST00004 EOL
ttttttttttttttttttttttttttttttttttttttttttttttttt
-----------------------------------------------------------
(Each line of the above file is 50 bytes in UNIX).

% bp_bioflat_index.pl --create --format fasta \
    --location /tmp/flat --dbname testbdb --indextype bdb \
    /tmp/flat/tmp.fst

Then, I confirmed the contents of generated BDB data.

% ruby -r bdb -e 'BDB::Btree.open("/tmp/flat/testbdb/key_ACC").to_a.sort.each { |x| puts x.join("\t") }'
TEST00001	0	0	101
TEST00002	0	101	100
TEST00003	0	201	100
TEST00004	0	301	99

(Each column shows ID, FileID, start position, and size.)

The start positions of TEST00002, TEST00003, and TEST00004 are wrong,
and the size of TEST00001 and TEST00004 is wrong.
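
The expected values can be recomputed without BioPerl or the bdb extension. The sketch below is a plain-Ruby illustration, not part of either library: `build_test_fasta` and `fasta_offsets` are hypothetical helper names, and the space padding of the header lines is an assumption (the message only states that every line is 50 bytes).

```ruby
# Rebuild a file shaped like /tmp/flat/tmp.fst: four FASTA records,
# each made of two 50-byte lines (49 characters plus "\n").
# ASSUMPTION: headers are space-padded before "EOL"; the exact padding
# is not recoverable from the message.
def build_test_fasta
  %w[a c g t].each_with_index.map do |base, i|
    header = format(">TEST%05d", i + 1).ljust(45) + " EOL"
    "#{header}\n#{base * 49}\n"
  end.join
end

# Return [id, start, size] in bytes for every record in the FASTA text.
def fasta_offsets(data)
  records = []
  pos = 0
  data.each_line do |line|
    # A '>' line opens a new record at the current byte position.
    records << [line[1..-1].split.first, pos, 0] if line.start_with?(">")
    records.last[2] += line.bytesize unless records.empty?
    pos += line.bytesize
  end
  records
end

EXPECTED = fasta_offsets(build_test_fasta)
EXPECTED.each { |rec| puts rec.join("\t") }
```

Under these assumptions every record starts on a multiple of 100 and is 100 bytes long, so the start positions 101, 201, 301 and the sizes 101 and 99 in the BDB dump are each off by one.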

I'm using BioPerl 1.5.2_102.

% perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
1.005002102

In addition, I also tried flat database.

% bp_bioflat_index.pl --create --format fasta \
    --location /tmp/flat --dbname testflat --indextype flat \
    /tmp/flat/tmp.fst

% cat testflat2/key_ACC.key
 19TEST00001 0 0 100 TEST00002 0 100 100TEST00003 0 200 100TEST00004 0 300 50

It seems that the index is correctly created. However, according to
the specification (*1), the first 4 bytes of the key_ACC.key file
should be "0019", but was " 19" in the above index created with
BioPerl.

(excerpted from indexing.txt (*1))
| Each record of this file is in a fixed width format. There is no
| special termination character. Instead, the first four bytes of the
| file contain the mapping record size, in bytes, represented as text
| string. The string is left padded with zeros to fit in four bytes, so
| the allowed text strings are "0000", "0001", "0002", ..., "9999".

Regards,

Naohisa Goto
ngoto at gen-info.osaka-u.ac.jp / ngoto at bioruby.org

From ngoto at gen-info.osaka-u.ac.jp Sun Jul 22 06:48:42 2007
From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO)
Date: Sun, 22 Jul 2007 19:48:42 +0900
Subject: [BioRuby] BioRuby's Bio::FlatFileIndex compatibility with BioPerl's Bio::DB::Flat
References:
Message-ID: <20070722104843.212521CBC412@idnmail.gen-info.osaka-u.ac.jp>

On Sun, 22 Jul 2007 19:25:00 +0900
Naohisa GOTO wrote:

> In addition, I also tried flat database.
>
> % bp_bioflat_index.pl --create --format fasta \
>     --location /tmp/flat --dbname testflat --indextype flat \
>     /tmp/flat/tmp.fst
>
> % cat testflat2/key_ACC.key

This is my typo. I meant

% cat /tmp/flat/testflat/key_ACC.key

> 19TEST00001 0 0 100 TEST00002 0 100 100TEST00003 0 200 100TEST00004 0 300 50
>
> It seems that the index is correctly created.

The index was not correct. The size of TEST00004 is misrecognized as 50
(should be 100). I think this is also a bug in BioPerl.
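
The header rule quoted from the specification (four bytes, left padded with zeros) is easy to check in isolation. The helpers below are hypothetical names for illustration, not BioRuby or BioPerl API; the sketch only contrasts a zero-padded header with a space-padded one like the one seen in the BioPerl-built index.

```ruby
# Hypothetical helpers for illustration; not BioRuby or BioPerl API.

# Format the record-size header as the quoted spec describes:
# a text string left padded with zeros to four bytes.
def spec_header(record_size)
  format("%04d", record_size)
end

# True only for headers that follow the spec: exactly four ASCII digits.
def spec_compliant_header?(header)
  header.match?(/\A\d{4}\z/)
end

# Tolerant reader: String#to_i skips leading whitespace, so a
# space-padded header can still be read as a number.
def read_record_size(header)
  header.to_i
end

p spec_header(19)
p spec_compliant_header?("0019")
p spec_compliant_header?("  19")
p read_record_size("  19")
```

A reader that validates strictly will reject the space-padded header, while one that simply converts the first four bytes with `to_i` will accept both forms; which behaviour is preferable depends on how much deviation from the specification an indexer should tolerate.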
Regards, Naohisa Goto ngoto at gen-info.osaka-u.ac.jp / ngoto at bioruby.org From ktym at hgc.jp Mon Jul 9 11:57:18 2007 From: ktym at hgc.jp (Toshiaki Katayama) Date: Mon, 9 Jul 2007 20:57:18 +0900 Subject: [BioRuby] Preparing for 1.1 release Message-ID: <2F9C761F-52B9-4F8A-9314-91024DE1B71A@hgc.jp> Hi all, Finally, I'm preparing for the BioRuby 1.1 release. Developers, are you ready for the next release? * If you have modules still working on, please let me know ASAP. - Which module should be excluded in the next release? - When will you finish and commit the final version? * If you have not filled the ChangeLog file, please document it now. I hope to pack this weekend. Regards, Toshiaki From mikael.borg at utoronto.ca Mon Jul 9 20:00:47 2007 From: mikael.borg at utoronto.ca (Mikael Borg) Date: Mon, 09 Jul 2007 16:00:47 -0400 Subject: [BioRuby] Preparing for 1.1 release In-Reply-To: <2F9C761F-52B9-4F8A-9314-91024DE1B71A@hgc.jp> References: <2F9C761F-52B9-4F8A-9314-91024DE1B71A@hgc.jp> Message-ID: <1184011247.19558.44.camel@localhost.localdomain> On Mon, 2007-09-07 at 20:57 +0900, Toshiaki Katayama wrote: > Hi all, > > Finally, I'm preparing for the BioRuby 1.1 release. > > Developers, are you ready for the next release? > > * If you have modules still working on, please let me know ASAP. > - Which module should be excluded in the next release? > - When will you finish and commit the final version? > > * If you have not filled the ChangeLog file, please document it now. > > I hope to pack this weekend. > > Regards, > Toshiaki > > > _______________________________________________ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby There are still a few bugs in the pdb parser. I have tried to correct the ones I've found (see below), but as I find the original code difficult to understand, I might have introduced new bugs. 
Maybe you can have a look and either use my suggested changes, or come up with other solutions? Cheers, Mikael 1. empty records causes parser to crash through Bio::PDB::Record.Pdb_LString(nil). Solution: if empty record, make empty string String.new(''). 2. if calling method sheet (Bio::PDB) for a Bio::PDB structure that doesn't contain any sheets, the parser crashes. Solution: return nil if there are no sheets in structure # diff -u ~mborg/tmp/bioruby-1.1.0-pre4/lib/bio/db/pdb/pdb.rb pdb.rb --- /home/mborg/tmp/bioruby-1.1.0-pre4/lib/bio/db/pdb/pdb.rb 2007-04-19 09:59:29.000000000 -0400 +++ pdb.rb 2007-07-09 14:44:01.000000000 -0400 @@ -119,7 +119,11 @@ m end def self.new(str) - String.new(str) + if str.nil? + String.new('') + else + String.new(str) + end end end @@ -1755,6 +1759,7 @@ # If sheetID is given, it returns an array of # Bio::PDB::Record::SHEET instances. def sheet(sheetID = nil) + return nil unless @sheet unless defined?(@sheet) @sheet = make_grouping(self.record('SHEET'), :sheetID) end From ngoto at gen-info.osaka-u.ac.jp Tue Jul 10 10:40:10 2007 From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO) Date: Tue, 10 Jul 2007 19:40:10 +0900 Subject: [BioRuby] Preparing for 1.1 release In-Reply-To: <1184011247.19558.44.camel@localhost.localdomain> References: <2F9C761F-52B9-4F8A-9314-91024DE1B71A@hgc.jp> <1184011247.19558.44.camel@localhost.localdomain> Message-ID: <20070710104012.C36341CBC4F7@idnmail.gen-info.osaka-u.ac.jp> Hi, On Mon, 09 Jul 2007 16:00:47 -0400 Mikael Borg wrote: > There are still a few bugs in the pdb parser. I have tried to correct > the ones I've found (see below), but as I find the original code > difficult to understand, I might have introduced new bugs. Maybe you can > have a look and either use my suggested changes, or come up with other > solutions? > > Cheers, > > Mikael > > 1. empty records causes parser to crash through > Bio::PDB::Record.Pdb_LString(nil). > Solution: if empty record, make empty string String.new(''). 
Thank you for bug report. I changed "str" to "str.to_s" to fix the bug. > 2. if calling method sheet (Bio::PDB) for a Bio::PDB structure that > doesn't contain any sheets, the parser crashes. > Solution: return nil if there are no sheets in structure The same or similar error could also be occurred for REMARK (remark), JRNL (jrnl), HELIX (helix), TURN (turn), SHEET (sheet), SSBOND (ssbond), SEQRES (seqres), DBREF (dbref), KEYWDS (keywords), AUTHOR (authors), HEADER (entry_id, accession, classification), TITLE (definition), and REVDAT (version) records (methods). This is mostly caused by the Bio::PDB#record method which returned nil when the specified record did not exist. I changed it to return an empty array for nonexistent records. All of the above bugs are now fixed and committed into CVS. For your convenience, patch is attached below. Thanks, Naohisa Goto ngoto at gen-info.osaka-u.ac.jp / ngoto at bioruby.org ------------------------------------------------------------------- --- lib/bio/db/pdb/pdb.rb 19 Apr 2007 13:59:29 -0000 1.22 +++ lib/bio/db/pdb/pdb.rb 10 Jul 2007 10:17:38 -0000 @@ -119,7 +119,7 @@ m end def self.new(str) - String.new(str) + String.new(str.to_s) end end @@ -1674,7 +1674,7 @@ # p pdb.record['HETATM'] # def record(name = nil) - name ? @hash[name] : @hash + name ? (@hash[name] || []) : @hash end #-- @@ -1837,12 +1837,13 @@ # Classification in "HEADER". def classification - self.record('HEADER').first.classification + f = self.record('HEADER').first + f ? f.classification : nil end # Get authors in "AUTHOR". def authors - self.record('AUTHOR').first.authorList + self.record('AUTHOR').collect { |f| f.authorList }.flatten end #-- @@ -1851,7 +1852,10 @@ # PDB identifier written in "HEADER". (e.g. 1A00) def entry_id - @id = self.record('HEADER').first.idCode unless @id + unless @id + f = self.record('HEADER').first + @id = f ? f.idCode : nil + end @id end @@ -1862,12 +1866,14 @@ # Title of this entry in "TITLE". 
def definition - self.record('TITLE').first.title + f = self.record('TITLE').first + f ? f.title : nil end # Current modification number in "REVDAT". def version - self.record('REVDAT').first.modNum + f = self.record('REVDAT').first + f ? f.modNum : nil end end #class PDB ------------------------------------------------------------------- From mikael.borg at utoronto.ca Tue Jul 10 14:58:48 2007 From: mikael.borg at utoronto.ca (Mikael Borg) Date: Tue, 10 Jul 2007 10:58:48 -0400 Subject: [BioRuby] Preparing for 1.1 release In-Reply-To: <20070710104012.C36341CBC4F7@idnmail.gen-info.osaka-u.ac.jp> References: <2F9C761F-52B9-4F8A-9314-91024DE1B71A@hgc.jp> <1184011247.19558.44.camel@localhost.localdomain> <20070710104012.C36341CBC4F7@idnmail.gen-info.osaka-u.ac.jp> Message-ID: <1184079528.16555.18.camel@localhost.localdomain> On Tue, 2007-10-07 at 19:40 +0900, Naohisa GOTO wrote: > Hi, > > On Mon, 09 Jul 2007 16:00:47 -0400 > Mikael Borg wrote: > > > There are still a few bugs in the pdb parser. I have tried to correct > > the ones I've found (see below), but as I find the original code > > difficult to understand, I might have introduced new bugs. Maybe you can > > have a look and either use my suggested changes, or come up with other > > solutions? > > > > Cheers, > > > > Mikael > > > > 1. empty records causes parser to crash through > > Bio::PDB::Record.Pdb_LString(nil). > > Solution: if empty record, make empty string String.new(''). > > Thank you for bug report. > I changed "str" to "str.to_s" to fix the bug. > > > 2. if calling method sheet (Bio::PDB) for a Bio::PDB structure that > > doesn't contain any sheets, the parser crashes. 
> > Solution: return nil if there are no sheets in structure > > The same or similar error could also be occurred for REMARK (remark), > JRNL (jrnl), HELIX (helix), TURN (turn), SHEET (sheet), > SSBOND (ssbond), SEQRES (seqres), DBREF (dbref), KEYWDS (keywords), > AUTHOR (authors), HEADER (entry_id, accession, classification), > TITLE (definition), and REVDAT (version) records (methods). > > This is mostly caused by the Bio::PDB#record method which > returned nil when the specified record did not exist. > I changed it to return an empty array for nonexistent records. > > All of the above bugs are now fixed and committed into CVS. > For your convenience, patch is attached below. > > Thanks, > > Naohisa Goto > ngoto at gen-info.osaka-u.ac.jp / ngoto at bioruby.org > > ------------------------------------------------------------------- > --- lib/bio/db/pdb/pdb.rb 19 Apr 2007 13:59:29 -0000 1.22 > +++ lib/bio/db/pdb/pdb.rb 10 Jul 2007 10:17:38 -0000 > @@ -119,7 +119,7 @@ > m > end > def self.new(str) > - String.new(str) > + String.new(str.to_s) > end > end > > @@ -1674,7 +1674,7 @@ > # p pdb.record['HETATM'] > # > def record(name = nil) > - name ? @hash[name] : @hash > + name ? (@hash[name] || []) : @hash > end > > #-- > @@ -1837,12 +1837,13 @@ > > # Classification in "HEADER". > def classification > - self.record('HEADER').first.classification > + f = self.record('HEADER').first > + f ? f.classification : nil > end > > # Get authors in "AUTHOR". > def authors > - self.record('AUTHOR').first.authorList > + self.record('AUTHOR').collect { |f| f.authorList }.flatten > end > > #-- > @@ -1851,7 +1852,10 @@ > > # PDB identifier written in "HEADER". (e.g. 1A00) > def entry_id > - @id = self.record('HEADER').first.idCode unless @id > + unless @id > + f = self.record('HEADER').first > + @id = f ? f.idCode : nil > + end > @id > end > > @@ -1862,12 +1866,14 @@ > > # Title of this entry in "TITLE". 
> def definition > - self.record('TITLE').first.title > + f = self.record('TITLE').first > + f ? f.title : nil > end > > # Current modification number in "REVDAT". > def version > - self.record('REVDAT').first.modNum > + f = self.record('REVDAT').first > + f ? f.modNum : nil > end > > end #class PDB > ------------------------------------------------------------------- Thank you for taking care of this so fast, great job! Have you considered adding an optional argument to Bio::PDB.new, so that it would be possible to prevent parsing parts of the pdb info, e.g. remarks/hydrogen atoms/water molecules? The parser is using a lot of memory, especially when calling Bio::PDB.inspect so that every record is parsed. Maybe something for the next version, after 1.1 is done? /Mikael From ktym at hgc.jp Mon Jul 16 18:14:34 2007 From: ktym at hgc.jp (Toshiaki Katayama) Date: Tue, 17 Jul 2007 03:14:34 +0900 Subject: [BioRuby] A couple of changes to DAS... In-Reply-To: <466732A5.3060802@cs.man.ac.uk> References: <466732A5.3060802@cs.man.ac.uk> Message-ID: Hi Dave, I'm very sorry that I have missed your contribution. I appreciate your fixes and congratulations to your Python version. On 2007/06/07, at 7:18, Dave Thorne wrote: > I have just spent a successful couple of hours porting the latest bio/io/das.rb file to Python (their DAS support is rather meagre). During the process I found a couple of lines in the original ruby module that I think contain mistakes. I have attached an appropriate diff file. The two small changes are as follows: > > line 71: > dsn.mapmaster = e.name > should be (?): > dsn.mapmaster = e.text I've just committed this. > line 97: > segment.stop = e.attributes['orientation'] > should be: > segment.orientation = e.attributes['orientation'] This had already been fixed in the repository. Thank you! 
Regards,
Toshiaki Katayama

From trevor at corevx.com Thu Jul 19 21:21:26 2007
From: trevor at corevx.com (Trevor Wennblom)
Date: Thu, 19 Jul 2007 16:21:26 -0500
Subject: [BioRuby] v1.1
Message-ID:

Hey guys,

Good job on getting version 1.1.0 out there!

What's the most difficult part of getting these releases ready? Is
there a way that we could automate it to make life easier? How
difficult would it be to have regular minor-revision releases? (say
1.1.1, 1.1.2, etc)

Trevor

From ktym at hgc.jp Thu Jul 19 16:07:31 2007
From: ktym at hgc.jp (Toshiaki Katayama)
Date: Fri, 20 Jul 2007 01:07:31 +0900
Subject: [BioRuby] BioRuby 1.1 released in BOSC2007 presentation
Message-ID:

Hi all,

I have finally released BioRuby 1.1 at

  http://bioruby.org/archive/bioruby-1.1.0.tar.gz

and the gem package is also available at

  http://rubyforge.org/projects/bioruby/

I have also put up my presentation from BOSC 2007, held today, at

  http://bioruby.org/archive/doc/BR070719-bosc.pdf

Enjoy!

Toshiaki Katayama

From ktym at hgc.jp Fri Jul 20 08:01:18 2007
From: ktym at hgc.jp (Toshiaki Katayama)
Date: Fri, 20 Jul 2007 17:01:18 +0900
Subject: [BioRuby] v1.1
In-Reply-To:
References:
Message-ID:

Hi,

This time, we took on several difficult tasks, including

- changing license
- rdoc formatting
- phyloinformatics
- attempt for rails integration

etc., and these may have delayed the release, since it was hard to
predict how long they would take to stabilize.

Several other reasons, I guess:

* Setting priorities
  We have a lot of items in our todo list, but it is not easy to
  decide what should be done before the next release.

* Time for development
  Setting aside dedicated time for development is getting
  difficult for core developers, as the project runs on
  a volunteer basis and they have their own jobs (not students
  with unlimited time any more...)

Anyway, I'll try to release more often!

* I will release 1.1.1, 1.1.2, ... as soon as critical bugs are found and fixed.
* We need to settle the goals (todo items) for the 1.2 release.

Thanks,
Toshiaki, from the conference room at BOSC2007, day 2

On 2007/07/20, at 6:21, Trevor Wennblom wrote:

> Hey guys,
>
> Good job on getting version 1.1.0 out there!
>
> What's the most difficult part of getting these releases ready? Is
> there a way that we could automate it to make life easier? How
> difficult would it be to have regular minor-revision releases? (say
> 1.1.1, 1.1.2, etc)
>
> Trevor

From aidanfindlater at gmail.com Fri Jul 20 18:54:43 2007
From: aidanfindlater at gmail.com (Aidan Findlater)
Date: Fri, 20 Jul 2007 14:54:43 -0400
Subject: [BioRuby] BioRuby's Bio::FlatFileIndex compatibility with BioPerl's Bio::DB::Flat
Message-ID:

*Summary:* Attached is a diff that allows Bio::FlatFileIndex to access BDB
flatfile databases created by BioPerl. I have not changed the way BioRuby
creates its databases, so this likely breaks access to BioRuby-created
flatfiles.


*Description:* I have some flatfile databases that were created with
BioPerl, but it seems that BioRuby does things a little differently.
Specifically, BioRuby tries to get config and fileid information from BDB
databases; BioPerl stores this information in config.dat.

As well, it returns sequences shifted one character to the right (the '>'
from my FASTA file was at the end of the returned sequence, and none was at
the beginning).

I've hacked it up so that it works for me. If anyone else is having this
problem, the diff from my changes is attached below.

Sample usage:
Bio::FlatFileIndex.open('/path/to/the/database/directory') do |db|
  p db.search("SPAC11H11.06") # My favourite pombe gene!
end

Now I just have to figure out what to do with the
Bio::FlatFileIndex::Results mess that is returned...
Aidan Findlater

Index: bioruby/lib/bio/io/flatfile/index.rb
===================================================================
RCS file: /home/repository/bioruby/bioruby/lib/bio/io/flatfile/index.rb,v
retrieving revision 1.19
diff -r1.19 index.rb
561c561
< seek(pos, IO::SEEK_SET)
---
> seek(pos-1, IO::SEEK_SET)
1147,1148c1147,1148
< @config = BDBwrapper.new(@dbname, 'config')
< @bdb_fileids = BDBwrapper.new(@dbname, 'fileids')
---
> @config = hash.reject{|k,v| k.include?("fileid_") }
> @bdb_fileids = hash.reject{|k,v| !k.include?("fileid_") }
1196,1199d1195
< @config.close
< @config.open(*bdbarg)
< @bdb_fileids.close
< @bdb_fileids.open(*bdbarg)
1229,1232d1224
< if @bdb then
< @config.close
< @bdb_fileids.close
< end
1287c1279
< @fileids = FileIDs.new('', @bdb_fileids)
---
> @fileids = FileIDs.new('fileid_', @bdb_fileids)

From ngoto at gen-info.osaka-u.ac.jp Sun Jul 22 10:25:00 2007
From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO)
Date: Sun, 22 Jul 2007 19:25:00 +0900
Subject: [BioRuby] BioRuby's Bio::FlatFileIndex compatibility with BioPerl's Bio::DB::Flat
In-Reply-To:
References:
Message-ID: <20070722102500.DC62E1CBC412@idnmail.gen-info.osaka-u.ac.jp>

Hello,

I'm a maintainer of Bio::FlatFileIndex in bioruby.

On Fri, 20 Jul 2007 14:54:43 -0400
"Aidan Findlater" wrote:

> *Summary:* Attached is a diff that allows Bio::FlatFileIndex to access BDB
> flatfile databases created by BioPerl. I have not changed the way BioRuby
> creates its databases, so this likely breaks access to BioRuby-created
> flatfiles.
>
>
> *Description:* I have some flatfile databases that were created with
> BioPerl, but it seems that BioRuby does things a little differently.
> Specifically, BioRuby tries to get config and fileid information from BDB
> databases; BioPerl stores this information in config.dat.

The OBDA flat-file indexing specification (*1) says that configuration
data is stored in the BDB database, not config.dat.
(excerpted from indexing.txt (*1))
| 2) The subdirectory contains a file named "config.dat" containing tab
| separated key/value pairs. The first line contains the key "index"
| and value "index\tBerkeleyDB/1". This means the first few characters
| of the config.dat file is "index\tBerkeleyDB/1\n".
|
| There is no other data in this file.
|
| 3) Global configuration data is stored in the database named "config".

The specification text was last modified 5 years ago, and it may
have been changed somewhere I am not aware of. Does anyone know of
changes to the specification, or how to get a newer specification text?

*1 http://cvs.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/obda-specs/flatfile/indexing.txt?rev=1.3&cvsroot=obf-common&content-type=text/vnd.viewcvs-markup

> As well, it returns sequences shifted one character to the right (the '>'
> from my FASTA file was at the end of the returned sequence, and none was at
> the beginning).

I suppose this is an issue in BioPerl's indexer.

I prepared the file /tmp/flat/tmp.fst as below.
-----------------------------------------------------------
>TEST00001 EOL
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
>TEST00002 EOL
ccccccccccccccccccccccccccccccccccccccccccccccccc
>TEST00003 EOL
ggggggggggggggggggggggggggggggggggggggggggggggggg
>TEST00004 EOL
ttttttttttttttttttttttttttttttttttttttttttttttttt
-----------------------------------------------------------
(Each line of the above file is 50 bytes in UNIX).

% bp_bioflat_index.pl --create --format fasta \
  --location /tmp/flat --dbname testbdb --indextype bdb \
  /tmp/flat/tmp.fst

Then, I checked the contents of the generated BDB data.

% ruby -r bdb -e 'BDB::Btree.open("/tmp/flat/testbdb/key_ACC").to_a.sort.each { |x| puts x.join("\t") }'
TEST00001 0 0 101
TEST00002 0 101 100
TEST00003 0 201 100
TEST00004 0 301 99

(Each column shows ID, FileID, start position, and size.)

The start positions of TEST00002, TEST00003, and TEST00004 are wrong,
and the sizes of TEST00001 and TEST00004 are wrong.
I'm using BioPerl 1.5.2_102.

% perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
1.005002102

In addition, I also tried a flat database.

% bp_bioflat_index.pl --create --format fasta \
  --location /tmp/flat --dbname testflat --indextype flat \
  /tmp/flat/tmp.fst

% cat testflat2/key_ACC.key
19TEST00001 0 0 100 TEST00002 0 100 100TEST00003 0 200 100TEST00004 0 300 50

It seems that the index is correctly created.
However, according to the specification (*1), the first 4 bytes
of the key_ACC.key file should be "0019", but were " 19" in
the above index created with BioPerl.

(excerpted from indexing.txt (*1))
| Each record of this file is in a fixed width format. There is no
| special termination character. Instead, the first four bytes of the
| file contain the mapping record size, in bytes, represented as text
| string. The string is left padded with zeros to fit in four bytes, so
| the allowed text strings are "0000", "0001", "0002", ..., "9999".

Regards,

Naohisa Goto
ngoto at gen-info.osaka-u.ac.jp / ngoto at bioruby.org

From ngoto at gen-info.osaka-u.ac.jp Sun Jul 22 10:48:42 2007
From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO)
Date: Sun, 22 Jul 2007 19:48:42 +0900
Subject: [BioRuby] BioRuby's Bio::FlatFileIndex compatibility with BioPerl's Bio::DB::Flat
References:
Message-ID: <20070722104843.212521CBC412@idnmail.gen-info.osaka-u.ac.jp>

On Sun, 22 Jul 2007 19:25:00 +0900
Naohisa GOTO wrote:

> In addition, I also tried a flat database.
>
> % bp_bioflat_index.pl --create --format fasta \
> --location /tmp/flat --dbname testflat --indextype flat \
> /tmp/flat/tmp.fst
>
> % cat testflat2/key_ACC.key

This is my typo. I meant
% cat /tmp/flat/testflat/key_ACC.key

> 19TEST00001 0 0 100 TEST00002 0 100 100TEST00003 0 200 100TEST00004 0 300 50
>
> It seems that the index is correctly created.

The index was not correct. The size of TEST00004 is misrecognized as
50 (should be 100). I think this is also a bug in BioPerl.
Regards,

Naohisa Goto
ngoto at gen-info.osaka-u.ac.jp / ngoto at bioruby.org
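
The start positions and sizes discussed in this thread can be checked without either indexer. The following is only a minimal Ruby sketch, not code from BioRuby or BioPerl: the helper names fasta_offsets and record_size_header are hypothetical, and the sample data assumes that each header line is padded with spaces so that every line is 50 bytes including the newline (the exact padding in the original test file is an assumption).

```ruby
# Hypothetical helpers for illustration; not part of BioRuby or BioPerl.

# Return [id, start, size] (in bytes) for every FASTA record in +data+.
def fasta_offsets(data)
  records = []
  pos = 0
  data.each_line do |line|
    if line.start_with?('>')
      # The ID is the first whitespace-delimited token after '>'.
      records << [line[1..-1].split(/\s+/, 2).first, pos, 0]
    end
    # Lines before the first '>' belong to no record and are skipped.
    records.last[2] += line.bytesize unless records.empty?
    pos += line.bytesize
  end
  records
end

# Four-byte, zero-padded record-size header, as in the quoted spec excerpt.
def record_size_header(size)
  raise ArgumentError, 'size must be within 0..9999' unless (0..9999).cover?(size)
  format('%04d', size)
end

# Assumed test data: four records, every line 50 bytes including "\n".
SAMPLE = %w[a c g t].each_with_index.map { |base, i|
  ">TEST0000#{i + 1}".ljust(46) + "EOL\n" + base * 49 + "\n"
}.join

fasta_offsets(SAMPLE).each { |rec| puts rec.join("\t") }
puts record_size_header(19)
```

Under these assumptions every record starts at a multiple of 100 and has size 100, which matches the "should be 100" remark for TEST00004, and a record size of 19 is written as "0019" rather than space-padded.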