From ktym at hgc.jp Mon Jul 9 07:57:18 2007
From: ktym at hgc.jp (Toshiaki Katayama)
Date: Mon, 9 Jul 2007 20:57:18 +0900
Subject: [BioRuby] Preparing for 1.1 release
Message-ID: <2F9C761F-52B9-4F8A-9314-91024DE1B71A@hgc.jp>
Hi all,
Finally, I'm preparing for the BioRuby 1.1 release.
Developers, are you ready for the next release?
* If you have modules that you are still working on, please let me know ASAP.
- Which module should be excluded in the next release?
- When will you finish and commit the final version?
* If you have not yet updated the ChangeLog file, please document your changes now.
I hope to package the release this weekend.
Regards,
Toshiaki
From mikael.borg at utoronto.ca Mon Jul 9 16:00:47 2007
From: mikael.borg at utoronto.ca (Mikael Borg)
Date: Mon, 09 Jul 2007 16:00:47 -0400
Subject: [BioRuby] Preparing for 1.1 release
In-Reply-To: <2F9C761F-52B9-4F8A-9314-91024DE1B71A@hgc.jp>
References: <2F9C761F-52B9-4F8A-9314-91024DE1B71A@hgc.jp>
Message-ID: <1184011247.19558.44.camel@localhost.localdomain>
On Mon, 2007-07-09 at 20:57 +0900, Toshiaki Katayama wrote:
> Hi all,
>
> Finally, I'm preparing for the BioRuby 1.1 release.
>
> Developers, are you ready for the next release?
>
> * If you have modules that you are still working on, please let me know ASAP.
> - Which module should be excluded in the next release?
> - When will you finish and commit the final version?
>
> * If you have not yet updated the ChangeLog file, please document your changes now.
>
> I hope to package the release this weekend.
>
> Regards,
> Toshiaki
>
>
> _______________________________________________
> BioRuby mailing list
> BioRuby at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioruby
There are still a few bugs in the pdb parser. I have tried to correct
the ones I've found (see below), but as I find the original code
difficult to understand, I might have introduced new bugs. Maybe you can
have a look and either use my suggested changes, or come up with other
solutions?
Cheers,
Mikael
1. Empty records cause the parser to crash in
Bio::PDB::Record.Pdb_LString(nil).
Solution: for an empty record, create an empty string with String.new('').
2. Calling the sheet method (Bio::PDB) on a Bio::PDB structure that
doesn't contain any sheets crashes the parser.
Solution: return nil if there are no sheets in the structure.
# diff -u ~mborg/tmp/bioruby-1.1.0-pre4/lib/bio/db/pdb/pdb.rb pdb.rb
--- /home/mborg/tmp/bioruby-1.1.0-pre4/lib/bio/db/pdb/pdb.rb
2007-04-19 09:59:29.000000000 -0400
+++ pdb.rb 2007-07-09 14:44:01.000000000 -0400
@@ -119,7 +119,11 @@
m
end
def self.new(str)
- String.new(str)
+ if str.nil?
+ String.new('')
+ else
+ String.new(str)
+ end
end
end
@@ -1755,6 +1759,7 @@
# If sheetID is given, it returns an array of
# Bio::PDB::Record::SHEET instances.
def sheet(sheetID = nil)
+ return nil unless @sheet
unless defined?(@sheet)
@sheet = make_grouping(self.record('SHEET'), :sheetID)
end
From ngoto at gen-info.osaka-u.ac.jp Tue Jul 10 06:40:10 2007
From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO)
Date: Tue, 10 Jul 2007 19:40:10 +0900
Subject: [BioRuby] Preparing for 1.1 release
In-Reply-To: <1184011247.19558.44.camel@localhost.localdomain>
References: <2F9C761F-52B9-4F8A-9314-91024DE1B71A@hgc.jp>
<1184011247.19558.44.camel@localhost.localdomain>
Message-ID: <20070710104012.C36341CBC4F7@idnmail.gen-info.osaka-u.ac.jp>
Hi,
On Mon, 09 Jul 2007 16:00:47 -0400
Mikael Borg wrote:
> There are still a few bugs in the pdb parser. I have tried to correct
> the ones I've found (see below), but as I find the original code
> difficult to understand, I might have introduced new bugs. Maybe you can
> have a look and either use my suggested changes, or come up with other
> solutions?
>
> Cheers,
>
> Mikael
>
> 1. Empty records cause the parser to crash in
> Bio::PDB::Record.Pdb_LString(nil).
> Solution: for an empty record, create an empty string with String.new('').
Thank you for the bug report.
I changed "str" to "str.to_s" to fix the bug.
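A minimal sketch of why this change avoids the crash, assuming only core Ruby behaviour. The module LStringSketch and its two methods are hypothetical stand-ins for the constructor touched by the patch, not BioRuby code.

```ruby
# Hypothetical stand-in for the constructor changed in the patch.
module LStringSketch
  # Behaviour before the fix: String.new(nil) raises TypeError.
  def self.new_before(str)
    String.new(str)
  end

  # Behaviour after the fix: nil.to_s is "", so nil yields an empty string.
  def self.new_after(str)
    String.new(str.to_s)
  end
end

p LStringSketch.new_after(nil)      # an empty string
p LStringSketch.new_after('TITLE')  # an ordinary string passes through unchanged
```

Compared with an explicit nil check, to_s also accepts any other non-String argument, which may or may not be desirable.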
> 2. Calling the sheet method (Bio::PDB) on a Bio::PDB structure that
> doesn't contain any sheets crashes the parser.
> Solution: return nil if there are no sheets in the structure.
The same or a similar error could also occur for the REMARK (remark),
JRNL (jrnl), HELIX (helix), TURN (turn), SHEET (sheet),
SSBOND (ssbond), SEQRES (seqres), DBREF (dbref), KEYWDS (keywords),
AUTHOR (authors), HEADER (entry_id, accession, classification),
TITLE (definition), and REVDAT (version) records (methods).
This is mostly caused by the Bio::PDB#record method, which
returned nil when the specified record did not exist.
I changed it to return an empty array for nonexistent records.
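The effect of returning an empty array can be sketched as follows. PDBSketch and its Struct types are hypothetical and only mimic the shape of the patched methods; they are not the real Bio::PDB class.

```ruby
# Hypothetical miniature of the patched accessors (not the real Bio::PDB).
class PDBSketch
  Header = Struct.new(:idCode, :classification)
  Author = Struct.new(:authorList)

  def initialize(hash)
    @hash = hash
  end

  # Return an empty array instead of nil for record names that are absent.
  def record(name = nil)
    name ? (@hash[name] || []) : @hash
  end

  # [].first is nil, so the guard returns nil instead of raising NoMethodError.
  def classification
    f = record('HEADER').first
    f ? f.classification : nil
  end

  # Collecting over an empty array simply yields an empty array.
  def authors
    record('AUTHOR').collect { |f| f.authorList }.flatten
  end
end

empty = PDBSketch.new({})
p empty.classification  # nil
p empty.authors         # []
```

Methods that iterate over the records work unchanged with an empty array, while methods that take the first record still need the explicit nil guard.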
All of the above bugs are now fixed and committed to CVS.
For your convenience, the patch is attached below.
Thanks,
Naohisa Goto
ngoto at gen-info.osaka-u.ac.jp / ngoto at bioruby.org
-------------------------------------------------------------------
--- lib/bio/db/pdb/pdb.rb 19 Apr 2007 13:59:29 -0000 1.22
+++ lib/bio/db/pdb/pdb.rb 10 Jul 2007 10:17:38 -0000
@@ -119,7 +119,7 @@
m
end
def self.new(str)
- String.new(str)
+ String.new(str.to_s)
end
end
@@ -1674,7 +1674,7 @@
# p pdb.record['HETATM']
#
def record(name = nil)
- name ? @hash[name] : @hash
+ name ? (@hash[name] || []) : @hash
end
#--
@@ -1837,12 +1837,13 @@
# Classification in "HEADER".
def classification
- self.record('HEADER').first.classification
+ f = self.record('HEADER').first
+ f ? f.classification : nil
end
# Get authors in "AUTHOR".
def authors
- self.record('AUTHOR').first.authorList
+ self.record('AUTHOR').collect { |f| f.authorList }.flatten
end
#--
@@ -1851,7 +1852,10 @@
# PDB identifier written in "HEADER". (e.g. 1A00)
def entry_id
- @id = self.record('HEADER').first.idCode unless @id
+ unless @id
+ f = self.record('HEADER').first
+ @id = f ? f.idCode : nil
+ end
@id
end
@@ -1862,12 +1866,14 @@
# Title of this entry in "TITLE".
def definition
- self.record('TITLE').first.title
+ f = self.record('TITLE').first
+ f ? f.title : nil
end
# Current modification number in "REVDAT".
def version
- self.record('REVDAT').first.modNum
+ f = self.record('REVDAT').first
+ f ? f.modNum : nil
end
end #class PDB
-------------------------------------------------------------------
From mikael.borg at utoronto.ca Tue Jul 10 10:58:48 2007
From: mikael.borg at utoronto.ca (Mikael Borg)
Date: Tue, 10 Jul 2007 10:58:48 -0400
Subject: [BioRuby] Preparing for 1.1 release
In-Reply-To: <20070710104012.C36341CBC4F7@idnmail.gen-info.osaka-u.ac.jp>
References: <2F9C761F-52B9-4F8A-9314-91024DE1B71A@hgc.jp>
<1184011247.19558.44.camel@localhost.localdomain>
<20070710104012.C36341CBC4F7@idnmail.gen-info.osaka-u.ac.jp>
Message-ID: <1184079528.16555.18.camel@localhost.localdomain>
On Tue, 2007-07-10 at 19:40 +0900, Naohisa GOTO wrote:
> Hi,
>
> On Mon, 09 Jul 2007 16:00:47 -0400
> Mikael Borg wrote:
>
> > There are still a few bugs in the pdb parser. I have tried to correct
> > the ones I've found (see below), but as I find the original code
> > difficult to understand, I might have introduced new bugs. Maybe you can
> > have a look and either use my suggested changes, or come up with other
> > solutions?
> >
> > Cheers,
> >
> > Mikael
> >
> > 1. Empty records cause the parser to crash in
> > Bio::PDB::Record.Pdb_LString(nil).
> > Solution: for an empty record, create an empty string with String.new('').
>
> Thank you for the bug report.
> I changed "str" to "str.to_s" to fix the bug.
>
> > 2. Calling the sheet method (Bio::PDB) on a Bio::PDB structure that
> > doesn't contain any sheets crashes the parser.
> > Solution: return nil if there are no sheets in the structure.
>
> The same or a similar error could also occur for the REMARK (remark),
> JRNL (jrnl), HELIX (helix), TURN (turn), SHEET (sheet),
> SSBOND (ssbond), SEQRES (seqres), DBREF (dbref), KEYWDS (keywords),
> AUTHOR (authors), HEADER (entry_id, accession, classification),
> TITLE (definition), and REVDAT (version) records (methods).
>
> This is mostly caused by the Bio::PDB#record method, which
> returned nil when the specified record did not exist.
> I changed it to return an empty array for nonexistent records.
>
> All of the above bugs are now fixed and committed to CVS.
> For your convenience, the patch is attached below.
>
> Thanks,
>
> Naohisa Goto
> ngoto at gen-info.osaka-u.ac.jp / ngoto at bioruby.org
>
> -------------------------------------------------------------------
> --- lib/bio/db/pdb/pdb.rb 19 Apr 2007 13:59:29 -0000 1.22
> +++ lib/bio/db/pdb/pdb.rb 10 Jul 2007 10:17:38 -0000
> @@ -119,7 +119,7 @@
> m
> end
> def self.new(str)
> - String.new(str)
> + String.new(str.to_s)
> end
> end
>
> @@ -1674,7 +1674,7 @@
> # p pdb.record['HETATM']
> #
> def record(name = nil)
> - name ? @hash[name] : @hash
> + name ? (@hash[name] || []) : @hash
> end
>
> #--
> @@ -1837,12 +1837,13 @@
>
> # Classification in "HEADER".
> def classification
> - self.record('HEADER').first.classification
> + f = self.record('HEADER').first
> + f ? f.classification : nil
> end
>
> # Get authors in "AUTHOR".
> def authors
> - self.record('AUTHOR').first.authorList
> + self.record('AUTHOR').collect { |f| f.authorList }.flatten
> end
>
> #--
> @@ -1851,7 +1852,10 @@
>
> # PDB identifier written in "HEADER". (e.g. 1A00)
> def entry_id
> - @id = self.record('HEADER').first.idCode unless @id
> + unless @id
> + f = self.record('HEADER').first
> + @id = f ? f.idCode : nil
> + end
> @id
> end
>
> @@ -1862,12 +1866,14 @@
>
> # Title of this entry in "TITLE".
> def definition
> - self.record('TITLE').first.title
> + f = self.record('TITLE').first
> + f ? f.title : nil
> end
>
> # Current modification number in "REVDAT".
> def version
> - self.record('REVDAT').first.modNum
> + f = self.record('REVDAT').first
> + f ? f.modNum : nil
> end
>
> end #class PDB
> -------------------------------------------------------------------
Thank you for taking care of this so fast, great job!
Have you considered adding an optional argument to Bio::PDB.new, so that
it would be possible to prevent parsing parts of the pdb info, e.g.
remarks/hydrogen atoms/water molecules? The parser is using a lot of
memory, especially when calling Bio::PDB.inspect so that every record is
parsed. Maybe something for the next version, after 1.1 is done?
/Mikael
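One way to approximate this without changing the parser is to filter the PDB text before parsing it. The sketch below is only an illustration: prefilter_pdb and its keyword arguments are hypothetical, not a BioRuby API, and it assumes the standard fixed-column PDB layout (record name in columns 1-6, residue name in columns 18-20, water named HOH).

```ruby
# Hypothetical helper, not part of BioRuby: drop unwanted lines from PDB text
# before it is handed to a parser.
def prefilter_pdb(text, skip_records: [], skip_water: false)
  kept = text.each_line.reject do |line|
    name = line[0, 6].to_s.strip          # record name, columns 1-6
    next true if skip_records.include?(name)
    # Water: ATOM/HETATM lines whose residue name (columns 18-20) is HOH.
    skip_water && %w[ATOM HETATM].include?(name) && line[17, 3] == 'HOH'
  end
  kept.join
end

sample = [
  'REMARK   1 EXAMPLE REMARK',
  'ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N',
  'HETATM 1000  O   HOH A 200      10.000  10.000  10.000  1.00 20.00           O'
].join("\n") + "\n"

filtered = prefilter_pdb(sample, skip_records: %w[REMARK], skip_water: true)
puts filtered
```

The filtered string could then be passed to Bio::PDB.new in place of the full file contents. Skipping hydrogen atoms is left out here because it depends on element columns that older files may not fill in.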
From ktym at hgc.jp Mon Jul 16 14:14:34 2007
From: ktym at hgc.jp (Toshiaki Katayama)
Date: Tue, 17 Jul 2007 03:14:34 +0900
Subject: [BioRuby] A couple of changes to DAS...
In-Reply-To: <466732A5.3060802@cs.man.ac.uk>
References: <466732A5.3060802@cs.man.ac.uk>
Message-ID:
Hi Dave,
I'm very sorry that I missed your contribution.
I appreciate your fixes, and congratulations on your Python version.
On 2007/06/07, at 7:18, Dave Thorne wrote:
> I have just spent a successful couple of hours porting the latest bio/io/das.rb file to Python (their DAS support is rather meagre). During the process I found a couple of lines in the original ruby module that I think contain mistakes. I have attached an appropriate diff file. The two small changes are as follows:
>
> line 71:
> dsn.mapmaster = e.name
> should be (?):
> dsn.mapmaster = e.text
I've just committed this.
> line 97:
> segment.stop = e.attributes['orientation']
> should be:
> segment.orientation = e.attributes['orientation']
This had already been fixed in the repository.
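The difference behind the first fix can be sketched with REXML, assuming that is the XML library whose element API (e.name, e.text) is meant here. The XML document and the helper mapmaster_name_and_text are made-up illustrations, not data from a real DAS server or code from das.rb.

```ruby
require 'rexml/document'

# Made-up DSN entry for illustration only.
DSN_XML = <<~XML
  <DASDSN>
    <DSN>
      <SOURCE id="example" version="1">Example source</SOURCE>
      <MAPMASTER>http://example.org/das/example</MAPMASTER>
      <DESCRIPTION>Illustrative entry</DESCRIPTION>
    </DSN>
  </DASDSN>
XML

# Hypothetical helper: return the tag name and the text content of MAPMASTER.
def mapmaster_name_and_text(xml)
  doc = REXML::Document.new(xml)
  e = REXML::XPath.first(doc, '//MAPMASTER')
  [e.name, e.text]
end

name, text = mapmaster_name_and_text(DSN_XML)
puts name  # the tag name, MAPMASTER
puts text  # the URL stored inside the element
```

With e.name, every entry would get the constant string MAPMASTER instead of the map master URL, which is why e.text is the value to assign.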
Thank you!
Regards,
Toshiaki Katayama
From trevor at corevx.com Thu Jul 19 17:21:26 2007
From: trevor at corevx.com (Trevor Wennblom)
Date: Thu, 19 Jul 2007 16:21:26 -0500
Subject: [BioRuby] v1.1
Message-ID:
Hey guys,
Good job on getting version 1.1.0 out there!
What's the most difficult part of getting these releases ready? Is
there a way that we could automate it to make life easier? How
difficult would it be to have regular minor-revision releases? (say
1.1.1, 1.1.2, etc)
Trevor
From ktym at hgc.jp Thu Jul 19 12:07:31 2007
From: ktym at hgc.jp (Toshiaki Katayama)
Date: Fri, 20 Jul 2007 01:07:31 +0900
Subject: [BioRuby] BioRuby 1.1 released in BOSC2007 presentation
Message-ID:
Hi all,
I have finally released BioRuby 1.1 at
http://bioruby.org/archive/bioruby-1.1.0.tar.gz
and a gem package is also available at
http://rubyforge.org/projects/bioruby/
I have also put up my presentation from BOSC 2007, held today, at
http://bioruby.org/archive/doc/BR070719-bosc.pdf
Enjoy!
Toshiaki Katayama
From ktym at hgc.jp Fri Jul 20 04:01:18 2007
From: ktym at hgc.jp (Toshiaki Katayama)
Date: Fri, 20 Jul 2007 17:01:18 +0900
Subject: [BioRuby] v1.1
In-Reply-To:
References:
Message-ID:
Hi,
This time, we took on several difficult tasks, including
- changing the license
- rdoc formatting
- phyloinformatics
- an attempt at Rails integration
etc., and these may have delayed the release,
because it was hard to predict how long they would take
to stabilize.
Several other reasons, I guess:
* Setting priorities
We have a lot of items on our todo list, but it is not easy to decide
which of them should be done before the next release.
* Time for development
Setting aside dedicated time for development is getting difficult
for core developers, as the project runs on a volunteer basis
and they have their own jobs (not students with unlimited time any more...)
Anyway, I'll try to release more often!
* I will release 1.1.1, 1.1.2, ... as soon as the critical bugs are found and fixed.
* We need to fix goals (todo items) for the 1.2 release.
Thanks,
Toshiaki, from the conference room at BOSC 2007, day 2
On 2007/07/20, at 6:21, Trevor Wennblom wrote:
> Hey guys,
>
> Good job on getting version 1.1.0 out there!
>
> What's the most difficult part of getting these releases ready? Is
> there a way that we could automate it to make life easier? How
> difficult would it be to have regular minor-revision releases? (say
> 1.1.1, 1.1.2, etc)
>
> Trevor
From aidanfindlater at gmail.com Fri Jul 20 14:54:43 2007
From: aidanfindlater at gmail.com (Aidan Findlater)
Date: Fri, 20 Jul 2007 14:54:43 -0400
Subject: [BioRuby] BioRuby's Bio::FlatFileIndex compatibility with BioPerl's
Bio::DB::Flat
Message-ID:
*Summary:* Attached is a diff that allows Bio::FlatFileIndex to access BDB
flatfile databases created by BioPerl. I have not changed the way BioRuby
creates its databases, so this likely breaks access to BioRuby-created
flatfiles.
*Description:* I have some flatfile databases that were created with
BioPerl, but it seems that BioRuby does things a little differently.
Specifically, BioRuby tries to get config and fileid information from BDB
databases; BioPerl stores this information in config.dat.
As well, it returns sequences shifted one character to the right (the '>'
from my FASTA file was at the end of the returned sequence, and none was at
the beginning).
I've hacked it up so that it works for me. If anyone else is having this
problem, the diff from my changes is attached below. Sample usage:
Bio::FlatFileIndex.open('/path/to/the/database/directory') do |db|
p db.search("SPAC11H11.06") # My favourite pombe gene!
end
Now I just have to figure out what to do with
the Bio::FlatFileIndex::Results mess that is returned...
Aidan Findlater
Index: bioruby/lib/bio/io/flatfile/index.rb
===================================================================
RCS file: /home/repository/bioruby/bioruby/lib/bio/io/flatfile/index.rb,v
retrieving revision 1.19
diff -r1.19 index.rb
561c561
< seek(pos, IO::SEEK_SET)
---
> seek(pos-1, IO::SEEK_SET)
1147,1148c1147,1148
< @config = BDBwrapper.new(@dbname, 'config')
< @bdb_fileids = BDBwrapper.new(@dbname, 'fileids')
---
> @config = hash.reject{|k,v| k.include?("fileid_") }
> @bdb_fileids = hash.reject{|k,v| !k.include?("fileid_") }
1196,1199d1195
< @config.close
< @config.open(*bdbarg)
< @bdb_fileids.close
< @bdb_fileids.open(*bdbarg)
1229,1232d1224
< if @bdb then
< @config.close
< @bdb_fileids.close
< end
1287c1279
< @fileids = FileIDs.new('', @bdb_fileids)
---
> @fileids = FileIDs.new('fileid_', @bdb_fileids)
From ngoto at gen-info.osaka-u.ac.jp Sun Jul 22 06:25:00 2007
From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO)
Date: Sun, 22 Jul 2007 19:25:00 +0900
Subject: [BioRuby] BioRuby's Bio::FlatFileIndex compatibility with
BioPerl's Bio::DB::Flat
In-Reply-To:
References:
Message-ID: <20070722102500.DC62E1CBC412@idnmail.gen-info.osaka-u.ac.jp>
Hello,
I'm a maintainer of Bio::FlatFileIndex in bioruby.
On Fri, 20 Jul 2007 14:54:43 -0400
"Aidan Findlater" wrote:
> *Summary:* Attached is a diff that allows Bio::FlatFileIndex to access BDB
> flatfile databases created by BioPerl. I have not changed the way BioRuby
> creates its databases, so this likely breaks access to BioRuby-created
> flatfiles.
>
>
> *Description:* I have some flatfile databases that were created with
> BioPerl, but it seems that BioRuby does things a little differently.
> Specifically, BioRuby tries to get config and fileid information from BDB
> databases; BioPerl stores this information in config.dat.
The OBDA flat-file indexing specification (*1) says that
configuration data is stored in the BDB database, not config.dat.
(excerpted from indexing.txt (*1))
| 2) The subdirectory contains a file named "config.dat" containing tab
| separated key/value pairs. The first line contains the key "index"
| and value "index\tBerkeleyDB/1". This means the first few characters
| of the config.dat file is "index\tBerkeleyDB/1\n".
|
| There is no other data in this file.
|
| 3) Global configuration data is stored in the database named "config".
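As a rough illustration of item 2 of the excerpt, config.dat can be read as tab-separated key/value pairs and checked for the expected index type. The helpers parse_config_dat and berkeleydb_index? are hypothetical names for this sketch and are not part of Bio::FlatFileIndex.

```ruby
# Hypothetical helper: parse tab-separated key/value lines into a Hash.
def parse_config_dat(text)
  text.each_line.each_with_object({}) do |line, config|
    key, value = line.chomp.split("\t", 2)
    config[key] = value unless key.nil? || key.empty?
  end
end

# True when the "index" key carries the value quoted in the specification.
def berkeleydb_index?(text)
  parse_config_dat(text)['index'] == 'BerkeleyDB/1'
end

spec_style = "index\tBerkeleyDB/1\n"
p parse_config_dat(spec_style)
p berkeleydb_index?(spec_style)
```

Under the quoted specification, a config.dat for a BDB index holds only this one line, so any further keys found there (such as the fileid_ entries used in the diff above) go beyond what the excerpt describes.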
The specification text was last modified 5 years ago,
and it may have been changed somewhere I don't know about.
Does anyone know of changes to the specification,
or how to get a newer version of the specification text?
*1 http://cvs.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/obda-specs/flatfile/indexing.txt?rev=1.3&cvsroot=obf-common&content-type=text/vnd.viewcvs-markup
> As well, it returns sequences shifted one character to the right (the '>'
> from my FASTA file was at the end of the returned sequence, and none was at
> the beginning).
I suppose this is an issue in BioPerl's indexer.
I prepared the file /tmp/flat/tmp.fst as below.
-----------------------------------------------------------
>TEST00001 EOL
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
>TEST00002 EOL
ccccccccccccccccccccccccccccccccccccccccccccccccc
>TEST00003 EOL
ggggggggggggggggggggggggggggggggggggggggggggggggg
>TEST00004 EOL
ttttttttttttttttttttttttttttttttttttttttttttttttt
-----------------------------------------------------------
(Each line of the above file is 50 bytes on UNIX.)
% bp_bioflat_index.pl --create --format fasta \
--location /tmp/flat --dbname testbdb --indextype bdb \
/tmp/flat/tmp.fst
Then, I checked the contents of the generated BDB data.
% ruby -r bdb -e 'BDB::Btree.open("/tmp/flat/testbdb/key_ACC").to_a.sort.each { |x| puts x.join("\t") }'
TEST00001 0 0 101
TEST00002 0 101 100
TEST00003 0 201 100
TEST00004 0 301 99
(Each column shows ID, FileID, start position, and size.)
The start positions of TEST00002, TEST00003, and TEST00004
are wrong, and the size of TEST00001 and TEST00004 is wrong.
I'm using BioPerl 1.5.2_102.
% perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
1.005002102
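For comparison, the start positions and sizes one would expect for such a file can be computed directly. This is a sketch under assumptions: the header lines are rebuilt with space padding so that every line is 50 bytes (the exact padding is not preserved in the message), and build_test_fasta and expected_index are hypothetical helper names.

```ruby
# Build a FASTA text in which every line is 50 bytes including "\n".
# ASSUMPTION: headers are padded with spaces before "EOL".
def build_test_fasta
  ids = %w[TEST00001 TEST00002 TEST00003 TEST00004]
  bases = %w[a c g t]
  ids.zip(bases).map do |id, base|
    header = ">#{id}".ljust(46) + "EOL\n"   # 46 + 3 + 1 = 50 bytes
    header + (base * 49) + "\n"             # 49 + 1 = 50 bytes
  end.join
end

# Return [id, start, size] for each record by scanning for '>' at line start.
def expected_index(fasta)
  offset = 0
  starts = []
  fasta.each_line do |line|
    starts << [line[1..-1].split.first, offset] if line.start_with?('>')
    offset += line.bytesize
  end
  starts.each_with_index.map do |(id, start), i|
    stop = i + 1 < starts.size ? starts[i + 1][1] : fasta.bytesize
    [id, start, stop - start]
  end
end

expected_index(build_test_fasta).each { |row| puts row.join("\t") }
```

Each record spans two 50-byte lines, so the expected entries start at 0, 100, 200 and 300 with a size of 100 each.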
In addition, I also tried flat database.
% bp_bioflat_index.pl --create --format fasta \
--location /tmp/flat --dbname testflat --indextype flat \
/tmp/flat/tmp.fst
% cat testflat2/key_ACC.key
19TEST00001 0 0 100 TEST00002 0 100 100TEST00003 0 200 100TEST00004 0 300 50
It seems that the index is correctly created.
However, according to the specification (*1),
the first 4 bytes of the key_ACC.key file should be "0019",
but was "  19" in the above index created with BioPerl.
(excerpted from indexing.txt (*1))
| Each record of this file is in a fixed width format. There is no
| special termination character. Instead, the first four bytes of the
| file contain the mapping record size, in bytes, represented as text
| string. The string is left padded with zeros to fit in four bytes, so
| the allowed text strings are "0000", "0001", "0002", ..., "9999".
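The padding rule in the excerpt can be sketched with Ruby's format. The helper record_size_header is a hypothetical name used only for this illustration.

```ruby
# Hypothetical helper: encode a record size as the four-byte, zero-padded
# text string described in the excerpt.
def record_size_header(size)
  raise ArgumentError, 'size must be within 0..9999' unless (0..9999).cover?(size)
  format('%04d', size)
end

p record_size_header(19)  # zero-padded, as the specification asks
p format('%4d', 19)       # space-padded, which the specification does not allow
```

A reader that accepts both forms could call to_i on the first four bytes after stripping whitespace, but whether an index reader should be that lenient is a separate design decision.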
Regards,
Naohisa Goto
ngoto at gen-info.osaka-u.ac.jp / ngoto at bioruby.org
From ngoto at gen-info.osaka-u.ac.jp Sun Jul 22 06:48:42 2007
From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO)
Date: Sun, 22 Jul 2007 19:48:42 +0900
Subject: [BioRuby] BioRuby's Bio::FlatFileIndex compatibility with
BioPerl's Bio::DB::Flat
References:
Message-ID: <20070722104843.212521CBC412@idnmail.gen-info.osaka-u.ac.jp>
On Sun, 22 Jul 2007 19:25:00 +0900
Naohisa GOTO wrote:
> In addition, I also tried flat database.
>
> % bp_bioflat_index.pl --create --format fasta \
> --location /tmp/flat --dbname testflat --indextype flat \
> /tmp/flat/tmp.fst
>
> % cat testflat2/key_ACC.key
This is my typo. I meant
% cat /tmp/flat/testflat/key_ACC.key
> 19TEST00001 0 0 100 TEST00002 0 100 100TEST00003 0 200 100TEST00004 0 300 50
>
> It seems that the index is correctly created.
The index was not correct.
The size of TEST00004 is misrecognized as 50 (should be 100).
I think this is also a bug in BioPerl.
Regards,
Naohisa Goto
ngoto at gen-info.osaka-u.ac.jp / ngoto at bioruby.org