From bugzilla-daemon at portal.open-bio.org Sun Apr 1 13:03:15 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 1 Apr 2007 13:03:15 -0400
Subject: [Biojava-dev] [Bug 2253] NullPointerException in
MultiSourceCompoundRichLocation
In-Reply-To:
Message-ID: <200704011703.l31H3FTF011220@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2253
gwaldon at geneinfinity.org changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|VERIFIED |CLOSED
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From russ at kepler-eng.com Mon Apr 2 14:59:26 2007
From: russ at kepler-eng.com (Russ Kepler)
Date: Mon, 2 Apr 2007 12:59:26 -0600
Subject: [Biojava-dev] Changing the sample name of the ABI file
In-Reply-To:
References:
Message-ID: <200704021259.26814.russ@kepler-eng.com>
On Tuesday 13 February 2007 00:01, Lee Heewook wrote:
> Is there way to change the sample name of the ABI file?
I'm not sure what you're asking. There's a sample name field in the file,
in SMPL; to rewrite that you'd have to rewrite pretty much the whole file.
Most of the time I found it easier to re-export the data from the instrument.
But if you're parsing the name of the file for info (frequently done, as
grubbing in the file is a PITA), then you can usually simply change the file
name.
From gwaldon at geneinfinity.org Mon Apr 2 19:56:09 2007
From: gwaldon at geneinfinity.org (george waldon)
Date: Mon, 02 Apr 2007 16:56:09 -0700
Subject: [Biojava-dev] Isoelectric point calculation
Message-ID: <20070402235609.52369.qmail@mmm1924.dulles19-verio.com>
Hi,
While trying to solve the problem of symbol ambiguity in pI calculation that was brought to our attention on Biojava-l, I found a few problems: in particular (!), the calculated pI values are incorrect, BinarySearch throws exceptions, and ResidueProperties.xml has some strange values, such as a pK of -4.25 for Glu.
The class IsoelectricPointCalc was written a long time ago; I hope to get in touch with the original author and have corrected code soon.
As a general rule, scientific biomethods and biodata put in biojava need precise literature references. Javadocs are a good place for that.
- George
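For reference while the class is being reviewed, here is a minimal sketch of the usual approach: compute the net charge from the Henderson-Hasselbalch equation and find the pH where it crosses zero by bisection. This is not the IsoelectricPointCalc code. The class name PiSketch is made up, and the pK values are rough illustrative assumptions, not the contents of ResidueProperties.xml; real values should come with a literature reference.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch only: all pK values here are illustrative assumptions. */
class PiSketch {
    static final Map<Character, Double> ACIDIC = new HashMap<Character, Double>();
    static final Map<Character, Double> BASIC = new HashMap<Character, Double>();
    static final double PK_N_TERM = 9.0; // assumed free amino terminus
    static final double PK_C_TERM = 3.0; // assumed free carboxyl terminus
    static {
        ACIDIC.put('D', 3.9);
        ACIDIC.put('E', 4.25); // a positive pK, unlike the -4.25 mentioned above
        ACIDIC.put('C', 8.3);
        ACIDIC.put('Y', 10.1);
        BASIC.put('H', 6.0);
        BASIC.put('K', 10.5);
        BASIC.put('R', 12.5);
    }

    /** Charge of a basic group: fraction protonated. */
    static double positive(double pK, double pH) {
        return 1.0 / (1.0 + Math.pow(10, pH - pK));
    }

    /** Charge of an acidic group: minus the fraction deprotonated. */
    static double negative(double pK, double pH) {
        return -1.0 / (1.0 + Math.pow(10, pK - pH));
    }

    /** Net charge at the given pH; symbols without an entry (e.g. X) add nothing. */
    static double netCharge(String seq, double pH) {
        double charge = positive(PK_N_TERM, pH) + negative(PK_C_TERM, pH);
        for (char c : seq.toUpperCase().toCharArray()) {
            Double pk = ACIDIC.get(c);
            if (pk != null) charge += negative(pk, pH);
            pk = BASIC.get(c);
            if (pk != null) charge += positive(pk, pH);
        }
        return charge;
    }

    /** Bisection on pH in [0, 14]; net charge decreases monotonically with pH. */
    static double isoelectricPoint(String seq) {
        double lo = 0.0, hi = 14.0;
        while (hi - lo > 1e-4) {
            double mid = (lo + hi) / 2.0;
            if (netCharge(seq, mid) > 0) lo = mid; else hi = mid;
        }
        return (lo + hi) / 2.0;
    }

    public static void main(String[] args) {
        System.out.println(isoelectricPoint("ACDEFGHIKLMNPQRSTVWY"));
    }
}
```

Because the net charge is monotonic in pH, plain bisection cannot fail to converge, which avoids the kind of search exceptions mentioned above. How ambiguity symbols should contribute is left open here; this sketch simply ignores them.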
From mark.schreiber at novartis.com Mon Apr 2 21:03:20 2007
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Tue, 3 Apr 2007 09:03:20 +0800
Subject: [Biojava-dev] reading a subsequence from a .nib file
Message-ID:
Hi -
To my knowledge nothing like this exists in BioJava. Could someone take
it the last mile and make it produce SymbolLists?
- Mark
Mark Schreiber
Research Investigator (Bioinformatics)
Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com
phone +65 6722 2973
fax +65 6722 2910
Josh Burdick
Sent by: biojava-dev-bounces at lists.open-bio.org
01/23/2007 12:29 AM
To: biojava-dev at lists.open-bio.org
cc: (bcc: Mark Schreiber/GP/Novartis)
Subject: [Biojava-dev] reading a subsequence from a .nib file
I wrote some code to read a chunk of DNA sequence from a file in Jim
Kent's blat ".nib" file format. This is a simple format using four
bits/base.
I didn't attach the code, to avoid spamming the whole list; but it,
and a (very crude!) JUnit test, are at
http://www.keyfitz.org/jburdick/read_nib_file_java/NibFile.java
http://www.keyfitz.org/jburdick/read_nib_file_java/NibFileTest.java
You could use 2 bits/base, but then you can't have ambiguous bases. 4
bits/base seems like a reasonable compromise; plus sites that have
"blat" installed will need to have the .nib files on a server somewhere
anyway, and this way repeat-masking can be included, which may be
convenient.
Also, it doesn't support writing a .nib file; again, presumably people
will be using Jim Kent's faToNib program to do that.
It would need some tweaking to be included in BioJava, because it
returns a plain String of ACGT, instead of a PackedSequence object.
(Probably this would just involve rewriting the setupBuffer() and
addToBuffer() methods in the code.) Also, the coordinate information
could come from a Range object.
If similar code is already somewhere in BioJava, please ignore this;
but I couldn't find it with thirty seconds of Googling, so I figured it
hadn't been written...
Josh Burdick
programmer, Vivian Cheung's lab, Children's Hospital of Philadelphia
jburdick at keyfitz.org
_______________________________________________
biojava-dev mailing list
biojava-dev at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biojava-dev
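For readers unfamiliar with the format, a rough sketch of the 4 bits/base decoding described above. This is not Josh's NibFile.java. The class name NibDecodeSketch is made up, and the nibble codes (0=T, 1=C, 2=A, 3=G, 4=N, high bit set for repeat-masked bases, first base of each pair in the high nibble) are an assumption that should be checked against the UCSC .nib documentation. It works on the packed base data only and ignores the file header.

```java
/** Sketch only: nibble codes are assumed, see the note above. */
class NibDecodeSketch {
    // Assumed codes 0..3; any other code is reported as N.
    private static final char[] BASES = {'T', 'C', 'A', 'G'};

    /**
     * Decodes length bases starting at base index start (0-based)
     * from data packed two bases per byte.
     */
    static String decode(byte[] packed, int start, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = start; i < start + length; i++) {
            int b = packed[i / 2] & 0xFF;
            // Even base indices sit in the high nibble, odd ones in the low nibble.
            int nibble = (i % 2 == 0) ? (b >> 4) : (b & 0x0F);
            int code = nibble & 0x7;
            char base = code < 4 ? BASES[code] : 'N';
            // Bit 3 marks a repeat-masked base, written here in lower case.
            boolean masked = (nibble & 0x8) != 0;
            sb.append(masked ? Character.toLowerCase(base) : base);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] packed = {0x01, 0x23, 0x4A};
        System.out.println(decode(packed, 0, 6));
    }
}
```

The returned String could presumably then be handed to something like DNATools.createDNA to obtain a SymbolList, which is the remaining step asked about above; whether masked (lower-case) bases survive that conversion would need checking.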
From mark.schreiber at novartis.com Mon Apr 2 21:06:11 2007
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Tue, 3 Apr 2007 09:06:11 +0800
Subject: [Biojava-dev] Isoelectric point calculation
Message-ID:
Hi George -
This probably should be reported as a bug in bugzilla to make sure we get
around to fixing it.
Thanks,
- Mark
From mark.schreiber at novartis.com Mon Apr 2 21:09:10 2007
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Tue, 3 Apr 2007 09:09:10 +0800
Subject: [Biojava-dev] org.biojava.bio.symbol.UkkonenSuffixTree.class BUG
Message-ID:
Hi Caroline -
Could you post some example code that we could use to replicate the
problem?
Thanks.
- Mark
"Caroline Renaux"
Sent by: biojava-dev-bounces at lists.open-bio.org
03/26/2007 09:18 PM
To: biojava-dev at biojava.org
cc: (bcc: Mark Schreiber/GP/Novartis)
Subject: [Biojava-dev] org.biojava.bio.symbol.UkkonenSuffixTree.class BUG
Hello,
I recently used the org.biojava.bio.symbol package in a Java application,
in particular the UkkonenSuffixTree class. When I add a set of sequences to
the tree, separated by the '$' separator character, it does not work. While
the second sequence is being processed I get a NullPointerException in the
method jumpTo, at the line:
arrivedAt = (SuffixNode) currentNode.children.get(new Character(source.charAt(from)));
I don't understand what I might have done wrong.
Thank you in advance for your reply.
RENAUX C.
From bugzilla-daemon at portal.open-bio.org Tue Apr 3 00:03:53 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 3 Apr 2007 00:03:53 -0400
Subject: [Biojava-dev] [Bug 2244] uniprot files do not load
In-Reply-To:
Message-ID: <200704030403.l3343rrF032035@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2244
------- Comment #9 from gwaldon at geneinfinity.org 2007-04-03 00:03 EST -------
*** Bug 2249 has been marked as a duplicate of this bug. ***
From bugzilla-daemon at portal.open-bio.org Tue Apr 3 03:12:48 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 3 Apr 2007 03:12:48 -0400
Subject: [Biojava-dev] [Bug 2258] New: ConcurrentModificationException in
SimpleRichAnnotation
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2258
Summary: ConcurrentModificationException in SimpleRichAnnotation
Product: BioJava
Version: live (CVS source)
Platform: PC
OS/Version: Windows XP
Status: NEW
Severity: normal
Priority: P2
Component: seq
AssignedTo: biojava-dev at biojava.org
ReportedBy: gwaldon at geneinfinity.org
Exception thrown by the method clear(), apparently resulting from trying to
change the note set while iterating over it.
- George
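The failure mode described is the standard one for java.util collections: removing elements from a set inside a for-each loop over that same set. A minimal self-contained sketch follows. NoteHolder is a hypothetical stand-in, not the SimpleRichAnnotation code, and iterating over a copy is only one common way to fix it.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical stand-in for a class that holds a set of notes. */
class NoteHolder {
    private final Set<String> notes = new TreeSet<String>();

    void addNote(String n) { notes.add(n); }

    int size() { return notes.size(); }

    // Stand-in for per-note removal work (e.g. firing change events).
    private void removeNote(String n) { notes.remove(n); }

    /** Broken: modifies the set while a for-each loop is iterating over it. */
    void clearBroken() {
        for (String n : notes) {
            removeNote(n);
        }
    }

    /** Safer: iterate over a snapshot, so the live set can be modified freely. */
    void clearFixed() {
        for (String n : new ArrayList<String>(notes)) {
            removeNote(n);
        }
    }

    public static void main(String[] args) {
        NoteHolder h = new NoteHolder();
        h.addNote("a");
        h.addNote("b");
        h.addNote("c");
        try {
            h.clearBroken();
        } catch (ConcurrentModificationException e) {
            System.out.println("clearBroken failed: " + e);
        }
        h.clearFixed();
        System.out.println("notes left: " + h.size());
    }
}
```

Using Iterator.remove() would also work when the per-note removal does not need to go through a separate method.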
From holland at ebi.ac.uk Tue Apr 3 06:00:01 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Tue, 03 Apr 2007 11:00:01 +0100
Subject: [Biojava-dev] JDBCPooledDataSource regression
In-Reply-To: <45C090A6.10909@ebi.ac.uk>
References: <416B41DF-91E1-4D1F-A4B4-799FE712B032@sanger.ac.uk> <45C0781B.7030304@ebi.ac.uk> <5DDBC6F5-7DC3-4446-A982-3CD9B3931A06@sanger.ac.uk> <45C08639.7080600@ebi.ac.uk> <8689C307-0643-46D9-90A6-A9958681D1D0@sanger.ac.uk> <45C08B7A.3060102@ebi.ac.uk>
<45C08E4E.2050308@ebi.ac.uk> <45C090A6.10909@ebi.ac.uk>
Message-ID: <461225A1.4000705@ebi.ac.uk>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Great stuff. You should commit it when you get your CVS account. :)
There are one or two typos (can't spell deprecated!) but I'm sure they'll
show up once you get the rest of BioJava into Eclipse or something to make
this change permanent.
cheers,
Richard
Andy Yates wrote:
> Okay I've attached the fix here.
>
> I just did this in a text editor but I believe that the imports are
> okay. If you can just do a quick scan as well to make sure I haven't
> deleted anything that was very important.
>
> I'll get on to the helpdesk now as well :)
>
> Andy
>
> Richard Holland wrote:
> Andy could you make the change to your local copy of the source file and
> email the file to me, that way I can make sure I don't get it wrong when
> I commit it.
>
> Richard.
>
> PS. You should probably have your own CVS account - email the OBF
> helpdesk and ask for one, saying I told you to. :)
>
>
> Andy Yates wrote:
>>>> Thomas Down wrote:
>>>>> On 31 Jan 2007, at 12:06, Andy Yates wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Sorry, I meant: what if that method just becomes:
>>>>>>
>>>>>> public static DataSource getDataSource(final String driver,
>>>>>> final String url,
>>>>>> final String user,
>>>>>> final String pass)
>>>>>> throws Exception {
>>>>>>
>>>>>> BasicDataSource ds = new BasicDataSource();
>>>>>> ds.setUrl(url);
>>>>>> ds.setDriverClassName(driver);
>>>>>> ds.setUsername(user);
>>>>>> ds.setPassword(pass);
>>>>>> // Set BasicDataSource properties such as maxActive and maxIdle,
>>>>>> // as described in
>>>>>> // http://jakarta.apache.org/commons/dbcp/api/org/apache/commons/dbcp/BasicDataSource.html
>>>>>>
>>>>>> ds.setMaxActive(10);
>>>>>> ds.setMaxIdle(5);
>>>>>> ds.setMaxWait(10000);
>>>>>>
>>>>>> return ds;
>>>>>> }
>>>>>>
>>>>>> Does that still work?
>>>>> Hmmm, I was assuming that BasicDataSource didn't actually do any
>>>>> pooling itself, and that you needed another layer on top to manage a
>>>>> connection pool -- that seems to be how all previous revisions of
>>>>> JDBCConnectionPool worked, so I guess I wasn't alone in thinking
>>>>> this. But yes, BasicDataSource does seem to do pooling itself
>>>>> (confirmed by reading the source), so maybe your simpler version is
>>>>> a better idea. It certainly works okay for me.
>>>>>
>>>>> Thomas.
>>>> That's what I thought should have happened :). Can I suggest that
>>>> this revised version goes into CVS? Anyone got any objections?
>>>>
>>>> Andy
> ------------------------------------------------------------------------
> /*
> * BioJava development code
> *
> * This code may be freely distributed and modified under the
> * terms of the GNU Lesser General Public Licence. This should
> * be distributed with the code. If you do not have a copy,
> * see:
> *
> * http://www.gnu.org/copyleft/lesser.html
> *
> * Copyright for this code is held jointly by the individual
> * authors. These should be listed in @author doc comments.
> *
> * For more information on the BioJava project and its aims,
> * or to join the biojava-l mailing list, visit the home page
> * at:
> *
> * http://www.biojava.org/
> *
> */
> package org.biojava.utils;
> import javax.sql.DataSource;
> import org.apache.commons.dbcp.BasicDataSource;
> import org.apache.commons.dbcp.PoolingDataSource;
> import org.apache.commons.pool.ObjectPool;
> /**
> * Returns a DataSource that implements connection pooling
> *
> * Uses Jakarta Commons DBCP and Pool packages.
> * See the description of the dbcp package at
> * http://jakarta.apache.org/commons/dbcp/api/overview-summary.html#overview_description
> *
> * @author Simon Foote
> * @author Len Trigg
> */
> public class JDBCPooledDataSource {
> public static DataSource getDataSource(final String driver,
> final String url,
> final String user,
> final String pass)
> throws Exception {
> BasicDataSource ds = new BasicDataSource();
> ds.setUrl(url);
> ds.setDriverClassName(driver);
> ds.setUsername(user);
> ds.setPassword(pass);
> // Set BasicDataSource properties such as maxActive and maxIdle, as described in
> // http://jakarta.apache.org/commons/dbcp/api/org/apache/commons/dbcp/BasicDataSource.html
> ds.setMaxActive(10);
> ds.setMaxIdle(5);
> ds.setMaxWait(10000);
> return ds;
> }
> // Adds simple equals and hashcode methods so that we can compare if
> // two connections are to the same database. This will fail if the
> // DataSource is redirected to another database etc (I doubt this is
> // ever likely to be used).
> /**
> * @deprecated No longer used; superseded by {@link BasicDataSource}
> * from DBCP.
> */
> static class MyPoolingDataSource extends PoolingDataSource {
> final String source;
> public MyPoolingDataSource(ObjectPool connectionPool, String source) {
> super(connectionPool);
> this.source = source;
> }
> public boolean equals(Object o2) {
> if ((o2 == null) || !(o2 instanceof MyPoolingDataSource)) {
> return false;
> }
> MyPoolingDataSource b2 = (MyPoolingDataSource) o2;
> return source.equals(b2.source);
> }
> public int hashCode() {
> return source.hashCode();
> }
> }
> public static void main(String[] args) {
> try {
> DataSource ds1 = getDataSource("org.hsqldb.jdbcDriver", "jdbc:hsqldb:/tmp/hsqldb/biosql", "sa", "");
> DataSource ds2 = getDataSource("org.hsqldb.jdbcDriver", "jdbc:hsqldb:/tmp/hsqldb/biosql", "sa", "");
> System.err.println(ds1);
> System.err.println(ds2);
> System.err.println(ds1.equals(ds2));
> } catch (Exception e) {
> e.printStackTrace();
> }
> }
> }
> ------------------------------------------------------------------------
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFGEiWh4C5LeMEKA/QRAgYbAJ4yoE6dsLuOOS8sg1wOCybV6rsNUwCeN0c8
oiFz/0yblV4P8a35RbU+nDM=
=imiK
-----END PGP SIGNATURE-----
From bugzilla-daemon at portal.open-bio.org Tue Apr 3 06:05:22 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 3 Apr 2007 06:05:22 -0400
Subject: [Biojava-dev] [Bug 2038] test bug
In-Reply-To:
Message-ID: <200704031005.l33A5M9N015780@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2038
holland at ebi.ac.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|ASSIGNED |RESOLVED
Resolution| |INVALID
------- Comment #1 from holland at ebi.ac.uk 2007-04-03 06:05 EST -------
This has been lying around for ages, thought I'd tidy it up. :)
From bugzilla-daemon at portal.open-bio.org Tue Apr 3 06:20:48 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 3 Apr 2007 06:20:48 -0400
Subject: [Biojava-dev] [Bug 2258] ConcurrentModificationException in
SimpleRichAnnotation
In-Reply-To:
Message-ID: <200704031020.l33AKmpP016736@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2258
holland at ebi.ac.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
------- Comment #1 from holland at ebi.ac.uk 2007-04-03 06:20 EST -------
Fixed; the change will be in CVS later today. Test also added.
From bugzilla-daemon at portal.open-bio.org Tue Apr 3 06:23:54 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 3 Apr 2007 06:23:54 -0400
Subject: [Biojava-dev] [Bug 2107] LabelledSequenceRenderer
In-Reply-To:
Message-ID: <200704031023.l33ANsV1016903@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2107
holland at ebi.ac.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
------- Comment #4 from holland at ebi.ac.uk 2007-04-03 06:23 EST -------
I have committed Jolyon's changes (or will do so later today). These are
untestable with JUnit, so no test has been added.
From bugzilla-daemon at portal.open-bio.org Wed Apr 4 01:37:59 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 4 Apr 2007 01:37:59 -0400
Subject: [Biojava-dev] [Bug 2260] New: Bug in UkkonenSuffixTree
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2260
Summary: Bug in UkkonenSuffixTree
Product: BioJava
Version: live (CVS source)
Platform: PC
OS/Version: Linux
Status: NEW
Severity: minor
Priority: P2
Component: symbol
AssignedTo: biojava-dev at biojava.org
ReportedBy: mark.schreiber at novartis.com
There is a bug in UkkonenSuffixTree when one tries to add concatenated
Strings which are delimited with a $. This doesn't seem to be a problem when
Strings are added individually, so a simple workaround is to add Strings
individually. The following code causes the bug:
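import org.biojava.bio.symbol.UkkonenSuffixTree;

public class Main {
    /** Creates a new instance of Main */
    public Main() {
    }

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        // Two sequences concatenated with the '$' delimiter.
        String seqs = "atcgcgcgcgctcggcctgggggctcgcgct$acgggtggtggt";
        // Building the tree from the concatenated string triggers the bug.
        UkkonenSuffixTree suff = new UkkonenSuffixTree(seqs);
    }
}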
Someone with a better knowledge of suffix trees than I have would need to look
at this...
Additionally, there are several places in the code where variables are declared
and never used, declared globally when they need not be, or not declared final
when they are never modified. There are also System.out.println() statements
that should be messages in exceptions or errors. The code could do with a good
clean up.
I am marking this as minor because there is a workaround. The simplest thing
might be to disable the advertised feature of being able to deal with
concatenated strings, as it seems it cannot and probably never has been able to.
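The workaround can be sketched as follows: split the concatenated input on '$' and hand each piece over separately. This is only an illustration. SplitBeforeAdding and its SequenceSink interface are hypothetical; the sink stands in for whatever call adds a single sequence to the tree, which is deliberately not spelled out here.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the workaround: never pass '$'-joined input to the tree. */
class SplitBeforeAdding {
    /** Hypothetical stand-in for code that accepts one sequence at a time. */
    interface SequenceSink {
        void add(String seq);
    }

    /** Splits on '$' and drops empty pieces (e.g. from a trailing '$'). */
    static List<String> split(String concatenated) {
        List<String> parts = new ArrayList<String>();
        int from = 0;
        for (int i = 0; i <= concatenated.length(); i++) {
            if (i == concatenated.length() || concatenated.charAt(i) == '$') {
                if (i > from) {
                    parts.add(concatenated.substring(from, i));
                }
                from = i + 1;
            }
        }
        return parts;
    }

    /** Feeds each piece to the sink individually. */
    static void addIndividually(String concatenated, SequenceSink sink) {
        for (String s : split(concatenated)) {
            sink.add(s);
        }
    }

    public static void main(String[] args) {
        String seqs = "atcgcgcgcgctcggcctgggggctcgcgct$acgggtggtggt";
        System.out.println(split(seqs));
    }
}
```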
From bugzilla-daemon at portal.open-bio.org Wed Apr 4 01:46:02 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 4 Apr 2007 01:46:02 -0400
Subject: [Biojava-dev] [Bug 2261] New: Request for enhancement of
RichSequence
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2261
Summary: Request for enhancement of RichSequence
Product: BioJava
Version: live (CVS source)
Platform: PC
OS/Version: Linux
Status: NEW
Severity: enhancement
Priority: P2
Component: seq
AssignedTo: biojava-dev at biojava.org
ReportedBy: mark.schreiber at novartis.com
RichSequence implements FeatureHolder but it would be nice if it could
implement RichFeature holder. This would require the addition of four methods
to any implementations of the RichSequence interface but would avoid endless
casting.
If we do it before an official release of bj1.5 we won't strictly be breaking
the interface. Is there any reason why we should not do this??
From dag at sonsorol.org Fri Apr 6 21:52:01 2007
From: dag at sonsorol.org (Chris Dagdigian)
Date: Fri, 6 Apr 2007 21:52:01 -0400
Subject: [Biojava-dev] Fwd: Bug in
org/biojava/utils/io/UncompressInputStream.java
References: <2109dfc0704061116i1f0ddbe2ic25012143d2509af@mail.gmail.com>
Message-ID: <2A8FFBC4-EC1A-4EB9-992C-DE9225A59578@sonsorol.org>
Passing on this email that came to me ...
Regards,
Chris Dagdigian
OBF
Begin forwarded message:
> From: "Miguel Duarte"
> Date: April 6, 2007 2:16:52 PM EDT
> To: dag at sonsorol.org
> Subject: Bug in org/biojava/utils/io/UncompressInputStream.java
>
> Hi Chris,
>
> From http://sourceforge.net/project/shownotes.php?release_id=314770&group_id=18598
> I've learned that you're maintaining the class
> org/biojava/utils/io/UncompressInputStream.java. If that's not the
> case, please forward this mail to the maintainer.
>
> I've discovered a nasty bug: with some read block sizes the algorithm
> truncates a few bytes from the end of the stream. I've verified this by
> comparing the gzip/uncompress output for some files with what
> org/biojava/utils/io/UncompressInputStream.java generates.
>
> Unfortunately I haven't found the cause yet, but I can contribute
> the attached test case. To verify the bug, uncompress BH_03834.MCR.Z
> with gzip and with UncompressInputStream and compare the results.
>
> Thanks,
> Miguel Duarte
-------------- next part --------------
A non-text attachment was scrubbed...
Name: BH_03834.MCR.Z
Type: application/x-compress
Size: 26405 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/biojava-dev/attachments/20070406/ed47c294/attachment-0001.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: uncompressed_by_gzip
Type: application/octet-stream
Size: 81920 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/biojava-dev/attachments/20070406/ed47c294/attachment-0002.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: uncompressed_by_uncompressInputStream
Type: application/octet-stream
Size: 81832 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/biojava-dev/attachments/20070406/ed47c294/attachment-0003.obj
-------------- next part --------------
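One way to narrow down a truncation that depends on the read block size, as reported above, is to read the same input several times with different buffer sizes and compare each result with a reference. A minimal sketch follows. BlockSizeCheck and its StreamFactory interface are hypothetical helpers; in practice the factory would wrap the .Z file in an UncompressInputStream, and the reference bytes would come from the command-line tool.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/** Sketch of a block-size regression check for an InputStream. */
class BlockSizeCheck {
    /** Hypothetical: opens a fresh stream over the same data on every call. */
    interface StreamFactory {
        InputStream open() throws IOException;
    }

    /** Drains the stream using read calls with a buffer of the given size. */
    static byte[] readAll(InputStream in, int blockSize) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[blockSize];
        int n;
        while ((n = in.read(buf, 0, buf.length)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    /** Returns the first block size whose output differs from expected, or -1. */
    static int firstFailingBlockSize(StreamFactory factory, byte[] expected, int[] sizes)
            throws IOException {
        for (int size : sizes) {
            InputStream in = factory.open();
            try {
                if (!Arrays.equals(expected, readAll(in, size))) {
                    return size;
                }
            } finally {
                in.close();
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        final byte[] data = new byte[1000];
        StreamFactory factory = new StreamFactory() {
            public InputStream open() {
                return new ByteArrayInputStream(data);
            }
        };
        System.out.println(firstFailingBlockSize(factory, data, new int[] {1, 7, 512, 4096}));
    }
}
```

The two attachments above differ by 88 bytes (81920 versus 81832), so a check of this kind should flag the file as soon as an affected block size is tried.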
From mark.schreiber at novartis.com Sun Apr 8 22:13:12 2007
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Mon, 9 Apr 2007 10:13:12 +0800
Subject: [Biojava-dev] Fwd: Bug
in org/biojava/utils/io/UncompressInputStream.java
Message-ID:
Does anyone maintain this class?
More to the point, does anyone know what it is for? If I look at the
Uses link in the javadoc there are apparently none at the public or package
level. Additionally, why does BioJava need one? Are there not java.io
classes that can handle compressed streams?
Is there a good reason why we cannot just clean it out?
- Mark
From holland at ebi.ac.uk Tue Apr 10 04:59:32 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Tue, 10 Apr 2007 09:59:32 +0100
Subject: [Biojava-dev] Fwd:
Bug in org/biojava/utils/io/UncompressInputStream.java
In-Reply-To:
References:
Message-ID: <461B51F4.8010402@ebi.ac.uk>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
I have no idea what it is for. There are generic Java classes provided
with the SDK that do the same job. I think we should probably drop it.
Let's wait to see if anyone shouts first.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFGG1Hz4C5LeMEKA/QRAvTuAJ9F1AClFCV4WwBNP170mbC2+6JVDgCfVB17
HoCuWrx5k2ONg/9oxIfVVPI=
=cGTy
-----END PGP SIGNATURE-----
From ayates at ebi.ac.uk Tue Apr 10 05:56:57 2007
From: ayates at ebi.ac.uk (Andy Yates)
Date: Tue, 10 Apr 2007 10:56:57 +0100
Subject: [Biojava-dev]
Fwd: Bug in org/biojava/utils/io/UncompressInputStream.java
In-Reply-To: <461B51F4.8010402@ebi.ac.uk>
References:
<461B51F4.8010402@ebi.ac.uk>
Message-ID: <461B5F69.9060506@ebi.ac.uk>
I don't think there are standard classes for this compression format in
the SDK. There are ones for GZIP & ZIP but not for LZW which this one is
dealing with. Also I'm not sure about using GZIP to unzip a file
compressed with LZW since GZIP uses DEFLATE.
We need to decompress the file using uncompress (which is missing from
my Linux box but is on the mac ... go figure) and then match that up to
the output from UncompressInputStream & see if they agree or not.
Andy
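The two formats can be told apart from the first two bytes of a file: compress (.Z, LZW) output starts with 0x1F 0x9D, while gzip output starts with 0x1F 0x8B. A small sketch of that check; the class name CompressionSniffer is made up for illustration.

```java
/** Sketch: classify a compressed file by its two-byte magic number. */
class CompressionSniffer {
    /** Returns a short description based on the first two bytes of the file. */
    static String detect(byte[] header) {
        if (header == null || header.length < 2) {
            return "unknown";
        }
        // Mask to treat the signed Java bytes as unsigned values.
        int b0 = header[0] & 0xFF;
        int b1 = header[1] & 0xFF;
        if (b0 == 0x1F && b1 == 0x9D) {
            return "compress (LZW, .Z)";
        }
        if (b0 == 0x1F && b1 == 0x8B) {
            return "gzip (DEFLATE)";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        System.out.println(detect(new byte[] {(byte) 0x1F, (byte) 0x9D}));
    }
}
```

As far as I know, the gzip command-line tool can also decompress .Z input even though its own format is DEFLATE-based, which would explain how gzip was usable for the comparison. java.util.zip.GZIPInputStream, by contrast, accepts only the gzip format.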
>> Schreiber ]
>>
>>
>> _______________________________________________
>> biojava-dev mailing list
>> biojava-dev at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.2.2 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>
> iD8DBQFGG1Hz4C5LeMEKA/QRAvTuAJ9F1AClFCV4WwBNP170mbC2+6JVDgCfVB17
> HoCuWrx5k2ONg/9oxIfVVPI=
> =cGTy
> -----END PGP SIGNATURE-----
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
From ayates at ebi.ac.uk Tue Apr 10 06:03:40 2007
From: ayates at ebi.ac.uk (Andy Yates)
Date: Tue, 10 Apr 2007 11:03:40 +0100
Subject: [Biojava-dev] Fwd: Bug in org/biojava/utils/io/UncompressInputStream.java
In-Reply-To: <461B5F69.9060506@ebi.ac.uk>
References: <461B51F4.8010402@ebi.ac.uk> <461B5F69.9060506@ebi.ac.uk>
Message-ID: <461B60FC.7040903@ebi.ac.uk>
Okay, a quick run of uncompress on the Mac with the files in question
does produce a file that matches the one produced by gzip, but not the
one produced by UncompressInputStream.
For a pass, the MD5 digest of the output should be:
9f0924237d20288793172091d61f85b8 uncompressed_by_gzip
But we get:
17447efd34a245e430f20bc8d9b28a7b uncompressed_by_uncompressInputStream
So something is clearly wrong: it looks like UncompressInputStream drops
88 bytes from the decompressed output.
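The same digest check can be done from Java, which would make it easy to turn into a regression test. A minimal sketch, assuming both outputs fit in memory; the class and method names are illustrative and not part of BioJava.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class DigestCompare {
    /** Returns the MD5 digest of the data as 32 lowercase hex characters. */
    static String md5Hex(byte[] data) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] digest = md.digest(data);
        // BigInteger(1, ...) treats the digest as unsigned; %032x keeps leading zeros.
        return String.format("%032x", new BigInteger(1, digest));
    }

    /** True if both byte arrays have the same MD5 digest. */
    static boolean sameDigest(byte[] expected, byte[] actual) throws NoSuchAlgorithmException {
        return md5Hex(expected).equals(md5Hex(actual));
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        // Hypothetical stand-ins for the two decompressed outputs.
        byte[] reference = "HEADER\nEND\n".getBytes(StandardCharsets.US_ASCII);
        byte[] candidate = "HEADER\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(md5Hex(reference) + "  reference");
        System.out.println(md5Hex(candidate) + "  candidate");
        System.out.println("match: " + sameDigest(reference, candidate));
        System.out.println("bytes missing: " + (reference.length - candidate.length));
    }
}
```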
Wonder what happens if we pass this file type through the
GZIPInputStream from the JDK?
Andy Yates wrote:
> I don't think there are standard classes for this compression format in
> the SDK. There are ones for GZIP & ZIP but not for LZW which this one is
> dealing with. Also I'm not sure about using GZIP to unzip a file
> compressed with LZW since GZIP uses DEFLATE.
>
> We need to decompress the file using uncompress (which is missing from
> my Linux box but is on the mac ... go figure) and then match that up to
> the output from UncompressInputStream & see if they agree or not.
>
> Andy
From holland at ebi.ac.uk Tue Apr 10 06:09:04 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Tue, 10 Apr 2007 11:09:04 +0100
Subject: [Biojava-dev] Fwd: Bug in org/biojava/utils/io/UncompressInputStream.java
In-Reply-To: <2A8FFBC4-EC1A-4EB9-992C-DE9225A59578@sonsorol.org>
References: <2109dfc0704061116i1f0ddbe2ic25012143d2509af@mail.gmail.com> <2A8FFBC4-EC1A-4EB9-992C-DE9225A59578@sonsorol.org>
Message-ID: <461B6240.2070907@ebi.ac.uk>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
AFAIK the Zip algorithm is just LZW with bells on, so it should produce
exactly the same results.
Chris Dagdigian wrote:
>
> Passing on this email that came to me ...
>
> Regards,
> Chris Dagdigian
> OBF
>
>
> Begin forwarded message:
>
>> From: "Miguel Duarte"
>> Date: April 6, 2007 2:16:52 PM EDT
>> To: dag at sonsorol.org
>> Subject: Bug in org/biojava/utils/io/UncompressInputStream.java
>>
>> Hi Chris,
>>
>>> From
>>> http://sourceforge.net/project/shownotes.php?release_id=314770&group_id=18598,
>>>
>> i've learned that you're maintaining the class
>> org/biojava/utils/io/UncompressInputStream.java. If that's not the
>> case please forward this mail to the maintainer.
>>
>> I've discovered a nasty bug: With some read block sizes the algorithm
>> truncates a few bytes from the end of the stream. I've verified this
>> comparing the gzip/uncompress output for some files versus what
>> org/biojava/utils/io/UncompressInputStream.java generates.
>>
>> Unfortunately i've not discovered the bug yet, but i can contribute
>> with the attached test case. How to verify the bug:
>> uncompress BH_03834.MCR.Z with gzip and with UncompressInputStream and
>> compare the results.
>>
>> Thanks,
>> Miguel Duarte
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFGG2JA4C5LeMEKA/QRAjutAJ9cZbqpoag2Z5aQd4gbOAiMm78VZACdHzER
UoIhheyTE1805rMBzG4R+Q0=
=hfN2
-----END PGP SIGNATURE-----
From ayates at ebi.ac.uk Tue Apr 10 06:12:58 2007
From: ayates at ebi.ac.uk (Andy Yates)
Date: Tue, 10 Apr 2007 11:12:58 +0100
Subject: [Biojava-dev] Fwd: Bug in org/biojava/utils/io/UncompressInputStream.java
In-Reply-To: <461B60FC.7040903@ebi.ac.uk>
References: <461B51F4.8010402@ebi.ac.uk> <461B5F69.9060506@ebi.ac.uk> <461B60FC.7040903@ebi.ac.uk>
Message-ID: <461B632A.10101@ebi.ac.uk>
A quick program that passes it through GZIPInputStream throws an
IOException saying it's not in GZIP format (which it isn't). Passing it
through ZipInputStream seems to do nothing.
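The GZIPInputStream failure can be reproduced from the header bytes alone. The sketch below assumes the usual magic numbers, 0x1f 0x9d for compress (.Z) and 0x1f 0x8b for gzip; the class name and the sample bytes are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;

public class MagicCheck {
    /** Classifies data as "compress", "gzip" or "unknown" from its first two bytes. */
    static String sniff(byte[] header) {
        if (header.length < 2 || (header[0] & 0xff) != 0x1f) return "unknown";
        int second = header[1] & 0xff;
        if (second == 0x9d) return "compress";
        if (second == 0x8b) return "gzip";
        return "unknown";
    }

    /** True if the GZIPInputStream constructor accepts the header of the data. */
    static boolean gzipAccepts(byte[] data) {
        // The constructor reads and validates the gzip header immediately.
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return true;
        } catch (IOException e) {
            return false; // e.g. ZipException when the magic number is wrong
        }
    }

    public static void main(String[] args) {
        // Hypothetical first bytes of a .Z file; only the two magic bytes matter here.
        byte[] z = {0x1f, (byte) 0x9d, (byte) 0x90, 0x41, 0x42};
        System.out.println(sniff(z) + ", accepted by GZIPInputStream: " + gzipAccepts(z));
    }
}
```

A check like sniff() could let calling code pick the right decoder before opening the stream, rather than relying on the exception.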
At any rate, it looks like we cannot get rid of this class; it has to be
fixed and maintained.
Andy Yates wrote:
> Okay a quick run of uncompress on the mac with the files in question
> does produce a file which is equivalent to the file produced by gzip but
> not to the one produced by UncompressInputStream.
>
> The required md5sum for a pass should be (after a md5 digest):
>
> 9f0924237d20288793172091d61f85b8 uncompressed_by_gzip
>
> But we get:
>
> 17447efd34a245e430f20bc8d9b28a7b uncompressed_by_uncompressInputStream
>
> Okay so looks like there is something "wrong". Seems like it drops 88
> bytes from the decompression.
>
> Wonder what happens if we pass this file type through the
> GZIPInputStream from the JDK?
From holland at ebi.ac.uk Tue Apr 10 06:37:36 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Tue, 10 Apr 2007 11:37:36 +0100
Subject: [Biojava-dev] Fwd: Bug in org/biojava/utils/io/UncompressInputStream.java
In-Reply-To: <461B6443.9060300@ebi.ac.uk>
References: <2109dfc0704061116i1f0ddbe2ic25012143d2509af@mail.gmail.com> <2A8FFBC4-EC1A-4EB9-992C-DE9225A59578@sonsorol.org> <461B6240.2070907@ebi.ac.uk> <461B6443.9060300@ebi.ac.uk>
Message-ID: <461B68F0.4000908@ebi.ac.uk>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Why are these files in compress/uncompress format? Is it proprietary
software creating them, or a legacy system of some kind? Wouldn't gzip
give better results both in terms of compression ratios and performance
as it is far more up-to-date?
I believe that the JDK doesn't support LZW because LZW was patented, and
that patent expired only very recently (in 2003/4/5/6 depending on where
you live and in what form you use LZW):
http://www.gnu.org/philosophy/gif.html
It's one of those wonderful cases where the patent enforcement caused
the algorithm it was protecting to get dumped and forgotten because
nobody wanted to pay for it. Apart from *nix compress/uncompress and
inside the GIF format I'm not sure it's actually used anywhere else any
more.
Technically we infringed the patent by including LZW support in BioJava,
but now the patent has expired we no longer need to worry.
The question is: do we need to fix this inherently computer-science
problem, which is entirely unrelated to biology or bioinformatics, or can
we just point people to an alternative, more generic library that
supports it better? They are out there, for instance:
http://www.chilkatsoft.com/java-zip.asp
cheers,
Richard
Andy Yates wrote:
> Seems very strange this does. I don't know much about decompression but
> by the looks of things LZW isn't supported by the JDK.
>
> Richard Holland wrote:
> AFAIK the Zip algorithm is just LZW with bells on, so it should produce
> exactly the same results.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFGG2jw4C5LeMEKA/QRAiCwAJ9vNlDX2zwG5paYHbaFv2gSQeblOQCdHaW4
CwgzY5S7KELC3TA1oKKtjUw=
=9xEM
-----END PGP SIGNATURE-----
From ayates at ebi.ac.uk Tue Apr 10 06:54:27 2007
From: ayates at ebi.ac.uk (Andy Yates)
Date: Tue, 10 Apr 2007 11:54:27 +0100
Subject: [Biojava-dev] Fwd: Bug in org/biojava/utils/io/UncompressInputStream.java
In-Reply-To: <461B68F0.4000908@ebi.ac.uk>
References: <2109dfc0704061116i1f0ddbe2ic25012143d2509af@mail.gmail.com> <2A8FFBC4-EC1A-4EB9-992C-DE9225A59578@sonsorol.org> <461B6240.2070907@ebi.ac.uk> <461B6443.9060300@ebi.ac.uk> <461B68F0.4000908@ebi.ac.uk>
Message-ID: <461B6CE3.303@ebi.ac.uk>
I guess it all depends on which software is producing these files. If it
is something very common in bioinformatics, we might have to accept that
support needs to come from somewhere; and by the looks of things the
compression techniques are quite varied (the man page for compress
mentions adaptive dictionaries and the like).
Richard Holland wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Why are these files in compress/uncompress format? Is it proprietary
> software creating them, or a legacy system of some kind? Wouldn't gzip
> give better results both in terms of compression ratios and performance
> as it is far more up-to-date?
>
> I believe that the JDK doesn't support LZW because LZW was patented, and
> that patent expired only very recently (in 2003/4/5/6 depending on where
> you live and in what form you use LZW):
>
> http://www.gnu.org/philosophy/gif.html
>
> It's one of those wonderful cases where the patent enforcement caused
> the algorithm it was protecting to get dumped and forgotten because
> nobody wanted to pay for it. Apart from *nix compress/uncompress and
> inside the GIF format I'm not sure it's actually used anywhere else any
> more.
>
> Technically we infringed the patent by including LZW support in BioJava,
> but now the patent has expired we no longer need to worry.
>
> Question is, do we need to fix this inherently computer-science problem
> which is entirely unrelated to biology or bioinformatics, or can we just
> get people to use an alternative library instead which supports it
> better and is more generic? They are out there, for instance:
>
> http://www.chilkatsoft.com/java-zip.asp
>
> cheers,
> Richard
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.2.2 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>
> iD8DBQFGG2jw4C5LeMEKA/QRAiCwAJ9vNlDX2zwG5paYHbaFv2gSQeblOQCdHaW4
> CwgzY5S7KELC3TA1oKKtjUw=
> =9xEM
> -----END PGP SIGNATURE-----
From ap3 at sanger.ac.uk Tue Apr 10 06:27:39 2007
From: ap3 at sanger.ac.uk (Andreas Prlic)
Date: Tue, 10 Apr 2007 11:27:39 +0100
Subject: [Biojava-dev] Fwd: Bug in org/biojava/utils/io/UncompressInputStream.java
In-Reply-To:
References:
Message-ID:
Hi!
I committed this class a while ago, since I did not find any other way
to read .Z compressed files.
Unfortunately PDB files are often stored like that ...
If anybody has a suggestion for a better way to read Unix-compressed
files (.Z), I would be glad to hear it.
Parsing them as Zip or GZip did not work in my trials...
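One stopgap, going by the original report that the gzip command-line tool decompresses these files correctly, is to shell out to it. This is a sketch only: it assumes a gzip binary on the PATH, needs Java 9 or later for readAllBytes, and the class name is hypothetical. It is not a replacement for a pure-Java reader.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.List;

public class ExternalUncompress {
    /** Builds the command line; "gzip -dc" writes the decompressed data to stdout. */
    static List<String> command(String path) {
        return Arrays.asList("gzip", "-dc", path);
    }

    /** Runs gzip on the given .Z file and returns the decompressed bytes. */
    static byte[] decompress(String path) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command(path));
        pb.redirectError(ProcessBuilder.Redirect.INHERIT); // show gzip's own error messages
        Process p = pb.start();
        byte[] out;
        try (InputStream in = p.getInputStream()) {
            out = in.readAllBytes(); // drain stdout before waiting for exit
        }
        int exit = p.waitFor();
        if (exit != 0) throw new IOException("gzip exited with status " + exit);
        return out;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: ExternalUncompress file.Z");
            return;
        }
        System.out.println(decompress(args[0]).length + " bytes decompressed");
    }
}
```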
Andreas
On 9 Apr 2007, at 03:13, mark.schreiber at novartis.com wrote:
> Does anyone maintain this class??
>
> More to the point, does anyone know what it is for??? If I look at the
> Uses link in javadoc there are aparently none at the public or package
> level. Additionally why does biojava need one, are there not java.io
> classes that can handle compressed streams??
>
> Is there a good reason why we cannot just clean it out?
>
> - Mark
>
> Mark Schreiber
> Research Investigator (Bioinformatics)
>
> Novartis Institute for Tropical Diseases (NITD)
> 10 Biopolis Road
> #05-01 Chromos
> Singapore 138670
> www.nitd.novartis.com
>
> phone +65 6722 2973
> fax +65 6722 2910
>
>
>
>
>
> Chris Dagdigian
> Sent by: biojava-dev-bounces at lists.open-bio.org
> 04/07/2007 09:52 AM
>
>
> To: biojava-dev at biojava.org
> cc: (bcc: Mark Schreiber/GP/Novartis)
> Subject: [Biojava-dev] Fwd: Bug in
> org/biojava/utils/io/UncompressInputStream.java
>
>
>
> Passing on this email that came to me ...
>
> Regards,
> Chris Dagdigian
> OBF
>
>
> Begin forwarded message:
>
>> From: "Miguel Duarte"
>> Date: April 6, 2007 2:16:52 PM EDT
>> To: dag at sonsorol.org
>> Subject: Bug in org/biojava/utils/io/UncompressInputStream.java
>>
>> Hi Chris,
>>
>>> From http://sourceforge.net/project/shownotes.php?
>>> release_id=314770&group_id=18598,
>> i've learned that you're maintaining the class
>> org/biojava/utils/io/UncompressInputStream.java. If that's not the
>> case please forward this mail to the maintainer.
>>
>> I've discovered a nasty bug: With some read block sizes the algorithm
>> truncates a few bytes from the end of the stream. I've verified this
>> comparing the gzip/uncompress output for some files versus what
>> org/biojava/utils/io/UncompressInputStream.java generates.
>>
>> Unfortunately i've not discovered the bug yet, but i can contribute
>> with the attached test case. How to verify the bug:
>> uncompress BH_03834.MCR.Z with gzip and with UncompressInputStream and
>> compare the results.
>>
>> Thanks,
>> Miguel Duarte
>
>
>
>
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>
> [ Attachment ''BH_03834.MCR.Z'' removed by Mark Schreiber ]
> [ Attachment ''UNCOMPRESSED_BY_GZIP'' removed by Mark Schreiber ]
> [ Attachment ''UNCOMPRESSED_BY_UNCOMPRESSINPUTSTREAM'' removed by Mark
> Schreiber ]
>
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>
>
-----------------------------------------------------------------------
Andreas Prlic Wellcome Trust Sanger Institute
Hinxton, Cambridge CB10 1SA, UK
+44 (0) 1223 49 6891
From holland at ebi.ac.uk Tue Apr 10 07:01:36 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Tue, 10 Apr 2007 12:01:36 +0100
Subject: [Biojava-dev] Fwd: Bug in org/biojava/utils/io/UncompressInputStream.java
In-Reply-To:
References:
Message-ID: <461B6E90.3090501@ebi.ac.uk>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Andreas - did you write the class? If so, then you may understand it
better than the rest of us. Would you be willing to attempt to fix it?
cheers,
Richard
Andreas Prlic wrote:
> Hi!
>
> I committed this class a while ago, since I did not find any other way
> to read .Z compressed files.
>
> Unfortunately PDB files are often stored like that ...
>
> If anybody has a suggestion how to read unix compressed files (.Z) in a
> better way, I would be glad to hear.
>
> Parsing them as Zip or GZip did not work in my trials...
>
> Andreas
>
> -----------------------------------------------------------------------
>
> Andreas Prlic Wellcome Trust Sanger Institute
> Hinxton, Cambridge CB10 1SA, UK
> +44 (0) 1223 49 6891
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFGG26Q4C5LeMEKA/QRAiMxAJ4u4RUjTGODjClIM1LIRzP12xNUOgCgifA+
14CbPaY5SwcG1/wUHJVpl/U=
=wBDT
-----END PGP SIGNATURE-----
From markjschreiber at gmail.com Tue Apr 10 07:29:03 2007
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Tue, 10 Apr 2007 19:29:03 +0800
Subject: [Biojava-dev] Fwd: Bug in org/biojava/utils/io/UncompressInputStream.java
In-Reply-To: <461B60FC.7040903@ebi.ac.uk>
References: <461B51F4.8010402@ebi.ac.uk> <461B5F69.9060506@ebi.ac.uk> <461B60FC.7040903@ebi.ac.uk>
Message-ID: <93b45ca50704100429u388f5b8ax86ba05e5d05e02a9@mail.gmail.com>
Without looking at the code, I would guess that the 88 dropped bytes could
be due to a buffered reader or writer not being flushed before it is
closed?
- Mark
On 4/10/07, Andy Yates wrote:
> Okay a quick run of uncompress on the mac with the files in question
> does produce a file which is equivalent to the file produced by gzip but
> not to the one produced by UncompressInputStream.
>
> The required md5sum for a pass should be (after a md5 digest):
>
> 9f0924237d20288793172091d61f85b8 uncompressed_by_gzip
>
> But we get:
>
> 17447efd34a245e430f20bc8d9b28a7b uncompressed_by_uncompressInputStream
>
> Okay so looks like there is something "wrong". Seems like it drops 88
> bytes from the decompression.
>
> Wonder what happens if we pass this file type through the
> GZIPInputStream from the JDK?
>
> Andy Yates wrote:
> > I don't think there are standard classes for this compression format in
> > the SDK. There are ones for GZIP & ZIP but not for LZW which this one is
> > dealing with. Also I'm not sure about using GZIP to unzip a file
> > compressed with LZW since GZIP uses DEFLATE.
> >
> > We need to decompress the file using uncompress (which is missing from
> > my Linux box but is on the mac ... go figure) and then match that up to
> > the output from UncompressInputStream & see if they agree or not.
> >
> > Andy
> >
> > Richard Holland wrote:
> >> -----BEGIN PGP SIGNED MESSAGE-----
> >> Hash: SHA1
> >>
> >> I have no idea what it is for. There are generic Java classes provided
> >> with the SDK that do the same job. I think we should probably drop it.
> >> Let's wait to see if anyone shouts first.
> >>
> >> mark.schreiber at novartis.com wrote:
> >>> Does anyone maintain this class?
> >>>
> >>> More to the point, does anyone know what it is for? If I look at the
> >>> Uses link in javadoc there are apparently none at the public or package
> >>> level. Additionally, why does biojava need one? Are there not java.io
> >>> classes that can handle compressed streams?
> >>>
> >>> Is there a good reason why we cannot just clean it out?
> >>>
> >>> - Mark
> >>>
> >>> Mark Schreiber
> >>> Research Investigator (Bioinformatics)
> >>>
> >>> Novartis Institute for Tropical Diseases (NITD)
> >>> 10 Biopolis Road
> >>> #05-01 Chromos
> >>> Singapore 138670
> >>> www.nitd.novartis.com
> >>>
> >>> phone +65 6722 2973
> >>> fax +65 6722 2910
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> Chris Dagdigian
> >>> Sent by: biojava-dev-bounces at lists.open-bio.org
> >>> 04/07/2007 09:52 AM
> >>>
> >>>
> >>> To: biojava-dev at biojava.org
> >>> cc: (bcc: Mark Schreiber/GP/Novartis)
> >>> Subject: [Biojava-dev] Fwd: Bug in org/biojava/utils/io/UncompressInputStream.java
> >>>
> >>>
> >>>
> >>> Passing on this email that came to me ...
> >>>
> >>> Regards,
> >>> Chris Dagdigian
> >>> OBF
> >>>
> >>>
> >>> Begin forwarded message:
> >>>
> >>>> From: "Miguel Duarte"
> >>>> Date: April 6, 2007 2:16:52 PM EDT
> >>>> To: dag at sonsorol.org
> >>>> Subject: Bug in org/biojava/utils/io/UncompressInputStream.java
> >>>>
> >>>> Hi Chris,
> >>>>
> >>>>> From http://sourceforge.net/project/shownotes.php?
> >>>>> release_id=314770&group_id=18598,
> >>>> i've learned that you're maintaining the class
> >>>> org/biojava/utils/io/UncompressInputStream.java. If that's not the
> >>>> case please forward this mail to the maintainer.
> >>>>
> >>>> I've discovered a nasty bug: With some read block sizes the algorithm
> >>>> truncates a few bytes from the end of the stream. I've verified this
> >>>> comparing the gzip/uncompress output for some files versus what
> >>>> org/biojava/utils/io/UncompressInputStream.java generates.
> >>>>
> >>>> Unfortunately i've not discovered the bug yet, but i can contribute
> >>>> with the attached test case. How to verify the bug:
> >>>> uncompress BH_03834.MCR.Z with gzip and with UncompressInputStream and
> >>>> compare the results.
> >>>>
> >>>> Thanks,
> >>>> Miguel Duarte
> >>>
> >>>
> >>>
> >>> [ Attachment ''BH_03834.MCR.Z'' removed by Mark Schreiber ]
> >>> [ Attachment ''UNCOMPRESSED_BY_GZIP'' removed by Mark Schreiber ]
> >>> [ Attachment ''UNCOMPRESSED_BY_UNCOMPRESSINPUTSTREAM'' removed by Mark
> >>> Schreiber ]
> >>>
> >>>
> >> -----BEGIN PGP SIGNATURE-----
> >> Version: GnuPG v1.4.2.2 (GNU/Linux)
> >> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
> >>
> >> iD8DBQFGG1Hz4C5LeMEKA/QRAvTuAJ9F1AClFCV4WwBNP170mbC2+6JVDgCfVB17
> >> HoCuWrx5k2ONg/9oxIfVVPI=
> >> =cGTy
> >> -----END PGP SIGNATURE-----
>
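The digest comparison described in the quoted message can be reproduced from Java. This is only a minimal sketch: the class name Md5Compare and its method names are made up for illustration, and it uses nothing beyond the standard JDK. Pass it the two decompressed files (for example the gzip output and the UncompressInputStream output) and compare the printed digests.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of BioJava.
class Md5Compare {

    // Creates an MD5 digest; every JDK is required to provide this algorithm.
    private static MessageDigest newMd5() {
        try {
            return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Renders digest bytes as lower-case hex, the way md5sum prints them.
    private static String hex(byte[] digest) {
        StringBuilder sb = new StringBuilder();
        for (byte b : digest) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    // MD5 of an in-memory byte array.
    static String md5Hex(byte[] data) {
        return hex(newMd5().digest(data));
    }

    // MD5 of a file, streamed so that large files need not fit in memory.
    static String md5Hex(Path file) throws IOException {
        MessageDigest md = newMd5();
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        return hex(md.digest());
    }

    public static void main(String[] args) throws IOException {
        // args[0] and args[1]: the two decompressed files to compare.
        String a = md5Hex(Paths.get(args[0]));
        String b = md5Hex(Paths.get(args[1]));
        System.out.println(a + "  " + args[0]);
        System.out.println(b + "  " + args[1]);
        System.out.println(a.equals(b) ? "identical" : "DIFFERENT");
    }
}
```

A size check on both files alongside the digests would also confirm how many bytes are missing.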
From bugzilla-daemon at portal.open-bio.org Tue Apr 10 07:58:59 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 10 Apr 2007 07:58:59 -0400
Subject: [Biojava-dev] [Bug 2261] Request for enhancement of RichSequence
In-Reply-To:
Message-ID: <200704101158.l3ABwxRV028563@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2261
holland at ebi.ac.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |holland at ebi.ac.uk
------- Comment #1 from holland at ebi.ac.uk 2007-04-10 07:58 EST -------
I think the only reason it doesn't already do so is because when I wrote
RichSequence, RichFeatureHolder hadn't been invented yet.
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Tue Apr 10 08:15:28 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 10 Apr 2007 08:15:28 -0400
Subject: [Biojava-dev] [Bug 2261] Request for enhancement of RichSequence
In-Reply-To:
Message-ID: <200704101215.l3ACFSYY029387@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2261
------- Comment #2 from holland at ebi.ac.uk 2007-04-10 08:15 EST -------
Just looked at this a bit closer and found that only Features can hold other
Features - RichFeatureHolder represents the FeatureRelationship portion of
BioSQL. FeatureRelationships exist between features, and not between sequences
and features. Maybe RichFeatureHolder is therefore a bit of a misnomer? Maybe
it should be FeatureRelationshipHolder, or something like that?
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From ap3 at sanger.ac.uk Tue Apr 10 09:02:17 2007
From: ap3 at sanger.ac.uk (Andreas Prlic)
Date: Tue, 10 Apr 2007 14:02:17 +0100
Subject: [Biojava-dev] Fwd:
Bug in org/biojava/utils/io/UncompressInputStream.java
In-Reply-To: <461B6E90.3090501@ebi.ac.uk>
References:
<461B6E90.3090501@ebi.ac.uk>
Message-ID:
> Andreas - did you write the class? If so, then you may understand it
> better than the rest of us. Would you be willing to attempt to fix it?
No, I did not write it - it is a LGPL class which I found in another
project.
see http://www.innovation.ch/java/HTTPClient/ or also the header in
the file.
I will try to have a look at this problem, but not sure if I can fix it
quickly.
PDB data is still available for download as .Z files, e.g.
ftp://ftp.rcsb.org/pub/pdb/data/structures/divided/pdb/ar/
that's why I would need to have some tools for reading these.
I agree this is a general problem and the solution does not necessarily
have to be part of BioJava.
I don't think any patent was infringed, since the file was committed
after the patents had expired.
Andreas
-----------------------------------------------------------------------
Andreas Prlic Wellcome Trust Sanger Institute
Hinxton, Cambridge CB10 1SA, UK
+44 (0) 1223 49 6891
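On the point raised earlier in this thread that .Z files are LZW-compressed while gzip uses DEFLATE: the two formats can be told apart by their first two bytes. The sketch below assumes the conventional magic numbers (0x1F 0x9D for compress/.Z, 0x1F 0x8B for gzip); the class name CompressionSniffer is hypothetical and not part of BioJava.

```java
// Hypothetical helper: guesses the compression format from the first two bytes.
class CompressionSniffer {

    static String detect(byte[] header) {
        if (header.length >= 2 && (header[0] & 0xFF) == 0x1F) {
            int second = header[1] & 0xFF;
            if (second == 0x9D) {
                return "compress (LZW, .Z)";
            }
            if (second == 0x8B) {
                return "gzip (DEFLATE)";
            }
        }
        return "unknown";
    }

    public static void main(String[] args) {
        System.out.println(detect(new byte[] {0x1F, (byte) 0x9D, (byte) 0x90}));
        System.out.println(detect(new byte[] {0x1F, (byte) 0x8B, 0x08}));
    }
}
```

A check like this lets a reader pick a decompressor up front instead of handing an LZW stream to a DEFLATE decoder.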
From bugzilla-daemon at portal.open-bio.org Wed Apr 11 01:12:29 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 11 Apr 2007 01:12:29 -0400
Subject: [Biojava-dev] [Bug 2261] Request for enhancement of RichSequence
In-Reply-To:
Message-ID: <200704110512.l3B5CT9v008783@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2261
------- Comment #3 from mark.schreiber at novartis.com 2007-04-11 01:12 EST -------
(In reply to comment #2)
> Just looked at this a bit closer and found that only Features can hold other
> Features - RichFeatureHolder represents the FeatureRelationship portion of
> BioSQL. FeatureRelationships exist between features, and not between sequences
> and features. Maybe RichFeatureHolder is therefore a bit of a misnomer? Maybe
> it should be FeatureRelationshipHolder, or something like that?
I would agree with the proposal to rename it. It would save a lot of confusion.
It should be a pretty simple task to refactor it but it would need to happen
before a release version.
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From mark.schreiber at novartis.com Wed Apr 11 23:19:18 2007
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Thu, 12 Apr 2007 11:19:18 +0800
Subject: [Biojava-dev] javacc
Message-ID:
Hello -
Has anyone ever written a javacc lexer / parser for Genbank (or any of the
other major formats?).
- Mark
Mark Schreiber
Research Investigator (Bioinformatics)
Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com
phone +65 6722 2973
fax +65 6722 2910
From bugzilla-daemon at portal.open-bio.org Fri Apr 13 01:08:33 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 13 Apr 2007 01:08:33 -0400
Subject: [Biojava-dev] [Bug 2273] New: More problems writing uniprot files
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2273
Summary: More problems writing uniprot files
Product: BioJava
Version: live (CVS source)
Platform: PC
OS/Version: Windows XP
Status: NEW
Severity: normal
Priority: P2
Component: seq.io
AssignedTo: biojava-dev at biojava.org
ReportedBy: gwaldon at geneinfinity.org
I found a few problems when writing UniProt files. Using P04941 as a
test example:
1. The ID line is not written in a fixed format (this is probably not
actually a bug):
(before/after - read/write)
ID KV6A7_MOUSE Reviewed; 107 AA.
ID KV6A7_MOUSE Reviewed; 107 AA.
2. The reference title gets truncated at the end by one character after each
read/write operation:
RT phenyloxazolone and its early diversification.";
RT phenyloxazolone and its early diversification";
RT phenyloxazolone and its early diversificatio";
...
3. The FT line is not formatted correctly; this is a bug because the FT line
has a fixed format, and the I of Ig should be at position 35:
(before/after - read/write)
FT CHAIN 1 >107 Ig kappa chain V-VI region NQ2-48.2.2.
FT CHAIN 1 107> Ig kappa chain V-VI region NQ2-48.2.2.
4. SQ line: are these really the same CRC64 number?
SQ SEQUENCE 107 AA; 11557 MW; 72488DA9EF354934 CRC64;
SQ SEQUENCE 107 AA; 11564 MW; ffffffffe278ca323958dd50 CRC64;
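For point 3, a fixed-column FT line can be produced with a single format string. This is a sketch under an assumed column layout (key in columns 6-13, 'from' right-justified in 15-20, 'to' right-justified in 22-27, description starting at column 35), which should be checked against the UniProt user manual. The class FtLine is hypothetical and is not the code in UniProtFormat.

```java
// Hypothetical formatter for a UniProt FT line with fixed columns.
class FtLine {

    // Assumed layout: "FT", 3 spaces, 8-char key, space, 6-char from,
    // space, 6-char to, 7 spaces, then the description at column 35.
    static String format(String key, String from, String to, String description) {
        return String.format("FT   %-8s %6s %6s       %s", key, from, to, description);
    }

    public static void main(String[] args) {
        // Values taken from the P04941 example above.
        System.out.println(format("CHAIN", "1", ">107",
                "Ig kappa chain V-VI region NQ2-48.2.2."));
    }
}
```

With this layout the description always starts at the same column regardless of how wide the endpoints are, as long as they fit in six characters.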
- George
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Tue Apr 17 08:08:29 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 17 Apr 2007 08:08:29 -0400
Subject: [Biojava-dev] [Bug 2273] More problems writing uniprot files
In-Reply-To:
Message-ID: <200704171208.l3HC8T2G004508@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2273
holland at ebi.ac.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
------- Comment #1 from holland at ebi.ac.uk 2007-04-17 08:08 EST -------
I have fixed points 1-3. Point 4 I have raised as a new bug for someone else to
fix - the problem goes deeper than just UniProtFormat!
Can you check the code I have committed in CVS and update this bug accordingly
with what you find.
I have not written a unit test as I'm very busy at present and don't have the
time. If you could add one in that would be great.
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Tue Apr 17 08:11:11 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 17 Apr 2007 08:11:11 -0400
Subject: [Biojava-dev] [Bug 2274] New: CRC64 checksum toString() returning
incorrect values
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2274
Summary: CRC64 checksum toString() returning incorrect values
Product: BioJava
Version: live (CVS source)
Platform: All
OS/Version: All
Status: NEW
Severity: normal
Priority: P2
Component: Others
AssignedTo: biojava-dev at biojava.org
ReportedBy: holland at ebi.ac.uk
In org.biojavax.utils.CRC64Checksum the toString() method returns 24-character
strings, when CRC64 checksums are only 16-character. Also need to check that
the correct polynomials etc. are being used.
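One way a 64-bit value can end up as 24 hex characters is when it is held as two 32-bit halves and the high half is sign-extended on printing. That is only a guess at the cause, not something confirmed from the CRC64Checksum source, but it does reproduce the string ffffffffe278ca323958dd50 reported in bug 2273. The sketch below (hypothetical class Crc64Hex) shows the effect and a formatting that always yields 16 characters. It does not address the polynomial question.

```java
// Hypothetical illustration of 16-character hex formatting for a 64-bit checksum.
class Crc64Hex {

    // Formats a 64-bit checksum as exactly 16 upper-case hex digits.
    static String toHex16(long crc) {
        return String.format("%016X", crc);
    }

    // Joins two 32-bit halves into one long without sign-extending the low half.
    static long join(int high, int low) {
        return ((long) high << 32) | (low & 0xFFFFFFFFL);
    }

    public static void main(String[] args) {
        int high = 0xE278CA32; // negative when held in a Java int
        int low = 0x3958DD50;

        // Assumed faulty formatting: widening the negative int to long
        // sign-extends it, so the high half prints as 16 digits instead of 8.
        String tooLong = Long.toHexString(high) + Integer.toHexString(low);
        System.out.println(tooLong + " (" + tooLong.length() + " characters)");

        // Joining the halves first and zero-padding gives 16 digits.
        System.out.println(toHex16(join(high, low)));
    }
}
```

Zero-padding matters as much as avoiding sign extension: a checksum with leading zero nibbles would otherwise print as fewer than 16 characters.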
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Tue Apr 17 08:19:04 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 17 Apr 2007 08:19:04 -0400
Subject: [Biojava-dev] [Bug 2274] CRC64 checksum toString() returning
incorrect values
In-Reply-To:
Message-ID: <200704171219.l3HCJ4op005457@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2274
holland at ebi.ac.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
------- Comment #1 from holland at ebi.ac.uk 2007-04-17 08:19 EST -------
Fixed.
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Tue Apr 17 08:19:15 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 17 Apr 2007 08:19:15 -0400
Subject: [Biojava-dev] [Bug 2274] CRC64 checksum toString() returning
incorrect values
In-Reply-To:
Message-ID: <200704171219.l3HCJFwu005500@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2274
holland at ebi.ac.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|RESOLVED |CLOSED
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Tue Apr 17 08:25:50 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 17 Apr 2007 08:25:50 -0400
Subject: [Biojava-dev] [Bug 2261] Request for enhancement of RichSequence
In-Reply-To:
Message-ID: <200704171225.l3HCPo16006114@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2261
holland at ebi.ac.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
------- Comment #4 from holland at ebi.ac.uk 2007-04-17 08:25 EST -------
Done. Renamed to RichFeatureRelationshipHolder and removed reference to
RichFeatureHolder as it is technically no such thing.
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From jburdick at keyfitz.org Thu Apr 19 14:26:55 2007
From: jburdick at keyfitz.org (Josh Burdick)
Date: Thu, 19 Apr 2007 14:26:55 -0400
Subject: [Biojava-dev] reading a subsequence from a .nib file
In-Reply-To:
References:
Message-ID: <1177007215.5481.4.camel@localhost.localdomain>
On Tue, 2007-04-03 at 09:03 +0800, mark.schreiber at novartis.com wrote:
> Hi -
>
> To my knowledge nothing like this exists in BioJava. Could someone take
> it the last mile and make it produce SymbolLists?
>
I went ahead and added a method getSymbolListByLocation() which takes
the string and converts it to a SymbolList using DNATools. There are
bound to be more efficient ways to do this, but I think this is a
reasonable start.
The files are in the same locations:
http://www.keyfitz.org/jburdick/read_nib_file_java/NibFile.java
http://www.keyfitz.org/jburdick/read_nib_file_java/NibFileTest.java
Hopefully someone will find this code useful.
Josh
> - Mark
>
> Mark Schreiber
> Research Investigator (Bioinformatics)
>
> Novartis Institute for Tropical Diseases (NITD)
> 10 Biopolis Road
> #05-01 Chromos
> Singapore 138670
> www.nitd.novartis.com
>
> phone +65 6722 2973
> fax +65 6722 2910
>
[...]
From gwaldon at geneinfinity.org Thu Apr 19 17:05:09 2007
From: gwaldon at geneinfinity.org (george waldon)
Date: Thu, 19 Apr 2007 14:05:09 -0700
Subject: [Biojava-dev] no need for LENGTH_TYPE_TERM
Message-ID: <20070419210509.75384.qmail@mmm1924.dulles19-verio.com>
LENGTH_TYPE_TERM is a RichSequence term that is used to distinguish between "aa" and "bp" during write operations in UniProt and GenBank formats.
This code is error-prone. For instance, converting a protein sequence from a FASTA file to a GenBank-formatted file still writes "bp". To avoid that, the sequence annotation for this term would have to be generated during enrichment of the sequence.
I don't think the extra work is really necessary. Is there any objection if I remove this term and rely instead on the alphabet (either PROTEIN or PROTEIN_TERM) during write operations?
Thanks,
George
From mark.schreiber at novartis.com Thu Apr 19 22:44:51 2007
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Fri, 20 Apr 2007 10:44:51 +0800
Subject: [Biojava-dev] reading a subsequence from a .nib file
Message-ID:
Hi Josh -
Looks good. Just one thing: your JUnit test contains a hardcoded file
path to the test file, which means it is not portable. Could you modify
it so that it loads the file from the classpath as a resource (see some
of the IO unit tests for examples)? Can you also provide the test file?
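As a rough illustration of loading a test file from the classpath rather than from a hardcoded path, something along these lines would do. The class name TestResources and the resource name in main are hypothetical; Class.getResourceAsStream is the standard JDK call and returns null when the resource is not found.

```java
import java.io.InputStream;

// Hypothetical helper for unit tests that read data files from the classpath.
class TestResources {

    // Opens a classpath resource relative to 'owner' (or absolute if the name
    // starts with '/'), failing loudly instead of returning null.
    static InputStream open(Class<?> owner, String name) {
        InputStream in = owner.getResourceAsStream(name);
        if (in == null) {
            throw new IllegalArgumentException("Resource not on classpath: " + name);
        }
        return in;
    }

    public static void main(String[] args) throws Exception {
        // "/files/test.nib" is a made-up resource name for illustration.
        try (InputStream in = open(TestResources.class, "/files/test.nib")) {
            System.out.println("first byte: " + in.read());
        }
    }
}
```

The test then works from any checkout or jar, provided the data file is shipped on the test classpath.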
Best regards,
- Mark
Mark Schreiber
Research Investigator (Bioinformatics)
Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com
phone +65 6722 2973
fax +65 6722 2910
Josh Burdick
04/20/2007 02:26 AM
To: mark.schreiber at novartis.com
cc: biojava-dev at lists.open-bio.org
Subject: Re: [Biojava-dev] reading a subsequence from a .nib file
On Tue, 2007-04-03 at 09:03 +0800, mark.schreiber at novartis.com wrote:
> Hi -
>
> To my knowledge nothing like this exists in BioJava. Could someone take
> it the last mile and make it produce SymbolLists?
>
I went ahead and added a method getSymbolListByLocation() which takes
the string and converts it to a SymbolList using DNATools. There are
bound to be more efficient ways to do this, but I think this is a
reasonable start.
The files are in the same locations:
http://www.keyfitz.org/jburdick/read_nib_file_java/NibFile.java
http://www.keyfitz.org/jburdick/read_nib_file_java/NibFileTest.java
Hopefully someone will find this code useful.
Josh
> - Mark
>
> Mark Schreiber
> Research Investigator (Bioinformatics)
>
> Novartis Institute for Tropical Diseases (NITD)
> 10 Biopolis Road
> #05-01 Chromos
> Singapore 138670
> www.nitd.novartis.com
>
> phone +65 6722 2973
> fax +65 6722 2910
>
[...]
From mark.schreiber at novartis.com Thu Apr 19 22:46:56 2007
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Fri, 20 Apr 2007 10:46:56 +0800
Subject: [Biojava-dev] no need for LENGTH_TYPE_TERM
Message-ID:
I think this sounds sensible. Having another term for something that can
be derived from the alphabet is redundant.
- Mark
Mark Schreiber
Research Investigator (Bioinformatics)
Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com
phone +65 6722 2973
fax +65 6722 2910
"george waldon"
Sent by: biojava-dev-bounces at lists.open-bio.org
04/20/2007 05:05 AM
Please respond to george waldon
To: biojava-dev at biojava.org
cc: (bcc: Mark Schreiber/GP/Novartis)
Subject: [Biojava-dev] no need for LENGTH_TYPE_TERM
LENGTH_TYPE_TERM is a RichSequence term that is used to distinguish
between "aa" and "bp" during write operations in UniProt and GenBank
formats.
This code is error-prone. For instance, converting a protein sequence from
a FASTA file to a GenBank-formatted file still writes "bp". To avoid that,
the sequence annotation for this term would have to be generated during
enrichment of the sequence.
I don't think the extra work is really necessary. Is there any objection
if I remove this term and rely instead on the alphabet (either PROTEIN
or PROTEIN_TERM) during write operations?
Thanks,
George
From holland at ebi.ac.uk Fri Apr 20 04:29:18 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Fri, 20 Apr 2007 09:29:18 +0100
Subject: [Biojava-dev] no need for LENGTH_TYPE_TERM
In-Reply-To: <20070419210509.75384.qmail@mmm1924.dulles19-verio.com>
References: <20070419210509.75384.qmail@mmm1924.dulles19-verio.com>
Message-ID: <462879DE.5000507@ebi.ac.uk>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
No objections. I think your logic is better than mine here. Please go ahead.
cheers,
Richard
george waldon wrote:
> LENGTH_TYPE_TERM is a RichSequence term that is used to distinguish between "aa" and "bp" during write operations in UniProt and GenBank formats.
>
> This code is error-prone. For instance, converting a protein sequence from a FASTA file to a GenBank-formatted file still writes "bp". To avoid that, the sequence annotation for this term would have to be generated during enrichment of the sequence.
>
> I don't think the extra work is really necessary. Is there any objection if I remove this term and rely instead on the alphabet (either PROTEIN or PROTEIN_TERM) during write operations?
>
> Thanks,
> George
>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFGKHne4C5LeMEKA/QRAsD6AKCN4Nj7LMk3fCjAcrfE1Lw+Se3FJQCdH142
Sz6DYxYj1HedeHPZJpejtQs=
=9HDc
-----END PGP SIGNATURE-----
From jburdick at keyfitz.org Wed Apr 11 13:42:18 2007
From: jburdick at keyfitz.org (Josh Burdick)
Date: Wed, 11 Apr 2007 17:42:18 -0000
Subject: [Biojava-dev] reading a subsequence from a .nib file
In-Reply-To:
References:
Message-ID: <1176312083.21937.42.camel@localhost.localdomain>
On Tue, 2007-04-03 at 09:03 +0800, mark.schreiber at novartis.com wrote:
> Hi -
>
> To my knowledge nothing like this exists in BioJava. Could someone take
> it the last mile and make it produce SymbolLists?
>
I added a method that just takes the string and makes it into a
SymbolList using DNATools. This is somewhat inefficient (you can make a
SymbolList directly as an array of numbers, but I wasn't certain enough
that I understood it to try that.)
The package name should be changed, and the test code should probably
do somewhat more, but other than that, if someone wants to add it, feel
free. (The two files are at the same location as before.)
Josh
> - Mark
>
> Mark Schreiber
> Research Investigator (Bioinformatics)
>
> Novartis Institute for Tropical Diseases (NITD)
> 10 Biopolis Road
> #05-01 Chromos
> Singapore 138670
> www.nitd.novartis.com
>
> phone +65 6722 2973
> fax +65 6722 2910
>
From gwaldon at geneinfinity.org Mon Apr 2 23:56:09 2007
From: gwaldon at geneinfinity.org (george waldon)
Date: Mon, 02 Apr 2007 16:56:09 -0700
Subject: [Biojava-dev] Isoelectric point calculation
Message-ID: <20070402235609.52369.qmail@mmm1924.dulles19-verio.com>
Hi,
While trying to solve the problem of symbol ambiguity in pI calculation that was brought to our attention on Biojava-1, I found a few problems. In particular (!), calculated pI values are incorrect, BinarySearch throws exceptions, and ResidueProperties.xml has some strange values, such as a pK of -4.25 for Glu.
The class IsoelectricPointCalc was written a long time ago; I hope to get in touch with the original author and have corrected code soon.
As a general rule, scientific biomethods and biodata put into BioJava need precise literature references. Javadocs are a good place for those.
- George
From mark.schreiber at novartis.com Tue Apr 3 01:03:20 2007
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Tue, 3 Apr 2007 09:03:20 +0800
Subject: [Biojava-dev] reading a subsequence from a .nib file
Message-ID:
Hi -
To my knowledge nothing like this exists in BioJava. Could someone take
it the last mile and make it produce SymbolLists?
- Mark
Mark Schreiber
Research Investigator (Bioinformatics)
Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com
phone +65 6722 2973
fax +65 6722 2910
Josh Burdick
Sent by: biojava-dev-bounces at lists.open-bio.org
01/23/2007 12:29 AM
To: biojava-dev at lists.open-bio.org
cc: (bcc: Mark Schreiber/GP/Novartis)
Subject: [Biojava-dev] reading a subsequence from a .nib file
I wrote some code to read a chunk of DNA sequence from a file in Jim
Kent's blat ".nib" file format. This is a simple format using four
bits/base.
I didn't attach the code, to avoid spamming the whole list; but it,
and a (very crude!) JUnit test, are at
http://www.keyfitz.org/jburdick/read_nib_file_java/NibFile.java
http://www.keyfitz.org/jburdick/read_nib_file_java/NibFileTest.java
You could use 2 bits/base, but then you can't have ambiguous bases. 4
bits/base seems like a reasonable compromise; plus sites that have
"blat" installed will need to have the .nib files on a server somewhere
anyway, and this way repeat-masking can be included, which may be
convenient.
Also, it doesn't support writing a .nib file; again, presumably people
will be using Jim Kent's faToNib program to do that.
It would need some tweaking to be included in BioJava, because it
returns a plain String of ACGT, instead of a PackedSequence object.
(Probably this would just involve rewriting the setupBuffer() and
addToBuffer() methods in the code.) Also, the coordinate information
could come from a Range object.
If similar code is already somewhere in BioJava, please ignore this;
but I couldn't find it with thirty seconds of Googling, so I figured it
hadn't been written...
Josh Burdick
programmer, Vivian Cheung's lab, Children's Hospital of Philadelphia
jburdick at keyfitz.org
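To make the four-bits-per-base idea concrete, here is a minimal decoding sketch. It is not taken from NibFile.java. It assumes the nibble codes as usually documented for .nib files (0=T, 1=C, 2=A, 3=G, 4=N, with the high bit of each nibble marking a masked base and the first base of each byte in the high nibble), and it covers only the packed payload, not the file header. The class name NibDecoder is hypothetical.

```java
// Hypothetical decoder for the packed part of a .nib file (two bases per byte).
class NibDecoder {

    // Assumed nibble codes: 0=T, 1=C, 2=A, 3=G, 4=N.
    private static final char[] BASES = {'T', 'C', 'A', 'G', 'N'};

    // Decodes 'baseCount' bases from 'packed'; masked bases come out lower-case.
    static String decode(byte[] packed, int baseCount) {
        StringBuilder sb = new StringBuilder(baseCount);
        for (int i = 0; i < baseCount; i++) {
            int b = packed[i / 2] & 0xFF;
            // Assumption: the first base of each pair sits in the high nibble.
            int nibble = (i % 2 == 0) ? (b >> 4) : (b & 0x0F);
            boolean masked = (nibble & 0x8) != 0;
            int code = nibble & 0x7;
            char base = code < BASES.length ? BASES[code] : 'N';
            sb.append(masked ? Character.toLowerCase(base) : base);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] packed = {0x21, 0x30, (byte) 0xA4};
        System.out.println(decode(packed, 6));
    }
}
```

Reading a subsequence then comes down to seeking to byte offset start / 2 within the payload and decoding from there, taking care with an odd start position.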
From mark.schreiber at novartis.com Tue Apr 3 01:06:11 2007
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Tue, 3 Apr 2007 09:06:11 +0800
Subject: [Biojava-dev] Isoelectric point calculation
Message-ID:
Hi George -
This probably should be reported as a bug in bugzilla to make sure we get
around to fixing it.
Thanks,
- Mark
"george waldon"
Sent by: biojava-dev-bounces at lists.open-bio.org
04/03/2007 07:56 AM
Please respond to george waldon
To: biojava-dev at biojava.org
cc: smh1008 at cam.ac.uk, (bcc: Mark Schreiber/GP/Novartis)
Subject: [Biojava-dev] Isoelectric point calculation
Hi,
While trying to solve the problem of symbol ambiguity in pI calculation that
was brought to our attention on Biojava-1, I found a few problems. In
particular (!), calculated pI values are incorrect, BinarySearch throws
exceptions, and ResidueProperties.xml has some strange values, such as a
pK of -4.25 for Glu.
The class IsoelectricPointCalc was written a long time ago; I hope to
get in touch with the original author and have corrected code soon.
As a general rule, scientific biomethods and biodata put into BioJava need
precise literature references. Javadocs are a good place for those.
- George
From mark.schreiber at novartis.com Tue Apr 3 01:09:10 2007
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Tue, 3 Apr 2007 09:09:10 +0800
Subject: [Biojava-dev] org.biojava.bio.symbol.UkkonenSuffixTree.class BUG
Message-ID:
Hi Caroline -
Could you post some example code that we could use to replicate the
problem?
Thanks.
- Mark
Mark Schreiber
Research Investigator (Bioinformatics)
Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com
phone +65 6722 2973
fax +65 6722 2910
"Caroline Renaux"
Sent by: biojava-dev-bounces at lists.open-bio.org
03/26/2007 09:18 PM
To: biojava-dev at biojava.org
cc: (bcc: Mark Schreiber/GP/Novartis)
Subject: [Biojava-dev] org.biojava.bio.symbol.UkkonenSuffixTree.class BUG
Hello,
I recently used the org.biojava.bio.symbol package in a Java application,
in particular the UkkonenSuffixTree class. When I want to add a set of
sequences to the tree, I separate them with the '$' separator character,
but it does not work. While the second sequence is being processed I get
a NullPointerException in the jumpTo method, at the line:
arrivedAt=(SuffixNode)currentNode.children.get(new Character(source.charAt(from)));
I don't understand what I could have done wrong.
Thank you in advance for your reply.
RENAUX C.
From bugzilla-daemon at portal.open-bio.org Tue Apr 3 04:03:53 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 3 Apr 2007 00:03:53 -0400
Subject: [Biojava-dev] [Bug 2244] uniprot files do not load
In-Reply-To:
Message-ID: <200704030403.l3343rrF032035@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2244
------- Comment #9 from gwaldon at geneinfinity.org 2007-04-03 00:03 EST -------
*** Bug 2249 has been marked as a duplicate of this bug. ***
From bugzilla-daemon at portal.open-bio.org Tue Apr 3 07:12:48 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 3 Apr 2007 03:12:48 -0400
Subject: [Biojava-dev] [Bug 2258] New: ConcurrentModificationException in
SimpleRichAnnotation
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2258
Summary: ConcurrentModificationException in SimpleRichAnnotation
Product: BioJava
Version: live (CVS source)
Platform: PC
OS/Version: Windows XP
Status: NEW
Severity: normal
Priority: P2
Component: seq
AssignedTo: biojava-dev at biojava.org
ReportedBy: gwaldon at geneinfinity.org
Exception thrown by the method clear(), apparently as a result of trying to
change the note set while iterating over it.
- George
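For anyone reading this report later, here is a minimal sketch of the failure mode described above and one common way around it. It is not the SimpleRichAnnotation source: the class NoteSetClearSketch, its method names, and the use of a Set<String> as the "note set" are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in; this is not the SimpleRichAnnotation code.
class NoteSetClearSketch {

    /** Removes from the set while a for-each loop is iterating over it. */
    static boolean clearWhileIterating(Set<String> notes) {
        try {
            for (String note : notes) {
                notes.remove(note); // modifies the set behind the iterator's back
            }
            return true;
        } catch (ConcurrentModificationException e) {
            return false; // the fail-fast iterator noticed the change
        }
    }

    /** Removes every element through the iterator itself, which is allowed. */
    static int clearSafely(Set<String> notes) {
        int removed = 0;
        for (Iterator<String> it = notes.iterator(); it.hasNext();) {
            it.next();
            it.remove();
            removed++;
        }
        return removed;
    }

    public static void main(String[] args) {
        Set<String> a = new TreeSet<>(Arrays.asList("note1", "note2", "note3"));
        Set<String> b = new TreeSet<>(a);
        System.out.println("for-each removal succeeded: " + clearWhileIterating(a));
        System.out.println("iterator removal count: " + clearSafely(b));
    }
}
```

If clear() has to do per-note bookkeeping (for example firing change events), iterating over a copy of the set and removing from the original is another option; which one suits the real class is a judgment for whoever fixes the bug.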
From holland at ebi.ac.uk Tue Apr 3 10:00:01 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Tue, 03 Apr 2007 11:00:01 +0100
Subject: [Biojava-dev] JDBCPooledDataSource regression
In-Reply-To: <45C090A6.10909@ebi.ac.uk>
References: <416B41DF-91E1-4D1F-A4B4-799FE712B032@sanger.ac.uk> <45C0781B.7030304@ebi.ac.uk> <5DDBC6F5-7DC3-4446-A982-3CD9B3931A06@sanger.ac.uk> <45C08639.7080600@ebi.ac.uk> <8689C307-0643-46D9-90A6-A9958681D1D0@sanger.ac.uk> <45C08B7A.3060102@ebi.ac.uk>
<45C08E4E.2050308@ebi.ac.uk> <45C090A6.10909@ebi.ac.uk>
Message-ID: <461225A1.4000705@ebi.ac.uk>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Great stuff. You should commit it when you get your CVS account. :)
There's one or two typos (can't spell deprecated!) but I'm sure once you
get the rest of BioJava into Eclipse or something to make this change
permanent they'll show up.
cheers,
Richard
Andy Yates wrote:
> Okay I've attached the fix here.
>
> I just did this in a text editor but I believe that the imports are
> okay. If you can just do a quick scan as well to make sure I haven't
> deleted anything that was very important.
>
> I'll get on to the helpdesk now as well :)
>
> Andy
>
> Richard Holland wrote:
> Andy could you make the change to your local copy of the source file and
> email the file to me, that way I can make sure I don't get it wrong when
> I commit it.
>
> Richard.
>
> PS. You should probably have your own CVS account - email the OBF
> helpdesk and ask for one, saying I told you to. :)
>
>
> Andy Yates wrote:
>>>> Thomas Down wrote:
>>>>> On 31 Jan 2007, at 12:06, Andy Yates wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Sorry, I meant: what if that method just becomes:
>>>>>>
>>>>>> public static DataSource getDataSource(final String driver,
>>>>>> final String url,
>>>>>> final String user,
>>>>>> final String pass)
>>>>>> throws Exception {
>>>>>>
>>>>>> BasicDataSource ds = new BasicDataSource();
>>>>>> ds.setUrl(url);
>>>>>> ds.setDriverClassName(driver);
>>>>>> ds.setUsername(user);
>>>>>> ds.setPassword(pass);
>>>>>> // Set BasicDataSource properties such as maxActive and maxIdle,
>>>>>> // as described in
>>>>>> // http://jakarta.apache.org/commons/dbcp/api/org/apache/commons/dbcp/BasicDataSource.html
>>>>>>
>>>>>> ds.setMaxActive(10);
>>>>>> ds.setMaxIdle(5);
>>>>>> ds.setMaxWait(10000);
>>>>>>
>>>>>> return ds;
>>>>>> }
>>>>>>
>>>>>> Does that still work?
>>>>> Hmmm, I was assuming that BasicDataSource didn't actually do any
>>>>> pooling itself, and that you needed another layer on top to manage a
>>>>> connection pool -- that seems to be how all previous revisions of
>>>>> JDBCConnectionPool worked, so I guess I wasn't alone in thinking
>>>>> this. But yes, BasicDataSource does seem to do pooling itself
>>>>> (confirmed by reading the source), so maybe your simpler version is
>>>>> a better idea. It certainly works okay for me.
>>>>>
>>>>> Thomas.
>>>> That's what I thought should have happened :). Can I suggest that
>>>> this revised version goes into CVS? Anyone got any objections?
>>>>
>>>> Andy
> ------------------------------------------------------------------------
> /*
> * BioJava development code
> *
> * This code may be freely distributed and modified under the
> * terms of the GNU Lesser General Public Licence. This should
> * be distributed with the code. If you do not have a copy,
> * see:
> *
> * http://www.gnu.org/copyleft/lesser.html
> *
> * Copyright for this code is held jointly by the individual
> * authors. These should be listed in @author doc comments.
> *
> * For more information on the BioJava project and its aims,
> * or to join the biojava-l mailing list, visit the home page
> * at:
> *
> * http://www.biojava.org/
> *
> */
> package org.biojava.utils;
> import javax.sql.DataSource;
> import org.apache.commons.dbcp.BasicDataSource;
> import org.apache.commons.dbcp.PoolingDataSource;
> import org.apache.commons.pool.ObjectPool;
> /**
> * Returns a DataSource that implements connection pooling
> *
> * Uses Jakarta Commons DBCP and Pool packages.
> * See the description of the dbcp package at
> * http://jakarta.apache.org/commons/dbcp/api/overview-summary.html#overview_description
> *
> * @author Simon Foote
> * @author Len Trigg
> */
> public class JDBCPooledDataSource {
> public static DataSource getDataSource(final String driver,
> final String url,
> final String user,
> final String pass)
> throws Exception {
> BasicDataSource ds = new BasicDataSource();
> ds.setUrl(url);
> ds.setDriverClassName(driver);
> ds.setUsername(user);
> ds.setPassword(pass);
> // Set BasicDataSource properties such as maxActive and maxIdle, as described in
> // http://jakarta.apache.org/commons/dbcp/api/org/apache/commons/dbcp/BasicDataSource.html
> ds.setMaxActive(10);
> ds.setMaxIdle(5);
> ds.setMaxWait(10000);
> return ds;
> }
> // Adds simple equals and hashcode methods so that we can compare if
> // two connections are to the same database. This will fail if the
> // DataSource is redirected to another database etc (I doubt this is
> // ever likely to be used).
> /**
> * @depercated This is no longer used in favor of {@link BasicDataSource}
> * from DBCP
> */
> static class MyPoolingDataSource extends PoolingDataSource {
> final String source;
> public MyPoolingDataSource(ObjectPool connectionPool, String source) {
> super(connectionPool);
> this.source = source;
> }
> public boolean equals(Object o2) {
> if ((o2 == null) || !(o2 instanceof MyPoolingDataSource)) {
> return false;
> }
> MyPoolingDataSource b2 = (MyPoolingDataSource) o2;
> return source.equals(b2.source);
> }
> public int hashCode() {
> return source.hashCode();
> }
> }
> public static void main(String[] args) {
> try {
> DataSource ds1 = getDataSource("org.hsqldb.jdbcDriver", "jdbc:hsqldb:/tmp/hsqldb/biosql", "sa", "");
> DataSource ds2 = getDataSource("org.hsqldb.jdbcDriver", "jdbc:hsqldb:/tmp/hsqldb/biosql", "sa", "");
> System.err.println(ds1);
> System.err.println(ds2);
> System.err.println(ds1.equals(ds2));
> } catch (Exception e) {
> e.printStackTrace();
> }
> }
> }
> ------------------------------------------------------------------------
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFGEiWh4C5LeMEKA/QRAgYbAJ4yoE6dsLuOOS8sg1wOCybV6rsNUwCeN0c8
oiFz/0yblV4P8a35RbU+nDM=
=imiK
-----END PGP SIGNATURE-----
From bugzilla-daemon at portal.open-bio.org Tue Apr 3 10:05:22 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 3 Apr 2007 06:05:22 -0400
Subject: [Biojava-dev] [Bug 2038] test bug
In-Reply-To:
Message-ID: <200704031005.l33A5M9N015780@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2038
holland at ebi.ac.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|ASSIGNED |RESOLVED
Resolution| |INVALID
------- Comment #1 from holland at ebi.ac.uk 2007-04-03 06:05 EST -------
This has been lying around for ages, thought I'd tidy it up. :)
From bugzilla-daemon at portal.open-bio.org Tue Apr 3 10:20:48 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 3 Apr 2007 06:20:48 -0400
Subject: [Biojava-dev] [Bug 2258] ConcurrentModificationException in
SimpleRichAnnotation
In-Reply-To:
Message-ID: <200704031020.l33AKmpP016736@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2258
holland at ebi.ac.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
------- Comment #1 from holland at ebi.ac.uk 2007-04-03 06:20 EST -------
Fixed later today in CVS. Test also added.
From bugzilla-daemon at portal.open-bio.org Tue Apr 3 10:23:54 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 3 Apr 2007 06:23:54 -0400
Subject: [Biojava-dev] [Bug 2107] LabelledSequenceRenderer
In-Reply-To:
Message-ID: <200704031023.l33ANsV1016903@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2107
holland at ebi.ac.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
------- Comment #4 from holland at ebi.ac.uk 2007-04-03 06:23 EST -------
I have committed Jolyon's changes (or will do so later today). These are
untestable with a JUnit so no test has been added.
From bugzilla-daemon at portal.open-bio.org Wed Apr 4 05:37:59 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 4 Apr 2007 01:37:59 -0400
Subject: [Biojava-dev] [Bug 2260] New: Bug in UkkonenSuffixTree
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2260
Summary: Bug in UkkonenSuffixTree
Product: BioJava
Version: live (CVS source)
Platform: PC
OS/Version: Linux
Status: NEW
Severity: minor
Priority: P2
Component: symbol
AssignedTo: biojava-dev at biojava.org
ReportedBy: mark.schreiber at novartis.com
There is a bug in the UkkonenSuffixTree when one tries to add concatenated
Strings which are delimited with a $. This doesn't seem to be a problem when
Strings are added individually. A simple work around is to add Strings
individually. The following code causes the bug:
import org.biojava.bio.symbol.UkkonenSuffixTree;

public class Main {
    /** Creates a new instance of Main */
    public Main() {
    }

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        // Two sequences joined by the '$' delimiter; this triggers the reported bug.
        String seqs = "atcgcgcgcgctcggcctgggggctcgcgct$acgggtggtggt";
        UkkonenSuffixTree suff = new UkkonenSuffixTree(seqs);
    }
}
Someone with a better knowledge of suffix trees than I have would need to look
at this...
Additionally, there are several places in the code where variables are declared and
never used, declared globally when they need not be, or not declared final
when they are never modified. There are also System.out.println() statements
that should be messages in exceptions or errors. The code could do with a good
clean up.
I am marking this as minor because there is a work around. The simplest thing
might be to disable the advertised feature of being able to deal with
concatenated strings as it seems it cannot and probably never has been able to.
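The workaround mentioned above (adding Strings individually) could be sketched roughly as follows. This only shows the splitting step; SequenceSink is a hypothetical stand-in for whatever call adds a single sequence to the tree, and no UkkonenSuffixTree method is assumed here.

```java
import java.util.ArrayList;
import java.util.List;

class SplitBeforeAdding {

    /** Hypothetical stand-in for "add one sequence"; not a BioJava type. */
    interface SequenceSink {
        void add(String sequence);
    }

    /** Splits a '$'-delimited string and hands each non-empty piece to the sink. */
    static int addIndividually(String concatenated, SequenceSink sink) {
        int added = 0;
        for (String piece : concatenated.split("\\$")) {
            if (!piece.isEmpty()) {
                sink.add(piece);
                added++;
            }
        }
        return added;
    }

    public static void main(String[] args) {
        List<String> seen = new ArrayList<>();
        int n = addIndividually("atcgcgcgcgctcggcctgggggctcgcgct$acgggtggtggt", seen::add);
        System.out.println(n + " sequences handed over: " + seen);
    }
}
```

In real use the sink would wrap the tree's own per-sequence add call, so the delimiter never reaches the tree at all.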
From bugzilla-daemon at portal.open-bio.org Wed Apr 4 05:46:02 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 4 Apr 2007 01:46:02 -0400
Subject: [Biojava-dev] [Bug 2261] New: Request for enhancement of
RichSequence
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2261
Summary: Request for enhancement of RichSequence
Product: BioJava
Version: live (CVS source)
Platform: PC
OS/Version: Linux
Status: NEW
Severity: enhancement
Priority: P2
Component: seq
AssignedTo: biojava-dev at biojava.org
ReportedBy: mark.schreiber at novartis.com
RichSequence implements FeatureHolder but it would be nice if it could
implement RichFeature holder. This would require the addition of four methods
to any implementations of the RichSequence interface but would avoid endless
casting.
If we do it before an official release of bj1.5 we won't strictly be breaking
the interface. Is there any reason why we should not do this??
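The casting problem described above can be shown in generic terms. This is only a sketch: Item, RichItem, ItemHolder, RichItemHolder, SeqBefore and SeqAfter are made-up stand-ins, not the BioJava interfaces, and the methods are invented for illustration (the four methods mentioned in the request are not reproduced here).

```java
import java.util.Collections;
import java.util.List;

// All types here are illustrative stand-ins, not BioJava interfaces.
class HolderCastingSketch {
    interface Item { String label(); }
    interface RichItem extends Item { int rank(); }

    interface ItemHolder { List<? extends Item> items(); }

    /** Narrows the return type so callers receive RichItem directly. */
    interface RichItemHolder extends ItemHolder {
        @Override
        List<RichItem> items();
    }

    /** Before: the sequence type only promises the plain holder. */
    interface SeqBefore extends ItemHolder { }

    /** After: the sequence type promises the rich holder. */
    interface SeqAfter extends RichItemHolder { }

    static int firstRankWithCast(SeqBefore seq) {
        return ((RichItem) seq.items().get(0)).rank(); // cast at every call site
    }

    static int firstRank(SeqAfter seq) {
        return seq.items().get(0).rank(); // no cast needed
    }

    public static void main(String[] args) {
        RichItem gene = new RichItem() {
            public String label() { return "gene"; }
            public int rank() { return 7; }
        };
        List<RichItem> items = Collections.singletonList(gene);
        SeqBefore before = () -> items;
        SeqAfter after = () -> items;
        System.out.println(firstRankWithCast(before) + " " + firstRank(after));
    }
}
```

The trade-off is the one the request already names: every existing implementation of the sequence interface has to supply the extra methods.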
From dag at sonsorol.org Sat Apr 7 01:52:01 2007
From: dag at sonsorol.org (Chris Dagdigian)
Date: Fri, 6 Apr 2007 21:52:01 -0400
Subject: [Biojava-dev] Fwd: Bug in
org/biojava/utils/io/UncompressInputStream.java
References: <2109dfc0704061116i1f0ddbe2ic25012143d2509af@mail.gmail.com>
Message-ID: <2A8FFBC4-EC1A-4EB9-992C-DE9225A59578@sonsorol.org>
Passing on this email that came to me ...
Regards,
Chris Dagdigian
OBF
Begin forwarded message:
> From: "Miguel Duarte"
> Date: April 6, 2007 2:16:52 PM EDT
> To: dag at sonsorol.org
> Subject: Bug in org/biojava/utils/io/UncompressInputStream.java
>
> Hi Chris,
>
>> From http://sourceforge.net/project/shownotes.php?
>> release_id=314770&group_id=18598,
> i've learned that you're maintaining the class
> org/biojava/utils/io/UncompressInputStream.java. If that's not the
> case please forward this mail to the maintainer.
>
> I've discovered a nasty bug: With some read block sizes the algorithm
> truncates a few bytes from the end of the stream. I've verified this
> comparing the gzip/uncompress output for some files versus what
> org/biojava/utils/io/UncompressInputStream.java generates.
>
> Unfortunately i've not discovered the bug yet, but i can contribute
> with the attached test case. How to verify the bug:
> uncompress BH_03834.MCR.Z with gzip and with UncompressInputStream and
> compare the results.
>
> Thanks,
> Miguel Duarte
-------------- next part --------------
A non-text attachment was scrubbed...
Name: BH_03834.MCR.Z
Type: application/x-compress
Size: 26405 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: uncompressed_by_gzip
Type: application/octet-stream
Size: 81920 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: uncompressed_by_uncompressInputStream
Type: application/octet-stream
Size: 81832 bytes
Desc: not available
URL:
-------------- next part --------------
From mark.schreiber at novartis.com Mon Apr 9 02:13:12 2007
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Mon, 9 Apr 2007 10:13:12 +0800
Subject: [Biojava-dev] Fwd: Bug
in org/biojava/utils/io/UncompressInputStream.java
Message-ID:
Does anyone maintain this class??
More to the point, does anyone know what it is for??? If I look at the
Uses link in javadoc there are apparently none at the public or package
level. Additionally why does biojava need one, are there not java.io
classes that can handle compressed streams??
Is there a good reason why we cannot just clean it out?
- Mark
Mark Schreiber
Research Investigator (Bioinformatics)
Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com
phone +65 6722 2973
fax +65 6722 2910
From holland at ebi.ac.uk Tue Apr 10 08:59:32 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Tue, 10 Apr 2007 09:59:32 +0100
Subject: [Biojava-dev] Fwd:
Bug in org/biojava/utils/io/UncompressInputStream.java
In-Reply-To:
References:
Message-ID: <461B51F4.8010402@ebi.ac.uk>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
I have no idea what it is for. There are generic Java classes provided
with the SDK that do the same job. I think we should probably drop it.
Let's wait to see if anyone shouts first.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFGG1Hz4C5LeMEKA/QRAvTuAJ9F1AClFCV4WwBNP170mbC2+6JVDgCfVB17
HoCuWrx5k2ONg/9oxIfVVPI=
=cGTy
-----END PGP SIGNATURE-----
From ayates at ebi.ac.uk Tue Apr 10 09:56:57 2007
From: ayates at ebi.ac.uk (Andy Yates)
Date: Tue, 10 Apr 2007 10:56:57 +0100
Subject: [Biojava-dev]
Fwd: Bug in org/biojava/utils/io/UncompressInputStream.java
In-Reply-To: <461B51F4.8010402@ebi.ac.uk>
References:
<461B51F4.8010402@ebi.ac.uk>
Message-ID: <461B5F69.9060506@ebi.ac.uk>
I don't think there are standard classes for this compression format in
the SDK. There are ones for GZIP & ZIP but not for LZW which this one is
dealing with. Also I'm not sure about using GZIP to unzip a file
compressed with LZW since GZIP uses DEFLATE.
We need to decompress the file using uncompress (which is missing from
my Linux box but is on the mac ... go figure) and then match that up to
the output from UncompressInputStream & see if they agree or not.
Andy
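For the comparison step, something along these lines might do. It is a sketch only: OutputCompare and its method names are made up for this mail, and it works on byte arrays already in memory rather than on the attached files.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class OutputCompare {

    /** Hex MD5 digest of a byte array. */
    static String md5Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /**
     * Index of the first differing byte, or -1 if the arrays are identical.
     * If one array is a strict prefix of the other, returns the shorter length.
     */
    static int firstDifference(byte[] expected, byte[] actual) {
        int n = Math.min(expected.length, actual.length);
        for (int i = 0; i < n; i++) {
            if (expected[i] != actual[i]) {
                return i;
            }
        }
        return expected.length == actual.length ? -1 : n;
    }
}
```

The two arrays could be loaded with java.nio.file.Files.readAllBytes on the reference output and on whatever UncompressInputStream produced. If firstDifference returns the length of the shorter array, the output was truncated at the end rather than corrupted in the middle.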
From ayates at ebi.ac.uk Tue Apr 10 10:03:40 2007
From: ayates at ebi.ac.uk (Andy Yates)
Date: Tue, 10 Apr 2007 11:03:40 +0100
Subject: [Biojava-dev]
Fwd: Bug in org/biojava/utils/io/UncompressInputStream.java
In-Reply-To: <461B5F69.9060506@ebi.ac.uk>
References: <461B51F4.8010402@ebi.ac.uk>
<461B5F69.9060506@ebi.ac.uk>
Message-ID: <461B60FC.7040903@ebi.ac.uk>
Okay a quick run of uncompress on the mac with the files in question
does produce a file which is equivalent to the file produced by gzip but
not to the one produced by UncompressInputStream.
The required md5sum for a pass should be (after a md5 digest):
9f0924237d20288793172091d61f85b8 uncompressed_by_gzip
But we get:
17447efd34a245e430f20bc8d9b28a7b uncompressed_by_uncompressInputStream
Okay so looks like there is something "wrong". Seems like it drops 88
bytes from the decompression.
Wonder what happens if we pass this file type through the
GZIPInputStream from the JDK?
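On that last question, a small sketch, assuming the usual magic numbers (0x1f 0x8b for gzip, 0x1f 0x9d for compress/.Z). MagicCheck and its method names are made up for this mail. The gzip command-line tool can read .Z files, but java.util.zip.GZIPInputStream checks for the gzip magic number in its constructor and rejects anything else with an IOException.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;

class MagicCheck {

    /** Guesses the container format from the first two bytes. */
    static String sniff(byte[] header) {
        if (header.length >= 2 && (header[0] & 0xff) == 0x1f) {
            int second = header[1] & 0xff;
            if (second == 0x8b) {
                return "gzip";
            }
            if (second == 0x9d) {
                return "compress";
            }
        }
        return "unknown";
    }

    /** True if GZIPInputStream is willing to open the data at all. */
    static boolean gzipStreamAccepts(byte[] data) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return true;
        } catch (IOException e) {
            return false; // wrong magic number or truncated header
        }
    }

    public static void main(String[] args) {
        byte[] zHeader = {0x1f, (byte) 0x9d, (byte) 0x90, 0, 0, 0, 0, 0, 0, 0};
        System.out.println(sniff(zHeader) + ", accepted by GZIPInputStream: "
                + gzipStreamAccepts(zHeader));
    }
}
```

So a .Z file would be refused up front rather than silently mis-decoded, which suggests the JDK stream is not a drop-in replacement for this class.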
Andy Yates wrote:
> I don't think there are standard classes for this compression format in
> the SDK. There are ones for GZIP & ZIP but not for LZW which this one is
> dealing with. Also I'm not sure about using GZIP to unzip a file
> compressed with LZW since GZIP uses DEFLATE.
>
> We need to decompress the file using uncompress (which is missing from
> my Linux box but is on the mac ... go figure) and then match that up to
> the output from UncompressInputStream & see if they agree or not.
>
> Andy
>
> Richard Holland wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> I have no idea what it is for. There are generic Java classes provided
>> with the SDK that do the same job. I think we should probably drop it.
>> Lets wait to see if anyone shouts first.
>>
>> mark.schreiber at novartis.com wrote:
>>> Does anyone maintain this class??
>>>
>>> More to the point, does anyone know what it is for??? If I look at the
>>> Uses link in javadoc there are aparently none at the public or package
>>> level. Additionally why does biojava need one, are there not java.io
>>> classes that can handle compressed streams??
>>>
>>> Is there a good reason why we cannot just clean it out?
>>>
>>> - Mark
>>>
>>> Mark Schreiber
>>> Research Investigator (Bioinformatics)
>>>
>>> Novartis Institute for Tropical Diseases (NITD)
>>> 10 Biopolis Road
>>> #05-01 Chromos
>>> Singapore 138670
>>> www.nitd.novartis.com
>>>
>>> phone +65 6722 2973
>>> fax +65 6722 2910
>>>
>>>
>>>
>>>
>>>
>>> Chris Dagdigian
>>> Sent by: biojava-dev-bounces at lists.open-bio.org
>>> 04/07/2007 09:52 AM
>>>
>>>
>>> To: biojava-dev at biojava.org
>>> cc: (bcc: Mark Schreiber/GP/Novartis)
>>> Subject: [Biojava-dev] Fwd: Bug in org/biojava/utils/io/UncompressInputStream.java
>>>
>>>
>>>
>>> Passing on this email that came to me ...
>>>
>>> Regards,
>>> Chris Dagdigian
>>> OBF
>>>
>>>
>>> Begin forwarded message:
>>>
>>>> From: "Miguel Duarte"
>>>> Date: April 6, 2007 2:16:52 PM EDT
>>>> To: dag at sonsorol.org
>>>> Subject: Bug in org/biojava/utils/io/UncompressInputStream.java
>>>>
>>>> Hi Chris,
>>>>
>>>>> From http://sourceforge.net/project/shownotes.php?
>>>>> release_id=314770&group_id=18598,
>>>> i've learned that you're maintaining the class
>>>> org/biojava/utils/io/UncompressInputStream.java. If that's not the
>>>> case please forward this mail to the maintainer.
>>>>
>>>> I've discovered a nasty bug: With some read block sizes the algorithm
>>>> truncates a few bytes from the end of the stream. I've verified this
>>>> comparing the gzip/uncompress output for some files versus what
>>>> org/biojava/utils/io/UncompressInputStream.java generates.
>>>>
>>>> Unfortunately i've not discovered the bug yet, but i can contribute
>>>> with the attached test case. How to verify the bug:
>>>> uncompress BH_03834.MCR.Z with gzip and with UncompressInputStream and
>>>> compare the results.
>>>>
>>>> Thanks,
>>>> Miguel Duarte
>>>
>>>
>>>
>>> _______________________________________________
>>> biojava-dev mailing list
>>> biojava-dev at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>>
>>> [ Attachment ''BH_03834.MCR.Z'' removed by Mark Schreiber ]
>>> [ Attachment ''UNCOMPRESSED_BY_GZIP'' removed by Mark Schreiber ]
>>> [ Attachment ''UNCOMPRESSED_BY_UNCOMPRESSINPUTSTREAM'' removed by Mark
>>> Schreiber ]
>>>
>>>
>>>
>> -----BEGIN PGP SIGNATURE-----
>> Version: GnuPG v1.4.2.2 (GNU/Linux)
>> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>>
>> iD8DBQFGG1Hz4C5LeMEKA/QRAvTuAJ9F1AClFCV4WwBNP170mbC2+6JVDgCfVB17
>> HoCuWrx5k2ONg/9oxIfVVPI=
>> =cGTy
>> -----END PGP SIGNATURE-----
From holland at ebi.ac.uk Tue Apr 10 10:09:04 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Tue, 10 Apr 2007 11:09:04 +0100
Subject: [Biojava-dev] Fwd: Bug
in org/biojava/utils/io/UncompressInputStream.java
In-Reply-To: <2A8FFBC4-EC1A-4EB9-992C-DE9225A59578@sonsorol.org>
References: <2109dfc0704061116i1f0ddbe2ic25012143d2509af@mail.gmail.com>
<2A8FFBC4-EC1A-4EB9-992C-DE9225A59578@sonsorol.org>
Message-ID: <461B6240.2070907@ebi.ac.uk>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
AFAIK the Zip algorithm is just LZW with bells on, so it should produce
exactly the same results.
Chris Dagdigian wrote:
>
> Passing on this email that came to me ...
>
> Regards,
> Chris Dagdigian
> OBF
>
>
> Begin forwarded message:
>
>> From: "Miguel Duarte"
>> Date: April 6, 2007 2:16:52 PM EDT
>> To: dag at sonsorol.org
>> Subject: Bug in org/biojava/utils/io/UncompressInputStream.java
>>
>> Hi Chris,
>>
>>> From
>>> http://sourceforge.net/project/shownotes.php?release_id=314770&group_id=18598,
>>>
>> i've learned that you're maintaining the class
>> org/biojava/utils/io/UncompressInputStream.java. If that's not the
>> case please forward this mail to the maintainer.
>>
>> I've discovered a nasty bug: With some read block sizes the algorithm
>> truncates a few bytes from the end of the stream. I've verified this
>> comparing the gzip/uncompress output for some files versus what
>> org/biojava/utils/io/UncompressInputStream.java generates.
>>
>> Unfortunately i've not discovered the bug yet, but i can contribute
>> with the attached test case. How to verify the bug:
>> uncompress BH_03834.MCR.Z with gzip and with UncompressInputStream and
>> compare the results.
>>
>> Thanks,
>> Miguel Duarte
>
>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFGG2JA4C5LeMEKA/QRAjutAJ9cZbqpoag2Z5aQd4gbOAiMm78VZACdHzER
UoIhheyTE1805rMBzG4R+Q0=
=hfN2
-----END PGP SIGNATURE-----
From ayates at ebi.ac.uk Tue Apr 10 10:12:58 2007
From: ayates at ebi.ac.uk (Andy Yates)
Date: Tue, 10 Apr 2007 11:12:58 +0100
Subject: [Biojava-dev]
Fwd: Bug in org/biojava/utils/io/UncompressInputStream.java
In-Reply-To: <461B60FC.7040903@ebi.ac.uk>
References: <461B51F4.8010402@ebi.ac.uk>
<461B5F69.9060506@ebi.ac.uk> <461B60FC.7040903@ebi.ac.uk>
Message-ID: <461B632A.10101@ebi.ac.uk>
A quick program that passes it through GZIPInputStream throws an
IOException saying it's not in GZIP format (which it isn't). Passing it
through ZipInputStream seems to do nothing.
At any rate, it looks like we cannot get rid of this class; it's got to
be fixed/maintained.
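For anyone wanting to reproduce this, a minimal sketch of why the JDK classes reject the file: gzip data starts with the bytes 0x1f 0x8b, Unix compress (.Z) data starts with 0x1f 0x9d, and zip archives start with "PK" 0x03 0x04. The class and method names below are hypothetical and not part of BioJava; the sketch only inspects the header and does not decompress anything.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/** Hypothetical helper (not part of BioJava): guesses a compression format from magic bytes. */
final class CompressionSniffer {
    enum Format { GZIP, UNIX_COMPRESS, ZIP, UNKNOWN }

    /** Classifies the leading bytes of a file by well-known magic numbers. */
    static Format sniff(byte[] head) {
        if (head.length >= 2 && (head[0] & 0xff) == 0x1f) {
            if ((head[1] & 0xff) == 0x8b) return Format.GZIP;          // gzip (DEFLATE)
            if ((head[1] & 0xff) == 0x9d) return Format.UNIX_COMPRESS; // compress / .Z (LZW)
        }
        if (head.length >= 4 && head[0] == 'P' && head[1] == 'K'
                && head[2] == 3 && head[3] == 4) {
            return Format.ZIP;                                         // zip local file header
        }
        return Format.UNKNOWN;
    }

    /** Peeks at a stream without consuming it; the stream must support mark/reset. */
    static Format sniff(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("wrap the stream in a BufferedInputStream first");
        }
        in.mark(4);
        byte[] head = new byte[4];
        int n = 0;
        while (n < head.length) {
            int r = in.read(head, n, head.length - n);
            if (r < 0) break; // stream shorter than four bytes
            n += r;
        }
        in.reset();
        return sniff(Arrays.copyOf(head, n));
    }
}
```

A .Z file would come back as UNIX_COMPRESS, which fits GZIPInputStream refusing it; ZipInputStream finding no entries would also be consistent with the missing "PK" header, though that part is an assumption about what "do nothing" meant here.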
Andy Yates wrote:
> Okay a quick run of uncompress on the mac with the files in question
> does produce a file which is equivalent to the file produced by gzip but
> not to the one produced by UncompressInputStream.
>
> The required md5sum for a pass should be (after a md5 digest):
>
> 9f0924237d20288793172091d61f85b8 uncompressed_by_gzip
>
> But we get:
>
> 17447efd34a245e430f20bc8d9b28a7b uncompressed_by_uncompressInputStream
>
> Okay so looks like there is something "wrong". Seems like it drops 88
> bytes from the decompression.
>
> Wonder what happens if we pass this file type through the
> GZIPInputStream from the JDK?
>
> Andy Yates wrote:
>> I don't think there are standard classes for this compression format
>> in the SDK. There are ones for GZIP & ZIP but not for LZW which this
>> one is dealing with. Also I'm not sure about using GZIP to unzip a
>> file compressed with LZW since GZIP uses DEFLATE.
>>
>> We need to decompress the file using uncompress (which is missing from
>> my Linux box but is on the mac ... go figure) and then match that up
>> to the output from UncompressInputStream & see if they agree or not.
>>
>> Andy
>>
>> Richard Holland wrote:
>>> -----BEGIN PGP SIGNED MESSAGE-----
>>> Hash: SHA1
>>>
>>> I have no idea what it is for. There are generic Java classes provided
>>> with the SDK that do the same job. I think we should probably drop it.
>>> Let's wait to see if anyone shouts first.
>>>
>>> mark.schreiber at novartis.com wrote:
>>>> Does anyone maintain this class??
>>>>
>>>> More to the point, does anyone know what it is for??? If I look at
>>>> the Uses link in javadoc there are apparently none at the public or
>>>> package level. Additionally why does biojava need one, are there not
>>>> java.io classes that can handle compressed streams??
>>>>
>>>> Is there a good reason why we cannot just clean it out?
>>>>
>>>> - Mark
>>>>
>>>> Mark Schreiber
>>>> Research Investigator (Bioinformatics)
>>>>
>>>> Novartis Institute for Tropical Diseases (NITD)
>>>> 10 Biopolis Road
>>>> #05-01 Chromos
>>>> Singapore 138670
>>>> www.nitd.novartis.com
>>>>
>>>> phone +65 6722 2973
>>>> fax +65 6722 2910
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Chris Dagdigian
>>>> Sent by: biojava-dev-bounces at lists.open-bio.org
>>>> 04/07/2007 09:52 AM
>>>>
>>>>
>>>> To: biojava-dev at biojava.org
>>>> cc: (bcc: Mark Schreiber/GP/Novartis)
>>>> Subject: [Biojava-dev] Fwd: Bug in
>>>> org/biojava/utils/io/UncompressInputStream.java
>>>>
>>>>
>>>>
>>>> Passing on this email that came to me ...
>>>>
>>>> Regards,
>>>> Chris Dagdigian
>>>> OBF
>>>>
>>>>
>>>> Begin forwarded message:
>>>>
>>>>> From: "Miguel Duarte"
>>>>> Date: April 6, 2007 2:16:52 PM EDT
>>>>> To: dag at sonsorol.org
>>>>> Subject: Bug in org/biojava/utils/io/UncompressInputStream.java
>>>>>
>>>>> Hi Chris,
>>>>>
>>>>>> From http://sourceforge.net/project/shownotes.php?
>>>>>> release_id=314770&group_id=18598,
>>>>> i've learned that you're maintaining the class
>>>>> org/biojava/utils/io/UncompressInputStream.java. If that's not the
>>>>> case please forward this mail to the maintainer.
>>>>>
>>>>> I've discovered a nasty bug: With some read block sizes the algorithm
>>>>> truncates a few bytes from the end of the stream. I've verified this
>>>>> comparing the gzip/uncompress output for some files versus what
>>>>> org/biojava/utils/io/UncompressInputStream.java generates.
>>>>>
>>>>> Unfortunately i've not discovered the bug yet, but i can contribute
>>>>> with the attached test case. How to verify the bug:
>>>>> uncompress BH_03834.MCR.Z with gzip and with UncompressInputStream and
>>>>> compare the results.
>>>>>
>>>>> Thanks,
>>>>> Miguel Duarte
>>>>
>>>>
>>>>
>>>>
>>>> [ Attachment ''BH_03834.MCR.Z'' removed by Mark Schreiber ]
>>>> [ Attachment ''UNCOMPRESSED_BY_GZIP'' removed by Mark Schreiber ]
>>>> [ Attachment ''UNCOMPRESSED_BY_UNCOMPRESSINPUTSTREAM'' removed by
>>>> Mark Schreiber ]
>>>>
>>>>
>>>>
>>> -----BEGIN PGP SIGNATURE-----
>>> Version: GnuPG v1.4.2.2 (GNU/Linux)
>>> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>>>
>>> iD8DBQFGG1Hz4C5LeMEKA/QRAvTuAJ9F1AClFCV4WwBNP170mbC2+6JVDgCfVB17
>>> HoCuWrx5k2ONg/9oxIfVVPI=
>>> =cGTy
>>> -----END PGP SIGNATURE-----
From holland at ebi.ac.uk Tue Apr 10 10:37:36 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Tue, 10 Apr 2007 11:37:36 +0100
Subject: [Biojava-dev] Fwd:
Bug in org/biojava/utils/io/UncompressInputStream.java
In-Reply-To: <461B6443.9060300@ebi.ac.uk>
References: <2109dfc0704061116i1f0ddbe2ic25012143d2509af@mail.gmail.com> <2A8FFBC4-EC1A-4EB9-992C-DE9225A59578@sonsorol.org>
<461B6240.2070907@ebi.ac.uk> <461B6443.9060300@ebi.ac.uk>
Message-ID: <461B68F0.4000908@ebi.ac.uk>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Why are these files in compress/uncompress format? Is it proprietary
software creating them, or a legacy system of some kind? Wouldn't gzip
give better results both in terms of compression ratios and performance
as it is far more up-to-date?
I believe that the JDK doesn't support LZW because LZW was patented, and
that patent expired only very recently (in 2003/4/5/6 depending on where
you live and in what form you use LZW):
http://www.gnu.org/philosophy/gif.html
It's one of those wonderful cases where the patent enforcement caused
the algorithm it was protecting to get dumped and forgotten because
nobody wanted to pay for it. Apart from *nix compress/uncompress and
inside the GIF format I'm not sure it's actually used anywhere else any
more.
Technically we infringed the patent by including LZW support in BioJava,
but now the patent has expired we no longer need to worry.
Question is, do we need to fix this inherently computer-science problem
which is entirely unrelated to biology or bioinformatics, or can we just
get people to use an alternative library instead which supports it
better and is more generic? They are out there, for instance:
http://www.chilkatsoft.com/java-zip.asp
cheers,
Richard
Andy Yates wrote:
> Seems very strange this does. I don't know much about decompression but
> by the looks of things LZW isn't supported by the JDK.
>
> Richard Holland wrote:
> AFAIK the Zip algorithm is just LZW with bells on, so it should produce
> exactly the same results.
>
> Chris Dagdigian wrote:
>>>> Passing on this email that came to me ...
>>>>
>>>> Regards,
>>>> Chris Dagdigian
>>>> OBF
>>>>
>>>>
>>>> Begin forwarded message:
>>>>
>>>>> From: "Miguel Duarte"
>>>>> Date: April 6, 2007 2:16:52 PM EDT
>>>>> To: dag at sonsorol.org
>>>>> Subject: Bug in org/biojava/utils/io/UncompressInputStream.java
>>>>>
>>>>> Hi Chris,
>>>>>
>>>>>> From
>>>>>> http://sourceforge.net/project/shownotes.php?release_id=314770&group_id=18598,
>>>>>>
>>>>>>
>>>>> i've learned that you're maintaining the class
>>>>> org/biojava/utils/io/UncompressInputStream.java. If that's not the
>>>>> case please forward this mail to the maintainer.
>>>>>
>>>>> I've discovered a nasty bug: With some read block sizes the algorithm
>>>>> truncates a few bytes from the end of the stream. I've verified this
>>>>> comparing the gzip/uncompress output for some files versus what
>>>>> org/biojava/utils/io/UncompressInputStream.java generates.
>>>>>
>>>>> Unfortunately i've not discovered the bug yet, but i can contribute
>>>>> with the attached test case. How to verify the bug:
>>>>> uncompress BH_03834.MCR.Z with gzip and with UncompressInputStream and
>>>>> compare the results.
>>>>>
>>>>> Thanks,
>>>>> Miguel Duarte
>>>>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFGG2jw4C5LeMEKA/QRAiCwAJ9vNlDX2zwG5paYHbaFv2gSQeblOQCdHaW4
CwgzY5S7KELC3TA1oKKtjUw=
=9xEM
-----END PGP SIGNATURE-----
From ayates at ebi.ac.uk Tue Apr 10 10:54:27 2007
From: ayates at ebi.ac.uk (Andy Yates)
Date: Tue, 10 Apr 2007 11:54:27 +0100
Subject: [Biojava-dev] Fwd:
Bug in org/biojava/utils/io/UncompressInputStream.java
In-Reply-To: <461B68F0.4000908@ebi.ac.uk>
References: <2109dfc0704061116i1f0ddbe2ic25012143d2509af@mail.gmail.com> <2A8FFBC4-EC1A-4EB9-992C-DE9225A59578@sonsorol.org>
<461B6240.2070907@ebi.ac.uk> <461B6443.9060300@ebi.ac.uk>
<461B68F0.4000908@ebi.ac.uk>
Message-ID: <461B6CE3.303@ebi.ac.uk>
I guess it all depends really on what is the software that is producing
these files. If it is something very common to Bioinformatics we might
have to accept that support needs to come in from somewhere; and by the
looks of things the techniques for compression are quite varied (the man
page for compress mentions things about adaptive dictionaries and the like).
Richard Holland wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Why are these files in compress/uncompress format? Is it proprietary
> software creating them, or a legacy system of some kind? Wouldn't gzip
> give better results both in terms of compression ratios and performance
> as it is far more up-to-date?
>
> I believe that the JDK doesn't support LZW because LZW was patented, and
> that patent expired only very recently (in 2003/4/5/6 depending on where
> you live and in what form you use LZW):
>
> http://www.gnu.org/philosophy/gif.html
>
> It's one of those wonderful cases where the patent enforcement caused
> the algorithm it was protecting to get dumped and forgotten because
> nobody wanted to pay for it. Apart from *nix compress/uncompress and
> inside the GIF format I'm not sure it's actually used anywhere else any
> more.
>
> Technically we infringed the patent by including LZW support in BioJava,
> but now the patent has expired we no longer need to worry.
>
> Question is, do we need to fix this inherently computer-science problem
> which is entirely unrelated to biology or bioinformatics, or can we just
> get people to use an alternative library instead which supports it
> better and is more generic? They are out there, for instance:
>
> http://www.chilkatsoft.com/java-zip.asp
>
> cheers,
> Richard
>
> Andy Yates wrote:
>> Seems very strange this does. I don't know much about decompression but
>> by the looks of things LZW isn't supported by the JDK.
>>
>> Richard Holland wrote:
>> AFAIK the Zip algorithm is just LZW with bells on, so it should produce
>> exactly the same results.
>>
>> Chris Dagdigian wrote:
>>>>> Passing on this email that came to me ...
>>>>>
>>>>> Regards,
>>>>> Chris Dagdigian
>>>>> OBF
>>>>>
>>>>>
>>>>> Begin forwarded message:
>>>>>
>>>>>> From: "Miguel Duarte"
>>>>>> Date: April 6, 2007 2:16:52 PM EDT
>>>>>> To: dag at sonsorol.org
>>>>>> Subject: Bug in org/biojava/utils/io/UncompressInputStream.java
>>>>>>
>>>>>> Hi Chris,
>>>>>>
>>>>>>> From
>>>>>>> http://sourceforge.net/project/shownotes.php?release_id=314770&group_id=18598,
>>>>>>>
>>>>>>>
>>>>>> i've learned that you're maintaining the class
>>>>>> org/biojava/utils/io/UncompressInputStream.java. If that's not the
>>>>>> case please forward this mail to the maintainer.
>>>>>>
>>>>>> I've discovered a nasty bug: With some read block sizes the algorithm
>>>>>> truncates a few bytes from the end of the stream. I've verified this
>>>>>> comparing the gzip/uncompress output for some files versus what
>>>>>> org/biojava/utils/io/UncompressInputStream.java generates.
>>>>>>
>>>>>> Unfortunately i've not discovered the bug yet, but i can contribute
>>>>>> with the attached test case. How to verify the bug:
>>>>>> uncompress BH_03834.MCR.Z with gzip and with UncompressInputStream and
>>>>>> compare the results.
>>>>>>
>>>>>> Thanks,
>>>>>> Miguel Duarte
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.2.2 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>
> iD8DBQFGG2jw4C5LeMEKA/QRAiCwAJ9vNlDX2zwG5paYHbaFv2gSQeblOQCdHaW4
> CwgzY5S7KELC3TA1oKKtjUw=
> =9xEM
> -----END PGP SIGNATURE-----
From ap3 at sanger.ac.uk Tue Apr 10 10:27:39 2007
From: ap3 at sanger.ac.uk (Andreas Prlic)
Date: Tue, 10 Apr 2007 11:27:39 +0100
Subject: [Biojava-dev] Fwd: Bug
in org/biojava/utils/io/UncompressInputStream.java
In-Reply-To:
References:
Message-ID:
Hi!
I committed this class a while ago, since I did not find any other way
to read .Z compressed files.
Unfortunately PDB files are often stored like that ...
If anybody has a suggestion how to read unix compressed files (.Z) in a
better way, I would be glad to hear.
Parsing them as Zip or GZip did not work in my trials...
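One possible stopgap, offered only as a sketch: since the gzip command-line tool was reported in this thread to decompress the test file correctly, a reader could delegate to it. The class below is hypothetical, assumes a gzip binary is on the PATH, and does not check the exit code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: reads a .Z file by delegating to an external gzip binary. */
final class ExternalUncompress {
    /** Command line that decompresses to stdout; assumes gzip is installed and on the PATH. */
    static List<String> command(Path file) {
        return Arrays.asList("gzip", "-dc", file.toString());
    }

    /** Starts gzip and returns its stdout; the caller must read to EOF and close the stream. */
    static InputStream open(Path file) throws IOException {
        Process p = new ProcessBuilder(command(file))
                .redirectError(ProcessBuilder.Redirect.INHERIT) // let gzip's errors show on stderr
                .start();
        return p.getInputStream();
    }
}
```

This trades portability for correctness, so it would suit a local workaround better than library code.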
Andreas
On 9 Apr 2007, at 03:13, mark.schreiber at novartis.com wrote:
> Does anyone maintain this class??
>
> More to the point, does anyone know what it is for??? If I look at the
> Uses link in javadoc there are apparently none at the public or package
> level. Additionally why does biojava need one, are there not java.io
> classes that can handle compressed streams??
>
> Is there a good reason why we cannot just clean it out?
>
> - Mark
>
> Mark Schreiber
> Research Investigator (Bioinformatics)
>
> Novartis Institute for Tropical Diseases (NITD)
> 10 Biopolis Road
> #05-01 Chromos
> Singapore 138670
> www.nitd.novartis.com
>
> phone +65 6722 2973
> fax +65 6722 2910
>
>
>
>
>
> Chris Dagdigian
> Sent by: biojava-dev-bounces at lists.open-bio.org
> 04/07/2007 09:52 AM
>
>
> To: biojava-dev at biojava.org
> cc: (bcc: Mark Schreiber/GP/Novartis)
> Subject: [Biojava-dev] Fwd: Bug in
> org/biojava/utils/io/UncompressInputStream.java
>
>
>
> Passing on this email that came to me ...
>
> Regards,
> Chris Dagdigian
> OBF
>
>
> Begin forwarded message:
>
>> From: "Miguel Duarte"
>> Date: April 6, 2007 2:16:52 PM EDT
>> To: dag at sonsorol.org
>> Subject: Bug in org/biojava/utils/io/UncompressInputStream.java
>>
>> Hi Chris,
>>
>>> From http://sourceforge.net/project/shownotes.php?
>>> release_id=314770&group_id=18598,
>> i've learned that you're maintaining the class
>> org/biojava/utils/io/UncompressInputStream.java. If that's not the
>> case please forward this mail to the maintainer.
>>
>> I've discovered a nasty bug: With some read block sizes the algorithm
>> truncates a few bytes from the end of the stream. I've verified this
>> comparing the gzip/uncompress output for some files versus what
>> org/biojava/utils/io/UncompressInputStream.java generates.
>>
>> Unfortunately i've not discovered the bug yet, but i can contribute
>> with the attached test case. How to verify the bug:
>> uncompress BH_03834.MCR.Z with gzip and with UncompressInputStream and
>> compare the results.
>>
>> Thanks,
>> Miguel Duarte
>
>
>
>
>
>
> [ Attachment ''BH_03834.MCR.Z'' removed by Mark Schreiber ]
> [ Attachment ''UNCOMPRESSED_BY_GZIP'' removed by Mark Schreiber ]
> [ Attachment ''UNCOMPRESSED_BY_UNCOMPRESSINPUTSTREAM'' removed by Mark
> Schreiber ]
>
>
>
>
-----------------------------------------------------------------------
Andreas Prlic Wellcome Trust Sanger Institute
Hinxton, Cambridge CB10 1SA, UK
+44 (0) 1223 49 6891
From holland at ebi.ac.uk Tue Apr 10 11:01:36 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Tue, 10 Apr 2007 12:01:36 +0100
Subject: [Biojava-dev] Fwd:
Bug in org/biojava/utils/io/UncompressInputStream.java
In-Reply-To:
References:
Message-ID: <461B6E90.3090501@ebi.ac.uk>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Andreas - did you write the class? If so, then you may understand it
better than the rest of us. Would you be willing to attempt to fix it?
cheers,
Richard
Andreas Prlic wrote:
> Hi!
>
> I committed this class a while ago, since I did not find any other way
> to read .Z compressed files.
>
> Unfortunately PDB files are often stored like that ...
>
> If anybody has a suggestion how to read unix compressed files (.Z) in a
> better way, I would be glad to hear.
>
> Parsing them as Zip or GZip did not work in my trials...
>
> Andreas
>
>
>
>
>
> On 9 Apr 2007, at 03:13, mark.schreiber at novartis.com wrote:
>
>> Does anyone maintain this class??
>>
>> More to the point, does anyone know what it is for??? If I look at the
>> Uses link in javadoc there are apparently none at the public or package
>> level. Additionally why does biojava need one, are there not java.io
>> classes that can handle compressed streams??
>>
>> Is there a good reason why we cannot just clean it out?
>>
>> - Mark
>>
>> Mark Schreiber
>> Research Investigator (Bioinformatics)
>>
>> Novartis Institute for Tropical Diseases (NITD)
>> 10 Biopolis Road
>> #05-01 Chromos
>> Singapore 138670
>> www.nitd.novartis.com
>>
>> phone +65 6722 2973
>> fax +65 6722 2910
>>
>>
>>
>>
>>
>> Chris Dagdigian
>> Sent by: biojava-dev-bounces at lists.open-bio.org
>> 04/07/2007 09:52 AM
>>
>>
>> To: biojava-dev at biojava.org
>> cc: (bcc: Mark Schreiber/GP/Novartis)
>> Subject: [Biojava-dev] Fwd: Bug in
>> org/biojava/utils/io/UncompressInputStream.java
>>
>>
>>
>> Passing on this email that came to me ...
>>
>> Regards,
>> Chris Dagdigian
>> OBF
>>
>>
>> Begin forwarded message:
>>
>>> From: "Miguel Duarte"
>>> Date: April 6, 2007 2:16:52 PM EDT
>>> To: dag at sonsorol.org
>>> Subject: Bug in org/biojava/utils/io/UncompressInputStream.java
>>>
>>> Hi Chris,
>>>
>>>> From http://sourceforge.net/project/shownotes.php?
>>>> release_id=314770&group_id=18598,
>>> i've learned that you're maintaining the class
>>> org/biojava/utils/io/UncompressInputStream.java. If that's not the
>>> case please forward this mail to the maintainer.
>>>
>>> I've discovered a nasty bug: With some read block sizes the algorithm
>>> truncates a few bytes from the end of the stream. I've verified this
>>> comparing the gzip/uncompress output for some files versus what
>>> org/biojava/utils/io/UncompressInputStream.java generates.
>>>
>>> Unfortunately i've not discovered the bug yet, but i can contribute
>>> with the attached test case. How to verify the bug:
>>> uncompress BH_03834.MCR.Z with gzip and with UncompressInputStream and
>>> compare the results.
>>>
>>> Thanks,
>>> Miguel Duarte
>>
>>
>>
>>
>>
>> [ Attachment ''BH_03834.MCR.Z'' removed by Mark Schreiber ]
>> [ Attachment ''UNCOMPRESSED_BY_GZIP'' removed by Mark Schreiber ]
>> [ Attachment ''UNCOMPRESSED_BY_UNCOMPRESSINPUTSTREAM'' removed by Mark
>> Schreiber ]
>>
>>
>>
>>
> -----------------------------------------------------------------------
>
> Andreas Prlic Wellcome Trust Sanger Institute
> Hinxton, Cambridge CB10 1SA, UK
> +44 (0) 1223 49 6891
>
>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFGG26Q4C5LeMEKA/QRAiMxAJ4u4RUjTGODjClIM1LIRzP12xNUOgCgifA+
14CbPaY5SwcG1/wUHJVpl/U=
=wBDT
-----END PGP SIGNATURE-----
From markjschreiber at gmail.com Tue Apr 10 11:29:03 2007
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Tue, 10 Apr 2007 19:29:03 +0800
Subject: [Biojava-dev] Fwd: Bug in
org/biojava/utils/io/UncompressInputStream.java
In-Reply-To: <461B60FC.7040903@ebi.ac.uk>
References:
<461B51F4.8010402@ebi.ac.uk> <461B5F69.9060506@ebi.ac.uk>
<461B60FC.7040903@ebi.ac.uk>
Message-ID: <93b45ca50704100429u388f5b8ax86ba05e5d05e02a9@mail.gmail.com>
Without looking at the code I would guess that dropping 88 bytes could
be because of a buffered reader or writer not flushing before it is
closed??
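To illustrate the kind of failure being guessed at (this is not a diagnosis of UncompressInputStream, whose code has not been inspected here), a small sketch with a hypothetical FlushDemo class: bytes written through a BufferedOutputStream stay in its buffer until it is flushed or closed, so a harness that forgets to do either would lose the tail of its output.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

/** Hypothetical demo: shows that buffered bytes are invisible to the sink until flushed. */
final class FlushDemo {
    /** Returns {bytes in the sink before flush, bytes in the sink after flush}. */
    static int[] visibleBytes(int payload) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        BufferedOutputStream out = new BufferedOutputStream(sink, 8192); // explicit 8192-byte buffer
        out.write(new byte[payload]); // a payload smaller than the buffer is held back
        int before = sink.size();
        out.flush();                  // close() would flush as well
        int after = sink.size();
        return new int[] { before, after };
    }
}
```

With a payload of 88 bytes the sink sees 0 bytes before the flush and 88 after. An input stream has nothing to flush, so if this were the cause it would sit in whatever code writes the decompressed output rather than in the decompressor itself.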
- Mark
On 4/10/07, Andy Yates wrote:
> Okay a quick run of uncompress on the mac with the files in question
> does produce a file which is equivalent to the file produced by gzip but
> not to the one produced by UncompressInputStream.
>
> The required md5sum for a pass should be (after a md5 digest):
>
> 9f0924237d20288793172091d61f85b8 uncompressed_by_gzip
>
> But we get:
>
> 17447efd34a245e430f20bc8d9b28a7b uncompressed_by_uncompressInputStream
>
> Okay so looks like there is something "wrong". Seems like it drops 88
> bytes from the decompression.
>
> Wonder what happens if we pass this file type through the
> GZIPInputStream from the JDK?
>
> Andy Yates wrote:
> > I don't think there are standard classes for this compression format in
> > the SDK. There are ones for GZIP & ZIP but not for LZW which this one is
> > dealing with. Also I'm not sure about using GZIP to unzip a file
> > compressed with LZW since GZIP uses DEFLATE.
> >
> > We need to decompress the file using uncompress (which is missing from
> > my Linux box but is on the mac ... go figure) and then match that up to
> > the output from UncompressInputStream & see if they agree or not.
> >
> > Andy
> >
> > Richard Holland wrote:
> >> -----BEGIN PGP SIGNED MESSAGE-----
> >> Hash: SHA1
> >>
> >> I have no idea what it is for. There are generic Java classes provided
> >> with the SDK that do the same job. I think we should probably drop it.
> >> Let's wait to see if anyone shouts first.
> >>
> >> mark.schreiber at novartis.com wrote:
> >>> Does anyone maintain this class??
> >>>
> >>> More to the point, does anyone know what it is for??? If I look at the
> >>> Uses link in javadoc there are apparently none at the public or package
> >>> level. Additionally why does biojava need one, are there not java.io
> >>> classes that can handle compressed streams??
> >>>
> >>> Is there a good reason why we cannot just clean it out?
> >>>
> >>> - Mark
> >>>
> >>> Mark Schreiber
> >>> Research Investigator (Bioinformatics)
> >>>
> >>> Novartis Institute for Tropical Diseases (NITD)
> >>> 10 Biopolis Road
> >>> #05-01 Chromos
> >>> Singapore 138670
> >>> www.nitd.novartis.com
> >>>
> >>> phone +65 6722 2973
> >>> fax +65 6722 2910
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> Chris Dagdigian
> >>> Sent by: biojava-dev-bounces at lists.open-bio.org
> >>> 04/07/2007 09:52 AM
> >>>
> >>>
> >>> To: biojava-dev at biojava.org
> >>> cc: (bcc: Mark Schreiber/GP/Novartis)
> >>> Subject: [Biojava-dev] Fwd: Bug in org/biojava/utils/io/UncompressInputStream.java
> >>>
> >>>
> >>>
> >>> Passing on this email that came to me ...
> >>>
> >>> Regards,
> >>> Chris Dagdigian
> >>> OBF
> >>>
> >>>
> >>> Begin forwarded message:
> >>>
> >>>> From: "Miguel Duarte"
> >>>> Date: April 6, 2007 2:16:52 PM EDT
> >>>> To: dag at sonsorol.org
> >>>> Subject: Bug in org/biojava/utils/io/UncompressInputStream.java
> >>>>
> >>>> Hi Chris,
> >>>>
> >>>>> From http://sourceforge.net/project/shownotes.php?
> >>>>> release_id=314770&group_id=18598,
> >>>> i've learned that you're maintaining the class
> >>>> org/biojava/utils/io/UncompressInputStream.java. If that's not the
> >>>> case please forward this mail to the maintainer.
> >>>>
> >>>> I've discovered a nasty bug: With some read block sizes the algorithm
> >>>> truncates a few bytes from the end of the stream. I've verified this
> >>>> comparing the gzip/uncompress output for some files versus what
> >>>> org/biojava/utils/io/UncompressInputStream.java generates.
> >>>>
> >>>> Unfortunately i've not discovered the bug yet, but i can contribute
> >>>> with the attached test case. How to verify the bug:
> >>>> uncompress BH_03834.MCR.Z with gzip and with UncompressInputStream and
> >>>> compare the results.
> >>>>
> >>>> Thanks,
> >>>> Miguel Duarte
> >>>
> >>>
> >>>
> >>>
> >>> [ Attachment ''BH_03834.MCR.Z'' removed by Mark Schreiber ]
> >>> [ Attachment ''UNCOMPRESSED_BY_GZIP'' removed by Mark Schreiber ]
> >>> [ Attachment ''UNCOMPRESSED_BY_UNCOMPRESSINPUTSTREAM'' removed by Mark
> >>> Schreiber ]
> >>>
> >>>
> >>>
> >> -----BEGIN PGP SIGNATURE-----
> >> Version: GnuPG v1.4.2.2 (GNU/Linux)
> >> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
> >>
> >> iD8DBQFGG1Hz4C5LeMEKA/QRAvTuAJ9F1AClFCV4WwBNP170mbC2+6JVDgCfVB17
> >> HoCuWrx5k2ONg/9oxIfVVPI=
> >> =cGTy
> >> -----END PGP SIGNATURE-----
>
From bugzilla-daemon at portal.open-bio.org Tue Apr 10 11:58:59 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 10 Apr 2007 07:58:59 -0400
Subject: [Biojava-dev] [Bug 2261] Request for enhancement of RichSequence
In-Reply-To:
Message-ID: <200704101158.l3ABwxRV028563@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2261
holland at ebi.ac.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |holland at ebi.ac.uk
------- Comment #1 from holland at ebi.ac.uk 2007-04-10 07:58 EST -------
I think the only reason it doesn't already do so is because when I wrote
RichSequence, RichFeatureHolder hadn't been invented yet.
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Tue Apr 10 12:15:28 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 10 Apr 2007 08:15:28 -0400
Subject: [Biojava-dev] [Bug 2261] Request for enhancement of RichSequence
In-Reply-To:
Message-ID: <200704101215.l3ACFSYY029387@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2261
------- Comment #2 from holland at ebi.ac.uk 2007-04-10 08:15 EST -------
Just looked at this a bit closer and found that only Features can hold other
Features - RichFeatureHolder represents the FeatureRelationship portion of
BioSQL. FeatureRelationships exist between features, and not between sequences
and features. Maybe RichFeatureHolder is therefore a bit of a misnomer? Maybe
it should be FeatureRelationshipHolder, or something like that?
From ap3 at sanger.ac.uk Tue Apr 10 13:02:17 2007
From: ap3 at sanger.ac.uk (Andreas Prlic)
Date: Tue, 10 Apr 2007 14:02:17 +0100
Subject: [Biojava-dev] Fwd:
Bug in org/biojava/utils/io/UncompressInputStream.java
In-Reply-To: <461B6E90.3090501@ebi.ac.uk>
References:
<461B6E90.3090501@ebi.ac.uk>
Message-ID:
> Andreas - did you write the class? If so, then you may understand it
> better than the rest of us. Would you be willing to attempt to fix it?
No, I did not write it. It is an LGPL class which I found in another
project.
See http://www.innovation.ch/java/HTTPClient/ or the header in
the file.
I will try to have a look at this problem, but I am not sure if I can fix it
quickly.
PDB data is still available for download as .Z files, e.g.
ftp://ftp.rcsb.org/pub/pdb/data/structures/divided/pdb/ar/
which is why I need some tools for reading these.
I agree this is a general problem and the solution does not necessarily
have to be part of BioJava.
I don't think any patent was infringed, since the file was committed
after they had expired.
Andreas
-----------------------------------------------------------------------
Andreas Prlic Wellcome Trust Sanger Institute
Hinxton, Cambridge CB10 1SA, UK
+44 (0) 1223 49 6891
From bugzilla-daemon at portal.open-bio.org Wed Apr 11 05:12:29 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 11 Apr 2007 01:12:29 -0400
Subject: [Biojava-dev] [Bug 2261] Request for enhancement of RichSequence
In-Reply-To:
Message-ID: <200704110512.l3B5CT9v008783@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2261
------- Comment #3 from mark.schreiber at novartis.com 2007-04-11 01:12 EST -------
(In reply to comment #2)
> Just looked at this a bit closer and found that only Features can hold other
> Features - RichFeatureHolder represents the FeatureRelationship portion of
> BioSQL. FeatureRelationships exist between features, and not between sequences
> and features. Maybe RichFeatureHolder is therefore a bit of a misnomer? Maybe
> it should be FeatureRelationshipHolder, or something like that?
I would agree with the proposal to rename it. It would save a lot of confusion.
It should be a pretty simple task to refactor it but it would need to happen
before a release version.
From mark.schreiber at novartis.com Thu Apr 12 03:19:18 2007
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Thu, 12 Apr 2007 11:19:18 +0800
Subject: [Biojava-dev] javacc
Message-ID:
Hello -
Has anyone ever written a javacc lexer / parser for Genbank (or any of the
other major formats)?
- Mark
Mark Schreiber
Research Investigator (Bioinformatics)
Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com
phone +65 6722 2973
fax +65 6722 2910
From bugzilla-daemon at portal.open-bio.org Fri Apr 13 05:08:33 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 13 Apr 2007 01:08:33 -0400
Subject: [Biojava-dev] [Bug 2273] New: More problems writing uniprot files
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2273
Summary: More problems writing uniprot files
Product: BioJava
Version: live (CVS source)
Platform: PC
OS/Version: Windows XP
Status: NEW
Severity: normal
Priority: P2
Component: seq.io
AssignedTo: biojava-dev at biojava.org
ReportedBy: gwaldon at geneinfinity.org
I found a few problems when writing uniprot files. Using P04941 as a
test example:
1. The ID line is not written in a fixed format (this is probably not a bug
actually):
(before/after - read/write)
ID KV6A7_MOUSE Reviewed; 107 AA.
ID KV6A7_MOUSE Reviewed; 107 AA.
2. The reference title gets truncated at the end by one character after each
read/write operation:
RT phenyloxazolone and its early diversification.";
RT phenyloxazolone and its early diversification";
RT phenyloxazolone and its early diversificatio";
...
3. The FT line is not formatted correctly; this is a bug because the FT line
has a fixed format, the I of Ig should be at position 35:
(before/after - read/write)
FT CHAIN 1 >107 Ig kappa chain V-VI region NQ2-48.2.2.
FT CHAIN 1 107> Ig kappa chain V-VI region NQ2-48.2.2.
4. SQ line: are these exactly the same CRC64 number?
SQ SEQUENCE 107 AA; 11557 MW; 72488DA9EF354934 CRC64;
SQ SEQUENCE 107 AA; 11564 MW; ffffffffe278ca323958dd50 CRC64;
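For point 3, a fixed-column writer could be sketched roughly as below. Only the requirement that the description start at position 35 comes from this report; the other column positions (key in 6-13, right-justified endpoints in 15-20 and 22-27) are my reading of the UniProt flat-file layout and should be checked against the user manual. FtLine and format() are made-up names, not BioJava API.

```java
final class FtLine {
    /**
     * Builds one FT line with fixed columns (hypothetical helper).
     * Assumed layout: "FT" in 1-2, key left-justified in 6-13,
     * 'from' right-justified in 15-20, 'to' right-justified in 22-27,
     * description starting at column 35.
     */
    static String format(String key, String from, String to, String description) {
        // 3 spaces after "FT", 1 space between fields, 7 spaces before the description.
        return String.format("FT   %-8s %6s %6s       %s", key, from, to, description);
    }

    public static void main(String[] args) {
        // Fuzzy endpoint written as ">107", as in the original entry.
        System.out.println(format("CHAIN", "1", ">107",
                "Ig kappa chain V-VI region NQ2-48.2.2."));
    }
}
```

With this layout the fuzzy marker stays in front of the number and the description always begins in the same column, regardless of how wide the endpoints are.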
- George
From invite at facebook.com Fri Apr 13 17:12:04 2007
From: invite at facebook.com (Biswaroop Ghosh)
Date: Fri, 13 Apr 2007 10:12:04 -0700
Subject: [Biojava-dev] I've added you as a friend on Facebook...
Message-ID: <82cdae6537486b8fd6048bc766c5c1c5@register.facebook.com>
I've requested to add you as a friend on Facebook. You can use Facebook to see the profiles of the people around you, share photos, and connect with friends. Now everyone can join Facebook, even if you couldn't before.
Thanks,
Biswaroop
P.S. Here's the link:
http://www.facebook.com/p.php?i=695556070&k=10ae46824c&r&v=2
From bugzilla-daemon at portal.open-bio.org Tue Apr 17 12:08:29 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 17 Apr 2007 08:08:29 -0400
Subject: [Biojava-dev] [Bug 2273] More problems writing uniprot files
In-Reply-To:
Message-ID: <200704171208.l3HC8T2G004508@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2273
holland at ebi.ac.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
------- Comment #1 from holland at ebi.ac.uk 2007-04-17 08:08 EST -------
I have fixed points 1-3. Point 4 I have raised as a new bug for someone else to
fix - the problem goes deeper than just UniProtFormat!
Can you check the code I have committed in CVS and update this bug accordingly
with what you find?
I have not written a unit test as I'm very busy at present and don't have the
time. If you could add one in that would be great.
From bugzilla-daemon at portal.open-bio.org Tue Apr 17 12:11:11 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 17 Apr 2007 08:11:11 -0400
Subject: [Biojava-dev] [Bug 2274] New: CRC64 checksum toString() returning
incorrect values
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2274
Summary: CRC64 checksum toString() returning incorrect values
Product: BioJava
Version: live (CVS source)
Platform: All
OS/Version: All
Status: NEW
Severity: normal
Priority: P2
Component: Others
AssignedTo: biojava-dev at biojava.org
ReportedBy: holland at ebi.ac.uk
In org.biojavax.utils.CRC64Checksum the toString() method returns 24-character
strings, when CRC64 checksums are only 16-character. Also need to check that
the correct polynomials etc. are being used.
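One way a 24-character string can arise (an assumption, I have not confirmed this is what CRC64Checksum does) is formatting the two 32-bit halves separately and letting the high half sign-extend to a long. The sketch below reproduces that effect and shows a fixed-width alternative. Crc64Hex and its method names are illustrative only, and nothing here says whether the polynomial is right.

```java
final class Crc64Hex {
    /** Formats the halves separately; a negative high half is sign-extended. */
    static String naive(long crc) {
        int hi = (int) (crc >>> 32);
        int lo = (int) crc;
        // Passing an int to Long.toHexString widens it to long first,
        // so a negative 'hi' gains eight leading 'f' digits.
        return Long.toHexString(hi) + Integer.toHexString(lo);
    }

    /** Formats the whole value as exactly 16 upper-case hex digits. */
    static String fixedWidth(long crc) {
        return String.format("%016X", crc);
    }

    public static void main(String[] args) {
        long crc = 0xE278CA323958DD50L; // value chosen to mirror the reported string
        System.out.println(naive(crc));      // 24 characters
        System.out.println(fixedWidth(crc)); // 16 characters
    }
}
```

Zero-padding also matters in the other direction: without it, a checksum whose leading digits are zero would come out shorter than 16 characters.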
From bugzilla-daemon at portal.open-bio.org Tue Apr 17 12:19:04 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 17 Apr 2007 08:19:04 -0400
Subject: [Biojava-dev] [Bug 2274] CRC64 checksum toString() returning
incorrect values
In-Reply-To:
Message-ID: <200704171219.l3HCJ4op005457@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2274
holland at ebi.ac.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
------- Comment #1 from holland at ebi.ac.uk 2007-04-17 08:19 EST -------
Fixed.
From bugzilla-daemon at portal.open-bio.org Tue Apr 17 12:19:15 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 17 Apr 2007 08:19:15 -0400
Subject: [Biojava-dev] [Bug 2274] CRC64 checksum toString() returning
incorrect values
In-Reply-To:
Message-ID: <200704171219.l3HCJFwu005500@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2274
holland at ebi.ac.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|RESOLVED |CLOSED
From bugzilla-daemon at portal.open-bio.org Tue Apr 17 12:25:50 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 17 Apr 2007 08:25:50 -0400
Subject: [Biojava-dev] [Bug 2261] Request for enhancement of RichSequence
In-Reply-To:
Message-ID: <200704171225.l3HCPo16006114@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2261
holland at ebi.ac.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
------- Comment #4 from holland at ebi.ac.uk 2007-04-17 08:25 EST -------
Done. Renamed to RichFeatureRelationshipHolder and removed reference to
RichFeatureHolder as it is technically no such thing.
From jburdick at keyfitz.org Thu Apr 19 18:26:55 2007
From: jburdick at keyfitz.org (Josh Burdick)
Date: Thu, 19 Apr 2007 14:26:55 -0400
Subject: [Biojava-dev] reading a subsequence from a .nib file
In-Reply-To:
References:
Message-ID: <1177007215.5481.4.camel@localhost.localdomain>
On Tue, 2007-04-03 at 09:03 +0800, mark.schreiber at novartis.com wrote:
> Hi -
>
> To my knowledge nothing like this exists in BioJava. Could someone take
> it the last mile and make it produce SymbolLists?
>
I went ahead and added a method getSymbolListByLocation() which takes
the string and converts it to a SymbolList using DNATools. There are
bound to be more efficient ways to do this, but I think this is a
reasonable start.
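For anyone who does not want to fetch the files, the core of reading a subsequence can be sketched as below. This is not the code in NibFile.java, just an independent illustration. It assumes the 8-byte header (signature and base count) has already been skipped, and the nibble codes (0=T, 1=C, 2=A, 3=G, 4=N, high bit set for masked bases, first base in the high nibble) are taken from my understanding of the UCSC .nib description. NibDecoder is a hypothetical name.

```java
final class NibDecoder {
    // Assumed nibble-to-base mapping for the low three bits of each nibble.
    private static final char[] BASES = {'T', 'C', 'A', 'G', 'N'};

    /**
     * Decodes 'length' bases starting at zero-based base index 'start'
     * from packed data (two bases per byte, header already removed).
     * Masked bases are returned in lower case.
     */
    static String decode(byte[] packed, int start, int length) {
        if (start < 0 || length < 0 || (long) start + length > 2L * packed.length) {
            throw new IndexOutOfBoundsException("range outside packed data");
        }
        StringBuilder sb = new StringBuilder(length);
        for (int k = 0; k < length; k++) {
            long i = (long) start + k;
            int b = packed[(int) (i / 2)] & 0xFF;
            // Even base index: high nibble; odd base index: low nibble.
            int nibble = (i % 2 == 0) ? (b >>> 4) : (b & 0x0F);
            int code = nibble & 0x07;
            if (code >= BASES.length) {
                throw new IllegalArgumentException("unexpected base code " + code);
            }
            boolean masked = (nibble & 0x08) != 0;
            char base = BASES[code];
            sb.append(masked ? Character.toLowerCase(base) : base);
        }
        return sb.toString();
    }
}
```

The resulting string could then be handed to DNATools to build the SymbolList, as described above.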
The files are in the same locations:
http://www.keyfitz.org/jburdick/read_nib_file_java/NibFile.java
http://www.keyfitz.org/jburdick/read_nib_file_java/NibFileTest.java
Hopefully someone will find this code useful.
Josh
From gwaldon at geneinfinity.org Thu Apr 19 21:05:09 2007
From: gwaldon at geneinfinity.org (george waldon)
Date: Thu, 19 Apr 2007 14:05:09 -0700
Subject: [Biojava-dev] no need for LENGTH_TYPE_TERM
Message-ID: <20070419210509.75384.qmail@mmm1924.dulles19-verio.com>
LENGTH_TYPE_TERM is a RichSequence term that is used to distinguish between "aa" and "bp" during the write operation in uniprot format and in genbank format.
This code is error-prone. For instance, converting a protein sequence from a fasta file to a genbank formatted file still writes "bp". Indeed, the sequence annotation for this term should be generated during the enrichment of the sequence.
I don't think the extra work is really necessary. Is there any objection if I remove this term and rely instead on the alphabet (either PROTEIN or PROTEIN_TERM) during the writing operations?
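To make the proposal concrete, the writers would pick the unit from the alphabet at write time, roughly as in this sketch. It is deliberately library-free: AlphabetKind is a stand-in enum I made up for illustration, not a BioJava type, and the real change would compare against the sequence's actual alphabet.

```java
final class LengthUnit {
    /** Hypothetical stand-in for the alphabet of a sequence. */
    enum AlphabetKind { DNA, RNA, PROTEIN, PROTEIN_TERM }

    /** Returns "aa" for protein alphabets and "bp" for everything else. */
    static String of(AlphabetKind kind) {
        switch (kind) {
            case PROTEIN:
            case PROTEIN_TERM:
                return "aa";
            default:
                return "bp";
        }
    }

    public static void main(String[] args) {
        // A protein read from fasta carries no length-type annotation,
        // but its alphabet still identifies it as protein.
        System.out.println(of(AlphabetKind.PROTEIN)); // aa
        System.out.println(of(AlphabetKind.DNA));     // bp
    }
}
```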
Thanks,
George
From mark.schreiber at novartis.com Fri Apr 20 02:44:51 2007
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Fri, 20 Apr 2007 10:44:51 +0800
Subject: [Biojava-dev] reading a subsequence from a .nib file
Message-ID:
Hi Josh -
Looks good. Just one thing: your JUnit test contains a hardcoded file
path to the test file, which means it is not portable. Could you modify
it so that it loads the file from the classpath as a resource (see some
of the IO unit tests for examples)? Can you also provide the test file?
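Something along these lines is what I mean by loading from the classpath. It is only a sketch using the standard Class.getResourceAsStream call; TestResources and the resource name in the usage note are placeholders, not existing BioJava code.

```java
import java.io.FileNotFoundException;
import java.io.InputStream;

final class TestResources {
    /**
     * Opens a resource relative to the given class (or absolute if the
     * name starts with '/'), failing with a clear message when missing.
     */
    static InputStream open(Class<?> anchor, String name) throws FileNotFoundException {
        InputStream in = anchor.getResourceAsStream(name);
        if (in == null) {
            throw new FileNotFoundException("Resource not found on classpath: " + name);
        }
        return in;
    }
}
```

In the test this would replace the absolute path with something like TestResources.open(NibFileTest.class, "test.nib"), where "test.nib" is a placeholder for whatever the test file ends up being called, so the test runs wherever the file ships alongside the test classes.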
Best regards,
- Mark
Mark Schreiber
Research Investigator (Bioinformatics)
Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com
phone +65 6722 2973
fax +65 6722 2910
From mark.schreiber at novartis.com Fri Apr 20 02:46:56 2007
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Fri, 20 Apr 2007 10:46:56 +0800
Subject: [Biojava-dev] no need for LENGTH_TYPE_TERM
Message-ID:
I think this sounds sensible. Having another term for something that can
be derived from the alphabet is redundant.
- Mark
Mark Schreiber
Research Investigator (Bioinformatics)
Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com
phone +65 6722 2973
fax +65 6722 2910
"george waldon"
Sent by: biojava-dev-bounces at lists.open-bio.org
04/20/2007 05:05 AM
Please respond to george waldon
To: biojava-dev at biojava.org
cc: (bcc: Mark Schreiber/GP/Novartis)
Subject: [Biojava-dev] no need for LENGTH_TYPE_TERM
LENGTH_TYPE_TERM is a RichSequence term that is used to distinguish
between "aa" and "bp" during the write operation in uniprot format and in
genbank format.
This code is error-prone. For instance, converting a protein sequence from
a fasta file to a genbank formatted file still writes "bp". Indeed, the
sequence annotation for this term should be generated during the
enrichment of the sequence.
I don't think the extra work is really necessary. Is there any objection
if I remove this term and rely instead on the alphabet (either PROTEIN
or PROTEIN_TERM) during the writing operations?
Thanks,
George
From holland at ebi.ac.uk Fri Apr 20 08:29:18 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Fri, 20 Apr 2007 09:29:18 +0100
Subject: [Biojava-dev] no need for LENGTH_TYPE_TERM
In-Reply-To: <20070419210509.75384.qmail@mmm1924.dulles19-verio.com>
References: <20070419210509.75384.qmail@mmm1924.dulles19-verio.com>
Message-ID: <462879DE.5000507@ebi.ac.uk>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
No objections. I think your logic is better than mine here. Please go ahead.
cheers,
Richard
george waldon wrote:
> LENGTH_TYPE_TERM is a RichSequence term that is used to distinguish between "aa" and "bp" during the write operation in uniprot format and in genbank format.
>
> This code is error-prone. For instance, converting a protein sequence from a fasta file to a genbank formatted file still writes "bp". Indeed, the sequence annotation for this term should be generated during the enrichment of the sequence.
>
> I don't think the extra work is really necessary. Is there any objection if I remove this term and rely instead on the alphabet (either PROTEIN or PROTEIN_TERM) during the writing operations?
>
> Thanks,
> George
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFGKHne4C5LeMEKA/QRAsD6AKCN4Nj7LMk3fCjAcrfE1Lw+Se3FJQCdH142
Sz6DYxYj1HedeHPZJpejtQs=
=9HDc
-----END PGP SIGNATURE-----
From jburdick at keyfitz.org Wed Apr 11 17:42:18 2007
From: jburdick at keyfitz.org (Josh Burdick)
Date: Wed, 11 Apr 2007 17:42:18 -0000
Subject: [Biojava-dev] reading a subsequence from a .nib file
In-Reply-To:
References:
Message-ID: <1176312083.21937.42.camel@localhost.localdomain>
On Tue, 2007-04-03 at 09:03 +0800, mark.schreiber at novartis.com wrote:
> Hi -
>
> To my knowledge nothing like this exists in BioJava. Could someone take
> it the last mile and make it produce SymbolLists?
>
I added a method that just takes the string and makes it into a
SymbolList using DNATools. This is somewhat inefficient (you can make a
SymbolList directly as an array of numbers, but I wasn't certain enough
that I understood it to try that.)
The package name should be changed, and the test code should probably
do somewhat more, but other than that, if someone wants to add it, feel
free. (The two files are at the same location as before.)
Josh