From bugzilla-daemon at portal.open-bio.org Sun Apr 1 13:03:15 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 1 Apr 2007 13:03:15 -0400
Subject: [Biojava-dev] [Bug 2253] NullPointerException in MultiSourceCompoundRichLocation
In-Reply-To:
Message-ID: <200704011703.l31H3FTF011220@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2253

gwaldon at geneinfinity.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|VERIFIED                    |CLOSED

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From russ at kepler-eng.com Mon Apr 2 14:59:26 2007
From: russ at kepler-eng.com (Russ Kepler)
Date: Mon, 2 Apr 2007 12:59:26 -0600
Subject: [Biojava-dev] Changing the sample name of the ABI file
In-Reply-To:
References:
Message-ID: <200704021259.26814.russ@kepler-eng.com>

On Tuesday 13 February 2007 00:01, Lee Heewook wrote:
> Is there way to change the sample name of the ABI file?

I'm not sure what you're asking. There's a sample name field in the file in SMPL; to rewrite that you'd have to rewrite pretty much the whole file. Most of the time I found it easier to re-export the data from the instrument.

But if you're parsing the name of the file for info (frequently done, as grubbing in the file is a PITA) then you can usually simply change the file name.

From gwaldon at geneinfinity.org Mon Apr 2 19:56:09 2007
From: gwaldon at geneinfinity.org (george waldon)
Date: Mon, 02 Apr 2007 16:56:09 -0700
Subject: [Biojava-dev] Isoelectric point calculation
Message-ID: <20070402235609.52369.qmail@mmm1924.dulles19-verio.com>

Hi,

Trying to solve the problem of symbol ambiguity in pI calculation that was brought to our attention on Biojava-1, I found a few problems, in particular (!)
calculated pI values are incorrect, BinarySearch throws exceptions, and the ResidueProperties.xml has some strange values, such as pK of Glu at -4.25. The class IsoelectricPointCalc was written a long time ago and I hope to get in touch with the original author and have corrected code soon.

As a general rule, scientific biomethods and biodata put in biojava need precise literature references. Javadocs are a good place for that.

- George

From mark.schreiber at novartis.com Mon Apr 2 21:03:20 2007
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Tue, 3 Apr 2007 09:03:20 +0800
Subject: [Biojava-dev] reading a subsequence from a .nib file
Message-ID:

Hi -

To my knowledge nothing like this exists in BioJava. Could someone take it the last mile and make it produce SymbolLists?

- Mark

Mark Schreiber
Research Investigator (Bioinformatics)

Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com

phone +65 6722 2973
fax +65 6722 2910

Josh Burdick
Sent by: biojava-dev-bounces at lists.open-bio.org
01/23/2007 12:29 AM

To: biojava-dev at lists.open-bio.org
cc: (bcc: Mark Schreiber/GP/Novartis)
Subject: [Biojava-dev] reading a subsequence from a .nib file

I wrote some code to read a chunk of DNA sequence from a file in Jim Kent's blat ".nib" file format. This is a simple format using four bits/base.

I didn't attach the code, to avoid spamming the whole list; but it, and a (very crude!) JUnit test, are at

http://www.keyfitz.org/jburdick/read_nib_file_java/NibFile.java
http://www.keyfitz.org/jburdick/read_nib_file_java/NibFileTest.java

You could use 2 bits/base, but then you can't have ambiguous bases. 4 bits/base seems like a reasonable compromise; plus sites that have "blat" installed will need to have the .nib files on a server somewhere anyway, and this way repeat-masking can be included, which may be convenient.
Also, it doesn't support writing a .nib file; again, presumably people will be using Jim Kent's faToNib program to do that. It would need some tweaking to be included in BioJava, because it returns a plain String of ACGT, instead of a PackedSequence object. (Probably this would just involve rewriting the setupBuffer() and addToBuffer() methods in the code.) Also, the coordinate information could come from a Range object. If similar code is already somewhere in BioJava, please ignore this; but I couldn't find it with thirty seconds of Googling, so I figured it hadn't been written... Josh Burdick programmer, Vivian Cheung's lab, Children's Hospital of Philadelphia jburdick at keyfitz.org _______________________________________________ biojava-dev mailing list biojava-dev at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biojava-dev From mark.schreiber at novartis.com Mon Apr 2 21:06:11 2007 From: mark.schreiber at novartis.com (mark.schreiber at novartis.com) Date: Tue, 3 Apr 2007 09:06:11 +0800 Subject: [Biojava-dev] Isoelectric point calculation Message-ID: Hi George - This probably should be reported as a bug in bugzilla to make sure we get around to fixing it. Thanks, - Mark "george waldon" Sent by: biojava-dev-bounces at lists.open-bio.org 04/03/2007 07:56 AM Please respond to george waldon To: biojava-dev at biojava.org cc: smh1008 at cam.ac.uk, (bcc: Mark Schreiber/GP/Novartis) Subject: [Biojava-dev] Isoelectric point calculation Hi, Trying to solve the problem of symbol ambiguity in pI calculation that was brought to our attention on Biojava-1, I found a few problems, in particular (!) calculated pI values are incorrect, BinarySearch throw exceptions, and the ResidueProperties.xml has some strange values, such as pK of Glu at -4.25. The class IsoelectricPointCalc was written a long time ago and I hope to get in touch with the original author and have a corrected code rapidly. 
As a general rule, scientific biomethods and biodata put in biojava need precise literature references. Javadocs are a good place for that.

- George

From mark.schreiber at novartis.com Mon Apr 2 21:09:10 2007
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Tue, 3 Apr 2007 09:09:10 +0800
Subject: [Biojava-dev] org.biojava.bio.symbol.UkkonenSuffixTree.class BUG
Message-ID:

Hi Caroline -

Could you post some example code that we could use to replicate the problem?

Thanks.

- Mark

Mark Schreiber
Research Investigator (Bioinformatics)

Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com

phone +65 6722 2973
fax +65 6722 2910

"Caroline Renaux"
Sent by: biojava-dev-bounces at lists.open-bio.org
03/26/2007 09:18 PM

To: biojava-dev at biojava.org
cc: (bcc: Mark Schreiber/GP/Novartis)
Subject: [Biojava-dev] org.biojava.bio.symbol.UkkonenSuffixTree.class BUG

Hello,

I recently used the org.biojava.bio.symbol package in a Java application, in particular the UkkonenSuffixTree class. When I want to add a set of sequences to the tree, I separate them with the '$' separator character, but it doesn't work. When the second sequence is processed I get a NullPointerException in the method jumpTo at the line:

arrivedAt=(SuffixNode)currentNode.children.get(new Character(source.charAt(from)));

I don't understand what I could have done wrong.

Thank you in advance.

RENAUX C.

From bugzilla-daemon at portal.open-bio.org Tue Apr 3 00:03:53 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 3 Apr 2007 00:03:53 -0400
Subject: [Biojava-dev] [Bug 2244] uniprot files do not load
In-Reply-To:
Message-ID: <200704030403.l3343rrF032035@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2244

------- Comment #9 from gwaldon at geneinfinity.org 2007-04-03 00:03 EST -------
*** Bug 2249 has been marked as a duplicate of this bug. ***

From bugzilla-daemon at portal.open-bio.org Tue Apr 3 03:12:48 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 3 Apr 2007 03:12:48 -0400
Subject: [Biojava-dev] [Bug 2258] New: ConcurrentModificationException in SimpleRichAnnotation
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2258

           Summary: ConcurrentModificationException in SimpleRichAnnotation
           Product: BioJava
           Version: live (CVS source)
          Platform: PC
        OS/Version: Windows XP
            Status: NEW
          Severity: normal
          Priority: P2
         Component: seq
        AssignedTo: biojava-dev at biojava.org
        ReportedBy: gwaldon at geneinfinity.org

Exception thrown by the method clear(), apparently as a result of trying to change the note set while iterating over it.
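The failure mode described in Bug 2258 can be reproduced outside BioJava. The sketch below is not the SimpleRichAnnotation source: the class NoteSetClearSketch and its methods are hypothetical stand-ins, and it assumes that clear() walks the note set and removes each note through a helper. It shows why that pattern throws and one common remedy (iterating over a snapshot), which may or may not be what was committed to CVS.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in for an annotation that keeps its notes in a Set.
class NoteSetClearSketch {
    private final Set<String> notes = new LinkedHashSet<String>();

    void addNote(String note) {
        notes.add(note);
    }

    int size() {
        return notes.size();
    }

    // Removes a single note directly from the backing set.
    private void removeNote(String note) {
        notes.remove(note);
    }

    // Problematic pattern: the set is modified behind the iterator's back,
    // so the next call to next() fails when more than one note is present.
    void clearBroken() {
        for (Iterator<String> i = notes.iterator(); i.hasNext();) {
            removeNote(i.next());
        }
    }

    // One possible fix: iterate over a copy, so the live set can change freely.
    void clearSafe() {
        for (String note : new ArrayList<String>(notes)) {
            removeNote(note);
        }
    }

    public static void main(String[] args) {
        NoteSetClearSketch sketch = new NoteSetClearSketch();
        sketch.addNote("a");
        sketch.addNote("b");
        sketch.addNote("c");
        try {
            sketch.clearBroken();
        } catch (ConcurrentModificationException e) {
            System.out.println("clearBroken failed: " + e);
        }
        sketch.clearSafe();
        System.out.println("notes left after clearSafe: " + sketch.size());
    }
}
```

Calling Iterator.remove() inside the loop would also avoid the exception, but only if nothing else needs to happen per note; the snapshot approach keeps any per-note removal logic in one place.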
- George -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From holland at ebi.ac.uk Tue Apr 3 06:00:01 2007 From: holland at ebi.ac.uk (Richard Holland) Date: Tue, 03 Apr 2007 11:00:01 +0100 Subject: [Biojava-dev] JDBCPooledDataSource regression In-Reply-To: <45C090A6.10909@ebi.ac.uk> References: <416B41DF-91E1-4D1F-A4B4-799FE712B032@sanger.ac.uk> <45C0781B.7030304@ebi.ac.uk> <5DDBC6F5-7DC3-4446-A982-3CD9B3931A06@sanger.ac.uk> <45C08639.7080600@ebi.ac.uk> <8689C307-0643-46D9-90A6-A9958681D1D0@sanger.ac.uk> <45C08B7A.3060102@ebi.ac.uk> <45C08E4E.2050308@ebi.ac.uk> <45C090A6.10909@ebi.ac.uk> Message-ID: <461225A1.4000705@ebi.ac.uk> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Great stuff. You should commit it when you get your CVS account. :) There's one or two typos (can't spell deprecated!) but I'm sure once you get the rest of BioJava into Eclipse or something to make this change permanent they'll show up. cheers, Richard Andy Yates wrote: > Okay I've attached the fix here. > > I just did this in a text editor but I believe that the imports are > okay. If you can just do a quick scan as well to make sure I haven't > deleted anything that was very important. > > I'll get on to the helpdesk now as well :) > > Andy > > Richard Holland wrote: > Andy could you make the change to your local copy of the source file and > email the file to me, that way I can make sure I don't get it wrong when > I commit it. > > Richard. > > PS. You should probably have your own CVS account - email the OBF > helpdesk and ask for one, saying I told you to. 
:) > > > Andy Yates wrote: >>>> Thomas Down wrote: >>>>> On 31 Jan 2007, at 12:06, Andy Yates wrote: >>>>> >>>>>> Hi, >>>>>> >>>>>> Sorry I was meaning if that if that method just becomes: >>>>>> >>>>>> public static DataSource getDataSource(final String driver, >>>>>> final String url, >>>>>> final String user, >>>>>> final String pass) >>>>>> throws Exception { >>>>>> >>>>>> BasicDataSource ds = new BasicDataSource(); >>>>>> ds.setUrl(url); >>>>>> ds.setDriverClassName(driver); >>>>>> ds.setUsername(user); >>>>>> ds.setPassword(pass); >>>>>> // Set BasicDataSource properties such as maxActive and >>>>>> maxIdle, as described in >>>>>> // >>>>>> http://jakarta.apache.org/commons/dbcp/api/org/apache/commons/dbcp/BasicDataSource.html >>>>>> >>>>>> ds.setMaxActive(10); >>>>>> ds.setMaxIdle(5); >>>>>> ds.setMaxWait(10000); >>>>>> >>>>>> return ds; >>>>>> } >>>>>> >>>>>> Does that still work? >>>>> Hmmm, I was assuming that BasicDataSource didn't actually do any >>>>> pooling itself, and that you needed another layer on top to manage a >>>>> connection pool -- that seems to be how all previous revisions of >>>>> JDBCConnectionPool worked, so I guess I wasn't alone in thinking >>>>> this. But yes, BasicDataSource does seem to do pooling itself >>>>> (confirmed by reading the source), so maybe your simpler version is >>>>> a better idea. It certainly works okay for me. >>>>> >>>>> Thomas. >>>> That's what I thought should have happened :). Can I suggest that >>>> this revised version goes into CVS? Anyone got any objections? >>>> >>>> Andy >>>> _______________________________________________ >>>> biojava-dev mailing list >>>> biojava-dev at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev >>>> > ------------------------------------------------------------------------ > /* > * BioJava development code > * > * This code may be freely distributed and modified under the > * terms of the GNU Lesser General Public Licence. 
This should
> * be distributed with the code.  If you do not have a copy,
> * see:
> *
> *      http://www.gnu.org/copyleft/lesser.html
> *
> * Copyright for this code is held jointly by the individual
> * authors.  These should be listed in @author doc comments.
> *
> * For more information on the BioJava project and its aims,
> * or to join the biojava-l mailing list, visit the home page
> * at:
> *
> *      http://www.biojava.org/
> *
> */
>
> package org.biojava.utils;
>
> import javax.sql.DataSource;
>
> import org.apache.commons.dbcp.BasicDataSource;
> import org.apache.commons.dbcp.PoolingDataSource;
> import org.apache.commons.pool.ObjectPool;
>
> /**
>  * Returns a DataSource that implements connection pooling
>  *
>  * Uses Jakarta Commons DBCP and Pool packages.
>  * See the description of the dbcp package at
>  * http://jakarta.apache.org/commons/dbcp/api/overview-summary.html#overview_description
>  *
>  * @author Simon Foote
>  * @author Len Trigg
>  */
> public class JDBCPooledDataSource {
>
>     public static DataSource getDataSource(final String driver,
>                                            final String url,
>                                            final String user,
>                                            final String pass)
>         throws Exception {
>
>         BasicDataSource ds = new BasicDataSource();
>         ds.setUrl(url);
>         ds.setDriverClassName(driver);
>         ds.setUsername(user);
>         ds.setPassword(pass);
>         // Set BasicDataSource properties such as maxActive and maxIdle, as described in
>         // http://jakarta.apache.org/commons/dbcp/api/org/apache/commons/dbcp/BasicDataSource.html
>         ds.setMaxActive(10);
>         ds.setMaxIdle(5);
>         ds.setMaxWait(10000);
>
>         // Return the configured pool (the local variable is named ds).
>         return ds;
>     }
>
>     // Adds simple equals and hashcode methods so that we can compare if
>     // two connections are to the same database. This will fail if the
>     // DataSource is redirected to another database etc (I doubt this is
>     // ever likely to be used).
>     /**
>      * @deprecated This is no longer used in favor of {@link BasicDataSource}
>      * from DBCP
>      */
>     static class MyPoolingDataSource extends PoolingDataSource {
>         final String source;
>
>         public MyPoolingDataSource(ObjectPool connectionPool, String source) {
>             super(connectionPool);
>             this.source = source;
>         }
>
>         public boolean equals(Object o2) {
>             if ((o2 == null) || !(o2 instanceof MyPoolingDataSource)) {
>                 return false;
>             }
>             MyPoolingDataSource b2 = (MyPoolingDataSource) o2;
>             return source.equals(b2.source);
>         }
>
>         public int hashCode() {
>             return source.hashCode();
>         }
>     }
>
>     public static void main(String[] args) {
>         try {
>             DataSource ds1 = getDataSource("org.hsqldb.jdbcDriver", "jdbc:hsqldb:/tmp/hsqldb/biosql", "sa", "");
>             DataSource ds2 = getDataSource("org.hsqldb.jdbcDriver", "jdbc:hsqldb:/tmp/hsqldb/biosql", "sa", "");
>             System.err.println(ds1);
>             System.err.println(ds2);
>             System.err.println(ds1.equals(ds2));
>         } catch (Exception e) {
>             e.printStackTrace();
>         }
>     }
> }
>
> ------------------------------------------------------------------------

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGEiWh4C5LeMEKA/QRAgYbAJ4yoE6dsLuOOS8sg1wOCybV6rsNUwCeN0c8
oiFz/0yblV4P8a35RbU+nDM=
=imiK
-----END PGP SIGNATURE-----

From bugzilla-daemon at portal.open-bio.org Tue Apr 3 06:05:22 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 3 Apr 2007 06:05:22 -0400
Subject: [Biojava-dev] [Bug 2038] test bug
In-Reply-To:
Message-ID: <200704031005.l33A5M9N015780@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2038

holland at ebi.ac.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |INVALID

------- Comment #1 from holland at ebi.ac.uk 2007-04-03 06:05 EST -------
This has been lying around for ages, thought I'd tidy it up. :)

From bugzilla-daemon at portal.open-bio.org Tue Apr 3 06:20:48 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 3 Apr 2007 06:20:48 -0400
Subject: [Biojava-dev] [Bug 2258] ConcurrentModificationException in SimpleRichAnnotation
In-Reply-To:
Message-ID: <200704031020.l33AKmpP016736@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2258

holland at ebi.ac.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #1 from holland at ebi.ac.uk 2007-04-03 06:20 EST -------
Fixed later today in CVS. Test also added.

From bugzilla-daemon at portal.open-bio.org Tue Apr 3 06:23:54 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 3 Apr 2007 06:23:54 -0400
Subject: [Biojava-dev] [Bug 2107] LabelledSequenceRenderer
In-Reply-To:
Message-ID: <200704031023.l33ANsV1016903@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2107

holland at ebi.ac.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #4 from holland at ebi.ac.uk 2007-04-03 06:23 EST -------
I have committed Jolyon's changes (or will do so later today). These are untestable with JUnit so no test has been added.

From bugzilla-daemon at portal.open-bio.org Wed Apr 4 01:37:59 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 4 Apr 2007 01:37:59 -0400
Subject: [Biojava-dev] [Bug 2260] New: Bug in UkkonenSuffixTree
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2260

           Summary: Bug in UkkonenSuffixTree
           Product: BioJava
           Version: live (CVS source)
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: minor
          Priority: P2
         Component: symbol
        AssignedTo: biojava-dev at biojava.org
        ReportedBy: mark.schreiber at novartis.com

There is a bug in the UkkonenSuffixTree when one tries to add concatenated Strings which are delimited with a $. This doesn't seem to be a problem when Strings are added individually. A simple workaround is to add Strings individually.

The following code causes the bug:

    import org.biojava.bio.symbol.UkkonenSuffixTree;

    public class Main {

        /** Creates a new instance of Main */
        public Main() {
        }

        /**
         * @param args the command line arguments
         */
        public static void main(String[] args) {
            // Two sequences joined by the '$' delimiter trigger the exception.
            String seqs = "atcgcgcgcgctcggcctgggggctcgcgct$acgggtggtggt";
            UkkonenSuffixTree suff = new UkkonenSuffixTree(seqs);
        }
    }

Someone with a better knowledge of suffix trees than I have would need to look at this...

Additionally there are several places in the code where variables are declared and never used, declared globally when they need not be, or not declared final when they are never modified. There are also System.out.println() statements that should be messages in exceptions or errors. The code could do with a good clean-up.

I am marking this as minor because there is a workaround. The simplest thing might be to disable the advertised feature of being able to deal with concatenated strings, as it seems it cannot and probably never has been able to.
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Apr 4 01:46:02 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 4 Apr 2007 01:46:02 -0400 Subject: [Biojava-dev] [Bug 2261] New: Request for enhancement of RichSequence Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2261 Summary: Request for enhancement of RichSequence Product: BioJava Version: live (CVS source) Platform: PC OS/Version: Linux Status: NEW Severity: enhancement Priority: P2 Component: seq AssignedTo: biojava-dev at biojava.org ReportedBy: mark.schreiber at novartis.com RichSequence implements FeatureHolder but it would be nice if it could implement RichFeature holder. This would require the addition of four methods to any implementations of the RichSequence interface but would avoid endless casting. If we do it before an official release of bj1.5 we won't strictly be breaking the interface. Is there any reason why we should not do this?? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From dag at sonsorol.org Fri Apr 6 21:52:01 2007 From: dag at sonsorol.org (Chris Dagdigian) Date: Fri, 6 Apr 2007 21:52:01 -0400 Subject: [Biojava-dev] Fwd: Bug in org/biojava/utils/io/UncompressInputStream.java References: <2109dfc0704061116i1f0ddbe2ic25012143d2509af@mail.gmail.com> Message-ID: <2A8FFBC4-EC1A-4EB9-992C-DE9225A59578@sonsorol.org> Passing on this email that came to me ... 
Regards, Chris Dagdigian OBF Begin forwarded message: > From: "Miguel Duarte" > Date: April 6, 2007 2:16:52 PM EDT > To: dag at sonsorol.org > Subject: Bug in org/biojava/utils/io/UncompressInputStream.java > > Hi Chris, > >> From http://sourceforge.net/project/shownotes.php? >> release_id=314770&group_id=18598, > i've learned that you're maintaining the class > org/biojava/utils/io/UncompressInputStream.java. If that's not the > case please forward this mail to the maintainer. > > I've discovered a nasty bug: With some read block sizes the algorithm > truncates a few bytes from the end of the stream. I've verified this > comparing the gzip/uncompress output for some files versus what > org/biojava/utils/io/UncompressInputStream.java generates. > > Unfortunately i've not discovered the bug yet, but i can contribute > with the attached test case. How to verify the bug: > uncompress BH_03834.MCR.Z with gzip and with UncompressInputStream and > compare the results. > > Thanks, > Miguel Duarte -------------- next part -------------- A non-text attachment was scrubbed... Name: BH_03834.MCR.Z Type: application/x-compress Size: 26405 bytes Desc: not available Url : http://lists.open-bio.org/pipermail/biojava-dev/attachments/20070406/ed47c294/attachment-0001.bin -------------- next part -------------- A non-text attachment was scrubbed... Name: uncompressed_by_gzip Type: application/octet-stream Size: 81920 bytes Desc: not available Url : http://lists.open-bio.org/pipermail/biojava-dev/attachments/20070406/ed47c294/attachment-0002.obj -------------- next part -------------- A non-text attachment was scrubbed... 
Name: uncompressed_by_uncompressInputStream
Type: application/octet-stream
Size: 81832 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/biojava-dev/attachments/20070406/ed47c294/attachment-0003.obj
-------------- next part --------------

From mark.schreiber at novartis.com Sun Apr 8 22:13:12 2007
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Mon, 9 Apr 2007 10:13:12 +0800
Subject: [Biojava-dev] Fwd: Bug in org/biojava/utils/io/UncompressInputStream.java
Message-ID:

Does anyone maintain this class??

More to the point, does anyone know what it is for??? If I look at the Uses link in javadoc there are apparently none at the public or package level. Additionally, why does biojava need one? Are there not java.io classes that can handle compressed streams??

Is there a good reason why we cannot just clean it out?

- Mark

Mark Schreiber
Research Investigator (Bioinformatics)

Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com

phone +65 6722 2973
fax +65 6722 2910

From holland at ebi.ac.uk Tue Apr 10 04:59:32 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Tue, 10 Apr 2007 09:59:32 +0100
Subject: [Biojava-dev] Fwd: Bug in org/biojava/utils/io/UncompressInputStream.java
In-Reply-To:
References:
Message-ID: <461B51F4.8010402@ebi.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I have no idea what it is for. There are generic Java classes provided with the SDK that do the same job. I think we should probably drop it. Let's wait to see if anyone shouts first.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGG1Hz4C5LeMEKA/QRAvTuAJ9F1AClFCV4WwBNP170mbC2+6JVDgCfVB17
HoCuWrx5k2ONg/9oxIfVVPI=
=cGTy
-----END PGP SIGNATURE-----

From ayates at ebi.ac.uk Tue Apr 10 05:56:57 2007
From: ayates at ebi.ac.uk (Andy Yates)
Date: Tue, 10 Apr 2007 10:56:57 +0100
Subject: [Biojava-dev] Fwd: Bug in org/biojava/utils/io/UncompressInputStream.java
In-Reply-To: <461B51F4.8010402@ebi.ac.uk>
References: <461B51F4.8010402@ebi.ac.uk>
Message-ID: <461B5F69.9060506@ebi.ac.uk>

I don't think there are standard classes for this compression format in the SDK. There are ones for GZIP & ZIP but not for LZW, which is what this one deals with. Also I'm not sure about using GZIP to unzip a file compressed with LZW since GZIP uses DEFLATE.

We need to decompress the file using uncompress (which is missing from my Linux box but is on the mac ... go figure) and then match that up to the output from UncompressInputStream & see if they agree or not.

Andy

From ayates at ebi.ac.uk Tue Apr 10 06:03:40 2007
From: ayates at ebi.ac.uk (Andy Yates)
Date: Tue, 10 Apr 2007 11:03:40 +0100
Subject: [Biojava-dev] Fwd: Bug in org/biojava/utils/io/UncompressInputStream.java
In-Reply-To: <461B5F69.9060506@ebi.ac.uk>
References: <461B51F4.8010402@ebi.ac.uk> <461B5F69.9060506@ebi.ac.uk>
Message-ID: <461B60FC.7040903@ebi.ac.uk>

Okay, a quick run of uncompress on the mac with the files in question does produce a file which is equivalent to the file produced by gzip but not to the one produced by UncompressInputStream. The required md5sum for a pass should be (after a md5 digest):

9f0924237d20288793172091d61f85b8  uncompressed_by_gzip

But we get:

17447efd34a245e430f20bc8d9b28a7b  uncompressed_by_uncompressInputStream

Okay, so it looks like there is something "wrong".
Seems like it drops 88 bytes from the decompression.

Wonder what happens if we pass this file type through the GZIPInputStream from the JDK?

From holland at ebi.ac.uk Tue Apr 10 06:09:04 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Tue, 10 Apr 2007 11:09:04 +0100
Subject: [Biojava-dev] Fwd: Bug in org/biojava/utils/io/UncompressInputStream.java
In-Reply-To: <2A8FFBC4-EC1A-4EB9-992C-DE9225A59578@sonsorol.org>
References: <2109dfc0704061116i1f0ddbe2ic25012143d2509af@mail.gmail.com> <2A8FFBC4-EC1A-4EB9-992C-DE9225A59578@sonsorol.org>
Message-ID: <461B6240.2070907@ebi.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

AFAIK the Zip algorithm is just LZW with bells on, so it should produce exactly the same results.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGG2JA4C5LeMEKA/QRAjutAJ9cZbqpoag2Z5aQd4gbOAiMm78VZACdHzER
UoIhheyTE1805rMBzG4R+Q0=
=hfN2
-----END PGP SIGNATURE-----

From ayates at ebi.ac.uk Tue Apr 10 06:12:58 2007
From: ayates at ebi.ac.uk (Andy Yates)
Date: Tue, 10 Apr 2007 11:12:58 +0100
Subject: [Biojava-dev] Fwd: Bug in org/biojava/utils/io/UncompressInputStream.java
In-Reply-To: <461B60FC.7040903@ebi.ac.uk>
References: <461B51F4.8010402@ebi.ac.uk> <461B5F69.9060506@ebi.ac.uk> <461B60FC.7040903@ebi.ac.uk>
Message-ID: <461B632A.10101@ebi.ac.uk>

Quick program to pass it through the GZIPInputStream chucks an IOException saying it's not in the GZIP format (which it isn't). Also passing it through the ZipInputStream seems to do nothing.

At any rate it looks like we cannot get rid of this class; it's got to be fixed/maintained.

From holland at ebi.ac.uk Tue Apr 10 06:37:36 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Tue, 10 Apr 2007 11:37:36 +0100
Subject: [Biojava-dev] Fwd: Bug in org/biojava/utils/io/UncompressInputStream.java
In-Reply-To: <461B6443.9060300@ebi.ac.uk>
References: <2109dfc0704061116i1f0ddbe2ic25012143d2509af@mail.gmail.com> <2A8FFBC4-EC1A-4EB9-992C-DE9225A59578@sonsorol.org> <461B6240.2070907@ebi.ac.uk> <461B6443.9060300@ebi.ac.uk>
Message-ID: <461B68F0.4000908@ebi.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Why are these files in compress/uncompress format? Is it proprietary software creating them, or a legacy system of some kind?
Wouldn't gzip give better results both in terms of compression ratios and performance as it is far more up-to-date?

I believe that the JDK doesn't support LZW because LZW was patented, and that patent expired only very recently (in 2003/4/5/6 depending on where you live and in what form you use LZW):

http://www.gnu.org/philosophy/gif.html

It's one of those wonderful cases where the patent enforcement caused the algorithm it was protecting to get dumped and forgotten because nobody wanted to pay for it. Apart from *nix compress/uncompress and inside the GIF format I'm not sure it's actually used anywhere else any more.

Technically we infringed the patent by including LZW support in BioJava, but now the patent has expired we no longer need to worry.

Question is, do we need to fix this inherently computer-science problem which is entirely unrelated to biology or bioinformatics, or can we just get people to use an alternative library instead which supports it better and is more generic? They are out there, for instance:

http://www.chilkatsoft.com/java-zip.asp

cheers,
Richard

Andy Yates wrote:
> Seems very strange this does. I don't know much about decompression but
> by the looks of things LZW isn't supported by the JDK.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGG2jw4C5LeMEKA/QRAiCwAJ9vNlDX2zwG5paYHbaFv2gSQeblOQCdHaW4
CwgzY5S7KELC3TA1oKKtjUw=
=9xEM
-----END PGP SIGNATURE-----

From ayates at ebi.ac.uk Tue Apr 10 06:54:27 2007
From: ayates at ebi.ac.uk (Andy Yates)
Date: Tue, 10 Apr 2007 11:54:27 +0100
Subject: [Biojava-dev] Fwd: Bug in org/biojava/utils/io/UncompressInputStream.java
In-Reply-To: <461B68F0.4000908@ebi.ac.uk>
References: <2109dfc0704061116i1f0ddbe2ic25012143d2509af@mail.gmail.com> <2A8FFBC4-EC1A-4EB9-992C-DE9225A59578@sonsorol.org> <461B6240.2070907@ebi.ac.uk> <461B6443.9060300@ebi.ac.uk> <461B68F0.4000908@ebi.ac.uk>
Message-ID: <461B6CE3.303@ebi.ac.uk>

I guess it all depends really on what is the software that is producing these files.
If it is something very common to Bioinformatics we might have to accept that support needs to come in from somewhere; and by the looks of things the techniques for compression are quite varied (the man page for compress mentions things about adaptive dictionaries and the like).

From ap3 at sanger.ac.uk Tue Apr 10 06:27:39 2007
From: ap3 at sanger.ac.uk (Andreas Prlic)
Date: Tue, 10 Apr 2007 11:27:39 +0100
Subject: [Biojava-dev] Fwd: Bug in org/biojava/utils/io/UncompressInputStream.java
In-Reply-To:
References:
Message-ID:

Hi!

I committed this class a while ago, since I did not find any other way to read .Z compressed files.

Unfortunately PDB files are often stored like that ...

If anybody has a suggestion how to read unix compressed files (.Z) in a better way, I would be glad to hear.

Parsing them as Zip or GZip did not work in my trials...

Andreas

-----------------------------------------------------------------------

Andreas Prlic
Wellcome Trust Sanger Institute
Hinxton, Cambridge CB10 1SA, UK
+44 (0) 1223 49 6891

From holland at ebi.ac.uk Tue Apr 10 07:01:36 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Tue, 10 Apr 2007 12:01:36 +0100
Subject: [Biojava-dev] Fwd: Bug in org/biojava/utils/io/UncompressInputStream.java
In-Reply-To:
References:
Message-ID: <461B6E90.3090501@ebi.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Andreas - did you write the class? If so, then you may understand it better than the rest of us. Would you be willing to attempt to fix it?

cheers,
Richard

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGG26Q4C5LeMEKA/QRAiMxAJ4u4RUjTGODjClIM1LIRzP12xNUOgCgifA+
14CbPaY5SwcG1/wUHJVpl/U=
=wBDT
-----END PGP SIGNATURE-----

From markjschreiber at gmail.com Tue Apr 10 07:29:03 2007
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Tue, 10 Apr 2007 19:29:03 +0800
Subject: [Biojava-dev] Fwd: Bug in org/biojava/utils/io/UncompressInputStream.java
In-Reply-To: <461B60FC.7040903@ebi.ac.uk>
References: <461B51F4.8010402@ebi.ac.uk> <461B5F69.9060506@ebi.ac.uk> <461B60FC.7040903@ebi.ac.uk>
Message-ID: <93b45ca50704100429u388f5b8ax86ba05e5d05e02a9@mail.gmail.com>

Without looking at the code I would guess that dropping 88 bytes could be because of a buffered reader or writer not flushing before it is closed??
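
A minimal sketch of one way to test the read-block-size theory with JDK classes only: drain a stream using a chosen block size and return its MD5 as hex, so the same input can be digested at several block sizes and checked against the md5sum of the reference output. The names StreamDigest and md5Hex are hypothetical, made up for illustration, and are not part of BioJava; wrapping the .Z file in the decompressing stream under test before passing it in is left to the caller.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration only; not a BioJava class.
final class StreamDigest {

    private StreamDigest() {
    }

    /**
     * Reads the stream to the end in chunks of at most blockSize bytes
     * and returns the MD5 of everything read, as lowercase hex.
     */
    static String md5Hex(InputStream in, int blockSize)
            throws IOException, NoSuchAlgorithmException {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] buf = new byte[blockSize];
        int n;
        // read() may return fewer bytes than requested; only -1 means
        // end of stream, so keep going until then.
        while ((n = in.read(buf, 0, buf.length)) != -1) {
            md.update(buf, 0, n);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }
}
```

If the digest of the same decompressing stream changes with the block size, that would suggest the loss is tied to how partial reads are handled inside the stream rather than to the caller's writer; if it is the same wrong digest at every block size, the flushing guess looks less likely too, since no writer is involved here.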
- Mark

From bugzilla-daemon at portal.open-bio.org Tue Apr 10 07:58:59 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 10 Apr 2007 07:58:59 -0400
Subject: [Biojava-dev] [Bug 2261] Request for enhancement of RichSequence
In-Reply-To:
Message-ID: <200704101158.l3ABwxRV028563@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2261

holland at ebi.ac.uk changed:

What |Removed |Added
---------------------------------------------------------------------------- CC| |holland at ebi.ac.uk ------- Comment #1 from holland at ebi.ac.uk 2007-04-10 07:58 EST ------- I think the only reason it doesn't already do so is because when I wrote RichSequence, RichFeatureHolder hadn't been invented yet. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Apr 10 08:15:28 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 10 Apr 2007 08:15:28 -0400 Subject: [Biojava-dev] [Bug 2261] Request for enhancement of RichSequence In-Reply-To: Message-ID: <200704101215.l3ACFSYY029387@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2261 ------- Comment #2 from holland at ebi.ac.uk 2007-04-10 08:15 EST ------- Just looked at this a bit closer and found that only Features can hold other Features - RichFeatureHolder represents the FeatureRelationship portion of BioSQL. FeatureRelationships exist between features, and not between sequences and features. Maybe RichFeatureHolder is therefore a bit of a misnomer? Maybe it should be FeatureRelationshipHolder, or something like that? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From ap3 at sanger.ac.uk Tue Apr 10 09:02:17 2007 From: ap3 at sanger.ac.uk (Andreas Prlic) Date: Tue, 10 Apr 2007 14:02:17 +0100 Subject: [Biojava-dev] Fwd: Bug in org/biojava/utils/io/UncompressInputStream.java In-Reply-To: <461B6E90.3090501@ebi.ac.uk> References: <461B6E90.3090501@ebi.ac.uk> Message-ID: > Andreas - did you write the class? If so, then you may understand it > better than the rest of us. Would you be willing to attempt to fix it? 
No, I did not write it. It is an LGPL class which I found in another
project; see http://www.innovation.ch/java/HTTPClient/ or the header in
the file.

I will try to have a look at this problem, but I am not sure if I can fix
it quickly.

PDB data is still available for download as .Z files, e.g.
ftp://ftp.rcsb.org/pub/pdb/data/structures/divided/pdb/ar/
which is why I need some tools for reading these. I agree this is a
general problem and the solution does not necessarily have to be part of
BioJava.

I don't think any patent got infringed, since the file was committed
after the relevant patents had expired.

Andreas

-----------------------------------------------------------------------

Andreas Prlic      Wellcome Trust Sanger Institute
Hinxton, Cambridge CB10 1SA, UK
+44 (0) 1223 49 6891

From bugzilla-daemon at portal.open-bio.org Wed Apr 11 01:12:29 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 11 Apr 2007 01:12:29 -0400
Subject: [Biojava-dev] [Bug 2261] Request for enhancement of RichSequence
In-Reply-To:
Message-ID: <200704110512.l3B5CT9v008783@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2261

------- Comment #3 from mark.schreiber at novartis.com 2007-04-11 01:12 EST -------
(In reply to comment #2)
> Just looked at this a bit closer and found that only Features can hold other
> Features - RichFeatureHolder represents the FeatureRelationship portion of
> BioSQL. FeatureRelationships exist between features, and not between sequences
> and features. Maybe RichFeatureHolder is therefore a bit of a misnomer? Maybe
> it should be FeatureRelationshipHolder, or something like that?

I would agree with the proposal to rename it. It would save a lot of
confusion. It should be a pretty simple task to refactor it, but it would
need to happen before a release version.
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From mark.schreiber at novartis.com Wed Apr 11 23:19:18 2007
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Thu, 12 Apr 2007 11:19:18 +0800
Subject: [Biojava-dev] javacc
Message-ID:

Hello -

Has anyone ever written a javacc lexer / parser for Genbank (or any of
the other major formats)?

- Mark

Mark Schreiber
Research Investigator (Bioinformatics)

Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com

phone +65 6722 2973
fax +65 6722 2910

From bugzilla-daemon at portal.open-bio.org Fri Apr 13 01:08:33 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 13 Apr 2007 01:08:33 -0400
Subject: [Biojava-dev] [Bug 2273] New: More problems writing uniprot files
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2273

Summary: More problems writing uniprot files
Product: BioJava
Version: live (CVS source)
Platform: PC
OS/Version: Windows XP
Status: NEW
Severity: normal
Priority: P2
Component: seq.io
AssignedTo: biojava-dev at biojava.org
ReportedBy: gwaldon at geneinfinity.org

I found a few problems when writing uniprot files. Using P04941 as a test
example:

1. The ID line does not appear with a fixed format (this is probably not
actually a bug):
(before/after - read/write)
ID KV6A7_MOUSE Reviewed; 107 AA.
ID KV6A7_MOUSE Reviewed; 107 AA.

2. The reference title gets truncated at the end by one character after
each read/write operation:
RT phenyloxazolone and its early diversification.";
RT phenyloxazolone and its early diversification";
RT phenyloxazolone and its early diversificatio";
...

3. The FT line is not formatted correctly; this is a bug because the FT
line has a fixed format, and the I of Ig should be at position 35:
(before/after - read/write)
FT CHAIN 1 >107 Ig kappa chain V-VI region NQ2-48.2.2.
FT CHAIN 1 107> Ig kappa chain V-VI region NQ2-48.2.2.

4. SQ line: are these exactly the same CRC64 number?
SQ SEQUENCE 107 AA; 11557 MW; 72488DA9EF354934 CRC64;
SQ SEQUENCE 107 AA; 11564 MW; ffffffffe278ca323958dd50 CRC64;

- George

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Tue Apr 17 08:08:29 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 17 Apr 2007 08:08:29 -0400
Subject: [Biojava-dev] [Bug 2273] More problems writing uniprot files
In-Reply-To:
Message-ID: <200704171208.l3HC8T2G004508@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2273

holland at ebi.ac.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #1 from holland at ebi.ac.uk 2007-04-17 08:08 EST -------
I have fixed points 1-3. Point 4 I have raised as a new bug for someone
else to fix - the problem goes deeper than just UniProtFormat!
Can you check the code I have committed in CVS and update this bug accordingly with what you find. I have not written a unit test as I'm very busy at present and don't have the time. If you could add one in that would be great. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Apr 17 08:11:11 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 17 Apr 2007 08:11:11 -0400 Subject: [Biojava-dev] [Bug 2274] New: CRC64 checksum toString() returning incorrect values Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2274 Summary: CRC64 checksum toString() returning incorrect values Product: BioJava Version: live (CVS source) Platform: All OS/Version: All Status: NEW Severity: normal Priority: P2 Component: Others AssignedTo: biojava-dev at biojava.org ReportedBy: holland at ebi.ac.uk In org.biojavax.utils.CRC64Checksum the toString() method returns 24-character strings, when CRC64 checksums are only 16-character. Also need to check that the correct polynomials etc. are being used. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
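One plausible way to end up with a 24-character string like the one reported in Bug 2273 is to keep the checksum as two 32-bit halves and let the high half be sign-extended when it is converted to hex. The sketch below only illustrates that pitfall and a fixed-width alternative; it is an assumption about the kind of mistake involved, not the actual CRC64Checksum code or the fix that was committed, and the class and method names are hypothetical. It does not address the separate question of which polynomial is used.

```java
// Hypothetical demo class; not org.biojavax.utils.CRC64Checksum.
final class Crc64HexSketch {

    /**
     * Error-prone formatting: the int 'high' is widened to long with sign
     * extension, so a negative value contributes 16 hex digits, not 8.
     */
    static String signExtendedStyle(int high, int low) {
        return Long.toHexString(high) + Integer.toHexString(low);
    }

    /**
     * Combines the two halves into one 64-bit value and always prints
     * exactly 16 upper-case hex digits, including leading zeros.
     */
    static String fixedWidth(int high, int low) {
        long value = ((long) high << 32) | (low & 0xFFFFFFFFL);
        return String.format("%016X", value);
    }

    public static void main(String[] args) {
        int high = 0xe278ca32; // negative as a Java int
        int low = 0x3958dd50;
        System.out.println(signExtendedStyle(high, low)); // 24 characters
        System.out.println(fixedWidth(high, low));        // 16 characters
    }
}
```

Masking the low half with 0xFFFFFFFFL matters for the same reason: without it, a negative low word would overwrite the high word with ones when the two are OR-ed together.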
From bugzilla-daemon at portal.open-bio.org Tue Apr 17 08:19:04 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 17 Apr 2007 08:19:04 -0400 Subject: [Biojava-dev] [Bug 2274] CRC64 checksum toString() returning incorrect values In-Reply-To: Message-ID: <200704171219.l3HCJ4op005457@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2274 holland at ebi.ac.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #1 from holland at ebi.ac.uk 2007-04-17 08:19 EST ------- Fixed. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Apr 17 08:19:15 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 17 Apr 2007 08:19:15 -0400 Subject: [Biojava-dev] [Bug 2274] CRC64 checksum toString() returning incorrect values In-Reply-To: Message-ID: <200704171219.l3HCJFwu005500@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2274 holland at ebi.ac.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|RESOLVED |CLOSED -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Tue Apr 17 08:25:50 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 17 Apr 2007 08:25:50 -0400
Subject: [Biojava-dev] [Bug 2261] Request for enhancement of RichSequence
In-Reply-To:
Message-ID: <200704171225.l3HCPo16006114@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2261

holland at ebi.ac.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #4 from holland at ebi.ac.uk 2007-04-17 08:25 EST -------
Done. Renamed to RichFeatureRelationshipHolder and removed reference to
RichFeatureHolder as it is technically no such thing.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From jburdick at keyfitz.org Thu Apr 19 14:26:55 2007
From: jburdick at keyfitz.org (Josh Burdick)
Date: Thu, 19 Apr 2007 14:26:55 -0400
Subject: [Biojava-dev] reading a subsequence from a .nib file
In-Reply-To:
References:
Message-ID: <1177007215.5481.4.camel@localhost.localdomain>

On Tue, 2007-04-03 at 09:03 +0800, mark.schreiber at novartis.com wrote:
> Hi -
>
> Too my knowledge nothing like this exists in BioJava. Could someone take
> it the last mile and make it produce SymbolLists?
>

I went ahead and added a method getSymbolListByLocation() which takes
the string and converts it to a SymbolList using DNATools. There are
bound to be more efficient ways to do this, but I think this is a
reasonable start.

The files are in the same locations:

http://www.keyfitz.org/jburdick/read_nib_file_java/NibFile.java
http://www.keyfitz.org/jburdick/read_nib_file_java/NibFileTest.java

Hopefully someone will find this code useful.
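For readers who have not seen the four-bits-per-base packing mentioned earlier in this thread, here is a rough sketch of the "packed bytes to string" step. It is not the NibFile.java linked above. It assumes the encoding as commonly described for .nib files (0=T, 1=C, 2=A, 3=G, 4=N, the high bit of each nibble marking a masked base, and the first base of each pair stored in the high nibble), and it ignores the file header entirely. The class and method names are made up for illustration.

```java
// Hypothetical demo class; the nibble codes below are an assumption based
// on the usual description of the .nib format, not taken from NibFile.java.
final class NibDecodeSketch {

    private static final char[] BASES = {'T', 'C', 'A', 'G', 'N'};

    /**
     * Decodes 'length' bases starting at base index 'start' from a packed
     * array holding two bases per byte (first base in the high nibble).
     * Masked bases are returned in lower case.
     */
    static String decode(byte[] packed, int start, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = start; i < start + length; i++) {
            int b = packed[i / 2] & 0xFF;
            int nibble = (i % 2 == 0) ? (b >>> 4) : (b & 0x0F);
            int code = nibble & 0x07;                       // base code
            char base = code < BASES.length ? BASES[code] : 'N';
            boolean masked = (nibble & 0x08) != 0;          // repeat-mask bit
            sb.append(masked ? Character.toLowerCase(base) : base);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] packed = {0x01, 0x23, 0x4A};
        System.out.println(decode(packed, 0, 6));
    }
}
```

A string produced this way is the kind of input that a call such as DNATools.createDNA(String) would then turn into a SymbolList; building the SymbolList directly from the packed values would avoid the intermediate string.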
Josh

> - Mark
>
> Mark Schreiber
> Research Investigator (Bioinformatics)
>
> Novartis Institute for Tropical Diseases (NITD)
> 10 Biopolis Road
> #05-01 Chromos
> Singapore 138670
> www.nitd.novartis.com
>
> phone +65 6722 2973
> fax +65 6722 2910
>
[...]

From gwaldon at geneinfinity.org Thu Apr 19 17:05:09 2007
From: gwaldon at geneinfinity.org (george waldon)
Date: Thu, 19 Apr 2007 14:05:09 -0700
Subject: [Biojava-dev] no need for LENGTH_TYPE_TERM
Message-ID: <20070419210509.75384.qmail@mmm1924.dulles19-verio.com>

LENGTH_TYPE_TERM is a RichSequence term that is used to distinguish
between "aa" and "bp" during the write operation in uniprot format and
in genbank format.

This code is error-prone. For instance, converting a protein sequence
from a fasta file to a genbank formatted file still writes "bp". To avoid
that, the sequence annotation for this term would have to be generated
during the enrichment of the sequence.

I don't think the extra work is really necessary. Is there any objection
to my removing this term and relying instead on the alphabet (either
PROTEIN or PROTEIN_TERM) during the writing operations?

Thanks,
George

From mark.schreiber at novartis.com Thu Apr 19 22:44:51 2007
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Fri, 20 Apr 2007 10:44:51 +0800
Subject: [Biojava-dev] reading a subsequence from a .nib file
Message-ID:

Hi Josh -

Looks good. Just one thing: your JUnit test contains a hardcoded file
path to the test file, which means it is not portable. Could you modify
that so that it loads the file from the classpath as a resource (see some
of the IO unit tests for examples)? Can you also provide the test file?
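As a rough illustration of the "load it from the classpath as a resource" suggestion, here is a minimal sketch using only the JDK. The class name, method name and resource path are hypothetical and not taken from NibFileTest.java or the BioJava IO tests. The idea is to ask the class loader for the file instead of opening an absolute path, and to fail with a clear message if the fixture is missing.

```java
import java.io.InputStream;

// Hypothetical helper; names and paths are illustrative only.
final class TestResourceSketch {

    /**
     * Opens a test fixture from the classpath. A leading '/' makes the
     * name absolute; otherwise it is resolved relative to this class's
     * package. Class.getResourceAsStream returns null when nothing is found.
     */
    static InputStream openTestResource(String name) {
        InputStream in = TestResourceSketch.class.getResourceAsStream(name);
        if (in == null) {
            throw new IllegalArgumentException(
                    "test resource not found on classpath: " + name);
        }
        return in;
    }
}
```

In a test this would be called with something like openTestResource("/files/test.nib") (a made-up path), with the fixture checked in next to the other test files so that the test runs on any machine.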
Best regards, - Mark Mark Schreiber Research Investigator (Bioinformatics) Novartis Institute for Tropical Diseases (NITD) 10 Biopolis Road #05-01 Chromos Singapore 138670 www.nitd.novartis.com phone +65 6722 2973 fax +65 6722 2910 Josh Burdick 04/20/2007 02:26 AM To: mark.schreiber at novartis.com cc: biojava-dev at lists.open-bio.org Subject: Re: [Biojava-dev] reading a subsequence from a .nib file On Tue, 2007-04-03 at 09:03 +0800, mark.schreiber at novartis.com wrote: > Hi - > > Too my knowledge nothing like this exists in BioJava. Could someone take > it the last mile and make it produce SymbolLists? > I went ahead and added a method getSymbolListByLocation() which takes the string and converts it to a SymbolList using DNATools. There are bound to be more efficient ways to do this, but I think this a reasonable start. The files are in the same locations: http://www.keyfitz.org/jburdick/read_nib_file_java/NibFile.java http://www.keyfitz.org/jburdick/read_nib_file_java/NibFileTest.java Hopefully someone will find this code useful. Josh > - Mark > > Mark Schreiber > Research Investigator (Bioinformatics) > > Novartis Institute for Tropical Diseases (NITD) > 10 Biopolis Road > #05-01 Chromos > Singapore 138670 > www.nitd.novartis.com > > phone +65 6722 2973 > fax +65 6722 2910 > [...] From mark.schreiber at novartis.com Thu Apr 19 22:46:56 2007 From: mark.schreiber at novartis.com (mark.schreiber at novartis.com) Date: Fri, 20 Apr 2007 10:46:56 +0800 Subject: [Biojava-dev] no need for LENGTH_TYPE_TERM Message-ID: I think this sounds sensible. Having another term for something that can be derived from the alphabet is redundant. 
- Mark Mark Schreiber Research Investigator (Bioinformatics) Novartis Institute for Tropical Diseases (NITD) 10 Biopolis Road #05-01 Chromos Singapore 138670 www.nitd.novartis.com phone +65 6722 2973 fax +65 6722 2910 "george waldon" Sent by: biojava-dev-bounces at lists.open-bio.org 04/20/2007 05:05 AM Please respond to george waldon To: biojava-dev at biojava.org cc: (bcc: Mark Schreiber/GP/Novartis) Subject: [Biojava-dev] no need for LENGTH_TYPE_TERM LENGTH_TYPE_TERM is a RichSequence term that is used to distinguish between "aa" and "bp" during the write operation in uniprot format and in genbank format. This code is error-prone. For instance, converting a protein sequence from a fasta file to a genbank formatted file still write "bp". Indeed, the sequence annotation for this term should be generated during the enrichment of sequence. I don't think the extra-work is really necessary. Is-there any objection that I remove this term and rely instead on the alphabet (either PROTEIN or PROTEIN_TERM) during the writing operations? Thanks, George _______________________________________________ biojava-dev mailing list biojava-dev at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biojava-dev From holland at ebi.ac.uk Fri Apr 20 04:29:18 2007 From: holland at ebi.ac.uk (Richard Holland) Date: Fri, 20 Apr 2007 09:29:18 +0100 Subject: [Biojava-dev] no need for LENGTH_TYPE_TERM In-Reply-To: <20070419210509.75384.qmail@mmm1924.dulles19-verio.com> References: <20070419210509.75384.qmail@mmm1924.dulles19-verio.com> Message-ID: <462879DE.5000507@ebi.ac.uk> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 No objections. I think your logic is better than mine here. Please go ahead. cheers, Richard george waldon wrote: > LENGTH_TYPE_TERM is a RichSequence term that is used to distinguish between "aa" and "bp" during the write operation in uniprot format and in genbank format. > > This code is error-prone. 
For instance, converting a protein sequence from a fasta file to a genbank formatted file still write "bp". Indeed, the sequence annotation for this term should be generated during the enrichment of sequence. > > I don't think the extra-work is really necessary. Is-there any objection that I remove this term and rely instead on the alphabet (either PROTEIN or PROTEIN_TERM) during the writing operations? > > Thanks, > George > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFGKHne4C5LeMEKA/QRAsD6AKCN4Nj7LMk3fCjAcrfE1Lw+Se3FJQCdH142 Sz6DYxYj1HedeHPZJpejtQs= =9HDc -----END PGP SIGNATURE----- From jburdick at keyfitz.org Wed Apr 11 13:42:18 2007 From: jburdick at keyfitz.org (Josh Burdick) Date: Wed, 11 Apr 2007 17:42:18 -0000 Subject: [Biojava-dev] reading a subsequence from a .nib file In-Reply-To: References: Message-ID: <1176312083.21937.42.camel@localhost.localdomain> On Tue, 2007-04-03 at 09:03 +0800, mark.schreiber at novartis.com wrote: > Hi - > > Too my knowledge nothing like this exists in BioJava. Could someone take > it the last mile and make it produce SymbolLists? > I added a method that just takes the string and makes it into a SymbolList using DNATools. This is somewhat inefficient (you can make a SymbolList directly as an array of numbers, but I wasn't certain enough that I understood it to try that.) The package name should be changed, and the test code should probably do somewhat more, but other than that, if someone wants to add it, feel free. (The two files are at the same location as before.) 
Josh

> - Mark
>
> Mark Schreiber
> Research Investigator (Bioinformatics)
>
> Novartis Institute for Tropical Diseases (NITD)
> 10 Biopolis Road
> #05-01 Chromos
> Singapore 138670
> www.nitd.novartis.com
>
> phone +65 6722 2973
> fax +65 6722 2910
>

From mark.schreiber at novartis.com Tue Apr 3 01:06:11 2007
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Tue, 3 Apr 2007 09:06:11 +0800
Subject: [Biojava-dev] Isoelectric point calculation
Message-ID:

Hi George -

This probably should be reported as a bug in bugzilla to make sure we get
around to fixing it.
Thanks,

- Mark

"george waldon"
Sent by: biojava-dev-bounces at lists.open-bio.org
04/03/2007 07:56 AM
Please respond to george waldon

To: biojava-dev at biojava.org
cc: smh1008 at cam.ac.uk, (bcc: Mark Schreiber/GP/Novartis)
Subject: [Biojava-dev] Isoelectric point calculation

Hi,

Trying to solve the problem of symbol ambiguity in pI calculation that
was brought to our attention on Biojava-1, I found a few problems, in
particular (!) calculated pI values are incorrect, BinarySearch throw
exceptions, and the ResidueProperties.xml has some strange values, such
as pK of Glu at -4.25.

The class IsoelectricPointCalc was written a long time ago and I hope to
get in touch with the original author and have a corrected code rapidly.

As a general rule, scientific biomethods and biodata put in biojava need
precise literature references. Javadocs are a good place for that.

- George

_______________________________________________
biojava-dev mailing list
biojava-dev at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biojava-dev

From mark.schreiber at novartis.com Tue Apr 3 01:09:10 2007
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Tue, 3 Apr 2007 09:09:10 +0800
Subject: [Biojava-dev] org.biojava.bio.symbol.UkkonenSuffixTree.class BUG
Message-ID:

Hi Caroline -

Could you post some example code that we could use to replicate the
problem? Thanks.

- Mark

Mark Schreiber
Research Investigator (Bioinformatics)

Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com

phone +65 6722 2973
fax +65 6722 2910

"Caroline Renaux"
Sent by: biojava-dev-bounces at lists.open-bio.org
03/26/2007 09:18 PM

To: biojava-dev at biojava.org
cc: (bcc: Mark Schreiber/GP/Novartis)
Subject: [Biojava-dev] org.biojava.bio.symbol.UkkonenSuffixTree.class BUG

Hello,

I recently used the org.biojava.bio.symbol package in a Java
application, in particular the UkkonenSuffixTree class. When I add a set
of sequences to the tree, I put a '$' separator character between the
sequences, but it doesn't work. I get a NullPointerException when the
second sequence is processed, in the method jumpTo at the line:

arrivedAt=(SuffixNode)currentNode.children.get(new Character(source.charAt(from)));

I don't understand what I might have done wrong.

Thank you in advance for your reply.

RENAUX C.

_______________________________________________
biojava-dev mailing list
biojava-dev at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biojava-dev

From bugzilla-daemon at portal.open-bio.org Tue Apr 3 04:03:53 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 3 Apr 2007 00:03:53 -0400
Subject: [Biojava-dev] [Bug 2244] uniprot files do not load
In-Reply-To:
Message-ID: <200704030403.l3343rrF032035@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2244

------- Comment #9 from gwaldon at geneinfinity.org 2007-04-03 00:03 EST -------
*** Bug 2249 has been marked as a duplicate of this bug. ***

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Tue Apr 3 07:12:48 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 3 Apr 2007 03:12:48 -0400 Subject: [Biojava-dev] [Bug 2258] New: ConcurrentModificationException in SimpleRichAnnotation Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2258 Summary: ConcurrentModificationException in SimpleRichAnnotation Product: BioJava Version: live (CVS source) Platform: PC OS/Version: Windows XP Status: NEW Severity: normal Priority: P2 Component: seq AssignedTo: biojava-dev at biojava.org ReportedBy: gwaldon at geneinfinity.org Exception thrown by the method clear(), apparently resulting of trying to change the note set while iterating over it. - George -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From holland at ebi.ac.uk Tue Apr 3 10:00:01 2007 From: holland at ebi.ac.uk (Richard Holland) Date: Tue, 03 Apr 2007 11:00:01 +0100 Subject: [Biojava-dev] JDBCPooledDataSource regression In-Reply-To: <45C090A6.10909@ebi.ac.uk> References: <416B41DF-91E1-4D1F-A4B4-799FE712B032@sanger.ac.uk> <45C0781B.7030304@ebi.ac.uk> <5DDBC6F5-7DC3-4446-A982-3CD9B3931A06@sanger.ac.uk> <45C08639.7080600@ebi.ac.uk> <8689C307-0643-46D9-90A6-A9958681D1D0@sanger.ac.uk> <45C08B7A.3060102@ebi.ac.uk> <45C08E4E.2050308@ebi.ac.uk> <45C090A6.10909@ebi.ac.uk> Message-ID: <461225A1.4000705@ebi.ac.uk> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Great stuff. You should commit it when you get your CVS account. :) There's one or two typos (can't spell deprecated!) but I'm sure once you get the rest of BioJava into Eclipse or something to make this change permanent they'll show up. cheers, Richard Andy Yates wrote: > Okay I've attached the fix here. > > I just did this in a text editor but I believe that the imports are > okay. 
If you can just do a quick scan as well to make sure I haven't > deleted anything that was very important. > > I'll get on to the helpdesk now as well :) > > Andy > > Richard Holland wrote: > Andy could you make the change to your local copy of the source file and > email the file to me, that way I can make sure I don't get it wrong when > I commit it. > > Richard. > > PS. You should probably have your own CVS account - email the OBF > helpdesk and ask for one, saying I told you to. :) > > > Andy Yates wrote: >>>> Thomas Down wrote: >>>>> On 31 Jan 2007, at 12:06, Andy Yates wrote: >>>>> >>>>>> Hi, >>>>>> >>>>>> Sorry I was meaning if that if that method just becomes: >>>>>> >>>>>> public static DataSource getDataSource(final String driver, >>>>>> final String url, >>>>>> final String user, >>>>>> final String pass) >>>>>> throws Exception { >>>>>> >>>>>> BasicDataSource ds = new BasicDataSource(); >>>>>> ds.setUrl(url); >>>>>> ds.setDriverClassName(driver); >>>>>> ds.setUsername(user); >>>>>> ds.setPassword(pass); >>>>>> // Set BasicDataSource properties such as maxActive and >>>>>> maxIdle, as described in >>>>>> // >>>>>> http://jakarta.apache.org/commons/dbcp/api/org/apache/commons/dbcp/BasicDataSource.html >>>>>> >>>>>> ds.setMaxActive(10); >>>>>> ds.setMaxIdle(5); >>>>>> ds.setMaxWait(10000); >>>>>> >>>>>> return ds; >>>>>> } >>>>>> >>>>>> Does that still work? >>>>> Hmmm, I was assuming that BasicDataSource didn't actually do any >>>>> pooling itself, and that you needed another layer on top to manage a >>>>> connection pool -- that seems to be how all previous revisions of >>>>> JDBCConnectionPool worked, so I guess I wasn't alone in thinking >>>>> this. But yes, BasicDataSource does seem to do pooling itself >>>>> (confirmed by reading the source), so maybe your simpler version is >>>>> a better idea. It certainly works okay for me. >>>>> >>>>> Thomas. >>>> That's what I thought should have happened :). 
Can I suggest that >>>> this revised version goes into CVS? Anyone got any objections? >>>> >>>> Andy >>>> _______________________________________________ >>>> biojava-dev mailing list >>>> biojava-dev at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev >>>> > ------------------------------------------------------------------------ > /* > * BioJava development code > * > * This code may be freely distributed and modified under the > * terms of the GNU Lesser General Public Licence. This should > * be distributed with the code. If you do not have a copy, > * see: > * > * http://www.gnu.org/copyleft/lesser.html > * > * Copyright for this code is held jointly by the individual > * authors. These should be listed in @author doc comments. > * > * For more information on the BioJava project and its aims, > * or to join the biojava-l mailing list, visit the home page > * at: > * > * http://www.biojava.org/ > * > */ > package org.biojava.utils; > import javax.sql.DataSource; > import org.apache.commons.dbcp.BasicDataSource; > import org.apache.commons.dbcp.PoolingDataSource; > import org.apache.commons.pool.ObjectPool; > /** > * Returns a DataSource that implements connection pooling > * > * Uses Jakarta Commons DBCP and Pool packages. 
> * See the description of the dbcp package at > * http://jakarta.apache.org/commons/dbcp/api/overview-summary.html#overview_description > * > * @author Simon Foote > * @author Len Trigg > */ > public class JDBCPooledDataSource { > public static DataSource getDataSource(final String driver, > final String url, > final String user, > final String pass) > throws Exception { > BasicDataSource ds = new BasicDataSource(); > ds.setUrl(url); > ds.setDriverClassName(driver); > ds.setUsername(user); > ds.setPassword(pass); > // Set BasicDataSource properties such as maxActive and maxIdle, as described in > // http://jakarta.apache.org/commons/dbcp/api/org/apache/commons/dbcp/BasicDataSource.html > ds.setMaxActive(10); > ds.setMaxIdle(5); > ds.setMaxWait(10000); > return ds; > } > // Adds simple equals and hashcode methods so that we can compare if > // two connections are to the same database. This will fail if the > // DataSource is redirected to another database etc (I doubt this is > // ever likely to be used).
> /** > * @depercated This is no longer used in favor of {@link BasicDataSource} > * from DBCP > */ > static class MyPoolingDataSource extends PoolingDataSource { > final String source; > public MyPoolingDataSource(ObjectPool connectionPool, String source) { > super(connectionPool); > this.source = source; > } > public boolean equals(Object o2) { > if ((o2 == null) || !(o2 instanceof MyPoolingDataSource)) { > return false; > } > MyPoolingDataSource b2 = (MyPoolingDataSource) o2; > return source.equals(b2.source); > } > public int hashCode() { > return source.hashCode(); > } > } > public static void main(String[] args) { > try { > DataSource ds1 = getDataSource("org.hsqldb.jdbcDriver", "jdbc:hsqldb:/tmp/hsqldb/biosql", "sa", ""); > DataSource ds2 = getDataSource("org.hsqldb.jdbcDriver", "jdbc:hsqldb:/tmp/hsqldb/biosql", "sa", ""); > System.err.println(ds1); > System.err.println(ds2); > System.err.println(ds1.equals(ds2)); > } catch (Exception e) { > e.printStackTrace(); > } > } > } > ------------------------------------------------------------------------ > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFGEiWh4C5LeMEKA/QRAgYbAJ4yoE6dsLuOOS8sg1wOCybV6rsNUwCeN0c8 oiFz/0yblV4P8a35RbU+nDM= =imiK -----END PGP SIGNATURE----- From bugzilla-daemon at portal.open-bio.org Tue Apr 3 10:05:22 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 3 Apr 2007 06:05:22 -0400 Subject: [Biojava-dev] [Bug 2038] test bug In-Reply-To: Message-ID: <200704031005.l33A5M9N015780@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2038 holland at ebi.ac.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED 
|RESOLVED Resolution| |INVALID ------- Comment #1 from holland at ebi.ac.uk 2007-04-03 06:05 EST ------- This has been lying around for ages, thought I'd tidy it up. :) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Apr 3 10:20:48 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 3 Apr 2007 06:20:48 -0400 Subject: [Biojava-dev] [Bug 2258] ConcurrentModificationException in SimpleRichAnnotation In-Reply-To: Message-ID: <200704031020.l33AKmpP016736@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2258 holland at ebi.ac.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #1 from holland at ebi.ac.uk 2007-04-03 06:20 EST ------- Fixed later today in CVS. Test also added. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Apr 3 10:23:54 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 3 Apr 2007 06:23:54 -0400 Subject: [Biojava-dev] [Bug 2107] LabelledSequenceRenderer In-Reply-To: Message-ID: <200704031023.l33ANsV1016903@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2107 holland at ebi.ac.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #4 from holland at ebi.ac.uk 2007-04-03 06:23 EST ------- I have committed Jolyon's changes (or will do so later today). These are untestable with a JUnit so no test has been added. 
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Apr 4 05:37:59 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 4 Apr 2007 01:37:59 -0400 Subject: [Biojava-dev] [Bug 2260] New: Bug in UkkonenSuffixTree Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2260 Summary: Bug in UkkonenSuffixTree Product: BioJava Version: live (CVS source) Platform: PC OS/Version: Linux Status: NEW Severity: minor Priority: P2 Component: symbol AssignedTo: biojava-dev at biojava.org ReportedBy: mark.schreiber at novartis.com

There is a bug in the UkkonenSuffixTree when one tries to add concatenated Strings which are delimited with a $. This doesn't seem to be a problem when Strings are added individually. A simple workaround is to add Strings individually. The following code causes the bug:

    import org.biojava.bio.symbol.UkkonenSuffixTree;

    public class Main {

        /** Creates a new instance of Main */
        public Main() {
        }

        /**
         * @param args the command line arguments
         */
        public static void main(String[] args) {
            // Two sequences joined by the '$' delimiter.
            String seqs = "atcgcgcgcgctcggcctgggggctcgcgct$acgggtggtggt";
            // Building the tree from the concatenated string triggers the bug.
            UkkonenSuffixTree suff = new UkkonenSuffixTree(seqs);
        }
    }

Someone with a better knowledge of suffix trees than I have would need to look at this...

Additionally, there are several places in the code where variables are declared and never used, declared globally when they need not be, or not declared final when they are never modified. There are also System.out.println() statements that should be messages in exceptions or errors. The code could do with a good clean up.

I am marking this as minor because there is a workaround. The simplest thing might be to disable the advertised feature of being able to deal with concatenated strings, as it seems it cannot and probably never has been able to.
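The workaround mentioned in this report (adding Strings individually) could be sketched roughly as follows. This is only an illustration under stated assumptions: the class SplitDelimitedSequences and the helper splitOnDollar are hypothetical names, not part of BioJava, and the commented-out tree calls assume that UkkonenSuffixTree has a no-argument constructor and an addSequence(String, String, boolean) method, which should be checked against the javadoc before use. The runnable part only splits the '$'-delimited string.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitDelimitedSequences {

    /** Splits a '$'-delimited string into its non-empty parts (hypothetical helper). */
    public static List<String> splitOnDollar(String seqs) {
        List<String> parts = new ArrayList<String>();
        // '$' is a regex metacharacter, so it has to be escaped for String.split.
        for (String part : seqs.split("\\$")) {
            if (!part.isEmpty()) {
                parts.add(part);
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        String seqs = "atcgcgcgcgctcggcctgggggctcgcgct$acgggtggtggt";
        List<String> parts = splitOnDollar(seqs);

        // Assumed BioJava usage, left commented out so the sketch compiles
        // without the library:
        // UkkonenSuffixTree suff = new UkkonenSuffixTree();
        for (int i = 0; i < parts.size(); i++) {
            // suff.addSequence(parts.get(i), "seq" + i, false);
            System.out.println("seq" + i + " = " + parts.get(i));
        }
    }
}
```

Splitting before insertion avoids the code path that handles the delimiter inside the tree, which is where the report says the failure appears.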
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Apr 4 05:46:02 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 4 Apr 2007 01:46:02 -0400 Subject: [Biojava-dev] [Bug 2261] New: Request for enhancement of RichSequence Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2261 Summary: Request for enhancement of RichSequence Product: BioJava Version: live (CVS source) Platform: PC OS/Version: Linux Status: NEW Severity: enhancement Priority: P2 Component: seq AssignedTo: biojava-dev at biojava.org ReportedBy: mark.schreiber at novartis.com RichSequence implements FeatureHolder, but it would be nice if it could implement RichFeatureHolder. This would require the addition of four methods to any implementations of the RichSequence interface but would avoid endless casting. If we do it before an official release of bj1.5 we won't strictly be breaking the interface. Is there any reason why we should not do this? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From dag at sonsorol.org Sat Apr 7 01:52:01 2007 From: dag at sonsorol.org (Chris Dagdigian) Date: Fri, 6 Apr 2007 21:52:01 -0400 Subject: [Biojava-dev] Fwd: Bug in org/biojava/utils/io/UncompressInputStream.java References: <2109dfc0704061116i1f0ddbe2ic25012143d2509af@mail.gmail.com> Message-ID: <2A8FFBC4-EC1A-4EB9-992C-DE9225A59578@sonsorol.org> Passing on this email that came to me ...
Regards, Chris Dagdigian OBF Begin forwarded message: > From: "Miguel Duarte" > Date: April 6, 2007 2:16:52 PM EDT > To: dag at sonsorol.org > Subject: Bug in org/biojava/utils/io/UncompressInputStream.java > > Hi Chris, > >> From http://sourceforge.net/project/shownotes.php? >> release_id=314770&group_id=18598, > i've learned that you're maintaining the class > org/biojava/utils/io/UncompressInputStream.java. If that's not the > case please forward this mail to the maintainer. > > I've discovered a nasty bug: With some read block sizes the algorithm > truncates a few bytes from the end of the stream. I've verified this > comparing the gzip/uncompress output for some files versus what > org/biojava/utils/io/UncompressInputStream.java generates. > > Unfortunately i've not discovered the bug yet, but i can contribute > with the attached test case. How to verify the bug: > uncompress BH_03834.MCR.Z with gzip and with UncompressInputStream and > compare the results. > > Thanks, > Miguel Duarte -------------- next part -------------- A non-text attachment was scrubbed... Name: BH_03834.MCR.Z Type: application/x-compress Size: 26405 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: uncompressed_by_gzip Type: application/octet-stream Size: 81920 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: uncompressed_by_uncompressInputStream Type: application/octet-stream Size: 81832 bytes Desc: not available URL: -------------- next part -------------- From mark.schreiber at novartis.com Mon Apr 9 02:13:12 2007 From: mark.schreiber at novartis.com (mark.schreiber at novartis.com) Date: Mon, 9 Apr 2007 10:13:12 +0800 Subject: [Biojava-dev] Fwd: Bug in org/biojava/utils/io/UncompressInputStream.java Message-ID: Does anyone maintain this class?? More to the point, does anyone know what it is for??? 
If I look at the Uses link in javadoc there are apparently none at the public or package level. Additionally, why does BioJava need one? Are there not java.io classes that can handle compressed streams? Is there a good reason why we cannot just clean it out? - Mark Mark Schreiber Research Investigator (Bioinformatics) Novartis Institute for Tropical Diseases (NITD) 10 Biopolis Road #05-01 Chromos Singapore 138670 www.nitd.novartis.com phone +65 6722 2973 fax +65 6722 2910 Chris Dagdigian Sent by: biojava-dev-bounces at lists.open-bio.org 04/07/2007 09:52 AM To: biojava-dev at biojava.org cc: (bcc: Mark Schreiber/GP/Novartis) Subject: [Biojava-dev] Fwd: Bug in org/biojava/utils/io/UncompressInputStream.java Passing on this email that came to me ... Regards, Chris Dagdigian OBF Begin forwarded message: > From: "Miguel Duarte" > Date: April 6, 2007 2:16:52 PM EDT > To: dag at sonsorol.org > Subject: Bug in org/biojava/utils/io/UncompressInputStream.java > > Hi Chris, > >> From http://sourceforge.net/project/shownotes.php? >> release_id=314770&group_id=18598, > i've learned that you're maintaining the class > org/biojava/utils/io/UncompressInputStream.java. If that's not the > case please forward this mail to the maintainer. > > I've discovered a nasty bug: With some read block sizes the algorithm > truncates a few bytes from the end of the stream. I've verified this > comparing the gzip/uncompress output for some files versus what > org/biojava/utils/io/UncompressInputStream.java generates. > > Unfortunately i've not discovered the bug yet, but i can contribute > with the attached test case. How to verify the bug: > uncompress BH_03834.MCR.Z with gzip and with UncompressInputStream and > compare the results.
> > Thanks, > Miguel Duarte _______________________________________________ biojava-dev mailing list biojava-dev at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biojava-dev [ Attachment ''BH_03834.MCR.Z'' removed by Mark Schreiber ] [ Attachment ''UNCOMPRESSED_BY_GZIP'' removed by Mark Schreiber ] [ Attachment ''UNCOMPRESSED_BY_UNCOMPRESSINPUTSTREAM'' removed by Mark Schreiber ] From holland at ebi.ac.uk Tue Apr 10 08:59:32 2007 From: holland at ebi.ac.uk (Richard Holland) Date: Tue, 10 Apr 2007 09:59:32 +0100 Subject: [Biojava-dev] Fwd: Bug in org/biojava/utils/io/UncompressInputStream.java In-Reply-To: References: Message-ID: <461B51F4.8010402@ebi.ac.uk> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 I have no idea what it is for. There are generic Java classes provided with the SDK that do the same job. I think we should probably drop it. Lets wait to see if anyone shouts first. mark.schreiber at novartis.com wrote: > Does anyone maintain this class?? > > More to the point, does anyone know what it is for??? If I look at the > Uses link in javadoc there are aparently none at the public or package > level. Additionally why does biojava need one, are there not java.io > classes that can handle compressed streams?? > > Is there a good reason why we cannot just clean it out? > > - Mark > > Mark Schreiber > Research Investigator (Bioinformatics) > > Novartis Institute for Tropical Diseases (NITD) > 10 Biopolis Road > #05-01 Chromos > Singapore 138670 > www.nitd.novartis.com > > phone +65 6722 2973 > fax +65 6722 2910 > > > > > > Chris Dagdigian > Sent by: biojava-dev-bounces at lists.open-bio.org > 04/07/2007 09:52 AM > > > To: biojava-dev at biojava.org > cc: (bcc: Mark Schreiber/GP/Novartis) > Subject: [Biojava-dev] Fwd: Bug in org/biojava/utils/io/UncompressInputStream.java > > > > Passing on this email that came to me ... 
> > Regards, > Chris Dagdigian > OBF > > > Begin forwarded message: > >> From: "Miguel Duarte" >> Date: April 6, 2007 2:16:52 PM EDT >> To: dag at sonsorol.org >> Subject: Bug in org/biojava/utils/io/UncompressInputStream.java >> >> Hi Chris, >> >>> From http://sourceforge.net/project/shownotes.php? >>> release_id=314770&group_id=18598, >> i've learned that you're maintaining the class >> org/biojava/utils/io/UncompressInputStream.java. If that's not the >> case please forward this mail to the maintainer. >> >> I've discovered a nasty bug: With some read block sizes the algorithm >> truncates a few bytes from the end of the stream. I've verified this >> comparing the gzip/uncompress output for some files versus what >> org/biojava/utils/io/UncompressInputStream.java generates. >> >> Unfortunately i've not discovered the bug yet, but i can contribute >> with the attached test case. How to verify the bug: >> uncompress BH_03834.MCR.Z with gzip and with UncompressInputStream and >> compare the results. 
>> >> Thanks, >> Miguel Duarte > > > > > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > > [ Attachment ''BH_03834.MCR.Z'' removed by Mark Schreiber ] > [ Attachment ''UNCOMPRESSED_BY_GZIP'' removed by Mark Schreiber ] > [ Attachment ''UNCOMPRESSED_BY_UNCOMPRESSINPUTSTREAM'' removed by Mark > Schreiber ] > > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFGG1Hz4C5LeMEKA/QRAvTuAJ9F1AClFCV4WwBNP170mbC2+6JVDgCfVB17 HoCuWrx5k2ONg/9oxIfVVPI= =cGTy -----END PGP SIGNATURE----- From ayates at ebi.ac.uk Tue Apr 10 09:56:57 2007 From: ayates at ebi.ac.uk (Andy Yates) Date: Tue, 10 Apr 2007 10:56:57 +0100 Subject: [Biojava-dev] Fwd: Bug in org/biojava/utils/io/UncompressInputStream.java In-Reply-To: <461B51F4.8010402@ebi.ac.uk> References: <461B51F4.8010402@ebi.ac.uk> Message-ID: <461B5F69.9060506@ebi.ac.uk> I don't think there are standard classes for this compression format in the SDK. There are ones for GZIP & ZIP but not for LZW which this one is dealing with. Also I'm not sure about using GZIP to unzip a file compressed with LZW since GZIP uses DEFLATE. We need to decompress the file using uncompress (which is missing from my Linux box but is on the mac ... go figure) and then match that up to the output from UncompressInputStream & see if they agree or not. Andy Richard Holland wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > I have no idea what it is for. There are generic Java classes provided > with the SDK that do the same job. I think we should probably drop it. > Lets wait to see if anyone shouts first. 
> > mark.schreiber at novartis.com wrote: >> Does anyone maintain this class?? >> >> More to the point, does anyone know what it is for??? If I look at the >> Uses link in javadoc there are aparently none at the public or package >> level. Additionally why does biojava need one, are there not java.io >> classes that can handle compressed streams?? >> >> Is there a good reason why we cannot just clean it out? >> >> - Mark >> >> Mark Schreiber >> Research Investigator (Bioinformatics) >> >> Novartis Institute for Tropical Diseases (NITD) >> 10 Biopolis Road >> #05-01 Chromos >> Singapore 138670 >> www.nitd.novartis.com >> >> phone +65 6722 2973 >> fax +65 6722 2910 >> >> >> >> >> >> Chris Dagdigian >> Sent by: biojava-dev-bounces at lists.open-bio.org >> 04/07/2007 09:52 AM >> >> >> To: biojava-dev at biojava.org >> cc: (bcc: Mark Schreiber/GP/Novartis) >> Subject: [Biojava-dev] Fwd: Bug in org/biojava/utils/io/UncompressInputStream.java >> >> >> >> Passing on this email that came to me ... >> >> Regards, >> Chris Dagdigian >> OBF >> >> >> Begin forwarded message: >> >>> From: "Miguel Duarte" >>> Date: April 6, 2007 2:16:52 PM EDT >>> To: dag at sonsorol.org >>> Subject: Bug in org/biojava/utils/io/UncompressInputStream.java >>> >>> Hi Chris, >>> >>>> From http://sourceforge.net/project/shownotes.php? >>>> release_id=314770&group_id=18598, >>> i've learned that you're maintaining the class >>> org/biojava/utils/io/UncompressInputStream.java. If that's not the >>> case please forward this mail to the maintainer. >>> >>> I've discovered a nasty bug: With some read block sizes the algorithm >>> truncates a few bytes from the end of the stream. I've verified this >>> comparing the gzip/uncompress output for some files versus what >>> org/biojava/utils/io/UncompressInputStream.java generates. >>> >>> Unfortunately i've not discovered the bug yet, but i can contribute >>> with the attached test case. 
How to verify the bug: >>> uncompress BH_03834.MCR.Z with gzip and with UncompressInputStream and >>> compare the results. >>> >>> Thanks, >>> Miguel Duarte >> >> >> >> >> _______________________________________________ >> biojava-dev mailing list >> biojava-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-dev >> >> [ Attachment ''BH_03834.MCR.Z'' removed by Mark Schreiber ] >> [ Attachment ''UNCOMPRESSED_BY_GZIP'' removed by Mark Schreiber ] >> [ Attachment ''UNCOMPRESSED_BY_UNCOMPRESSINPUTSTREAM'' removed by Mark >> Schreiber ] >> >> >> _______________________________________________ >> biojava-dev mailing list >> biojava-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-dev >> > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.2.2 (GNU/Linux) > Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org > > iD8DBQFGG1Hz4C5LeMEKA/QRAvTuAJ9F1AClFCV4WwBNP170mbC2+6JVDgCfVB17 > HoCuWrx5k2ONg/9oxIfVVPI= > =cGTy > -----END PGP SIGNATURE----- > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev From ayates at ebi.ac.uk Tue Apr 10 10:03:40 2007 From: ayates at ebi.ac.uk (Andy Yates) Date: Tue, 10 Apr 2007 11:03:40 +0100 Subject: [Biojava-dev] Fwd: Bug in org/biojava/utils/io/UncompressInputStream.java In-Reply-To: <461B5F69.9060506@ebi.ac.uk> References: <461B51F4.8010402@ebi.ac.uk> <461B5F69.9060506@ebi.ac.uk> Message-ID: <461B60FC.7040903@ebi.ac.uk> Okay a quick run of uncompress on the mac with the files in question does produce a file which is equivalent to the file produced by gzip but not to the one produced by UncompressInputStream. The required md5sum for a pass should be (after a md5 digest): 9f0924237d20288793172091d61f85b8 uncompressed_by_gzip But we get: 17447efd34a245e430f20bc8d9b28a7b uncompressed_by_uncompressInputStream Okay so looks like there is something "wrong". 
Seems like it drops 88 bytes from the decompression. Wonder what happens if we pass this file type through the GZIPInputStream from the JDK? Andy Yates wrote: > I don't think there are standard classes for this compression format in > the SDK. There are ones for GZIP & ZIP but not for LZW which this one is > dealing with. Also I'm not sure about using GZIP to unzip a file > compressed with LZW since GZIP uses DEFLATE. > > We need to decompress the file using uncompress (which is missing from > my Linux box but is on the mac ... go figure) and then match that up to > the output from UncompressInputStream & see if they agree or not. > > Andy > > Richard Holland wrote: >> -----BEGIN PGP SIGNED MESSAGE----- >> Hash: SHA1 >> >> I have no idea what it is for. There are generic Java classes provided >> with the SDK that do the same job. I think we should probably drop it. >> Lets wait to see if anyone shouts first. >> >> mark.schreiber at novartis.com wrote: >>> Does anyone maintain this class?? >>> >>> More to the point, does anyone know what it is for??? If I look at the >>> Uses link in javadoc there are aparently none at the public or package >>> level. Additionally why does biojava need one, are there not java.io >>> classes that can handle compressed streams?? >>> >>> Is there a good reason why we cannot just clean it out? >>> >>> - Mark >>> >>> Mark Schreiber >>> Research Investigator (Bioinformatics) >>> >>> Novartis Institute for Tropical Diseases (NITD) >>> 10 Biopolis Road >>> #05-01 Chromos >>> Singapore 138670 >>> www.nitd.novartis.com >>> >>> phone +65 6722 2973 >>> fax +65 6722 2910 >>> >>> >>> >>> >>> >>> Chris Dagdigian >>> Sent by: biojava-dev-bounces at lists.open-bio.org >>> 04/07/2007 09:52 AM >>> >>> >>> To: biojava-dev at biojava.org >>> cc: (bcc: Mark Schreiber/GP/Novartis) >>> Subject: [Biojava-dev] Fwd: Bug in org/biojava/utils/io/UncompressInputStream.java >>> >>> >>> >>> Passing on this email that came to me ... 
>>> >>> Regards, >>> Chris Dagdigian >>> OBF >>> >>> >>> Begin forwarded message: >>> >>>> From: "Miguel Duarte" >>>> Date: April 6, 2007 2:16:52 PM EDT >>>> To: dag at sonsorol.org >>>> Subject: Bug in org/biojava/utils/io/UncompressInputStream.java >>>> >>>> Hi Chris, >>>> >>>>> From http://sourceforge.net/project/shownotes.php? >>>>> release_id=314770&group_id=18598, >>>> i've learned that you're maintaining the class >>>> org/biojava/utils/io/UncompressInputStream.java. If that's not the >>>> case please forward this mail to the maintainer. >>>> >>>> I've discovered a nasty bug: With some read block sizes the algorithm >>>> truncates a few bytes from the end of the stream. I've verified this >>>> comparing the gzip/uncompress output for some files versus what >>>> org/biojava/utils/io/UncompressInputStream.java generates. >>>> >>>> Unfortunately i've not discovered the bug yet, but i can contribute >>>> with the attached test case. How to verify the bug: >>>> uncompress BH_03834.MCR.Z with gzip and with UncompressInputStream and >>>> compare the results. 
>>>> >>>> Thanks, >>>> Miguel Duarte >>> >>> >>> >>> _______________________________________________ >>> biojava-dev mailing list >>> biojava-dev at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-dev >>> >>> [ Attachment ''BH_03834.MCR.Z'' removed by Mark Schreiber ] >>> [ Attachment ''UNCOMPRESSED_BY_GZIP'' removed by Mark Schreiber ] >>> [ Attachment ''UNCOMPRESSED_BY_UNCOMPRESSINPUTSTREAM'' removed by Mark >>> Schreiber ] >>> >>> >>> _______________________________________________ >>> biojava-dev mailing list >>> biojava-dev at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-dev >>> >> -----BEGIN PGP SIGNATURE----- >> Version: GnuPG v1.4.2.2 (GNU/Linux) >> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org >> >> iD8DBQFGG1Hz4C5LeMEKA/QRAvTuAJ9F1AClFCV4WwBNP170mbC2+6JVDgCfVB17 >> HoCuWrx5k2ONg/9oxIfVVPI= >> =cGTy >> -----END PGP SIGNATURE----- >> _______________________________________________ >> biojava-dev mailing list >> biojava-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-dev > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev From holland at ebi.ac.uk Tue Apr 10 10:09:04 2007 From: holland at ebi.ac.uk (Richard Holland) Date: Tue, 10 Apr 2007 11:09:04 +0100 Subject: [Biojava-dev] Fwd: Bug in org/biojava/utils/io/UncompressInputStream.java In-Reply-To: <2A8FFBC4-EC1A-4EB9-992C-DE9225A59578@sonsorol.org> References: <2109dfc0704061116i1f0ddbe2ic25012143d2509af@mail.gmail.com> <2A8FFBC4-EC1A-4EB9-992C-DE9225A59578@sonsorol.org> Message-ID: <461B6240.2070907@ebi.ac.uk> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 AFAIK the Zip algorithm is just LZW with bells on, so it should produce exactly the same results. Chris Dagdigian wrote: > > Passing on this email that came to me ... 
> > Regards, > Chris Dagdigian > OBF > > > Begin forwarded message: > >> From: "Miguel Duarte" >> Date: April 6, 2007 2:16:52 PM EDT >> To: dag at sonsorol.org >> Subject: Bug in org/biojava/utils/io/UncompressInputStream.java >> >> Hi Chris, >> >>> From >>> http://sourceforge.net/project/shownotes.php?release_id=314770&group_id=18598, >>> >> i've learned that you're maintaining the class >> org/biojava/utils/io/UncompressInputStream.java. If that's not the >> case please forward this mail to the maintainer. >> >> I've discovered a nasty bug: With some read block sizes the algorithm >> truncates a few bytes from the end of the stream. I've verified this >> comparing the gzip/uncompress output for some files versus what >> org/biojava/utils/io/UncompressInputStream.java generates. >> >> Unfortunately i've not discovered the bug yet, but i can contribute >> with the attached test case. How to verify the bug: >> uncompress BH_03834.MCR.Z with gzip and with UncompressInputStream and >> compare the results. 
>> >> Thanks, >> Miguel Duarte > > > ------------------------------------------------------------------------ > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFGG2JA4C5LeMEKA/QRAjutAJ9cZbqpoag2Z5aQd4gbOAiMm78VZACdHzER UoIhheyTE1805rMBzG4R+Q0= =hfN2 -----END PGP SIGNATURE----- From ayates at ebi.ac.uk Tue Apr 10 10:12:58 2007 From: ayates at ebi.ac.uk (Andy Yates) Date: Tue, 10 Apr 2007 11:12:58 +0100 Subject: [Biojava-dev] Fwd: Bug in org/biojava/utils/io/UncompressInputStream.java In-Reply-To: <461B60FC.7040903@ebi.ac.uk> References: <461B51F4.8010402@ebi.ac.uk> <461B5F69.9060506@ebi.ac.uk> <461B60FC.7040903@ebi.ac.uk> Message-ID: <461B632A.10101@ebi.ac.uk> Quick program to pass it through the GZIPInputStream chucks an IOException saying it's not in the GZIP format (which it isn't). Also passing it through the ZipInputStream seems to do nothing. At any rate it looks like we cannot get rid of this class; it's got to be fixed/maintained Andy Yates wrote: > Okay a quick run of uncompress on the mac with the files in question > does produce a file which is equivalent to the file produced by gzip but > not to the one produced by UncompressInputStream. > > The required md5sum for a pass should be (after a md5 digest): > > 9f0924237d20288793172091d61f85b8 uncompressed_by_gzip > > But we get: > > 17447efd34a245e430f20bc8d9b28a7b uncompressed_by_uncompressInputStream > > Okay so looks like there is something "wrong". Seems like it drops 88 > bytes from the decompression. > > Wonder what happens if we pass this file type through the > GZIPInputStream from the JDK? > > Andy Yates wrote: >> I don't think there are standard classes for this compression format >> in the SDK. 
There are ones for GZIP & ZIP but not for LZW which this >> one is dealing with. Also I'm not sure about using GZIP to unzip a >> file compressed with LZW since GZIP uses DEFLATE. >> >> We need to decompress the file using uncompress (which is missing from >> my Linux box but is on the mac ... go figure) and then match that up >> to the output from UncompressInputStream & see if they agree or not. >> >> Andy >> >> Richard Holland wrote: >>> -----BEGIN PGP SIGNED MESSAGE----- >>> Hash: SHA1 >>> >>> I have no idea what it is for. There are generic Java classes provided >>> with the SDK that do the same job. I think we should probably drop it. >>> Lets wait to see if anyone shouts first. >>> >>> mark.schreiber at novartis.com wrote: >>>> Does anyone maintain this class?? >>>> >>>> More to the point, does anyone know what it is for??? If I look at >>>> the Uses link in javadoc there are aparently none at the public or >>>> package level. Additionally why does biojava need one, are there not >>>> java.io classes that can handle compressed streams?? >>>> >>>> Is there a good reason why we cannot just clean it out? >>>> >>>> - Mark >>>> >>>> Mark Schreiber >>>> Research Investigator (Bioinformatics) >>>> >>>> Novartis Institute for Tropical Diseases (NITD) >>>> 10 Biopolis Road >>>> #05-01 Chromos >>>> Singapore 138670 >>>> www.nitd.novartis.com >>>> >>>> phone +65 6722 2973 >>>> fax +65 6722 2910 >>>> >>>> >>>> >>>> >>>> >>>> Chris Dagdigian >>>> Sent by: biojava-dev-bounces at lists.open-bio.org >>>> 04/07/2007 09:52 AM >>>> >>>> >>>> To: biojava-dev at biojava.org >>>> cc: (bcc: Mark Schreiber/GP/Novartis) >>>> Subject: [Biojava-dev] Fwd: Bug in >>>> org/biojava/utils/io/UncompressInputStream.java >>>> >>>> >>>> >>>> Passing on this email that came to me ... 
>>>> >>>> Regards, >>>> Chris Dagdigian >>>> OBF >>>> >>>> >>>> Begin forwarded message: >>>> >>>>> From: "Miguel Duarte" >>>>> Date: April 6, 2007 2:16:52 PM EDT >>>>> To: dag at sonsorol.org >>>>> Subject: Bug in org/biojava/utils/io/UncompressInputStream.java >>>>> >>>>> Hi Chris, >>>>> >>>>>> From http://sourceforge.net/project/shownotes.php? >>>>>> release_id=314770&group_id=18598, >>>>> i've learned that you're maintaining the class >>>>> org/biojava/utils/io/UncompressInputStream.java. If that's not the >>>>> case please forward this mail to the maintainer. >>>>> >>>>> I've discovered a nasty bug: With some read block sizes the algorithm >>>>> truncates a few bytes from the end of the stream. I've verified this >>>>> comparing the gzip/uncompress output for some files versus what >>>>> org/biojava/utils/io/UncompressInputStream.java generates. >>>>> >>>>> Unfortunately i've not discovered the bug yet, but i can contribute >>>>> with the attached test case. How to verify the bug: >>>>> uncompress BH_03834.MCR.Z with gzip and with UncompressInputStream and >>>>> compare the results. 
>>>>> >>>>> Thanks, >>>>> Miguel Duarte >>>> >>>> >>>> >>>> _______________________________________________ >>>> biojava-dev mailing list >>>> biojava-dev at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev >>>> >>>> [ Attachment ''BH_03834.MCR.Z'' removed by Mark Schreiber ] >>>> [ Attachment ''UNCOMPRESSED_BY_GZIP'' removed by Mark Schreiber ] >>>> [ Attachment ''UNCOMPRESSED_BY_UNCOMPRESSINPUTSTREAM'' removed by >>>> Mark Schreiber ] >>>> >>>> >>>> _______________________________________________ >>>> biojava-dev mailing list >>>> biojava-dev at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev >>>> >>> -----BEGIN PGP SIGNATURE----- >>> Version: GnuPG v1.4.2.2 (GNU/Linux) >>> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org >>> >>> iD8DBQFGG1Hz4C5LeMEKA/QRAvTuAJ9F1AClFCV4WwBNP170mbC2+6JVDgCfVB17 >>> HoCuWrx5k2ONg/9oxIfVVPI= >>> =cGTy >>> -----END PGP SIGNATURE----- >>> _______________________________________________ >>> biojava-dev mailing list >>> biojava-dev at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-dev >> _______________________________________________ >> biojava-dev mailing list >> biojava-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-dev From holland at ebi.ac.uk Tue Apr 10 10:37:36 2007 From: holland at ebi.ac.uk (Richard Holland) Date: Tue, 10 Apr 2007 11:37:36 +0100 Subject: [Biojava-dev] Fwd: Bug in org/biojava/utils/io/UncompressInputStream.java In-Reply-To: <461B6443.9060300@ebi.ac.uk> References: <2109dfc0704061116i1f0ddbe2ic25012143d2509af@mail.gmail.com> <2A8FFBC4-EC1A-4EB9-992C-DE9225A59578@sonsorol.org> <461B6240.2070907@ebi.ac.uk> <461B6443.9060300@ebi.ac.uk> Message-ID: <461B68F0.4000908@ebi.ac.uk> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Why are these files in compress/uncompress format? Is it proprietary software creating them, or a legacy system of some kind? 
Wouldn't gzip give better results both in terms of compression ratios and performance as it is far more up-to-date? I believe that the JDK doesn't support LZW because LZW was patented, and that patent expired only very recently (in 2003/4/5/6 depending on where you live and in what form you use LZW): http://www.gnu.org/philosophy/gif.html It's one of those wonderful cases where the patent enforcement caused the algorithm it was protecting to get dumped and forgotten because nobody wanted to pay for it. Apart from *nix compress/uncompress and inside the GIF format I'm not sure it's actually used anywhere else any more. Technically we infringed the patent by including LZW support in BioJava, but now the patent has expired we no longer need to worry. Question is, do we need to fix this inherently computer-science problem which is entirely unrelated to biology or bioinformatics, or can we just get people to use an alternative library instead which supports it better and is more generic? They are out there, for instance: http://www.chilkatsoft.com/java-zip.asp cheers, Richard Andy Yates wrote: > Seems very strange this does. I don't know much about decompression but > by the looks of things LZW isn't supported by the JDK. > > Richard Holland wrote: > AFAIK the Zip algorithm is just LZW with bells on, so it should produce > exactly the same results. > > Chris Dagdigian wrote: >>>> Passing on this email that came to me ... >>>> >>>> Regards, >>>> Chris Dagdigian >>>> OBF >>>> >>>> >>>> Begin forwarded message: >>>> >>>>> From: "Miguel Duarte" >>>>> Date: April 6, 2007 2:16:52 PM EDT >>>>> To: dag at sonsorol.org >>>>> Subject: Bug in org/biojava/utils/io/UncompressInputStream.java >>>>> >>>>> Hi Chris, >>>>> >>>>>> From >>>>>> http://sourceforge.net/project/shownotes.php?release_id=314770&group_id=18598, >>>>>> >>>>>> >>>>> i've learned that you're maintaining the class >>>>> org/biojava/utils/io/UncompressInputStream.java. 
If that's not the >>>>> case please forward this mail to the maintainer. >>>>> >>>>> I've discovered a nasty bug: With some read block sizes the algorithm >>>>> truncates a few bytes from the end of the stream. I've verified this >>>>> comparing the gzip/uncompress output for some files versus what >>>>> org/biojava/utils/io/UncompressInputStream.java generates. >>>>> >>>>> Unfortunately i've not discovered the bug yet, but i can contribute >>>>> with the attached test case. How to verify the bug: >>>>> uncompress BH_03834.MCR.Z with gzip and with UncompressInputStream and >>>>> compare the results. >>>>> >>>>> Thanks, >>>>> Miguel Duarte >>>> >>>> ------------------------------------------------------------------------ >>>> >>>> _______________________________________________ >>>> biojava-dev mailing list >>>> biojava-dev at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev _______________________________________________ biojava-dev mailing list biojava-dev at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biojava-dev -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFGG2jw4C5LeMEKA/QRAiCwAJ9vNlDX2zwG5paYHbaFv2gSQeblOQCdHaW4 CwgzY5S7KELC3TA1oKKtjUw= =9xEM -----END PGP SIGNATURE----- From ayates at ebi.ac.uk Tue Apr 10 10:54:27 2007 From: ayates at ebi.ac.uk (Andy Yates) Date: Tue, 10 Apr 2007 11:54:27 +0100 Subject: [Biojava-dev] Fwd: Bug in org/biojava/utils/io/UncompressInputStream.java In-Reply-To: <461B68F0.4000908@ebi.ac.uk> References: <2109dfc0704061116i1f0ddbe2ic25012143d2509af@mail.gmail.com> <2A8FFBC4-EC1A-4EB9-992C-DE9225A59578@sonsorol.org> <461B6240.2070907@ebi.ac.uk> <461B6443.9060300@ebi.ac.uk> <461B68F0.4000908@ebi.ac.uk> Message-ID: <461B6CE3.303@ebi.ac.uk> I guess it all depends really on what is the software that is producing these files. 
If it is something very common to Bioinformatics we might have to accept that support needs to come in from somewhere; and by the looks of things the techniques for compression are quite varied (the man for compress mentions things about adaptive dictionaries and the alike). Richard Holland wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Why are these files in compress/uncompress format? Is it proprietary > software creating them, or a legacy system of some kind? Wouldn't gzip > give better results both in terms of compression ratios and performance > as it is far more up-to-date? > > I believe that the JDK doesn't support LZW because LZW was patented, and > that patent expired only very recently (in 2003/4/5/6 depending on where > you live and in what form you use LZW): > > http://www.gnu.org/philosophy/gif.html > > It's one of those wonderful cases where the patent enforcement caused > the algorithm it was protecting to get dumped and forgotten because > nobody wanted to pay for it. Apart from *nix compress/uncompress and > inside the GIF format I'm not sure it's actually used anywhere else any > more. > > Technically we infringed the patent by including LZW support in BioJava, > but now the patent has expired we no longer need to worry. > > Question is, do we need to fix this inherently computer-science problem > which is entirely unrelated to biology or bioinformatics, or can we just > get people to use an alternative library instead which supports it > better and is more generic? They are out there, for instance: > > http://www.chilkatsoft.com/java-zip.asp > > cheers, > Richard > > Andy Yates wrote: >> Seems very strange this does. I don't know much about decompression but >> by the looks of things LZW isn't supported by the JDK. >> >> Richard Holland wrote: >> AFAIK the Zip algorithm is just LZW with bells on, so it should produce >> exactly the same results. >> >> Chris Dagdigian wrote: >>>>> Passing on this email that came to me ... 
>>>>> >>>>> Regards, >>>>> Chris Dagdigian >>>>> OBF >>>>> >>>>> >>>>> Begin forwarded message: >>>>> >>>>>> From: "Miguel Duarte" >>>>>> Date: April 6, 2007 2:16:52 PM EDT >>>>>> To: dag at sonsorol.org >>>>>> Subject: Bug in org/biojava/utils/io/UncompressInputStream.java >>>>>> >>>>>> Hi Chris, >>>>>> >>>>>>> From >>>>>>> http://sourceforge.net/project/shownotes.php?release_id=314770&group_id=18598, >>>>>>> >>>>>>> >>>>>> i've learned that you're maintaining the class >>>>>> org/biojava/utils/io/UncompressInputStream.java. If that's not the >>>>>> case please forward this mail to the maintainer. >>>>>> >>>>>> I've discovered a nasty bug: With some read block sizes the algorithm >>>>>> truncates a few bytes from the end of the stream. I've verified this >>>>>> comparing the gzip/uncompress output for some files versus what >>>>>> org/biojava/utils/io/UncompressInputStream.java generates. >>>>>> >>>>>> Unfortunately i've not discovered the bug yet, but i can contribute >>>>>> with the attached test case. How to verify the bug: >>>>>> uncompress BH_03834.MCR.Z with gzip and with UncompressInputStream and >>>>>> compare the results. 
>>>>>> >>>>>> Thanks, >>>>>> Miguel Duarte >>>>> ------------------------------------------------------------------------ >>>>> >>>>> _______________________________________________ >>>>> biojava-dev mailing list >>>>> biojava-dev at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.2.2 (GNU/Linux) > Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org > > iD8DBQFGG2jw4C5LeMEKA/QRAiCwAJ9vNlDX2zwG5paYHbaFv2gSQeblOQCdHaW4 > CwgzY5S7KELC3TA1oKKtjUw= > =9xEM > -----END PGP SIGNATURE----- From ap3 at sanger.ac.uk Tue Apr 10 10:27:39 2007 From: ap3 at sanger.ac.uk (Andreas Prlic) Date: Tue, 10 Apr 2007 11:27:39 +0100 Subject: [Biojava-dev] Fwd: Bug in org/biojava/utils/io/UncompressInputStream.java In-Reply-To: References: Message-ID: Hi! I committed this class a while ago, since I did not find any other way to read .Z compressed files. Unfortunately PDB files are often stored like that ... If anybody has a suggestion how to read unix compressed files (.Z) in a better way, I would be glad to hear. Parsing them as Zip or GZip did not work in my trials... Andreas On 9 Apr 2007, at 03:13, mark.schreiber at novartis.com wrote: > Does anyone maintain this class?? > > More to the point, does anyone know what it is for??? If I look at the > Uses link in javadoc there are aparently none at the public or package > level. Additionally why does biojava need one, are there not java.io > classes that can handle compressed streams?? > > Is there a good reason why we cannot just clean it out? 
> > - Mark > > Mark Schreiber > Research Investigator (Bioinformatics) > > Novartis Institute for Tropical Diseases (NITD) > 10 Biopolis Road > #05-01 Chromos > Singapore 138670 > www.nitd.novartis.com > > phone +65 6722 2973 > fax +65 6722 2910 > > > > > > Chris Dagdigian > Sent by: biojava-dev-bounces at lists.open-bio.org > 04/07/2007 09:52 AM > > > To: biojava-dev at biojava.org > cc: (bcc: Mark Schreiber/GP/Novartis) > Subject: [Biojava-dev] Fwd: Bug in > org/biojava/utils/io/UncompressInputStream.java > > > > Passing on this email that came to me ... > > Regards, > Chris Dagdigian > OBF > > > Begin forwarded message: > >> From: "Miguel Duarte" >> Date: April 6, 2007 2:16:52 PM EDT >> To: dag at sonsorol.org >> Subject: Bug in org/biojava/utils/io/UncompressInputStream.java >> >> Hi Chris, >> >>> From http://sourceforge.net/project/shownotes.php? >>> release_id=314770&group_id=18598, >> i've learned that you're maintaining the class >> org/biojava/utils/io/UncompressInputStream.java. If that's not the >> case please forward this mail to the maintainer. >> >> I've discovered a nasty bug: With some read block sizes the algorithm >> truncates a few bytes from the end of the stream. I've verified this >> comparing the gzip/uncompress output for some files versus what >> org/biojava/utils/io/UncompressInputStream.java generates. >> >> Unfortunately i've not discovered the bug yet, but i can contribute >> with the attached test case. How to verify the bug: >> uncompress BH_03834.MCR.Z with gzip and with UncompressInputStream and >> compare the results. 
>> >> Thanks, >> Miguel Duarte > > > > > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > > [ Attachment ''BH_03834.MCR.Z'' removed by Mark Schreiber ] > [ Attachment ''UNCOMPRESSED_BY_GZIP'' removed by Mark Schreiber ] > [ Attachment ''UNCOMPRESSED_BY_UNCOMPRESSINPUTSTREAM'' removed by Mark > Schreiber ] > > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > > ----------------------------------------------------------------------- Andreas Prlic Wellcome Trust Sanger Institute Hinxton, Cambridge CB10 1SA, UK +44 (0) 1223 49 6891 From holland at ebi.ac.uk Tue Apr 10 11:01:36 2007 From: holland at ebi.ac.uk (Richard Holland) Date: Tue, 10 Apr 2007 12:01:36 +0100 Subject: [Biojava-dev] Fwd: Bug in org/biojava/utils/io/UncompressInputStream.java In-Reply-To: References: Message-ID: <461B6E90.3090501@ebi.ac.uk> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Andreas - did you write the class? If so, then you may understand it better than the rest of us. Would you be willing to attempt to fix it? cheers, Richard Andreas Prlic wrote: > Hi! > > I committed this class a while ago, since I did not find any other way > to read .Z compressed files. > > Unfortunately PDB files are often stored like that ... > > If anybody has a suggestion how to read unix compressed files (.Z) in a > better way, I would be glad to hear. > > Parsing them as Zip or GZip did not work in my trials... > > Andreas > > > > > > On 9 Apr 2007, at 03:13, mark.schreiber at novartis.com wrote: > >> Does anyone maintain this class?? >> >> More to the point, does anyone know what it is for??? If I look at the >> Uses link in javadoc there are aparently none at the public or package >> level. 
Additionally why does biojava need one, are there not java.io >> classes that can handle compressed streams?? >> >> Is there a good reason why we cannot just clean it out? >> >> - Mark >> >> Mark Schreiber >> Research Investigator (Bioinformatics) >> >> Novartis Institute for Tropical Diseases (NITD) >> 10 Biopolis Road >> #05-01 Chromos >> Singapore 138670 >> www.nitd.novartis.com >> >> phone +65 6722 2973 >> fax +65 6722 2910 >> >> >> >> >> >> Chris Dagdigian >> Sent by: biojava-dev-bounces at lists.open-bio.org >> 04/07/2007 09:52 AM >> >> >> To: biojava-dev at biojava.org >> cc: (bcc: Mark Schreiber/GP/Novartis) >> Subject: [Biojava-dev] Fwd: Bug in >> org/biojava/utils/io/UncompressInputStream.java >> >> >> >> Passing on this email that came to me ... >> >> Regards, >> Chris Dagdigian >> OBF >> >> >> Begin forwarded message: >> >>> From: "Miguel Duarte" >>> Date: April 6, 2007 2:16:52 PM EDT >>> To: dag at sonsorol.org >>> Subject: Bug in org/biojava/utils/io/UncompressInputStream.java >>> >>> Hi Chris, >>> >>>> From http://sourceforge.net/project/shownotes.php? >>>> release_id=314770&group_id=18598, >>> i've learned that you're maintaining the class >>> org/biojava/utils/io/UncompressInputStream.java. If that's not the >>> case please forward this mail to the maintainer. >>> >>> I've discovered a nasty bug: With some read block sizes the algorithm >>> truncates a few bytes from the end of the stream. I've verified this >>> comparing the gzip/uncompress output for some files versus what >>> org/biojava/utils/io/UncompressInputStream.java generates. >>> >>> Unfortunately i've not discovered the bug yet, but i can contribute >>> with the attached test case. How to verify the bug: >>> uncompress BH_03834.MCR.Z with gzip and with UncompressInputStream and >>> compare the results. 
>>> >>> Thanks, >>> Miguel Duarte >> >> >> >> >> _______________________________________________ >> biojava-dev mailing list >> biojava-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-dev >> >> [ Attachment ''BH_03834.MCR.Z'' removed by Mark Schreiber ] >> [ Attachment ''UNCOMPRESSED_BY_GZIP'' removed by Mark Schreiber ] >> [ Attachment ''UNCOMPRESSED_BY_UNCOMPRESSINPUTSTREAM'' removed by Mark >> Schreiber ] >> >> >> _______________________________________________ >> biojava-dev mailing list >> biojava-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-dev >> >> > ----------------------------------------------------------------------- > > Andreas Prlic Wellcome Trust Sanger Institute > Hinxton, Cambridge CB10 1SA, UK > +44 (0) 1223 49 6891 > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFGG26Q4C5LeMEKA/QRAiMxAJ4u4RUjTGODjClIM1LIRzP12xNUOgCgifA+ 14CbPaY5SwcG1/wUHJVpl/U= =wBDT -----END PGP SIGNATURE----- From markjschreiber at gmail.com Tue Apr 10 11:29:03 2007 From: markjschreiber at gmail.com (Mark Schreiber) Date: Tue, 10 Apr 2007 19:29:03 +0800 Subject: [Biojava-dev] Fwd: Bug in org/biojava/utils/io/UncompressInputStream.java In-Reply-To: <461B60FC.7040903@ebi.ac.uk> References: <461B51F4.8010402@ebi.ac.uk> <461B5F69.9060506@ebi.ac.uk> <461B60FC.7040903@ebi.ac.uk> Message-ID: <93b45ca50704100429u388f5b8ax86ba05e5d05e02a9@mail.gmail.com> Without looking at the code I would guess that dropping 88 bytes could be because of a buffered reader or writer not flushing before it is closed?? 
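
A minimal sketch of the check Andy describes in this thread (compare the MD5 digest and length of the reference output with the output of the stream under test), using only JDK classes. The class and method names (`StreamDigest`, `md5Hex`, `countBytes`) are hypothetical and not part of BioJava. The read loop drains the stream until `read` returns -1, so a short final block would show up as a different digest and a smaller byte count.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class StreamDigest {

    /** Reads the stream until end-of-file and returns its MD5 digest as lower-case hex. */
    public static String md5Hex(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] buf = new byte[4096];
        int n;
        // read() may return fewer bytes than requested; only -1 signals end-of-file.
        while ((n = in.read(buf)) != -1) {
            md.update(buf, 0, n);
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest()) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    /** Reads the stream until end-of-file and returns the number of bytes delivered. */
    public static long countBytes(InputStream in) throws IOException {
        byte[] buf = new byte[4096];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in data; in the real check these would be the two decompressed outputs.
        byte[] data = "abc".getBytes("US-ASCII");
        System.out.println(md5Hex(new ByteArrayInputStream(data)));
        System.out.println(countBytes(new ByteArrayInputStream(data)));
    }
}
```

To reproduce the comparison from the thread, one would pass a `FileInputStream` over the gzip-produced reference file to one call and the `.Z` file wrapped in `UncompressInputStream` to the other (assuming its constructor accepts an `InputStream`). Running with different buffer sizes might also help narrow down the block-size dependence Miguel reports.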
- Mark

On 4/10/07, Andy Yates wrote:
> Okay a quick run of uncompress on the mac with the files in question
> does produce a file which is equivalent to the file produced by gzip but
> not to the one produced by UncompressInputStream.
>
> The required md5sum for a pass should be (after a md5 digest):
>
> 9f0924237d20288793172091d61f85b8 uncompressed_by_gzip
>
> But we get:
>
> 17447efd34a245e430f20bc8d9b28a7b uncompressed_by_uncompressInputStream
>
> Okay so looks like there is something "wrong". Seems like it drops 88
> bytes from the decompression.
>
> Wonder what happens if we pass this file type through the
> GZIPInputStream from the JDK?

From bugzilla-daemon at portal.open-bio.org Tue Apr 10 11:58:59 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 10 Apr 2007 07:58:59 -0400
Subject: [Biojava-dev] [Bug 2261] Request for enhancement of RichSequence
In-Reply-To:
Message-ID: <200704101158.l3ABwxRV028563@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2261

holland at ebi.ac.uk changed:

What |Removed |Added
---------------------------------------------------------------------------- CC| |holland at ebi.ac.uk ------- Comment #1 from holland at ebi.ac.uk 2007-04-10 07:58 EST ------- I think the only reason it doesn't already do so is because when I wrote RichSequence, RichFeatureHolder hadn't been invented yet. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Apr 10 12:15:28 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 10 Apr 2007 08:15:28 -0400 Subject: [Biojava-dev] [Bug 2261] Request for enhancement of RichSequence In-Reply-To: Message-ID: <200704101215.l3ACFSYY029387@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2261 ------- Comment #2 from holland at ebi.ac.uk 2007-04-10 08:15 EST ------- Just looked at this a bit closer and found that only Features can hold other Features - RichFeatureHolder represents the FeatureRelationship portion of BioSQL. FeatureRelationships exist between features, and not between sequences and features. Maybe RichFeatureHolder is therefore a bit of a misnomer? Maybe it should be FeatureRelationshipHolder, or something like that? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From ap3 at sanger.ac.uk Tue Apr 10 13:02:17 2007 From: ap3 at sanger.ac.uk (Andreas Prlic) Date: Tue, 10 Apr 2007 14:02:17 +0100 Subject: [Biojava-dev] Fwd: Bug in org/biojava/utils/io/UncompressInputStream.java In-Reply-To: <461B6E90.3090501@ebi.ac.uk> References: <461B6E90.3090501@ebi.ac.uk> Message-ID: > Andreas - did you write the class? If so, then you may understand it > better than the rest of us. Would you be willing to attempt to fix it? 

No, I did not write it - it is an LGPL class which I found in another project. See http://www.innovation.ch/java/HTTPClient/ or the header in the file.

I will try to have a look at this problem, but I am not sure if I can fix it quickly.

PDB data is still available for download as .Z files, e.g. ftp://ftp.rcsb.org/pub/pdb/data/structures/divided/pdb/ar/
That's why I need some tools for reading these.

I agree this is a general problem and the solution does not necessarily have to be part of BioJava.

I don't think any patent got infringed, since the file was committed after the patent had expired.

Andreas

-----------------------------------------------------------------------
Andreas Prlic
Wellcome Trust Sanger Institute
Hinxton, Cambridge CB10 1SA, UK
+44 (0) 1223 49 6891

From bugzilla-daemon at portal.open-bio.org Wed Apr 11 05:12:29 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 11 Apr 2007 01:12:29 -0400
Subject: [Biojava-dev] [Bug 2261] Request for enhancement of RichSequence
In-Reply-To:
Message-ID: <200704110512.l3B5CT9v008783@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2261

------- Comment #3 from mark.schreiber at novartis.com 2007-04-11 01:12 EST -------
(In reply to comment #2)
> Just looked at this a bit closer and found that only Features can hold other
> Features - RichFeatureHolder represents the FeatureRelationship portion of
> BioSQL. FeatureRelationships exist between features, and not between sequences
> and features. Maybe RichFeatureHolder is therefore a bit of a misnomer? Maybe
> it should be FeatureRelationshipHolder, or something like that?

I would agree with the proposal to rename it. It would save a lot of confusion. It should be a pretty simple refactoring, but it would need to happen before a release version.

-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mark.schreiber at novartis.com Thu Apr 12 03:19:18 2007 From: mark.schreiber at novartis.com (mark.schreiber at novartis.com) Date: Thu, 12 Apr 2007 11:19:18 +0800 Subject: [Biojava-dev] javacc Message-ID: Hello - Has anyone ever written a javacc lexer / parser for Genbank (or any of the other major formats)? - Mark Mark Schreiber Research Investigator (Bioinformatics) Novartis Institute for Tropical Diseases (NITD) 10 Biopolis Road #05-01 Chromos Singapore 138670 www.nitd.novartis.com phone +65 6722 2973 fax +65 6722 2910 From bugzilla-daemon at portal.open-bio.org Fri Apr 13 05:08:33 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 13 Apr 2007 01:08:33 -0400 Subject: [Biojava-dev] [Bug 2273] New: More problems writing uniprot files Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2273 Summary: More problems writing uniprot files Product: BioJava Version: live (CVS source) Platform: PC OS/Version: Windows XP Status: NEW Severity: normal Priority: P2 Component: seq.io AssignedTo: biojava-dev at biojava.org ReportedBy: gwaldon at geneinfinity.org I found a few problems during the writing of uniprot files. Using P04941 as a test example: 1. The ID line does not appear with a fixed format (this is probably not a bug actually): (before/after - read/write) ID KV6A7_MOUSE Reviewed; 107 AA. ID KV6A7_MOUSE Reviewed; 107 AA. 2. The reference title gets truncated at the end by one character after each read/write operation: RT phenyloxazolone and its early diversification."; RT phenyloxazolone and its early diversification"; RT phenyloxazolone and its early diversificatio"; ... 3.
The FT line is not formatted correctly; this is a bug because the FT line has a fixed format, the I of Ig should be at position 35: (before/after - read/write) FT CHAIN 1 >107 Ig kappa chain V-VI region NQ2-48.2.2. FT CHAIN 1 107> Ig kappa chain V-VI region NQ2-48.2.2. 4. SQ line, are-these exactly the same CRC64 number? SQ SEQUENCE 107 AA; 11557 MW; 72488DA9EF354934 CRC64; SQ SEQUENCE 107 AA; 11564 MW; ffffffffe278ca323958dd50 CRC64; - George -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Apr 17 12:08:29 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 17 Apr 2007 08:08:29 -0400 Subject: [Biojava-dev] [Bug 2273] More problems writing uniprot files In-Reply-To: Message-ID: <200704171208.l3HC8T2G004508@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2273 holland at ebi.ac.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #1 from holland at ebi.ac.uk 2007-04-17 08:08 EST ------- I have fixed points 1-3. Point 4 I have raised as a new bug for someone else to fix - the problem goes deeper than just UniProtFormat!
Can you check the code I have committed in CVS and update this bug accordingly with what you find. I have not written a unit test as I'm very busy at present and don't have the time. If you could add one in that would be great. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Apr 17 12:11:11 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 17 Apr 2007 08:11:11 -0400 Subject: [Biojava-dev] [Bug 2274] New: CRC64 checksum toString() returning incorrect values Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2274 Summary: CRC64 checksum toString() returning incorrect values Product: BioJava Version: live (CVS source) Platform: All OS/Version: All Status: NEW Severity: normal Priority: P2 Component: Others AssignedTo: biojava-dev at biojava.org ReportedBy: holland at ebi.ac.uk In org.biojavax.utils.CRC64Checksum the toString() method returns 24-character strings, when CRC64 checksums are only 16-character. Also need to check that the correct polynomials etc. are being used. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
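[Editorial sketch for Bug 2274, not taken from the BioJava source.] The 24-character value reported in Bug 2273 (ffffffffe278ca323958dd50) has the shape one would get if the checksum were kept as two 32-bit halves and the high half were sign-extended to a long before being printed as hex. That is only a guess at the cause; the class and method names below (Crc64Hex, buggyHex, toHex16) are hypothetical and do not exist in org.biojavax.utils.CRC64Checksum. The sketch shows that failure mode next to a formatter that always yields 16 upper-case hex digits, as UniProt SQ lines expect.

```java
// Hypothetical illustration only; not the actual CRC64Checksum code.
final class Crc64Hex {

    /** One plausible way to end up with 24 characters: the negative
     *  high half is widened to long (sign-extended) and printed as
     *  16 hex digits, then the 8 digits of the low half are appended. */
    static String buggyHex(int high, int low) {
        return Long.toHexString((long) high) + Integer.toHexString(low);
    }

    /** Combines the two halves without sign extension of the low half. */
    static long combine(int high, int low) {
        return ((long) high << 32) | (low & 0xFFFFFFFFL);
    }

    /** Formats a 64-bit checksum as exactly 16 upper-case hex digits,
     *  zero-padded on the left. */
    static String toHex16(long crc) {
        return String.format("%016X", crc);
    }

    public static void main(String[] args) {
        int high = 0xE278CA32; // negative when read as a signed int
        int low = 0x3958DD50;
        System.out.println(buggyHex(high, low));
        System.out.println(toHex16(combine(high, low)));
    }
}
```

Note that fixing the width alone would not turn this value into the 72488DA9EF354934 found in the original file, which fits the remark in the report that the polynomials should be checked as well.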
From bugzilla-daemon at portal.open-bio.org Tue Apr 17 12:19:04 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 17 Apr 2007 08:19:04 -0400 Subject: [Biojava-dev] [Bug 2274] CRC64 checksum toString() returning incorrect values In-Reply-To: Message-ID: <200704171219.l3HCJ4op005457@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2274 holland at ebi.ac.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #1 from holland at ebi.ac.uk 2007-04-17 08:19 EST ------- Fixed. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Apr 17 12:19:15 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 17 Apr 2007 08:19:15 -0400 Subject: [Biojava-dev] [Bug 2274] CRC64 checksum toString() returning incorrect values In-Reply-To: Message-ID: <200704171219.l3HCJFwu005500@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2274 holland at ebi.ac.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|RESOLVED |CLOSED -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Tue Apr 17 12:25:50 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 17 Apr 2007 08:25:50 -0400 Subject: [Biojava-dev] [Bug 2261] Request for enhancement of RichSequence In-Reply-To: Message-ID: <200704171225.l3HCPo16006114@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2261 holland at ebi.ac.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #4 from holland at ebi.ac.uk 2007-04-17 08:25 EST ------- Done. Renamed to RichFeatureRelationshipHolder and removed reference to RichFeatureHolder as it is technically no such thing. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From jburdick at keyfitz.org Thu Apr 19 18:26:55 2007 From: jburdick at keyfitz.org (Josh Burdick) Date: Thu, 19 Apr 2007 14:26:55 -0400 Subject: [Biojava-dev] reading a subsequence from a .nib file In-Reply-To: References: Message-ID: <1177007215.5481.4.camel@localhost.localdomain> On Tue, 2007-04-03 at 09:03 +0800, mark.schreiber at novartis.com wrote: > Hi - > > Too my knowledge nothing like this exists in BioJava. Could someone take > it the last mile and make it produce SymbolLists? > I went ahead and added a method getSymbolListByLocation() which takes the string and converts it to a SymbolList using DNATools. There are bound to be more efficient ways to do this, but I think this a reasonable start. The files are in the same locations: http://www.keyfitz.org/jburdick/read_nib_file_java/NibFile.java http://www.keyfitz.org/jburdick/read_nib_file_java/NibFileTest.java Hopefully someone will find this code useful. 
Josh > - Mark > > Mark Schreiber > Research Investigator (Bioinformatics) > > Novartis Institute for Tropical Diseases (NITD) > 10 Biopolis Road > #05-01 Chromos > Singapore 138670 > www.nitd.novartis.com > > phone +65 6722 2973 > fax +65 6722 2910 > [...] From gwaldon at geneinfinity.org Thu Apr 19 21:05:09 2007 From: gwaldon at geneinfinity.org (george waldon) Date: Thu, 19 Apr 2007 14:05:09 -0700 Subject: [Biojava-dev] no need for LENGTH_TYPE_TERM Message-ID: <20070419210509.75384.qmail@mmm1924.dulles19-verio.com> LENGTH_TYPE_TERM is a RichSequence term that is used to distinguish between "aa" and "bp" during the write operation in uniprot format and in genbank format. This code is error-prone. For instance, converting a protein sequence from a fasta file to a genbank formatted file still write "bp". Indeed, the sequence annotation for this term should be generated during the enrichment of sequence. I don't think the extra-work is really necessary. Is-there any objection that I remove this term and rely instead on the alphabet (either PROTEIN or PROTEIN_TERM) during the writing operations? Thanks, George From mark.schreiber at novartis.com Fri Apr 20 02:44:51 2007 From: mark.schreiber at novartis.com (mark.schreiber at novartis.com) Date: Fri, 20 Apr 2007 10:44:51 +0800 Subject: [Biojava-dev] reading a subsequence from a .nib file Message-ID: Hi Josh - Looks good. Just one thing, your JUnit test contains a hardcoded file path to the test file which means it is not portable. Could you modify that so that it loads the file from the classpath as a resource (see some of the IO unit tests for examples). Can you also provide the test file. 
Best regards, - Mark Mark Schreiber Research Investigator (Bioinformatics) Novartis Institute for Tropical Diseases (NITD) 10 Biopolis Road #05-01 Chromos Singapore 138670 www.nitd.novartis.com phone +65 6722 2973 fax +65 6722 2910 Josh Burdick 04/20/2007 02:26 AM To: mark.schreiber at novartis.com cc: biojava-dev at lists.open-bio.org Subject: Re: [Biojava-dev] reading a subsequence from a .nib file On Tue, 2007-04-03 at 09:03 +0800, mark.schreiber at novartis.com wrote: > Hi - > > Too my knowledge nothing like this exists in BioJava. Could someone take > it the last mile and make it produce SymbolLists? > I went ahead and added a method getSymbolListByLocation() which takes the string and converts it to a SymbolList using DNATools. There are bound to be more efficient ways to do this, but I think this a reasonable start. The files are in the same locations: http://www.keyfitz.org/jburdick/read_nib_file_java/NibFile.java http://www.keyfitz.org/jburdick/read_nib_file_java/NibFileTest.java Hopefully someone will find this code useful. Josh > - Mark > > Mark Schreiber > Research Investigator (Bioinformatics) > > Novartis Institute for Tropical Diseases (NITD) > 10 Biopolis Road > #05-01 Chromos > Singapore 138670 > www.nitd.novartis.com > > phone +65 6722 2973 > fax +65 6722 2910 > [...] From mark.schreiber at novartis.com Fri Apr 20 02:46:56 2007 From: mark.schreiber at novartis.com (mark.schreiber at novartis.com) Date: Fri, 20 Apr 2007 10:46:56 +0800 Subject: [Biojava-dev] no need for LENGTH_TYPE_TERM Message-ID: I think this sounds sensible. Having another term for something that can be derived from the alphabet is redundant. 
- Mark Mark Schreiber Research Investigator (Bioinformatics) Novartis Institute for Tropical Diseases (NITD) 10 Biopolis Road #05-01 Chromos Singapore 138670 www.nitd.novartis.com phone +65 6722 2973 fax +65 6722 2910 "george waldon" Sent by: biojava-dev-bounces at lists.open-bio.org 04/20/2007 05:05 AM Please respond to george waldon To: biojava-dev at biojava.org cc: (bcc: Mark Schreiber/GP/Novartis) Subject: [Biojava-dev] no need for LENGTH_TYPE_TERM LENGTH_TYPE_TERM is a RichSequence term that is used to distinguish between "aa" and "bp" during the write operation in uniprot format and in genbank format. This code is error-prone. For instance, converting a protein sequence from a fasta file to a genbank formatted file still write "bp". Indeed, the sequence annotation for this term should be generated during the enrichment of sequence. I don't think the extra-work is really necessary. Is-there any objection that I remove this term and rely instead on the alphabet (either PROTEIN or PROTEIN_TERM) during the writing operations? Thanks, George _______________________________________________ biojava-dev mailing list biojava-dev at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biojava-dev From holland at ebi.ac.uk Fri Apr 20 08:29:18 2007 From: holland at ebi.ac.uk (Richard Holland) Date: Fri, 20 Apr 2007 09:29:18 +0100 Subject: [Biojava-dev] no need for LENGTH_TYPE_TERM In-Reply-To: <20070419210509.75384.qmail@mmm1924.dulles19-verio.com> References: <20070419210509.75384.qmail@mmm1924.dulles19-verio.com> Message-ID: <462879DE.5000507@ebi.ac.uk> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 No objections. I think your logic is better than mine here. Please go ahead. cheers, Richard george waldon wrote: > LENGTH_TYPE_TERM is a RichSequence term that is used to distinguish between "aa" and "bp" during the write operation in uniprot format and in genbank format. > > This code is error-prone. 
For instance, converting a protein sequence from a fasta file to a genbank formatted file still write "bp". Indeed, the sequence annotation for this term should be generated during the enrichment of sequence. > > I don't think the extra-work is really necessary. Is-there any objection that I remove this term and rely instead on the alphabet (either PROTEIN or PROTEIN_TERM) during the writing operations? > > Thanks, > George > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFGKHne4C5LeMEKA/QRAsD6AKCN4Nj7LMk3fCjAcrfE1Lw+Se3FJQCdH142 Sz6DYxYj1HedeHPZJpejtQs= =9HDc -----END PGP SIGNATURE----- From jburdick at keyfitz.org Wed Apr 11 17:42:18 2007 From: jburdick at keyfitz.org (Josh Burdick) Date: Wed, 11 Apr 2007 17:42:18 -0000 Subject: [Biojava-dev] reading a subsequence from a .nib file In-Reply-To: References: Message-ID: <1176312083.21937.42.camel@localhost.localdomain> On Tue, 2007-04-03 at 09:03 +0800, mark.schreiber at novartis.com wrote: > Hi - > > Too my knowledge nothing like this exists in BioJava. Could someone take > it the last mile and make it produce SymbolLists? > I added a method that just takes the string and makes it into a SymbolList using DNATools. This is somewhat inefficient (you can make a SymbolList directly as an array of numbers, but I wasn't certain enough that I understood it to try that.) The package name should be changed, and the test code should probably do somewhat more, but other than that, if someone wants to add it, feel free. (The two files are at the same location as before.) 
Josh > - Mark > > Mark Schreiber > Research Investigator (Bioinformatics) > > Novartis Institute for Tropical Diseases (NITD) > 10 Biopolis Road > #05-01 Chromos > Singapore 138670 > www.nitd.novartis.com > > phone +65 6722 2973 > fax +65 6722 2910 >
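[Editorial sketch, not Josh's NibFile.java.] For readers who want to see what reading a chunk of a four-bits-per-base file involves, here is a minimal decoder for an already-loaded array of packed bases. It assumes the layout described in the UCSC documentation for .nib files: two bases per byte with the first base in the high nibble, codes 0=T, 1=C, 2=A, 3=G, 4=N, and the top bit of each nibble used as a repeat-mask flag. Header parsing (signature and base count) is left out, the mask bit is simply ignored, and the names NibSketch and decode are hypothetical.

```java
// Hypothetical illustration; assumes the UCSC .nib nibble encoding.
final class NibSketch {

    // Index = 3-bit base code taken from the low bits of a nibble.
    private static final char[] BASES = {'t', 'c', 'a', 'g', 'n'};

    /**
     * Decodes bases [start, start + length) from the packed sequence
     * bytes (the part of the file that follows the header).
     * Positions are zero-based.
     */
    static String decode(byte[] packed, int start, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = start; i < start + length; i++) {
            int b = packed[i / 2] & 0xFF;
            // Even positions sit in the high nibble, odd in the low one.
            int nibble = (i % 2 == 0) ? (b >>> 4) : (b & 0x0F);
            int code = nibble & 0x07; // drop the mask flag
            if (code > 4) {
                throw new IllegalArgumentException("unexpected base code " + code);
            }
            sb.append(BASES[code]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] packed = {0x01, 0x23, 0x4C};
        System.out.println(decode(packed, 0, 6));
    }
}
```

The resulting string could then be passed to DNATools.createDNA(String) to obtain a SymbolList, which is the string-based route described in the message above; building the SymbolList directly from the packed codes would avoid the intermediate string but is not attempted here.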