[Biojava-dev] Fwd: Bug in org/biojava/utils/io/UncompressInputStream.java

Andy Yates ayates at ebi.ac.uk
Tue Apr 10 10:54:27 UTC 2007


I guess it really all depends on what software is producing these
files. If it is something very common in bioinformatics, we might
have to accept that support needs to come from somewhere; and by the
looks of things, the techniques for compression are quite varied (the man
page for compress mentions adaptive dictionaries and the like).
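
The adaptive-dictionary technique mentioned above is LZW, which compress implements. Below is a minimal Java sketch of the dictionary idea, assuming plain textbook LZW with unbounded integer codes: the dictionary starts with the 256 single-byte strings and grows by one entry per emitted code, and the decoder rebuilds the same dictionary from the code stream alone. It is not the .Z file format: compress additionally packs codes into variable-width bit fields and can reset the dictionary, and neither is shown. The class name LzwSketch is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LzwSketch {
    // Encode bytes as integer codes. Strings of chars 0..255 stand in for
    // byte sequences; the dictionary grows by one entry per emitted code.
    static List<Integer> encode(byte[] input) {
        Map<String, Integer> dict = new HashMap<>();
        for (int i = 0; i < 256; i++) dict.put(String.valueOf((char) i), i);
        int nextCode = 256;
        String w = "";
        List<Integer> out = new ArrayList<>();
        for (byte b : input) {
            char c = (char) (b & 0xFF);
            String wc = w + c;
            if (dict.containsKey(wc)) {
                w = wc;                      // keep extending the current match
            } else {
                out.add(dict.get(w));        // emit the longest known prefix
                dict.put(wc, nextCode++);    // learn prefix + next byte
                w = String.valueOf(c);
            }
        }
        if (!w.isEmpty()) out.add(dict.get(w));
        return out;
    }

    // Decode by rebuilding the same dictionary the encoder built.
    static byte[] decode(List<Integer> codes) {
        if (codes.isEmpty()) return new byte[0];
        Map<Integer, String> dict = new HashMap<>();
        for (int i = 0; i < 256; i++) dict.put(i, String.valueOf((char) i));
        int nextCode = 256;
        String w = dict.get(codes.get(0));
        StringBuilder sb = new StringBuilder(w);
        for (int k = 1; k < codes.size(); k++) {
            int code = codes.get(k);
            String entry;
            if (dict.containsKey(code)) {
                entry = dict.get(code);
            } else if (code == nextCode) {
                entry = w + w.charAt(0);     // code defined in this very step
            } else {
                throw new IllegalArgumentException("bad code " + code);
            }
            sb.append(entry);
            dict.put(nextCode++, w + entry.charAt(0));
            w = entry;
        }
        byte[] out = new byte[sb.length()];
        for (int i = 0; i < out.length; i++) out[i] = (byte) sb.charAt(i);
        return out;
    }

    public static void main(String[] args) {
        byte[] data = "TOBEORNOTTOBEORTOBEORNOT".getBytes(StandardCharsets.US_ASCII);
        List<Integer> codes = encode(data);
        System.out.println(codes.size() + " codes for " + data.length + " bytes"); // 16 codes for 24 bytes
        System.out.println(Arrays.equals(data, decode(codes)));                    // true
    }
}
```

The `code == nextCode` branch covers the one case where the encoder emits a code in the same step it defines it (runs such as "AAAAAAA"); a decoder that omits it fails on exactly those inputs.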

Richard Holland wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Why are these files in compress/uncompress format? Is it proprietary
> software creating them, or a legacy system of some kind? Wouldn't gzip
> give better results both in terms of compression ratios and performance
> as it is far more up-to-date?
> 
> I believe that the JDK doesn't support LZW because LZW was patented, and
> that patent expired only very recently (between 2003 and 2006, depending
> on where you live and in what form you use LZW):
> 
> http://www.gnu.org/philosophy/gif.html
> 
> It's one of those wonderful cases where the patent enforcement caused
> the algorithm it was protecting to get dumped and forgotten because
> nobody wanted to pay for it. Apart from *nix compress/uncompress and
> inside the GIF format I'm not sure it's actually used anywhere else any
> more.
> 
> Technically we infringed the patent by including LZW support in BioJava,
> but now the patent has expired we no longer need to worry.
> 
> The question is: do we need to fix this inherently computer-science
> problem, which is entirely unrelated to biology or bioinformatics, or can
> we just get people to use an alternative library that supports it better
> and is more generic? They are out there, for instance:
> 
> http://www.chilkatsoft.com/java-zip.asp
> 
> cheers,
> Richard
> 
> Andy Yates wrote:
>> This does seem very strange. I don't know much about decompression, but
>> by the looks of things LZW isn't supported by the JDK.
>>
>> Richard Holland wrote:
>>> AFAIK the Zip algorithm is just LZW with bells on, so it should produce
>>> exactly the same results.
>>>
>>> Chris Dagdigian wrote:
>>>>> Passing on this email that came to me ...
>>>>>
>>>>> Regards,
>>>>> Chris Dagdigian
>>>>> OBF
>>>>>
>>>>>
>>>>> Begin forwarded message:
>>>>>
>>>>>> From: "Miguel Duarte" <malduarte at gmail.com>
>>>>>> Date: April 6, 2007 2:16:52 PM EDT
>>>>>> To: dag at sonsorol.org
>>>>>> Subject: Bug in org/biojava/utils/io/UncompressInputStream.java
>>>>>>
>>>>>> Hi Chris,
>>>>>>
>>>>>> From http://sourceforge.net/project/shownotes.php?release_id=314770&group_id=18598,
>>>>>> I've learned that you're maintaining the class
>>>>>> org/biojava/utils/io/UncompressInputStream.java. If that's not the
>>>>>> case, please forward this mail to the maintainer.
>>>>>>
>>>>>> I've discovered a nasty bug: with some read block sizes, the algorithm
>>>>>> truncates a few bytes from the end of the stream. I've verified this by
>>>>>> comparing the gzip/uncompress output for some files with what
>>>>>> org/biojava/utils/io/UncompressInputStream.java generates.
>>>>>>
>>>>>> Unfortunately I haven't found the cause of the bug yet, but I can
>>>>>> contribute the attached test case. To verify the bug, uncompress
>>>>>> BH_03834.MCR.Z with gzip and with UncompressInputStream, and compare
>>>>>> the results.
>>>>>>
>>>>>> Thanks,
>>>>>> Miguel Duarte
>>>>> ------------------------------------------------------------------------
>>>>>
>>>>> _______________________________________________
>>>>> biojava-dev mailing list
>>>>> biojava-dev at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
> 
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.2.2 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
> 
> iD8DBQFGG2jw4C5LeMEKA/QRAiCwAJ9vNlDX2zwG5paYHbaFv2gSQeblOQCdHaW4
> CwgzY5S7KELC3TA1oKKtjUw=
> =9xEM
> -----END PGP SIGNATURE-----
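
The truncation Miguel Duarte reports depends on the read block size, so one way to reproduce it is to drain the same stream once for every block size in a range and compare each result against reference bytes (for example, the output of gzip -d or uncompress on BH_03834.MCR.Z). The Java sketch below assumes that approach; StreamFactory, readAll and firstMismatch are hypothetical names, and the JDK's GZIPInputStream stands in for the decoder under test only so that the example runs on its own. To exercise BioJava, the factory would open an org.biojava.utils.io.UncompressInputStream over the .Z file instead; its constructor is not shown here, so check the BioJava API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class BlockSizeCheck {
    // Opens a fresh stream for each trial; stream constructors may throw IOException.
    interface StreamFactory {
        InputStream open() throws IOException;
    }

    // Drain a stream via read(byte[], int, int) with a fixed block size.
    // A correct decoder yields identical bytes for every block size.
    static byte[] readAll(InputStream in, int blockSize) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[blockSize];
        int n;
        while ((n = in.read(buf, 0, blockSize)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    // Return the first block size (1..maxBlock) whose output differs from
    // the expected bytes, or -1 if every block size matches.
    static int firstMismatch(StreamFactory factory, byte[] expected, int maxBlock)
            throws IOException {
        for (int size = 1; size <= maxBlock; size++) {
            try (InputStream in = factory.open()) {
                if (!Arrays.equals(expected, readAll(in, size))) return size;
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        // Synthetic reference data; in practice, load the trusted uncompress output.
        byte[] original = new byte[10_000];
        for (int i = 0; i < original.length; i++) original[i] = (byte) (i * 31 % 251);

        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(compressed)) {
            gz.write(original);
        }
        byte[] gzBytes = compressed.toByteArray();

        // Stand-in decoder; swap in the stream under test here.
        StreamFactory factory = () -> new GZIPInputStream(new ByteArrayInputStream(gzBytes));
        // A correct decoder should report -1.
        System.out.println("first mismatching block size: "
                + firstMismatch(factory, original, 512));
    }
}
```

Reporting the first failing block size, rather than a bare pass/fail, gives a concrete buffer size to step through in a debugger.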


