From holland at ebi.ac.uk Wed Jul 4 04:06:19 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Wed, 04 Jul 2007 09:06:19 +0100
Subject: [Biojava-dev] Request for help!
Message-ID: <468B54FB.3090606@ebi.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi guys.

I need help with a programming question!

In Java, you can find out the line-end symbol that the JRE is using by
calling:

System.getProperty("line.separator");

On *nix this returns "\n", for instance.

Our file parsers all rely on this to return the symbol to break lines at
when parsing files. This usually works fine.

BUT... on Windows machines, for certain files, it does not appear to
work! I suspect that these text files were generated on a *nix machine
then transferred by copying files across file systems using native copy
commands, or using binary FTP so that the system retained the *nix
line-end symbols instead of replacing them for the local line-end
symbols as it would have done if they were transferred in text mode via
FTP.

I don't have access to a Windows machine I can test on, but I suspect
that the fix is quite a simple one and boils down to replacing the
System() call with something more intelligent.

Is there any regex or similar thing we can use to spot _all_ kinds of
line-end symbols in text files regardless of the platform the file was
created on or the platform the parser is being run on?

(For information, the only two users who have reported problems like
this are both using Nexus files - I'm not sure what tool generated them
though. The Nexus parser uses the same rules as all the other parsers in
BioJava so I don't think there's anything specifically wrong with it as
opposed to say the GenBank or FASTA parsers.)

cheers,
Richard
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGi1T74C5LeMEKA/QRAqoeAKCf311nLYPqysNfUVLMy28H0FBMTgCcDaVh
3ppr3WRdJcQgzIAJdUoIX0U=
=Cboa
-----END PGP SIGNATURE-----

From markjschreiber at gmail.com Wed Jul 4 10:10:12 2007
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Wed, 4 Jul 2007 22:10:12 +0800
Subject: [Biojava-dev] [Biojava-l] Request for help!
In-Reply-To: <1B4834D5-68FC-4981-BEB0-0E1F76A5D7B1@gmx.net>
References: <468B54FB.3090606@ebi.ac.uk>
 <1B4834D5-68FC-4981-BEB0-0E1F76A5D7B1@gmx.net>
Message-ID: <93b45ca50707040710v7299eb1cp6892e0b7e875404c@mail.gmail.com>

BufferedWriter provides a newLine() method that writes a line
separator but I'm not sure if that gives you a different result or
not.

This may be a JVM bug that needs to be submitted to Sun.

As a very ugly workaround it is possible to determine the OS from the
System object as well.

- Mark

On 7/4/07, Hilmar Lapp wrote:
> In Perl it is easy enough to regex-replace s/\r\n/\n/g and s/\r//g
> though I'm not sure this wouldn't incur too much overhead in Java.
>
> You can certainly detect the eol character(s) by line.indexOf('\r');
> if found and the following character is '\n' you have DOS/Win-style
> line endings, and otherwise if found it is Mac-style.
>
> However, this all seems like a lot of trouble to go through if all
> that one would need to ask of people is to make sure that the file
> matches the native eol style of the platform, which is really trivial
> to achieve.
>
> For example, to convert Win-style line endings to Unix:
>
> $ perl -pi -e 's/\r//g;'
>
> and from Mac to Unix:
>
> $ perl -pi -e 's/\r/\n/g;'
>
> I have these and other simple conversions defined as aliases in
> my .profile, and don't really ever worry about writing lots of code
> to accommodate arbitrary line endings :-)
>
> -hilmar

From ayates at ebi.ac.uk Wed Jul 4 10:33:28 2007
From: ayates at ebi.ac.uk (Andy Yates)
Date: Wed, 04 Jul 2007 15:33:28 +0100
Subject: [Biojava-dev] [Biojava-l] Request for help!
In-Reply-To: <93b45ca50707040710v7299eb1cp6892e0b7e875404c@mail.gmail.com>
References: <468B54FB.3090606@ebi.ac.uk>
 <1B4834D5-68FC-4981-BEB0-0E1F76A5D7B1@gmx.net>
 <93b45ca50707040710v7299eb1cp6892e0b7e875404c@mail.gmail.com>
Message-ID: <468BAFB8.708@ebi.ac.uk>

BufferedWriter.newLine() will always use the value of
System.getProperty("line.separator"); however, BufferedReader knows that
an end of line can be \r\n, \r or \n, so in Java land it is perfectly
legal to read files with any common line terminator and still write
files in an OS-specific manner.

I sent a regex to Rich which he improved on, but the net result is the
extraction of the EOL regardless of which one it is.

I'm not 100% sure where the problem lies. So long as the parsers use
BufferedReader for their text file reading (which they all seem to do)
this shouldn't have been a problem. In fact this is the line from the
documentation of BufferedReader.readLine() in the JDK:

"Read a line of text. A line is considered to be terminated by any one
of a line feed ('\n'), a carriage return ('\r'), or a carriage return
followed immediately by a linefeed."

Very very strange, but the regex sounds like it was a pragmatic solution.

Andy

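The readLine() behaviour quoted from the JDK documentation can be checked
with a small sketch. This is an illustration only, not BioJava code; the
class name ReadLineDemo and the sample text are made up for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: shows that readLine() splits on \n, \r\n and \r alike.
final class ReadLineDemo {

    /** Reads every line of the given text with BufferedReader.readLine(). */
    static List<String> readLines(String text) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            // readLine() returns the line without its terminator, or null at end of input.
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // One text mixing Unix, Windows and old Mac line endings.
        String mixed = "unix\nwindows\r\nold-mac\rlast";
        System.out.println(readLines(mixed)); // prints [unix, windows, old-mac, last]
    }
}
```

Note that the terminators themselves are discarded, which is fine for
line-oriented parsers but not for code that needs to see which line-end
symbol was actually present.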
From holland at ebi.ac.uk Wed Jul 4 11:04:41 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Wed, 04 Jul 2007 16:04:41 +0100
Subject: [Biojava-dev] [Biojava-l] Request for help!
In-Reply-To: <93b45ca50707040710v7299eb1cp6892e0b7e875404c@mail.gmail.com>
References: <468B54FB.3090606@ebi.ac.uk>
 <1B4834D5-68FC-4981-BEB0-0E1F76A5D7B1@gmx.net>
 <93b45ca50707040710v7299eb1cp6892e0b7e875404c@mail.gmail.com>
Message-ID: <468BB709.4010704@ebi.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Thanks everyone for your replies. Turns out a regex of the various
combinations of \r and \n is the best way.

cheers,
Richard

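As a rough sketch of what "a regex of the various combinations of \r and
\n" could look like: the regex actually adopted in BioJava is not shown in
this thread, so the pattern below and the class name LineEndingCounter are
assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: counts which kinds of line terminator a text contains.
final class LineEndingCounter {

    // "\r\n" comes first in the alternation so that a Windows pair is matched
    // as one terminator rather than as a lone \r followed by a lone \n.
    private static final Pattern EOL = Pattern.compile("\\r\\n|\\r|\\n");

    /** Returns the counts as {CRLF, lone CR, lone LF}. */
    static int[] count(String text) {
        int[] counts = new int[3];
        Matcher m = EOL.matcher(text);
        while (m.find()) {
            String eol = m.group();
            if (eol.equals("\r\n")) {
                counts[0]++;
            } else if (eol.equals("\r")) {
                counts[1]++;
            } else {
                counts[2]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(count("a\r\nb\nc\rd\r\n"))); // prints [2, 1, 1]
    }
}
```

A file whose counts are non-zero in more than one slot has mixed line
endings, which is the case a single line.separator value cannot describe.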
_______________________________________________
Biojava-l mailing list - Biojava-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biojava-l
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGi7cJ4C5LeMEKA/QRAumDAKCJ5yc8PoZ+sLhcBOkL2Jdp/unW+gCfZrxG
AoVCPngmYX3b/pxfiGJbzic=
=2cyA
-----END PGP SIGNATURE-----

From holland at ebi.ac.uk Wed Jul 4 11:06:32 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Wed, 04 Jul 2007 16:06:32 +0100
Subject: [Biojava-dev] [Biojava-l] Request for help!
In-Reply-To: <468BAFB8.708@ebi.ac.uk>
References: <468B54FB.3090606@ebi.ac.uk>
 <1B4834D5-68FC-4981-BEB0-0E1F76A5D7B1@gmx.net>
 <93b45ca50707040710v7299eb1cp6892e0b7e875404c@mail.gmail.com>
 <468BAFB8.708@ebi.ac.uk>
Message-ID: <468BB778.2050704@ebi.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

The problem was that I was using the newline in a tokenizer, which
needed to return and recognize the newline symbols themselves (the
Nexus format is new-line sensitive). Hence I had to deal with files that
may not use the system line separator.

cheers,
Richard

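A minimal sketch of a tokenizer that hands back the line-end symbols as
tokens, whichever platform wrote the file. It is not the BioJava Nexus
tokenizer: EolTokenizer and tokenize are hypothetical names, the pattern
is an assumed one, and a real Nexus tokenizer would also split on
whitespace and punctuation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical tokenizer: text between terminators and the terminators
// themselves are both returned, in input order.
final class EolTokenizer {

    // Matches \r\n as a single token before trying a lone \r or \n.
    private static final Pattern EOL = Pattern.compile("\\r\\n|\\r|\\n");

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = EOL.matcher(text);
        int start = 0;
        while (m.find()) {
            if (m.start() > start) {
                tokens.add(text.substring(start, m.start())); // text before the terminator
            }
            tokens.add(m.group()); // the terminator itself, exactly as found
            start = m.end();
        }
        if (start < text.length()) {
            tokens.add(text.substring(start)); // trailing text without a terminator
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("#NEXUS\r\nBEGIN TAXA;\nEND;\r");
        System.out.println(tokens.size()); // prints 6
    }
}
```

Because the match does not depend on System.getProperty("line.separator"),
the same file tokenizes identically on *nix and on Windows.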
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGi7d34C5LeMEKA/QRAktwAKCJM43x9MlBZx2expYYAiVy8NCFKwCbBkYp
ctRVPlj5VA0oDzMsoxP4Ohs=
=6wg0
-----END PGP SIGNATURE-----

From markjschreiber at gmail.com Wed Jul 4 21:29:35 2007
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Thu, 5 Jul 2007 09:29:35 +0800
Subject: [Biojava-dev] [Biojava-l] Request for help!
In-Reply-To: <468BB778.2050704@ebi.ac.uk>
References: <468B54FB.3090606@ebi.ac.uk>
 <1B4834D5-68FC-4981-BEB0-0E1F76A5D7B1@gmx.net>
 <93b45ca50707040710v7299eb1cp6892e0b7e875404c@mail.gmail.com>
 <468BAFB8.708@ebi.ac.uk> <468BB778.2050704@ebi.ac.uk>
Message-ID: <93b45ca50707041829j337118c5t7adbcb9717a0a715@mail.gmail.com>

Slightly related to this ...

It might be worth making a quick check of the biojava code base to see
how often a "\n" appears in the source code.

- Mark

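One place where a hard-coded "\n" can matter is on the writing side. The
sketch below shows the alternative mentioned earlier in the thread,
BufferedWriter.newLine(), which writes the platform's line separator. The
class name NewlineWriting and the method joinPlatform are hypothetical and
not taken from the BioJava code base.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;

// Hypothetical helper: writes each line followed by the platform separator
// instead of a hard-coded "\n".
final class NewlineWriting {

    static String joinPlatform(String... lines) {
        StringWriter out = new StringWriter();
        // The writer is closed (and therefore flushed) when the try block ends.
        try (BufferedWriter writer = new BufferedWriter(out)) {
            for (String line : lines) {
                writer.write(line);
                writer.newLine(); // platform separator: "\r\n" on Windows, "\n" on *nix
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = joinPlatform("first", "second");
        // Number of characters written per terminator: 1 on *nix, 2 on Windows.
        System.out.println((text.length() - "firstsecond".length()) / 2);
    }
}
```

Whether a given "\n" in the source should be changed depends on context:
in output meant for a fixed file format, or inside a regex, a literal
"\n" may well be the intended behaviour.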
From holland at ebi.ac.uk Thu Jul 5 03:40:14 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Thu, 05 Jul 2007 08:40:14 +0100
Subject: [Biojava-dev] [Biojava-l] Request for help!
In-Reply-To: <93b45ca50707041829j337118c5t7adbcb9717a0a715@mail.gmail.com> References: <468B54FB.3090606@ebi.ac.uk> <1B4834D5-68FC-4981-BEB0-0E1F76A5D7B1@gmx.net> <93b45ca50707040710v7299eb1cp6892e0b7e875404c@mail.gmail.com> <468BAFB8.708@ebi.ac.uk> <468BB778.2050704@ebi.ac.uk> <93b45ca50707041829j337118c5t7adbcb9717a0a715@mail.gmail.com> Message-ID: <468CA05E.6070308@ebi.ac.uk> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 "\n" is used 262 times in 76 different locations: src/org/biojava/bio/alignment/NeedlemanWunsch.java src/org/biojava/bio/alignment/SequenceAlignment.java src/org/biojava/bio/alignment/SmithWaterman.java src/org/biojava/bio/alignment/SubstitutionMatrix.java src/org/biojava/bio/chromatogram/graphic/ChromatogramGraphic.java src/org/biojava/bio/dist/AbstractDistribution.java src/org/biojava/bio/dp/onehead/SingleDP.java src/org/biojava/bio/dp/twohead/DPInterpreter.java src/org/biojava/bio/dp/XmlMarkovModel.java src/org/biojava/bio/gui/sequence/ImageMap.java src/org/biojava/bio/program/abi/ABIFParser.java src/org/biojava/bio/program/blast2html/AbstractAlignmentStyler.java src/org/biojava/bio/program/blast2html/HTMLRenderer.java src/org/biojava/bio/program/das/dasalignment/Alignment.java src/org/biojava/bio/program/das/FeatureRequestManager.java src/org/biojava/bio/program/sax/BlastLikeAlignmentSAXParser.java src/org/biojava/bio/program/sax/ClustalWAlignmentSAXParser.java src/org/biojava/bio/program/sax/FastaSequenceSAXParser.java src/org/biojava/bio/program/sax/NeedleAlignmentSAXParser.java src/org/biojava/bio/search/KnuthMorrisPrattSearch.java src/org/biojava/bio/seq/db/BioIndex.java src/org/biojava/bio/seq/db/GenbankSequenceDB.java src/org/biojava/bio/seq/db/TabIndexStore.java src/org/biojava/bio/seq/io/agave/AGAVEBioSeqHandler.java src/org/biojava/bio/seq/io/agave/AGAVEContigHandler.java src/org/biojava/bio/seq/io/agave/AGAVEDbId.java src/org/biojava/bio/seq/io/agave/AGAVEKeywordPropHandler.java 
src/org/biojava/bio/seq/io/agave/AGAVEMapLocation.java src/org/biojava/bio/seq/io/agave/AGAVEMapPosition.java src/org/biojava/bio/seq/io/agave/AGAVEMatchRegion.java src/org/biojava/bio/seq/io/agave/AGAVEProperty.java src/org/biojava/bio/seq/io/agave/AGAVEQueryRegion.java src/org/biojava/bio/seq/io/agave/AGAVERelatedAnnot.java src/org/biojava/bio/seq/io/agave/AGAVESeqPropHandler.java src/org/biojava/bio/seq/io/agave/AgaveWriter.java src/org/biojava/bio/seq/io/agave/AGAVEXref.java src/org/biojava/bio/seq/io/agave/AGAVEXrefs.java src/org/biojava/bio/seq/io/agave/Embl2AgaveAnnotFilter.java src/org/biojava/bio/seq/io/FastaFormat.java src/org/biojava/bio/seq/io/GenbankFileFormer.java src/org/biojava/bio/seq/io/ParseException.java src/org/biojava/bio/structure/align/pairwise/AlternativeAlignment.java src/org/biojava/bio/structure/ChainImpl.java src/org/biojava/bio/structure/io/FileConvert.java src/org/biojava/bio/structure/StructureImpl.java src/org/biojava/bio/symbol/AbstractSimpleBasisSymbol.java src/org/biojava/bio/symbol/AlphabetManager.java src/org/biojava/bio/symbol/DoubleAlphabet.java src/org/biojava/bio/symbol/IntegerAlphabet.java src/org/biojava/bio/symbol/SimpleAlignment.java src/org/biojava/stats/svm/tools/TrainRegression.java src/org/biojava/utils/automata/DfaBuilder.java src/org/biojava/utils/automata/FiniteAutomaton.java src/org/biojava/utils/automata/PatternMaker.java src/org/biojava/utils/candy/CandyEntry.java src/org/biojava/utils/ChangeSupport.java src/org/biojava/utils/ExecRunner.java src/org/biojava/utils/io/CountedBufferedReader.java src/org/biojava/utils/ParserException.java src/org/biojava/utils/StaticMemberPlaceHolder.java src/org/biojavax/bio/db/ncbi/GenbankRichSequenceDB.java src/org/biojavax/bio/db/ncbi/GenpeptRichSequenceDB.java src/org/biojavax/bio/phylo/io/nexus/CharactersBlockParser.java src/org/biojavax/bio/phylo/io/nexus/DistancesBlockParser.java src/org/biojavax/bio/phylo/io/nexus/NexusFileFormat.java 
src/org/biojavax/bio/phylo/MultipleHitCorrection.java src/org/biojavax/bio/seq/io/DebuggingRichSeqIOListener.java src/org/biojavax/bio/seq/io/EMBLFormat.java src/org/biojavax/bio/seq/io/FastaFormat.java src/org/biojavax/bio/seq/io/GenbankFormat.java src/org/biojavax/bio/seq/io/UniProtCommentParser.java src/org/biojavax/bio/seq/io/UniProtFormat.java src/org/biojavax/bio/taxa/SimpleNCBITaxonName.java src/org/biojavax/utils/StringTools.java src/org/biojavax/utils/XMLTools.java Not all of these are 'bad' newlines - but still, it's a lot to search through. I've put it on my list of to-do things for when I'm bored. cheers, Richard Mark Schreiber wrote: > Slightly related to this ... > > It might be worth making a quick check of the biojava code base to see > how often a "\n" appears in the source code. > > - Mark > > On 7/4/07, Richard Holland wrote: > The problem was that I was using the newline in a tokenizer, which > needed to return and recognize the newline symbols themselves (the > Nexus format is new-line sensitive). Hence I had to deal with files that > may not have the system new-line operator. > > cheers, > Richard > > Andy Yates wrote: >>>> BufferedWriter will always use the value of >>>> System.getProperty("line.separator") however BufferedReader knows that >>>> an end of line can be \r\n, \r or \n so in Java land is perfectly legal >>>> to have any common line terminator & still write files in an OS specific >>>> manner. >>>> >>>> I sent a regex to Rich which he improved on but the net result is the >>>> extraction of the EOL regardless of which one it is. >>>> >>>> I'm not 100% sure on where the problem lies. So long as the parsers use >>>> BufferedReader for its text file reading (which they all seem to do) >>>> this shouldn't have been a problem. In fact this is the line from the >>>> BufferedReader.readLine() Javadoc in the JDK: >>>> >>>> "Read a line of text.
A line is considered to be terminated by any one >>>> of a line feed ('\n'), a carriage return ('\r'), or a carriage return >>>> followed immediately by a linefeed." >>>> >>>> Very very strange but the regex sounds like it was a pragmatic solution >>>> >>>> Andy >>>> >>>> Mark Schreiber wrote: >>>>> BufferedWriter provides a newLine() method that writes a line >>>>> separator but I'm not sure if that gives you a different result or >>>>> not. >>>>> >>>>> This may be a JVM bug that needs to be submitted to Sun. >>>>> >>>>> As a very ugly work around it is possible to determine the OS from the >>>>> System object as well. >>>>> >>>>> - Mark >>>>> >>>>> On 7/4/07, Hilmar Lapp wrote: >>>>>> In Perl it is easy enough to regex-replace s/\n\r/\n/g and s/\r//g >>>>>> though I'm not sure this wouldn't incur too much overhead in Java. >>>>>> >>>>>> You can certainly detect the eol character(s) by line.indexOf('\r'); >>>>>> if found and the preceding character is '\n' you have DOS/Win-style >>>>>> line endings, and otherwise if found it is Mac-style. >>>>>> >>>>>> However, this all seems like a lot of trouble to go through if all >>>>>> that one would need to ask of people is to make sure that the file >>>>>> matches the native eol style of the platform, which is really trivial >>>>>> to achieve. >>>>>> >>>>>> For example, to convert Win-style line endings to Unix: >>>>>> >>>>>> $ perl -pi -e 's/\r//g;' >>>>>> >>>>>> and from Mac to Unix: >>>>>> >>>>>> $ perl -pi -e 's/\r/\n/g;' >>>>>> >>>>>> I have these and other simple conversions defined as aliases in >>>>>> my .profile, and don't really ever worry about writing lots of code >>>>>> to accommodate arbitrary line endings :-) >>>>>> >>>>>> -hilmar >>>>>> >>>>>> On Jul 4, 2007, at 4:06 AM, Richard Holland wrote: >>>>>> >>>> Hi guys. >>>> >>>> I need help with a programming question! 
>>>> >>>> In Java, you can find out the line-end symbol that the JRE is using by >>>> calling: >>>> >>>> System.getProperty("line.separator"); >>>> >>>> On *nix this returns "\n", for instance. >>>> >>>> Our file parsers all rely on this to return the symbol to break >>>> lines at >>>> when parsing files. This usually works fine. >>>> >>>> BUT... on Windows machines, for certain files, it does not appear to >>>> work! I suspect that these text files were generated on a *nix machine >>>> then transferred by copying files across file systems using native >>>> copy >>>> commands, or using binary FTP so that the system retained the *nix >>>> line-end symbols instead of replacing them for the local line-end >>>> symbols as it would have done if they were transferred in text mode >>>> via >>>> FTP. >>>> >>>> I don't have access to a Windows machine I can test on, but I suspect >>>> that the fix is quite a simple one and boils down to replacing the >>>> System() call with something more intelligent. >>>> >>>> Is there any regex or similar thing we can use to spot _all_ kinds of >>>> line-end symbols in text files regardless of the platform the file was >>>> created on or the platform the parser is being run on? >>>> >>>> (For information, the only two users who have reported problems like >>>> this are both using Nexus files - I'm not sure what tool generated >>>> them >>>> though. The Nexus parser uses the same rules as all the other >>>> parsers in >>>> BioJava so I don't think there's anything specifically wrong with >>>> it as >>>> opposed to say the GenBank or FASTA parsers.) 
>>>> >>>> cheers, >>>> Richard >>>> > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l >>>>>> -- >>>>>> =========================================================== >>>>>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >>>>>> =========================================================== >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>>>>> >>>>> _______________________________________________ >>>>> biojava-dev mailing list >>>>> biojava-dev at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev >>>> _______________________________________________ >>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/biojava-l _______________________________________________ biojava-dev mailing list biojava-dev at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biojava-dev >> > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFGjKBd4C5LeMEKA/QRAuARAJsGmSZpdOEuNyYDNn0Xn1rBA6KBjgCeLr8s qkMnk1CwoMnqBT0RCwQjuSI= =X9+G -----END PGP SIGNATURE----- From kdoshi at asuragen.com Mon Jul 9 15:31:02 2007 From: kdoshi at asuragen.com (kdoshi at asuragen.com) Date: Mon, 9 Jul 2007 14:31:02 -0500 Subject: [Biojava-dev] Hello and support for FASTA34 Message-ID: <88A6938E58BEF34BA54B1531006D77DCD0467F@SVREXCH.asuragen.us> Hello Biojava. My name is Kishore Doshi. 
I have been programming in Java and C++ for 8+ years now and I just completed my PhD in Molecular Biology at the University of Texas at Austin. For my PhD work, I developed a software toolkit in Java for RNA comparative sequence analysis at the Gutell Lab. As part of my new job, I have just come across the Biojava API and I am very impressed with the capabilities available. I wish I would have used it a few years ago when developing my RNA comparative sequence analysis toolkit. I would be interested in lending my time to help Biojava as it continues to evolve. One area I have noticed I could provide immediate help would be in support for parsing FASTA search results. The classes FastaSearchParser and FastaSearchSAXParser appear to support FASTA 3.3; however, they do not appear to support FASTA 3.4. I have modified FastaSearchParser and FastaSearchSAXParser to support the tag modifications in FASTA 3.4 output. I would be interested in contributing my changes back to the community if possible. Please advise on how I should move forward. Thanks Kishore Doshi, M.S.;Ph.D. | Bioinformatics Asuragen, Inc. -- A Spin-Off of Ambion 2150 Woodward, Suite 100, Austin TX USA 78744 Tel: 1-512-681-5397 | Fax: 1-512-681-5201 From holland at ebi.ac.uk Mon Jul 9 17:41:15 2007 From: holland at ebi.ac.uk (Richard Holland) Date: Mon, 9 Jul 2007 22:41:15 +0100 (BST) Subject: [Biojava-dev] Hello and support for FASTA34 In-Reply-To: <88A6938E58BEF34BA54B1531006D77DCD0467F@SVREXCH.asuragen.us> References: <88A6938E58BEF34BA54B1531006D77DCD0467F@SVREXCH.asuragen.us> Message-ID: <57489.80.42.44.136.1184017275.squirrel@webmail.ebi.ac.uk> Hello! It's great that you are volunteering. We really appreciate all the help we can get. Does your new code support both the older 3.3 and the new 3.4 format, or does it support only the newer one? It'd be nice if it could read both even if it can only write the newer one (that's the way our newer parsers in the BioJavaX extension packages work). 
The best way to contribute small amounts of code is to do what you have done and post a message to the list. Then, someone with CVS access will offer to review it and commit it. So, the next step is to email me the code you have written as an attachment, to the email address I am sending this message from. I'll then check it through and commit it. We also ask that test cases be written for each new piece of code submitted. I don't know if you've come across JUnit at all, but if you have and you know how to write a JUnit test to test your new code then that would be extremely useful to have. Thanks again for your help and I'm looking forward to seeing your code. cheers, Richard On Mon, July 9, 2007 8:31 pm, kdoshi at asuragen.com wrote: > Hello Biojava. > > My name is Kishore Doshi. I have been programming in Java and C++ for 8+ > years now and I just completed my PhD in Molecular Biology at the > University of Texas at Austin. For my PhD work, I developed a software > toolkit in Java for RNA comparative sequence analysis at the Gutell Lab. > > As part of my new job, I have just come across the Biojava API and I am > very impressed with the capabilities available. I wish I would have used > it a few years ago when developing my RNA comparative sequence analysis > toolkit. > > I would be interested in lending my time to help Biojava as it continues > to evolve. One area I have noticed I could provide immediate help would > be in support for parsing FASTA search results. The classes > FastaSearchParser and FastaSearchSAXParser appear to support FASTA 3.3; > however, they do not appear to support FASTA 3.4. I have modified > FastaSearchParser and FastaSearchSAXParser to support the tag > modifications in FASTA 3.4 output. I would be interested in contributing > my changes back to the community if possible. Please advise on how I > should move forward. > > Thanks > Kishore Doshi, M.S.;Ph.D. | Bioinformatics > Asuragen, Inc. 
-- A Spin-Off of Ambion > 2150 Woodward, Suite 100, Austin TX USA 78744 > Tel: 1-512-681-5397 | Fax: 1-512-681-5201 > > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > -- Richard Holland BioMart (http://www.biomart.org/) EMBL-EBI Hinxton, Cambridgeshire CB10 1SD, UK From bugzilla-daemon at portal.open-bio.org Tue Jul 10 03:48:52 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 10 Jul 2007 03:48:52 -0400 Subject: [Biojava-dev] [Bug 2330] New: DP/ Profile HMM bug Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2330 Summary: DP/ Profile HMM bug Product: BioJava Version: 1.5 Platform: All OS/Version: All Status: NEW Severity: normal Priority: P2 Component: dist/dp AssignedTo: biojava-dev at biojava.org ReportedBy: mark.schreiber at novartis.com I have a problem when running demos/dp/SearchProfile.java. The program return an error message : classes\demos>java dp.SearchProfile fake.fasta Loading sequences Creating profile HMM Estimating alignment as having length 999 org.biojava.bio.BioError: Assertion Failure: Symbol i-791 was not an indexed member of the alphabet Transitions from i-791 despite being in the alphabet. at org.biojava.bio.symbol.LinearAlphabetIndex.indexForSymbol(LinearAlphabetIndex.java:118) at org.biojava.bio.dist.IndexedCount.increaseCount(IndexedCount.java:98) at org.biojava.bio.dist.SimpleDistribution$Trainer.addCount(SimpleDistribution.java:273) at org.biojava.bio.dist.SimpleDistributionTrainerContext.addCount(SimpleDistributionTrainerContext.java:85) at dp.SearchProfile.randomize(SearchProfile.java:155) at dp.SearchProfile.createProfile(SearchProfile.java:104) at dp.SearchProfile.main(SearchProfile.java:31) This is certain to be an issue with Gaps and Alphabet manager. Make sure a ProfileHMM can be serialized and deserialized multiple times after fixing. 
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From pm66 at nyu.edu Tue Jul 10 22:34:20 2007 From: pm66 at nyu.edu (Philip E Macmenamin) Date: Tue, 10 Jul 2007 22:34:20 -0400 Subject: [Biojava-dev] Request for applet for drawing ab1 or scf files... Message-ID: Hi, Does anyone know of an Applet that will act as a Chromatogram Viewer given an ab1 or scf file? We have / had one, however I only have the class file, and it allegedly does not run on Microsoft's wonderful Vista operating system. Only everywhere else. Since I have only the class file, I can't really monkey around with what I have. If anyone can give me, or point me to anything that would do the job, I would be very grateful and credit you as author on the site. Thanks again for any help, Philip MacMenamin, Center for Comparative Functional Genomics, 7th fl. Department of Biology New York University. From holland at ebi.ac.uk Wed Jul 11 03:40:26 2007 From: holland at ebi.ac.uk (Richard Holland) Date: Wed, 11 Jul 2007 08:40:26 +0100 Subject: [Biojava-dev] Request for applet for drawing ab1 or scf files... In-Reply-To: References: Message-ID: <4694896A.9050707@ebi.ac.uk> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 There are BioJava components which you could use to write an applet, but we don't have a complete applet (because everyone's definition of what such an applet should do is bound to be different!). The drawing code is here: http://www.biojava.org/docs/api/org/biojava/bio/chromatogram/graphic/package-summary.html And the code for parsing an ABI file into a Chromatogram object is here: http://www.biojava.org/docs/api/org/biojava/bio/program/abi/ABIFChromatogram.html cheers, Richard Philip E Macmenamin wrote: > Hi, > > Does anyone know of an Applet that will act as a Chromatogram Viewer given an ab1 or scf file?
> We have / had one, however I only have the class file, and it allegedly does not run on Microsoft's wonderful Vista operating system. Only everywhere else. > Since I have only the class file, I can't really monkey around with what I have. > If anyone can give me, or point me to anything that would do the job, I would be very grateful and credit you as author on the site. > > Thanks again for any help, > > Philip MacMenamin, > > Center for Comparative Functional Genomics, 7th fl. > Department of Biology > New York University. > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFGlIlq4C5LeMEKA/QRAjhSAJ0aVqPm+GzCAcyWf1p8+Dc5qmeEoQCeIScl ipuV6Eg6mb4IxBvTBs94vyA= =Kk7G -----END PGP SIGNATURE----- From ayates at ebi.ac.uk Wed Jul 11 04:15:33 2007 From: ayates at ebi.ac.uk (Andy Yates) Date: Wed, 11 Jul 2007 09:15:33 +0100 Subject: [Biojava-dev] Request for applet for drawing ab1 or scf files... In-Reply-To: References: Message-ID: <469491A5.6030701@ebi.ac.uk> Hi Philip, There is also the bioview2 project being run by members of the Cancer Genome Project available from http://code.google.com/p/bioview2/. The code works alongside biojava's ABI & SCF parsers but with its own drawing code & is intended to render portions of traces to files rather than work as an interactive GUI. There used to be a Java Web Start program written by Rhett Sutphin (the person responsible for writing the original chromatogram code) but it seems to have disappeared. Personally I would follow Richard's suggestion & look at using the Chromatogram drawing code available in BioJava. The code itself is very straightforward to use & very configurable.
Good luck Andy Philip E Macmenamin wrote: > Hi, > > Does anyone know of an Applet that will act as a Chromatogram Viewer given an ab1 or scf file? > We have / had one, however I only have the class file, and it allegedly does not run on Microsoft's wonderful Vista operating system. Only everywhere else. > Since I have only the class file, I can't really monkey around with what I have. > If anyone can give me, or point me to anything that would do the job, I would be very grateful and credit you as author on the site. > > Thanks again for any help, > > Philip MacMenamin, > > Center for Comparative Functional Genomics, 7th fl. > Department of Biology > New York University. > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev From hlapp at gmx.net Wed Jul 4 09:02:20 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 04 Jul 2007 13:02:20 -0000 Subject: [Biojava-dev] [Biojava-l] Request for help! In-Reply-To: <468B54FB.3090606@ebi.ac.uk> References: <468B54FB.3090606@ebi.ac.uk> Message-ID: <1B4834D5-68FC-4981-BEB0-0E1F76A5D7B1@gmx.net> In Perl it is easy enough to regex-replace s/\n\r/\n/g and s/\r//g though I'm not sure this wouldn't incur too much overhead in Java. You can certainly detect the eol character(s) by line.indexOf('\r'); if found and the preceding character is '\n' you have DOS/Win-style line endings, and otherwise if found it is Mac-style. However, this all seems like a lot of trouble to go through if all that one would need to ask of people is to make sure that the file matches the native eol style of the platform, which is really trivial to achieve.
For example, to convert Win-style line endings to Unix: $ perl -pi -e 's/\r//g;' and from Mac to Unix: $ perl -pi -e 's/\r/\n/g;' I have these and other simple conversions defined as aliases in my .profile, and don't really ever worry about writing lots of code to accommodate arbitrary line endings :-) -hilmar On Jul 4, 2007, at 4:06 AM, Richard Holland wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Hi guys. > > I need help with a programming question! > > In Java, you can find out the line-end symbol that the JRE is using by > calling: > > System.getProperty("line.separator"); > > On *nix this returns "\n", for instance. > > Our file parsers all rely on this to return the symbol to break > lines at > when parsing files. This usually works fine. > > BUT... on Windows machines, for certain files, it does not appear to > work! I suspect that these text files were generated on a *nix machine > then transferred by copying files across file systems using native > copy > commands, or using binary FTP so that the system retained the *nix > line-end symbols instead of replacing them for the local line-end > symbols as it would have done if they were transferred in text mode > via > FTP. > > I don't have access to a Windows machine I can test on, but I suspect > that the fix is quite a simple one and boils down to replacing the > System() call with something more intelligent. > > Is there any regex or similar thing we can use to spot _all_ kinds of > line-end symbols in text files regardless of the platform the file was > created on or the platform the parser is being run on? > > (For information, the only two users who have reported problems like > this are both using Nexus files - I'm not sure what tool generated > them > though. The Nexus parser uses the same rules as all the other > parsers in > BioJava so I don't think there's anything specifically wrong with > it as > opposed to say the GenBank or FASTA parsers.) 
> > cheers, > Richard > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.2.2 (GNU/Linux) > Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org > > iD8DBQFGi1T74C5LeMEKA/QRAqoeAKCf311nLYPqysNfUVLMy28H0FBMTgCcDaVh > 3ppr3WRdJcQgzIAJdUoIX0U= > =Cboa > -----END PGP SIGNATURE----- > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From holland at ebi.ac.uk Wed Jul 4 08:06:19 2007 From: holland at ebi.ac.uk (Richard Holland) Date: Wed, 04 Jul 2007 09:06:19 +0100 Subject: [Biojava-dev] Request for help! Message-ID: <468B54FB.3090606@ebi.ac.uk> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi guys. I need help with a programming question! In Java, you can find out the line-end symbol that the JRE is using by calling: System.getProperty("line.separator"); On *nix this returns "\n", for instance. Our file parsers all rely on this to return the symbol to break lines at when parsing files. This usually works fine. BUT... on Windows machines, for certain files, it does not appear to work! I suspect that these text files were generated on a *nix machine then transferred by copying files across file systems using native copy commands, or using binary FTP so that the system retained the *nix line-end symbols instead of replacing them for the local line-end symbols as it would have done if they were transferred in text mode via FTP. I don't have access to a Windows machine I can test on, but I suspect that the fix is quite a simple one and boils down to replacing the System() call with something more intelligent. 
Is there any regex or similar thing we can use to spot _all_ kinds of line-end symbols in text files regardless of the platform the file was created on or the platform the parser is being run on? (For information, the only two users who have reported problems like this are both using Nexus files - I'm not sure what tool generated them though. The Nexus parser uses the same rules as all the other parsers in BioJava so I don't think there's anything specifically wrong with it as opposed to say the GenBank or FASTA parsers.) cheers, Richard -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFGi1T74C5LeMEKA/QRAqoeAKCf311nLYPqysNfUVLMy28H0FBMTgCcDaVh 3ppr3WRdJcQgzIAJdUoIX0U= =Cboa -----END PGP SIGNATURE----- From markjschreiber at gmail.com Wed Jul 4 14:10:12 2007 From: markjschreiber at gmail.com (Mark Schreiber) Date: Wed, 4 Jul 2007 22:10:12 +0800 Subject: [Biojava-dev] [Biojava-l] Request for help! In-Reply-To: <1B4834D5-68FC-4981-BEB0-0E1F76A5D7B1@gmx.net> References: <468B54FB.3090606@ebi.ac.uk> <1B4834D5-68FC-4981-BEB0-0E1F76A5D7B1@gmx.net> Message-ID: <93b45ca50707040710v7299eb1cp6892e0b7e875404c@mail.gmail.com> BufferedWriter provides a newLine() method that writes a line separator but I'm not sure if that gives you a different result or not. This may be a JVM bug that needs to be submitted to Sun. As a very ugly work around it is possible to determine the OS from the System object as well. - Mark On 7/4/07, Hilmar Lapp wrote: > In Perl it is easy enough to regex-replace s/\n\r/\n/g and s/\r//g > though I'm not sure this wouldn't incur too much overhead in Java. > > You can certainly detect the eol character(s) by line.indexOf('\r'); > if found and the preceding character is '\n' you have DOS/Win-style > line endings, and otherwise if found it is Mac-style. 
> > However, this all seems like a lot of trouble to go through if all > that one would need to ask of people is to make sure that the file > matches the native eol style of the platform, which is really trivial > to achieve. > > For example, to convert Win-style line endings to Unix: > > $ perl -pi -e 's/\r//g;' > > and from Mac to Unix: > > $ perl -pi -e 's/\r/\n/g;' > > I have these and other simple conversions defined as aliases in > my .profile, and don't really ever worry about writing lots of code > to accommodate arbitrary line endings :-) > > -hilmar > > On Jul 4, 2007, at 4:06 AM, Richard Holland wrote: > > > -----BEGIN PGP SIGNED MESSAGE----- > > Hash: SHA1 > > > > Hi guys. > > > > I need help with a programming question! > > > > In Java, you can find out the line-end symbol that the JRE is using by > > calling: > > > > System.getProperty("line.separator"); > > > > On *nix this returns "\n", for instance. > > > > Our file parsers all rely on this to return the symbol to break > > lines at > > when parsing files. This usually works fine. > > > > BUT... on Windows machines, for certain files, it does not appear to > > work! I suspect that these text files were generated on a *nix machine > > then transferred by copying files across file systems using native > > copy > > commands, or using binary FTP so that the system retained the *nix > > line-end symbols instead of replacing them for the local line-end > > symbols as it would have done if they were transferred in text mode > > via > > FTP. > > > > I don't have access to a Windows machine I can test on, but I suspect > > that the fix is quite a simple one and boils down to replacing the > > System() call with something more intelligent. > > > > Is there any regex or similar thing we can use to spot _all_ kinds of > > line-end symbols in text files regardless of the platform the file was > > created on or the platform the parser is being run on? 
> > > > (For information, the only two users who have reported problems like > > this are both using Nexus files - I'm not sure what tool generated > > them > > though. The Nexus parser uses the same rules as all the other > > parsers in > > BioJava so I don't think there's anything specifically wrong with > > it as > > opposed to say the GenBank or FASTA parsers.) > > > > cheers, > > Richard > > > > -----BEGIN PGP SIGNATURE----- > > Version: GnuPG v1.4.2.2 (GNU/Linux) > > Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org > > > > iD8DBQFGi1T74C5LeMEKA/QRAqoeAKCf311nLYPqysNfUVLMy28H0FBMTgCcDaVh > > 3ppr3WRdJcQgzIAJdUoIX0U= > > =Cboa > > -----END PGP SIGNATURE----- > > _______________________________________________ > > Biojava-l mailing list - Biojava-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-l > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From ayates at ebi.ac.uk Wed Jul 4 14:33:28 2007 From: ayates at ebi.ac.uk (Andy Yates) Date: Wed, 04 Jul 2007 15:33:28 +0100 Subject: [Biojava-dev] [Biojava-l] Request for help! In-Reply-To: <93b45ca50707040710v7299eb1cp6892e0b7e875404c@mail.gmail.com> References: <468B54FB.3090606@ebi.ac.uk> <1B4834D5-68FC-4981-BEB0-0E1F76A5D7B1@gmx.net> <93b45ca50707040710v7299eb1cp6892e0b7e875404c@mail.gmail.com> Message-ID: <468BAFB8.708@ebi.ac.uk> BufferedWriter will always use the value of System.getProperty("line.separator") however BufferedReader knows that an end of line can be \r\n, \r or \n so in Java land is perfectly legal to have any common line terminator & still write files in an OS specific manner. 
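A minimal sketch of the behaviour described above, assuming only the standard java.io classes: BufferedReader.readLine() accepts \n, \r or \r\n as a terminator, so input with mixed line endings is split into the same lines on any platform. The names ReadLineDemo and readLines are made up for illustration and are not BioJava code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class ReadLineDemo {

    // Hypothetical helper: collect every line that readLine() returns.
    static List<String> readLines(String text) throws IOException {
        List<String> lines = new ArrayList<String>();
        BufferedReader reader = new BufferedReader(new StringReader(text));
        try {
            String line;
            // readLine() returns null once the end of the input is reached.
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } finally {
            reader.close();
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Unix (\n), DOS/Windows (\r\n) and old Mac (\r) endings in one input.
        String mixed = "#NEXUS\nBEGIN TAXA;\r\nEND;\rlast";
        for (String line : readLines(mixed)) {
            System.out.println("[" + line + "]");
        }
    }
}
```

Note that readLine() strips the terminator, so this only helps code that does not need to see the line-end symbols themselves.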
I sent a regex to Rich which he improved on but the net result is the extraction of the EOL regardless of which one it is. I'm not 100% sure on where the problem lies. So long as the parsers use BufferedReader for its text file reading (which they all seem to do) this shouldn't have been a problem. In fact this is the line from the BufferedReader.readLine() Javadoc in the JDK: "Read a line of text. A line is considered to be terminated by any one of a line feed ('\n'), a carriage return ('\r'), or a carriage return followed immediately by a linefeed." Very very strange but the regex sounds like it was a pragmatic solution. Andy Mark Schreiber wrote: > BufferedWriter provides a newLine() method that writes a line > separator but I'm not sure if that gives you a different result or > not. > > This may be a JVM bug that needs to be submitted to Sun. > > As a very ugly work around it is possible to determine the OS from the > System object as well. > > - Mark > > On 7/4/07, Hilmar Lapp wrote: >> In Perl it is easy enough to regex-replace s/\n\r/\n/g and s/\r//g >> though I'm not sure this wouldn't incur too much overhead in Java. >> >> You can certainly detect the eol character(s) by line.indexOf('\r'); >> if found and the preceding character is '\n' you have DOS/Win-style >> line endings, and otherwise if found it is Mac-style. >> >> However, this all seems like a lot of trouble to go through if all >> that one would need to ask of people is to make sure that the file >> matches the native eol style of the platform, which is really trivial >> to achieve.
>> >> For example, to convert Win-style line endings to Unix: >> >> $ perl -pi -e 's/\r//g;' >> >> and from Mac to Unix: >> >> $ perl -pi -e 's/\r/\n/g;' >> >> I have these and other simple conversions defined as aliases in >> my .profile, and don't really ever worry about writing lots of code >> to accommodate arbitrary line endings :-) >> >> -hilmar >> >> On Jul 4, 2007, at 4:06 AM, Richard Holland wrote: >> >>> -----BEGIN PGP SIGNED MESSAGE----- >>> Hash: SHA1 >>> >>> Hi guys. >>> >>> I need help with a programming question! >>> >>> In Java, you can find out the line-end symbol that the JRE is using by >>> calling: >>> >>> System.getProperty("line.separator"); >>> >>> On *nix this returns "\n", for instance. >>> >>> Our file parsers all rely on this to return the symbol to break >>> lines at >>> when parsing files. This usually works fine. >>> >>> BUT... on Windows machines, for certain files, it does not appear to >>> work! I suspect that these text files were generated on a *nix machine >>> then transferred by copying files across file systems using native >>> copy >>> commands, or using binary FTP so that the system retained the *nix >>> line-end symbols instead of replacing them for the local line-end >>> symbols as it would have done if they were transferred in text mode >>> via >>> FTP. >>> >>> I don't have access to a Windows machine I can test on, but I suspect >>> that the fix is quite a simple one and boils down to replacing the >>> System() call with something more intelligent. >>> >>> Is there any regex or similar thing we can use to spot _all_ kinds of >>> line-end symbols in text files regardless of the platform the file was >>> created on or the platform the parser is being run on? >>> >>> (For information, the only two users who have reported problems like >>> this are both using Nexus files - I'm not sure what tool generated >>> them >>> though. 
The Nexus parser uses the same rules as all the other >>> parsers in >>> BioJava so I don't think there's anything specifically wrong with >>> it as >>> opposed to say the GenBank or FASTA parsers.) >>> >>> cheers, >>> Richard >>> >>> -----BEGIN PGP SIGNATURE----- >>> Version: GnuPG v1.4.2.2 (GNU/Linux) >>> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org >>> >>> iD8DBQFGi1T74C5LeMEKA/QRAqoeAKCf311nLYPqysNfUVLMy28H0FBMTgCcDaVh >>> 3ppr3WRdJcQgzIAJdUoIX0U= >>> =Cboa >>> -----END PGP SIGNATURE----- >>> _______________________________________________ >>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-l >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> =========================================================== >> >> >> >> >> >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l >> > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev From holland at ebi.ac.uk Wed Jul 4 15:04:41 2007 From: holland at ebi.ac.uk (Richard Holland) Date: Wed, 04 Jul 2007 16:04:41 +0100 Subject: [Biojava-dev] [Biojava-l] Request for help! In-Reply-To: <93b45ca50707040710v7299eb1cp6892e0b7e875404c@mail.gmail.com> References: <468B54FB.3090606@ebi.ac.uk> <1B4834D5-68FC-4981-BEB0-0E1F76A5D7B1@gmx.net> <93b45ca50707040710v7299eb1cp6892e0b7e875404c@mail.gmail.com> Message-ID: <468BB709.4010704@ebi.ac.uk> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Thanks everyone for your replies. Turns out a regex of the various combinations of \r and \n is the best way. 
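A regex along these lines might look like the sketch below. It is not the pattern that was actually committed (the thread does not show it), and the class name EolTokenizer and its tokenize method are hypothetical. The two-character \r\n alternative is listed first so that a DOS/Windows terminator comes back as one token rather than as \r followed by \n.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EolTokenizer {

    // Assumed pattern: any of the three common terminators, CRLF tried first.
    static final Pattern EOL = Pattern.compile("\\r\\n|\\r|\\n");

    // Split text into content tokens and terminator tokens, keeping the
    // terminators so that a newline-sensitive parser can still see them.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        Matcher m = EOL.matcher(text);
        int pos = 0;
        while (m.find()) {
            if (m.start() > pos) {
                tokens.add(text.substring(pos, m.start())); // text before the EOL
            }
            tokens.add(m.group()); // the EOL itself, whichever style it is
            pos = m.end();
        }
        if (pos < text.length()) {
            tokens.add(text.substring(pos)); // trailing text without a terminator
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : tokenize("#NEXUS\r\nBEGIN;\nEND;\r")) {
            // Show the terminators visibly instead of printing raw control characters.
            System.out.println(token.replace("\r", "\\r").replace("\n", "\\n"));
        }
    }
}
```

Because the pattern does not depend on System.getProperty("line.separator"), the result is the same whichever platform the file was written on or is being read on.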

cheers,
Richard

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGi7cJ4C5LeMEKA/QRAumDAKCJ5yc8PoZ+sLhcBOkL2Jdp/unW+gCfZrxG
AoVCPngmYX3b/pxfiGJbzic=
=2cyA
-----END PGP SIGNATURE-----

From holland at ebi.ac.uk Wed Jul 4 15:06:32 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Wed, 04 Jul 2007 16:06:32 +0100
Subject: [Biojava-dev] [Biojava-l] Request for help!
In-Reply-To: <468BAFB8.708@ebi.ac.uk>
References: <468B54FB.3090606@ebi.ac.uk>
	<1B4834D5-68FC-4981-BEB0-0E1F76A5D7B1@gmx.net>
	<93b45ca50707040710v7299eb1cp6892e0b7e875404c@mail.gmail.com>
	<468BAFB8.708@ebi.ac.uk>
Message-ID: <468BB778.2050704@ebi.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

The problem was that I was using the newline in a tokenizer, which
needed to return and recognize the newline symbols themselves (the
Nexus format is new-line sensitive). Hence I had to deal with files that
may not use the system line separator.

cheers,
Richard

Andy Yates wrote:
> BufferedWriter will always use the value of
> System.getProperty("line.separator"); however, BufferedReader knows that
> an end of line can be \r\n, \r or \n, so in Java land it is perfectly legal
> to read any common line terminator & still write files in an OS specific
> manner.
>
> I sent a regex to Rich which he improved on but the net result is the
> extraction of the EOL regardless of which one it is.
>
> I'm not 100% sure on where the problem lies. So long as the parsers use
> BufferedReader for their text file reading (which they all seem to do)
> this shouldn't have been a problem. In fact this is the line from the
> BufferedReader.readLine() documentation in the JDK:
>
> "Read a line of text. A line is considered to be terminated by any one
> of a line feed ('\n'), a carriage return ('\r'), or a carriage return
> followed immediately by a linefeed."
>
> Very very strange but the regex sounds like it was a pragmatic solution
>
> Andy

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGi7d34C5LeMEKA/QRAktwAKCJM43x9MlBZx2expYYAiVy8NCFKwCbBkYp
ctRVPlj5VA0oDzMsoxP4Ohs=
=6wg0
-----END PGP SIGNATURE-----

From markjschreiber at gmail.com Thu Jul 5 01:29:35 2007
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Thu, 5 Jul 2007 09:29:35 +0800
Subject: [Biojava-dev] [Biojava-l] Request for help!
In-Reply-To: <468BB778.2050704@ebi.ac.uk>
References: <468B54FB.3090606@ebi.ac.uk>
	<1B4834D5-68FC-4981-BEB0-0E1F76A5D7B1@gmx.net>
	<93b45ca50707040710v7299eb1cp6892e0b7e875404c@mail.gmail.com>
	<468BAFB8.708@ebi.ac.uk> <468BB778.2050704@ebi.ac.uk>
Message-ID: <93b45ca50707041829j337118c5t7adbcb9717a0a715@mail.gmail.com>

Slightly related to this ...

It might be worth making a quick check of the biojava code base to see
how often a "\n" appears in the source code.
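
A check like that could be scripted roughly as follows. This is only a
sketch under a few assumptions: the class name NewlineLiteralCounter is
invented, the default directory "src" is a guess at the checkout layout,
and it simply counts the two-character escape (a backslash followed by
n) in .java files, so it cannot tell a 'bad' hard-coded newline from a
harmless one and it will also count an escaped backslash followed by n.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Illustrative sketch only: names and the default directory are assumptions.
class NewlineLiteralCounter {
    // The escape as it is written in source text: a backslash, then 'n'.
    static final String ESCAPE = "\\n";

    // Count non-overlapping occurrences of needle in text.
    static int countOccurrences(String text, String needle) {
        int count = 0;
        int from = text.indexOf(needle);
        while (from >= 0) {
            count++;
            from = text.indexOf(needle, from + needle.length());
        }
        return count;
    }

    // Map each .java file under root that contains the escape to its hit count.
    static Map<String, Integer> scan(Path root) throws IOException {
        List<Path> sources;
        try (Stream<Path> paths = Files.walk(root)) {
            sources = paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.toString().endsWith(".java"))
                    .collect(Collectors.toList());
        }
        Map<String, Integer> hits = new TreeMap<String, Integer>();
        for (Path p : sources) {
            // ISO-8859-1 decodes any byte sequence, so odd encodings cannot abort the scan.
            String text = new String(Files.readAllBytes(p), StandardCharsets.ISO_8859_1);
            int n = countOccurrences(text, ESCAPE);
            if (n > 0) {
                hits.put(p.toString(), n);
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : "src");
        if (!Files.isDirectory(root)) {
            System.out.println("Not a directory: " + root);
            return;
        }
        Map<String, Integer> hits = scan(root);
        int total = 0;
        for (Map.Entry<String, Integer> e : hits.entrySet()) {
            System.out.println(e.getValue() + "\t" + e.getKey());
            total += e.getValue();
        }
        System.out.println(total + " occurrences in " + hits.size() + " files");
    }
}
```

The output still needs reading by hand, since only some of the hits will
be places where the platform separator should have been used instead.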

- Mark

From holland at ebi.ac.uk Thu Jul 5 07:40:14 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Thu, 05 Jul 2007 08:40:14 +0100
Subject: [Biojava-dev] [Biojava-l] Request for help!
In-Reply-To: <93b45ca50707041829j337118c5t7adbcb9717a0a715@mail.gmail.com>
References: <468B54FB.3090606@ebi.ac.uk>
	<1B4834D5-68FC-4981-BEB0-0E1F76A5D7B1@gmx.net>
	<93b45ca50707040710v7299eb1cp6892e0b7e875404c@mail.gmail.com>
	<468BAFB8.708@ebi.ac.uk> <468BB778.2050704@ebi.ac.uk>
	<93b45ca50707041829j337118c5t7adbcb9717a0a715@mail.gmail.com>
Message-ID: <468CA05E.6070308@ebi.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

"\n" is used 262 times in 76 different locations:

src/org/biojava/bio/alignment/NeedlemanWunsch.java
src/org/biojava/bio/alignment/SequenceAlignment.java
src/org/biojava/bio/alignment/SmithWaterman.java
src/org/biojava/bio/alignment/SubstitutionMatrix.java
src/org/biojava/bio/chromatogram/graphic/ChromatogramGraphic.java
src/org/biojava/bio/dist/AbstractDistribution.java
src/org/biojava/bio/dp/onehead/SingleDP.java
src/org/biojava/bio/dp/twohead/DPInterpreter.java
src/org/biojava/bio/dp/XmlMarkovModel.java
src/org/biojava/bio/gui/sequence/ImageMap.java
src/org/biojava/bio/program/abi/ABIFParser.java
src/org/biojava/bio/program/blast2html/AbstractAlignmentStyler.java
src/org/biojava/bio/program/blast2html/HTMLRenderer.java
src/org/biojava/bio/program/das/dasalignment/Alignment.java
src/org/biojava/bio/program/das/FeatureRequestManager.java
src/org/biojava/bio/program/sax/BlastLikeAlignmentSAXParser.java
src/org/biojava/bio/program/sax/ClustalWAlignmentSAXParser.java
src/org/biojava/bio/program/sax/FastaSequenceSAXParser.java
src/org/biojava/bio/program/sax/NeedleAlignmentSAXParser.java
src/org/biojava/bio/search/KnuthMorrisPrattSearch.java
src/org/biojava/bio/seq/db/BioIndex.java
src/org/biojava/bio/seq/db/GenbankSequenceDB.java
src/org/biojava/bio/seq/db/TabIndexStore.java
src/org/biojava/bio/seq/io/agave/AGAVEBioSeqHandler.java
src/org/biojava/bio/seq/io/agave/AGAVEContigHandler.java
src/org/biojava/bio/seq/io/agave/AGAVEDbId.java
src/org/biojava/bio/seq/io/agave/AGAVEKeywordPropHandler.java
src/org/biojava/bio/seq/io/agave/AGAVEMapLocation.java
src/org/biojava/bio/seq/io/agave/AGAVEMapPosition.java
src/org/biojava/bio/seq/io/agave/AGAVEMatchRegion.java
src/org/biojava/bio/seq/io/agave/AGAVEProperty.java
src/org/biojava/bio/seq/io/agave/AGAVEQueryRegion.java
src/org/biojava/bio/seq/io/agave/AGAVERelatedAnnot.java
src/org/biojava/bio/seq/io/agave/AGAVESeqPropHandler.java
src/org/biojava/bio/seq/io/agave/AgaveWriter.java
src/org/biojava/bio/seq/io/agave/AGAVEXref.java
src/org/biojava/bio/seq/io/agave/AGAVEXrefs.java
src/org/biojava/bio/seq/io/agave/Embl2AgaveAnnotFilter.java
src/org/biojava/bio/seq/io/FastaFormat.java
src/org/biojava/bio/seq/io/GenbankFileFormer.java
src/org/biojava/bio/seq/io/ParseException.java
src/org/biojava/bio/structure/align/pairwise/AlternativeAlignment.java
src/org/biojava/bio/structure/ChainImpl.java
src/org/biojava/bio/structure/io/FileConvert.java
src/org/biojava/bio/structure/StructureImpl.java
src/org/biojava/bio/symbol/AbstractSimpleBasisSymbol.java
src/org/biojava/bio/symbol/AlphabetManager.java
src/org/biojava/bio/symbol/DoubleAlphabet.java
src/org/biojava/bio/symbol/IntegerAlphabet.java
src/org/biojava/bio/symbol/SimpleAlignment.java
src/org/biojava/stats/svm/tools/TrainRegression.java
src/org/biojava/utils/automata/DfaBuilder.java
src/org/biojava/utils/automata/FiniteAutomaton.java
src/org/biojava/utils/automata/PatternMaker.java
src/org/biojava/utils/candy/CandyEntry.java
src/org/biojava/utils/ChangeSupport.java
src/org/biojava/utils/ExecRunner.java
src/org/biojava/utils/io/CountedBufferedReader.java
src/org/biojava/utils/ParserException.java
src/org/biojava/utils/StaticMemberPlaceHolder.java
src/org/biojavax/bio/db/ncbi/GenbankRichSequenceDB.java
src/org/biojavax/bio/db/ncbi/GenpeptRichSequenceDB.java
src/org/biojavax/bio/phylo/io/nexus/CharactersBlockParser.java
src/org/biojavax/bio/phylo/io/nexus/DistancesBlockParser.java
src/org/biojavax/bio/phylo/io/nexus/NexusFileFormat.java
src/org/biojavax/bio/phylo/MultipleHitCorrection.java
src/org/biojavax/bio/seq/io/DebuggingRichSeqIOListener.java
src/org/biojavax/bio/seq/io/EMBLFormat.java
src/org/biojavax/bio/seq/io/FastaFormat.java
src/org/biojavax/bio/seq/io/GenbankFormat.java
src/org/biojavax/bio/seq/io/UniProtCommentParser.java
src/org/biojavax/bio/seq/io/UniProtFormat.java
src/org/biojavax/bio/taxa/SimpleNCBITaxonName.java
src/org/biojavax/utils/StringTools.java
src/org/biojavax/utils/XMLTools.java

Not all of these are 'bad' newlines - but still, it's a lot to search
through. I've put it on my list of to-do things for when I'm bored.

cheers,
Richard

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGjKBd4C5LeMEKA/QRAuARAJsGmSZpdOEuNyYDNn0Xn1rBA6KBjgCeLr8s
qkMnk1CwoMnqBT0RCwQjuSI=
=X9+G
-----END PGP SIGNATURE-----

From kdoshi at asuragen.com Mon Jul 9 19:31:02 2007
From: kdoshi at asuragen.com (kdoshi at asuragen.com)
Date: Mon, 9 Jul 2007 14:31:02 -0500
Subject: [Biojava-dev] Hello and support for FASTA34
Message-ID: <88A6938E58BEF34BA54B1531006D77DCD0467F@SVREXCH.asuragen.us>

Hello Biojava.

My name is Kishore Doshi.
I have been programming in Java and C++ for 8+
years now and I just completed my PhD in Molecular Biology at the
University of Texas at Austin. For my PhD work, I developed a software
toolkit in Java for RNA comparative sequence analysis at the Gutell Lab.

As part of my new job, I have just come across the Biojava API and I am
very impressed with the capabilities available. I wish I had used
it a few years ago when developing my RNA comparative sequence analysis
toolkit.

I would be interested in lending my time to help Biojava as it continues
to evolve. One area where I could provide immediate help is
support for parsing FASTA search results. The classes
FastaSearchParser and FastaSearchSAXParser appear to support FASTA 3.3;
however, they do not appear to support FASTA 3.4. I have modified
FastaSearchParser and FastaSearchSAXParser to support the tag
modifications in FASTA 3.4 output. I would be interested in contributing
my changes back to the community if possible. Please advise on how I
should move forward.

Thanks
Kishore Doshi, M.S.;Ph.D. | Bioinformatics
Asuragen, Inc. -- A Spin-Off of Ambion
2150 Woodward, Suite 100, Austin TX USA 78744
Tel: 1-512-681-5397 | Fax: 1-512-681-5201

From holland at ebi.ac.uk Mon Jul 9 21:41:15 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Mon, 9 Jul 2007 22:41:15 +0100 (BST)
Subject: [Biojava-dev] Hello and support for FASTA34
In-Reply-To: <88A6938E58BEF34BA54B1531006D77DCD0467F@SVREXCH.asuragen.us>
References: <88A6938E58BEF34BA54B1531006D77DCD0467F@SVREXCH.asuragen.us>
Message-ID: <57489.80.42.44.136.1184017275.squirrel@webmail.ebi.ac.uk>

Hello!

It's great that you are volunteering. We really appreciate all the help
we can get.

Does your new code support both the older 3.3 and the new 3.4 format, or
does it support only the newer one? It'd be nice if it could read both
even if it can only write the newer one (that's the way our newer parsers
in the BioJavaX extension packages work).

The best way to contribute small amounts of code is to do what you have
done and post a message to the list. Then, someone with CVS access will
offer to review it and commit it. So, the next step is to email me the
code you have written as an attachment, to the email address I am sending
this message from. I'll then check it through and commit it.

We also ask that test cases be written for each new piece of code
submitted. I don't know if you've come across JUnit at all, but if you
have and you know how to write a JUnit test to test your new code then
that would be extremely useful to have.

Thanks again for your help and I'm looking forward to seeing your code.

cheers,
Richard

--
Richard Holland
BioMart (http://www.biomart.org/)
EMBL-EBI
Hinxton, Cambridgeshire CB10 1SD, UK

From bugzilla-daemon at portal.open-bio.org Tue Jul 10 07:48:52 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 10 Jul 2007 03:48:52 -0400
Subject: [Biojava-dev] [Bug 2330] New: DP/ Profile HMM bug
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2330

           Summary: DP/ Profile HMM bug
           Product: BioJava
           Version: 1.5
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: dist/dp
        AssignedTo: biojava-dev at biojava.org
        ReportedBy: mark.schreiber at novartis.com

I have a problem when running demos/dp/SearchProfile.java. The program
returns an error message:

classes\demos>java dp.SearchProfile fake.fasta
Loading sequences
Creating profile HMM
Estimating alignment as having length 999
org.biojava.bio.BioError: Assertion Failure: Symbol i-791 was not an indexed member of the alphabet Transitions from i-791 despite being in the alphabet.
        at org.biojava.bio.symbol.LinearAlphabetIndex.indexForSymbol(LinearAlphabetIndex.java:118)
        at org.biojava.bio.dist.IndexedCount.increaseCount(IndexedCount.java:98)
        at org.biojava.bio.dist.SimpleDistribution$Trainer.addCount(SimpleDistribution.java:273)
        at org.biojava.bio.dist.SimpleDistributionTrainerContext.addCount(SimpleDistributionTrainerContext.java:85)
        at dp.SearchProfile.randomize(SearchProfile.java:155)
        at dp.SearchProfile.createProfile(SearchProfile.java:104)
        at dp.SearchProfile.main(SearchProfile.java:31)

This is certain to be an issue with Gaps and Alphabet manager.

Make sure a ProfileHMM can be serialized and deserialized multiple times
after fixing.
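
A repeated serialization check of that kind could be sketched as below.
This is a generic helper using only the standard java.io classes; the
class name SerializationRoundTrip and the method roundTrip are invented
for illustration, and the demo uses a plain list as a stand-in because
building a real ProfileHMM needs BioJava on the classpath. Equality
after the round trips is also an assumption about how the result would
be compared.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;

// Illustrative sketch only: the class and method names are hypothetical.
class SerializationRoundTrip {
    // Serialize and deserialize the object the given number of times,
    // feeding each deserialized copy into the next pass.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T original, int times)
            throws IOException, ClassNotFoundException {
        T current = original;
        for (int i = 0; i < times; i++) {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bytes);
            try {
                out.writeObject(current);
            } finally {
                out.close();
            }
            ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            try {
                current = (T) in.readObject();
            } finally {
                in.close();
            }
        }
        return current;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in object; a real test would pass the ProfileHMM here instead.
        ArrayList<String> standIn = new ArrayList<String>(Arrays.asList("i-790", "i-791"));
        ArrayList<String> copy = roundTrip(standIn, 3);
        System.out.println(copy.equals(standIn));
    }
}
```

Feeding each copy back into the next pass matters here, because a
problem that only shows up on the second or third deserialization would
be missed by serializing the original object several times.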
From pm66 at nyu.edu Wed Jul 11 02:34:20 2007 From: pm66 at nyu.edu (Philip E Macmenamin) Date: Tue, 10 Jul 2007 22:34:20 -0400 Subject: [Biojava-dev] Request for applet for drawing ab1 or scf files... Message-ID: Hi, Does anyone know of an Applet that will act as a Chromatogram Viewer given ab1 or scf file? We have / had one, however I only have the class file, and it allegedly does not run on Microsoft's wonderful Vista operating system. Only everywhere else. Since I have only the class file, I can't really monkey around with what I have. If anyone can give me, or point me to anything that would do the job, I would be very grateful and credit you as author on the site. Thanks again for any help, Philip MacMenamin, Center for Comparative Functional Genomics, 7th fl. Department of Biology New York University. From holland at ebi.ac.uk Wed Jul 11 07:40:26 2007 From: holland at ebi.ac.uk (Richard Holland) Date: Wed, 11 Jul 2007 08:40:26 +0100 Subject: [Biojava-dev] Request for applet for drawing ab1 or scf files... In-Reply-To: References: Message-ID: <4694896A.9050707@ebi.ac.uk> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 There are BioJava components which you could use to write an applet, but we don't have a complete applet (because everyone's definition of what such an applet should do is bound to be different!). The drawing code is here: http://www.biojava.org/docs/api/org/biojava/bio/chromatogram/graphic/package-summary.html And the code for parsing an ABI file into a Chromatogram object is here: http://www.biojava.org/docs/api/org/biojava/bio/program/abi/ABIFChromatogram.html cheers, Richard Philip E Macmenamin wrote: > Hi, > > Does anyone know of an Applet that will act as a Chromatogram Viewer given ab1 or scf file? 
> We have / had one, however I only have the class file, and it allegedly does not run on Microsoft's wonderful Vista operating system. Only everywhere else. > Since I have only the class file, I can't really monkey around with what I have. > If anyone can give me, or point me to anything that would do the job, I would be very grateful and credit you as author on the site. > > Thanks again for any help, > > Philip MacMenamin, > > Center for Comparative Functional Genomics, 7th fl. > Department of Biology > New York University. > -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFGlIlq4C5LeMEKA/QRAjhSAJ0aVqPm+GzCAcyWf1p8+Dc5qmeEoQCeIScl ipuV6Eg6mb4IxBvTBs94vyA= =Kk7G -----END PGP SIGNATURE----- From ayates at ebi.ac.uk Wed Jul 11 08:15:33 2007 From: ayates at ebi.ac.uk (Andy Yates) Date: Wed, 11 Jul 2007 09:15:33 +0100 Subject: [Biojava-dev] Request for applet for drawing ab1 or scf files... In-Reply-To: References: Message-ID: <469491A5.6030701@ebi.ac.uk> Hi Philip, There is also the bioview2 project being run by members of the Cancer Genome Project available from http://code.google.com/p/bioview2/. The code works alongside biojava's ABI & SCF parsers but with its own drawing code & is intended to render portions of traces to files rather than work in a more GUI fashion. There used to be a Java webstart program written by Rhett Sutphin (the person responsible for writing the original chromatogram code) but it seems to have disappeared. Personally I would follow Richard's suggestion & look at using the Chromatogram drawing code available in BioJava. The code itself is very straightforward to use & very configurable. 
Good luck Andy Philip E Macmenamin wrote: > Hi, > > Does anyone know of an Applet that will act as a Chromatogram Viewer given ab1 or scf file? > We have / had one, however I only have the class file, and it allegedly does not run on Microsoft's wonderful Vista operating system. Only everywhere else. > Since I have only the class file, I can't really monkey around with what I have. > If anyone can give me, or point me to anything that would do the job, I would be very grateful and credit you as author on the site. > > Thanks again for any help, > > Philip MacMenamin, > > Center for Comparative Functional Genomics, 7th fl. > Department of Biology > New York University. From hlapp at gmx.net Wed Jul 4 13:02:20 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 04 Jul 2007 13:02:20 -0000 Subject: [Biojava-dev] [Biojava-l] Request for help! In-Reply-To: <468B54FB.3090606@ebi.ac.uk> References: <468B54FB.3090606@ebi.ac.uk> Message-ID: <1B4834D5-68FC-4981-BEB0-0E1F76A5D7B1@gmx.net> In Perl it is easy enough to regex-replace s/\r\n/\n/g and s/\r//g though I'm not sure this wouldn't incur too much overhead in Java. You can certainly detect the eol character(s) by line.indexOf('\r'); if found and the following character is '\n' you have DOS/Win-style line endings, and otherwise if found it is Mac-style. However, this all seems like a lot of trouble to go through if all that one would need to ask of people is to make sure that the file matches the native eol style of the platform, which is really trivial to achieve. 
For example, to convert Win-style line endings to Unix: $ perl -pi -e 's/\r//g;' and from Mac to Unix: $ perl -pi -e 's/\r/\n/g;' I have these and other simple conversions defined as aliases in my .profile, and don't really ever worry about writing lots of code to accommodate arbitrary line endings :-) -hilmar On Jul 4, 2007, at 4:06 AM, Richard Holland wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Hi guys. > > I need help with a programming question! > > In Java, you can find out the line-end symbol that the JRE is using by > calling: > > System.getProperty("line.separator"); > > On *nix this returns "\n", for instance. > > Our file parsers all rely on this to return the symbol to break > lines at > when parsing files. This usually works fine. > > BUT... on Windows machines, for certain files, it does not appear to > work! I suspect that these text files were generated on a *nix machine > then transferred by copying files across file systems using native > copy > commands, or using binary FTP so that the system retained the *nix > line-end symbols instead of replacing them for the local line-end > symbols as it would have done if they were transferred in text mode > via > FTP. > > I don't have access to a Windows machine I can test on, but I suspect > that the fix is quite a simple one and boils down to replacing the > System() call with something more intelligent. > > Is there any regex or similar thing we can use to spot _all_ kinds of > line-end symbols in text files regardless of the platform the file was > created on or the platform the parser is being run on? > > (For information, the only two users who have reported problems like > this are both using Nexus files - I'm not sure what tool generated > them > though. The Nexus parser uses the same rules as all the other > parsers in > BioJava so I don't think there's anything specifically wrong with > it as > opposed to say the GenBank or FASTA parsers.) 
> > cheers, > Richard > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.2.2 (GNU/Linux) > Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org > > iD8DBQFGi1T74C5LeMEKA/QRAqoeAKCf311nLYPqysNfUVLMy28H0FBMTgCcDaVh > 3ppr3WRdJcQgzIAJdUoIX0U= > =Cboa > -----END PGP SIGNATURE----- > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : ===========================================================
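For anyone who wants to try the detection idea from this thread on the Java side, here is a minimal sketch. It assumes the whole file content is already in a String; the class `LineEndings` and its methods `detect`, `splitLines` and `toUnix` are hypothetical names made up for illustration and are not part of BioJava. It looks for the first '\r' or '\n', checks whether a '\r' is followed by '\n', and uses one regex to split on any of the three conventions, so it does not depend on System.getProperty("line.separator").

```java
import java.util.regex.Pattern;

/** Hypothetical helper for illustration only; not a BioJava class. */
final class LineEndings {

    // "\r\n" is listed first so a Windows pair is consumed as one terminator,
    // then a lone "\r" (old Mac) or a lone "\n" (Unix).
    private static final Pattern EOL = Pattern.compile("\\r\\n|\\r|\\n");

    private LineEndings() {}

    /** Names the style of the first line terminator found in the text. */
    static String detect(String text) {
        int cr = text.indexOf('\r');
        int lf = text.indexOf('\n');
        if (cr < 0 && lf < 0) {
            return "none";
        }
        if (cr < 0 || (lf >= 0 && lf < cr)) {
            return "unix";
        }
        // The first terminator starts with '\r': Windows if '\n' follows it.
        boolean followedByLf = cr + 1 < text.length() && text.charAt(cr + 1) == '\n';
        return followedByLf ? "windows" : "mac";
    }

    /** Splits on any of the three terminators, whatever the platform. */
    static String[] splitLines(String text) {
        return EOL.split(text);
    }

    /** Rewrites every terminator as a single '\n'. */
    static String toUnix(String text) {
        return EOL.matcher(text).replaceAll("\n");
    }
}
```

When a parser reads from a stream rather than a String, java.io.BufferedReader.readLine() may be the simpler route, since it treats '\n', '\r' and "\r\n" all as line terminators regardless of the platform default.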