From zt_2003 at 163.com Mon Jul 2 01:35:08 2007
From: zt_2003 at 163.com (zt_2003)
Date: Mon, 2 Jul 2007 13:35:08 +0800 (CST)
Subject: [Biojava-l] Where can I find the demo of using svm in biojava?
Message-ID: <16701458.2297451183354508767.JavaMail.coremail@bj163app62.163.com>

Can anyone tell me where I can find a demo of using SVM in BioJava? And will BioJava support artificial network or Bayesian network models in future?

From kavita_mbi at yahoo.com Wed Jul 4 00:46:03 2007
From: kavita_mbi at yahoo.com (Kavita Agarwal)
Date: Tue, 3 Jul 2007 21:46:03 -0700 (PDT)
Subject: [Biojava-l] Fwd: biojava error
Message-ID: <520964.87338.qm@web39713.mail.mud.yahoo.com>

Hi,

I am using BioJava in an applet and I get the error:

Error: Unable to initialise DNATools

but the BioJava code runs fine when I use it in an application. I am running my applet in the appletviewer. Can anyone tell me exactly how I should set my classpath for the BioJava and Java files?

I have these folders:

jdk1.5.0 located at C:\Program files\Java
jre1.5.0 at the same location
biojava - all 6 jar files at C:\Program files\biojava

From holland at ebi.ac.uk Wed Jul 4 04:06:19 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Wed, 04 Jul 2007 09:06:19 +0100
Subject: [Biojava-l] Request for help!
Message-ID: <468B54FB.3090606@ebi.ac.uk>

Hi guys.

I need help with a programming question!

In Java, you can find out the line-end symbol that the JRE is using by calling:

System.getProperty("line.separator");

On *nix this returns "\n", for instance.

Our file parsers all rely on this to return the symbol to break lines at when parsing files. This usually works fine.

BUT... on Windows machines, for certain files, it does not appear to work! I suspect that these text files were generated on a *nix machine and then transferred by copying files across file systems using native copy commands, or using binary FTP, so that the system retained the *nix line-end symbols instead of replacing them with the local line-end symbols as it would have done if they had been transferred in text mode via FTP.

I don't have access to a Windows machine I can test on, but I suspect that the fix is quite a simple one and boils down to replacing the System.getProperty() call with something more intelligent.

Is there any regex or similar thing we can use to spot _all_ kinds of line-end symbols in text files regardless of the platform the file was created on or the platform the parser is being run on?

(For information, the only two users who have reported problems like this are both using Nexus files - I'm not sure what tool generated them though. The Nexus parser uses the same rules as all the other parsers in BioJava, so I don't think there's anything specifically wrong with it as opposed to, say, the GenBank or FASTA parsers.)
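[Editorial sketch, not part of the original post: one way the regex asked about above might look in Java. The class name LineEndings and its two helper methods are hypothetical illustrations, not BioJava API; the only assumption is that "\r\n", "\r" and "\n" are the terminators of interest. The alternation tries "\r\n" first so that a Windows line end counts as one terminator rather than two.]

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of BioJava.
final class LineEndings {
    // "\r\n" comes first so a Windows line end is matched as a single terminator.
    static final Pattern EOL = Pattern.compile("\r\n|\r|\n");

    /** Splits text into lines on any of the three common terminators. */
    static List<String> splitLines(String text) {
        List<String> lines = new ArrayList<>();
        Matcher m = EOL.matcher(text);
        int start = 0;
        while (m.find()) {
            lines.add(text.substring(start, m.start()));
            start = m.end();
        }
        if (start < text.length()) {
            lines.add(text.substring(start)); // final line with no terminator
        }
        return lines;
    }

    /** Returns the first terminator found in the text, or null if there is none. */
    static String firstTerminator(String text) {
        Matcher m = EOL.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // A made-up input mixing Windows, old Mac and Unix line ends.
        String mixed = "#NEXUS\r\nBEGIN TAXA;\rEND;\n";
        System.out.println(splitLines(mixed));
    }
}
```

[Because the pattern does not depend on System.getProperty("line.separator"), the same code should behave identically whatever platform produced the file or runs the parser.]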
cheers,
Richard

From hlapp at gmx.net Wed Jul 4 08:55:28 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 4 Jul 2007 08:55:28 -0400
Subject: [Biojava-l] Request for help!
In-Reply-To: <468B54FB.3090606@ebi.ac.uk>
References: <468B54FB.3090606@ebi.ac.uk>
Message-ID: <1B4834D5-68FC-4981-BEB0-0E1F76A5D7B1@gmx.net>

In Perl it is easy enough to regex-replace s/\r\n/\n/g and s/\r//g though I'm not sure this wouldn't incur too much overhead in Java.

You can certainly detect the eol character(s) by line.indexOf('\r'); if found and the following character is '\n' you have DOS/Win-style line endings, and otherwise if found it is Mac-style.

However, this all seems like a lot of trouble to go through if all that one would need to ask of people is to make sure that the file matches the native eol style of the platform, which is really trivial to achieve.

For example, to convert Win-style line endings to Unix:

$ perl -pi -e 's/\r//g;'

and from Mac to Unix:

$ perl -pi -e 's/\r/\n/g;'

I have these and other simple conversions defined as aliases in my .profile, and don't really ever worry about writing lots of code to accommodate arbitrary line endings :-)

-hilmar

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From markjschreiber at gmail.com Wed Jul 4 10:10:12 2007
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Wed, 4 Jul 2007 22:10:12 +0800
Subject: [Biojava-l] Request for help!
In-Reply-To: <1B4834D5-68FC-4981-BEB0-0E1F76A5D7B1@gmx.net>
References: <468B54FB.3090606@ebi.ac.uk> <1B4834D5-68FC-4981-BEB0-0E1F76A5D7B1@gmx.net>
Message-ID: <93b45ca50707040710v7299eb1cp6892e0b7e875404c@mail.gmail.com>

BufferedWriter provides a newLine() method that writes a line separator but I'm not sure if that gives you a different result or not.

This may be a JVM bug that needs to be submitted to Sun.

As a very ugly workaround it is possible to determine the OS from the System object as well.

- Mark

From ayates at ebi.ac.uk Wed Jul 4 10:33:28 2007
From: ayates at ebi.ac.uk (Andy Yates)
Date: Wed, 04 Jul 2007 15:33:28 +0100
Subject: [Biojava-l] [Biojava-dev] Request for help!
In-Reply-To: <93b45ca50707040710v7299eb1cp6892e0b7e875404c@mail.gmail.com>
References: <468B54FB.3090606@ebi.ac.uk> <1B4834D5-68FC-4981-BEB0-0E1F76A5D7B1@gmx.net> <93b45ca50707040710v7299eb1cp6892e0b7e875404c@mail.gmail.com>
Message-ID: <468BAFB8.708@ebi.ac.uk>

BufferedWriter will always use the value of System.getProperty("line.separator"); however, BufferedReader knows that an end of line can be \r\n, \r or \n, so in Java land it is perfectly legal to have any common line terminator and still write files in an OS-specific manner.

I sent a regex to Rich which he improved on, but the net result is the extraction of the EOL regardless of which one it is.

I'm not 100% sure where the problem lies. So long as the parsers use BufferedReader for their text file reading (which they all seem to do) this shouldn't have been a problem. In fact this is the line from the BufferedReader.readLine() documentation in the JDK:

"Read a line of text.
A line is considered to be terminated by any one of a line feed ('\n'), a carriage return ('\r'), or a carriage return followed immediately by a linefeed."

Very, very strange, but the regex sounds like it was a pragmatic solution.

Andy

From holland at ebi.ac.uk Wed Jul 4 11:04:41 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Wed, 04 Jul 2007 16:04:41 +0100
Subject: [Biojava-l] Request for help!
In-Reply-To: <93b45ca50707040710v7299eb1cp6892e0b7e875404c@mail.gmail.com>
References: <468B54FB.3090606@ebi.ac.uk> <1B4834D5-68FC-4981-BEB0-0E1F76A5D7B1@gmx.net> <93b45ca50707040710v7299eb1cp6892e0b7e875404c@mail.gmail.com>
Message-ID: <468BB709.4010704@ebi.ac.uk>

Thanks everyone for your replies. Turns out a regex of the various combinations of \r and \n is the best way.

cheers,
Richard

From holland at ebi.ac.uk Wed Jul 4 11:06:32 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Wed, 04 Jul 2007 16:06:32 +0100
Subject: [Biojava-l] [Biojava-dev] Request for help!
In-Reply-To: <468BAFB8.708@ebi.ac.uk>
References: <468B54FB.3090606@ebi.ac.uk> <1B4834D5-68FC-4981-BEB0-0E1F76A5D7B1@gmx.net> <93b45ca50707040710v7299eb1cp6892e0b7e875404c@mail.gmail.com> <468BAFB8.708@ebi.ac.uk>
Message-ID: <468BB778.2050704@ebi.ac.uk>

The problem was that I was using the newline in a tokenizer, which needed to return and recognize the newline symbols themselves (the Nexus format is newline-sensitive). Hence I had to deal with files that may not use the system line separator.

cheers,
Richard

From markjschreiber at gmail.com Wed Jul 4 21:29:35 2007
From: markjschreiber at gmail.com
(Mark Schreiber) Date: Thu, 5 Jul 2007 09:29:35 +0800 Subject: [Biojava-l] [Biojava-dev] Request for help! In-Reply-To: <468BB778.2050704@ebi.ac.uk> References: <468B54FB.3090606@ebi.ac.uk> <1B4834D5-68FC-4981-BEB0-0E1F76A5D7B1@gmx.net> <93b45ca50707040710v7299eb1cp6892e0b7e875404c@mail.gmail.com> <468BAFB8.708@ebi.ac.uk> <468BB778.2050704@ebi.ac.uk> Message-ID: <93b45ca50707041829j337118c5t7adbcb9717a0a715@mail.gmail.com> Slightly related to this ... It might be worth making a quick check of the biojava code base to see how often a "\n" appears in the source code. - Mark On 7/4/07, Richard Holland wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > The problem was that I was using the newline in a tokenizer, which > needed to return and regcognize the newline symbols themselves (the > Nexus format is new-line sensitive). Hence I had to deal with files that > may not have the system new-line operator. > > cheers, > Richard > > Andy Yates wrote: > > BufferedWriter will always use the value of > > System.getProperty("line.separator") however BufferedReader knows that > > an end of line can be \r\n, \r or \n so in Java land is perfectly legal > > to have any common line terminator & still write files in an OS specific > > manner. > > > > I sent a regex to Rich which he improved on but the net result is the > > extraction of the EOL regardless of which one it is. > > > > I'm not 100% sure on where the problem lies. So long as the parsers use > > BufferedReader for it's text file reading (which they all seem to do) > > this shouldn't have been a problem. In fact this is the line from the > > BufferedReader.readLine() in the JDK: > > > > "Read a line of text. A line is considered to be terminated by any one > > of a line feed ('\n'), a carriage return ('\r'), or a carriage return > > followed immediately by a linefeed." 
> > > > Very very strange but the regex sounds like it was a pragmatic solution > > > > Andy > > > > Mark Schreiber wrote: > >> BufferedWriter provides a newLine() method that writes a line > >> separator but I'm not sure if that gives you a different result or > >> not. > >> > >> This may be a JVM bug that needs to be submitted to Sun. > >> > >> As a very ugly work around it is possible to determine the OS from the > >> System object as well. > >> > >> - Mark > >> > >> On 7/4/07, Hilmar Lapp wrote: > >>> In Perl it is easy enough to regex-replace s/\n\r/\n/g and s/\r//g > >>> though I'm not sure this wouldn't incur too much overhead in Java. > >>> > >>> You can certainly detect the eol character(s) by line.indexOf('\r'); > >>> if found and the preceding character is '\n' you have DOS/Win-style > >>> line endings, and otherwise if found it is Mac-style. > >>> > >>> However, this all seems like a lot of trouble to go through if all > >>> that one would need to ask of people is to make sure that the file > >>> matches the native eol style of the platform, which is really trivial > >>> to achieve. > >>> > >>> For example, to convert Win-style line endings to Unix: > >>> > >>> $ perl -pi -e 's/\r//g;' > >>> > >>> and from Mac to Unix: > >>> > >>> $ perl -pi -e 's/\r/\n/g;' > >>> > >>> I have these and other simple conversions defined as aliases in > >>> my .profile, and don't really ever worry about writing lots of code > >>> to accommodate arbitrary line endings :-) > >>> > >>> -hilmar > >>> > >>> On Jul 4, 2007, at 4:06 AM, Richard Holland wrote: > >>> > > Hi guys. > > > > I need help with a programming question! > > > > In Java, you can find out the line-end symbol that the JRE is using by > > calling: > > > > System.getProperty("line.separator"); > > > > On *nix this returns "\n", for instance. > > > > Our file parsers all rely on this to return the symbol to break > > lines at > > when parsing files. This usually works fine. > > > > BUT... 
on Windows machines, for certain files, it does not appear to > > work! I suspect that these text files were generated on a *nix machine > > then transferred by copying files across file systems using native > > copy > > commands, or using binary FTP so that the system retained the *nix > > line-end symbols instead of replacing them for the local line-end > > symbols as it would have done if they were transferred in text mode > > via > > FTP. > > > > I don't have access to a Windows machine I can test on, but I suspect > > that the fix is quite a simple one and boils down to replacing the > > System() call with something more intelligent. > > > > Is there any regex or similar thing we can use to spot _all_ kinds of > > line-end symbols in text files regardless of the platform the file was > > created on or the platform the parser is being run on? > > > > (For information, the only two users who have reported problems like > > this are both using Nexus files - I'm not sure what tool generated > > them > > though. The Nexus parser uses the same rules as all the other > > parsers in > > BioJava so I don't think there's anything specifically wrong with > > it as > > opposed to say the GenBank or FASTA parsers.) 

From holland at ebi.ac.uk Thu Jul 5 03:40:14 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Thu, 05 Jul 2007 08:40:14 +0100
Subject: [Biojava-l] [Biojava-dev] Request for help!
In-Reply-To: <93b45ca50707041829j337118c5t7adbcb9717a0a715@mail.gmail.com>
References: <468B54FB.3090606@ebi.ac.uk> <1B4834D5-68FC-4981-BEB0-0E1F76A5D7B1@gmx.net> <93b45ca50707040710v7299eb1cp6892e0b7e875404c@mail.gmail.com> <468BAFB8.708@ebi.ac.uk> <468BB778.2050704@ebi.ac.uk> <93b45ca50707041829j337118c5t7adbcb9717a0a715@mail.gmail.com>
Message-ID: <468CA05E.6070308@ebi.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

"\n" is used 262 times in 76 different locations:

src/org/biojava/bio/alignment/NeedlemanWunsch.java
src/org/biojava/bio/alignment/SequenceAlignment.java
src/org/biojava/bio/alignment/SmithWaterman.java
src/org/biojava/bio/alignment/SubstitutionMatrix.java
src/org/biojava/bio/chromatogram/graphic/ChromatogramGraphic.java
src/org/biojava/bio/dist/AbstractDistribution.java
src/org/biojava/bio/dp/onehead/SingleDP.java
src/org/biojava/bio/dp/twohead/DPInterpreter.java
src/org/biojava/bio/dp/XmlMarkovModel.java
src/org/biojava/bio/gui/sequence/ImageMap.java
src/org/biojava/bio/program/abi/ABIFParser.java
src/org/biojava/bio/program/blast2html/AbstractAlignmentStyler.java
src/org/biojava/bio/program/blast2html/HTMLRenderer.java
src/org/biojava/bio/program/das/dasalignment/Alignment.java
src/org/biojava/bio/program/das/FeatureRequestManager.java
src/org/biojava/bio/program/sax/BlastLikeAlignmentSAXParser.java
src/org/biojava/bio/program/sax/ClustalWAlignmentSAXParser.java
src/org/biojava/bio/program/sax/FastaSequenceSAXParser.java
src/org/biojava/bio/program/sax/NeedleAlignmentSAXParser.java
src/org/biojava/bio/search/KnuthMorrisPrattSearch.java
src/org/biojava/bio/seq/db/BioIndex.java
src/org/biojava/bio/seq/db/GenbankSequenceDB.java
src/org/biojava/bio/seq/db/TabIndexStore.java
src/org/biojava/bio/seq/io/agave/AGAVEBioSeqHandler.java
src/org/biojava/bio/seq/io/agave/AGAVEContigHandler.java
src/org/biojava/bio/seq/io/agave/AGAVEDbId.java
src/org/biojava/bio/seq/io/agave/AGAVEKeywordPropHandler.java
src/org/biojava/bio/seq/io/agave/AGAVEMapLocation.java
src/org/biojava/bio/seq/io/agave/AGAVEMapPosition.java
src/org/biojava/bio/seq/io/agave/AGAVEMatchRegion.java
src/org/biojava/bio/seq/io/agave/AGAVEProperty.java
src/org/biojava/bio/seq/io/agave/AGAVEQueryRegion.java
src/org/biojava/bio/seq/io/agave/AGAVERelatedAnnot.java
src/org/biojava/bio/seq/io/agave/AGAVESeqPropHandler.java
src/org/biojava/bio/seq/io/agave/AgaveWriter.java
src/org/biojava/bio/seq/io/agave/AGAVEXref.java
src/org/biojava/bio/seq/io/agave/AGAVEXrefs.java
src/org/biojava/bio/seq/io/agave/Embl2AgaveAnnotFilter.java
src/org/biojava/bio/seq/io/FastaFormat.java
src/org/biojava/bio/seq/io/GenbankFileFormer.java
src/org/biojava/bio/seq/io/ParseException.java
src/org/biojava/bio/structure/align/pairwise/AlternativeAlignment.java
src/org/biojava/bio/structure/ChainImpl.java
src/org/biojava/bio/structure/io/FileConvert.java
src/org/biojava/bio/structure/StructureImpl.java
src/org/biojava/bio/symbol/AbstractSimpleBasisSymbol.java
src/org/biojava/bio/symbol/AlphabetManager.java
src/org/biojava/bio/symbol/DoubleAlphabet.java
src/org/biojava/bio/symbol/IntegerAlphabet.java
src/org/biojava/bio/symbol/SimpleAlignment.java
src/org/biojava/stats/svm/tools/TrainRegression.java
src/org/biojava/utils/automata/DfaBuilder.java
src/org/biojava/utils/automata/FiniteAutomaton.java
src/org/biojava/utils/automata/PatternMaker.java
src/org/biojava/utils/candy/CandyEntry.java
src/org/biojava/utils/ChangeSupport.java
src/org/biojava/utils/ExecRunner.java
src/org/biojava/utils/io/CountedBufferedReader.java
src/org/biojava/utils/ParserException.java
src/org/biojava/utils/StaticMemberPlaceHolder.java
src/org/biojavax/bio/db/ncbi/GenbankRichSequenceDB.java
src/org/biojavax/bio/db/ncbi/GenpeptRichSequenceDB.java
src/org/biojavax/bio/phylo/io/nexus/CharactersBlockParser.java
src/org/biojavax/bio/phylo/io/nexus/DistancesBlockParser.java
src/org/biojavax/bio/phylo/io/nexus/NexusFileFormat.java
src/org/biojavax/bio/phylo/MultipleHitCorrection.java
src/org/biojavax/bio/seq/io/DebuggingRichSeqIOListener.java
src/org/biojavax/bio/seq/io/EMBLFormat.java
src/org/biojavax/bio/seq/io/FastaFormat.java
src/org/biojavax/bio/seq/io/GenbankFormat.java
src/org/biojavax/bio/seq/io/UniProtCommentParser.java
src/org/biojavax/bio/seq/io/UniProtFormat.java
src/org/biojavax/bio/taxa/SimpleNCBITaxonName.java
src/org/biojavax/utils/StringTools.java
src/org/biojavax/utils/XMLTools.java

Not all of these are 'bad' newlines - but still, it's a lot to search
through. I've put it on my list of to-do things for when I'm bored.

cheers,
Richard

Mark Schreiber wrote:
> Slightly related to this ...
>
> It might be worth making a quick check of the biojava code base to see
> how often a "\n" appears in the source code.
>
> - Mark
>
> On 7/4/07, Richard Holland wrote:
> The problem was that I was using the newline in a tokenizer, which
> needed to return and recognize the newline symbols themselves (the
> Nexus format is new-line sensitive). Hence I had to deal with files that
> may not have the system new-line operator.
>
> cheers,
> Richard
>
> Andy Yates wrote:
>>>> BufferedWriter will always use the value of
>>>> System.getProperty("line.separator") however BufferedReader knows that
>>>> an end of line can be \r\n, \r or \n so in Java land it is perfectly legal
>>>> to have any common line terminator & still write files in an OS specific
>>>> manner.
>>>>
>>>> I sent a regex to Rich which he improved on but the net result is the
>>>> extraction of the EOL regardless of which one it is.
>>>>
>>>> I'm not 100% sure on where the problem lies. So long as the parsers use
>>>> BufferedReader for their text file reading (which they all seem to do)
>>>> this shouldn't have been a problem. In fact this is the line from the
>>>> BufferedReader.readLine() in the JDK:
>>>>
>>>> "Read a line of text. A line is considered to be terminated by any one
>>>> of a line feed ('\n'), a carriage return ('\r'), or a carriage return
>>>> followed immediately by a linefeed."
>>>>
>>>> Very very strange but the regex sounds like it was a pragmatic solution
>>>>
>>>> Andy
>>>>
>>>> Mark Schreiber wrote:
>>>>> BufferedWriter provides a newLine() method that writes a line
>>>>> separator but I'm not sure if that gives you a different result or
>>>>> not.
>>>>>
>>>>> This may be a JVM bug that needs to be submitted to Sun.
>>>>>
>>>>> As a very ugly work around it is possible to determine the OS from the
>>>>> System object as well.
>>>>>
>>>>> - Mark
>>>>>
>>>>> On 7/4/07, Hilmar Lapp wrote:
>>>>>> In Perl it is easy enough to regex-replace s/\r\n/\n/g and s/\r//g
>>>>>> though I'm not sure this wouldn't incur too much overhead in Java.
>>>>>>
>>>>>> You can certainly detect the eol character(s) by line.indexOf('\r');
>>>>>> if found and the following character is '\n' you have DOS/Win-style
>>>>>> line endings, and otherwise if found it is Mac-style.
>>>>>>
>>>>>> However, this all seems like a lot of trouble to go through if all
>>>>>> that one would need to ask of people is to make sure that the file
>>>>>> matches the native eol style of the platform, which is really trivial
>>>>>> to achieve.
>>>>>>
>>>>>> For example, to convert Win-style line endings to Unix:
>>>>>>
>>>>>> $ perl -pi -e 's/\r//g;'
>>>>>>
>>>>>> and from Mac to Unix:
>>>>>>
>>>>>> $ perl -pi -e 's/\r/\n/g;'
>>>>>>
>>>>>> I have these and other simple conversions defined as aliases in
>>>>>> my .profile, and don't really ever worry about writing lots of code
>>>>>> to accommodate arbitrary line endings :-)
>>>>>>
>>>>>> -hilmar
>>>>>>
>>>>>> On Jul 4, 2007, at 4:06 AM, Richard Holland wrote:
>>>>>>
>>>> Hi guys.
>>>>
>>>> I need help with a programming question!
>>>>
>>>> In Java, you can find out the line-end symbol that the JRE is using by
>>>> calling:
>>>>
>>>> System.getProperty("line.separator");
>>>>
>>>> On *nix this returns "\n", for instance.
>>>>
>>>> Our file parsers all rely on this to return the symbol to break
>>>> lines at when parsing files. This usually works fine.
>>>>
>>>> BUT... on Windows machines, for certain files, it does not appear to
>>>> work! I suspect that these text files were generated on a *nix machine
>>>> then transferred by copying files across file systems using native
>>>> copy commands, or using binary FTP so that the system retained the *nix
>>>> line-end symbols instead of replacing them for the local line-end
>>>> symbols as it would have done if they were transferred in text mode
>>>> via FTP.
>>>>
>>>> I don't have access to a Windows machine I can test on, but I suspect
>>>> that the fix is quite a simple one and boils down to replacing the
>>>> System() call with something more intelligent.
>>>>
>>>> Is there any regex or similar thing we can use to spot _all_ kinds of
>>>> line-end symbols in text files regardless of the platform the file was
>>>> created on or the platform the parser is being run on?
>>>>
>>>> (For information, the only two users who have reported problems like
>>>> this are both using Nexus files - I'm not sure what tool generated
>>>> them though. The Nexus parser uses the same rules as all the other
>>>> parsers in BioJava so I don't think there's anything specifically wrong
>>>> with it as opposed to say the GenBank or FASTA parsers.)
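
As a hedged illustration of the kind of regex asked about in this thread (the actual expression Andy and Richard settled on is not shown in the archive), the sketch below tokenizes text so that every line terminator, whether "\r\n", "\r" or "\n", comes back as its own token, independent of System.getProperty("line.separator"). The class name EolTokens and the method tokenize are made up for this example and are not BioJava API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of BioJava: platform-independent line-end tokenizing. */
class EolTokens {
    // Alternatives are tried left to right, so the two-character
    // Windows terminator must come before the single characters.
    private static final Pattern EOL = Pattern.compile("\r\n|\r|\n");

    /**
     * Splits text into content tokens and line-end tokens.
     * Every terminator, whatever its original form, is reported as "\n".
     */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        Matcher m = EOL.matcher(text);
        int start = 0;
        while (m.find()) {
            if (m.start() > start) {
                // Content between the previous terminator and this one.
                tokens.add(text.substring(start, m.start()));
            }
            tokens.add("\n");
            start = m.end();
        }
        if (start < text.length()) {
            // Trailing content with no terminator after it.
            tokens.add(text.substring(start));
        }
        return tokens;
    }
}
```

Calling EolTokens.tokenize("#NEXUS\r\nBEGIN TAXA;\rEND;\n") yields six tokens: three content tokens, each followed by a normalised "\n" token. A parser that only needs whole lines, and does not need the terminators as tokens, can rely on BufferedReader.readLine() instead, as Andy points out.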
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGjKBd4C5LeMEKA/QRAuARAJsGmSZpdOEuNyYDNn0Xn1rBA6KBjgCeLr8s
qkMnk1CwoMnqBT0RCwQjuSI=
=X9+G
-----END PGP SIGNATURE-----

From aulia at students.itb.ac.id Mon Jul 9 03:08:39 2007
From: aulia at students.itb.ac.id (Aulia Rahma Amin)
Date: Mon, 9 Jul 2007 14:08:39 +0700 (WIT)
Subject: [Biojava-l] How to read and write a ProfileHMM into file
Message-ID: <3312.167.205.35.81.1183964919.squirrel@students.itb.ac.id>

How can I read and write a ProfileHMM to a file?
I tried XmlMarkovModel, but I can't cast SimpleMarkovModel to ProfileHMM
when I read the file.

--
Aulia Rahma Amin
ARC05/IF03
Y! ID : aulia_ra
Skype ID : aulia_ra
MSN ID : aulia_ra at hotmail.com
AIM ID : auliara
ICQ ID : aulia_ra
Homepage : http://www.aulia-ra.org

From holland at ebi.ac.uk Mon Jul 9 03:46:02 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Mon, 09 Jul 2007 08:46:02 +0100
Subject: [Biojava-l] How to read and write a ProfileHMM into file
In-Reply-To: <3312.167.205.35.81.1183964919.squirrel@students.itb.ac.id>
References: <3312.167.205.35.81.1183964919.squirrel@students.itb.ac.id>
Message-ID: <4691E7BA.9030209@ebi.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Have you tried the classes in org.biojava.bio.program.hmmer? There is a
parser in there which will read the output from HMMER.

cheers,
Richard

Aulia Rahma Amin wrote:
> How to read and write a ProfileHMM into file?
> I try XMLMarkovModel but I can't cast SimpleMarkovModel into ProfileHMM
> when I read the file.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGkee64C5LeMEKA/QRAvevAKCVYFeUNByQwew6a900oj2MJjHnmACdHE8M
lSJgI+HuhRAjEngMlxI+JVo=
=Ft98
-----END PGP SIGNATURE-----

From markjschreiber at gmail.com Mon Jul 9 11:59:49 2007
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Mon, 9 Jul 2007 23:59:49 +0800
Subject: [Biojava-l] How to read and write a ProfileHMM into file
In-Reply-To: <3312.167.205.35.81.1183964919.squirrel@students.itb.ac.id>
References: <3312.167.205.35.81.1183964919.squirrel@students.itb.ac.id>
Message-ID: <93b45ca50707090859m13c7ff89wd942f838cf6bdbea@mail.gmail.com>

Hi -

The best possible solution would be to extend XmlMarkovModel so that
it can attempt to construct a ProfileHMM from an XML file.

- Mark

On 7/9/07, Aulia Rahma Amin wrote:
> How to read and write a ProfileHMM into file?
> I try XMLMarkovModel but I can't cast SimpleMarkovModel into ProfileHMM
> when I read the file.

From aulia at students.itb.ac.id Tue Jul 10 02:05:22 2007
From: aulia at students.itb.ac.id (Aulia Rahma Amin)
Date: Tue, 10 Jul 2007 13:05:22 +0700 (WIT)
Subject: [Biojava-l] Problem with SearchProfile demo
Message-ID: <4321.167.205.35.81.1184047522.squirrel@students.itb.ac.id>

I have a problem when running demos/dp/SearchProfile.java. The program
returns an error message:

classes\demos>java dp.SearchProfile fake.fasta
Loading sequences
Creating profile HMM
Estimating alignment as having length 999
org.biojava.bio.BioError: Assertion Failure: Symbol i-791 was not an
indexed member of the alphabet Transitions from i-791 despite being in the
alphabet.
        at org.biojava.bio.symbol.LinearAlphabetIndex.indexForSymbol(LinearAlphabetIndex.java:118)
        at org.biojava.bio.dist.IndexedCount.increaseCount(IndexedCount.java:98)
        at org.biojava.bio.dist.SimpleDistribution$Trainer.addCount(SimpleDistribution.java:273)
        at org.biojava.bio.dist.SimpleDistributionTrainerContext.addCount(SimpleDistributionTrainerContext.java:85)
        at dp.SearchProfile.randomize(SearchProfile.java:155)
        at dp.SearchProfile.createProfile(SearchProfile.java:104)
        at dp.SearchProfile.main(SearchProfile.java:31)

I suspect this problem occurs when the program runs the train method in
the SimpleDistribution class. Is this a bug or what?

Any help will be deeply appreciated...

=====
Aulia Rahma Amin
Undergraduate Student
School of Electrical Engineering and Informatics
Bandung Institute of Technology
Indonesia

From markjschreiber at gmail.com Tue Jul 10 02:13:55 2007
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Tue, 10 Jul 2007 14:13:55 +0800
Subject: [Biojava-l] Problem with SearchProfile demo
In-Reply-To: <4321.167.205.35.81.1184047522.squirrel@students.itb.ac.id>
References: <4321.167.205.35.81.1184047522.squirrel@students.itb.ac.id>
Message-ID: <93b45ca50707092313s60c8a7d8qb0cc4b77ae0f04dd@mail.gmail.com>

Hi -

What version of BioJava do you have?

- Mark

On 7/10/07, Aulia Rahma Amin wrote:
> I have a problem when running demos/dp/SearchProfile.java.

From aulia at students.itb.ac.id Tue Jul 10 02:29:48 2007
From: aulia at students.itb.ac.id (Aulia Rahma Amin)
Date: Tue, 10 Jul 2007 13:29:48 +0700 (WIT)
Subject: [Biojava-l] Problem with SearchProfile demo
In-Reply-To: <93b45ca50707092313s60c8a7d8qb0cc4b77ae0f04dd@mail.gmail.com>
References: <4321.167.205.35.81.1184047522.squirrel@students.itb.ac.id> <93b45ca50707092313s60c8a7d8qb0cc4b77ae0f04dd@mail.gmail.com>
Message-ID: <4455.167.205.35.81.1184048988.squirrel@students.itb.ac.id>

I'm using BioJava 1.5. I didn't find this problem when using BioJava 1.4.

-aulia-

> What version of BioJava do you have?
>
> - Mark

From markjschreiber at gmail.com Tue Jul 10 02:54:51 2007
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Tue, 10 Jul 2007 14:54:51 +0800
Subject: [Biojava-l] Problem with SearchProfile demo
In-Reply-To: <4455.167.205.35.81.1184048988.squirrel@students.itb.ac.id>
References: <4321.167.205.35.81.1184047522.squirrel@students.itb.ac.id> <93b45ca50707092313s60c8a7d8qb0cc4b77ae0f04dd@mail.gmail.com> <4455.167.205.35.81.1184048988.squirrel@students.itb.ac.id>
Message-ID: <93b45ca50707092354wbb2d6caka601ccb2251d1771@mail.gmail.com>

Does the problem occur with the ProfileHMM example in the cookbook?
(http://biojava.org/wiki/BioJava:CookBook:DP:HMM)

- Mark

On 7/10/07, Aulia Rahma Amin wrote:
> I'm using BioJava 1.5. I didn't find this problem when using BioJava 1.4.
>
> -aulia-

From aulia at students.itb.ac.id Tue Jul 10 03:02:08 2007
From: aulia at students.itb.ac.id (Aulia Rahma Amin)
Date: Tue, 10 Jul 2007 14:02:08 +0700 (WIT)
Subject: [Biojava-l] Problem with SearchProfile demo
In-Reply-To: <93b45ca50707092354wbb2d6caka601ccb2251d1771@mail.gmail.com>
References: <4321.167.205.35.81.1184047522.squirrel@students.itb.ac.id> <93b45ca50707092313s60c8a7d8qb0cc4b77ae0f04dd@mail.gmail.com> <4455.167.205.35.81.1184048988.squirrel@students.itb.ac.id> <93b45ca50707092354wbb2d6caka601ccb2251d1771@mail.gmail.com>
Message-ID: <4677.167.205.35.81.1184050928.squirrel@students.itb.ac.id>

Yes, it happens there too; it always ends with org.biojava.bio.BioError:
Assertion Failure. But I have no problems when running the example using
BioJava 1.4.

-aulia-

> Does the problem occur with the ProfileHMM example in the cookbook?
> (http://biojava.org/wiki/BioJava:CookBook:DP:HMM)
>
> - Mark

--
Aulia Rahma Amin
ARC05/IF03
Y! ID : aulia_ra
Skype ID : aulia_ra
MSN ID : aulia_ra at hotmail.com
AIM ID : auliara
ICQ ID : aulia_ra
Homepage : http://www.aulia-ra.org

From markjschreiber at gmail.com Tue Jul 10 06:01:30 2007
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Tue, 10 Jul 2007 18:01:30 +0800
Subject: [Biojava-l] Problem with SearchProfile demo
In-Reply-To: <4677.167.205.35.81.1184050928.squirrel@students.itb.ac.id>
References: <4321.167.205.35.81.1184047522.squirrel@students.itb.ac.id> <93b45ca50707092313s60c8a7d8qb0cc4b77ae0f04dd@mail.gmail.com> <4455.167.205.35.81.1184048988.squirrel@students.itb.ac.id> <93b45ca50707092354wbb2d6caka601ccb2251d1771@mail.gmail.com> <4677.167.205.35.81.1184050928.squirrel@students.itb.ac.id>
Message-ID: <93b45ca50707100301s61a8c522m5039b02447f0bd07@mail.gmail.com>

I have submitted this as a bug report. It seems to be a bug in all HMM code.

Some initial testing suggests it is a problem with Flyweight symbols
(States) not behaving properly. My tests of ProfileHMMs still worked
about 7-8 months ago. According to CVS the only thing that happened
after that time to classes that might be relevant was a semi-automated
removal of crud from the code (unused parameters etc). It is very hard
to tell which change did the damage.

I suspect I will have to write some unit tests for the DP classes.
Somehow I think this should have happened about 6 years ago (MRP, are
you listening!!) but better late than never : )

- Mark

On 7/10/07, Aulia Rahma Amin wrote:
> Yes, it happens, always end with org.biojava.bio.BioError: Assertion
> Failure. But I have no problems when running the example using BioJava
> 1.4.

From Jonathan.Warren at agresearch.co.nz Fri Jul 13 00:37:56 2007
From: Jonathan.Warren at agresearch.co.nz (Warren, Jonathan)
Date: Fri, 13 Jul 2007 16:37:56 +1200
Subject: [Biojava-l] ACE parser
Message-ID:

Hi

I've seen posts related to people writing an ace file format parser
(contig assembly output type
http://bioportal.cgb.indiana.edu/docs/tools/cap3/aceform) but as yet I
believe there is not one available in biojava?

I am thinking of writing one and contributing it to biojava.

Thinking about the design of it - has anyone got any advice or pointers?
If I want to hide the data and mechanics from users I don't want to give
access to all the data it gathers - but not knowing how people are going
to use it implies that maybe I should give a lot of access to the data??

Cheers

Jonathan.
======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From holland at ebi.ac.uk Fri Jul 13 03:34:01 2007 From: holland at ebi.ac.uk (Richard Holland) Date: Fri, 13 Jul 2007 08:34:01 +0100 Subject: [Biojava-l] ACE parser In-Reply-To: References: Message-ID: <46972AE9.7000205@ebi.ac.uk> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi Jon! There is still no ACE parser in BioJava that I know about, so a new parser would be most welcome. Thanks for volunteering! The way we write parsers is to split the task into various stages: xxx : some BioJava object that can represent all the data in the file (e.g. Sequence, or ABIChromatogram). xxxFormat : actually reads the file, accepts an xxxListener as a parameter whilst doing so and signals events to that listener as it processes various parts of the file. Also has a method for writing a new file based on some existing xxx object. The xxxFormat input parts always work from InputStreams, with convenience methods that accept Files (or sometimes even URLs) and delegate to the main InputStream methods. Same goes for the output parts - OutputStream by default, with appropriate File/URL/etc. convenience methods. xxxListener : listens for 'events' - this is an interface (e.g. startNewSequence(), addSequenceChunk(), startFeature(), addLocation(), endSequence(), etc.). 
xxxBuilder : implements xxxListener and has an extra method to retrieve
an xxx object containing all the data it has received so far (for
instance, the builders that listen for events from sequence files build
Sequence objects).

The idea is that the xxxBuilder object will build a complete object with
as much relevant data from the file as possible, but if you don't want
that much information you can pass in your own xxxListener
implementation to xxxFormat which only listens to events representing
bits of the file it is interested in.

There is usually a default xxxListener implementation for every
xxxListener interface with empty methods that ignore everything, which
xxxBuilder or your own custom implementation then extends, overriding
the methods which supply the data that it wants.

cheers,
Richard

Warren, Jonathan wrote:
> Hi
>
> I've seen posts related to people writing an ace file format parser
> (contig assembly output type
> http://bioportal.cgb.indiana.edu/docs/tools/cap3/aceform) but as yet I
> believe there is not one available in biojava?
>
> I am thinking of writing one and contributing it to biojava.
>
> Thinking about the design of it - has anyone got any advice or pointers?
> If I want to hide the data and mechanics from users I don't want to give
> access to all the data it gathers - but not knowing how people are going
> to use it implies that maybe I should give a lot of access to the data??
>
> Cheers
>
> Jonathan.
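
The Format / Listener / Builder split described in this message can be
sketched roughly as below. This is only an illustration under stated
assumptions, not BioJava code: every type name (AceListener,
AceListenerAdapter, AceContigBuilder, ContigCounter, AceFormat) is
hypothetical, and the reader handles just the "CO" contig records of an
ACE file, treating the lines up to the next blank line as consensus
sequence. A real parser would also need the read, quality and alignment
records.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical names throughout; none of these types exist in BioJava.

/** The "xxxListener" role: one callback per kind of event. */
interface AceListener {
    void startContig(String name);
    void addSequenceChunk(String bases);
    void endContig();
}

/** Default listener with empty methods that ignore everything. */
class AceListenerAdapter implements AceListener {
    @Override public void startContig(String name) { }
    @Override public void addSequenceChunk(String bases) { }
    @Override public void endContig() { }
}

/** The "xxxBuilder" role: keeps everything it is told about. */
class AceContigBuilder extends AceListenerAdapter {
    private final Map<String, StringBuilder> contigs = new LinkedHashMap<>();
    private StringBuilder current;

    @Override public void startContig(String name) {
        current = new StringBuilder();
        contigs.put(name, current);
    }

    @Override public void addSequenceChunk(String bases) {
        current.append(bases);
    }

    /** Contig name -> padded consensus received so far. */
    Map<String, String> getContigs() {
        Map<String, String> result = new LinkedHashMap<>();
        contigs.forEach((name, bases) -> result.put(name, bases.toString()));
        return result;
    }
}

/** A custom listener that only cares about one kind of event. */
class ContigCounter extends AceListenerAdapter {
    int count;
    @Override public void startContig(String name) { count++; }
}

/** The "xxxFormat" role: reads the stream and signals events. */
class AceFormat {
    void read(InputStream in, AceListener listener) {
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.US_ASCII));
        try {
            boolean inContig = false;
            String line;
            // readLine() accepts \n, \r and \r\n line ends alike.
            while ((line = reader.readLine()) != null) {
                if (line.startsWith("CO ")) {
                    if (inContig) listener.endContig();
                    // Assumes the contig name is the second field.
                    listener.startContig(line.trim().split("\\s+")[1]);
                    inContig = true;
                } else if (inContig && line.trim().isEmpty()) {
                    listener.endContig();   // blank line ends the consensus
                    inContig = false;
                } else if (inContig) {
                    listener.addSequenceChunk(line.trim());
                }
            }
            if (inContig) listener.endContig();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Passing an AceContigBuilder to AceFormat.read() collects every contig,
while passing a ContigCounter ignores the sequence data and only counts
records. That is one way to answer the question of how much data to
expose: the format exposes everything as events and each listener keeps
only what it needs.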
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGlyrp4C5LeMEKA/QRArAsAKCZIOPFSpXv5a8WqtY3zE5caJpk4gCfSBLC
AW3L7kAWOFmEQ3zRN467qhA=
=qX7u
-----END PGP SIGNATURE-----

From ilangocal at yahoo.com Tue Jul 17 23:09:48 2007
From: ilangocal at yahoo.com (ilango)
Date: Tue, 17 Jul 2007 20:09:48 -0700 (PDT)
Subject: [Biojava-l] newbie with just a Computer Science Background
Message-ID: <727899.49968.qm@web56103.mail.re3.yahoo.com>

Hi
I have a Master Degree in Computer Science. However I would like to
develop in BioJava. I am wondering if I can do this, with my lack of a
degree in Biology or the Life Sciences.

Is it possible to contribute to the development of BioJava and if so, in
what way.

thanks very much
ilango

From markjschreiber at gmail.com Wed Jul 18 01:07:36 2007
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Wed, 18 Jul 2007 13:07:36 +0800
Subject: [Biojava-l] newbie with just a Computer Science Background
In-Reply-To: <727899.49968.qm@web56103.mail.re3.yahoo.com>
References: <727899.49968.qm@web56103.mail.re3.yahoo.com>
Message-ID: <93b45ca50707172207odb8afbl48ce54df9b70883a@mail.gmail.com>

Hi -

If you have no background in biology there will be some limitations but
you may be interested in looking at things like HMMs in the DP package.
It would also be interesting for someone to do some profiling of the
code base to find examples of poor code etc. We always need more unit
tests as well!
- Mark

On 7/18/07, ilango wrote:
>
> Hi
> I have a Master Degree in Computer Science. However I would like to
> develop in BioJava. I am wondering if I can do this, with my lack of a
> degree in Biology or the Life Sciences.
>
> Is it possible to contribute to the development of BioJava and if so, in
> what way.
>
> thanks very much
> ilango

From markjschreiber at gmail.com Wed Jul 18 06:19:42 2007
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Wed, 18 Jul 2007 18:19:42 +0800
Subject: [Biojava-l] BOSC 2007 Biojava Presentation
Message-ID: <93b45ca50707180319y1a5fcdc1r170c22ae1fcb8a8d@mail.gmail.com>

Hi -

If you couldn't make it to BOSC 2007 this year then you can get a copy
of Richard's BioJava talk from the current events tab of www.biojava.org
or here http://www.biojava.org/download/files/bosc2007.pdf

- Mark

From dmitry.repchevski at bsc.es Mon Jul 23 06:51:35 2007
From: dmitry.repchevski at bsc.es (Dmitry Repchevsky)
Date: Mon, 23 Jul 2007 12:51:35 +0200
Subject: [Biojava-l] Blast XML + XSL = HTML
Message-ID: <46A48837.5070405@bsc.es>

Hello!

I used biojava Blast2HTMLHandler, but found it inflexible and slow (?).
Finally I made an xsl stylesheet to convert blast output into html
element.
Also I have a class BlastXML2HTML to do the transform; it's pretty
simple.
May I contribute it?

Best regards,

Dmitry

From holland at ebi.ac.uk Mon Jul 23 07:40:29 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Mon, 23 Jul 2007 12:40:29 +0100
Subject: [Biojava-l] Blast XML + XSL = HTML
In-Reply-To: <46A48837.5070405@bsc.es>
References: <46A48837.5070405@bsc.es>
Message-ID: <46A493AD.3000107@ebi.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Dmitry.

Thanks for your efforts and offer to contribute your code to the BioJava
project.

You say that the existing Blast2HTMLHandler is inflexible, which
is true enough. However I don't see how replacing the handler with an
XSL stylesheet would make it any more flexible - the user would still
have to live with an HTML format specified by the designer of the
stylesheet unless the wrapping code that calls the stylesheet could
somehow dynamically modify the XML based on method calls.

I'm also unsure as to whether the whole transformation is appropriate
for BioJava (the Blast2HTMLHandler itself is on borderline territory -
saved only by the fact that it creates the reports based on SAX events
that can potentially come from non-BlastXML sources). BioJava is a Java
toolkit, and the transformation from XML to HTML via an XSL stylesheet
doesn't require Java at all.

Mark - if you're reading this - guidance, please?

cheers,
Richard

Dmitry Repchevsky wrote:
> Hello!
>
> I used biojava Blast2HTMLHandler, but found it inflexible and slow (?).
> Finally I made an xsl stylesheet to convert blast output into html
>
> element.
> Also I have a class BlastXML2HTML to do the transform; it's pretty
> simple.
> May I contribute it?
>
> Best regards,
>
> Dmitry
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGpJOs4C5LeMEKA/QRApF6AJ9kibj7mJ44W2/fTw/cYPHOx/O74gCfT3Zn
b90G56jji+Ro32fq/kuxbJA=
=X95V
-----END PGP SIGNATURE-----

From markjschreiber at gmail.com Mon Jul 23 07:45:04 2007
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Mon, 23 Jul 2007 19:45:04 +0800
Subject: [Biojava-l] Blast XML + XSL = HTML
In-Reply-To: <46A493AD.3000107@ebi.ac.uk>
References: <46A48837.5070405@bsc.es> <46A493AD.3000107@ebi.ac.uk>
Message-ID: <93b45ca50707230445p12a059f1n9aedf2ab7b887df@mail.gmail.com>

Hi Richard / Dmitry

A good home for this might be the biojava cookbook on the biojava wiki
(www.biojava.org). Although it isn't strictly biojava, people may find
it a useful example.

- Mark

On 7/23/07, Richard Holland wrote:
>
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hi Dmitry.
>
> Thanks for your efforts and offer to contribute your code to the BioJava
> project.
>
> You say that the existing Blast2HTMLHandler is inflexible, which
> is true enough. However I don't see how replacing the handler with an
> XSL stylesheet would make it any more flexible - the user would still
> have to live with an HTML format specified by the designer of the
> stylesheet unless the wrapping code that calls the stylesheet could
> somehow dynamically modify the XML based on method calls.
>
> I'm also unsure as to whether the whole transformation is appropriate
> for BioJava (the Blast2HTMLHandler itself is on borderline territory -
> saved only by the fact that it creates the reports based on SAX events
> that can potentially come from non-BlastXML sources). BioJava is a Java
> toolkit, and the transformation from XML to HTML via an XSL stylesheet
> doesn't require Java at all.
>
> Mark - if you're reading this - guidance, please?
>
> cheers,
> Richard
>
> Dmitry Repchevsky wrote:
> > Hello!
> >
> > I used biojava Blast2HTMLHandler, but found it inflexible and slow (?).
> > Finally I made an xsl stylesheet to convert blast output into html
> >
> > element.
> > Also I have a class BlastXML2HTML to do the transform; it's pretty
> > simple.
> > May I contribute it?
> >
> > Best regards,
> >
> > Dmitry
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.2.2 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>
> iD8DBQFGpJOs4C5LeMEKA/QRApF6AJ9kibj7mJ44W2/fTw/cYPHOx/O74gCfT3Zn
> b90G56jji+Ro32fq/kuxbJA=
> =X95V
> -----END PGP SIGNATURE-----

From dms700 at gmail.com Wed Jul 25 16:10:11 2007
From: dms700 at gmail.com (dmitriy)
Date: Wed, 25 Jul 2007 16:10:11 -0400
Subject: [Biojava-l] Extracting 3' UTR, 5' UTR, exons, introns, CD sequence structure for NCBI NM RefSeq
Message-ID: <299614de0707251310p4d78c9d1p1360ebb3ee421ba6@mail.gmail.com>

Hi

Does anyone have code that takes an NCBI NM RefSeq number and an NCBI NC
RefSeq number, gets the NC RefSeq from NCBI, and parses it so that a
"gene table" object is built for the specified NM RefSeq?

The "gene table" object should have information on the 3' UTR, 5' UTR,
exons, introns and CD sequence. The data in the "gene table" should be
sufficient, for example, to generate a sequence string with the 3' UTR,
5' UTR, introns, and non-coding exon(s) or part(s) of exon(s) in small
letters and the coding exon(s) or part(s) of exon(s) in capital letters.

Thanks
Dmitriy
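
To make the request above concrete, here is a minimal sketch of what
such a "gene table" might hold and how the mixed-case string could be
produced from it. It is not BioJava code and not an answer from the
list: the class name GeneTable, its Range helper and the method
toCaseMarkedString() are all hypothetical. It assumes a forward-strand
transcript, 0-based end-exclusive coordinates, and a genomic string that
runs from the start of the 5' UTR to the end of the 3' UTR. Fetching the
NC record from NCBI and reading the exon and CDS locations out of it are
left out entirely.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical "gene table" for one transcript on the forward strand.
 * Coordinates are 0-based, end-exclusive offsets into the genomic string.
 */
final class GeneTable {

    /** Half-open interval [start, end). */
    static final class Range {
        final int start;
        final int end;

        Range(int start, int end) {
            if (start < 0 || end < start) {
                throw new IllegalArgumentException("bad range " + start + ".." + end);
            }
            this.start = start;
            this.end = end;
        }

        boolean contains(int pos) {
            return pos >= start && pos < end;
        }
    }

    private final String genomic;     // 5' UTR start through 3' UTR end
    private final List<Range> exons;  // everything between exons is intron
    private final Range cds;          // start codon through stop codon

    GeneTable(String genomic, List<Range> exons, Range cds) {
        this.genomic = genomic;
        this.exons = new ArrayList<>(exons);
        this.cds = cds;
    }

    private boolean inExon(int pos) {
        for (Range exon : exons) {
            if (exon.contains(pos)) return true;
        }
        return false;
    }

    /** Coding exon bases in capitals; UTRs and introns in small letters. */
    String toCaseMarkedString() {
        StringBuilder out = new StringBuilder(genomic.length());
        for (int i = 0; i < genomic.length(); i++) {
            char base = genomic.charAt(i);
            // A base is coding only if it lies in an exon AND inside the CDS.
            boolean coding = cds.contains(i) && inExon(i);
            out.append(coding ? Character.toUpperCase(base)
                              : Character.toLowerCase(base));
        }
        return out.toString();
    }
}
```

For example, with exons at [2,6) and [10,14) and a CDS of [4,12) over
the string aaccggttaaccggtt, toCaseMarkedString() returns
aaccGGttaaCCggtt: the non-coding parts of both exons stay in small
letters, as do the intron and the flanking bases.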