From zt_2003 at 163.com  Mon Jul  2 01:35:08 2007
From: zt_2003 at 163.com (zt_2003)
Date: Mon, 2 Jul 2007 13:35:08 +0800 (CST)
Subject: [Biojava-l] Where can I find the demo of using svm in biojava?
Message-ID: <16701458.2297451183354508767.JavaMail.coremail@bj163app62.163.com>

Can anyone tell me where I can find a demo of using SVM in BioJava? And will BioJava support artificial neural networks or Bayesian networks in the future?

From kavita_mbi at yahoo.com  Wed Jul  4 00:46:03 2007
From: kavita_mbi at yahoo.com (Kavita Agarwal)
Date: Tue, 3 Jul 2007 21:46:03 -0700 (PDT)
Subject: [Biojava-l] Fwd: biojava error
Message-ID: <520964.87338.qm@web39713.mail.mud.yahoo.com>


  Hi,

  I am using BioJava in an applet and I get the error:

    Error: Unable to initialise DNATools

  but the same BioJava code runs fine when I use it in an application.

  I am running my applet in the appletviewer.

  Can anyone tell me exactly how I should set my classpath for the BioJava and Java files? I have these folders:

  jdk1.5.0 located at C:\Program files\Java
  jre1.5.0 at the same location
  biojava - all 6 jar files at C:\Program files\biojava
   


From holland at ebi.ac.uk  Wed Jul  4 04:06:19 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Wed, 04 Jul 2007 09:06:19 +0100
Subject: [Biojava-l] Request for help!
Message-ID: <468B54FB.3090606@ebi.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi guys.

I need help with a programming question!

In Java, you can find out the line-end symbol that the JRE is using by
calling:

   System.getProperty("line.separator");

On *nix this returns "\n", for instance.

Our file parsers all rely on this to tell them which symbol to break
lines at when parsing files. This usually works fine.

BUT... on Windows machines, for certain files, it does not appear to
work! I suspect that these text files were generated on a *nix machine
and then transferred either by copying them across file systems using
native copy commands, or by binary FTP. In both cases the files would
retain the *nix line-end symbols, instead of having them replaced with
the local line-end symbols as would have happened in a text-mode FTP
transfer.
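
To make the suspected failure concrete, here is a minimal Java sketch.
The class and method names are made up for illustration and are not
BioJava code. The separator is passed in explicitly, standing in for
whatever System.getProperty("line.separator") returns on the platform
the parser runs on.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of BioJava.
final class SeparatorMismatchDemo {

    // Splits the content on one fixed separator and reports how many
    // pieces come back. The -1 limit keeps trailing empty pieces.
    static int countChunks(String fileContent, String separator) {
        return fileContent.split(Pattern.quote(separator), -1).length;
    }

    public static void main(String[] args) {
        String unixFile = "line one\nline two\nline three";

        // Pretending to run on Windows: "\r\n" never occurs in the file,
        // so the whole content comes back as a single "line".
        System.out.println(countChunks(unixFile, "\r\n")); // prints 1

        // With the separator the file actually uses, all lines are found.
        System.out.println(countChunks(unixFile, "\n"));   // prints 3
    }
}
```

If this is what is happening, a parser that breaks lines only at the
platform separator would see a *nix file as one very long line on
Windows.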

I don't have access to a Windows machine I can test on, but I suspect
that the fix is quite a simple one and boils down to replacing the
System.getProperty() call with something more intelligent.

Is there any regex or similar thing we can use to spot _all_ kinds of
line-end symbols in text files regardless of the platform the file was
created on or the platform the parser is being run on?

(For information, the only two users who have reported problems like
this are both using Nexus files - I'm not sure what tool generated them
though. The Nexus parser uses the same rules as all the other parsers in
BioJava so I don't think there's anything specifically wrong with it as
opposed to say the GenBank or FASTA parsers.)

cheers,
Richard

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGi1T74C5LeMEKA/QRAqoeAKCf311nLYPqysNfUVLMy28H0FBMTgCcDaVh
3ppr3WRdJcQgzIAJdUoIX0U=
=Cboa
-----END PGP SIGNATURE-----

From hlapp at gmx.net  Wed Jul  4 08:55:28 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 4 Jul 2007 08:55:28 -0400
Subject: [Biojava-l] Request for help!
In-Reply-To: <468B54FB.3090606@ebi.ac.uk>
References: <468B54FB.3090606@ebi.ac.uk>
Message-ID: <1B4834D5-68FC-4981-BEB0-0E1F76A5D7B1@gmx.net>

In Perl it is easy enough to regex-replace s/\r\n/\n/g and s/\r//g,
though I'm not sure this wouldn't incur too much overhead in Java.

You can certainly detect the eol character(s) by line.indexOf('\r');
if found and the following character is '\n' you have DOS/Win-style
line endings, and otherwise if found it is Mac-style.
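
A rough Java sketch of that detection idea, for illustration only. The
EolSniffer class and its Style enum are hypothetical names, not an
existing BioJava API. It looks at the first line terminator in the
text: a '\r' immediately followed by '\n' is treated as DOS/Win-style,
a lone '\r' as old Mac-style, and a lone '\n' as Unix-style.

```java
// Hypothetical helper, not part of BioJava.
final class EolSniffer {

    enum Style { UNIX, WINDOWS, OLD_MAC, NONE }

    // Classifies the first line ending found in the text.
    static Style detect(CharSequence text) {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\n') {
                return Style.UNIX; // bare line feed
            }
            if (c == '\r') {
                // CR followed by LF is one Windows terminator;
                // CR on its own is the old Mac convention.
                boolean lfFollows =
                    i + 1 < text.length() && text.charAt(i + 1) == '\n';
                return lfFollows ? Style.WINDOWS : Style.OLD_MAC;
            }
        }
        return Style.NONE; // no terminator at all
    }

    public static void main(String[] args) {
        System.out.println(detect("a\nb"));   // prints UNIX
        System.out.println(detect("a\r\nb")); // prints WINDOWS
        System.out.println(detect("a\rb"));   // prints OLD_MAC
        System.out.println(detect("ab"));     // prints NONE
    }
}
```

This only inspects the first terminator, so a file with mixed line
endings would be classified by whichever style appears first.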

However, this all seems like a lot of trouble to go through if all  
that one would need to ask of people is to make sure that the file  
matches the native eol style of the platform, which is really trivial  
to achieve.

For example, to convert Win-style line endings to Unix:

	$ perl -pi -e 's/\r//g;' <your-files-here>

and from Mac to Unix:

	$ perl -pi -e 's/\r/\n/g;' <your-files-here>

I have these and other simple conversions defined as aliases in  
my .profile, and don't really ever worry about writing lots of code  
to accommodate arbitrary line endings :-)

-hilmar

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From markjschreiber at gmail.com  Wed Jul  4 10:10:12 2007
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Wed, 4 Jul 2007 22:10:12 +0800
Subject: [Biojava-l] Request for help!
In-Reply-To: <1B4834D5-68FC-4981-BEB0-0E1F76A5D7B1@gmx.net>
References: <468B54FB.3090606@ebi.ac.uk>
	<1B4834D5-68FC-4981-BEB0-0E1F76A5D7B1@gmx.net>
Message-ID: <93b45ca50707040710v7299eb1cp6892e0b7e875404c@mail.gmail.com>

BufferedWriter provides a newLine() method that writes a line
separator but I'm not sure if that gives you a different result or
not.
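
For what it's worth, a small Java sketch that checks what newLine()
actually writes. NewLineDemo is a made-up class name used only for this
illustration. It writes a single newLine() into an in-memory
StringWriter and returns the characters that came out, so they can be
compared with the line.separator property.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;

// Hypothetical demo class, not part of BioJava.
final class NewLineDemo {

    // Returns exactly the characters that BufferedWriter.newLine() emits.
    static String writtenSeparator() {
        StringWriter sink = new StringWriter();
        BufferedWriter writer = new BufferedWriter(sink);
        try {
            writer.newLine();
            writer.flush();
        } catch (IOException e) {
            // Not expected for an in-memory StringWriter.
            throw new IllegalStateException(e);
        }
        return sink.toString();
    }

    public static void main(String[] args) {
        String platform = System.getProperty("line.separator");
        // True as long as the property has not been changed at runtime.
        System.out.println(writtenSeparator().equals(platform));
    }
}
```

So newLine() concerns only what gets written; it tells you nothing
about the line endings of a file being read.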

This may be a JVM bug that needs to be submitted to Sun.

As a very ugly workaround it is possible to determine the OS from the
System object as well (e.g. via System.getProperty("os.name")).

- Mark


From ayates at ebi.ac.uk  Wed Jul  4 10:33:28 2007
From: ayates at ebi.ac.uk (Andy Yates)
Date: Wed, 04 Jul 2007 15:33:28 +0100
Subject: [Biojava-l] [Biojava-dev]  Request for help!
In-Reply-To: <93b45ca50707040710v7299eb1cp6892e0b7e875404c@mail.gmail.com>
References: <468B54FB.3090606@ebi.ac.uk>	<1B4834D5-68FC-4981-BEB0-0E1F76A5D7B1@gmx.net>
	<93b45ca50707040710v7299eb1cp6892e0b7e875404c@mail.gmail.com>
Message-ID: <468BAFB8.708@ebi.ac.uk>

BufferedWriter will always use the value of 
System.getProperty("line.separator"). However, BufferedReader knows that 
an end of line can be \r\n, \r or \n, so in Java land it is perfectly 
legal to read files with any common line terminator and still write 
files in an OS-specific manner.

I sent a regex to Rich which he improved on, but the net result is the 
extraction of the EOL regardless of which one it is.

I'm not 100% sure where the problem lies. So long as the parsers use 
BufferedReader for their text file reading (which they all seem to do) 
this shouldn't have been a problem. In fact this is the line from the 
documentation of BufferedReader.readLine() in the JDK:

"Read a line of text. A line is considered to be terminated by any one 
of a line feed ('\n'), a carriage return ('\r'), or a carriage return 
followed immediately by a linefeed."
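
The documented behaviour can be checked with a few lines of Java. This
is only an illustrative sketch; ReadLineDemo is a made-up class name. It
feeds a string containing all three terminators through
BufferedReader.readLine() and collects the lines.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of BioJava.
final class ReadLineDemo {

    // Reads every line from the text using BufferedReader.readLine().
    static List<String> readAllLines(String text) {
        List<String> lines = new ArrayList<String>();
        BufferedReader reader = new BufferedReader(new StringReader(text));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line); // terminators are stripped by readLine()
            }
            reader.close();
        } catch (IOException e) {
            // Not expected for an in-memory StringReader.
            throw new IllegalStateException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // One LF, one CRLF and one bare CR in the same input.
        System.out.println(readAllLines("unix\nwindows\r\nmac\rlast"));
        // prints [unix, windows, mac, last]
    }
}
```

Note that readLine() discards the terminator, so code that needs to
know which terminator was present cannot rely on it.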

Very, very strange, but the regex sounds like it was a pragmatic solution.

Andy


From holland at ebi.ac.uk  Wed Jul  4 11:04:41 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Wed, 04 Jul 2007 16:04:41 +0100
Subject: [Biojava-l] Request for help!
In-Reply-To: <93b45ca50707040710v7299eb1cp6892e0b7e875404c@mail.gmail.com>
References: <468B54FB.3090606@ebi.ac.uk>	
	<1B4834D5-68FC-4981-BEB0-0E1F76A5D7B1@gmx.net>
	<93b45ca50707040710v7299eb1cp6892e0b7e875404c@mail.gmail.com>
Message-ID: <468BB709.4010704@ebi.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Thanks everyone for your replies. Turns out a regex of the various
combinations of \r and \n is the best way.
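
The exact regex is not quoted in this thread, so the following Java
sketch is an assumption about what such a pattern could look like, not
the code that went into BioJava. AnyEolSplitter and ANY_EOL are
hypothetical names. The pattern tries \r\n first so that a Windows
terminator is matched as one unit rather than as two separate line
ends.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical helper, not the actual BioJava implementation.
final class AnyEolSplitter {

    // Order matters: "\r\n" must come before the single-character cases.
    static final Pattern ANY_EOL = Pattern.compile("\\r\\n|\\r|\\n");

    // Splits text at any of the three common line terminators.
    // The -1 limit keeps empty lines, including trailing ones.
    static String[] splitLines(String text) {
        return ANY_EOL.split(text, -1);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitLines("a\r\nb\rc\nd")));
        // prints [a, b, c, d]
    }
}
```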

cheers,
Richard

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGi7cJ4C5LeMEKA/QRAumDAKCJ5yc8PoZ+sLhcBOkL2Jdp/unW+gCfZrxG
AoVCPngmYX3b/pxfiGJbzic=
=2cyA
-----END PGP SIGNATURE-----

From holland at ebi.ac.uk  Wed Jul  4 11:06:32 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Wed, 04 Jul 2007 16:06:32 +0100
Subject: [Biojava-l] [Biojava-dev]  Request for help!
In-Reply-To: <468BAFB8.708@ebi.ac.uk>
References: <468B54FB.3090606@ebi.ac.uk>	<1B4834D5-68FC-4981-BEB0-0E1F76A5D7B1@gmx.net>	<93b45ca50707040710v7299eb1cp6892e0b7e875404c@mail.gmail.com>
	<468BAFB8.708@ebi.ac.uk>
Message-ID: <468BB778.2050704@ebi.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

The problem was that I was using the newline in a tokenizer, which
needed to recognize and return the newline symbols themselves (the
Nexus format is newline-sensitive). Hence I had to deal with files that
may not use the system's newline symbol.
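
For readers wondering what that means in practice, here is a rough Java
sketch of a tokenizer that hands back the line terminators as tokens in
their own right. It is an illustration under my own assumptions, not the
BioJava Nexus tokenizer; EolAwareTokenizer and its tokenize() method are
made-up names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the BioJava Nexus tokenizer.
final class EolAwareTokenizer {

    // Any common terminator; "\r\n" is tried first so it stays one token.
    private static final Pattern EOL = Pattern.compile("\\r\\n|\\r|\\n");

    // Returns the text between terminators AND the terminators themselves,
    // in the order they occur.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        Matcher matcher = EOL.matcher(text);
        int start = 0;
        while (matcher.find()) {
            if (matcher.start() > start) {
                tokens.add(text.substring(start, matcher.start()));
            }
            tokens.add(matcher.group()); // the terminator as found in the file
            start = matcher.end();
        }
        if (start < text.length()) {
            tokens.add(text.substring(start)); // text after the last terminator
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("#NEXUS\r\nBEGIN TAXA;\nEND;\r");
        System.out.println(tokens.size()); // prints 6
    }
}
```

Because the terminators are matched by pattern rather than compared
with the platform separator, the same code behaves identically on any
OS.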

cheers,
Richard

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGi7d34C5LeMEKA/QRAktwAKCJM43x9MlBZx2expYYAiVy8NCFKwCbBkYp
ctRVPlj5VA0oDzMsoxP4Ohs=
=6wg0
-----END PGP SIGNATURE-----

From markjschreiber at gmail.com  Wed Jul  4 21:29:35 2007
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Thu, 5 Jul 2007 09:29:35 +0800
Subject: [Biojava-l] [Biojava-dev]  Request for help!
In-Reply-To: <468BB778.2050704@ebi.ac.uk>
References: <468B54FB.3090606@ebi.ac.uk>
	<1B4834D5-68FC-4981-BEB0-0E1F76A5D7B1@gmx.net>
	<93b45ca50707040710v7299eb1cp6892e0b7e875404c@mail.gmail.com>
	<468BAFB8.708@ebi.ac.uk> <468BB778.2050704@ebi.ac.uk>
Message-ID: <93b45ca50707041829j337118c5t7adbcb9717a0a715@mail.gmail.com>

Slightly related to this ...

It might be worth making a quick check of the biojava code base to see
how often a "\n" appears in the source code.

- Mark

On 7/4/07, Richard Holland <holland at ebi.ac.uk> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> The problem was that I was using the newline in a tokenizer, which
> needed to return and regcognize the newline symbols themselves (the
> Nexus format is new-line sensitive). Hence I had to deal with files that
> may not have the system new-line operator.
>
> cheers,
> Richard
>
> Andy Yates wrote:
> > BufferedWriter will always use the value of
> > System.getProperty("line.separator") however BufferedReader knows that
> > an end of line can be \r\n, \r or \n so in Java land is perfectly legal
> > to have any common line terminator & still write files in an OS specific
> > manner.
> >
> > I sent a regex to Rich which he improved on but the net result is the
> > extraction of the EOL regardless of which one it is.
> >
> > I'm not 100% sure on where the problem lies. So long as the parsers use
> > BufferedReader for it's text file reading (which they all seem to do)
> > this shouldn't have been a problem. In fact this is the line from the
> > BufferedReader.readLine() in the JDK:
> >
> > "Read a line of text. A line is considered to be terminated by any one
> > of a line feed ('\n'), a carriage return ('\r'), or a carriage return
> > followed immediately by a linefeed."
> >
> > Very very strange but the regex sounds like it was a pragmatic solution
> >
> > Andy
> >
> > Mark Schreiber wrote:
> >> BufferedWriter provides a newLine() method that writes a line
> >> separator but I'm not sure if that gives you a different result or
> >> not.
> >>
> >> This may be a JVM bug that needs to be submitted to Sun.
> >>
> >> As a very ugly work around it is possible to determine the OS from the
> >> System object as well.
> >>
> >> - Mark
> >>
> >> On 7/4/07, Hilmar Lapp <hlapp at gmx.net> wrote:
> >>> In Perl it is easy enough to regex-replace s/\n\r/\n/g and s/\r//g
> >>> though I'm not sure this wouldn't incur too much overhead in Java.
> >>>
> >>> You can certainly detect the eol character(s) by line.indexOf('\r');
> >>> if found and the preceding character is '\n' you have DOS/Win-style
> >>> line endings, and otherwise if found it is Mac-style.
> >>>
> >>> However, this all seems like a lot of trouble to go through if all
> >>> that one would need to ask of people is to make sure that the file
> >>> matches the native eol style of the platform, which is really trivial
> >>> to achieve.
> >>>
> >>> For example, to convert Win-style line endings to  Unix:
> >>>
> >>>         $ perl -pi -e 's/\r//g;' <your-files-here>
> >>>
> >>> and from Mac to Unix:
> >>>
> >>>         $ perl -pi -e 's/\r/\n/g;' <your-files-here>
> >>>
> >>> I have these and other simple conversions defined as aliases in
> >>> my .profile, and don't really ever worry about writing lots of code
> >>> to accommodate arbitrary line endings :-)
> >>>
> >>> -hilmar
> >>>
> >>> On Jul 4, 2007, at 4:06 AM, Richard Holland wrote:
> >>>
> > Hi guys.
> >
> > I need help with a programming question!
> >
> > In Java, you can find out the line-end symbol that the JRE is using by
> > calling:
> >
> >    System.getProperty("line.separator");
> >
> > On *nix this returns "\n", for instance.
> >
> > Our file parsers all rely on this to return the symbol to break
> > lines at
> > when parsing files. This usually works fine.
> >
> > BUT... on Windows machines, for certain files, it does not appear to
> > work! I suspect that these text files were generated on a *nix machine
> > then transferred by copying files across file systems using native
> > copy
> > commands, or using binary FTP so that the system retained the *nix
> > line-end symbols instead of replacing them for the local line-end
> > symbols as it would have done if they were transferred in text mode
> > via
> > FTP.
> >
> > I don't have access to a Windows machine I can test on, but I suspect
> > that the fix is quite a simple one and boils down to replacing the
> > System() call with something more intelligent.
> >
> > Is there any regex or similar thing we can use to spot _all_ kinds of
> > line-end symbols in text files regardless of the platform the file was
> > created on or the platform the parser is being run on?
> >
> > (For information, the only two users who have reported problems like
> > this are both using Nexus files - I'm not sure what tool generated
> > them
> > though. The Nexus parser uses the same rules as all the other
> > parsers in
> > BioJava so I don't think there's anything specifically wrong with
> > it as
> > opposed to say the GenBank or FASTA parsers.)
> >
> > cheers,
> > Richard
> >
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
> >>> --
> >>> ===========================================================
> >>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> >>> ===========================================================
> >>>
> >> _______________________________________________
> >> biojava-dev mailing list
> >> biojava-dev at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.2.2 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>
> iD8DBQFGi7d34C5LeMEKA/QRAktwAKCJM43x9MlBZx2expYYAiVy8NCFKwCbBkYp
> ctRVPlj5VA0oDzMsoxP4Ohs=
> =6wg0
> -----END PGP SIGNATURE-----
>
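
A single regular expression can recognise all three line-end conventions
discussed in this thread, independent of the platform the parser runs on.
The following is only a minimal sketch and not code taken from BioJava:
the class name AnyLineEnding and its two methods are made up for
illustration, and nothing beyond java.util.regex is assumed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of BioJava.
public class AnyLineEnding {
    // "\r\n" is listed first so a DOS/Windows ending is matched as one unit.
    private static final Pattern EOL = Pattern.compile("\r\n|\r|\n");

    /** Splits text at \r\n, \r or \n; trailing empty strings are dropped. */
    public static String[] splitLines(String text) {
        return EOL.split(text);
    }

    /** Returns the first line terminator found in text, or null if none. */
    public static String firstTerminator(String text) {
        Matcher m = EOL.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // One string mixing Windows, old Mac and Unix line endings.
        String mixed = "#NEXUS\r\nBEGIN TAXA;\rEND;\n";
        for (String line : splitLines(mixed)) {
            System.out.println("[" + line + "]");
        }
    }
}
```

firstTerminator() could be used to guess the convention of a file from its
first line break, while splitLines() ignores the convention entirely.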

From holland at ebi.ac.uk  Thu Jul  5 03:40:14 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Thu, 05 Jul 2007 08:40:14 +0100
Subject: [Biojava-l] [Biojava-dev]  Request for help!
In-Reply-To: <93b45ca50707041829j337118c5t7adbcb9717a0a715@mail.gmail.com>
References: <468B54FB.3090606@ebi.ac.uk>	<1B4834D5-68FC-4981-BEB0-0E1F76A5D7B1@gmx.net>	<93b45ca50707040710v7299eb1cp6892e0b7e875404c@mail.gmail.com>	<468BAFB8.708@ebi.ac.uk>
	<468BB778.2050704@ebi.ac.uk>
	<93b45ca50707041829j337118c5t7adbcb9717a0a715@mail.gmail.com>
Message-ID: <468CA05E.6070308@ebi.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

"\n" is used 262 times in 76 different locations:

src/org/biojava/bio/alignment/NeedlemanWunsch.java
src/org/biojava/bio/alignment/SequenceAlignment.java
src/org/biojava/bio/alignment/SmithWaterman.java
src/org/biojava/bio/alignment/SubstitutionMatrix.java
src/org/biojava/bio/chromatogram/graphic/ChromatogramGraphic.java
src/org/biojava/bio/dist/AbstractDistribution.java
src/org/biojava/bio/dp/onehead/SingleDP.java
src/org/biojava/bio/dp/twohead/DPInterpreter.java
src/org/biojava/bio/dp/XmlMarkovModel.java
src/org/biojava/bio/gui/sequence/ImageMap.java
src/org/biojava/bio/program/abi/ABIFParser.java
src/org/biojava/bio/program/blast2html/AbstractAlignmentStyler.java
src/org/biojava/bio/program/blast2html/HTMLRenderer.java
src/org/biojava/bio/program/das/dasalignment/Alignment.java
src/org/biojava/bio/program/das/FeatureRequestManager.java
src/org/biojava/bio/program/sax/BlastLikeAlignmentSAXParser.java
src/org/biojava/bio/program/sax/ClustalWAlignmentSAXParser.java
src/org/biojava/bio/program/sax/FastaSequenceSAXParser.java
src/org/biojava/bio/program/sax/NeedleAlignmentSAXParser.java
src/org/biojava/bio/search/KnuthMorrisPrattSearch.java
src/org/biojava/bio/seq/db/BioIndex.java
src/org/biojava/bio/seq/db/GenbankSequenceDB.java
src/org/biojava/bio/seq/db/TabIndexStore.java
src/org/biojava/bio/seq/io/agave/AGAVEBioSeqHandler.java
src/org/biojava/bio/seq/io/agave/AGAVEContigHandler.java
src/org/biojava/bio/seq/io/agave/AGAVEDbId.java
src/org/biojava/bio/seq/io/agave/AGAVEKeywordPropHandler.java
src/org/biojava/bio/seq/io/agave/AGAVEMapLocation.java
src/org/biojava/bio/seq/io/agave/AGAVEMapPosition.java
src/org/biojava/bio/seq/io/agave/AGAVEMatchRegion.java
src/org/biojava/bio/seq/io/agave/AGAVEProperty.java
src/org/biojava/bio/seq/io/agave/AGAVEQueryRegion.java
src/org/biojava/bio/seq/io/agave/AGAVERelatedAnnot.java
src/org/biojava/bio/seq/io/agave/AGAVESeqPropHandler.java
src/org/biojava/bio/seq/io/agave/AgaveWriter.java
src/org/biojava/bio/seq/io/agave/AGAVEXref.java
src/org/biojava/bio/seq/io/agave/AGAVEXrefs.java
src/org/biojava/bio/seq/io/agave/Embl2AgaveAnnotFilter.java
src/org/biojava/bio/seq/io/FastaFormat.java
src/org/biojava/bio/seq/io/GenbankFileFormer.java
src/org/biojava/bio/seq/io/ParseException.java
src/org/biojava/bio/structure/align/pairwise/AlternativeAlignment.java
src/org/biojava/bio/structure/ChainImpl.java
src/org/biojava/bio/structure/io/FileConvert.java
src/org/biojava/bio/structure/StructureImpl.java
src/org/biojava/bio/symbol/AbstractSimpleBasisSymbol.java
src/org/biojava/bio/symbol/AlphabetManager.java
src/org/biojava/bio/symbol/DoubleAlphabet.java
src/org/biojava/bio/symbol/IntegerAlphabet.java
src/org/biojava/bio/symbol/SimpleAlignment.java
src/org/biojava/stats/svm/tools/TrainRegression.java
src/org/biojava/utils/automata/DfaBuilder.java
src/org/biojava/utils/automata/FiniteAutomaton.java
src/org/biojava/utils/automata/PatternMaker.java
src/org/biojava/utils/candy/CandyEntry.java
src/org/biojava/utils/ChangeSupport.java
src/org/biojava/utils/ExecRunner.java
src/org/biojava/utils/io/CountedBufferedReader.java
src/org/biojava/utils/ParserException.java
src/org/biojava/utils/StaticMemberPlaceHolder.java
src/org/biojavax/bio/db/ncbi/GenbankRichSequenceDB.java
src/org/biojavax/bio/db/ncbi/GenpeptRichSequenceDB.java
src/org/biojavax/bio/phylo/io/nexus/CharactersBlockParser.java
src/org/biojavax/bio/phylo/io/nexus/DistancesBlockParser.java
src/org/biojavax/bio/phylo/io/nexus/NexusFileFormat.java
src/org/biojavax/bio/phylo/MultipleHitCorrection.java
src/org/biojavax/bio/seq/io/DebuggingRichSeqIOListener.java
src/org/biojavax/bio/seq/io/EMBLFormat.java
src/org/biojavax/bio/seq/io/FastaFormat.java
src/org/biojavax/bio/seq/io/GenbankFormat.java
src/org/biojavax/bio/seq/io/UniProtCommentParser.java
src/org/biojavax/bio/seq/io/UniProtFormat.java
src/org/biojavax/bio/taxa/SimpleNCBITaxonName.java
src/org/biojavax/utils/StringTools.java
src/org/biojavax/utils/XMLTools.java

Not all of these are 'bad' newlines - but still, it's a lot to search
through. I've put it on my list of to-do things for when I'm bored.

cheers,
Richard


Mark Schreiber wrote:
> Slightly related to this ...
> 
> It might be worth making a quick check of the biojava code base to see
> how often a "\n" appears in the source code.
> 
> - Mark
> 
> On 7/4/07, Richard Holland <holland at ebi.ac.uk> wrote:
> The problem was that I was using the newline in a tokenizer, which
> needed to return and recognize the newline symbols themselves (the
> Nexus format is new-line sensitive). Hence I had to deal with files that
> may not have the system new-line operator.
> 
> cheers,
> Richard
> 
> Andy Yates wrote:
>>>> BufferedWriter will always use the value of
>>>> System.getProperty("line.separator") however BufferedReader knows that
>>>> an end of line can be \r\n, \r or \n so in Java land it is perfectly legal
>>>> to have any common line terminator & still write files in an OS specific
>>>> manner.
>>>>
>>>> I sent a regex to Rich which he improved on but the net result is the
>>>> extraction of the EOL regardless of which one it is.
>>>>
>>>> I'm not 100% sure on where the problem lies. So long as the parsers use
>>>> BufferedReader for their text file reading (which they all seem to do)
>>>> this shouldn't have been a problem. In fact this is the line from the
>>>> BufferedReader.readLine() in the JDK:
>>>>
>>>> "Read a line of text. A line is considered to be terminated by any one
>>>> of a line feed ('\n'), a carriage return ('\r'), or a carriage return
>>>> followed immediately by a linefeed."
>>>>
>>>> Very very strange but the regex sounds like it was a pragmatic solution
>>>>
>>>> Andy
>>>>

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGjKBd4C5LeMEKA/QRAuARAJsGmSZpdOEuNyYDNn0Xn1rBA6KBjgCeLr8s
qkMnk1CwoMnqBT0RCwQjuSI=
=X9+G
-----END PGP SIGNATURE-----
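
The two points made in this message, that BufferedReader.readLine()
accepts any of the common terminators and that a newline-sensitive
tokenizer still has to hand the line breaks back as tokens, can be
combined. What follows is a rough sketch under stated assumptions, not
the approach actually used in the Nexus parser: the class NewlineTokens
and its methods are hypothetical, and it normalises every line break to a
single "\n" token.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper for illustration; not part of BioJava.
public class NewlineTokens {
    /** Reads lines with BufferedReader, which accepts \n, \r and \r\n. */
    public static List<String> readLines(String text) throws IOException {
        List<String> lines = new ArrayList<String>();
        BufferedReader reader = new BufferedReader(new StringReader(text));
        String line;
        while ((line = reader.readLine()) != null) {
            lines.add(line);
        }
        return lines;
    }

    /**
     * Splits each line on whitespace and appends a "\n" token after every
     * line, including a last line that had no terminator in the input.
     */
    public static List<String> tokenize(String text) throws IOException {
        List<String> tokens = new ArrayList<String>();
        for (String line : readLines(text)) {
            StringTokenizer st = new StringTokenizer(line);
            while (st.hasMoreTokens()) {
                tokens.add(st.nextToken());
            }
            tokens.add("\n");
        }
        return tokens;
    }
}
```

Because the reader strips the terminators, the tokenizer never sees which
convention the file used; that is the design choice here, and it would not
suit a parser that must reproduce the original bytes.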

From aulia at students.itb.ac.id  Mon Jul  9 03:08:39 2007
From: aulia at students.itb.ac.id (Aulia Rahma Amin)
Date: Mon, 9 Jul 2007 14:08:39 +0700 (WIT)
Subject: [Biojava-l] How to read and write a ProfileHMM into file
Message-ID: <3312.167.205.35.81.1183964919.squirrel@students.itb.ac.id>


How can I read and write a ProfileHMM to a file?
I tried XmlMarkovModel, but when I read the file I can't cast the
resulting SimpleMarkovModel to ProfileHMM.

-- 
Aulia Rahma Amin
ARC05/IF03
Y! ID : aulia_ra
Skype ID : aulia_ra
MSN ID : aulia_ra at hotmail.com
AIM ID : auliara
ICQ ID : aulia_ra
Homepage : http://www.aulia-ra.org


From holland at ebi.ac.uk  Mon Jul  9 03:46:02 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Mon, 09 Jul 2007 08:46:02 +0100
Subject: [Biojava-l] How to read and write a ProfileHMM into file
In-Reply-To: <3312.167.205.35.81.1183964919.squirrel@students.itb.ac.id>
References: <3312.167.205.35.81.1183964919.squirrel@students.itb.ac.id>
Message-ID: <4691E7BA.9030209@ebi.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Have you tried the classes in org.biojava.bio.program.hmmer? There is a
parser in there which will read the output from HMMER.

cheers,
Richard

Aulia Rahma Amin wrote:
> How can I read and write a ProfileHMM to a file?
> I tried XmlMarkovModel, but when I read the file I can't cast the
> resulting SimpleMarkovModel to ProfileHMM.
> 
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGkee64C5LeMEKA/QRAvevAKCVYFeUNByQwew6a900oj2MJjHnmACdHE8M
lSJgI+HuhRAjEngMlxI+JVo=
=Ft98
-----END PGP SIGNATURE-----

From markjschreiber at gmail.com  Mon Jul  9 11:59:49 2007
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Mon, 9 Jul 2007 23:59:49 +0800
Subject: [Biojava-l] How to read and write a ProfileHMM into file
In-Reply-To: <3312.167.205.35.81.1183964919.squirrel@students.itb.ac.id>
References: <3312.167.205.35.81.1183964919.squirrel@students.itb.ac.id>
Message-ID: <93b45ca50707090859m13c7ff89wd942f838cf6bdbea@mail.gmail.com>

Hi -

The best possible solution would be to extend XmlMarkovModel so that
it can attempt to construct a ProfileHMM from an XML file.

- Mark

On 7/9/07, Aulia Rahma Amin <aulia at students.itb.ac.id> wrote:
>
> How can I read and write a ProfileHMM to a file?
> I tried XmlMarkovModel, but when I read the file I can't cast the
> resulting SimpleMarkovModel to ProfileHMM.

From aulia at students.itb.ac.id  Tue Jul 10 02:05:22 2007
From: aulia at students.itb.ac.id (Aulia Rahma Amin)
Date: Tue, 10 Jul 2007 13:05:22 +0700 (WIT)
Subject: [Biojava-l] Problem with SearchProfile demo
Message-ID: <4321.167.205.35.81.1184047522.squirrel@students.itb.ac.id>

I have a problem when running demos/dp/SearchProfile.java. The program
returns an error message:

classes\demos>java dp.SearchProfile fake.fasta
Loading sequences
Creating profile HMM
Estimating alignment as having length 999
org.biojava.bio.BioError: Assertion Failure: Symbol i-791 was not an indexed member of the alphabet Transitions from i-791 despite being in the alphabet.
        at org.biojava.bio.symbol.LinearAlphabetIndex.indexForSymbol(LinearAlphabetIndex.java:118)
        at org.biojava.bio.dist.IndexedCount.increaseCount(IndexedCount.java:98)
        at org.biojava.bio.dist.SimpleDistribution$Trainer.addCount(SimpleDistribution.java:273)
        at org.biojava.bio.dist.SimpleDistributionTrainerContext.addCount(SimpleDistributionTrainerContext.java:85)
        at dp.SearchProfile.randomize(SearchProfile.java:155)
        at dp.SearchProfile.createProfile(SearchProfile.java:104)
        at dp.SearchProfile.main(SearchProfile.java:31)

I suspect this problem occurs when the program runs the train method in
the SimpleDistribution class. Is this a bug?

Any help will be deeply appreciated...

=====
Aulia Rahma Amin
Undergraduate Student
School of Electrical Engineering and Informatics
Bandung Institute of Technology
Indonesia


From markjschreiber at gmail.com  Tue Jul 10 02:13:55 2007
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Tue, 10 Jul 2007 14:13:55 +0800
Subject: [Biojava-l] Problem with SearchProfile demo
In-Reply-To: <4321.167.205.35.81.1184047522.squirrel@students.itb.ac.id>
References: <4321.167.205.35.81.1184047522.squirrel@students.itb.ac.id>
Message-ID: <93b45ca50707092313s60c8a7d8qb0cc4b77ae0f04dd@mail.gmail.com>

Hi -

What version of BioJava do you have?

- Mark


From aulia at students.itb.ac.id  Tue Jul 10 02:29:48 2007
From: aulia at students.itb.ac.id (Aulia Rahma Amin)
Date: Tue, 10 Jul 2007 13:29:48 +0700 (WIT)
Subject: [Biojava-l] Problem with SearchProfile demo
In-Reply-To: <93b45ca50707092313s60c8a7d8qb0cc4b77ae0f04dd@mail.gmail.com>
References: <4321.167.205.35.81.1184047522.squirrel@students.itb.ac.id>
	<93b45ca50707092313s60c8a7d8qb0cc4b77ae0f04dd@mail.gmail.com>
Message-ID: <4455.167.205.35.81.1184048988.squirrel@students.itb.ac.id>

I'm using BioJava 1.5. I didn't find this problem when using BioJava 1.4.

-aulia-

> Hi -
>
> What version of BioJava do you have?
>
> - Mark


From markjschreiber at gmail.com  Tue Jul 10 02:54:51 2007
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Tue, 10 Jul 2007 14:54:51 +0800
Subject: [Biojava-l] Problem with SearchProfile demo
In-Reply-To: <4455.167.205.35.81.1184048988.squirrel@students.itb.ac.id>
References: <4321.167.205.35.81.1184047522.squirrel@students.itb.ac.id>
	<93b45ca50707092313s60c8a7d8qb0cc4b77ae0f04dd@mail.gmail.com>
	<4455.167.205.35.81.1184048988.squirrel@students.itb.ac.id>
Message-ID: <93b45ca50707092354wbb2d6caka601ccb2251d1771@mail.gmail.com>

Does the problem occur with the ProfileHMM example in the cookbook?
(http://biojava.org/wiki/BioJava:CookBook:DP:HMM)

- Mark

On 7/10/07, Aulia Rahma Amin <aulia at students.itb.ac.id> wrote:
> I'm using BioJava 1.5. I didn't find this problem when using BioJava 1.4.
>
> -aulia-

From aulia at students.itb.ac.id  Tue Jul 10 03:02:08 2007
From: aulia at students.itb.ac.id (Aulia Rahma Amin)
Date: Tue, 10 Jul 2007 14:02:08 +0700 (WIT)
Subject: [Biojava-l] Problem with SearchProfile demo
In-Reply-To: <93b45ca50707092354wbb2d6caka601ccb2251d1771@mail.gmail.com>
References: <4321.167.205.35.81.1184047522.squirrel@students.itb.ac.id> 
	<93b45ca50707092313s60c8a7d8qb0cc4b77ae0f04dd@mail.gmail.com> 
	<4455.167.205.35.81.1184048988.squirrel@students.itb.ac.id>
	<93b45ca50707092354wbb2d6caka601ccb2251d1771@mail.gmail.com>
Message-ID: <4677.167.205.35.81.1184050928.squirrel@students.itb.ac.id>

Yes, it happens; it always ends with org.biojava.bio.BioError: Assertion
Failure. But I have no problems when running the example using BioJava
1.4.

-aulia-

> Does the problem occur with the ProfileHMM example in the cookbook?
> (http://biojava.org/wiki/BioJava:CookBook:DP:HMM)
>
> - Mark


-- 
Aulia Rahma Amin
ARC05/IF03
Y! ID : aulia_ra
Skype ID : aulia_ra
MSN ID : aulia_ra at hotmail.com
AIM ID : auliara
ICQ ID : aulia_ra
Homepage : http://www.aulia-ra.org


From markjschreiber at gmail.com  Tue Jul 10 06:01:30 2007
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Tue, 10 Jul 2007 18:01:30 +0800
Subject: [Biojava-l] Problem with SearchProfile demo
In-Reply-To: <4677.167.205.35.81.1184050928.squirrel@students.itb.ac.id>
References: <4321.167.205.35.81.1184047522.squirrel@students.itb.ac.id>
	<93b45ca50707092313s60c8a7d8qb0cc4b77ae0f04dd@mail.gmail.com>
	<4455.167.205.35.81.1184048988.squirrel@students.itb.ac.id>
	<93b45ca50707092354wbb2d6caka601ccb2251d1771@mail.gmail.com>
	<4677.167.205.35.81.1184050928.squirrel@students.itb.ac.id>
Message-ID: <93b45ca50707100301s61a8c522m5039b02447f0bd07@mail.gmail.com>

I have submitted this as a bug report. It seems to be a bug in all HMM
code. Some initial testing suggests it is a problem with Flyweight
symbols (States) not behaving properly.

My tests of ProfileHMMs still worked about 7-8 months ago. According
to CVS the only thing that happened after that time to classes that
might be relevant was a semi-automated removal of crud from the code
(unused parameters etc). It is very hard to tell which change did the
damage.

I suspect I will have to write some unit tests for the DP classes.
Somehow I think this should have happened about 6 years ago (MRP, are
you listening!!) but better late than never : )

- Mark
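
To illustrate the kind of failure meant by Flyweight symbols "not behaving
properly": a flyweight design assumes exactly one object per symbol, so an
index may look symbols up by object identity. If a second object with the
same name is ever created, the lookup fails even though the symbol appears
to be in the alphabet. The sketch below shows only this general mechanism.
It is not a diagnosis of the actual bug, and State and IdentityIndex are
hypothetical classes, not BioJava ones.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical illustration only; none of these classes are BioJava classes.
public class FlyweightIndexSketch {
    /** Stand-in for a state symbol that is meant to exist once per name. */
    static final class State {
        final String name;
        State(String name) { this.name = name; }
    }

    /** Index that finds symbols by object identity, not by name. */
    static final class IdentityIndex {
        private final Map<State, Integer> positions =
                new IdentityHashMap<State, Integer>();

        void add(State s) {
            positions.put(s, Integer.valueOf(positions.size()));
        }

        boolean contains(State s) {
            return positions.containsKey(s);
        }
    }

    public static void main(String[] args) {
        State original = new State("i-791");
        State duplicate = new State("i-791"); // same name, different object
        IdentityIndex index = new IdentityIndex();
        index.add(original);
        System.out.println(index.contains(original));  // the indexed object
        System.out.println(index.contains(duplicate)); // a look-alike object
    }
}
```

Running main prints true and then false: the look-alike object is not
found, which is the same shape of symptom as "was not an indexed member of
the alphabet ... despite being in the alphabet".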

On 7/10/07, Aulia Rahma Amin <aulia at students.itb.ac.id> wrote:
> Yes, it happens; it always ends with org.biojava.bio.BioError: Assertion
> Failure. But I have no problems when running the example using BioJava
> 1.4.
>
> -aulia-
>
> > Does the problem occur with the ProfileHMM example in the cookbook?
> > (http://biojava.org/wiki/BioJava:CookBook:DP:HMM)
> >
> > - Mark
> >
> > On 7/10/07, Aulia Rahma Amin <aulia at students.itb.ac.id> wrote:
> >> I'm using BioJava 1.5. I didn't find this problem when using BioJava
> >> 1.4.
> >>
> >> -aulia-
> >>
> >> > Hi -
> >> >
> >> > What version of BioJava do yo have?
> >> >
> >> > - Mark
> >> >
> >> > On 7/10/07, Aulia Rahma Amin <aulia at students.itb.ac.id> wrote:
> >> >> I have a problem when running demos/dp/SearchProfile.java. The
> >> program
> >> >> return an error message :
> >> >>
> >> >> classes\demos>java dp.SearchProfile fake.fasta
> >> >> Loading sequences
> >> >> Creating profile HMM
> >> >> Estimating alignment as having length 999
> >> >> org.biojava.bio.BioError: Assertion Failure: Symbol i-791 was not an
> >> >> indexed member of the alphabet Transitions from i-791 despite being
> >> in
> >> >> the
> >> >> alphabet.
> >> >>         at
> >> >> org.biojava.bio.symbol.LinearAlphabetIndex.indexForSymbol(LinearAlphabetIndex.java:118)
> >> >>         at
> >> >> org.biojava.bio.dist.IndexedCount.increaseCount(IndexedCount.java:98)
> >> >>         at
> >> >> org.biojava.bio.dist.SimpleDistribution$Trainer.addCount(SimpleDistribution.java:273)
> >> >>         at
> >> >> org.biojava.bio.dist.SimpleDistributionTrainerContext.addCount(SimpleDistributionTrainerContext.java:85)
> >> >>         at dp.SearchProfile.randomize(SearchProfile.java:155)
> >> >>         at dp.SearchProfile.createProfile(SearchProfile.java:104)
> >> >>         at dp.SearchProfile.main(SearchProfile.java:31)
> >> >>
> >> >> I suspect this problem occur when the the program run train method in
> >> >> SimpleDistribution class. Is this a bug or what?
> >> >>
> >> >> Any help will be deeply appreciated...
> >> >>
> >> >> =====
> >> >> Aulia Rahma Amin
> >> >> Undergraduate Student
> >> >> School of Electrical Engineering and Informatics
> >> >> Bandung Institute of Technology
> >> >> Indonesia
> >> >>
> >> >>
> >> >> _______________________________________________
> >> >> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> >> >> http://lists.open-bio.org/mailman/listinfo/biojava-l
> >> >>
> >> >
> >>
> >>
> >
>
>
> --
> Aulia Rahma Amin
> ARC05/IF03
> Y! ID : aulia_ra
> Skype ID : aulia_ra
> MSN ID : aulia_ra at hotmail.com
> AIM ID : auliara
> ICQ ID : aulia_ra
> Homepage : http://www.aulia-ra.org

From Jonathan.Warren at agresearch.co.nz  Fri Jul 13 00:37:56 2007
From: Jonathan.Warren at agresearch.co.nz (Warren, Jonathan)
Date: Fri, 13 Jul 2007 16:37:56 +1200
Subject: [Biojava-l] ACE parser
Message-ID: <D5DBA313349A4B458528BE63B387F36C0589E8BD@imail.agresearch.co.nz>

Hi

I've seen posts from people writing an ACE file format parser (the
contig assembly output format, see
http://bioportal.cgb.indiana.edu/docs/tools/cap3/aceform), but as far as
I can tell there is not one available in BioJava yet.

I am thinking of writing one and contributing it to biojava.

Thinking about the design - has anyone got any advice or pointers?
I would like to hide the data and the parsing mechanics from users
rather than expose everything the parser gathers, but since I don't know
how people are going to use it, maybe I should give broad access to
the data after all?

 
Cheers

Jonathan.


=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================


From holland at ebi.ac.uk  Fri Jul 13 03:34:01 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Fri, 13 Jul 2007 08:34:01 +0100
Subject: [Biojava-l] ACE parser
In-Reply-To: <D5DBA313349A4B458528BE63B387F36C0589E8BD@imail.agresearch.co.nz>
References: <D5DBA313349A4B458528BE63B387F36C0589E8BD@imail.agresearch.co.nz>
Message-ID: <46972AE9.7000205@ebi.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Jon!

There is still no ACE parser in BioJava that I know about, so a new
parser would be most welcome. Thanks for volunteering!

The way we write parsers is to split the task into various stages:

    xxx : some BioJava object that can represent all the data in the
file (e.g. Sequence, or ABIChromatogram).

    xxxFormat : actually reads the file, accepts an xxxListener as a
parameter whilst doing so and signals events to that listener as it
processes various parts of the file. Also has a method for writing a new
file based on some existing xxx object. The xxxFormat input parts always
work from InputStreams, with convenience methods that accept Files (or
sometimes even URLs) and delegate to the main InputStream methods. Same
goes for the output parts - OutputStream by default, with appropriate
File/URL/etc. convenience methods.

    xxxListener : listens for 'events' - this is an interface (e.g.
startNewSequence(), addSequenceChunk(), startFeature(), addLocation(),
endSequence(), etc.).

    xxxBuilder : implements xxxListener and has an extra method to
retrieve an xxx object containing all the data it has received so far
(for instance, the builders that listen for events from sequence files
build Sequence objects).

The idea is that the xxxBuilder object will build a complete object with
as much relevant data from the file as possible, but if you don't want
that much information you can pass your own xxxListener implementation
to xxxFormat, one which only listens to the events representing the
bits of the file it is interested in. There is usually a default
implementation of every xxxListener interface with empty methods that
ignore everything; xxxBuilder or your own custom implementation then
extends it, overriding the methods that supply the data it wants.
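
As a rough illustration of the split described above, here is a minimal
sketch in Java. All names in it (AceListener, AceListenerAdapter,
AceBuilder, AceFormat, ContigCounter) are hypothetical and are not part
of BioJava. It also assumes a heavily simplified view of the ACE format
in which only the "CO <name> ..." (contig) and "RD <name> ..." (read)
header lines matter; a real parser would need to handle far more.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical event interface: one callback per part of the file. */
interface AceListener {
    void startContig(String name);
    void addRead(String name);
    void endFile();
}

/** Default implementation with empty methods that ignore everything. */
class AceListenerAdapter implements AceListener {
    public void startContig(String name) {}
    public void addRead(String name) {}
    public void endFile() {}
}

/** Builder: listens to every event and assembles a complete object. */
class AceBuilder extends AceListenerAdapter {
    private final Map<String, List<String>> contigs = new LinkedHashMap<>();
    private List<String> current;

    @Override public void startContig(String name) {
        current = new ArrayList<>();
        contigs.put(name, current);
    }

    @Override public void addRead(String name) {
        if (current != null) current.add(name);
    }

    /** Contig name -> read names, for everything received so far. */
    Map<String, List<String>> getAssembly() { return contigs; }
}

/** Custom listener that only cares about one kind of event. */
class ContigCounter extends AceListenerAdapter {
    int count = 0;
    @Override public void startContig(String name) { count++; }
}

/** Format: reads the stream and signals events; it stores nothing itself. */
class AceFormat {
    /** Main entry point, working from an InputStream. */
    static void read(InputStream in, AceListener listener) {
        try {
            BufferedReader reader = new BufferedReader(
                    new InputStreamReader(in, StandardCharsets.US_ASCII));
            String line;
            while ((line = reader.readLine()) != null) {
                String[] fields = line.trim().split("\\s+");
                if (fields.length < 2) continue;      // blank or sequence line
                if (fields[0].equals("CO")) listener.startContig(fields[1]);
                else if (fields[0].equals("RD")) listener.addRead(fields[1]);
            }
            listener.endFile();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Convenience method that delegates to the InputStream version. */
    static void read(String text, AceListener listener) {
        read(new ByteArrayInputStream(text.getBytes(StandardCharsets.US_ASCII)),
             listener);
    }
}

class AceDemo {
    public static void main(String[] args) {
        String ace = "AS 1 2\n\nCO Contig1 4 2 1 U\nACGT\n\n"
                   + "RD read1 4 0 0\nACGT\n\nRD read2 4 0 0\nACGT\n";
        AceBuilder builder = new AceBuilder();
        AceFormat.read(ace, builder);
        System.out.println(builder.getAssembly());

        ContigCounter counter = new ContigCounter();
        AceFormat.read(ace, counter);
        System.out.println(counter.count);
    }
}
```

The point of the adapter class is that ContigCounter only overrides the
one method it cares about, while AceBuilder collects everything; the
format class does not need to know which of the two it is talking to.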

cheers,
Richard

Warren, Jonathan wrote:
> Hi
> 
> I've seen posts related to people writing an ace file format parser
> (contig assembly output type
> http://bioportal.cgb.indiana.edu/docs/tools/cap3/aceform) but as yet I
> believe there is  not one available in biojava?
> 
> I am thinking of writing one and contributing it to biojava.
> 
> Thinking about the design of it - has anyone got any advice or pointers?
> If I want to hide the data and mechanics from users I don't want to give
> access to all the data it gathers - but not knowing how people are going
> to use it implies that maybe I should give a lot of access to the data??
> 
>  
> Cheers
> 
> Jonathan.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGlyrp4C5LeMEKA/QRArAsAKCZIOPFSpXv5a8WqtY3zE5caJpk4gCfSBLC
AW3L7kAWOFmEQ3zRN467qhA=
=qX7u
-----END PGP SIGNATURE-----

From ilangocal at yahoo.com  Tue Jul 17 23:09:48 2007
From: ilangocal at yahoo.com (ilango)
Date: Tue, 17 Jul 2007 20:09:48 -0700 (PDT)
Subject: [Biojava-l] newbie with just a Computer Science Background
Message-ID: <727899.49968.qm@web56103.mail.re3.yahoo.com>

Hi
I have a Master's degree in Computer Science and would like to develop in BioJava. I am wondering whether I can do this without a degree in biology or the life sciences.

Is it possible to contribute to the development of BioJava and, if so, in what way?

thanks very much
ilango



From markjschreiber at gmail.com  Wed Jul 18 01:07:36 2007
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Wed, 18 Jul 2007 13:07:36 +0800
Subject: [Biojava-l] newbie with just a Computer Science Background
In-Reply-To: <727899.49968.qm@web56103.mail.re3.yahoo.com>
References: <727899.49968.qm@web56103.mail.re3.yahoo.com>
Message-ID: <93b45ca50707172207odb8afbl48ce54df9b70883a@mail.gmail.com>

Hi -

If you have no background in biology there will be some limitations, but you
may be interested in looking at things like HMMs in the DP package.  It
would also be interesting for someone to do some profiling of the code base
to find examples of poor code etc.

We always need more unit tests as well!

- Mark


On 7/18/07, ilango <ilangocal at yahoo.com> wrote:
>
> Hi
> I have a Master Degree in Computer Science. However I would like to
> develop in BioJava. I am wondering if I can do this, with my lack of a
> degree in Biology or the Life Sciences.
>
> Is it possible to contribute to the development of BioJava and if so, in
> what way.
>
> thanks very much
> ilango

From markjschreiber at gmail.com  Wed Jul 18 06:19:42 2007
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Wed, 18 Jul 2007 18:19:42 +0800
Subject: [Biojava-l] BOSC 2007 Biojava Presentation
Message-ID: <93b45ca50707180319y1a5fcdc1r170c22ae1fcb8a8d@mail.gmail.com>

Hi -

If you couldn't make it to BOSC 2007 this year then you can get a copy of
Richard's BioJava talk from the current events tab of www.biojava.org or
here http://www.biojava.org/download/files/bosc2007.pdf

- Mark

From dmitry.repchevski at bsc.es  Mon Jul 23 06:51:35 2007
From: dmitry.repchevski at bsc.es (Dmitry Repchevsky)
Date: Mon, 23 Jul 2007 12:51:35 +0200
Subject: [Biojava-l] Blast XML + XSL = HTML
Message-ID: <46A48837.5070405@bsc.es>

Hello!

I used the BioJava Blast2HTMLHandler, but found it inflexible and slow (?).
In the end I wrote an XSL stylesheet to convert BLAST output into an
HTML <div> element.
I also have a class, BlastXML2HTML, that performs the transform; it's
pretty simple.
May I contribute it?

Best regards,

Dmitry

From holland at ebi.ac.uk  Mon Jul 23 07:40:29 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Mon, 23 Jul 2007 12:40:29 +0100
Subject: [Biojava-l] Blast XML + XSL = HTML
In-Reply-To: <46A48837.5070405@bsc.es>
References: <46A48837.5070405@bsc.es>
Message-ID: <46A493AD.3000107@ebi.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Dmitry.

Thanks for your efforts and offer to contribute your code to the BioJava
project.

You say that the existing Blast2HTMLHandler is inflexible, which is
true enough. However, I don't see how replacing the handler with an
XSL stylesheet would make it any more flexible - the user would still
have to live with an HTML format specified by the designer of the
stylesheet unless the wrapping code that calls the stylesheet could
somehow dynamically modify the XML based on method calls.

I'm also unsure as to whether the whole transformation is appropriate
for BioJava (the Blast2HTMLHandler itself is on borderline territory -
saved only by the fact that it creates the reports based on SAX events
that can potentially come from non-BlastXML sources). BioJava is a Java
toolkit, and the transformation from XML to HTML via an XSL stylesheet
doesn't require Java at all.

Mark - if you're reading this - guidance, please?

cheers,
Richard

Dmitry Repchevsky wrote:
> Hello!
> 
> I used biojava Blast2HTMLHandler, but found it unflexible and slow (?).
> Finally I made an xsl stylesheet  to convert blast output into html 
> <div> element.
> Also I have a class BlastXML2HTML to make the transform , it's pretty 
> simple.
> May I contribute it?
> 
> Best regards,
> 
> Dmitry
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
> 
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGpJOs4C5LeMEKA/QRApF6AJ9kibj7mJ44W2/fTw/cYPHOx/O74gCfT3Zn
b90G56jji+Ro32fq/kuxbJA=
=X95V
-----END PGP SIGNATURE-----

From markjschreiber at gmail.com  Mon Jul 23 07:45:04 2007
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Mon, 23 Jul 2007 19:45:04 +0800
Subject: [Biojava-l] Blast XML + XSL = HTML
In-Reply-To: <46A493AD.3000107@ebi.ac.uk>
References: <46A48837.5070405@bsc.es> <46A493AD.3000107@ebi.ac.uk>
Message-ID: <93b45ca50707230445p12a059f1n9aedf2ab7b887df@mail.gmail.com>

Hi Richard / Dmitry

A good home for this might be the BioJava cookbook on the BioJava wiki
(www.biojava.org). Although it isn't strictly BioJava, people may find it a
useful example.

- Mark

On 7/23/07, Richard Holland <holland at ebi.ac.uk> wrote:
>
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hi Dmitry.
>
> Thanks for your efforts and offer to contribute your code to the BioJava
> project.
>
> You say that the existing Blast2HTMLHanlder handler is inflexible, which
> is true enough. However I don't see how substituting the handler for an
> XSL stylesheet would make it any more flexible - the user would still
> have to live with an HTML format specified by the designer of the
> stylesheet unless the wrapping code that calls the stylesheet could
> somehow dynamically modify the XML based on method calls.
>
> I'm also unsure as to whether the whole transformation is appropriate
> for BioJava (the Blast2HTMLHandler itself is on borderline territory -
> saved only by the fact that it creates the reports based on SAX events
> that can potentially come from non-BlastXML sources). BioJava is a Java
> toolkit, and the transformation from XML to HTML via an XSL stylesheet
> doesn't require Java at all.
>
> Mark - if you're reading this - guidance, please?
>
> cheers,
> Richard
>
> Dmitry Repchevsky wrote:
> > Hello!
> >
> > I used biojava Blast2HTMLHandler, but found it unflexible and slow (?).
> > Finally I made an xsl stylesheet  to convert blast output into html
> > <div> element.
> > Also I have a class BlastXML2HTML to make the transform , it's pretty
> > simple.
> > May I contribute it?
> >
> > Best regards,
> >
> > Dmitry
> > _______________________________________________
> > Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/biojava-l
> >
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.2.2 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>
> iD8DBQFGpJOs4C5LeMEKA/QRApF6AJ9kibj7mJ44W2/fTw/cYPHOx/O74gCfT3Zn
> b90G56jji+Ro32fq/kuxbJA=
> =X95V
> -----END PGP SIGNATURE-----
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>

From dms700 at gmail.com  Wed Jul 25 16:10:11 2007
From: dms700 at gmail.com (dmitriy)
Date: Wed, 25 Jul 2007 16:10:11 -0400
Subject: [Biojava-l] Extracting 3' UTR, 5' UTR, exons, introns,
	CD sequence structure for NCBI NM RefSeq
Message-ID: <299614de0707251310p4d78c9d1p1360ebb3ee421ba6@mail.gmail.com>

Hi

Does anyone have code that takes an NCBI NM RefSeq number and an NCBI NC
RefSeq number, gets the NC RefSeq from NCBI, and parses it so that a
"gene table" object is built for the specified NM RefSeq? The "gene
table" object should have information on the 3' UTR, 5' UTR, exons,
introns and coding sequence (CDS). The data in the "gene table" should
be sufficient, for example, to generate a sequence string with the
3' UTR, 5' UTR, introns and non-coding exons (or non-coding parts of
exons) in lower case and the coding exons (or coding parts of exons) in
upper case.
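
To make the last requirement concrete, here is a minimal sketch of the
rendering step only. It is not BioJava code and does not fetch or parse
anything from NCBI; the class GeneTableSketch and its render() method
are hypothetical, and the coding ranges are assumed to be already known
as 0-based, half-open [start, end) offsets into the sequence.

```java
import java.util.Locale;

class GeneTableSketch {
    /**
     * Lower-cases the whole sequence, then upper-cases the coding ranges.
     * Each range is a two-element array {start, end}, 0-based, end exclusive.
     */
    static String render(String sequence, int[][] codingRanges) {
        StringBuilder sb = new StringBuilder(sequence.toLowerCase(Locale.ROOT));
        for (int[] range : codingRanges) {
            for (int i = range[0]; i < range[1]; i++) {
                sb.setCharAt(i, Character.toUpperCase(sb.charAt(i)));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Two coding stretches inside a 12-base toy sequence.
        System.out.println(render("AACCGGTTAACC", new int[][] {{2, 5}, {8, 10}}));
    }
}
```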

Thanks
Dmitriy

From holland at ebi.ac.uk  Wed Jul  4 08:06:19 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Wed, 04 Jul 2007 09:06:19 +0100
Subject: [Biojava-l] Request for help!
Message-ID: <468B54FB.3090606@ebi.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi guys.

I need help with a programming question!

In Java, you can find out the line-end symbol that the JRE is using by
calling:

   System.getProperty("line.separator");

On *nix this returns "\n", for instance.

Our file parsers all rely on this to return the symbol to break lines at
when parsing files. This usually works fine.

BUT... on Windows machines, for certain files, it does not appear to
work! I suspect that these text files were generated on a *nix machine
then transferred by copying files across file systems using native copy
commands, or using binary FTP so that the system retained the *nix
line-end symbols instead of replacing them for the local line-end
symbols as it would have done if they were transferred in text mode via
FTP.

I don't have access to a Windows machine I can test on, but I suspect
that the fix is quite a simple one and boils down to replacing the
System.getProperty() call with something more intelligent.

Is there any regex or similar thing we can use to spot _all_ kinds of
line-end symbols in text files regardless of the platform the file was
created on or the platform the parser is being run on?

(For information, the only two users who have reported problems like
this are both using Nexus files - I'm not sure what tool generated them
though. The Nexus parser uses the same rules as all the other parsers in
BioJava so I don't think there's anything specifically wrong with it as
opposed to say the GenBank or FASTA parsers.)

cheers,
Richard

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGi1T74C5LeMEKA/QRAqoeAKCf311nLYPqysNfUVLMy28H0FBMTgCcDaVh
3ppr3WRdJcQgzIAJdUoIX0U=
=Cboa
-----END PGP SIGNATURE-----


From hlapp at gmx.net  Wed Jul  4 12:55:28 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 4 Jul 2007 08:55:28 -0400
Subject: [Biojava-l] Request for help!
In-Reply-To: <468B54FB.3090606@ebi.ac.uk>
References: <468B54FB.3090606@ebi.ac.uk>
Message-ID: <1B4834D5-68FC-4981-BEB0-0E1F76A5D7B1@gmx.net>

In Perl it is easy enough to regex-replace s/\r\n/\n/g and s/\r//g
though I'm not sure this wouldn't incur too much overhead in Java.

You can certainly detect the eol character(s) by line.indexOf('\r');
if found and the following character is '\n' you have DOS/Win-style
line endings, and otherwise if found it is Mac-style.
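
A minimal sketch of that detection in Java, for illustration only. The
class name EolDetector and the returned labels are made up for this
example; it looks only at the first line break in the text and relies
on the fact that the DOS/Windows convention is a carriage return
followed by a line feed.

```java
class EolDetector {
    /** Returns "CRLF", "CR", "LF" or "NONE" for the first line break found. */
    static String detect(String text) {
        int cr = text.indexOf('\r');
        int lf = text.indexOf('\n');
        if (cr < 0 && lf < 0) return "NONE";
        if (cr < 0) return "LF";                  // no carriage return at all
        if (lf >= 0 && lf < cr) return "LF";      // a bare line feed comes first
        // A carriage return is the first break; check what follows it.
        if (cr + 1 < text.length() && text.charAt(cr + 1) == '\n') return "CRLF";
        return "CR";
    }

    public static void main(String[] args) {
        System.out.println(detect("line one\r\nline two\r\n"));
    }
}
```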

However, this all seems like a lot of trouble to go through if all  
that one would need to ask of people is to make sure that the file  
matches the native eol style of the platform, which is really trivial  
to achieve.

For example, to convert Win-style line endings to  Unix:

	$ perl -pi -e 's/\r//g;' <your-files-here>

and from Mac to Unix:

	$ perl -pi -e 's/\r/\n/g;' <your-files-here>

I have these and other simple conversions defined as aliases in  
my .profile, and don't really ever worry about writing lots of code  
to accommodate arbitrary line endings :-)

-hilmar

On Jul 4, 2007, at 4:06 AM, Richard Holland wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hi guys.
>
> I need help with a programming question!
>
> In Java, you can find out the line-end symbol that the JRE is using by
> calling:
>
>    System.getProperty("line.separator");
>
> On *nix this returns "\n", for instance.
>
> Our file parsers all rely on this to return the symbol to break  
> lines at
> when parsing files. This usually works fine.
>
> BUT... on Windows machines, for certain files, it does not appear to
> work! I suspect that these text files were generated on a *nix machine
> then transferred by copying files across file systems using native  
> copy
> commands, or using binary FTP so that the system retained the *nix
> line-end symbols instead of replacing them for the local line-end
> symbols as it would have done if they were transferred in text mode  
> via
> FTP.
>
> I don't have access to a Windows machine I can test on, but I suspect
> that the fix is quite a simple one and boils down to replacing the
> System() call with something more intelligent.
>
> Is there any regex or similar thing we can use to spot _all_ kinds of
> line-end symbols in text files regardless of the platform the file was
> created on or the platform the parser is being run on?
>
> (For information, the only two users who have reported problems like
> this are both using Nexus files - I'm not sure what tool generated  
> them
> though. The Nexus parser uses the same rules as all the other  
> parsers in
> BioJava so I don't think there's anything specifically wrong with  
> it as
> opposed to say the GenBank or FASTA parsers.)
>
> cheers,
> Richard
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.2.2 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>
> iD8DBQFGi1T74C5LeMEKA/QRAqoeAKCf311nLYPqysNfUVLMy28H0FBMTgCcDaVh
> 3ppr3WRdJcQgzIAJdUoIX0U=
> =Cboa
> -----END PGP SIGNATURE-----
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From markjschreiber at gmail.com  Wed Jul  4 14:10:12 2007
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Wed, 4 Jul 2007 22:10:12 +0800
Subject: [Biojava-l] Request for help!
In-Reply-To: <1B4834D5-68FC-4981-BEB0-0E1F76A5D7B1@gmx.net>
References: <468B54FB.3090606@ebi.ac.uk>
	<1B4834D5-68FC-4981-BEB0-0E1F76A5D7B1@gmx.net>
Message-ID: <93b45ca50707040710v7299eb1cp6892e0b7e875404c@mail.gmail.com>

BufferedWriter provides a newLine() method that writes a line
separator but I'm not sure if that gives you a different result or
not.

This may be a JVM bug that needs to be submitted to Sun.

As a very ugly work around it is possible to determine the OS from the
System object as well.

- Mark

On 7/4/07, Hilmar Lapp <hlapp at gmx.net> wrote:
> In Perl it is easy enough to regex-replace s/\n\r/\n/g and s/\r//g
> though I'm not sure this wouldn't incur too much overhead in Java.
>
> You can certainly detect the eol character(s) by line.indexOf('\r');
> if found and the preceding character is '\n' you have DOS/Win-style
> line endings, and otherwise if found it is Mac-style.
>
> However, this all seems like a lot of trouble to go through if all
> that one would need to ask of people is to make sure that the file
> matches the native eol style of the platform, which is really trivial
> to achieve.
>
> For example, to convert Win-style line endings to  Unix:
>
>         $ perl -pi -e 's/\r//g;' <your-files-here>
>
> and from Mac to Unix:
>
>         $ perl -pi -e 's/\r/\n/g;' <your-files-here>
>
> I have these and other simple conversions defined as aliases in
> my .profile, and don't really ever worry about writing lots of code
> to accommodate arbitrary line endings :-)
>
> -hilmar
>
> On Jul 4, 2007, at 4:06 AM, Richard Holland wrote:
>
> > -----BEGIN PGP SIGNED MESSAGE-----
> > Hash: SHA1
> >
> > Hi guys.
> >
> > I need help with a programming question!
> >
> > In Java, you can find out the line-end symbol that the JRE is using by
> > calling:
> >
> >    System.getProperty("line.separator");
> >
> > On *nix this returns "\n", for instance.
> >
> > Our file parsers all rely on this to return the symbol to break
> > lines at
> > when parsing files. This usually works fine.
> >
> > BUT... on Windows machines, for certain files, it does not appear to
> > work! I suspect that these text files were generated on a *nix machine
> > then transferred by copying files across file systems using native
> > copy
> > commands, or using binary FTP so that the system retained the *nix
> > line-end symbols instead of replacing them for the local line-end
> > symbols as it would have done if they were transferred in text mode
> > via
> > FTP.
> >
> > I don't have access to a Windows machine I can test on, but I suspect
> > that the fix is quite a simple one and boils down to replacing the
> > System() call with something more intelligent.
> >
> > Is there any regex or similar thing we can use to spot _all_ kinds of
> > line-end symbols in text files regardless of the platform the file was
> > created on or the platform the parser is being run on?
> >
> > (For information, the only two users who have reported problems like
> > this are both using Nexus files - I'm not sure what tool generated
> > them
> > though. The Nexus parser uses the same rules as all the other
> > parsers in
> > BioJava so I don't think there's anything specifically wrong with
> > it as
> > opposed to say the GenBank or FASTA parsers.)
> >
> > cheers,
> > Richard
> >
> > -----BEGIN PGP SIGNATURE-----
> > Version: GnuPG v1.4.2.2 (GNU/Linux)
> > Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
> >
> > iD8DBQFGi1T74C5LeMEKA/QRAqoeAKCf311nLYPqysNfUVLMy28H0FBMTgCcDaVh
> > 3ppr3WRdJcQgzIAJdUoIX0U=
> > =Cboa
> > -----END PGP SIGNATURE-----
> > _______________________________________________
> > Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/biojava-l
>
> --
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
>
>
>
>
>
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>


From ayates at ebi.ac.uk  Wed Jul  4 14:33:28 2007
From: ayates at ebi.ac.uk (Andy Yates)
Date: Wed, 04 Jul 2007 15:33:28 +0100
Subject: [Biojava-l] [Biojava-dev]  Request for help!
In-Reply-To: <93b45ca50707040710v7299eb1cp6892e0b7e875404c@mail.gmail.com>
References: <468B54FB.3090606@ebi.ac.uk>	<1B4834D5-68FC-4981-BEB0-0E1F76A5D7B1@gmx.net>
	<93b45ca50707040710v7299eb1cp6892e0b7e875404c@mail.gmail.com>
Message-ID: <468BAFB8.708@ebi.ac.uk>

BufferedWriter will always use the value of
System.getProperty("line.separator"); however, BufferedReader knows that
an end of line can be \r\n, \r or \n, so in Java land it is perfectly
legal to read files with any common line terminator and still write
files in an OS-specific manner.

I sent a regex to Rich which he improved on but the net result is the 
extraction of the EOL regardless of which one it is.

I'm not 100% sure where the problem lies. So long as the parsers use
BufferedReader for their text file reading (which they all seem to do)
this shouldn't have been a problem. In fact, this is the description of
BufferedReader.readLine() in the JDK documentation:

"Read a line of text. A line is considered to be terminated by any one 
of a line feed ('\n'), a carriage return ('\r'), or a carriage return 
followed immediately by a linefeed."
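
The behaviour described in that quote can be checked with a small
sketch like the following (the class name ReadLineDemo is just for this
example). It feeds readLine() a string that mixes all three
terminators and collects the lines it returns.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class ReadLineDemo {
    /** Splits text into lines using BufferedReader.readLine() only. */
    static List<String> lines(String text) {
        List<String> out = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                out.add(line);   // terminator characters are not included
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // Unix (\n), Windows (\r\n) and old Mac (\r) terminators in one string.
        System.out.println(lines("a\nb\r\nc\rd"));
    }
}
```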

Very strange, but the regex sounds like it was a pragmatic solution.

Andy

Mark Schreiber wrote:
> BufferedWriter provides a newLine() method that writes a line
> separator but I'm not sure if that gives you a different result or
> not.
> 
> This may be a JVM bug that needs to be submitted to Sun.
> 
> As a very ugly work around it is possible to determine the OS from the
> System object as well.
> 
> - Mark
> 
> On 7/4/07, Hilmar Lapp <hlapp at gmx.net> wrote:
>> In Perl it is easy enough to regex-replace s/\n\r/\n/g and s/\r//g
>> though I'm not sure this wouldn't incur too much overhead in Java.
>>
>> You can certainly detect the eol character(s) by line.indexOf('\r');
>> if found and the preceding character is '\n' you have DOS/Win-style
>> line endings, and otherwise if found it is Mac-style.
>>
>> However, this all seems like a lot of trouble to go through if all
>> that one would need to ask of people is to make sure that the file
>> matches the native eol style of the platform, which is really trivial
>> to achieve.
>>
>> For example, to convert Win-style line endings to  Unix:
>>
>>         $ perl -pi -e 's/\r//g;' <your-files-here>
>>
>> and from Mac to Unix:
>>
>>         $ perl -pi -e 's/\r/\n/g;' <your-files-here>
>>
>> I have these and other simple conversions defined as aliases in
>> my .profile, and don't really ever worry about writing lots of code
>> to accommodate arbitrary line endings :-)
>>
>> -hilmar
>>
>> On Jul 4, 2007, at 4:06 AM, Richard Holland wrote:
>>
>>> -----BEGIN PGP SIGNED MESSAGE-----
>>> Hash: SHA1
>>>
>>> Hi guys.
>>>
>>> I need help with a programming question!
>>>
>>> In Java, you can find out the line-end symbol that the JRE is using by
>>> calling:
>>>
>>>    System.getProperty("line.separator");
>>>
>>> On *nix this returns "\n", for instance.
>>>
>>> Our file parsers all rely on this to return the symbol to break
>>> lines at
>>> when parsing files. This usually works fine.
>>>
>>> BUT... on Windows machines, for certain files, it does not appear to
>>> work! I suspect that these text files were generated on a *nix machine
>>> then transferred by copying files across file systems using native
>>> copy
>>> commands, or using binary FTP so that the system retained the *nix
>>> line-end symbols instead of replacing them for the local line-end
>>> symbols as it would have done if they were transferred in text mode
>>> via
>>> FTP.
>>>
>>> I don't have access to a Windows machine I can test on, but I suspect
>>> that the fix is quite a simple one and boils down to replacing the
>>> System() call with something more intelligent.
>>>
>>> Is there any regex or similar thing we can use to spot _all_ kinds of
>>> line-end symbols in text files regardless of the platform the file was
>>> created on or the platform the parser is being run on?
>>>
>>> (For information, the only two users who have reported problems like
>>> this are both using Nexus files - I'm not sure what tool generated
>>> them
>>> though. The Nexus parser uses the same rules as all the other
>>> parsers in
>>> BioJava so I don't think there's anything specifically wrong with
>>> it as
>>> opposed to say the GenBank or FASTA parsers.)
>>>
>>> cheers,
>>> Richard
>>>
>>> -----BEGIN PGP SIGNATURE-----
>>> Version: GnuPG v1.4.2.2 (GNU/Linux)
>>> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>>>
>>> iD8DBQFGi1T74C5LeMEKA/QRAqoeAKCf311nLYPqysNfUVLMy28H0FBMTgCcDaVh
>>> 3ppr3WRdJcQgzIAJdUoIX0U=
>>> =Cboa
>>> -----END PGP SIGNATURE-----
>>> _______________________________________________
>>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>> --
>> ===========================================================
>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>> ===========================================================
>>
>>
>>
>>
>>
>> _______________________________________________
>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev


From holland at ebi.ac.uk  Wed Jul  4 15:04:41 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Wed, 04 Jul 2007 16:04:41 +0100
Subject: [Biojava-l] Request for help!
In-Reply-To: <93b45ca50707040710v7299eb1cp6892e0b7e875404c@mail.gmail.com>
References: <468B54FB.3090606@ebi.ac.uk>	
	<1B4834D5-68FC-4981-BEB0-0E1F76A5D7B1@gmx.net>
	<93b45ca50707040710v7299eb1cp6892e0b7e875404c@mail.gmail.com>
Message-ID: <468BB709.4010704@ebi.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Thanks everyone for your replies. Turns out a regex of the various
combinations of \r and \n is the best way.
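
A minimal sketch of what such a regex could look like in Java. The exact pattern that went into BioJava is not shown in this thread, so the class name `LineEndingSplit` and the helper `splitLines` are illustrative assumptions only. The point is the ordering of the alternatives: "\r\n" has to be tried before the single characters.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical helper, not part of BioJava.
final class LineEndingSplit {
    // Try the two-character Windows terminator first so that "\r\n"
    // is treated as one line end rather than two.
    private static final Pattern EOL = Pattern.compile("\r\n|\r|\n");

    static String[] splitLines(String text) {
        // A limit of -1 keeps trailing empty strings (blank lines at the end).
        return EOL.split(text, -1);
    }

    public static void main(String[] args) {
        String mixed = "unix\nwindows\r\nmac\rlast";
        System.out.println(Arrays.toString(splitLines(mixed)));
        // prints [unix, windows, mac, last]
    }
}
```

The same pattern works regardless of the platform the JVM runs on, because it never consults `line.separator`.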

cheers,
Richard

Mark Schreiber wrote:
> BufferedWriter provides a newLine() method that writes a line
> separator but I'm not sure if that gives you a different result or
> not.
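
For reference, a small sketch of what `BufferedWriter.newLine()` does. It writes the platform's `line.separator` value, so it affects output only and gives no help when reading files that were produced elsewhere. The class name `NewLineDemo` is made up for illustration.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;

// Hypothetical demo class, not part of BioJava.
final class NewLineDemo {
    static String writeTwoLines() {
        StringWriter sink = new StringWriter();
        try {
            BufferedWriter out = new BufferedWriter(sink);
            out.write("first");
            out.newLine();            // writes System.getProperty("line.separator")
            out.write("second");
            out.flush();
        } catch (IOException e) {
            // A StringWriter does not fail in practice; rethrow unchecked just in case.
            throw new IllegalStateException(e);
        }
        return sink.toString();
    }

    public static void main(String[] args) {
        String expected = "first" + System.getProperty("line.separator") + "second";
        System.out.println(writeTwoLines().equals(expected));
    }
}
```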
> 
> This may be a JVM bug that needs to be submitted to Sun.
> 
> As a very ugly work around it is possible to determine the OS from the
> System object as well.
> 
> - Mark
> 
> On 7/4/07, Hilmar Lapp <hlapp at gmx.net> wrote:
>> In Perl it is easy enough to regex-replace s/\r\n/\n/g and s/\r//g
>> though I'm not sure this wouldn't incur too much overhead in Java.
>>
>> You can certainly detect the eol character(s) by line.indexOf('\r');
>> if found and the following character is '\n' you have DOS/Win-style
>> line endings, and otherwise if found it is Mac-style.
>>
>> However, this all seems like a lot of trouble to go through if all
>> that one would need to ask of people is to make sure that the file
>> matches the native eol style of the platform, which is really trivial
>> to achieve.
>>
>> For example, to convert Win-style line endings to  Unix:
>>
>>         $ perl -pi -e 's/\r//g;' <your-files-here>
>>
>> and from Mac to Unix:
>>
>>         $ perl -pi -e 's/\r/\n/g;' <your-files-here>
>>
>> I have these and other simple conversions defined as aliases in
>> my .profile, and don't really ever worry about writing lots of code
>> to accommodate arbitrary line endings :-)
>>
>> -hilmar
>>
>> On Jul 4, 2007, at 4:06 AM, Richard Holland wrote:
>>
> Hi guys.
> 
> I need help with a programming question!
> 
> In Java, you can find out the line-end symbol that the JRE is using by
> calling:
> 
>    System.getProperty("line.separator");
> 
> On *nix this returns "\n", for instance.
> 
> Our file parsers all rely on this to return the symbol to break
> lines at
> when parsing files. This usually works fine.
> 
> BUT... on Windows machines, for certain files, it does not appear to
> work! I suspect that these text files were generated on a *nix machine
> then transferred by copying files across file systems using native
> copy
> commands, or using binary FTP so that the system retained the *nix
> line-end symbols instead of replacing them for the local line-end
> symbols as it would have done if they were transferred in text mode
> via
> FTP.
> 
> I don't have access to a Windows machine I can test on, but I suspect
> that the fix is quite a simple one and boils down to replacing the
> System() call with something more intelligent.
> 
> Is there any regex or similar thing we can use to spot _all_ kinds of
> line-end symbols in text files regardless of the platform the file was
> created on or the platform the parser is being run on?
> 
> (For information, the only two users who have reported problems like
> this are both using Nexus files - I'm not sure what tool generated
> them
> though. The Nexus parser uses the same rules as all the other
> parsers in
> BioJava so I don't think there's anything specifically wrong with
> it as
> opposed to say the GenBank or FASTA parsers.)
> 
> cheers,
> Richard
> 

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGi7cJ4C5LeMEKA/QRAumDAKCJ5yc8PoZ+sLhcBOkL2Jdp/unW+gCfZrxG
AoVCPngmYX3b/pxfiGJbzic=
=2cyA
-----END PGP SIGNATURE-----


From holland at ebi.ac.uk  Wed Jul  4 15:06:32 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Wed, 04 Jul 2007 16:06:32 +0100
Subject: [Biojava-l] [Biojava-dev]  Request for help!
In-Reply-To: <468BAFB8.708@ebi.ac.uk>
References: <468B54FB.3090606@ebi.ac.uk>	<1B4834D5-68FC-4981-BEB0-0E1F76A5D7B1@gmx.net>	<93b45ca50707040710v7299eb1cp6892e0b7e875404c@mail.gmail.com>
	<468BAFB8.708@ebi.ac.uk>
Message-ID: <468BB778.2050704@ebi.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

The problem was that I was using the newline in a tokenizer, which
needed to return and recognize the newline symbols themselves (the
Nexus format is new-line sensitive). Hence I had to deal with files that
may not use the system line separator.
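
To illustrate the kind of tokenizer described above, here is a rough sketch that returns line ends as tokens in their own right, whatever style the file uses. It is not the BioJava Nexus tokenizer; the class `NewlineAwareTokenizer`, the choice to normalise every line end to "\n", and the whitespace rules are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical tokenizer, not the one in org.biojavax.bio.phylo.io.nexus.
final class NewlineAwareTokenizer {
    // Group 1: any line terminator ("\r\n" is tried first).
    // Group 2: a run of non-whitespace characters.
    // Spaces and tabs match neither group and are skipped by find().
    private static final Pattern TOKEN = Pattern.compile("(\r\n|\r|\n)|(\\S+)");

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            if (m.group(1) != null) {
                tokens.add("\n");          // report every line end as one "\n" token
            } else {
                tokens.add(m.group(2));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Mixed line endings in one input, as in a file copied between platforms.
        List<String> tokens = tokenize("#NEXUS\r\nBEGIN TAXA;\rEND;\n");
        System.out.println(tokens.size());   // prints 7
    }
}
```

Because the line end is matched by the pattern rather than compared with `line.separator`, the token stream is the same on every platform.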

cheers,
Richard

Andy Yates wrote:
> BufferedWriter will always use the value of 
> System.getProperty("line.separator"); however, BufferedReader knows that 
> an end of line can be \r\n, \r or \n, so in Java land it is perfectly legal 
> to have any common line terminator & still write files in an OS-specific 
> manner.
> 
> I sent a regex to Rich which he improved on but the net result is the 
> extraction of the EOL regardless of which one it is.
> 
> I'm not 100% sure where the problem lies. So long as the parsers use 
> BufferedReader for their text file reading (which they all seem to do) 
> this shouldn't have been a problem. In fact this is the line from the 
> BufferedReader.readLine() documentation in the JDK:
> 
> "Read a line of text. A line is considered to be terminated by any one 
> of a line feed ('\n'), a carriage return ('\r'), or a carriage return 
> followed immediately by a linefeed."
> 
> Very very strange but the regex sounds like it was a pragmatic solution
> 
> Andy
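
The readLine() behaviour quoted above can be checked with a short sketch like the following. The class name `ReadLineDemo` is illustrative, and the input is an in-memory string rather than a real sequence file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of BioJava.
final class ReadLineDemo {
    static List<String> readAll(String text) {
        List<String> lines = new ArrayList<String>();
        BufferedReader in = new BufferedReader(new StringReader(text));
        try {
            String line;
            // readLine() accepts "\n", "\r" and "\r\n" as terminators
            // and returns the line without the terminator.
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
            in.close();
        } catch (IOException e) {
            // A StringReader does not fail in practice; rethrow unchecked just in case.
            throw new IllegalStateException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(readAll("unix\nwindows\r\nmac\rlast"));
        // prints [unix, windows, mac, last]
    }
}
```

Note that readLine() discards the terminator, so a parser that needs the line end as a token in its own right has to emit one itself for each line it reads.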
> 

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGi7d34C5LeMEKA/QRAktwAKCJM43x9MlBZx2expYYAiVy8NCFKwCbBkYp
ctRVPlj5VA0oDzMsoxP4Ohs=
=6wg0
-----END PGP SIGNATURE-----


From markjschreiber at gmail.com  Thu Jul  5 01:29:35 2007
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Thu, 5 Jul 2007 09:29:35 +0800
Subject: [Biojava-l] [Biojava-dev]  Request for help!
In-Reply-To: <468BB778.2050704@ebi.ac.uk>
References: <468B54FB.3090606@ebi.ac.uk>
	<1B4834D5-68FC-4981-BEB0-0E1F76A5D7B1@gmx.net>
	<93b45ca50707040710v7299eb1cp6892e0b7e875404c@mail.gmail.com>
	<468BAFB8.708@ebi.ac.uk> <468BB778.2050704@ebi.ac.uk>
Message-ID: <93b45ca50707041829j337118c5t7adbcb9717a0a715@mail.gmail.com>

Slightly related to this ...

It might be worth making a quick check of the biojava code base to see
how often a "\n" appears in the source code.
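
One way such a check could be scripted, as a rough sketch only: it counts the two-character escape sequence \n (backslash followed by n) in every .java file under a directory. The class name `NewlineLiteralCount` and the default directory "src" are assumptions, and the match is deliberately crude: it also counts '\n' character literals and an escaped backslash followed by n.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical helper, not part of BioJava.
final class NewlineLiteralCount {
    // Two source characters: a backslash followed by 'n'.
    static final String NEEDLE = "\\n";

    static int count(String source) {
        int hits = 0;
        int i = source.indexOf(NEEDLE);
        while (i >= 0) {
            hits++;
            i = source.indexOf(NEEDLE, i + NEEDLE.length());
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : "src");
        if (!Files.isDirectory(root)) {
            System.err.println("usage: java NewlineLiteralCount <source-dir>");
            return;
        }
        try (Stream<Path> files = Files.walk(root)) {
            files.filter(p -> p.toString().endsWith(".java")).forEach(p -> {
                try {
                    // ISO-8859-1 maps every byte to a character, so decoding never fails.
                    String text = new String(Files.readAllBytes(p), StandardCharsets.ISO_8859_1);
                    int n = count(text);
                    if (n > 0) {
                        System.out.println(n + "\t" + p);
                    }
                } catch (IOException e) {
                    System.err.println("skipped " + p + ": " + e.getMessage());
                }
            });
        }
    }
}
```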

- Mark

On 7/4/07, Richard Holland <holland at ebi.ac.uk> wrote:
> The problem was that I was using the newline in a tokenizer, which
> needed to return and recognize the newline symbols themselves (the
> Nexus format is new-line sensitive). Hence I had to deal with files that
> may not use the system line separator.
>
> cheers,
> Richard


From holland at ebi.ac.uk  Thu Jul  5 07:40:14 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Thu, 05 Jul 2007 08:40:14 +0100
Subject: [Biojava-l] [Biojava-dev]  Request for help!
In-Reply-To: <93b45ca50707041829j337118c5t7adbcb9717a0a715@mail.gmail.com>
References: <468B54FB.3090606@ebi.ac.uk>	<1B4834D5-68FC-4981-BEB0-0E1F76A5D7B1@gmx.net>	<93b45ca50707040710v7299eb1cp6892e0b7e875404c@mail.gmail.com>	<468BAFB8.708@ebi.ac.uk>
	<468BB778.2050704@ebi.ac.uk>
	<93b45ca50707041829j337118c5t7adbcb9717a0a715@mail.gmail.com>
Message-ID: <468CA05E.6070308@ebi.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

"\n" is used 262 times in 76 different locations:

src/org/biojava/bio/alignment/NeedlemanWunsch.java
src/org/biojava/bio/alignment/SequenceAlignment.java
src/org/biojava/bio/alignment/SmithWaterman.java
src/org/biojava/bio/alignment/SubstitutionMatrix.java
src/org/biojava/bio/chromatogram/graphic/ChromatogramGraphic.java
src/org/biojava/bio/dist/AbstractDistribution.java
src/org/biojava/bio/dp/onehead/SingleDP.java
src/org/biojava/bio/dp/twohead/DPInterpreter.java
src/org/biojava/bio/dp/XmlMarkovModel.java
src/org/biojava/bio/gui/sequence/ImageMap.java
src/org/biojava/bio/program/abi/ABIFParser.java
src/org/biojava/bio/program/blast2html/AbstractAlignmentStyler.java
src/org/biojava/bio/program/blast2html/HTMLRenderer.java
src/org/biojava/bio/program/das/dasalignment/Alignment.java
src/org/biojava/bio/program/das/FeatureRequestManager.java
src/org/biojava/bio/program/sax/BlastLikeAlignmentSAXParser.java
src/org/biojava/bio/program/sax/ClustalWAlignmentSAXParser.java
src/org/biojava/bio/program/sax/FastaSequenceSAXParser.java
src/org/biojava/bio/program/sax/NeedleAlignmentSAXParser.java
src/org/biojava/bio/search/KnuthMorrisPrattSearch.java
src/org/biojava/bio/seq/db/BioIndex.java
src/org/biojava/bio/seq/db/GenbankSequenceDB.java
src/org/biojava/bio/seq/db/TabIndexStore.java
src/org/biojava/bio/seq/io/agave/AGAVEBioSeqHandler.java
src/org/biojava/bio/seq/io/agave/AGAVEContigHandler.java
src/org/biojava/bio/seq/io/agave/AGAVEDbId.java
src/org/biojava/bio/seq/io/agave/AGAVEKeywordPropHandler.java
src/org/biojava/bio/seq/io/agave/AGAVEMapLocation.java
src/org/biojava/bio/seq/io/agave/AGAVEMapPosition.java
src/org/biojava/bio/seq/io/agave/AGAVEMatchRegion.java
src/org/biojava/bio/seq/io/agave/AGAVEProperty.java
src/org/biojava/bio/seq/io/agave/AGAVEQueryRegion.java
src/org/biojava/bio/seq/io/agave/AGAVERelatedAnnot.java
src/org/biojava/bio/seq/io/agave/AGAVESeqPropHandler.java
src/org/biojava/bio/seq/io/agave/AgaveWriter.java
src/org/biojava/bio/seq/io/agave/AGAVEXref.java
src/org/biojava/bio/seq/io/agave/AGAVEXrefs.java
src/org/biojava/bio/seq/io/agave/Embl2AgaveAnnotFilter.java
src/org/biojava/bio/seq/io/FastaFormat.java
src/org/biojava/bio/seq/io/GenbankFileFormer.java
src/org/biojava/bio/seq/io/ParseException.java
src/org/biojava/bio/structure/align/pairwise/AlternativeAlignment.java
src/org/biojava/bio/structure/ChainImpl.java
src/org/biojava/bio/structure/io/FileConvert.java
src/org/biojava/bio/structure/StructureImpl.java
src/org/biojava/bio/symbol/AbstractSimpleBasisSymbol.java
src/org/biojava/bio/symbol/AlphabetManager.java
src/org/biojava/bio/symbol/DoubleAlphabet.java
src/org/biojava/bio/symbol/IntegerAlphabet.java
src/org/biojava/bio/symbol/SimpleAlignment.java
src/org/biojava/stats/svm/tools/TrainRegression.java
src/org/biojava/utils/automata/DfaBuilder.java
src/org/biojava/utils/automata/FiniteAutomaton.java
src/org/biojava/utils/automata/PatternMaker.java
src/org/biojava/utils/candy/CandyEntry.java
src/org/biojava/utils/ChangeSupport.java
src/org/biojava/utils/ExecRunner.java
src/org/biojava/utils/io/CountedBufferedReader.java
src/org/biojava/utils/ParserException.java
src/org/biojava/utils/StaticMemberPlaceHolder.java
src/org/biojavax/bio/db/ncbi/GenbankRichSequenceDB.java
src/org/biojavax/bio/db/ncbi/GenpeptRichSequenceDB.java
src/org/biojavax/bio/phylo/io/nexus/CharactersBlockParser.java
src/org/biojavax/bio/phylo/io/nexus/DistancesBlockParser.java
src/org/biojavax/bio/phylo/io/nexus/NexusFileFormat.java
src/org/biojavax/bio/phylo/MultipleHitCorrection.java
src/org/biojavax/bio/seq/io/DebuggingRichSeqIOListener.java
src/org/biojavax/bio/seq/io/EMBLFormat.java
src/org/biojavax/bio/seq/io/FastaFormat.java
src/org/biojavax/bio/seq/io/GenbankFormat.java
src/org/biojavax/bio/seq/io/UniProtCommentParser.java
src/org/biojavax/bio/seq/io/UniProtFormat.java
src/org/biojavax/bio/taxa/SimpleNCBITaxonName.java
src/org/biojavax/utils/StringTools.java
src/org/biojavax/utils/XMLTools.java

Not all of these are 'bad' newlines - but still, it's a lot to search
through. I've put it on my list of to-do things for when I'm bored.

cheers,
Richard


Mark Schreiber wrote:
> Slightly related to this ...
> 
> It might be worth making a quick check of the biojava code base to see
> how often a "\n" appears in the source code.
> 
> - Mark

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGjKBd4C5LeMEKA/QRAuARAJsGmSZpdOEuNyYDNn0Xn1rBA6KBjgCeLr8s
qkMnk1CwoMnqBT0RCwQjuSI=
=X9+G
-----END PGP SIGNATURE-----


From aulia at students.itb.ac.id  Mon Jul  9 07:08:39 2007
From: aulia at students.itb.ac.id (Aulia Rahma Amin)
Date: Mon, 9 Jul 2007 14:08:39 +0700 (WIT)
Subject: [Biojava-l] How to read and write a ProfileHMM into file
Message-ID: <3312.167.205.35.81.1183964919.squirrel@students.itb.ac.id>


How can I read and write a ProfileHMM to a file?
I tried XmlMarkovModel, but I can't cast the SimpleMarkovModel it returns
to ProfileHMM when I read the file back.

-- 
Aulia Rahma Amin
ARC05/IF03
Y! ID : aulia_ra
Skype ID : aulia_ra
MSN ID : aulia_ra at hotmail.com
AIM ID : auliara
ICQ ID : aulia_ra
Homepage : http://www.aulia-ra.org


From holland at ebi.ac.uk  Mon Jul  9 07:46:02 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Mon, 09 Jul 2007 08:46:02 +0100
Subject: [Biojava-l] How to read and write a ProfileHMM into file
In-Reply-To: <3312.167.205.35.81.1183964919.squirrel@students.itb.ac.id>
References: <3312.167.205.35.81.1183964919.squirrel@students.itb.ac.id>
Message-ID: <4691E7BA.9030209@ebi.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Have you tried the classes in org.biojava.bio.program.hmmer? There is a
parser in there which will read the output from HMMER.

cheers,
Richard

Aulia Rahma Amin wrote:
> How can I read and write a ProfileHMM to a file?
> I tried XmlMarkovModel, but I can't cast the SimpleMarkovModel it returns
> to ProfileHMM when I read the file back.
> 
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGkee64C5LeMEKA/QRAvevAKCVYFeUNByQwew6a900oj2MJjHnmACdHE8M
lSJgI+HuhRAjEngMlxI+JVo=
=Ft98
-----END PGP SIGNATURE-----


From markjschreiber at gmail.com  Mon Jul  9 15:59:49 2007
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Mon, 9 Jul 2007 23:59:49 +0800
Subject: [Biojava-l] How to read and write a ProfileHMM into file
In-Reply-To: <3312.167.205.35.81.1183964919.squirrel@students.itb.ac.id>
References: <3312.167.205.35.81.1183964919.squirrel@students.itb.ac.id>
Message-ID: <93b45ca50707090859m13c7ff89wd942f838cf6bdbea@mail.gmail.com>

Hi -

The best possible solution would be to extend XmlMarkovModel so that
it can attempt to construct a ProfileHMM from an XML file.

- Mark

On 7/9/07, Aulia Rahma Amin <aulia at students.itb.ac.id> wrote:
>
> How can I read and write a ProfileHMM to a file?
> I tried XmlMarkovModel, but I can't cast the SimpleMarkovModel it returns
> to ProfileHMM when I read the file back.


From aulia at students.itb.ac.id  Tue Jul 10 06:05:22 2007
From: aulia at students.itb.ac.id (Aulia Rahma Amin)
Date: Tue, 10 Jul 2007 13:05:22 +0700 (WIT)
Subject: [Biojava-l] Problem with SearchProfile demo
Message-ID: <4321.167.205.35.81.1184047522.squirrel@students.itb.ac.id>

I have a problem when running demos/dp/SearchProfile.java. The program
returns an error message:

classes\demos>java dp.SearchProfile fake.fasta
Loading sequences
Creating profile HMM
Estimating alignment as having length 999
org.biojava.bio.BioError: Assertion Failure: Symbol i-791 was not an
indexed member of the alphabet Transitions from i-791 despite being in the
alphabet.
        at org.biojava.bio.symbol.LinearAlphabetIndex.indexForSymbol(LinearAlphabetIndex.java:118)
        at org.biojava.bio.dist.IndexedCount.increaseCount(IndexedCount.java:98)
        at org.biojava.bio.dist.SimpleDistribution$Trainer.addCount(SimpleDistribution.java:273)
        at org.biojava.bio.dist.SimpleDistributionTrainerContext.addCount(SimpleDistributionTrainerContext.java:85)
        at dp.SearchProfile.randomize(SearchProfile.java:155)
        at dp.SearchProfile.createProfile(SearchProfile.java:104)
        at dp.SearchProfile.main(SearchProfile.java:31)

I suspect this problem occurs when the program runs the train method in
the SimpleDistribution class. Is this a bug?

Any help will be deeply appreciated...

=====
Aulia Rahma Amin
Undergraduate Student
School of Electrical Engineering and Informatics
Bandung Institute of Technology
Indonesia


From markjschreiber at gmail.com  Tue Jul 10 06:13:55 2007
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Tue, 10 Jul 2007 14:13:55 +0800
Subject: [Biojava-l] Problem with SearchProfile demo
In-Reply-To: <4321.167.205.35.81.1184047522.squirrel@students.itb.ac.id>
References: <4321.167.205.35.81.1184047522.squirrel@students.itb.ac.id>
Message-ID: <93b45ca50707092313s60c8a7d8qb0cc4b77ae0f04dd@mail.gmail.com>

Hi -

What version of BioJava do you have?

- Mark


From aulia at students.itb.ac.id  Tue Jul 10 06:29:48 2007
From: aulia at students.itb.ac.id (Aulia Rahma Amin)
Date: Tue, 10 Jul 2007 13:29:48 +0700 (WIT)
Subject: [Biojava-l] Problem with SearchProfile demo
In-Reply-To: <93b45ca50707092313s60c8a7d8qb0cc4b77ae0f04dd@mail.gmail.com>
References: <4321.167.205.35.81.1184047522.squirrel@students.itb.ac.id>
	<93b45ca50707092313s60c8a7d8qb0cc4b77ae0f04dd@mail.gmail.com>
Message-ID: <4455.167.205.35.81.1184048988.squirrel@students.itb.ac.id>

I'm using BioJava 1.5. I didn't find this problem when using BioJava 1.4.

-aulia-


From markjschreiber at gmail.com  Tue Jul 10 06:54:51 2007
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Tue, 10 Jul 2007 14:54:51 +0800
Subject: [Biojava-l] Problem with SearchProfile demo
In-Reply-To: <4455.167.205.35.81.1184048988.squirrel@students.itb.ac.id>
References: <4321.167.205.35.81.1184047522.squirrel@students.itb.ac.id>
	<93b45ca50707092313s60c8a7d8qb0cc4b77ae0f04dd@mail.gmail.com>
	<4455.167.205.35.81.1184048988.squirrel@students.itb.ac.id>
Message-ID: <93b45ca50707092354wbb2d6caka601ccb2251d1771@mail.gmail.com>

Does the problem occur with the ProfileHMM example in the cookbook?
(http://biojava.org/wiki/BioJava:CookBook:DP:HMM)

- Mark


From aulia at students.itb.ac.id  Tue Jul 10 07:02:08 2007
From: aulia at students.itb.ac.id (Aulia Rahma Amin)
Date: Tue, 10 Jul 2007 14:02:08 +0700 (WIT)
Subject: [Biojava-l] Problem with SearchProfile demo
In-Reply-To: <93b45ca50707092354wbb2d6caka601ccb2251d1771@mail.gmail.com>
References: <4321.167.205.35.81.1184047522.squirrel@students.itb.ac.id> 
	<93b45ca50707092313s60c8a7d8qb0cc4b77ae0f04dd@mail.gmail.com> 
	<4455.167.205.35.81.1184048988.squirrel@students.itb.ac.id>
	<93b45ca50707092354wbb2d6caka601ccb2251d1771@mail.gmail.com>
Message-ID: <4677.167.205.35.81.1184050928.squirrel@students.itb.ac.id>

Yes, it happens; it always ends with org.biojava.bio.BioError: Assertion
Failure. But I have no problems when running the example with BioJava
1.4.

-aulia-


-- 
Aulia Rahma Amin
ARC05/IF03
Y! ID : aulia_ra
Skype ID : aulia_ra
MSN ID : aulia_ra at hotmail.com
AIM ID : auliara
ICQ ID : aulia_ra
Homepage : http://www.aulia-ra.org


From markjschreiber at gmail.com  Tue Jul 10 10:01:30 2007
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Tue, 10 Jul 2007 18:01:30 +0800
Subject: [Biojava-l] Problem with SearchProfile demo
In-Reply-To: <4677.167.205.35.81.1184050928.squirrel@students.itb.ac.id>
References: <4321.167.205.35.81.1184047522.squirrel@students.itb.ac.id>
	<93b45ca50707092313s60c8a7d8qb0cc4b77ae0f04dd@mail.gmail.com>
	<4455.167.205.35.81.1184048988.squirrel@students.itb.ac.id>
	<93b45ca50707092354wbb2d6caka601ccb2251d1771@mail.gmail.com>
	<4677.167.205.35.81.1184050928.squirrel@students.itb.ac.id>
Message-ID: <93b45ca50707100301s61a8c522m5039b02447f0bd07@mail.gmail.com>

I have submitted this as a bug report. It seems to be a bug in all HMM
code. Some initial testing suggests it is a problem with Flyweight
symbols (States) not behaving properly.

My tests of ProfileHMMs still worked about 7-8 months ago. According
to CVS the only thing that happened after that time to classes that
might be relevant was a semi-automated removal of crud from the code
(unused parameters etc). It is very hard to tell which change did the
damage.

I suspect I will have to write some unit tests for the DP classes.
Somehow I think this should have happened about 6 years ago (MRP, are
you listening!!) but better late than never : )

- Mark
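
For readers unfamiliar with this kind of failure, the following is a
minimal, self-contained sketch of how an index that looks symbols up by
object identity can reject a symbol that appears to be a member. It only
illustrates the general flyweight pitfall and is not the diagnosed cause
of this bug. The classes Sym and IdentityIndex are hypothetical and are
not part of BioJava.

```java
import java.util.IdentityHashMap;
import java.util.Map;

/** Hypothetical stand-in for a flyweight symbol; not a BioJava class. */
final class Sym {
    private final String name;

    Sym(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }
}

/** Hypothetical index that finds symbols by object identity (==), not by name. */
final class IdentityIndex {
    private final Map<Sym, Integer> positions = new IdentityHashMap<>();

    IdentityIndex(Sym... members) {
        for (int i = 0; i < members.length; i++) {
            positions.put(members[i], i);
        }
    }

    int indexFor(Sym s) {
        Integer pos = positions.get(s);
        if (pos == null) {
            throw new IllegalStateException(
                "Symbol " + s.getName() + " is not an indexed member");
        }
        return pos;
    }
}

final class FlyweightPitfallDemo {
    public static void main(String[] args) {
        Sym canonical = new Sym("i-791");
        IdentityIndex index = new IdentityIndex(new Sym("m-790"), canonical);

        // The canonical (flyweight) instance is found at its position.
        System.out.println(index.indexFor(canonical));

        // A second instance with the same name is a different object,
        // so the identity-based lookup fails even though the names match.
        try {
            index.indexFor(new Sym("i-791"));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If something like this is involved, a unit test that builds a model,
looks every state up in its own index and expects no error would catch
the regression early.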


From Jonathan.Warren at agresearch.co.nz  Fri Jul 13 04:37:56 2007
From: Jonathan.Warren at agresearch.co.nz (Warren, Jonathan)
Date: Fri, 13 Jul 2007 16:37:56 +1200
Subject: [Biojava-l] ACE parser
Message-ID: <D5DBA313349A4B458528BE63B387F36C0589E8BD@imail.agresearch.co.nz>

Hi

I've seen posts about people writing a parser for the ACE file format
(contig assembly output; see
http://bioportal.cgb.indiana.edu/docs/tools/cap3/aceform), but as far as
I can tell there is not yet one available in BioJava.

I am thinking of writing one and contributing it to biojava.

Does anyone have advice or pointers on the design? To hide the data and
mechanics from users, I would rather not expose everything the parser
gathers. But since I don't know how people are going to use it, maybe I
should give broad access to the data after all?

 
Cheers

Jonathan.


From holland at ebi.ac.uk  Fri Jul 13 07:34:01 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Fri, 13 Jul 2007 08:34:01 +0100
Subject: [Biojava-l] ACE parser
In-Reply-To: <D5DBA313349A4B458528BE63B387F36C0589E8BD@imail.agresearch.co.nz>
References: <D5DBA313349A4B458528BE63B387F36C0589E8BD@imail.agresearch.co.nz>
Message-ID: <46972AE9.7000205@ebi.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Jon!

There is still no ACE parser in BioJava that I know about, so a new
parser would be most welcome. Thanks for volunteering!

The way we write parsers is to split the task into various stages:

    xxx : some BioJava object that can represent all the data in the
file (e.g. Sequence, or ABIChromatogram).

    xxxFormat : actually reads the file, accepts an xxxListener as a
parameter whilst doing so and signals events to that listener as it
processes various parts of the file. Also has a method for writing a new
file based on some existing xxx object. The xxxFormat input parts always
work from InputStreams, with convenience methods that accept Files (or
sometimes even URLs) and delegate to the main InputStream methods. Same
goes for the output parts - OutputStream by default, with appropriate
File/URL/etc. convenience methods.

    xxxListener : listens for 'events' - this is an interface (e.g.
startNewSequence(), addSequenceChunk(), startFeature(), addLocation(),
endSequence(), etc.).

    xxxBuilder : implements xxxListener and has an extra method to
retrieve an xxx object containing all the data it has received so far
(for instance, the builders that listen for events from sequence files
build Sequence objects).

The idea is that the xxxBuilder object will build a complete object with
as much relevant data from the file as possible, but if you don't want
that much information you can pass your own xxxListener implementation
to xxxFormat, one which only listens to events representing the bits of
the file it is interested in. There is usually a default implementation
of every xxxListener interface with empty methods that ignore
everything; xxxBuilder or your own custom implementation then extends
it, overriding the methods which supply the data that it wants.

cheers,
Richard
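
The stages described above can be sketched in a few self-contained
classes. This is only an illustration under stated assumptions: the
names ContigListener, ContigListenerAdapter, ContigBuilder and
ToyContigFormat are hypothetical and not BioJava classes, and the input
is a made-up toy format (a "CO <name>" line starts a contig, following
non-blank lines are sequence chunks, a blank line ends it), not real ACE.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** xxxListener: receives events while the file is read (hypothetical names). */
interface ContigListener {
    void startContig(String name);
    void addSequenceChunk(String chunk);
    void endContig();
}

/** Default listener with empty methods; subclasses override what they need. */
class ContigListenerAdapter implements ContigListener {
    public void startContig(String name) { }
    public void addSequenceChunk(String chunk) { }
    public void endContig() { }
}

/** xxxBuilder: collects every event into an object (here a name to sequence map). */
class ContigBuilder extends ContigListenerAdapter {
    private final Map<String, String> contigs = new LinkedHashMap<>();
    private String currentName;
    private StringBuilder currentSeq;

    @Override public void startContig(String name) {
        currentName = name;
        currentSeq = new StringBuilder();
    }

    @Override public void addSequenceChunk(String chunk) {
        currentSeq.append(chunk);
    }

    @Override public void endContig() {
        contigs.put(currentName, currentSeq.toString());
    }

    Map<String, String> getContigs() {
        return contigs;
    }
}

/** xxxFormat: reads the stream and signals events; it stores nothing itself. */
class ToyContigFormat {
    void read(InputStream in, ContigListener listener) throws IOException {
        BufferedReader reader =
            new BufferedReader(new InputStreamReader(in, StandardCharsets.US_ASCII));
        boolean inContig = false;
        String line;
        while ((line = reader.readLine()) != null) {
            line = line.trim();
            if (line.startsWith("CO ")) {
                if (inContig) listener.endContig();
                listener.startContig(line.split("\\s+")[1]);
                inContig = true;
            } else if (line.isEmpty()) {
                if (inContig) listener.endContig();
                inContig = false;
            } else if (inContig) {
                listener.addSequenceChunk(line);
            }
        }
        if (inContig) listener.endContig();
    }
}

class ToyContigDemo {
    public static void main(String[] args) throws IOException {
        byte[] data = "CO contig1\nACGT\nTTGA\n\nCO contig2\nGGCC\n"
            .getBytes(StandardCharsets.US_ASCII);
        ToyContigFormat format = new ToyContigFormat();

        // Full build: keep everything the format reports.
        ContigBuilder builder = new ContigBuilder();
        format.read(new ByteArrayInputStream(data), builder);
        System.out.println(builder.getContigs());

        // Custom listener: ignore sequence data and record only the names.
        final List<String> names = new ArrayList<>();
        format.read(new ByteArrayInputStream(data), new ContigListenerAdapter() {
            @Override public void startContig(String name) { names.add(name); }
        });
        System.out.println(names);
    }
}
```

The split means the format class decides nothing about what is kept, so
the question of how much data to expose moves into the builder, and
users who want less can supply their own listener.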

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGlyrp4C5LeMEKA/QRArAsAKCZIOPFSpXv5a8WqtY3zE5caJpk4gCfSBLC
AW3L7kAWOFmEQ3zRN467qhA=
=qX7u
-----END PGP SIGNATURE-----


From ilangocal at yahoo.com  Wed Jul 18 03:09:48 2007
From: ilangocal at yahoo.com (ilango)
Date: Tue, 17 Jul 2007 20:09:48 -0700 (PDT)
Subject: [Biojava-l] newbie with just a Computer Science Background
Message-ID: <727899.49968.qm@web56103.mail.re3.yahoo.com>

Hi
I have a Master's degree in Computer Science and would like to develop in BioJava. I am wondering whether I can do this without a degree in Biology or the Life Sciences.

Is it possible to contribute to the development of BioJava and, if so, in what way?

thanks very much
ilango


From markjschreiber at gmail.com  Wed Jul 18 05:07:36 2007
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Wed, 18 Jul 2007 13:07:36 +0800
Subject: [Biojava-l] newbie with just a Computer Science Background
In-Reply-To: <727899.49968.qm@web56103.mail.re3.yahoo.com>
References: <727899.49968.qm@web56103.mail.re3.yahoo.com>
Message-ID: <93b45ca50707172207odb8afbl48ce54df9b70883a@mail.gmail.com>

Hi -

If you have no background in biology there will be some limitations, but you
may be interested in looking at things like HMMs in the DP package. It
would also be interesting for someone to do some profiling of the code base
to find examples of poor code etc.

We always need more unit tests as well!

- Mark



From markjschreiber at gmail.com  Wed Jul 18 10:19:42 2007
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Wed, 18 Jul 2007 18:19:42 +0800
Subject: [Biojava-l] BOSC 2007 Biojava Presentation
Message-ID: <93b45ca50707180319y1a5fcdc1r170c22ae1fcb8a8d@mail.gmail.com>

Hi -

If you couldn't make it to BOSC 2007 this year then you can get a copy of
Richard's BioJava talk from the current events tab of www.biojava.org or
here http://www.biojava.org/download/files/bosc2007.pdf

- Mark


From dmitry.repchevski at bsc.es  Mon Jul 23 10:51:35 2007
From: dmitry.repchevski at bsc.es (Dmitry Repchevsky)
Date: Mon, 23 Jul 2007 12:51:35 +0200
Subject: [Biojava-l] Blast XML + XSL = HTML
Message-ID: <46A48837.5070405@bsc.es>

Hello!

I used the BioJava Blast2HTMLHandler, but found it inflexible and slow (?).
In the end I wrote an XSL stylesheet that converts BLAST XML output into
an HTML <div> element.
I also have a class, BlastXML2HTML, that performs the transform; it's
pretty simple.
May I contribute it?

Best regards,

Dmitry


From holland at ebi.ac.uk  Mon Jul 23 11:40:29 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Mon, 23 Jul 2007 12:40:29 +0100
Subject: [Biojava-l] Blast XML + XSL = HTML
In-Reply-To: <46A48837.5070405@bsc.es>
References: <46A48837.5070405@bsc.es>
Message-ID: <46A493AD.3000107@ebi.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Dmitry.

Thanks for your efforts and offer to contribute your code to the BioJava
project.

You say that the existing Blast2HTMLHandler is inflexible, which
is true enough. However I don't see how substituting the handler for an
XSL stylesheet would make it any more flexible - the user would still
have to live with an HTML format specified by the designer of the
stylesheet unless the wrapping code that calls the stylesheet could
somehow dynamically modify the XML based on method calls.

I'm also unsure as to whether the whole transformation is appropriate
for BioJava (the Blast2HTMLHandler itself is on borderline territory -
saved only by the fact that it creates the reports based on SAX events
that can potentially come from non-BlastXML sources). BioJava is a Java
toolkit, and the transformation from XML to HTML via an XSL stylesheet
doesn't require Java at all.

Mark - if you're reading this - guidance, please?

cheers,
Richard

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGpJOs4C5LeMEKA/QRApF6AJ9kibj7mJ44W2/fTw/cYPHOx/O74gCfT3Zn
b90G56jji+Ro32fq/kuxbJA=
=X95V
-----END PGP SIGNATURE-----


From markjschreiber at gmail.com  Mon Jul 23 11:45:04 2007
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Mon, 23 Jul 2007 19:45:04 +0800
Subject: [Biojava-l] Blast XML + XSL = HTML
In-Reply-To: <46A493AD.3000107@ebi.ac.uk>
References: <46A48837.5070405@bsc.es> <46A493AD.3000107@ebi.ac.uk>
Message-ID: <93b45ca50707230445p12a059f1n9aedf2ab7b887df@mail.gmail.com>

Hi Richard / Dmitry

A good home for this might be the BioJava cookbook on the BioJava wiki
(www.biojava.org). Although it isn't strictly BioJava, people may find
it a useful example.

- Mark
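
As a rough idea of what such a cookbook example could look like, here is
a minimal sketch that applies an XSL stylesheet with the standard
javax.xml.transform API. It is not Dmitry's BlastXML2HTML class, which
was not posted. The XML snippet, the stylesheet and the cssClass
parameter are made up for illustration and do not follow the real BLAST
XML schema. Passing a stylesheet parameter is one way the calling code
could influence the generated HTML.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XslToHtmlSketch {

    /** Applies the stylesheet to the XML and returns the result as a string. */
    static String transform(String xml, String xsl, String cssClass)
            throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xsl)));
        // Overrides the default value of the (made-up) stylesheet parameter.
        transformer.setParameter("cssClass", cssClass);
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        // Toy input, not real BLAST XML.
        String xml = "<hits><hit id=\"seq1\" score=\"42\"/>"
                   + "<hit id=\"seq2\" score=\"17\"/></hits>";
        String xsl =
            "<xsl:stylesheet version=\"1.0\""
          + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
          + "<xsl:output method=\"html\" indent=\"no\"/>"
          + "<xsl:param name=\"cssClass\" select=\"'blast'\"/>"
          + "<xsl:template match=\"/hits\">"
          + "<div class=\"{$cssClass}\">"
          + "<xsl:for-each select=\"hit\">"
          + "<p><xsl:value-of select=\"@id\"/>: <xsl:value-of select=\"@score\"/></p>"
          + "</xsl:for-each>"
          + "</div>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";
        System.out.println(transform(xml, xsl, "report"));
    }
}
```

In real use the stylesheet and the report would be read from files or
streams instead of string literals.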


From dms700 at gmail.com  Wed Jul 25 20:10:11 2007
From: dms700 at gmail.com (dmitriy)
Date: Wed, 25 Jul 2007 16:10:11 -0400
Subject: [Biojava-l] Extracting 3' UTR, 5' UTR, exons, introns,
	CD sequence structure for NCBI NM RefSeq
Message-ID: <299614de0707251310p4d78c9d1p1360ebb3ee421ba6@mail.gmail.com>

Hi

Does anyone have code that takes an NCBI NM RefSeq number and an NCBI NC
RefSeq number, gets the NC RefSeq from NCBI, and parses it so that a
"gene table" object is built for the specified NM RefSeq? The "gene
table" object should hold information on the 3' UTR, 5' UTR, exons,
introns and CD sequence. The data in the "gene table" should be
sufficient, for example, to generate a sequence string with the 3' UTR,
5' UTR, introns and non-coding exons (or parts of exons) in lowercase
letters and the coding exons (or parts of exons) in capital letters.

Thanks
Dmitriy
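
To make the last requirement concrete, here is a minimal sketch of the
rendering step only. It assumes the coding ranges have already been
extracted into a "gene table"; fetching and parsing the RefSeq records
is not shown. The class CodingRange, the coordinates and the sequence
are hypothetical and made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical gene-table row: a 1-based, inclusive coding range. */
final class CodingRange {
    final int start;
    final int end;

    CodingRange(int start, int end) {
        this.start = start;
        this.end = end;
    }
}

final class GeneTableCaseSketch {

    /** Lowercases the whole sequence, then uppercases only the coding ranges. */
    static String render(String sequence, List<CodingRange> coding) {
        char[] out = sequence.toLowerCase(Locale.ROOT).toCharArray();
        for (CodingRange r : coding) {
            int from = Math.max(r.start - 1, 0);   // convert to 0-based index
            int to = Math.min(r.end, out.length);  // exclusive upper bound
            for (int i = from; i < to; i++) {
                out[i] = Character.toUpperCase(out[i]);
            }
        }
        return new String(out);
    }

    public static void main(String[] args) {
        // Made-up layout: 5' UTR 1-3, coding 4-9, intron 10-18,
        // coding 19-24, 3' UTR 25-26.
        String sequence = "ggcatgaaagtaagtcagccctaatt";
        List<CodingRange> coding =
            Arrays.asList(new CodingRange(4, 9), new CodingRange(19, 24));
        System.out.println(render(sequence, coding));
        // prints ggcATGAAAgtaagtcagCCCTAAtt
    }
}
```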