[Biojava-l] [Biojava-dev] Request for help!
Richard Holland
holland at ebi.ac.uk
Wed Jul 4 11:06:32 EDT 2007
The problem was that I was using the newline in a tokenizer, which
needed to return and recognize the newline symbols themselves (the
Nexus format is newline-sensitive). Hence I had to deal with files that
may not use the system line separator.
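
A minimal sketch of that kind of tokenizer, assuming the line-end symbols
should come back as tokens of their own. EolAwareTokenizer, tokenize and
isEol are hypothetical names for illustration; this is not the actual
BioJava Nexus tokenizer or the regex discussed on the list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only, not BioJava code.
class EolAwareTokenizer {
    // "\r\n" is tried first so a Windows line end is one token, not two.
    // The last alternative matches any run of non-line-end characters.
    private static final Pattern TOKEN =
            Pattern.compile("\\r\\n|\\r|\\n|[^\\r\\n]+");

    // Splits text into line-content tokens and line-end tokens, keeping both.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    // True if the token is one of the three common line terminators.
    static boolean isEol(String token) {
        return token.equals("\r\n") || token.equals("\r") || token.equals("\n");
    }
}
```

Because the pattern does not consult System.getProperty("line.separator"),
the result does not depend on the platform the parser runs on.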
cheers,
Richard
Andy Yates wrote:
> BufferedWriter will always use the value of
> System.getProperty("line.separator"). However, BufferedReader knows that
> an end of line can be \r\n, \r or \n, so in Java land it is perfectly legal
> to have any common line terminator and still write files in an OS-specific
> manner.
>
> I sent a regex to Rich which he improved on but the net result is the
> extraction of the EOL regardless of which one it is.
>
> I'm not 100% sure where the problem lies. So long as the parsers use
> BufferedReader for their text file reading (which they all seem to do)
> this shouldn't have been a problem. In fact this is the description of
> BufferedReader.readLine() in the JDK:
>
> "Reads a line of text. A line is considered to be terminated by any one
> of a line feed ('\n'), a carriage return ('\r'), or a carriage return
> followed immediately by a linefeed."
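
The readLine() behaviour quoted above can be checked with a short sketch.
ReadLineDemo and readLines are hypothetical names for illustration; only
BufferedReader, StringReader and UncheckedIOException are real JDK classes.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of BioJava.
class ReadLineDemo {
    // Reads every line from the text; readLine() strips the terminator,
    // whichever of "\r\n", "\r" or "\n" it happens to be.
    static List<String> readLines(String text) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            // Not expected for an in-memory reader; rethrown unchecked.
            throw new UncheckedIOException(e);
        }
        return lines;
    }
}
```

Note that readLine() discards the terminator, so a parser that needs the
line-end symbols themselves cannot rely on it alone.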
>
> Very strange, but the regex sounds like it was a pragmatic solution.
>
> Andy
>
> Mark Schreiber wrote:
>> BufferedWriter provides a newLine() method that writes a line
>> separator but I'm not sure if that gives you a different result or
>> not.
>>
>> This may be a JVM bug that needs to be submitted to Sun.
>>
>> As a very ugly workaround it is possible to determine the OS from the
>> System object as well (for example via System.getProperty("os.name")).
>>
>> - Mark
>>
>> On 7/4/07, Hilmar Lapp <hlapp at gmx.net> wrote:
>>> In Perl it is easy enough to regex-replace s/\r\n/\n/g and s/\r//g,
>>> though I'm not sure this wouldn't incur too much overhead in Java.
>>>
>>> You can certainly detect the eol character(s) by line.indexOf('\r');
>>> if found and the following character is '\n' you have DOS/Win-style
>>> line endings, and otherwise if found it is Mac-style.
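
A sketch of this kind of detection in Java, assuming the usual conventions
("\r\n" for DOS/Windows, a lone "\r" for classic Mac, a lone "\n" for Unix)
and a file that uses one style throughout. EolDetector and detect are
hypothetical names for illustration.

```java
// Hypothetical helper for illustration only, not BioJava code.
class EolDetector {
    // Returns the line terminator used in the text, or null if there is none.
    // Only the first carriage return is inspected, so mixed files are
    // reported by whatever style that first "\r" belongs to.
    static String detect(String text) {
        int cr = text.indexOf('\r');
        if (cr >= 0) {
            boolean followedByLf =
                    cr + 1 < text.length() && text.charAt(cr + 1) == '\n';
            return followedByLf ? "\r\n" : "\r";
        }
        return text.indexOf('\n') >= 0 ? "\n" : null;
    }
}
```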
>>>
>>> However, this all seems like a lot of trouble to go through if all
>>> that one would need to ask of people is to make sure that the file
>>> matches the native eol style of the platform, which is really trivial
>>> to achieve.
>>>
>>> For example, to convert Win-style line endings to Unix:
>>>
>>> $ perl -pi -e 's/\r//g;' <your-files-here>
>>>
>>> and from Mac to Unix:
>>>
>>> $ perl -pi -e 's/\r/\n/g;' <your-files-here>
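
For comparison, a rough Java equivalent of these two conversions, as a
sketch only. EolNormalizer and toUnix are hypothetical names; the code uses
plain String.replace rather than a regex, and its cost on large files has
not been measured here.

```java
// Hypothetical helper for illustration only, not BioJava code.
class EolNormalizer {
    // Converts Windows ("\r\n") and classic Mac ("\r") line ends to "\n".
    static String toUnix(String text) {
        // Handle "\r\n" first so it does not turn into two line feeds.
        return text.replace("\r\n", "\n").replace('\r', '\n');
    }
}
```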
>>>
>>> I have these and other simple conversions defined as aliases in
>>> my .profile, and don't really ever worry about writing lots of code
>>> to accommodate arbitrary line endings :-)
>>>
>>> -hilmar
>>>
>>> On Jul 4, 2007, at 4:06 AM, Richard Holland wrote:
>>>
> Hi guys.
>
> I need help with a programming question!
>
> In Java, you can find out the line-end symbol that the JRE is using by
> calling:
>
> System.getProperty("line.separator");
>
> On *nix this returns "\n", for instance.
>
> Our file parsers all rely on this to return the symbol to break lines at
> when parsing files. This usually works fine.
>
> BUT... on Windows machines, for certain files, it does not appear to
> work! I suspect that these text files were generated on a *nix machine
> then transferred by copying files across file systems using native copy
> commands, or using binary FTP, so that the system retained the *nix
> line-end symbols instead of replacing them with the local line-end
> symbols as it would have done if they were transferred in text mode via
> FTP.
>
> I don't have access to a Windows machine I can test on, but I suspect
> that the fix is quite a simple one and boils down to replacing the
> System.getProperty() call with something more intelligent.
>
> Is there any regex or similar thing we can use to spot _all_ kinds of
> line-end symbols in text files regardless of the platform the file was
> created on or the platform the parser is being run on?
>
> (For information, the only two users who have reported problems like
> this are both using Nexus files - I'm not sure what tool generated them
> though. The Nexus parser uses the same rules as all the other parsers in
> BioJava so I don't think there's anything specifically wrong with it as
> opposed to say the GenBank or FASTA parsers.)
>
> cheers,
> Richard
>
_______________________________________________
Biojava-l mailing list - Biojava-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biojava-l
>>> --
>>> ===========================================================
>>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
>>> ===========================================================
>> _______________________________________________
>> biojava-dev mailing list
>> biojava-dev at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-dev