[Biojava-l] Request for help!
Richard Holland
holland at ebi.ac.uk
Wed Jul  4 15:04:41 UTC 2007

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Thanks everyone for your replies. It turns out that a regex matching the
various combinations of \r and \n is the best way.

cheers,
Richard
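
For readers of the archive, a minimal sketch of what such a regex might look
like in Java. The class name LineSplitter, the method splitLines and the
sample text are made up for illustration; this is not the code that went
into BioJava.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of BioJava.
class LineSplitter {
    // Matches Windows (\r\n), classic Mac (\r) and Unix (\n) line ends.
    // "\r\n" comes first so that a CRLF pair counts as one terminator.
    private static final Pattern EOL = Pattern.compile("\r\n|\r|\n");

    // Splits text into lines whatever platform produced it.
    // Pattern.split drops trailing empty strings, so a final line end
    // does not produce an extra empty line.
    static String[] splitLines(String text) {
        return EOL.split(text);
    }

    public static void main(String[] args) {
        // Made-up sample that mixes all three line-end styles.
        String mixed = "#NEXUS\r\nbegin taxa;\rdimensions ntax=2;\nend;";
        for (String line : splitLines(mixed)) {
            System.out.println("[" + line + "]");
        }
    }
}
```

On Java 8 and later the predefined linebreak matcher \R could probably
replace the hand-written alternation, though it also matches a few other
Unicode line terminators.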
Mark Schreiber wrote:
> BufferedWriter provides a newLine() method that writes a line
> separator but I'm not sure if that gives you a different result or
> not.
> 
> This may be a JVM bug that needs to be submitted to Sun.
> 
> As a very ugly workaround, it is also possible to determine the OS from
> the System object (for example, via the "os.name" system property).
> 
> - Mark
> 
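
On the newLine() question: per the Java API documentation,
BufferedWriter.newLine() writes the string held in the line.separator system
property, so it should not give a different result. A small sketch to check
this on a given JVM follows; the class and method names are invented for
illustration.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;

// Hypothetical check, not part of BioJava.
class NewLineCheck {
    // Writes text followed by newLine() and returns exactly what was written.
    static String writeOneLine(String text) {
        StringWriter sink = new StringWriter();
        try (BufferedWriter out = new BufferedWriter(sink)) {
            out.write(text);
            out.newLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toString();
    }

    public static void main(String[] args) {
        String written = writeOneLine("abc");
        // Compares the written terminator with the platform line separator.
        System.out.println(written.equals("abc" + System.lineSeparator()));
        // The OS name Mark mentions is available as a system property.
        System.out.println(System.getProperty("os.name"));
    }
}
```

Either way, this only describes the platform the JVM runs on. It says
nothing about the line ends inside a file that came from elsewhere.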
> On 7/4/07, Hilmar Lapp <hlapp at gmx.net> wrote:
>> In Perl it is easy enough to regex-replace s/\r\n/\n/g and s/\r/\n/g
>> though I'm not sure this wouldn't incur too much overhead in Java.
>>
>> You can certainly detect the eol character(s) by line.indexOf('\r');
>> if found and the following character is '\n' you have DOS/Win-style
>> line endings, and otherwise if found it is Mac-style.
>>
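
A rough Java sketch of that indexOf-based detection, treating a '\r'
immediately followed by '\n' as DOS/Win-style and a lone '\r' as classic
Mac-style. The names EolDetector, Eol and detect are hypothetical. It only
looks at the first '\r', so a file with mixed line ends would be classified
by whichever style appears first.

```java
// Hypothetical helper, not part of BioJava.
class EolDetector {
    enum Eol { WINDOWS, CLASSIC_MAC, UNIX, NONE }

    // Guesses the line-end style from the first terminator found in text.
    static Eol detect(String text) {
        int cr = text.indexOf('\r');
        if (cr >= 0) {
            // CR followed by LF is DOS/Windows; a lone CR is classic Mac.
            boolean lfFollows = cr + 1 < text.length()
                    && text.charAt(cr + 1) == '\n';
            return lfFollows ? Eol.WINDOWS : Eol.CLASSIC_MAC;
        }
        // No CR at all: either Unix line ends or no line end in the text.
        return text.indexOf('\n') >= 0 ? Eol.UNIX : Eol.NONE;
    }

    public static void main(String[] args) {
        System.out.println(detect("a\r\nb"));
        System.out.println(detect("a\rb"));
        System.out.println(detect("a\nb"));
    }
}
```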
>> However, this all seems like a lot of trouble to go through if all
>> that one would need to ask of people is to make sure that the file
>> matches the native eol style of the platform, which is really trivial
>> to achieve.
>>
>> For example, to convert Win-style line endings to Unix:
>>
>>         $ perl -pi -e 's/\r//g;' <your-files-here>
>>
>> and from Mac to Unix:
>>
>>         $ perl -pi -e 's/\r/\n/g;' <your-files-here>
>>
>> I have these and other simple conversions defined as aliases in
>> my .profile, and don't really ever worry about writing lots of code
>> to accommodate arbitrary line endings :-)
>>
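
For comparison, the same kind of conversion can be sketched in Java with a
single replaceAll over text already held in memory. EolNormalizer and toUnix
are invented names, and whether the overhead matters for large files is not
something this sketch measures.

```java
// Hypothetical helper, not part of BioJava.
class EolNormalizer {
    // Converts Windows (\r\n) and classic Mac (\r) line ends to Unix (\n).
    static String toUnix(String text) {
        // The regex is a CR optionally followed by an LF, so a CRLF pair
        // is replaced by one LF rather than two.
        return text.replaceAll("\r\n?", "\n");
    }

    public static void main(String[] args) {
        String mixed = "a\r\nb\rc\nd";
        // Escape the LF so the result is visible on a single output line.
        System.out.println(toUnix(mixed).replace("\n", "\\n"));
    }
}
```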
>> -hilmar
>>
>> On Jul 4, 2007, at 4:06 AM, Richard Holland wrote:
>>
> Hi guys.
> 
> I need help with a programming question!
> 
> In Java, you can find out the line-end symbol that the JRE is using by
> calling:
> 
>    System.getProperty("line.separator");
> 
> On *nix this returns "\n", for instance.
> 
> Our file parsers all rely on this to return the symbol at which to break
> lines when parsing files. This usually works fine.
> 
> BUT... on Windows machines, for certain files, it does not appear to
> work! I suspect that these text files were generated on a *nix machine
> and then transferred either by copying them across file systems with
> native copy commands or by binary FTP, so that they retained the *nix
> line-end symbols instead of having them replaced with the local line-end
> symbols, as would have happened in a text-mode FTP transfer.
> 
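
The suspected failure can be reproduced without a Windows machine by
hard-coding "\r\n", which is what line.separator holds on Windows. The
sketch below assumes a parser that splits its input on the platform
separator. SeparatorMismatch and countPieces are made-up names, not BioJava
code.

```java
import java.util.regex.Pattern;

// Hypothetical demonstration, not part of BioJava.
class SeparatorMismatch {
    // Counts the pieces obtained by splitting text on a literal separator.
    static int countPieces(String text, String separator) {
        // Pattern.quote treats the separator as a literal string;
        // limit -1 keeps trailing empty pieces.
        return text.split(Pattern.quote(separator), -1).length;
    }

    public static void main(String[] args) {
        // A file written on *nix and copied to Windows unchanged.
        String unixFile = "line1\nline2\nline3";
        // On Windows, System.getProperty("line.separator") is "\r\n".
        System.out.println(countPieces(unixFile, "\r\n"));
        // Splitting on the separator the file actually uses.
        System.out.println(countPieces(unixFile, "\n"));
    }
}
```

With the Windows separator the whole file comes back as a single piece,
which would explain a parser that appears to see only one very long line.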
> I don't have access to a Windows machine I can test on, but I suspect
> that the fix is quite a simple one and boils down to replacing the
> System.getProperty() call with something more intelligent.
> 
> Is there any regex or similar thing we can use to spot _all_ kinds of
> line-end symbols in text files regardless of the platform the file was
> created on or the platform the parser is being run on?
> 
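
One option that needs no regex at all, sketched here under the assumption
that the parser can consume its input one line at a time:
BufferedReader.readLine() is documented to treat '\n', '\r' and "\r\n" alike
as line terminators, regardless of the platform it runs on.
PortableLineReader and readLines are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of BioJava.
class PortableLineReader {
    // Reads all lines from text, accepting \n, \r and \r\n as terminators.
    static List<String> readLines(String text) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new StringReader(text))) {
            String line;
            // readLine() strips the terminator and returns null at the end.
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(readLines("a\r\nb\rc\nd"));
    }
}
```

For a real file the StringReader would be swapped for a FileReader or an
InputStreamReader. The trade-off is that the original terminators are
discarded, which matters only if the parser needs to preserve them.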
> (For information, the only two users who have reported problems like
> this are both using Nexus files - I'm not sure what tool generated them
> though. The Nexus parser uses the same rules as all the other parsers in
> BioJava so I don't think there's anything specifically wrong with it as
> opposed to say the GenBank or FASTA parsers.)
> 
> cheers,
> Richard
> 
_______________________________________________
Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biojava-l
>>
>> -- 
>> ===========================================================
>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>> ===========================================================
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFGi7cJ4C5LeMEKA/QRAumDAKCJ5yc8PoZ+sLhcBOkL2Jdp/unW+gCfZrxG
AoVCPngmYX3b/pxfiGJbzic=
=2cyA
-----END PGP SIGNATURE-----
    
    