[Biojava-l] Request for help!

Richard Holland holland at ebi.ac.uk
Wed Jul 4 04:06:19 EDT 2007



Hi guys.

I need help with a programming question!

In Java, you can find out the line-end symbol that the JRE is using by
calling:

   System.getProperty("line.separator");

On *nix this returns "\n", for instance.

Our file parsers all rely on this call to tell them which symbol to
break lines at when parsing files. This usually works fine.

BUT... on Windows machines, for certain files, it does not appear to
work! I suspect that these text files were generated on a *nix machine
and then transferred either by copying them across file systems with
native copy commands or by binary-mode FTP. Either way the files would
keep their *nix line-end symbols, instead of having them replaced with
the local line-end symbols as would have happened in a text-mode FTP
transfer.
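
To illustrate the suspected failure, here is a minimal sketch. The class
name SeparatorMismatch, the helper splitOn and the sample text are all
made up for illustration; this is not the actual BioJava parser code. It
splits *nix-style text on a fixed separator, the way a parser relying on
line.separator would, first with the Windows value and then with the
*nix value.

```java
import java.util.regex.Pattern;

// Hypothetical illustration only; not taken from the BioJava parsers.
class SeparatorMismatch {

    // Split text on one fixed separator string, treated literally.
    static String[] splitOn(String text, String separator) {
        return text.split(Pattern.quote(separator), -1);
    }

    public static void main(String[] args) {
        // Made-up sample: three lines with *nix line ends.
        String unixText = "line one\nline two\nline three";

        // "\r\n" is what line.separator holds on Windows.
        System.out.println(splitOn(unixText, "\r\n").length); // prints 1
        // "\n" is what line.separator holds on *nix.
        System.out.println(splitOn(unixText, "\n").length);   // prints 3
    }
}
```

With the Windows separator the whole file comes back as a single "line",
which would be consistent with the reported symptoms.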

I don't have access to a Windows machine I can test on, but I suspect
that the fix is quite a simple one and boils down to replacing the
System.getProperty() call with something more intelligent.

Is there any regex or similar thing we can use to spot _all_ kinds of
line-end symbols in text files regardless of the platform the file was
created on or the platform the parser is being run on?
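
One possible shape for such a check, as a sketch only: the class name
AnyLineEnd and its two helper methods are hypothetical and not part of
BioJava. The first helper splits on a regex that accepts "\r\n", "\r" or
"\n". The second uses the standard BufferedReader.readLine(), which is
documented to treat any of those three as a line terminator. Whether
either drops cleanly into the existing parsers is an assumption that
would need testing.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper class; names are for illustration only.
class AnyLineEnd {

    // Windows (\r\n), old Mac (\r) and *nix (\n) line ends.
    // "\r\n" comes first so that it is consumed as a single break.
    static final Pattern LINE_END = Pattern.compile("\\r\\n|\\r|\\n");

    // Regex approach: split the text on any of the three line ends.
    static String[] splitLines(String text) {
        return LINE_END.split(text);
    }

    // Reader approach: let readLine() recognise the line ends itself.
    static List<String> readLines(String text) {
        List<String> lines = new ArrayList<String>();
        try {
            BufferedReader reader = new BufferedReader(new StringReader(text));
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            // Not expected when reading from an in-memory string.
            throw new IllegalStateException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Made-up sample mixing all three line-end styles.
        String mixed = "a\r\nb\rc\nd";
        System.out.println(splitLines(mixed).length); // prints 4
        System.out.println(readLines(mixed));         // prints [a, b, c, d]
    }
}
```

Neither approach consults line.separator, so the result does not depend
on the platform the parser is running on.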

(For information, the only two users who have reported problems like
this are both using Nexus files - I'm not sure what tool generated them
though. The Nexus parser uses the same rules as all the other parsers in
BioJava so I don't think there's anything specifically wrong with it as
opposed to say the GenBank or FASTA parsers.)

cheers,
Richard


