[Biojava-l] [Biojava-dev] Request for help!
Mark Schreiber
markjschreiber at gmail.com
Wed Jul 4 21:29:35 EDT 2007
Slightly related to this ...
It might be worth making a quick check of the biojava code base to see
how often a "\n" appears in the source code.
- Mark
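A rough way to do that check is sketched below. It is a hypothetical helper, not part of BioJava. It walks a directory tree, reads each `.java` file as UTF-8 (an assumption), and counts occurrences of the two-character escape `\n`. It is deliberately crude: matches inside comments, or an escaped backslash followed by `n`, are counted as well.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

/** Hypothetical helper: counts the escape sequence \n in .java sources. */
class NewlineLiteralScan {

    /** Counts occurrences of a backslash immediately followed by 'n'. */
    static int countNewlineEscapes(String source) {
        int count = 0;
        for (int i = source.indexOf("\\n"); i >= 0; i = source.indexOf("\\n", i + 2)) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Root of the source tree to scan; defaults to the current directory.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        try (Stream<Path> files = Files.walk(root)) {
            files.filter(p -> p.toString().endsWith(".java")).forEach(p -> {
                try {
                    String text = new String(Files.readAllBytes(p), StandardCharsets.UTF_8);
                    int n = countNewlineEscapes(text);
                    if (n > 0) {
                        System.out.println(n + "\t" + p); // count, then file path
                    }
                } catch (IOException e) {
                    System.err.println("skipped " + p + ": " + e.getMessage());
                }
            });
        }
    }
}
```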
On 7/4/07, Richard Holland <holland at ebi.ac.uk> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> The problem was that I was using the newline in a tokenizer, which
> needed to return and recognize the newline symbols themselves (the
> Nexus format is newline-sensitive). Hence I had to deal with files that
> may not use the system line separator.
>
> cheers,
> Richard
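To illustrate the tokenizer problem Richard describes, here is a minimal sketch of a tokenizer that emits a single newline token for any of `\r\n`, `\r` or `\n`. The class name and the choice of `"\n"` as the canonical token are hypothetical; this is not the actual BioJava Nexus tokenizer.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of an EOL-aware tokenizer (not BioJava's code). */
class EolAwareTokenizer {
    /** Canonical token returned for every kind of line terminator. */
    static final String NEWLINE = "\n";

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\r' || c == '\n') {
                flush(word, tokens);
                // Treat "\r\n" as one terminator by skipping the '\n' after a '\r'.
                if (c == '\r' && i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    i++;
                }
                tokens.add(NEWLINE);
            } else if (Character.isWhitespace(c)) {
                flush(word, tokens); // other whitespace only separates words
            } else {
                word.append(c);
            }
        }
        flush(word, tokens);
        return tokens;
    }

    /** Emits the pending word, if any, and clears the buffer. */
    private static void flush(StringBuilder word, List<String> tokens) {
        if (word.length() > 0) {
            tokens.add(word.toString());
            word.setLength(0);
        }
    }
}
```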
>
> Andy Yates wrote:
> > BufferedWriter.newLine() will always write the value of
> > System.getProperty("line.separator"), however BufferedReader knows that
> > an end of line can be \r\n, \r or \n, so in Java land it is perfectly
> > legal to read files with any common line terminator and still write
> > files in an OS-specific manner.
> >
> > I sent a regex to Rich which he improved on but the net result is the
> > extraction of the EOL regardless of which one it is.
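The regex Andy and Rich settled on is not quoted in the thread, so the one below is an assumption. It shows one way to extract whichever terminator a text uses first; the class name is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: reports the first line terminator in a text. */
class EolDetector {
    // Alternatives are tried left to right, so "\r\n" wins over a bare "\r".
    private static final Pattern EOL = Pattern.compile("\\r\\n|\\r|\\n");

    /** Returns the first terminator found ("\r\n", "\r" or "\n"), or null if none. */
    static String firstEol(String text) {
        Matcher m = EOL.matcher(text);
        return m.find() ? m.group() : null;
    }
}
```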
> >
> > I'm not 100% sure where the problem lies. So long as the parsers use
> > BufferedReader for their text file reading (which they all seem to do)
> > this shouldn't have been a problem. In fact, this is the Javadoc for
> > BufferedReader.readLine() in the JDK:
> >
> > "Read a line of text. A line is considered to be terminated by any one
> > of a line feed ('\n'), a carriage return ('\r'), or a carriage return
> > followed immediately by a linefeed."
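The documented behaviour is easy to confirm. This small sketch (hypothetical class name) feeds `BufferedReader.readLine()` a string mixing all three terminators; each one ends a line and none of them appears in the returned text.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class ReadLineDemo {
    /** Collects the lines that BufferedReader.readLine() returns for the given text. */
    static List<String> readLines(String text) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line); // terminator already stripped
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Unix, Windows and classic Mac terminators mixed in one string.
        System.out.println(readLines("unix\nwindows\r\nmac\rlast")); // [unix, windows, mac, last]
    }
}
```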
> >
> > Very, very strange, but the regex sounds like a pragmatic solution.
> >
> > Andy
> >
> > Mark Schreiber wrote:
> >> BufferedWriter provides a newLine() method that writes a line
> >> separator but I'm not sure if that gives you a different result or
> >> not.
> >>
> >> This may be a JVM bug that needs to be submitted to Sun.
> >>
> >> As a very ugly workaround, it is possible to determine the OS from the
> >> System object as well.
> >>
> >> - Mark
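For comparison, the sketch below (hypothetical class name) writes through `BufferedWriter.newLine()`, which emits the platform line separator, and also prints the `os.name` system property that the "ugly workaround" would rely on. Writing a literal `"\n"` with `write()` instead would always produce a bare line feed, whatever the platform.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;

class NewLineDemo {
    /** Writes a, a platform line separator via newLine(), then b. */
    static String writeWithNewLine(String a, String b) {
        StringWriter out = new StringWriter();
        try (BufferedWriter w = new BufferedWriter(out)) {
            w.write(a);
            w.newLine(); // platform separator, not necessarily "\n"
            w.write(b);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String s = writeWithNewLine("a", "b");
        System.out.println(s.equals("a" + System.lineSeparator() + "b")); // true
        // The workaround Mark mentions: inspect the OS name directly.
        System.out.println(System.getProperty("os.name"));
    }
}
```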
> >>
> >> On 7/4/07, Hilmar Lapp <hlapp at gmx.net> wrote:
> >>> In Perl it is easy enough to regex-replace s/\r\n/\n/g and s/\r/\n/g,
> >>> though I'm not sure this wouldn't incur too much overhead in Java.
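The same normalization can be done in Java in a single linear pass over the string. This is a minimal sketch with a hypothetical class name: it converts Windows `\r\n` and classic Mac `\r` to Unix `\n`.

```java
class EolNormalizer {
    /** Converts "\r\n" (Windows) and lone "\r" (classic Mac) to "\n". */
    static String toUnix(String text) {
        // "\r\n?" matches a CR plus an optional following LF, so CRLF is consumed whole.
        return text.replaceAll("\\r\\n?", "\n");
    }
}
```

If regex overhead is a concern, `text.replace("\r\n", "\n").replace('\r', '\n')` gives the same result using plain string replacement.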
> >>>
> >>> You can certainly detect the eol character(s) by line.indexOf('\r');
> >>> if found and the following character is '\n' you have DOS/Win-style
> >>> line endings, and otherwise if found it is Mac-style.
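The character-based check can be sketched as follows, assuming it runs on a raw text buffer (lines returned by `readLine()` have already lost their terminators). The class name and the returned labels are hypothetical. Note that in a Windows file the `'\n'` comes *after* the `'\r'`.

```java
class EolStyle {
    /** Guesses the EOL style of a raw buffer from its first terminator. */
    static String guess(String text) {
        int cr = text.indexOf('\r');
        int lf = text.indexOf('\n');
        if (cr < 0 && lf < 0) {
            return "none";
        }
        // The first terminator is a bare '\n': Unix style.
        if (cr < 0 || (lf >= 0 && lf < cr)) {
            return "unix";
        }
        // The first terminator is '\r': Windows if '\n' follows it, otherwise classic Mac.
        return (cr + 1 < text.length() && text.charAt(cr + 1) == '\n') ? "windows" : "mac";
    }
}
```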
> >>>
> >>> However, this all seems like a lot of trouble to go through if all
> >>> that one would need to ask of people is to make sure that the file
> >>> matches the native eol style of the platform, which is really trivial
> >>> to achieve.
> >>>
> >>> For example, to convert Win-style line endings to Unix:
> >>>
> >>> $ perl -pi -e 's/\r//g;' <your-files-here>
> >>>
> >>> and from Mac to Unix:
> >>>
> >>> $ perl -pi -e 's/\r/\n/g;' <your-files-here>
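Where Perl is not available, a rough Java counterpart to these one-liners is sketched below; it is a hypothetical utility, not part of BioJava. It reads each file as UTF-8 (an assumption; other encodings would need a different charset), splits lines with `BufferedReader.readLine()` so any of `\r\n`, `\r` or `\n` is accepted, and rewrites the file using the platform separator. Unlike the Perl commands, it always ends the file with a terminator.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class ToNativeEol {
    /** Rewrites every line terminator (\r\n, \r or \n) as System.lineSeparator(). */
    static String convert(String text) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = in.readLine()) != null) {
                sb.append(line).append(System.lineSeparator());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Converts each file named on the command line in place.
        for (String name : args) {
            Path p = Paths.get(name);
            String text = new String(Files.readAllBytes(p), StandardCharsets.UTF_8);
            Files.write(p, convert(text).getBytes(StandardCharsets.UTF_8));
        }
    }
}
```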
> >>>
> >>> I have these and other simple conversions defined as aliases in
> >>> my .profile, and don't really ever worry about writing lots of code
> >>> to accommodate arbitrary line endings :-)
> >>>
> >>> -hilmar
> >>>
> >>> On Jul 4, 2007, at 4:06 AM, Richard Holland wrote:
> >>>
> > Hi guys.
> >
> > I need help with a programming question!
> >
> > In Java, you can find out the line-end symbol that the JRE is using by
> > calling:
> >
> > System.getProperty("line.separator");
> >
> > On *nix this returns "\n", for instance.
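A quick way to see the value on a given machine is sketched below (hypothetical class name). The separator is escaped before printing so the control characters are visible. Since Java 7, `System.lineSeparator()` normally returns the same value.

```java
class ShowLineSeparator {
    /** Makes \r and \n visible so the separator can be printed readably. */
    static String escape(String s) {
        return s.replace("\r", "\\r").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        String sep = System.getProperty("line.separator");
        System.out.println(escape(sep)); // \n on Unix-like systems, \r\n on Windows
    }
}
```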
> >
> > Our file parsers all rely on this to return the symbol to break lines at
> > when parsing files. This usually works fine.
> >
> > BUT... on Windows machines, for certain files, it does not appear to
> > work! I suspect that these text files were generated on a *nix machine
> > then transferred by copying files across file systems using native copy
> > commands, or using binary FTP, so that the system retained the *nix
> > line-end symbols instead of replacing them with the local line-end
> > symbols as it would have done if they were transferred in text mode via
> > FTP.
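The suspected failure can be reproduced with a simplified model of "split on the platform separator" (hypothetical class name; this is not BioJava's parser code). On Windows the separator is `"\r\n"`, which never occurs in a file with Unix line endings, so the whole file comes back as one line.

```java
import java.util.regex.Pattern;

class SeparatorMismatch {
    /** Splits text on a fixed separator, matched literally. */
    static String[] splitOn(String text, String separator) {
        // Pattern.quote makes split() treat the separator as plain text, not a regex.
        return text.split(Pattern.quote(separator));
    }

    public static void main(String[] args) {
        String unixFile = ">seq1\nACGT\n";
        // Simulate running on Windows, where line.separator is "\r\n".
        System.out.println(splitOn(unixFile, "\r\n").length); // 1: no split happened
        System.out.println(splitOn(unixFile, "\n").length);   // 2
    }
}
```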
> >
> > I don't have access to a Windows machine I can test on, but I suspect
> > that the fix is quite a simple one and boils down to replacing the
> > System() call with something more intelligent.
> >
> > Is there any regex or similar thing we can use to spot _all_ kinds of
> > line-end symbols in text files regardless of the platform the file was
> > created on or the platform the parser is being run on?
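One answer to the question is to split on an alternation of all three common terminators, trying `\r\n` first. A minimal sketch with a hypothetical class name:

```java
class SplitAnyEol {
    /** Splits text into lines on \r\n, \r or \n, regardless of platform. */
    static String[] lines(String text) {
        // "\r\n" is listed first so a Windows terminator is consumed whole,
        // rather than producing an empty line between '\r' and '\n'.
        return text.split("\\r\\n|\\r|\\n");
    }
}
```

On Java 8 and later, the `\R` linebreak matcher in `java.util.regex.Pattern` covers these three and a few rarer Unicode line breaks as well.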
> >
> > (For information, the only two users who have reported problems like
> > this are both using Nexus files; I'm not sure what tool generated them,
> > though. The Nexus parser uses the same rules as all the other parsers in
> > BioJava, so I don't think there's anything specifically wrong with it as
> > opposed to, say, the GenBank or FASTA parsers.)
> >
> > cheers,
> > Richard
> >
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
> >>> --
> >>> ===========================================================
> >>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> >>> ===========================================================
> >> _______________________________________________
> >> biojava-dev mailing list
> >> biojava-dev at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.2.2 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>
> iD8DBQFGi7d34C5LeMEKA/QRAktwAKCJM43x9MlBZx2expYYAiVy8NCFKwCbBkYp
> ctRVPlj5VA0oDzMsoxP4Ohs=
> =6wg0
> -----END PGP SIGNATURE-----
>