[Bioperl-l] Linefeed mix-up (bug #903)

Hilmar Lapp lapp@gnf.org
Wed, 21 Feb 2001 10:44:18 -0800


I think a similar problem has been reported before, by Todd or by
someone else on a Mac trying to parse a file created on a non-Mac system.

You are obviously asking for trouble if you parse a file from a foreign
OS whose line-ending convention differs from that of your native system,
although I have usually had only minor problems when parsing an MSWin
file on Unix. The risk is highest if you set $/ to a value containing
newlines written as "\n". Of the format parsers, I believe only fasta.pm
presently does this (maybe gcg.pm does so, too); you may wish to test
whether the fasta parser runs into trouble when you feed it a fasta file
with MSWin line endings on Unix.
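
A quick way to see the effect without Bioperl at all is the sketch
below. It assumes a record separator of "\012>" (LF followed by ">"),
which is my assumption about what a fasta-style parser would use, and
count_records() is a hypothetical helper written only for this
illustration. It reads the same two-record data with Unix, MSWin and
Mac line endings from in-memory filehandles.

```perl
use strict;
use warnings;

# Hypothetical helper: count the records Perl returns when $/ is set to $sep.
sub count_records {
    my ($data, $sep) = @_;
    open(my $fh, '<', \$data) or die "cannot open in-memory file: $!";
    binmode $fh;                      # read raw bytes, no newline translation
    local $/ = $sep;                  # $/ is a literal string, not a pattern
    my $n = 0;
    while (defined(my $record = <$fh>)) {
        $n++;
    }
    close $fh;
    return $n;
}

# The same two fasta-like entries with three line-ending conventions.
my $unix = ">a\012ACGT\012>b\012TTTT\012";                  # LF
my $dos  = ">a\015\012ACGT\015\012>b\015\012TTTT\015\012";  # CR LF
my $mac  = ">a\015ACGT\015>b\015TTTT\015";                  # CR only

printf "%-5s %d\n", 'unix', count_records($unix, "\012>");
printf "%-5s %d\n", 'dos',  count_records($dos,  "\012>");
printf "%-5s %d\n", 'mac',  count_records($mac,  "\012>");
```

The CR LF data still splits into two records because it contains LF,
although each line keeps a stray \015 at its end. The CR-only data
contains no LF at all, so it comes back as a single record.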

A solution could be to add the expression for $/ as an argument to
SeqIO::_readline() (which presently takes no arguments). $/ would then
be set on each _readline() call (so you could even switch between
calls). In addition, _readline() would replace every occurrence of
"\n" in the passed expression by a more general expression, possibly
made up of explicit raw values (hex, octal) catching newline from the
most common systems (e.g. (\012|\015\012|\015)).
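
A rough sketch of the idea follows. Since $/ itself is a literal
string and cannot hold a pattern, this version takes a different route
from setting $/ per call: it slurps the input, maps all three newline
forms to \012, and then splits on the separator. normalize_newlines()
and read_records() are hypothetical names and not the existing
SeqIO::_readline(); take it as an illustration under those assumptions
only.

```perl
use strict;
use warnings;

# Hypothetical helper: map CR LF, lone CR and lone LF to a single \012.
# CR LF is listed first so that it is not turned into two newlines.
sub normalize_newlines {
    my ($text) = @_;
    $text =~ s/\015\012|\015|\012/\012/g;
    return $text;
}

# Hypothetical stand-in for a reader that takes the separator as an argument.
# It slurps the handle, normalizes newlines, then splits on the literal separator.
sub read_records {
    my ($fh, $sep) = @_;
    binmode $fh;                           # read raw bytes
    my $data = do { local $/; <$fh> };     # undef $/ means slurp everything
    return () unless defined $data && length $data;
    $sep = normalize_newlines($sep);       # make "\n" in the separator explicit
    return split /\Q$sep\E/, normalize_newlines($data);
}

# One input that mixes all three line-ending conventions.
my $mixed = ">a\015\012ACGT\015\012>b\015TTTT\015>c\012GG\012";
open(my $fh, '<', \$mixed) or die "cannot open in-memory file: $!";
my @records = read_records($fh, "\n>");
close $fh;

print scalar(@records), " records\n";
```

Slurping is the simplest way to show the normalization, but it gives up
reading record by record, so a real _readline() would more likely have
to buffer and normalize chunk by chunk.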

What do you think? Is this situation, where a file comes from an OS
other than the native one, common enough in day-to-day use to matter?

	Hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp@gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------