[BioPython] Whitespace in sequences

Iddo Friedberg idoerg at pines2.ljcrf.edu
Tue Feb 18 03:39:45 EST 2003


Hi,

I guess you were using Biopython on a Mac/Windows box, where the newline
is '\r' (Mac) or '\r\n' (Windows). Also, it looks like you were using the
Bio.Fasta package to read... the bug shouldn't occur within
Bio.SeqIO.FASTA.FastaReader (although it will within
Bio.SeqIO.FASTA.FastaWriter!)

Basically, all occurrences of the Linux/Unix-centric '\n' should be
replaced with os.linesep, in all modules.
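
In the meantime, a workaround along these lines might help. This is only
a rough sketch in plain Python, not the Biopython code: the names
clean_sequence and parse_fasta are made up for illustration. It assumes
the whole file has already been read into a string, and it relies on
str.splitlines(), which treats '\n', '\r' and '\r\n' alike, so it does
not depend on which platform wrote the file.

```python
import re

# Matches any run of whitespace: spaces, tabs, carriage returns, line feeds.
_WHITESPACE = re.compile(r"\s+")


def clean_sequence(raw):
    """Return raw with every whitespace character removed."""
    return _WHITESPACE.sub("", raw)


def parse_fasta(text):
    """Return a list of (title, sequence) pairs from FASTA-formatted text."""
    records = []
    title, chunks = None, []
    # splitlines() breaks on Unix, classic Mac and Windows line endings alike.
    for line in text.splitlines():
        if line.startswith(">"):
            if title is not None:
                records.append((title, clean_sequence("".join(chunks))))
            title, chunks = line[1:].strip(), []
        elif title is not None:
            chunks.append(line)
    if title is not None:
        records.append((title, clean_sequence("".join(chunks))))
    return records


# A Windows-style record, as it might look when read without translation.
dos_text = ">seq1 test\r\nACGT\r\nAC GT\r\n"

# Splitting on '\n' alone leaves the carriage returns in the sequence.
naive = "".join(dos_text.split("\n")[1:])
print(len(naive))             # 11 instead of the expected 8
print(parse_fasta(dos_text))  # [('seq1 test', 'ACGTACGT')]
```

The regular expression does the same job as the "simple re" mentioned in
the original report; the only extra step is splitting lines in a way
that does not assume '\n'.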

(a few minutes later)

Hmmm... sorry, but I can't seem to commit the bugfix, probably something
to do with snow in Boston, or a Hackathon in Singapore. Take your pick. :)

I'll recheck this in the morning (Pacific time).

Best,

Iddo

--
Iddo Friedberg, Ph.D.
The Burnham Institute
10901 N. Torrey Pines Rd.
La Jolla, CA 92037
USA
Tel: +1 (858) 646 3100 x3516
Fax: +1 (858) 646 3171
http://bioinformatics.ljcrf.edu/~iddo

On Tue, 18 Feb 2003, Paul-Michael Agapow wrote:

> 
> Possibly a known bug or even a behaviour that makes sense but ...
> 
> While recently writing a biopython script to extract subsequences from 
> a fasta file, I was surprised to find that whitespace was retained 
> within the sequence after it was read into a SeqRecord. Specifically, 
> carriage returns ('\r') were left embedded in the sequence, which then 
> made the sequence lengths inaccurate and meant I extracted the wrong 
> regions.
> 
> So, any ideas about this behaviour? I solved it with a simple re to 
> remove whitespace, but I can't think of any format in which whitespace 
> is significant within a sequence, so surely it should all be cleaned up.
> 
> --
> Dr Paul-Michael Agapow (p.agapow at ucl.ac.uk)
> Dept. Biology, University College London
> 
> _______________________________________________
> BioPython mailing list  -  BioPython at biopython.org
> http://biopython.org/mailman/listinfo/biopython
> 



