[BioPython] Whitespace in sequences

Paul-Michael Agapow biopython at agapow.net
Tue Feb 18 10:45:50 EST 2003


This is possibly a known bug, or even a behaviour that makes sense, but ...

While recently writing a biopython script to extract subsequences from 
a fasta file, I was surprised to find that whitespace was retained 
within the sequence after it was read into a SeqRecord. Specifically, 
carriage returns ('\r') were left embedded in the sequence, which then 
made the sequence lengths inaccurate and meant I extracted the wrong 
regions.
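
To illustrate the symptom, here is a minimal sketch in plain Python. It does not use the Biopython parser; the file contents and the naive line-joining are assumptions made purely for illustration, standing in for any reader that splits on '\n' and leaves the '\r' behind.

```python
# Hypothetical data: a FASTA record with Windows-style (CRLF) line endings.
raw = ">seq1 example\r\nACGTAC\r\nGGTTAA\r\n"

# Naive reader (an assumption, not Biopython's code): split on '\n' only
# and join every line after the header without stripping it.
lines = raw.split("\n")
sequence = "".join(lines[1:])

print(repr(sequence))       # 'ACGTAC\rGGTTAA\r'
print(len(sequence))        # 14, although there are only 12 residues
print(repr(sequence[4:8]))  # 'AC\rG', where 'ACGG' was wanted
```

Every embedded '\r' shifts all later coordinates by one, which is why the extracted regions come out wrong.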

So, any ideas about this behaviour? I solved it with a simple regular 
expression that removes whitespace, but I can't think of any format in 
which whitespace is significant within a sequence, so surely it should 
all be stripped when the file is read.
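
A workaround of the kind described above might look like the following sketch. It operates on a plain string, so it assumes the sequence data has already been pulled out of the record; the helper name strip_whitespace is hypothetical and not part of Biopython.

```python
import re

# Matches any run of whitespace: spaces, tabs, '\n', '\r' and so on.
_WHITESPACE = re.compile(r"\s+")

def strip_whitespace(sequence):
    """Return the sequence string with all whitespace removed.

    Hypothetical helper for illustration; not a Biopython function.
    """
    return _WHITESPACE.sub("", sequence)

cleaned = strip_whitespace("ACGTAC\rGGTTAA\r\n")
print(cleaned)       # ACGTACGGTTAA
print(len(cleaned))  # 12
```

Cleaning the string before computing lengths or slicing keeps the coordinates in step with the actual residues.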

--
Dr Paul-Michael Agapow (p.agapow at ucl.ac.uk)
Dept. Biology, University College London


