[BioPython] Whitespace in sequences

Jeffrey Chang jchang at smi.stanford.edu
Tue Feb 18 19:05:23 EST 2003


On Tue, Feb 18, 2003 at 10:45:50AM +0000, Paul-Michael Agapow wrote:
> 
> Possibly a known bug or even a behaviour that makes sense but ...

Nope, neither of those!  :)

> While recently writing a biopython script to extract subsequences from 
> a fasta file, I was surprised to find that whitespace was retained 
> within the sequence after it was read into a SeqRecord. Specifically, 
> carriage returns ('\r') were left embedded in the sequence, which then 
> made the sequence lengths inaccurate and meant I extracted the wrong 
> regions.

What are you using to parse the fasta-formatted sequence file?  The
code in Bio.Fasta looks like it uses string.rstrip to strip the EOL
characters from the end of the lines, which should get rid of the '\r'
characters as well.  It's odd that they're getting left behind...
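
For illustration, here is a minimal sketch of that expectation. It is not
the actual Bio.Fasta code: the sample lines are made up, and it uses the
str.rstrip method, which behaves like string.rstrip when called with no
extra arguments. Trailing '\r\n' is removed, but a '\r' that is not at the
end of a line (for example, if the file were split on '\n' only and had
stray carriage returns inside a line) would survive:

```python
# Hypothetical FASTA lines with Windows-style (CRLF) line endings.
lines = [">seq1 example\r\n", "ACGT\r\n", "TTGA\r\n"]

# rstrip() with no arguments removes all trailing whitespace,
# including both '\r' and '\n'.
stripped = [line.rstrip() for line in lines]
sequence = "".join(stripped[1:])  # skip the '>' title line
print(repr(sequence))

# A '\r' that is NOT at the end of the line is left alone by rstrip().
embedded = "ACGT\rTTGA\n".rstrip()
print(repr(embedded))

# One defensive workaround: drop every whitespace character.
cleaned = "".join(embedded.split())
print(repr(cleaned))
```

If the lengths reported for your records are off, checking for '\r' in the
sequence string this way should show whether the carriage returns sit at
line ends or somewhere else.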

Jeff

