[BioPython] Bio.SeqIO and Clustal aka Clustalw files

Peter biopython at maubp.freeserve.co.uk
Sun Feb 4 14:55:10 UTC 2007


Hello list,

I've been working on new Bio.SeqIO code for reading and writing clustal 
alignments.

For more details about Bio.SeqIO, see here:
http://www.biopython.org/wiki/SeqIO

One issue that has recently come to my attention is how to deal with 
clustal alignments with repeated sequence identifiers.

Clustalw 1.83 will reject any file where the first 30 characters of the 
identifier are not unique (regardless of the file format).
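
As a rough illustration of that restriction, the check could be sketched
in plain Python as below. This is not Clustalw's or Bio.SeqIO's code; the
function name find_prefix_clashes and its width parameter are made up for
the example, and the default of 30 is simply the figure quoted above.

```python
from collections import defaultdict


def find_prefix_clashes(identifiers, width=30):
    """Group identifiers that share the same first `width` characters.

    Hypothetical helper: returns a dict mapping each clashing prefix to
    the list of identifiers (in input order) that start with it.
    """
    groups = defaultdict(list)
    for name in identifiers:
        # Only the leading `width` characters are compared.
        groups[name[:width]].append(name)
    # Keep only the prefixes that occur more than once.
    return {prefix: names for prefix, names in groups.items() if len(names) > 1}
```

An empty result means every identifier is unique within the first 30
characters; exact duplicates show up as clashes too.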

However, there is nothing in the clustal file format which prevents 
this.  For example, BioEdit 5.0.7 will happily read and write clustal 
format alignments with repeated entries.

Should Bio.SeqIO also be tolerant like this?  It's not quite as concise 
as the current code, but I have a rough version of the parser ready 
that copes with such files.
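
To make the idea concrete, here is one possible way to tolerate repeated
identifiers: match rows by their position within each block instead of
using the identifier as a dictionary key. This is only a sketch under
simplifying assumptions (a "CLUSTAL" header line, blocks separated by
blank lines, consensus lines starting with whitespace), and it is not
the actual Bio.SeqIO parser. The name parse_clustal_by_position is
hypothetical.

```python
def parse_clustal_by_position(text):
    """Return a list of (identifier, sequence) pairs, keeping duplicates.

    Hypothetical sketch: rows are matched by position within each block,
    so two rows with the same identifier stay separate records.
    """
    lines = text.splitlines()
    if not lines or not lines[0].startswith("CLUSTAL"):
        raise ValueError("missing CLUSTAL header line")

    ids = []            # identifiers, in first-block order
    chunks = []         # per-record list of sequence fragments
    row = 0             # position of the next row in the current block
    first_block = True  # still collecting identifiers?

    for line in lines[1:]:
        if not line.strip():
            # A blank line ends the current block.
            row = 0
            if ids:
                first_block = False
            continue
        if line[0].isspace():
            # Consensus line (e.g. "   ** *"): ignored here.
            continue
        fields = line.split()
        if len(fields) < 2:
            raise ValueError("cannot parse line: %r" % line)
        name, fragment = fields[0], fields[1]
        if first_block:
            ids.append(name)
            chunks.append([])
        elif row >= len(ids) or ids[row] != name:
            raise ValueError("unexpected identifier %r in row %d" % (name, row))
        chunks[row].append(fragment)
        row += 1

    return [(name, "".join(parts)) for name, parts in zip(ids, chunks)]
```

The price of this approach is that every block must list the records in
the same order, which the sketch checks as it goes.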

Any views?

Peter



