[BioPython] Bio.SeqIO and Clustal aka Clustalw files
Peter
biopython at maubp.freeserve.co.uk
Sun Feb 4 14:55:10 UTC 2007
Hello list,
I've been working on new Bio.SeqIO code for reading and writing clustal
alignments.
For more details about Bio.SeqIO, see here:
http://www.biopython.org/wiki/SeqIO
One issue that has recently come to my attention is how to deal with
clustal alignments with repeated sequence identifiers.
ClustalW 1.83 will reject any input file in which the sequence identifiers
are not unique within their first 30 characters (regardless of the file format).
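As an illustration only, here is a minimal Python sketch of the uniqueness rule described above. It is not ClustalW's own code. It assumes the identifiers have already been read into a list, and it uses a hypothetical helper name, clashing_prefixes, with the 30-character width taken from the ClustalW 1.83 behaviour mentioned here:

```python
def clashing_prefixes(identifiers, width=30):
    """Return {prefix: [identifiers]} for every prefix shared by 2+ records.

    Hypothetical helper: mimics the 'first 30 characters must be unique'
    rule, so an empty result means ClustalW should not complain about names.
    """
    groups = {}
    for name in identifiers:
        # Group identifiers by their truncated form.
        groups.setdefault(name[:width], []).append(name)
    # Keep only the truncated names that more than one record maps to.
    return {prefix: names for prefix, names in groups.items() if len(names) > 1}
```

Two identifiers that differ only after character 30 (or that are simply identical) would show up in the result.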
However, there is nothing in the clustal file format which prevents
this. For example, BioEdit 5.0.7 will happily read and write clustal
format alignments with repeated entries.
Should Bio.SeqIO also be tolerant like this? It's not quite as concise
as the current code, but I have got a rough version of the parser ready
which copes with such files.
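For illustration, the sketch below shows one way a parser could cope with repeated identifiers. It is not the Bio.SeqIO code and not the draft parser mentioned above. It assumes the usual clustal layout: a "CLUSTAL" header line, then blocks separated by blank lines, with each sequence line holding "identifier sequence [count]" and the consensus line starting with whitespace. The key idea is to assign each line to a record by its row position within the block rather than by looking up its identifier, so duplicate names never collide. The function name parse_clustal_by_position is hypothetical.

```python
def parse_clustal_by_position(text):
    """Return [(identifier, sequence), ...], allowing repeated identifiers."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith("CLUSTAL"):
        raise ValueError("missing CLUSTAL header line")
    records = []           # [identifier, [chunks]] in first-block order
    row = 0                # row index of the next sequence line in this block
    in_first_block = True
    for line in lines[1:]:
        if not line.strip():
            if row:        # a blank line after sequence lines ends a block
                if not in_first_block and row != len(records):
                    raise ValueError("block has %d rows, expected %d"
                                     % (row, len(records)))
                in_first_block = False
                row = 0
            continue
        if line[0].isspace():
            continue       # consensus line (*, :, .) belongs to no record
        fields = line.split()
        if len(fields) < 2:
            raise ValueError("sequence line without residues: %r" % line)
        name, chunk = fields[0], fields[1]   # trailing count, if any, ignored
        if in_first_block:
            records.append([name, [chunk]])
        else:
            # Match by position, then sanity-check the name at that position.
            if row >= len(records) or records[row][0] != name:
                raise ValueError("unexpected identifier %r in row %d" % (name, row))
            records[row][1].append(chunk)
        row += 1
    # Check the final block too, in case the file has no trailing blank line.
    if row and not in_first_block and row != len(records):
        raise ValueError("last block has %d rows, expected %d" % (row, len(records)))
    return [(name, "".join(chunks)) for name, chunks in records]
```

A dictionary keyed on identifier would silently merge the two "seq" entries in a file like the one in the test. Keying on row position keeps them separate, at the cost of assuming every block lists the sequences in the same order.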
Any views?
Peter