[BioPython] Bio.SeqIO and Clustal aka Clustalw files
    Peter 
    biopython at maubp.freeserve.co.uk
       
    Sun Feb  4 14:55:10 UTC 2007
    
    
  
Hello list,
I've been working on new Bio.SeqIO code for reading and writing clustal 
alignments.
For more details about Bio.SeqIO, see here:
http://www.biopython.org/wiki/SeqIO
One issue that has recently come to my attention is how to deal with 
clustal alignments with repeated sequence identifiers.
Clustalw 1.83 will reject any file where the first 30 characters of the 
identifier are not unique (regardless of the file format).
However, there is nothing in the clustal file format itself which 
prevents repeated identifiers.  For example, BioEdit 5.0.7 will happily 
read and write clustal format alignments with repeated entries.
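To make the Clustalw 1.83 restriction concrete, here is a minimal sketch of a check for identifiers that clash on their first 30 characters. The function name find_clashing_ids and its width parameter are made up for illustration; this is not part of Bio.SeqIO, and it assumes the identifiers have already been read into a list.

```python
from collections import defaultdict


def find_clashing_ids(identifiers, width=30):
    """Group identifiers by their first `width` characters.

    Returns a dict mapping each shared prefix to the list of
    identifiers using it, for prefixes that occur more than once.
    (Hypothetical helper, not a Biopython function.)
    """
    groups = defaultdict(list)
    for name in identifiers:
        groups[name[:width]].append(name)
    # Keep only the prefixes that are used by two or more identifiers
    return {prefix: names for prefix, names in groups.items() if len(names) > 1}


# An empty result means every identifier is unique in its first 30 characters
clashes = find_clashing_ids(["seqA", "seqB", "seqA"])
if clashes:
    print("Repeated identifiers:", clashes)
```

An alignment for which this returns an empty dict should not trip the uniqueness rule described above; anything else would be rejected by Clustalw but accepted by a tolerant reader.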
Should Bio.SeqIO also be tolerant like this?  It's not quite as concise 
as the current code, but I have a rough version of the parser ready 
which copes with such files.
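As a rough illustration of one way to cope with repeated identifiers (not the actual Bio.SeqIO parser), the sketch below tracks each sequence by its row position within an alignment block rather than by its name. The function name parse_clustal_rows is made up, and the code assumes a simple layout: a CLUSTAL header line, then blocks of "identifier sequence" lines separated by blank lines, with any consensus line starting with whitespace.

```python
def parse_clustal_rows(text):
    """Return a list of (identifier, sequence) tuples, one per alignment row.

    Rows are matched between blocks by position, so two rows may share
    the same identifier. (Hypothetical sketch, not Biopython code.)
    """
    lines = text.splitlines()
    if not lines or not lines[0].startswith("CLUSTAL"):
        raise ValueError("Missing CLUSTAL header line")

    ids = []              # identifiers in row order, taken from the first block
    chunks = []           # chunks[i] collects the sequence pieces for row i
    row = 0               # row position within the current block
    in_first_block = True

    for line in lines[1:]:
        if not line.strip() or line[0].isspace():
            # A blank line or a consensus line ends the current block
            if ids:
                in_first_block = False
            row = 0
            continue
        fields = line.split()
        if len(fields) < 2:
            raise ValueError("Expected identifier and sequence: %r" % line)
        name, piece = fields[0], fields[1]  # any trailing residue count is ignored
        if in_first_block:
            ids.append(name)
            chunks.append([])
        elif row >= len(ids) or ids[row] != name:
            raise ValueError("Row %d does not match the first block" % row)
        chunks[row].append(piece)
        row += 1

    return [(name, "".join(pieces)) for name, pieces in zip(ids, chunks)]
```

Because rows are matched by position, a file where "seqA" appears twice in every block gives two separate records both named "seqA", instead of one record with the two sequences merged together.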
Any views?
Peter
    
    