[BioPython] Bio.SeqIO and Clustal aka Clustalw files

Michiel De Hoon mdehoon at c2b2.columbia.edu
Sun Feb 4 18:50:14 UTC 2007


> Clustalw 1.83 will reject any file where the first 30 characters of the 
> identifier are not unique (regardless of the file format).
> 
> However, there is nothing in the clustal file format which prevents 
> this.  For example, BioEdit 5.0.7 will happily read and write clustal 
> format alignments with repeated entries.
>
> Should Bio.SeqIO also be tolerant like this?

Yes, I think so. Some users may want to write a file in the Clustal format to
use it with some program other than Clustal. Also, assuming that Clustalw gives
a clear error message when the file contains identifiers that are not unique in
their first 30 characters, that should be sufficient to enable the user to fix
the problem.
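
For anyone who wants to check a file before handing it to Clustalw, here is a
minimal sketch of such a test. It is not part of Bio.SeqIO; the function name
find_prefix_clashes and the sample identifiers are made up for illustration,
and the width of 30 is taken from the Clustalw 1.83 behaviour described in the
quoted message.

```python
from collections import defaultdict


def find_prefix_clashes(identifiers, width=30):
    """Group identifiers that share the same first `width` characters.

    Hypothetical helper: returns a dict mapping each clashing prefix to
    the list of full identifiers that start with it.
    """
    groups = defaultdict(list)
    for name in identifiers:
        groups[name[:width]].append(name)
    # Keep only the prefixes shared by more than one identifier.
    return {prefix: names for prefix, names in groups.items() if len(names) > 1}


# Made-up identifiers: the first two agree in their first 30 characters.
ids = [
    "gi|0000001|ref|hypothetical_protein_alpha",
    "gi|0000001|ref|hypothetical_protein_alpha_variant",
    "short_id",
]

clashes = find_prefix_clashes(ids)
for prefix, names in clashes.items():
    print("Clash on %r: %s" % (prefix, ", ".join(names)))
```

A writer that stays tolerant could simply skip this check, or run it and only
warn, leaving it to the user to rename the sequences if Clustalw complains.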

By the way, let us know when you feel that the Bio.SeqIO code is ready to be
included in the next Biopython release (code-named Bronx).

--Michiel.



Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032


