[BioPython] Bio.SeqIO and Clustal aka Clustalw files

Julius Lucks lucks at fas.harvard.edu
Sun Feb 4 19:24:25 UTC 2007


What about issuing some sort of warning message if there are
non-unique IDs?  Something like what happens when you use NCBIWWW.qblast
(from Bio.Blast), where it warns you that qblast only works with
certain databases.  That way users unaware of this issue in Clustalw
will learn about it, and Bio.SeqIO will still leave you free
to have non-unique IDs.  Maybe also make this warning message easy
to turn off so it doesn't get annoying.
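
To make the suggestion concrete, here is a rough sketch of what such a check could look like, using Python's standard warnings module. The names warn_on_duplicate_ids and DuplicateIdWarning are made up for illustration and are not part of Bio.SeqIO; the 30-character width is taken from the Clustalw 1.83 behaviour described in the quoted message below.

```python
import warnings
from collections import Counter


class DuplicateIdWarning(UserWarning):
    """Hypothetical warning category; not part of Biopython."""


def warn_on_duplicate_ids(ids, width=30):
    """Warn (rather than fail) if identifiers collide in their first `width` characters.

    Returns the sorted list of colliding prefixes so callers can inspect them.
    """
    # Count how often each truncated identifier occurs.
    counts = Counter(identifier[:width] for identifier in ids)
    duplicates = sorted(prefix for prefix, n in counts.items() if n > 1)
    if duplicates:
        warnings.warn(
            "Identifiers are not unique in their first %d characters: %s. "
            "Clustalw may reject this file." % (width, ", ".join(duplicates)),
            DuplicateIdWarning,
            stacklevel=2,
        )
    return duplicates
```

Because the warning has its own category, a user who finds it annoying could silence it with warnings.simplefilter("ignore", DuplicateIdWarning) and still write files with repeated IDs.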

Julius
-----------------------------------------------------
http://openwetware.org/wiki/User:Lucks
-----------------------------------------------------



On Feb 4, 2007, at 1:50 PM, Michiel De Hoon wrote:

>> Clustalw 1.83 will reject any file where the first 30 characters of the
>> identifier are not unique (regardless of the file format).
>>
>> However, there is nothing in the clustal file format which prevents
>> this.  For example, BioEdit 5.0.7 will happily read and write clustal
>> format alignments with repeated entries.
>>
>> Should Bio.SeqIO also be tolerant like this?
>
> Yes, I think so. Some users may want to write a file in the Clustal format to
> use it with some program other than Clustal. Also, assuming that clustal gives a
> clear error message when the file contains longer identifiers, that should be
> sufficient to enable the user to fix the problem.
>
> By the way, let us know when you feel that the Bio.SeqIO code is ready to be
> included in the next Biopython release (code-named Bronx).
>
> --Michiel.
>
>
>
> Michiel de Hoon
> Center for Computational Biology and Bioinformatics
> Columbia University
> 1150 St Nicholas Avenue
> New York, NY 10032
>
>
>



