[Biopython-dev] Clustal alignment format header line

Peter biopython at maubp.freeserve.co.uk
Tue May 12 16:16:35 UTC 2009


On Tue, May 12, 2009 at 4:43 PM, Cymon Cox <cy at cymon.org> wrote:
> Peter wrote:
>> Also I have a vague memory of some tool using something like "CLUSTAL
>> ... from ToolX" but I don't recall the details.
>
> T-COFFEE for one:
> "CLUSTAL FORMAT for T-COFFEE Version_6.92 [http://www.tcoffee.org] [MODE:
> ], CPU=0.00 sec, SCORE=100, Nseq=2, Len=601"

Yes - that is almost certainly the example I was thinking of.

> Is it so bad to let it fail on the structure of the data - effectively
> ignore the header? Maybe have a general "this doesn't look like clustal
> formatted data" error based on the data structure...

Some of the current error messages are a little cryptic to an end
user. I guess they could have "Are you sure this is a Clustal format
file?" appended to them.

I'd be happy with a whitelist of variant headers, i.e. must start with
"CLUSTAL", "MUSCLE" or "PROBCONS" (assuming these tools don't write
their own file formats which also start that way!).  If people find
new cases and report them, it also gives us notice about another tool
we may want to include in our command line wrappers, and/or obtain
sample output files for the unit tests.
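
As a rough sketch of the whitelist idea (the names KNOWN_HEADER_PREFIXES
and check_clustal_header are hypothetical, not existing Biopython code,
and the real parser would need to fit this into its own header handling),
something like this would accept the T-COFFEE variant above and give a
friendlier error otherwise:

```python
# Hypothetical helper for illustration only; not part of Biopython's API.
# Header prefixes taken from the tools mentioned in this thread.
KNOWN_HEADER_PREFIXES = ("CLUSTAL", "MUSCLE", "PROBCONS")


def check_clustal_header(line):
    """Return the stripped header line if it starts with a known prefix.

    Raises ValueError with a user-friendly hint otherwise.
    """
    header = line.strip()
    # str.startswith accepts a tuple and matches any of its entries.
    if not header.startswith(KNOWN_HEADER_PREFIXES):
        raise ValueError(
            "Unexpected header %r. Are you sure this is a Clustal format file?"
            % header
        )
    return header
```

Adding a newly reported tool would then just mean extending the tuple,
ideally together with a sample output file for the unit tests.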

Peter


