[Bioperl-l] Re: Bioperl and matcher

Keith James kdj@sanger.ac.uk
26 Nov 2002 17:53:54 +0000


>>>>> "Peter" == Peter Rice <peter.rice@uk.lionbioscience.com> writes:

[...]

    Peter> BioPerl seems to be having trouble with the EMBOSS MSF
    Peter> format output. It could be something about the naming of
    Peter> the sequences?

Another problem with EMBOSS MSF output is that EMBOSS pads gapped
termini with whitespace instead of ".". It also pads the termini of
some of the other alignment formats with whitespace instead of "-". A
notable exception is the Fasta alignment format, in which terminal and
internal gaps are treated the same.

The end result is that such alignments may be read in without any
error, but the parser shifts the residues left until the whitespace
disappears, so the sequences end up misaligned.
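To illustrate the shift, here is a toy Python sketch. It is not BioPerl
code: the rows, NAME_WIDTH and both functions are made up for
illustration, and it assumes one sequence block per row, a name padded
to a fixed width, and no spaces inside the sequence. A parser that
splits each row on whitespace loses the terminal padding, while one
that reads the sequence by column position can turn the padding back
into gap characters.

```python
# Toy illustration only: not BioPerl's MSF parser.
# Assumption (hypothetical layout): each row holds one sequence block, the name
# is padded to NAME_WIDTH columns, and terminal gaps are written as spaces.
NAME_WIDTH = 8
ROWS = [
    "seq1    ACGTACGTAC",
    "seq2       TACGTAC",  # leading gap padded with whitespace
    "seq3    ACGTACG   ",  # trailing gap padded with whitespace
]


def naive_parse(rows):
    """Split each row on whitespace; terminal padding is silently discarded."""
    seqs = {}
    for row in rows:
        name, *blocks = row.split()
        seqs[name] = "".join(blocks)
    return seqs


def column_parse(rows, name_width, gap="."):
    """Read the sequence by column position and turn padding into gap characters.

    MSF uses "." for gaps. Rows whose trailing padding was stripped are
    right-padded to the alignment width first.
    """
    width = max(len(row) for row in rows) - name_width
    seqs = {}
    for row in rows:
        name = row[:name_width].strip()
        field = row[name_width:].ljust(width)
        seqs[name] = field.replace(" ", gap)
    return seqs


if __name__ == "__main__":
    print(naive_parse(ROWS))  # seq2 and seq3 lose their gaps and shift left
    print(column_parse(ROWS, NAME_WIDTH))
```

The column trick only works because the toy layout fixes where the
sequence starts. Real MSF blocks typically split residues into groups
of ten separated by spaces, so whitespace padding cannot reliably be
told apart from those separators, which is why writing "." for
terminal gaps matters.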

As this EMBOSS behaviour is at odds with the documentation, I reported
it as an EMBOSS bug (with examples) a few days ago. This is with
EMBOSS version 2.5.0, by the way.

Keith

-- 

- Keith James <kdj@sanger.ac.uk> bioinformatics programming support -
- Pathogen Sequencing Unit, The Wellcome Trust Sanger Institute, UK -