[Bioperl-l] Phylip format error

Fields, Christopher J cjfields at illinois.edu
Thu May 23 14:05:32 UTC 2013


On May 23, 2013, at 3:30 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> On Thu, May 23, 2013 at 8:22 AM, Alexey Morozov
> <alexeymorozov1991 at gmail.com> wrote:
>> Which is also worsened by the fact that there is a relaxed phylip format,
>> which allows up to 250 chars for the taxon name. Names are separated from the
>> sequence by a single space, which creates problems if names were padded to
>> 10 chars with whitespace, as in Felsenstein's strict format. On the whole,
>> phylip is as messily defined a format as one can make from a plain text file
>> with the information content of fasta.
>> BioPerl documentation says nothing about whether Bio::SeqIO accepts relaxed
>> phylip or how it tells the dialects apart. Even if code support
>> is OK, it may be worthwhile to explain it somewhere at bioperl.org
> 
> Biopython's AlignIO defines both a (strict) "phylip" and "relaxed-phylip"
> as two separate formats (or variants, like the "fastq" variants). Doing
> the same in BioPerl would seem sensible since auto-detection is not
> easy.
> 
> http://biopython.org/wiki/AlignIO#File_Formats
> 
> Peter
> 
> P.S. Where does that 250-character limit for the taxon name come from?
> The trouble with relaxed phylip is that some tools are more relaxed than
> others ;)

As Adam pointed out, prior to the introduction of 'relaxed phylip' we had an alternative solution that didn't require a modified format but still allowed one to use PHYLIP and other tools requesting the format.  I think 'relaxed phylip' was introduced by CIPRES a few years back.  Frankly, this is the first time I have seen this mentioned on the list; yay, yet another format variation :)

The variant format parsing (as implemented for SeqIO::fastq, as you know) deals with variant names like 'fastq-sanger', where the main format name comes first and the variant second.  In 'relaxed-phylip' the order is reversed, which I'm pretty sure will not work.  It's not impossible to allow, but we would probably support it like this initially:

my $in = Bio::AlignIO->new(-format  => 'phylip',
                           -variant => 'relaxed',
                           …);
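
To make the ordering problem concrete, here is a rough sketch in plain Perl of splitting a 'format-variant' name on the first hyphen. This is not the actual BioPerl code; split_format is a made-up helper used only for illustration, and it assumes the main format always comes first.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT part of BioPerl): split a name such as
# 'fastq-sanger' on the first hyphen, assuming the main format comes
# first and the variant second.
sub split_format {
    my ($name) = @_;
    my ($format, $variant) = split /-/, lc($name), 2;
    return ($format, $variant);    # $variant is undef when there is no hyphen
}

for my $name ('fastq-sanger', 'relaxed-phylip', 'phylip') {
    my ($format, $variant) = split_format($name);
    printf "%-15s format=%s variant=%s\n",
        $name, $format, defined $variant ? $variant : '(none)';
}
```

With this convention 'relaxed-phylip' comes out as format 'relaxed' with variant 'phylip', which is why passing the variant separately looks like the safer first step.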

chris


