[Bioperl-l] alignIO::fasta bug
Jason Stajich
jason at bioperl.org
Thu Jan 22 16:33:15 EST 2009
FYI - it appears that if the last sequence in a FASTA MSA is all gaps
(which happens in some whole genome synteny+multiple alignment
chunking like Mercator) no alignment is returned.
yuck. It basically comes down to this bit of code where $end would
equal sequence length.
# If $end <= 0, we have either reached the end of
# file in <> or we have encountered some other error
if ( $end <= 0 && ) {
undef $aln;
return $aln;
}
This start/end requirement of locatable seq is nice but kind of a pain
where I am managing the map of sequences outside of alignment chunk.
Why not just check to see that the number of seq characters is 0 - an
all-gapped sequence as the last sequence of the file should still be
legal.
Instead:
if ( length($seqchar) == 0 ) {
undef $aln;
return $aln;
}
Although that would invalidate an empty alignment like this -- do we
want to still permit these?
>A
>B
>C
Also I've locally implemented possibility of parsing start/end from
the header line that is part of Mercator output and I think a variant
of UCSC headers that
look like this for MFA.
>Cp scaffold_1.3086:644980-660265+
I can commit this to main-trunk so as to not interfere with branch
release. Do any of the LocatableSeq/AlignIO::Fasta maintainers want
to comment?
-jason
More information about the Bioperl-l
mailing list