[Biojava-l] ah hah. Now I know why: it's a bug, perhaps in MSFAlignment
Guoneng Zhong
Guoneng.Zhong@med.nyu.edu
Fri, 1 Mar 2002 15:43:59 -0500
Following up on my previous email, I believe this might be a bug.
In MSFAlignmentFormat, lines 137 to 155 contain the test that decides
whether the given report is DNA, protein, or RNA. It works by going
through the entire report and counting the a's, t's, g's, c's, and u's.
If the ratio of these nucleotide-like symbols to the total number of
monomers is greater than 90% (line 157), the report is treated as a
polynucleotide; otherwise it is treated as a protein. That would make
sense if this were not an alignment report. In an alignment report,
however, there are many gaps, represented by dashes or dots. In my case
about 60% of the symbols are dots, so the nucleotides make up only about
40% of the whole collection of Symbols, even though they account for
100% of the non-gap symbols. That is far short of the 90% cutoff.
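To make the problem concrete, here is a minimal sketch of the counting
test as I understand it. It is plain Java, not the actual
MSFAlignmentFormat source: the class AlphabetGuess, the method
looksNucleotide, and the skipGaps flag are hypothetical names, and the
0.9 cutoff is the one described at line 157. With skipGaps off it
reproduces the problem; with it on, '-' and '.' are left out of the
denominator.

```java
import java.util.Locale;

/** Hypothetical illustration of the nucleotide-ratio guess; not the actual BioJava code. */
public class AlphabetGuess {

    // Cutoff as described for line 157 of MSFAlignmentFormat.
    static final double THRESHOLD = 0.9;

    /**
     * Returns true if more than 90% of the counted symbols are a, t, g, c, or u.
     * When skipGaps is true, the gap symbols '-' and '.' are not counted at all.
     */
    static boolean looksNucleotide(String residues, boolean skipGaps) {
        int nucleotides = 0;
        int total = 0;
        for (char c : residues.toLowerCase(Locale.ROOT).toCharArray()) {
            if (Character.isWhitespace(c)) {
                continue; // layout only, not a monomer
            }
            if (skipGaps && (c == '-' || c == '.')) {
                continue; // a gap says nothing about the alphabet
            }
            total++;
            if ("atgcu".indexOf(c) >= 0) {
                nucleotides++;
            }
        }
        return total > 0 && (double) nucleotides / total > THRESHOLD;
    }

    public static void main(String[] args) {
        // 6 of 10 symbols are gap dots; every non-gap symbol is a nucleotide.
        String row = "acgt......";
        System.out.println(looksNucleotide(row, false)); // false: ratio 4/10 = 0.4
        System.out.println(looksNucleotide(row, true));  // true:  ratio 4/4 = 1.0
    }
}
```

With a row that is 60% dots, the plain ratio is 0.4 and the row is
called a protein; ignoring the gaps gives 1.0 and a nucleotide call.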
So is this a correct interpretation? If so, is it a bug? Why doesn't
the parser just check the "Type" keyword in the report, which, at least
in mine, says "N"? If that keyword is missing, the parser could fall
back on the counting method above to guess. But as it stands, I think
the guess is flawed, no?
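The alternative, trusting the header and only guessing when it says
nothing, could look roughly like this. It is a sketch assuming the
usual GCG MSF header layout ("MSF: <length>  Type: <N|P>  Check: <n> ..");
MsfTypeReader and declaredType are hypothetical names, not part of
BioJava.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: read the declared sequence type from an MSF header line. */
public class MsfTypeReader {

    // Matches "Type: N" or "Type: P" (nucleotide or protein) anywhere in the line.
    private static final Pattern TYPE =
            Pattern.compile("\\bType:\\s*([NP])\\b", Pattern.CASE_INSENSITIVE);

    /** Returns "N" or "P" if the header declares a type, otherwise an empty Optional. */
    static Optional<String> declaredType(String headerLine) {
        Matcher m = TYPE.matcher(headerLine);
        return m.find() ? Optional.of(m.group(1).toUpperCase(Locale.ROOT)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Illustrative header in the usual GCG layout; file name and numbers are made up.
        String header = " example.msf  MSF: 120  Type: N  Check: 1234  ..";
        System.out.println(declaredType(header).orElse("none declared"));
    }
}
```

An empty result would be the point to fall back on the counting method,
ideally with the gap symbols left out of the ratio.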
G