[Biojava-dev] Case sensitivity in Alignment

Andreas Prlic andreas at sdsc.edu
Wed Nov 20 16:39:41 UTC 2013


The problem is that the substitution matrices are all upper case. We can
probably fix this by making the NucleotideCompound.equals method case
insensitive...

Does anybody see an issue with that?
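
One thing to keep in mind: in Java, objects that are equal must also return
the same hash code, so a case-insensitive equals needs a matching hashCode.
A minimal sketch of the idea in plain Java follows. The class names
CaseInsensitiveEqualsDemo and Compound are made up for illustration and this
is not the actual NucleotideCompound code; the score value is a toy number.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class CaseInsensitiveEqualsDemo {

    // Hypothetical stand-in for a compound; NOT the real NucleotideCompound.
    static final class Compound {
        private final String symbol; // stored as given, so "a" stays "a"

        Compound(String symbol) {
            this.symbol = symbol;
        }

        String getSymbol() {
            return symbol;
        }

        // Single normalization shared by equals and hashCode, so the two
        // methods can never disagree about which symbols are "the same".
        private String key() {
            return symbol.toUpperCase(Locale.ROOT);
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) return true;
            if (!(other instanceof Compound)) return false;
            return key().equals(((Compound) other).key());
        }

        @Override
        public int hashCode() {
            return key().hashCode();
        }
    }

    public static void main(String[] args) {
        // Toy score table keyed by an upper-case compound (made-up value).
        Map<Compound, Integer> scores = new HashMap<>();
        scores.put(new Compound("A"), 5);

        Compound lower = new Compound("a");
        System.out.println(lower.equals(new Compound("A"))); // true
        System.out.println(scores.get(lower));               // 5
        System.out.println(lower.getSymbol());               // a
    }
}
```

With this approach the original case is still stored on the object, so mixed
case sequences would keep their case when printed; only comparison and
hashing ignore it.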

A


On Wed, Nov 20, 2013 at 8:22 AM, Michael Heuer <heuermh at gmail.com> wrote:

> Hello Andreas, David
>
> Lower case is the convention for soft-masking sequences for alignment, see
>
> http://www.ncbi.nlm.nih.gov/books/NBK1763/
>
> http://www.ncbi.nlm.nih.gov/books/NBK1763/#CmdLineAppsManual.Create_a_masked_BLAST
>
> If we are using this convention, perhaps it should be more clearly
> documented.  What happens if you use mixed case?
>
>    michael
>
>
> On Wed, Nov 20, 2013 at 5:29 AM, Andreas Prlic <andreas at sdsc.edu> wrote:
> > Hi David,
> >
> > not sure if we should consider this a bug or a feature: It should be easy
> > to work around this by calling toUpperCase() on your strings. We could of
> > course internally convert all nucleotides to upper case, but that would
> > remove the possibility for people to use mixed upper case and lower case
> > sequences to represent e.g. alignment conservation.
> >
> > Any opinions by other people on this? Is anybody using mixed case
> > sequences?
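
For anyone who needs the workaround in the meantime, here is a minimal sketch
in plain Java. The class and helper names (UpperCaseWorkaround, normalizeCase,
isMixedCase) are made up for illustration and are not part of BioJava; the
upper-cased string would then be handed to the DNASequence constructor as in
David's example.

```java
import java.util.Locale;

class UpperCaseWorkaround {

    // Hypothetical helper: upper-case a raw sequence string before a
    // sequence object is built from it.
    static String normalizeCase(String rawSequence) {
        return rawSequence.toUpperCase(Locale.ROOT);
    }

    // Hypothetical helper: true if the string contains both upper case and
    // lower case letters, i.e. the case might carry information.
    static boolean isMixedCase(String rawSequence) {
        boolean hasUpper = false;
        boolean hasLower = false;
        for (int i = 0; i < rawSequence.length(); i++) {
            char c = rawSequence.charAt(i);
            if (Character.isUpperCase(c)) {
                hasUpper = true;
            } else if (Character.isLowerCase(c)) {
                hasLower = true;
            }
        }
        return hasUpper && hasLower;
    }

    public static void main(String[] args) {
        String raw = "agggctttaccccggttaa";
        System.out.println(normalizeCase(raw));      // AGGGCTTTACCCCGGTTAA
        System.out.println(isMixedCase(raw));        // false
        System.out.println(isMixedCase("ACGTacgt")); // true
    }
}
```

Upper-casing this way discards whatever the case was meant to express (soft
masking, conservation), so checking for mixed case first may be worthwhile.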
> >
> > Andreas
> >
> >
> > On Mon, Nov 18, 2013 at 11:43 AM, Waring, David A <dwaring at fhcrc.org> wrote:
> >
> >>
> >> There seems to be a bug in the alignment package. If DNA sequences are
> >> created using lower case letters, the alignment methods don't work. It
> >> looks like the default substitution matrix is coded in upper case, and
> >> the underlying case of the DNA sequence is being used in the alignment.
> >> Seems like a bug to me.
> >>
> >> This problem occurs when the DNA sequence is created either using the
> >> DNASequence constructor, or by reading from a FASTA file that is in
> >> lower case.
> >>
> >>
> >> The code below shows the problem.
> >>
> >>
> >>     static SimpleGapPenalty gapP;
> >>     static SubstitutionMatrix<NucleotideCompound> matrix;
> >>
> >>     public static void main(String[] args) throws Exception {
> >>         matrix = SubstitutionMatrixHelper.getNuc4_4();
> >>         gapP = new SimpleGapPenalty();
> >>         gapP.setOpenPenalty((short) 5);
> >>         gapP.setExtensionPenalty((short) 2);
> >>         testHardcoded();
> >>     }
> >>
> >>     public static void testHardcoded() throws Exception {
> >>         // seq3 and seq4 are lower case copies of seq1 and seq2
> >>         Sequence<NucleotideCompound> seq1 = new DNASequence("AGGGCTTTACCCCGGTTAA");
> >>         Sequence<NucleotideCompound> seq2 = new DNASequence("ACCCCGGTTTAATATTTTT");
> >>         Sequence<NucleotideCompound> seq3 = new DNASequence("agggctttaccccggttaa");
> >>         Sequence<NucleotideCompound> seq4 = new DNASequence("accccggtttaatattttt");
> >>         alignPair(seq1, seq2); // upper case vs upper case
> >>         alignPair(seq1, seq4); // upper case vs lower case
> >>         alignPair(seq3, seq4); // lower case vs lower case
> >>     }
> >>
> >>     public static void alignPair(Sequence<NucleotideCompound> seq1,
> >>                                  Sequence<NucleotideCompound> seq2) {
> >>         SequencePair<Sequence<NucleotideCompound>, NucleotideCompound> pair =
> >>                 Alignments.getPairwiseAlignment(seq1, seq2,
> >>                         Alignments.PairwiseSequenceAlignerType.GLOBAL,
> >>                         gapP, matrix);
> >>
> >>         System.out.printf("%s", pair);
> >>         System.out.println();
> >>     }
> >>
> >>
> >>
> >> _______________________________________________
> >> biojava-dev mailing list
> >> biojava-dev at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/biojava-dev
> >>
>


