[Biojava-dev] Case sensitivity in Alignment

Andreas Prlic andreas at sdsc.edu
Wed Nov 20 11:29:42 UTC 2013


Hi David,

I'm not sure whether we should consider this a bug or a feature. It should be
easy to work around by calling toUpperCase() on your strings before you
construct the sequences. We could of course convert all nucleotides to upper
case internally, but that would remove the possibility for people to use
mixed upper-case and lower-case sequences to represent e.g. alignment
conservation.
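
The workaround can be sketched in plain Java. This is only an illustration:
the class SequenceCase and its methods are hypothetical names, not part of
BioJava. It upper-cases the raw string and, separately, remembers which
positions were lower case, so that mixed-case information does not have to be
thrown away.

```java
import java.util.BitSet;
import java.util.Locale;

/** Hypothetical helper for illustration only; not a BioJava class. */
final class SequenceCase {
    private SequenceCase() {}

    /** Upper-cases a sequence string; Locale.ROOT avoids locale-specific case rules. */
    static String toUpper(String seq) {
        return seq.toUpperCase(Locale.ROOT);
    }

    /** Records which positions of the raw string were lower case. */
    static BitSet lowerCaseMask(String seq) {
        BitSet mask = new BitSet(seq.length());
        for (int i = 0; i < seq.length(); i++) {
            if (Character.isLowerCase(seq.charAt(i))) {
                mask.set(i);
            }
        }
        return mask;
    }

    /** Re-applies a mask to an upper-case string of the same, ungapped length. */
    static String restoreCase(String upper, BitSet mask) {
        StringBuilder sb = new StringBuilder(upper.length());
        for (int i = 0; i < upper.length(); i++) {
            char c = upper.charAt(i);
            sb.append(mask.get(i) ? Character.toLowerCase(c) : c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String raw = "agggCTTTaccccggttaa";
        BitSet mask = lowerCaseMask(raw);   // remember the original case
        String upper = toUpper(raw);        // string to hand to the aligner
        System.out.println(upper);
        System.out.println(restoreCase(upper, mask));
    }
}
```

With a helper like this, the sequences in the example below would be built as
new DNASequence(SequenceCase.toUpper(raw)). Note that the mask indexes the
ungapped input string, so it would have to be mapped through the alignment
before being applied to a gapped result.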

Any opinions from other people on this? Is anybody using mixed-case sequences?

Andreas


On Mon, Nov 18, 2013 at 11:43 AM, Waring, David A <dwaring at fhcrc.org> wrote:

>
> There seems to be a bug in the alignment package. If DNA sequences are
> created using lower case letters, the alignment methods don't work. It looks
> like the default substitution matrix is coded in upper case, and the
> underlying case of the DNA sequence is being used in the alignment. Seems
> like a bug to me.
>
> This problem occurs when the DNA sequence is created either using the
> DNASequence constructor, or by reading from a FASTA file which is in lower
> case.
>
>
> The code below shows the problem.
>
>
>     static SimpleGapPenalty gapP;
>     static SubstitutionMatrix<NucleotideCompound> matrix;
>
>     public static void main(String[] args) throws Exception {
>         matrix = SubstitutionMatrixHelper.getNuc4_4();
>         gapP = new SimpleGapPenalty();
>         gapP.setOpenPenalty((short) 5);
>         gapP.setExtensionPenalty((short) 2);
>         testHardcoded();
>     }
>
>     public static void testHardcoded() throws Exception {
>         Sequence<NucleotideCompound> seq1 = new DNASequence("AGGGCTTTACCCCGGTTAA");
>         Sequence<NucleotideCompound> seq2 = new DNASequence("ACCCCGGTTTAATATTTTT");
>         Sequence<NucleotideCompound> seq3 = new DNASequence("agggctttaccccggttaa");
>         Sequence<NucleotideCompound> seq4 = new DNASequence("accccggtttaatattttt");
>         alignPair(seq1, seq2);  // both upper case
>         alignPair(seq1, seq4);  // upper case vs. lower case
>         alignPair(seq3, seq4);  // both lower case
>     }
>
>     public static void alignPair(Sequence<NucleotideCompound> seq1,
>             Sequence<NucleotideCompound> seq2) {
>         SequencePair<Sequence<NucleotideCompound>, NucleotideCompound> pair =
>                 Alignments.getPairwiseAlignment(seq1, seq2,
>                         Alignments.PairwiseSequenceAlignerType.GLOBAL, gapP, matrix);
>
>         System.out.printf("%s", pair);
>         System.out.println();
>     }
>


