[Biojava-dev] Case sensitivity in Alignment

Waring, David A dwaring at fhcrc.org
Mon Nov 18 19:43:11 UTC 2013


There seems to be a bug in the alignment package. If DNA sequences are created using lower-case letters, the alignment methods don't work. It looks like the default substitution matrix is coded in upper case, while the alignment uses the sequence's compounds in whatever case they were supplied, so lower-case bases never match the matrix entries.

The problem occurs whether the DNA sequence is created with the DNASequence constructor or read from a FASTA file that is in lower case.
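The suspected mechanism can be illustrated without BioJava. The following is only a minimal sketch, assuming a score table keyed by upper-case base pairs; the class name `CaseLookupSketch`, its methods, and the match/mismatch values are hypothetical and are not BioJava's implementation or the real NUC 4.4 matrix.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class CaseLookupSketch {
    // Toy score table keyed by upper-case base pairs (hypothetical values).
    private static final Map<String, Integer> SCORES = new HashMap<>();
    static {
        String bases = "ACGT";
        for (char a : bases.toCharArray()) {
            for (char b : bases.toCharArray()) {
                SCORES.put("" + a + b, a == b ? 5 : -4);
            }
        }
    }

    // Case-sensitive lookup: returns null when the pair is not in the table.
    static Integer scoreExact(char a, char b) {
        return SCORES.get("" + a + b);
    }

    // Case-insensitive lookup: upper-case both symbols before the lookup.
    static Integer scoreNormalized(char a, char b) {
        String key = ("" + a + b).toUpperCase(Locale.ROOT);
        return SCORES.get(key);
    }

    public static void main(String[] args) {
        System.out.println(scoreExact('A', 'A'));      // upper case is found
        System.out.println(scoreExact('a', 'a'));      // lower case is not
        System.out.println(scoreNormalized('a', 'A')); // found after normalizing
    }
}
```

If the alignment code behaves like `scoreExact`, lower-case input would never hit a matrix entry, which would be consistent with the behaviour described above.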


The code below shows the problem.


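    // BioJava 3.x package names.
    import org.biojava3.alignment.Alignments;
    import org.biojava3.alignment.SimpleGapPenalty;
    import org.biojava3.alignment.SubstitutionMatrixHelper;
    import org.biojava3.alignment.template.SequencePair;
    import org.biojava3.alignment.template.SubstitutionMatrix;
    import org.biojava3.core.sequence.DNASequence;
    import org.biojava3.core.sequence.compound.NucleotideCompound;
    import org.biojava3.core.sequence.template.Sequence;

    // Wrapper class added so the snippet compiles; the name is arbitrary.
    public class AlignmentCaseTest {

        static SimpleGapPenalty gapP;
        static SubstitutionMatrix<NucleotideCompound> matrix;

        public static void main(String[] args) throws Exception {
            // Default nucleotide matrix (NUC 4.4) and simple gap penalties.
            matrix = SubstitutionMatrixHelper.getNuc4_4();
            gapP = new SimpleGapPenalty();
            gapP.setOpenPenalty((short) 5);
            gapP.setExtensionPenalty((short) 2);
            testHardcoded();
        }

        public static void testHardcoded() throws Exception {
            // seq3 and seq4 are lower-case copies of seq1 and seq2.
            Sequence<NucleotideCompound> seq1 = new DNASequence("AGGGCTTTACCCCGGTTAA");
            Sequence<NucleotideCompound> seq2 = new DNASequence("ACCCCGGTTTAATATTTTT");
            Sequence<NucleotideCompound> seq3 = new DNASequence("agggctttaccccggttaa");
            Sequence<NucleotideCompound> seq4 = new DNASequence("accccggtttaatattttt");
            alignPair(seq1, seq2);  // upper vs. upper
            alignPair(seq1, seq4);  // upper vs. lower
            alignPair(seq3, seq4);  // lower vs. lower
        }

        // Run a global pairwise alignment and print the resulting pair.
        public static void alignPair(Sequence<NucleotideCompound> seq1, Sequence<NucleotideCompound> seq2) {
            SequencePair<Sequence<NucleotideCompound>, NucleotideCompound> pair =
                Alignments.getPairwiseAlignment(seq1, seq2,
                    Alignments.PairwiseSequenceAlignerType.GLOBAL, gapP, matrix);

            System.out.printf("%s", pair);
            System.out.println();
        }
    }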
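Until the library handles case itself, one possible workaround is to upper-case the raw sequence string before it reaches the DNASequence constructor. This is only a sketch under that assumption; the class `UpperCaseInput` and its `normalize` helper are hypothetical names, not part of BioJava.

```java
import java.util.Locale;

public class UpperCaseInput {
    // Trim surrounding whitespace and upper-case the residues.
    // Locale.ROOT avoids locale-specific case mappings.
    static String normalize(String rawSequence) {
        return rawSequence.trim().toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String seq = normalize("agggctttaccccggttaa");
        System.out.println(seq); // prints AGGGCTTTACCCCGGTTAA
        // With BioJava on the classpath, the normalized string would then
        // be passed on, e.g. new DNASequence(seq).
    }
}
```

For sequences read from a lower-case FASTA file, the same idea would presumably mean rebuilding each sequence from its upper-cased string, which costs an extra copy per record.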





