[Biojava-dev] Case sensitivity in Alignment

Michael Heuer heuermh at gmail.com
Wed Nov 20 16:57:29 UTC 2013


Sorry, I may not be keeping up with you both here, but the code in
question is in the alignment package, and if the substitution matrices
are all upper case, they won't match lower-case, soft-masked sequence;
wouldn't that be the intent?  (A feature, not a bug.)

   michael

On Wed, Nov 20, 2013 at 10:39 AM, Andreas Prlic <andreas at sdsc.edu> wrote:
> The problem is that the substitution matrices are all upper case. We can
> probably fix this by making the NucleotideCompound.equals method
> case-insensitive...
>
> Does anybody see an issue with that?
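One issue worth checking is sketched below with a hypothetical stand-in class. CaseInsensitiveBase is not BioJava's NucleotideCompound, and whether BioJava's matrix lookup is hash-based is an assumption here. If equals becomes case-insensitive, hashCode has to fold case in exactly the same way. Otherwise a hash-based lookup keyed by compounds can still miss lower-case keys. The sketch keeps the original letter, so soft-masking information is not lost, and it derives both equals and hashCode from a single upper-cased key.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical stand-in for a nucleotide compound; not BioJava's class. */
final class CaseInsensitiveBase {
    private final String base; // original letter, case preserved (e.g. soft mask)
    private final String key;  // upper-cased form used for equality and hashing

    CaseInsensitiveBase(String base) {
        this.base = base;
        this.key = base.toUpperCase(Locale.ROOT);
    }

    String getBase() {
        return base;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof CaseInsensitiveBase)) return false;
        return key.equals(((CaseInsensitiveBase) o).key);
    }

    @Override
    public int hashCode() {
        // Derived from the same key as equals, so "a" and "A" are equal
        // and also share a hash code, as the equals/hashCode contract requires.
        return key.hashCode();
    }

    public static void main(String[] args) {
        Map<CaseInsensitiveBase, Integer> scores = new HashMap<>();
        scores.put(new CaseInsensitiveBase("A"), 5);
        System.out.println(scores.get(new CaseInsensitiveBase("a"))); // prints 5
    }
}
```

Note the trade-off: equals can no longer distinguish a masked base from an unmasked one, so any code that needs the mask must read getBase() explicitly.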
>
> A
>
>
> On Wed, Nov 20, 2013 at 8:22 AM, Michael Heuer <heuermh at gmail.com> wrote:
>>
>> Hello Andreas, David
>>
>> Lower case is the convention for soft-masking sequences from alignment
>>
>> http://www.ncbi.nlm.nih.gov/books/NBK1763/
>>
>> http://www.ncbi.nlm.nih.gov/books/NBK1763/#CmdLineAppsManual.Create_a_masked_BLAST
>>
>> If we are using this convention, perhaps it should be more clearly
>> documented.  What happens if you use mixed case?
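The following is a minimal sketch of what the soft-masking convention implies for a mixed-case sequence. It assumes a plain Java string rather than a BioJava DNASequence: each run of lower-case letters is one masked interval. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class SoftMaskIntervals {

    /**
     * Returns half-open [start, end) intervals covering each run of
     * lower-case (soft-masked) letters. Hypothetical helper, not BioJava.
     */
    static List<int[]> maskedIntervals(String seq) {
        List<int[]> intervals = new ArrayList<>();
        int start = -1; // -1 means "not currently inside a masked run"
        for (int i = 0; i < seq.length(); i++) {
            boolean masked = Character.isLowerCase(seq.charAt(i));
            if (masked && start < 0) {
                start = i;                            // a masked run begins
            } else if (!masked && start >= 0) {
                intervals.add(new int[] {start, i});  // the run ended before i
                start = -1;
            }
        }
        if (start >= 0) {
            intervals.add(new int[] {start, seq.length()}); // run reaches the end
        }
        return intervals;
    }

    public static void main(String[] args) {
        for (int[] iv : maskedIntervals("ACGTacgtACnnGT")) {
            System.out.println(iv[0] + "-" + iv[1]); // prints 4-8, then 10-12
        }
    }
}
```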
>>
>>    michael
>>
>>
>> On Wed, Nov 20, 2013 at 5:29 AM, Andreas Prlic <andreas at sdsc.edu> wrote:
>> > Hi David,
>> >
>> > Not sure if we should consider this a bug or a feature. It should be
>> > easy to work around this by calling toUpperCase() on your strings. We
>> > could of course convert all nucleotides to upper case internally, but
>> > that would remove the possibility for people to use mixed upper- and
>> > lower-case sequences to represent, e.g., alignment conservation.
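This is a sketch of a middle ground between the two options. It assumes plain strings are normalized before a DNASequence is built, and MaskedSequence is a hypothetical helper, not a BioJava class. The residues are upper-cased for scoring, but the positions that were lower case are recorded, so the mask or conservation information survives. The Java method is String.toUpperCase; passing Locale.ROOT avoids locale-specific case rules.

```java
import java.util.Locale;

/** Hypothetical holder: upper-cased residues plus the original case mask. */
final class MaskedSequence {
    final String residues;  // upper case, safe to score against an upper-case matrix
    final boolean[] masked; // true where the input letter was lower case

    // Assumes ASCII input (IUPAC nucleotide codes), for which upper-casing
    // preserves the string length.
    MaskedSequence(String raw) {
        this.residues = raw.toUpperCase(Locale.ROOT);
        this.masked = new boolean[raw.length()];
        for (int i = 0; i < raw.length(); i++) {
            masked[i] = Character.isLowerCase(raw.charAt(i));
        }
    }

    /** Rebuilds the original mixed-case string from residues and mask. */
    String restore() {
        StringBuilder sb = new StringBuilder(residues.length());
        for (int i = 0; i < residues.length(); i++) {
            char c = residues.charAt(i);
            sb.append(masked[i] ? Character.toLowerCase(c) : c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        MaskedSequence s = new MaskedSequence("ACGTacgtAC");
        System.out.println(s.residues);  // prints ACGTACGTAC
        System.out.println(s.restore()); // prints ACGTacgtAC
    }
}
```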
>> >
>> > Any opinions from other people on this? Is anybody using mixed-case
>> > sequences?
>> >
>> > Andreas
>> >
>> >
>> > On Mon, Nov 18, 2013 at 11:43 AM, Waring, David A <dwaring at fhcrc.org> wrote:
>> >
>> >>
>> >> There seems to be a bug in the alignment package: if DNA sequences are
>> >> created using lower-case letters, the alignment methods don't work. It
>> >> looks like the default substitution matrix is coded in upper case, while
>> >> the alignment uses the underlying case of the DNA sequence. Seems like a
>> >> bug to me.
>> >>
>> >> This problem occurs whether the DNASequence is created with the
>> >> DNASequence constructor or read from a FASTA file that is in lower case.
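The failure mode described above can be reproduced without BioJava. This plain-Java sketch uses a toy substitution table keyed by upper-case letters. The class name and the +5/-4 scores are illustrative assumptions, not BioJava's internals or a claim about the NUC.4.4 file. A case-sensitive lookup finds no entry for lower-case input, while folding case at lookup time does.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Toy substitution table keyed by upper-case base pairs; illustrative only. */
final class ToyMatrix {
    private final Map<String, Integer> scores = new HashMap<>();

    ToyMatrix() {
        String bases = "ACGT";
        for (char a : bases.toCharArray()) {
            for (char b : bases.toCharArray()) {
                scores.put("" + a + b, a == b ? 5 : -4); // match / mismatch
            }
        }
    }

    /** Case-sensitive lookup: returns null for any lower-case base. */
    Integer scoreExact(char a, char b) {
        return scores.get("" + a + b);
    }

    /** Case-folding lookup: treats 'a' and 'A' as the same base. */
    Integer scoreFolded(char a, char b) {
        return scores.get(("" + a + b).toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        ToyMatrix m = new ToyMatrix();
        System.out.println(m.scoreExact('A', 'A'));  // prints 5
        System.out.println(m.scoreExact('a', 'A'));  // prints null
        System.out.println(m.scoreFolded('a', 'A')); // prints 5
    }
}
```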
>> >>
>> >>
>> >> The code below shows the problem.
>> >>
>> >>
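>> >>     // Imports assume the BioJava 3.x package layout (org.biojava3.*);
>> >>     // the class name is illustrative.
>> >>     import org.biojava3.alignment.Alignments;
>> >>     import org.biojava3.alignment.SimpleGapPenalty;
>> >>     import org.biojava3.alignment.SubstitutionMatrixHelper;
>> >>     import org.biojava3.alignment.template.SequencePair;
>> >>     import org.biojava3.alignment.template.SubstitutionMatrix;
>> >>     import org.biojava3.core.sequence.DNASequence;
>> >>     import org.biojava3.core.sequence.compound.NucleotideCompound;
>> >>     import org.biojava3.core.sequence.template.Sequence;
>> >>
>> >>     public class CaseSensitivityTest {
>> >>
>> >>         static SimpleGapPenalty gapP;
>> >>         static SubstitutionMatrix<NucleotideCompound> matrix;
>> >>
>> >>         public static void main(String[] args) throws Exception {
>> >>             // Default nucleotide matrix (NUC.4.4); its entries are upper case
>> >>             matrix = SubstitutionMatrixHelper.getNuc4_4();
>> >>             gapP = new SimpleGapPenalty();
>> >>             gapP.setOpenPenalty((short) 5);
>> >>             gapP.setExtensionPenalty((short) 2);
>> >>             testHardcoded();
>> >>         }
>> >>
>> >>         public static void testHardcoded() throws Exception {
>> >>             // The same two sequences, once in upper case and once in lower case
>> >>             Sequence<NucleotideCompound> seq1 = new DNASequence("AGGGCTTTACCCCGGTTAA");
>> >>             Sequence<NucleotideCompound> seq2 = new DNASequence("ACCCCGGTTTAATATTTTT");
>> >>             Sequence<NucleotideCompound> seq3 = new DNASequence("agggctttaccccggttaa");
>> >>             Sequence<NucleotideCompound> seq4 = new DNASequence("accccggtttaatattttt");
>> >>             alignPair(seq1, seq2); // upper vs upper
>> >>             alignPair(seq1, seq4); // upper vs lower
>> >>             alignPair(seq3, seq4); // lower vs lower
>> >>         }
>> >>
>> >>         public static void alignPair(Sequence<NucleotideCompound> seq1,
>> >>                 Sequence<NucleotideCompound> seq2) {
>> >>             // Global pairwise alignment using the shared gap penalty and matrix
>> >>             SequencePair<Sequence<NucleotideCompound>, NucleotideCompound> pair =
>> >>                     Alignments.getPairwiseAlignment(seq1, seq2,
>> >>                             Alignments.PairwiseSequenceAlignerType.GLOBAL,
>> >>                             gapP, matrix);
>> >>             System.out.println(pair);
>> >>         }
>> >>     }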
>> >>
>> >>
>> >>
>> >> _______________________________________________
>> >> biojava-dev mailing list
>> >> biojava-dev at lists.open-bio.org
>> >> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>> >>
>
>
>


