[Biojava-dev] Case sensitivity in Alignment

Michael Heuer heuermh at gmail.com
Wed Nov 20 16:22:06 UTC 2013


Hello Andreas, David

Lower case is the conventional way to soft-mask regions of a sequence in
alignment tools; see the BLAST documentation:

http://www.ncbi.nlm.nih.gov/books/NBK1763/
http://www.ncbi.nlm.nih.gov/books/NBK1763/#CmdLineAppsManual.Create_a_masked_BLAST

If we are using this convention, perhaps it should be more clearly
documented.  What happens if you use mixed case?
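
For anyone who wants to keep the soft-masking information and still align
against an upper-case matrix, here is a minimal sketch of one way to do it
in plain Java. It is not BioJava API: the class SoftMaskSketch and the
methods normalize and maskedPositions are hypothetical names made up for
illustration. The idea is to record which positions were lower case, then
upper-case the string before passing it on (for example to the DNASequence
constructor used in the quoted example).

```java
import java.util.BitSet;
import java.util.Locale;

// Hypothetical helper, not part of BioJava.
final class SoftMaskSketch {

    /** Upper-case a nucleotide string; Locale.ROOT avoids locale-specific case rules. */
    static String normalize(String seq) {
        return seq.toUpperCase(Locale.ROOT);
    }

    /** Record which positions were lower case (soft-masked) in the input. */
    static BitSet maskedPositions(String seq) {
        BitSet masked = new BitSet(seq.length());
        for (int i = 0; i < seq.length(); i++) {
            if (Character.isLowerCase(seq.charAt(i))) {
                masked.set(i);
            }
        }
        return masked;
    }

    public static void main(String[] args) {
        String raw = "AGGGctttACCCCGGTTAA";   // mixed case: "cttt" is soft-masked
        BitSet masked = maskedPositions(raw);
        String upper = normalize(raw);
        System.out.println(upper);            // prints AGGGCTTTACCCCGGTTAA
        System.out.println(masked);           // prints {4, 5, 6, 7}
    }
}
```

Keeping the mask separate from the sequence means the case of the input no
longer affects scoring, while the masked positions stay available to the
caller.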

   michael


On Wed, Nov 20, 2013 at 5:29 AM, Andreas Prlic <andreas at sdsc.edu> wrote:
> Hi David,
>
> not sure if we should consider this a bug or a feature: It should be easy
> to work around this by calling toUpperCase() on your strings. We could of
> course internally convert all nucleotides to upper case, but that would
> remove the possibility for people to use mixed upper case and lower case
> sequences to represent e.g. alignment conservation.
>
> Any opinions by other people on this? Is anybody using mixed case sequences?
>
> Andreas
>
>
> On Mon, Nov 18, 2013 at 11:43 AM, Waring, David A <dwaring at fhcrc.org> wrote:
>
>>
>> There seems to be a bug in the alignment package. If DNA sequences are
>> created using lower case letters, the alignment methods don't work. Looks
>> like the default substitution matrix is coded in upper case, and the
>> underlying case of the DNA sequence is being used in the alignment. Seems
>> like a bug to me.
>>
>> This problem occurs when the DNA sequence is created either using the
>> DNASequence constructor or by reading from a FASTA file that is in lower case.
>>
>>
>> The code below shows the problem.
>>
>>
>>     static SimpleGapPenalty gapP;
>>     static SubstitutionMatrix<NucleotideCompound> matrix;
>>
>>     public static void main(String[] args) throws Exception {
>>         matrix = SubstitutionMatrixHelper.getNuc4_4();
>>         gapP = new SimpleGapPenalty();
>>         gapP.setOpenPenalty((short) 5);
>>         gapP.setExtensionPenalty((short) 2);
>>         testHardcoded();
>>     }
>>
>>     public static void testHardcoded() throws Exception {
>>         // The same two sequences in upper case (seq1, seq2) and lower case (seq3, seq4).
>>         Sequence<NucleotideCompound> seq1 = new DNASequence("AGGGCTTTACCCCGGTTAA");
>>         Sequence<NucleotideCompound> seq2 = new DNASequence("ACCCCGGTTTAATATTTTT");
>>         Sequence<NucleotideCompound> seq3 = new DNASequence("agggctttaccccggttaa");
>>         Sequence<NucleotideCompound> seq4 = new DNASequence("accccggtttaatattttt");
>>         alignPair(seq1, seq2); // upper vs. upper
>>         alignPair(seq1, seq4); // upper vs. lower
>>         alignPair(seq3, seq4); // lower vs. lower
>>     }
>>
>>     public static void alignPair(Sequence<NucleotideCompound> seq1,
>>             Sequence<NucleotideCompound> seq2) {
>>         SequencePair<Sequence<NucleotideCompound>, NucleotideCompound> pair =
>>                 Alignments.getPairwiseAlignment(seq1, seq2,
>>                         Alignments.PairwiseSequenceAlignerType.GLOBAL, gapP, matrix);
>>
>>         System.out.printf("%s", pair);
>>         System.out.println();
>>     }
>>
>>
>>
>> _______________________________________________
>> biojava-dev mailing list
>> biojava-dev at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>