[Biojava-dev] Case sensitivity in Alignment
Michael Heuer
heuermh at gmail.com
Wed Nov 20 16:22:06 UTC 2013
Hello Andreas, David,

Lower case is the convention for soft-masking regions of a sequence, for
example in masked BLAST databases:
http://www.ncbi.nlm.nih.gov/books/NBK1763/
http://www.ncbi.nlm.nih.gov/books/NBK1763/#CmdLineAppsManual.Create_a_masked_BLAST
If we are using this convention, perhaps it should be more clearly
documented. What happens if you use mixed case?
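
A minimal sketch of one way to combine the upper-casing workaround Andreas
suggests in the quoted message with keeping the soft-mask information: record
which positions were lower case before normalizing, then pass only the
upper-case string to the aligner. SoftMask and its method names are
hypothetical and not part of BioJava; the sketch uses only java.util.

```java
import java.util.BitSet;
import java.util.Locale;

/** Hypothetical helper for illustration; not part of BioJava. */
final class SoftMask {

    /** Returns the upper-case form of the sequence string. */
    static String normalize(String sequence) {
        // Locale.ROOT avoids locale-specific case rules (e.g. the Turkish dotless i).
        return sequence.toUpperCase(Locale.ROOT);
    }

    /** Returns the positions that were lower case (soft-masked) in the input. */
    static BitSet maskedPositions(String sequence) {
        BitSet masked = new BitSet(sequence.length());
        for (int i = 0; i < sequence.length(); i++) {
            if (Character.isLowerCase(sequence.charAt(i))) {
                masked.set(i);
            }
        }
        return masked;
    }

    /** Lower-cases the masked positions again; the mask must fit the string. */
    static String applyMask(String upper, BitSet masked) {
        StringBuilder sb = new StringBuilder(upper);
        for (int i = masked.nextSetBit(0); i >= 0; i = masked.nextSetBit(i + 1)) {
            sb.setCharAt(i, Character.toLowerCase(upper.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String raw = "AGGGctttACCCCGGTTAA";        // mixed-case input
        BitSet masked = maskedPositions(raw);       // remember the soft mask
        String upper = normalize(raw);              // string to hand to the aligner
        System.out.println(upper);
        System.out.println(applyMask(upper, masked));
    }
}
```

The normalized string could then be passed to the DNASequence constructor as
in David's example. Note that the recorded positions refer to the ungapped
sequence, so they would need to be shifted before being applied to an aligned
row that contains gaps.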
michael
On Wed, Nov 20, 2013 at 5:29 AM, Andreas Prlic <andreas at sdsc.edu> wrote:
> Hi David,
>
> Not sure if we should consider this a bug or a feature. It should be easy
> to work around this by calling toUpperCase() on your strings. We could of
> course internally convert all nucleotides to upper case, but that would
> remove the possibility for people to use mixed upper-case and lower-case
> sequences to represent, e.g., alignment conservation.
>
> Any opinions from other people on this? Is anybody using mixed-case sequences?
>
> Andreas
>
>
> On Mon, Nov 18, 2013 at 11:43 AM, Waring, David A <dwaring at fhcrc.org> wrote:
>
>>
>> There seems to be a bug in the alignment package. If DNA sequences are
>> created using lower-case letters, the alignment methods don't work. It looks
>> like the default substitution matrix is coded in upper case, and the
>> underlying case of the DNA sequence is being used in the alignment. Seems
>> like a bug to me.
>>
>> This problem occurs when the DNA sequence is created either using the
>> DNASequence constructor or by reading from a FASTA file that is in lower case.
>>
>>
>> The code below shows the problem.
>>
>>
>> static SimpleGapPenalty gapP;
>> static SubstitutionMatrix<NucleotideCompound> matrix;
>>
>> public static void main(String[] args) throws Exception {
>>     matrix = SubstitutionMatrixHelper.getNuc4_4();
>>     gapP = new SimpleGapPenalty();
>>     gapP.setOpenPenalty((short) 5);
>>     gapP.setExtensionPenalty((short) 2);
>>     testHardcoded();
>> }
>>
>> public static void testHardcoded() throws Exception {
>>     Sequence<NucleotideCompound> seq1 = new DNASequence("AGGGCTTTACCCCGGTTAA");
>>     Sequence<NucleotideCompound> seq2 = new DNASequence("ACCCCGGTTTAATATTTTT");
>>     Sequence<NucleotideCompound> seq3 = new DNASequence("agggctttaccccggttaa");
>>     Sequence<NucleotideCompound> seq4 = new DNASequence("accccggtttaatattttt");
>>     alignPair(seq1, seq2);
>>     alignPair(seq1, seq4);
>>     alignPair(seq3, seq4);
>> }
>>
>> public static void alignPair(Sequence<NucleotideCompound> seq1,
>>                              Sequence<NucleotideCompound> seq2) {
>>     SequencePair<Sequence<NucleotideCompound>, NucleotideCompound> pair =
>>         Alignments.getPairwiseAlignment(seq1, seq2,
>>             Alignments.PairwiseSequenceAlignerType.GLOBAL,
>>             gapP, matrix);
>>
>>     System.out.printf("%s", pair);
>>     System.out.println();
>> }
>>
>>
>>
>> _______________________________________________
>> biojava-dev mailing list
>> biojava-dev at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>