[Biojava-l] Setting anchors in AnchoredPairwiseSequenceAligner
Andreas Prlic
andreas at sdsc.edu
Wed Nov 27 04:34:15 UTC 2013
Needleman-Wunsch performs a global alignment; try Smith-Waterman instead.
Can you file a bug report on GitHub?
https://github.com/biojava/biojava/issues
Thanks,
Andreas
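
For context on the global-versus-local distinction above, here is a minimal, self-contained sketch in plain Java (it does not use the BioJava API). It scores the two sequences from the test cases below both ways. The class name GlobalVsLocal, the match/mismatch scores, and the linear gap penalty are illustrative assumptions; the test cases use an affine SimpleGapPenalty and the nuc4_4 matrix, so the actual BioJava scores will differ.

```java
final class GlobalVsLocal {
    // Illustrative scores (assumptions, not BioJava defaults).
    static final int MATCH = 1, MISMATCH = -1, GAP = -2;

    // Substitution score for query base i and target base j (both 1-based).
    static int sub(String q, String t, int i, int j) {
        return q.charAt(i - 1) == t.charAt(j - 1) ? MATCH : MISMATCH;
    }

    // Needleman-Wunsch score: both sequences are aligned end to end,
    // so every unmatched flanking base is paid for as a gap.
    static int globalScore(String q, String t) {
        int[][] s = new int[q.length() + 1][t.length() + 1];
        for (int i = 1; i <= q.length(); i++) s[i][0] = i * GAP;
        for (int j = 1; j <= t.length(); j++) s[0][j] = j * GAP;
        for (int i = 1; i <= q.length(); i++) {
            for (int j = 1; j <= t.length(); j++) {
                s[i][j] = Math.max(s[i - 1][j - 1] + sub(q, t, i, j),
                        Math.max(s[i - 1][j], s[i][j - 1]) + GAP);
            }
        }
        return s[q.length()][t.length()];
    }

    // Smith-Waterman score: cell values are floored at zero and the best
    // cell anywhere in the matrix wins, so flanks cost nothing.
    static int localScore(String q, String t) {
        int[][] s = new int[q.length() + 1][t.length() + 1];
        int best = 0;
        for (int i = 1; i <= q.length(); i++) {
            for (int j = 1; j <= t.length(); j++) {
                int v = Math.max(s[i - 1][j - 1] + sub(q, t, i, j),
                        Math.max(s[i - 1][j], s[i][j - 1]) + GAP);
                s[i][j] = Math.max(0, v);
                best = Math.max(best, s[i][j]);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String query = "ACGTAACCGGTT", target = "AACGTAACCGGTTACGTACGT";
        System.out.println("global = " + globalScore(query, target));
        System.out.println("local  = " + localScore(query, target));
    }
}
```

With these assumed scores the local score is 12, because the query occurs verbatim at target positions 2 to 13, while the global score is -6, because the nine unmatched target bases each cost a gap.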
On Tue, Nov 26, 2013 at 7:54 PM, Daniel Cameron <cameron.d at wehi.edu.au> wrote:
> That API does make sense, but unfortunately it crashes when I try it.
> Having traced through the source & test cases from github, it appears that
> there isn't actually any test coverage for the (anchor != null) path in
> align().
>
> Additionally, align() explicitly forces the final base of the query
> to align to the final base of the target, which in my case is not supposed
> to be an anchor. I've copied my test cases with expected behaviour. Is
> there a bug tracking system where I should raise this?
>
> Cheers
> Daniel
>
> @Test
> public void testNeedlemanWunschDNAAlignment() {
>     DNASequence query = new DNASequence("ACGTAACCGGTT",
>             AmbiguityDNACompoundSet.getDNACompoundSet());
>     DNASequence target = new DNASequence("AACGTAACCGGTTACGTACGT",
>             AmbiguityDNACompoundSet.getDNACompoundSet());
>     //                                    123456789012345678901
>     // Expected:                           |----------|
>     NeedlemanWunsch<DNASequence, NucleotideCompound> nw =
>             new NeedlemanWunsch<DNASequence, NucleotideCompound>(query, target,
>                     new SimpleGapPenalty((short) 5, (short) 2),
>                     SubstitutionMatrixHelper.getNuc4_4());
>     AlignedSequence<DNASequence, NucleotideCompound> aligned =
>             nw.getPair().getQuery();
>     assertEquals(2, (int) aligned.getStart().getPosition());
>     assertEquals(13, (int) aligned.getEnd().getPosition());
> }
> @Test
> public void testAnchoredDNAAlignment() {
>     DNASequence query = new DNASequence("ACGTAACCGGTT",
>             AmbiguityDNACompoundSet.getDNACompoundSet());
>     DNASequence target = new DNASequence("AACGTAACCGGTTACGTACGT",
>             AmbiguityDNACompoundSet.getDNACompoundSet());
>     //                                    123456789012345678901
>     // Expected:                          |_----------|
>     AnchoredPairwiseSequenceAligner<DNASequence, NucleotideCompound> aligner =
>             new AnchoredPairwiseSequenceAligner<DNASequence, NucleotideCompound>(
>                     query, target, new SimpleGapPenalty((short) 5, (short) 2),
>                     SubstitutionMatrixHelper.getNuc4_4());
>     int[] anchors = new int[query.getLength()];
>     for (int i = 0; i < anchors.length; i++) anchors[i] = -1; // intended: unanchored
>     anchors[0] = 0; // intended: first query base anchored to first target base
>     // Note: as posted, this snippet never passes anchors to the aligner.
>     AlignedSequence<DNASequence, NucleotideCompound> aligned =
>             aligner.getPair().getQuery();
>     assertEquals(1, (int) aligned.getStart().getPosition());
>     assertEquals(13, (int) aligned.getEnd().getPosition());
> }
>
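
The expected span in the first test case (target positions 2 to 13, with no cost for the unmatched target flanks) corresponds to what is often called a semi-global alignment: the query is aligned end to end, but leading and trailing target bases are free. The following is a minimal sketch of that idea in plain Java, not BioJava code. The class name SemiGlobalSpan, the scores, and the linear gap penalty are assumptions made for illustration.

```java
final class SemiGlobalSpan {
    // Illustrative scores (assumptions, not BioJava defaults).
    static final int MATCH = 1, MISMATCH = -1, GAP = -2;

    /**
     * Returns {start, end}: the 1-based, inclusive target span covered by the
     * best end-to-end placement of the query. If the best path consumes no
     * target bases at all, start will be end + 1.
     */
    static int[] targetSpan(String query, String target) {
        int n = query.length(), m = target.length();
        int[][] score = new int[n + 1][m + 1];
        int[][] origin = new int[n + 1][m + 1]; // row-0 column where the path began

        // Row 0 stays at score 0: skipping leading target bases is free.
        for (int j = 0; j <= m; j++) origin[0][j] = j;

        for (int i = 1; i <= n; i++) {
            score[i][0] = i * GAP; // query bases before the target starts are gaps
            for (int j = 1; j <= m; j++) {
                int sub = query.charAt(i - 1) == target.charAt(j - 1) ? MATCH : MISMATCH;
                int diag = score[i - 1][j - 1] + sub;
                int up = score[i - 1][j] + GAP;
                int left = score[i][j - 1] + GAP;
                if (diag >= up && diag >= left) {
                    score[i][j] = diag;
                    origin[i][j] = origin[i - 1][j - 1];
                } else if (up >= left) {
                    score[i][j] = up;
                    origin[i][j] = origin[i - 1][j];
                } else {
                    score[i][j] = left;
                    origin[i][j] = origin[i][j - 1];
                }
            }
        }

        // Trailing target bases are free: take the best cell in the last row.
        int best = 0;
        for (int j = 1; j <= m; j++) {
            if (score[n][j] > score[n][best]) best = j;
        }
        return new int[] { origin[n][best] + 1, best };
    }

    public static void main(String[] args) {
        int[] span = targetSpan("ACGTAACCGGTT", "AACGTAACCGGTTACGTACGT");
        System.out.println(span[0] + ".." + span[1]);
    }
}
```

For the sequences in the test cases this reports the span 2..13, matching the expectation in testNeedlemanWunschDNAAlignment. A strict global alignment would instead charge for the target bases outside that span.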
> On 27/11/2013 12:00 PM, Andreas Prlic wrote:
>
> Hi Daniel,
>
> Clearly we need some documentation for this.
>
> Looking at the source: if you get to AbstractMatrixAligner.align() you
> can see how the anchors are being used.
>
> I just took a brief look, so I might be wrong, but I think the int[]
> array should have the length of the query sequence, with each element
> giving the position in the target sequence that the corresponding query
> position is anchored to. Elements that are <=0 should be considered not
> anchored. If this is right, then this should be very close to what you
> were expecting.
>
> Can you take a closer look and confirm if this works for you?
>
> Thanks,
>
> Andreas
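
The array layout described above can be sketched as follows, using the example from the original question (query base 4 to target base 10, query base 100 to target base 80). This is plain Java, not BioJava code. The class AnchorArraySketch and the method fromPairs are hypothetical helpers, and the conventions used (1-based positions, 0 meaning "not anchored") follow the tentative description above. They are not confirmed against the BioJava source, and the test cases in this thread use -1 and 0-based values instead. The check that anchors do not cross is likewise an assumption about what a single alignment path requires.

```java
final class AnchorArraySketch {
    /**
     * Builds a per-query-position anchor array from {queryPos, targetPos} pairs.
     * Assumed conventions (not verified against BioJava): positions are 1-based,
     * element i-1 holds the target position for query base i, 0 means unanchored.
     */
    static int[] fromPairs(int queryLength, int targetLength, int[][] pairs) {
        int[] anchors = new int[queryLength]; // all zeros: nothing anchored yet
        for (int[] p : pairs) {
            int q = p[0], t = p[1];
            if (q < 1 || q > queryLength || t < 1 || t > targetLength) {
                throw new IllegalArgumentException("anchor out of range: " + q + " -> " + t);
            }
            anchors[q - 1] = t;
        }
        // Assumption: anchors must be strictly increasing in the target,
        // otherwise no single alignment path can pass through all of them.
        int lastTarget = 0;
        for (int t : anchors) {
            if (t == 0) continue;
            if (t <= lastTarget) {
                throw new IllegalArgumentException("anchors cross at target position " + t);
            }
            lastTarget = t;
        }
        return anchors;
    }

    public static void main(String[] args) {
        // Sequence lengths of 120 and 90 are made up for the example.
        int[] anchors = fromPairs(120, 90, new int[][] { {4, 10}, {100, 80} });
        System.out.println("query 4 -> target " + anchors[3]);
        System.out.println("query 100 -> target " + anchors[99]);
    }
}
```

Whether such an array can be handed to AnchoredPairwiseSequenceAligner unchanged depends on the sentinel and indexing the library actually uses, which the thread leaves open.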
>
> On Sun, Nov 24, 2013 at 11:22 PM, Daniel Cameron <cameron.d at wehi.edu.au> wrote:
>
>> Hello all
>>
>> I'm looking at the BioJava API for anchored alignment and I'm unsure of how
>> to set alignment anchors. The API exposes int[] get/setAnchor(), which is
>> confusing me somewhat, and I'm unsure of what a BioJava anchor actually is.
>> My use case is the comparison of two sequences for which I have a priori
>> knowledge of particular positions, so I was expecting an API where I could
>> specify base positions in the query to correspond to different positions in
>> the target. E.g., I want query base 4 to align to target base 10 and query
>> base 100 to align to target base 80. Is this possible?
>>
>> With the anchor being only an int[], and the source suggesting that it is
>> not an array of paired values, how would one go about performing the above
>> alignment?
>>
>>
>> Thanks
>> Daniel Cameron
>>
>> ______________________________________________________________________
>> The information in this email is confidential and intended solely for the
>> addressee.
>> You must not disclose, forward, print or use it without the permission of
>> the sender.
>> ______________________________________________________________________
>> _______________________________________________
>> Biojava-l mailing list - Biojava-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>
>