[Biojava-l] Setting anchors in AnchoredPairwiseSequenceAligner
Daniel Cameron
cameron.d at wehi.EDU.AU
Wed Nov 27 05:28:17 UTC 2013
I was using the (passing) NW test as a baseline to compare the (failing)
anchored test to so as to make sure I was using the BioJava API
correctly. In my particular use case, I want to perform global alignment
with only the first base anchored but unfortunately, NW does not expose
get/set to anchor (and the meaning of anchoring a base for SW/local
alignment is a bit ambiguous).
A bug has been raised with a test case for ease of reproduction.
Cheers
Daniel
On 27/11/2013 3:34 PM, Andreas Prlic wrote:
> needleman -wunsch is doing a global alignment, try smith waterman
> instead.
>
> can you file a bug report on github?
> https://github.com/biojava/biojava/issues
>
> Thanks,
>
> Andreas
>
>
> On Tue, Nov 26, 2013 at 7:54 PM, Daniel Cameron <cameron.d at wehi.edu.au
> <mailto:cameron.d at wehi.edu.au>> wrote:
>
> That API does make sense but unfortunately, it crashes when I try
> it. Having traced through the source & test cases from github, it
> appears that there isn't actually any test coverage for the
> (anchor != null) path in align().
>
> Additionally, align() also explicitly forces the final base of the
> query to align to the final base of the target which in my case is
> not supposed to be an anchor. I've copied my test cases with
> expected behaviour. Is there a bug track system I should be
> raising this with?
>
> Cheers
> Daniel
>
> @Test
> public void testNeedlemanWunschDNAAlignment() {
> DNASequence query = new DNASequence("ACGTAACCGGTT",
> AmbiguityDNACompoundSet.getDNACompoundSet());
> DNASequence target = new
> DNASequence("AACGTAACCGGTTACGTACGT",
> AmbiguityDNACompoundSet.getDNACompoundSet());
> // 123456789012345678900
> // Expected: |----------|
> NeedlemanWunsch<DNASequence, NucleotideCompound> nw = new
> NeedlemanWunsch<DNASequence, NucleotideCompound>(query, target,
> new SimpleGapPenalty((short)5, (short)2),
> SubstitutionMatrixHelper.getNuc4_4());
> AlignedSequence<DNASequence, NucleotideCompound> aligned =
> nw.getPair().getQuery();
> assertEquals(2, (int)aligned.getStart().getPosition());
> assertEquals(13, (int)aligned.getEnd().getPosition());
> }
> @Test
> public void testAnchoredDNAAlignment() {
> DNASequence query = new DNASequence("ACGTAACCGGTT",
> AmbiguityDNACompoundSet.getDNACompoundSet());
> DNASequence target = new
> DNASequence("AACGTAACCGGTTACGTACGT",
> AmbiguityDNACompoundSet.getDNACompoundSet());
> // 123456789012345678900
> // Expected: |_----------|
> AnchoredPairwiseSequenceAligner<DNASequence, NucleotideCompound>
> aligner = new AnchoredPairwiseSequenceAligner<DNASequence,
> NucleotideCompound>(query, target, new SimpleGapPenalty((short)5,
> (short)2), SubstitutionMatrixHelper.getNuc4_4());
> int[] anchors = new int[query.getLength()];
> for (int i = 0; i < anchors.length; i++) anchors[i] = -1;
> anchors[0] = 0;
> AlignedSequence<DNASequence, NucleotideCompound> aligned =
> aligner.getPair().getQuery();
> assertEquals(1, (int)aligned.getStart().getPosition());
> assertEquals(13, (int)aligned.getEnd().getPosition());
>
> }
>
> On 27/11/2013 12:00 PM, Andreas Prlic wrote:
>> Hi Daniel,
>>
>> Clearly we need some documentation for this.
>>
>> Looking at the source: if you get to
>> AbstractMatrixAligner.align() you can see how the anchors are
>> being used.
>>
>> I just took a brief look, so I might be wrong, but I think the
>> int[] array should have the length of the query sequence and each
>> position indicates the counterpart in the target sequence.
>> Positions that are <=0 should be considered not to be anchored.
>> If this is right, then this should be very close to what you were
>> expecting.
>>
>> Can you take a closer look and confirm if this works for you?
>>
>> Thanks,
>>
>> Andreas
>>
>> On Sun, Nov 24, 2013 at 11:22 PM, Daniel Cameron
>> <cameron.d at wehi.edu.au <mailto:cameron.d at wehi.edu.au>> wrote:
>>
>> Hello all
>>
>> I'm looking the BioJava API for anchored alignment and I'm
>> unsure of how to set alignment anchors. The API exposes int[]
>> get/setAnchor() which is confusing me somewhat and I'm unsure
>> of what a BioJava anchor actually is. My use case is the
>> comparison of two sequences which I have a priori knowledge
>> of particular positions so I was expecting an API where I
>> could specify base positions in the query to correspond
>> different positions in the target. Eg: I was query base 4 to
>> align to target base 10 and query base 100 to align to target
>> base 80. Is this possible?
>>
>> With anchor being only an int[] and the source looking like
>> this is not an array of paired values, how would one go about
>> performing the above alignment?
>>
>>
>> Thanks
>> Daniel Cameron
>>
>> ______________________________________________________________________
>> The information in this email is confidential and intended
>> solely for the addressee.
>> You must not disclose, forward, print or use it without the
>> permission of the sender.
>> ______________________________________________________________________
>> _______________________________________________
>> Biojava-l mailing list - Biojava-l at lists.open-bio.org
>> <mailto:Biojava-l at lists.open-bio.org>
>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>
>
> ______________________________________________________________________
> The information in this email is confidential and intended solely
> for the addressee.
> You must not disclose, forward, print or use it without the
> permission of the sender.
> ______________________________________________________________________
>
>
>
>
______________________________________________________________________
The information in this email is confidential and intended solely for the addressee.
You must not disclose, forward, print or use it without the permission of the sender.
______________________________________________________________________
More information about the Biojava-l
mailing list