[Biojava-l] Setting anchors in AnchoredPairwiseSequenceAligner
Daniel Cameron
cameron.d at wehi.EDU.AU
Wed Nov 27 03:54:56 UTC 2013
That API does make sense but unfortunately, it crashes when I try it.
Having traced through the source & test cases from github, it appears
that there isn't actually any test coverage for the (anchor != null)
path in align().
Additionally, align() also explicitly forces the final base of the query
to align to the final base of the target which in my case is not
supposed to be an anchor. I've copied my test cases with expected
behaviour. Is there a bug track system I should be raising this with?
Cheers
Daniel
@Test
public void testNeedlemanWunschDNAAlignment() {
DNASequence query = new DNASequence("ACGTAACCGGTT",
AmbiguityDNACompoundSet.getDNACompoundSet());
DNASequence target = new DNASequence("AACGTAACCGGTTACGTACGT",
AmbiguityDNACompoundSet.getDNACompoundSet());
// 123456789012345678900
// Expected: |----------|
NeedlemanWunsch<DNASequence, NucleotideCompound> nw = new
NeedlemanWunsch<DNASequence, NucleotideCompound>(query, target, new
SimpleGapPenalty((short)5, (short)2), SubstitutionMatrixHelper.getNuc4_4());
AlignedSequence<DNASequence, NucleotideCompound> aligned =
nw.getPair().getQuery();
assertEquals(2, (int)aligned.getStart().getPosition());
assertEquals(13, (int)aligned.getEnd().getPosition());
}
@Test
public void testAnchoredDNAAlignment() {
DNASequence query = new DNASequence("ACGTAACCGGTT",
AmbiguityDNACompoundSet.getDNACompoundSet());
DNASequence target = new DNASequence("AACGTAACCGGTTACGTACGT",
AmbiguityDNACompoundSet.getDNACompoundSet());
// 123456789012345678900
// Expected: |_----------|
AnchoredPairwiseSequenceAligner<DNASequence,
NucleotideCompound> aligner = new
AnchoredPairwiseSequenceAligner<DNASequence, NucleotideCompound>(query,
target, new SimpleGapPenalty((short)5, (short)2),
SubstitutionMatrixHelper.getNuc4_4());
int[] anchors = new int[query.getLength()];
for (int i = 0; i < anchors.length; i++) anchors[i] = -1;
anchors[0] = 0;
AlignedSequence<DNASequence, NucleotideCompound> aligned =
aligner.getPair().getQuery();
assertEquals(1, (int)aligned.getStart().getPosition());
assertEquals(13, (int)aligned.getEnd().getPosition());
}
On 27/11/2013 12:00 PM, Andreas Prlic wrote:
> Hi Daniel,
>
> Clearly we need some documentation for this.
>
> Looking at the source: if you get to AbstractMatrixAligner.align() you
> can see how the anchors are being used.
>
> I just took a brief look, so I might be wrong, but I think the int[]
> array should have the length of the query sequence and each position
> indicates the counterpart in the target sequence. Positions that are
> <=0 should be considered not to be anchored. If this is right, then
> this should be very close to what you were expecting.
>
> Can you take a closer look and confirm if this works for you?
>
> Thanks,
>
> Andreas
>
> On Sun, Nov 24, 2013 at 11:22 PM, Daniel Cameron
> <cameron.d at wehi.edu.au <mailto:cameron.d at wehi.edu.au>> wrote:
>
> Hello all
>
> I'm looking the BioJava API for anchored alignment and I'm unsure
> of how to set alignment anchors. The API exposes int[]
> get/setAnchor() which is confusing me somewhat and I'm unsure of
> what a BioJava anchor actually is. My use case is the comparison
> of two sequences which I have a priori knowledge of particular
> positions so I was expecting an API where I could specify base
> positions in the query to correspond different positions in the
> target. Eg: I was query base 4 to align to target base 10 and
> query base 100 to align to target base 80. Is this possible?
>
> With anchor being only an int[] and the source looking like this
> is not an array of paired values, how would one go about
> performing the above alignment?
>
>
> Thanks
> Daniel Cameron
>
> ______________________________________________________________________
> The information in this email is confidential and intended solely
> for the addressee.
> You must not disclose, forward, print or use it without the
> permission of the sender.
> ______________________________________________________________________
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> <mailto:Biojava-l at lists.open-bio.org>
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>
______________________________________________________________________
The information in this email is confidential and intended solely for the addressee.
You must not disclose, forward, print or use it without the permission of the sender.
______________________________________________________________________
More information about the Biojava-l
mailing list