[Biojava-l] How to find a sequence within a larger sequence and flip it
Doug Swisher
big.swish at gmail.com
Fri Sep 19 03:37:59 UTC 2008
Hi,
I'm pretty new to BioJava, and I'm a bit stuck. I'm hoping someone can help
out a bit...even if it's just a hint as to where to look next.
I have a long DNA sequence and a shorter sequence that exists within the
larger one. I want to find the location of the smaller sequence within the
larger one, and then create a new sequence with the small one flipped
end-for-end. That's confusing, so let me give an example.
Long sequence: aaaagacttttt
Short sequence: gact
Goal sequence: aaaatcagtttt
To find the location of the short sequence within the larger one, I could
certainly do some string manipulation:
SymbolList bigDNA = DNATools.createDNA("aaaagacttttt");
SymbolList subDNA = DNATools.createDNA("gact");
int start = bigDNA.seqString().indexOf(subDNA.seqString());
While that would work, I'm wondering if there is a more efficient method
that avoids the conversion to strings (in my real code, I start with
Sequences, not strings; I used SymbolLists here for simplicity).
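One string-free direction, sketched on plain java.util lists rather than on BioJava types: Collections.indexOfSubList returns the first position at which one list occurs inside another. The List<Character> values below are only a stand-in for a list view of the symbols, and the class and method names (SubListSearch, toSymbols, find) are made up for illustration. Whether this beats the seqString() approach in practice is untested.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SubListSearch {
    // Stand-in for a list of symbols: one Character per base.
    static List<Character> toSymbols(String s) {
        List<Character> out = new ArrayList<>();
        for (char c : s.toCharArray()) {
            out.add(c);
        }
        return out;
    }

    // 0-based index of the first occurrence of sub in big, or -1 if absent.
    static int find(List<Character> big, List<Character> sub) {
        return Collections.indexOfSubList(big, sub);
    }

    public static void main(String[] args) {
        int start = find(toSymbols("aaaagacttttt"), toSymbols("gact"));
        System.out.println(start); // prints 4
    }
}
```

The index here is 0-based, like String.indexOf; SymbolList positions are 1-based, so it would need adjusting by one before being used with SymbolList methods.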
To "excise" the short sequence, flip it around, and construct a new
SymbolList, I could also do some string manipulation, as in the following:
StringBuilder middle = new StringBuilder(subDNA.seqString());
// Everything before the match ends at start, not at subDNA.length()
String leftPart = bigDNA.seqString().substring(0, start);
// Everything after the match
String rightPart = bigDNA.seqString().substring(start + subDNA.length());
SymbolList goalDNA = DNATools.createDNA(leftPart + middle.reverse() + rightPart);
Looking at the documentation, such as ProjectionUtils or SymbolList.edit(),
it appears there might be some support for manipulating the sequence
directly. Is there a way to do it, without again dropping "down" to
strings?
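To make the goal concrete without strings in the middle step, here is a minimal sketch on plain java.util lists: copy the long list, locate the short one, and reverse that window in place through a subList view. This is not BioJava API; List<Character> stands in for the symbols, and FlipSegment, toSymbols, toText and flip are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class FlipSegment {
    // Stand-in for a list of symbols: one Character per base.
    static List<Character> toSymbols(String s) {
        List<Character> out = new ArrayList<>();
        for (char c : s.toCharArray()) {
            out.add(c);
        }
        return out;
    }

    // Convert back to text, only for printing the result.
    static String toText(List<Character> symbols) {
        StringBuilder sb = new StringBuilder();
        for (char c : symbols) {
            sb.append(c);
        }
        return sb.toString();
    }

    // Returns a copy of big with the first occurrence of sub reversed
    // end-for-end; returns an unchanged copy if sub does not occur.
    static List<Character> flip(List<Character> big, List<Character> sub) {
        List<Character> result = new ArrayList<>(big);
        int start = Collections.indexOfSubList(result, sub);
        if (start >= 0) {
            // subList is a view, so reversing it rewrites result in place.
            Collections.reverse(result.subList(start, start + sub.size()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Character> goal = flip(toSymbols("aaaagacttttt"), toSymbols("gact"));
        System.out.println(toText(goal)); // prints aaaatcagtttt
    }
}
```

Only the first occurrence is flipped; a sequence containing the short one more than once would need a loop over the remaining matches.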
Thanks in advance for any assistance.
Cheers,
-Doug
P.S. Yeah, the second code snippet is pretty inefficient; I was trying to be
clear rather than efficient.