[Biojava-l] Sequence Iteration in BioJava(x)
Mark Fortner
m.fortner at sbcglobal.net
Thu Dec 15 21:36:11 EST 2005
Richard,
Thanks for the example. Your approach is very similar to a non-BioJava
approach that I had worked out earlier. I was wondering if the
BioJava(x) API provides any performance benefit over simply running a
window along a character stream?
The work that we're doing involves iterating through the human genome,
(and in a number of cases, metagenomic sequences) and we're trying to
squeeze as much performance out of it as possible while minimizing the
memory footprint.
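For reference, the plain-Java windowing approach could be sketched roughly as below. This is only a minimal sketch, not BioJava code and not a tuned implementation: the class name `KmerWindow` and the methods `forEachKmer` and `kmers` are hypothetical names made up for illustration, and it assumes the input is a raw stream of bases in which whitespace can be ignored.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not part of BioJava: slides a k-wide window along a
// character stream, keeping only the last k symbols in memory.
final class KmerWindow {

    static void forEachKmer(Reader in, int k, Consumer<String> action) throws IOException {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        char[] ring = new char[k]; // circular buffer holding the last k symbols
        long seen = 0;             // number of symbols consumed so far
        int c;
        while ((c = in.read()) != -1) {
            if (Character.isWhitespace(c)) {
                continue;          // assumption: line breaks are not part of the sequence
            }
            ring[(int) (seen % k)] = (char) c;
            seen++;
            if (seen >= k) {
                // The oldest symbol of the current window sits at index seen % k.
                int start = (int) (seen % k);
                StringBuilder window = new StringBuilder(k);
                for (int j = 0; j < k; j++) {
                    window.append(ring[(start + j) % k]);
                }
                action.accept(window.toString());
            }
        }
    }

    // Convenience wrapper for small in-memory sequences.
    static List<String> kmers(String seq, int k) {
        List<String> out = new ArrayList<>();
        try {
            forEachKmer(new StringReader(seq), k, out::add);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        for (String trimer : kmers("ACTGACTGACTG", 3)) {
            System.out.println(trimer);
        }
    }
}
```

For a genome-sized input one would presumably wrap the source in a `BufferedReader` and avoid building a `String` per window (for example by handing the callback the buffer and an offset), since the per-window allocation is likely to dominate the cost.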
Thanks,
Mark
Richard HOLLAND wrote:
>orderNSymbolList splits the sequence into non-overlapping chunks. What
>is required here is chunks that each start one base further on than the
>previous chunk.
>
>The simplest way would be this:
>
>    SymbolList mySeq; // this is your sequence from somewhere else
>    for (int i = 1; i <= mySeq.length() - 2; i++) {
>        // coords are inclusive so i to i+2 = 3 bases
>        SymbolList trimer = mySeq.subSeq(i, i + 2);
>        // do something with your trimer here
>    }
>
>Note that valid positions start at 1 and go right up to and including
>length(), as symbols in a SymbolList are 1-indexed, not 0-indexed. The
>loop therefore stops at length()-2, so that the last trimer ends at
>position length().
>
>cheers,
>Richard
>
>Richard Holland
>Bioinformatics Specialist
>GIS extension 8199
>>-----Original Message-----
>>From: biojava-l-bounces at portal.open-bio.org
>>[mailto:biojava-l-bounces at portal.open-bio.org] On Behalf Of David Huen
>>Sent: Friday, December 16, 2005 7:34 AM
>>To: m.fortner at sbcglobal.net
>>Cc: biojava-list
>>Subject: Re: [Biojava-l] Sequence Iteration in BioJava(x)
>>
>>
>>On Dec 15 2005, Mark Fortner wrote:
>>I think what you want is the SymbolListViews.orderNSymbolList method.
>>
>>It will take a SymbolList and turn it into another where it is viewed in
>>another compound alphabet of the required order.
>>
>>
>>
>>
>>>I'm looking for the best way to iterate through all
>>>nmers within a given sequence. For example, given a
>>>sequence that looks like this:
>>>
>>>ACTGACTGACTG
>>>
>>>If I extract all trimers from this I should get:
>>>
>>>ACT
>>>CTG
>>>TGA
>>>GAC
>>>ACT
>>>CTG
>>>TGA
>>>GAC
>>>ACT
>>>CTG
>>>
>>>Is there an existing class that will allow me to
>>>iterate through a sequence this way, or am I on my
>>>own?
>>>
>>>
>>>