[Biojava-dev] SimpleGappedSymbolList problem, weird "String seqString()" results.

Matthew Pocock matthew_pocock at yahoo.co.uk
Thu Feb 20 02:34:34 EST 2003


Eugh. Well spotted. I'll take a look tomorrow.

Matthew

Kalle Näslund wrote:
> Hi!
> 
> I noticed that if you insert leading or trailing gaps and then call
> seqString(), you get "n" instead of "-". For example, running the
> following gap operations on a SimpleGappedSymbolList:
>
> Alphabet           dna         =   DNATools.getDNA();
> SymbolTokenization dnaParser   =   dna.getTokenization( "token" );
> 
> SymbolList       symList1    =   new SimpleSymbolList( dnaParser, "TTCCTTCCGGGTCGTC" );
> GappedSymbolList gl1         =   new SimpleGappedSymbolList( symList1 );
> 
> System.out.println( gl1.seqString() );
> gl1.addGapsInSource( 1, 4 );
> System.out.println( gl1.seqString() );
> gl1.addGapsInSource( 10, 2 );
> System.out.println( gl1.seqString() );
> gl1.addGapsInSource( 17, 4 );
> System.out.println( gl1.seqString() );
> 
> produces this output:
> 
> ttccttccgggtcgtc
> nnnnttccttccgggtcgtc
> nnnnttccttccg--ggtcgtc
> nnnnttccttccg--ggtcgtcnnnn
> 
> 
> I haven't managed to fully understand why this happens, but the story
> starts like this:
> 
> 1) SimpleGappedSymbolList's symbolAt method returns different gap
> symbols depending on whether the gap is an "internal" gap or a
> leading/trailing gap. The relevant code in symbolAt is:
>     if ((indx < firstNonGap()) || (indx > lastNonGap())) {
>         return Alphabet.EMPTY_ALPHABET.getGapSymbol();
>     } else {
>         return getAlphabet().getGapSymbol();
>     }
> 
> 2) When you call seqString() on a SimpleGappedSymbolList, it simply
> uses the method it inherits from AbstractSymbolList, which looks like this:
>
>     public String seqString() {
>         try {
>             SymbolTokenization toke = getAlphabet().getTokenization("token");
>             return toke.tokenizeSymbolList(this);
>         } catch (BioException ex) {
>             throw new BioRuntimeException(ex, "Couldn't tokenize sequence");
>         }
>     }
> 
> So what happens is that every symbol gets fed to the SymbolTokenization
> object obtained from the default alphabet that a DNA SimpleGappedSequence
> uses. If you feed the gap symbol returned by
> Alphabet.EMPTY_ALPHABET.getGapSymbol() to this tokenization, it returns
> "n" rather than "-".
> 
> At this point my limited knowledge of the black arts of Alphabets in
> BioJava stopped me from writing the end of the story, and I was hoping
> that someone else might finish it for me =)
> 
> regards Kalle
> 
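A minimal, self-contained Java sketch of the mechanism Kalle describes. It deliberately does not use BioJava: the Sym enum, the token map and all method names are hypothetical stand-ins. symbolsWithEdgeGaps mimics the quoted symbolAt logic (leading and trailing gaps come back as a separate "empty alphabet" gap), tokenize models the reported behaviour of printing "n" for a symbol the DNA tokenization does not recognise, and tokenizeNormalisingGaps shows one possible workaround, mapping every gap to the alphabet's own gap symbol before tokenizing. That workaround is an assumption for illustration, not the fix adopted in BioJava.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the reported bug; none of these names are BioJava API.
public class GapTokenizationSketch {

    // Stand-ins for DNA symbols, the DNA alphabet's gap and the EMPTY_ALPHABET gap.
    enum Sym { A, C, G, T, DNA_GAP, EMPTY_GAP }

    // The (assumed) DNA tokenization only knows its own symbols and its own gap.
    static final Map<Sym, Character> TOKENS = new HashMap<>();
    static {
        TOKENS.put(Sym.A, 'a');
        TOKENS.put(Sym.C, 'c');
        TOKENS.put(Sym.G, 'g');
        TOKENS.put(Sym.T, 't');
        TOKENS.put(Sym.DNA_GAP, '-');
    }

    // Parse a lower-case gapped string; every '-' starts out as the DNA gap.
    static List<Sym> parse(String s) {
        List<Sym> out = new ArrayList<>();
        for (char c : s.toCharArray()) {
            switch (c) {
                case 'a': out.add(Sym.A); break;
                case 'c': out.add(Sym.C); break;
                case 'g': out.add(Sym.G); break;
                case 't': out.add(Sym.T); break;
                case '-': out.add(Sym.DNA_GAP); break;
                default: throw new IllegalArgumentException("unexpected char: " + c);
            }
        }
        return out;
    }

    // Mimics the quoted symbolAt logic: gaps before the first or after the
    // last non-gap position are reported as the EMPTY_ALPHABET gap instead.
    static List<Sym> symbolsWithEdgeGaps(List<Sym> raw) {
        int first = -1, last = -1;
        for (int i = 0; i < raw.size(); i++) {
            if (raw.get(i) != Sym.DNA_GAP) {
                if (first < 0) first = i;
                last = i;
            }
        }
        List<Sym> out = new ArrayList<>();
        for (int i = 0; i < raw.size(); i++) {
            Sym s = raw.get(i);
            boolean edgeGap = s == Sym.DNA_GAP && (i < first || i > last);
            out.add(edgeGap ? Sym.EMPTY_GAP : s);
        }
        return out;
    }

    // Models the observed behaviour: unrecognised symbols are printed as 'n'.
    static String tokenize(List<Sym> syms) {
        StringBuilder sb = new StringBuilder();
        for (Sym s : syms) {
            sb.append(TOKENS.getOrDefault(s, 'n'));
        }
        return sb.toString();
    }

    // Hypothetical workaround: treat every gap as the alphabet's own gap.
    static String tokenizeNormalisingGaps(List<Sym> syms) {
        StringBuilder sb = new StringBuilder();
        for (Sym s : syms) {
            Sym t = (s == Sym.EMPTY_GAP) ? Sym.DNA_GAP : s;
            sb.append(TOKENS.getOrDefault(t, 'n'));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Sym> gapped = symbolsWithEdgeGaps(parse("----ttccttccg--ggtcgtc----"));
        System.out.println(tokenize(gapped));                // nnnnttccttccg--ggtcgtcnnnn
        System.out.println(tokenizeNormalisingGaps(gapped)); // ----ttccttccg--ggtcgtc----
    }
}
```

Fed the final gapped sequence from the report, tokenize reproduces the "nnnn...nnnn" output above, while the normalising variant prints the expected dashes.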


-- 
BioJava Consulting LTD - Support and training for BioJava
http://www.biojava.co.uk
