[Biojava-l] case-sensitive sequences

Wed Feb 28 07:48:46 UTC 2007

I've changed my code and called the RestrictionSiteFinder with the new
sequence. It threw this exception:

Exception in thread "Thread-25" java.lang.UnsupportedOperationException: Ambiguity should be handled at the level of the wrapped Alphabet
        at org.biojava.bio.symbol.SoftMaskedAlphabet.getAmbiguity(SoftMaskedAlphabet.java:183)
        at org.biojava.bio.symbol.AlphabetManager.getAllSymbols(AlphabetManager.java:223)
        at org.biojava.bio.seq.io.SymbolListCharSequence.<init>(SymbolListCharSequence.java:75)
        at org.biojava.bio.molbio.RestrictionSiteFinder.run(RestrictionSiteFinder.java:73)
        at org.biojava.utils.SimpleThreadPool$PooledThread.run(SimpleThreadPool.java:295)

I understand why it didn't work (lower-case symbol 'a' versus upper-case
symbol 'A'), but I can't find a solution. Any ideas?
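
One possible direction, following Richard's advice quoted below to keep a
second copy of each sequence in the normal DNA alphabet and pass that copy
to the routines: the sketch below shows the idea in plain Java, without
BioJava. It is an assumption-laden illustration, not the BioJava API. The
class name CaseInsensitiveSiteSearch, the method findSites, and the example
site GAATTC are hypothetical, and it only does an exact forward-strand
search on an upper-cased copy while the original string is kept for output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of BioJava: searches a case-normalized
// copy of the sequence and reports positions valid for the original.
final class CaseInsensitiveSiteSearch {

    /** Returns 0-based start positions of site in sequence, ignoring case. */
    static List<Integer> findSites(String sequence, String site) {
        // Second copy in a single case, used only for searching.
        String normalized = sequence.toUpperCase(Locale.ROOT);
        String target = site.toUpperCase(Locale.ROOT);
        List<Integer> hits = new ArrayList<>();
        if (target.isEmpty()) {
            return hits;
        }
        int from = 0;
        while (true) {
            int idx = normalized.indexOf(target, from);
            if (idx < 0) {
                break;
            }
            hits.add(idx);
            from = idx + 1; // step by one so overlapping matches are found
        }
        return hits;
    }

    public static void main(String[] args) {
        String original = "TAATAACgaattcgagagg"; // mixed-case input
        String site = "GAATTC";
        for (int pos : findSites(original, site)) {
            // Print the hit using the ORIGINAL string, so case is preserved.
            System.out.println(pos + " " + original.substring(pos, pos + site.length()));
        }
    }
}
```

Because upper-casing plain ASCII bases does not change the length of the
string, a position found in the normalized copy can be used directly to
index the original, case-preserving sequence.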

On 2/28/07, ilhami visne <ilhami.visne at gmail.com> wrote:
> Thank you. It works now. I should have been able to find it myself, but
> I am not really a bioinformatician yet.
>
> My code (in case someone else has the same problem):
>
> BufferedReader br = new BufferedReader(new FileReader("seq.fasta"));
>
> // Wrap the DNA alphabet so that case information is kept
> Alphabet dna = SoftMaskedAlphabet.getInstance(DNATools.getDNA());
> SymbolTokenization dnaParser = dna.getTokenization("token");
>
> // Read the first sequence using the soft-masked tokenization
> RichSequenceIterator iter =
>     RichSequence.IOTools.readFasta(br, dnaParser, null);
> RichSequence rs = iter.nextRichSequence();
>
> Mark Schreiber wrote:
> > Hi -
> >
> > There are also the classes: SoftMaskedAlphabet and
> > SoftMaskedAlphabet.CaseSensitiveTokenization and
> > SoftMaskedAlphabet.MaskingDetector. Together these classes let you
> > read a sequence that contains case sensitive information and (if you
> > wish) make use of that information. You can also write out the
> > sequence in the original case sensitive format.
> >
> > It was originally designed for reading data that had been 'softmasked'
> > for low complexity regions (e.g. lower-case regions are low complexity
> > and would be ignored in subsequent analysis), but it could be used for
> > quality or any other distinction.
> >
> > - Mark
> >
> > On 2/28/07, ilhami visne <ilhami.visne at gmail.com> wrote:
> >> Thank you for the quick answer. Here is the relevant part of my code:
> >>
> >> BufferedReader br = new BufferedReader(new FileReader("seq.fasta"));
> >> RichSequenceIterator iter = RichSequence.IOTools.readFastaDNA(br,null);
> >> RichSequence rs = iter.nextRichSequence();
> >>
> >> Richard Holland wrote:
> >> > DNA is not case-sensitive. What I suspect you are parsing is the
> >> > output of some sequencing software which is using case as a rough
> >> > indicator of base calling quality?
> >> >
> >> > The case will have been lost when the file was parsed, not at the
> >> > moment you iterate over the resulting sequences. This means that you
> >> > have to modify your file parsing method to become case-sensitive.
> >> >
> >> > The default DNA alphabet is not case-sensitive. It makes no
> >> > distinction between the two, and will convert everything to one case.
> >> >
> >> > If you need to preserve case, you will need to use a custom alphabet
> >> > which treats the cases differently, and also specify a tokenizer which
> >> > is case-sensitive. See the help pages at http://biojava.org/ for help
> >> > on creating new alphabets. Or, have a look at the ABITools.QUALITY
> >> > alphabet in BioJava, which interprets the case and stores the quality
> >> > scores separately.
> >> >
> >> > Note however that your custom alphabet is NOT the same as the original
> >> > DNA alphabet, and so you may not be able to use it in all the standard
> >> > transforms (RNA etc.). If you do want to use these then you will have
> >> > to make a second copy of each sequence using the normal DNA alphabet
> >> > and pass that copy to the routines.
> >> >
> >> > If you post to this list the code you are using to read the file,
> >> > then I can show you where to insert the reference to this new
> >> > alphabet.
> >> >
> >> > cheers,
> >> > Richard
> >> >
> >> > Ilhami Visne wrote:
> >> >
> >> >> My sequence files contain case-sensitive symbols (TAATAACgagagg)
> >> >> and I am now using RichSequenceIterator to iterate over the
> >> >> sequences.
> >> >>
> >> >> How can I tell BioJava to parse them case-sensitively? If I call
> >> >> the seq.seqString() method, it should return the sequence exactly
> >> >> as it was in the file, with upper- and lower-case.
> >> >>
> >> >> Thanks.
> >> >> _______________________________________________
> >> >> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> >> >> http://lists.open-bio.org/mailman/listinfo/biojava-l
> >> >>
> >> >>
> >> >
> >> >
> >
>
>
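
For reference, the soft-masking idea Mark describes in the quoted reply
(lower-case regions mark low complexity, or any other distinction) can be
illustrated without BioJava. The following is a minimal sketch in plain
Java, not the SoftMaskedAlphabet implementation. The class name
SoftMaskRanges and the method maskedRanges are hypothetical, and it simply
assumes that lower case means "masked".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of BioJava: derives masked regions
// from the case of the characters in a sequence string.
final class SoftMaskRanges {

    /** Returns half-open [start, end) ranges covering each lower-case run. */
    static List<int[]> maskedRanges(String sequence) {
        List<int[]> ranges = new ArrayList<>();
        int start = -1; // -1 means we are not inside a masked run
        for (int i = 0; i < sequence.length(); i++) {
            boolean masked = Character.isLowerCase(sequence.charAt(i));
            if (masked && start < 0) {
                start = i; // a masked run begins here
            } else if (!masked && start >= 0) {
                ranges.add(new int[] {start, i}); // the run ended before i
                start = -1;
            }
        }
        if (start >= 0) {
            // The sequence ended while still inside a masked run.
            ranges.add(new int[] {start, sequence.length()});
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (int[] r : maskedRanges("TAATAACgagagg")) {
            System.out.println(r[0] + "-" + r[1]);
        }
    }
}
```

Keeping the mask as ranges next to an upper-cased copy of the sequence is
one way to hold on to the case information while still handing a plain DNA
string to routines that do not understand it.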