[Biojava-l] How to create a SymbolList with a String that contains
illegal Char
Tao Xu
taoxu at bioinformatics.ubc.ca
Mon Dec 8 21:01:36 EST 2003
Hi there,
Does anyone know how to create a SymbolList from a String that
contains illegal symbols?
I encountered an IllegalSymbolException when I tried to retrieve
sequences from a sequence database. The sequence that gave me
trouble was a RefSeq sequence, accession number NT_039621, Mus
musculus chromosome 15 genomic contig. I first used
DNATools.createDNA(String dna) and got an IllegalSymbolException
indicating that there was at least one 'u' in the sequence. I then used
NucleotideTools.createNucleotide(String nucleotide); this time the 'u'
did not cause any problem, but I still got an
IllegalSymbolException, now indicating that there was an 'l' in the
sequence.
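
Since each exception reports only one offending character, it may help
to scan the raw string before parsing. The following is a minimal
sketch in plain Java; it does not call BioJava at all. The class name
IllegalCharScan and the allowed set "acgtn" are assumptions made for
illustration, not the token set DNATools actually accepts.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of BioJava: lists every character in a
// sequence string that is outside an allowed set, together with the
// 0-based index of its first occurrence.
class IllegalCharScan {

    // Assumed DNA token set (lower case); adjust it to whatever the
    // target alphabet really accepts.
    static final String DNA_CHARS = "acgtn";

    // 'allowed' must be given in lower case; the input is lower-cased
    // character by character before the lookup.
    static Map<Character, Integer> findIllegal(String seq, String allowed) {
        Map<Character, Integer> firstSeen = new LinkedHashMap<>();
        for (int i = 0; i < seq.length(); i++) {
            char c = Character.toLowerCase(seq.charAt(i));
            if (allowed.indexOf(c) < 0) {
                firstSeen.putIfAbsent(c, i); // keep the first position only
            }
        }
        return firstSeen;
    }

    public static void main(String[] args) {
        // Toy input containing both characters reported above.
        System.out.println(findIllegal("acgtuacglt", DNA_CHARS));
    }
}
```

Running such a scan over the NT_039621 string would show all unexpected
characters in one pass, rather than one exception at a time.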
I am afraid there must be lots of illegal symbols in GenBank's
sequences, so I am wondering if there is a way to create an
error-tolerant SymbolList object. If not, I am afraid I will have to
create an Alphabet object containing Symbols that cover every Java
char, use this Alphabet to create a CharacterTokenization via the
CharacterTokenization(Alphabet alpha, boolean caseSensitive)
constructor, and then pass the resulting CharacterTokenization object
to SimpleSymbolList(SymbolTokenization st, String seqString) to
get a SimpleSymbolList object. I guess there must be a better way in
BioJava to do this. Your help is highly appreciated.
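
One possible interim workaround, short of building a catch-all
Alphabet, is to clean the string before it reaches the parser. The
sketch below is plain Java and only an illustration: the class
LenientDna, the allowed set "acgtn", and the choice of 'n' as the
placeholder are all assumptions, and replacing characters this way
hides whatever went wrong in the source record.

```java
// Hypothetical pre-processing step, not a BioJava API: every character
// outside the allowed set is replaced by a placeholder so that a strict
// parser no longer rejects the string.
class LenientDna {

    // 'allowed' must be lower case; the output is lower-cased as well.
    static String sanitize(String seq, String allowed, char placeholder) {
        StringBuilder out = new StringBuilder(seq.length());
        for (int i = 0; i < seq.length(); i++) {
            char c = Character.toLowerCase(seq.charAt(i));
            out.append(allowed.indexOf(c) >= 0 ? c : placeholder);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumed allowed set and placeholder; 'u' and 'l' become 'n'.
        String clean = sanitize("ACGULT", "acgtn", 'n');
        System.out.println(clean);
        // The cleaned string could then be handed to
        // DNATools.createDNA(clean) in a BioJava program.
    }
}
```

The sequence length is preserved, so coordinates in the cleaned string
still line up with the original record.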
If I have to create an Alphabet that covers every Java char, how
can I do it? I originally thought that merging the NUCLEOTIDE and
PROTEIN Alphabets into a new Alphabet would cover all the
Symbols in GenBank sequences, but I noticed there is no method in
AlphabetManager to merge two Alphabets. Is there a way to merge two
Alphabets? If not, it is probably worth implementing one. It would be
useful not only for handling illegal symbols that exist in the
databases, but also for other applications, such as using non-standard
symbols to generate a blastable MSBlast database.
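
To make the merge idea concrete, here is a minimal sketch at the level
of token characters only, in plain Java. It does not use BioJava's
Alphabet or Symbol classes; the class CharAlphabetMerge and the two
token strings are assumptions chosen for illustration, not the real
contents of the NUCLEOTIDE and PROTEIN Alphabets.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in for an alphabet merge: each "alphabet" is just
// the set of characters it accepts, and merging is a set union.
class CharAlphabetMerge {

    // Turn a string of token characters into an ordered set.
    static Set<Character> charsOf(String tokens) {
        Set<Character> set = new LinkedHashSet<>();
        for (char c : tokens.toCharArray()) {
            set.add(Character.toLowerCase(c));
        }
        return set;
    }

    // Union of both sets; the inputs are left unmodified.
    static Set<Character> merge(Set<Character> a, Set<Character> b) {
        Set<Character> union = new LinkedHashSet<>(a);
        union.addAll(b);
        return union;
    }

    public static void main(String[] args) {
        // Assumed token sets for illustration only.
        Set<Character> nucleotide = charsOf("acgtu");
        Set<Character> protein = charsOf("acdefghiklmnpqrstvwy");
        System.out.println(merge(nucleotide, protein));
    }
}
```

One caveat with a character-level union: the same character can stand
for different Symbols in the two source alphabets ('a' is adenine in
one and alanine in the other), so a real merge at the Symbol level
would have to decide how such clashes are resolved.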
Thanks a lot for your help.
Regards,
Tao