[BioSQL-l] Recording "nucleotide" in the sequence table?
Hilmar Lapp
hlapp at gmx.net
Sat May 16 16:48:40 UTC 2009
On May 16, 2009, at 7:53 AM, Peter wrote:
> In a recent bug report (Bug 2829) it was pointed out that we
> (Biopython) don't attempt to record nucleotide alphabets in BioSQL
> (i.e. a sequence which could be DNA or RNA but we don't know which),
> they just get "unknown" as their biosequence.alphabet entry.
I'm assuming that you do know that it's not protein, right? I.e.,
assigning alphabet "unknown" isn't exactly right.
> Is there any precedent in BioPerl, BioJava or BioRuby for how to
> handle this? If not, I'd like to introduce and agree on "nucleotide"
> for this situation.
So which letters (symbols) does the "nucleotide" alphabet contain?
Getting back to Mark's question, how do you know that it's either dna
or rna but not protein? Is the problem that the user can't tell you
whether it's dna or rna but they know it's not protein, or is it that
the user doesn't say anything and all you have is the symbols of the
sequence, which are a, c, g, and t only?
In BioPerl we'll guess the alphabet if the user doesn't say what it
is, and at present if what we're seeing are the symbols a, c, g, and t
only, then the guess is dna. If we're seeing u rather than t, we guess
it's rna. An "unknown" alphabet would be for the user to expressly
choose.
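
To make the guessing rule above concrete, here is a minimal sketch in Python. It is not BioPerl's actual code: the function name guess_alphabet is hypothetical, and so is the choice to return None (no guess) for anything outside a, c, g, t, u, leaving "unknown" for the user to set explicitly. A sequence with neither t nor u falls through to dna here, which is also an assumption.

```python
def guess_alphabet(seq):
    """Guess 'dna' or 'rna' from the symbols alone; return None if no guess.

    Hypothetical helper illustrating the rule described in the text,
    not the BioPerl implementation.
    """
    symbols = set(seq.lower())
    if not symbols:
        return None  # empty sequence: nothing to guess from
    if symbols <= set("acgt"):
        return "dna"  # only a, c, g, t seen (assumption: no t at all also lands here)
    if symbols <= set("acgu"):
        return "rna"  # u seen rather than t
    return None  # other symbols present: make no guess in this sketch


if __name__ == "__main__":
    for example in ("ACGTTGCA", "acguugca", "MKVLA"):
        print(example, guess_alphabet(example))
```

Note that a sequence mixing t and u gets no guess in this sketch, since it matches neither symbol set.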
-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================