[Biopython-dev] [Bug 2829] BioSQL does not record a generic nucleotide alphabet

bugzilla-daemon at portal.open-bio.org bugzilla-daemon at portal.open-bio.org
Sat May 16 11:37:52 UTC 2009


http://bugzilla.open-bio.org/show_bug.cgi?id=2829


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
            Summary|Biosequence.alphabet can be |BioSQL does not record a
                   |set to unknown after loading|generic nucleotide alphabet
                   |a nucleotide SeqRecord      |




------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2009-05-16 07:37 EST -------
Biopython has a relatively rich range of alphabets, including IUPAC ambiguous
and unambiguous alphabets, plus ways to indicate gap characters and stop
symbols.  The BioSQL range is much simpler, so some information is inevitably
lost.

In BioSQL, all we store is a simple string, "dna", "rna", "protein" or
"unknown" (although BioJava used uppercase, so that is effectively allowed
too). See:
http://www.biosql.org/wiki/Enhancement_Requests#Check_constraint_on_biosequence.alphabet

This means if your sequence was using "IUPAC extended protein with a * stop
codon", all we can record is "protein", i.e. on retrieval from a BioSQL
database, the alphabet is simply a generic protein.  Likewise "ambiguous IUPAC
DNA with minus as the gap character" just becomes generic DNA.

Note that as far as I know, currently none of the Bio* projects attempt to
record a generic "nucleotide" value (i.e. either DNA or RNA, without saying
which).  This is something we should discuss on the BioSQL mailing list as a
possible enhancement.

So in answer to your question "Is this expected?", yes, a generic nucleotide
alphabet isn't "dna", "rna" or "protein" so is currently recorded in the BioSQL
database as "unknown".  This gets turned into the SingleLetterAlphabet on
retrieval.
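
As a rough sketch of the round trip described above (this is not the actual
Biopython BioSQL code; the classes are hypothetical stand-ins for the
Bio.Alphabet hierarchy, and the two function names are made up for
illustration):

```python
# Hypothetical stand-ins for the alphabet hierarchy, NOT the real Bio.Alphabet classes.
class SingleLetterAlphabet:
    pass

class ProteinAlphabet(SingleLetterAlphabet):
    pass

class NucleotideAlphabet(SingleLetterAlphabet):
    pass

class DNAAlphabet(NucleotideAlphabet):
    pass

class RNAAlphabet(NucleotideAlphabet):
    pass

# Richer alphabets (illustrative names) that carry more detail than BioSQL can store.
class ExtendedProteinWithStop(ProteinAlphabet):
    pass

class AmbiguousGappedDNA(DNAAlphabet):
    pass


def alphabet_to_biosql(alphabet):
    """Collapse an alphabet object to the simple string stored in BioSQL (assumed logic)."""
    if isinstance(alphabet, DNAAlphabet):
        return "dna"
    if isinstance(alphabet, RNAAlphabet):
        return "rna"
    if isinstance(alphabet, ProteinAlphabet):
        return "protein"
    # A generic nucleotide alphabet is none of the above, so it lands here.
    return "unknown"


def biosql_to_alphabet(value):
    """Turn the stored string back into a generic alphabet (assumed logic)."""
    generic = {
        "dna": DNAAlphabet,
        "rna": RNAAlphabet,
        "protein": ProteinAlphabet,
    }
    # Lowercase first, since uppercase values (as written by BioJava) are allowed too.
    return generic.get(value.lower(), SingleLetterAlphabet)()


if __name__ == "__main__":
    for original in (ExtendedProteinWithStop(), AmbiguousGappedDNA(), NucleotideAlphabet()):
        stored = alphabet_to_biosql(original)
        restored = biosql_to_alphabet(stored)
        print(type(original).__name__, "->", stored, "->", type(restored).__name__)
```

The point of the sketch is only that the detail (stop symbol, gap character,
ambiguity) is dropped on the way in, and that a generic nucleotide alphabet
has no matching string at all, so it comes back as the single letter
alphabet.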

Changing title to "BioSQL does not record a generic nucleotide alphabet" and
marking this as an enhancement.

Peter

P.S. Are you just testing here, or do you really not know if your sequence is
DNA or RNA?




