[Biopython-dev] [Bug 2532] Using IUPAC alphabets in mixed case Seq objects

Mon Dec 22 17:33:33 UTC 2008

http://bugzilla.open-bio.org/show_bug.cgi?id=2532

------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-22 12:33 EST -------
Created an attachment (id=1174)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1174&action=view)
Patch for Bio/Nexus/Nexus.py (non IUPAC) alphabet handling

(In reply to comment #2)
> I opt for (b): an easy one-time addition to Bio.Alphabets, easy to use for
> everyone (instead creating their own uppercase-lowercase variants of those
> terribly complicated biopython alphabet classes), and easy to change for all
> other modules if lowercase-uppercase is what they want (or need).

I'm not saying we shouldn't add mixed (and even lower) case variants of the
IUPAC alphabets.  However, even if we had them, NEXUS still uses extra
characters like "-" for gaps (easily handled via a Gapped alphabet encoder) and
"?" (for a missing character).  Are there any other extra characters?

Under the current alphabet schema, we'd have to start with a (mixed case) IUPAC
alphabet, wrap it in a Gapped AlphabetEncoder (easy), then add another alphabet
encoder for any miscellaneous non-IUPAC characters like "?".  This could be
done with the generic AlphabetEncoder, or we could add additional encoder
objects for special meanings.  This starts to get complicated (dealing with
AlphabetEncoders is nasty).
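
To illustrate the layering described above, here is a minimal sketch.  The
classes are stand-ins written for this example only; they mimic the general
shape of an alphabet wrapped in encoders, not the real Bio.Alphabet code, and
MixedCaseDNA and base_alphabet are hypothetical names.

```python
# Stand-in classes for illustration only; NOT the real Bio.Alphabet API.

class MixedCaseDNA:
    """Hypothetical mixed-case unambiguous DNA alphabet."""
    letters = "GATCgatc"


class Encoder:
    """Wraps another alphabet and contributes extra letters."""

    def __init__(self, alphabet, new_letters):
        self.alphabet = alphabet
        self.new_letters = new_letters

    @property
    def letters(self):
        # Letters of the wrapped alphabet plus the ones this layer adds.
        return self.alphabet.letters + self.new_letters


class Gapped(Encoder):
    """Encoder that adds a single gap character."""

    def __init__(self, alphabet, gap_char="-"):
        super().__init__(alphabet, gap_char)
        self.gap_char = gap_char


def base_alphabet(alphabet):
    """Unwrap every encoder layer to reach the underlying alphabet."""
    while isinstance(alphabet, Encoder):
        alphabet = alphabet.alphabet
    return alphabet


# Layer 1: mixed-case letters; layer 2: gap; layer 3: missing-data "?".
nexus_like = Encoder(Gapped(MixedCaseDNA()), "?")
```

Any code that wants to know "is this DNA?" has to peel off the layers first
(here via base_alphabet), which is the kind of bookkeeping that makes the
encoder route awkward.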

The attached patch is a variation on my "plan (a)" from comment #0.  It makes
Bio.Nexus create its own alphabet objects (based on the generic DNA/RNA/Protein
classes) with the precise list of valid letters required for that file.  Using
this patch should allow us to press ahead with Bug 2597.
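
As a rough sketch of the idea (this is not the patch itself; DNAAlphabet here
is a stand-in class, and make_file_alphabet and is_valid are hypothetical
helper names), a per-file alphabet could be built along these lines:

```python
class DNAAlphabet:
    """Stand-in for a generic DNA alphabet with no fixed letter set."""
    letters = None


def make_file_alphabet(base_class, symbols, gap="-", missing="?"):
    """Return a base_class instance whose letters are exactly those
    valid for one file: its symbols plus the gap and missing characters."""
    alphabet = base_class()
    seen = []
    for ch in symbols + gap + missing:
        if ch not in seen:  # keep first occurrence, preserve order
            seen.append(ch)
    # Set on the instance only, so the generic class stays untouched.
    alphabet.letters = "".join(seen)
    return alphabet


def is_valid(sequence, alphabet):
    """True if every character of sequence is in the alphabet's letters."""
    return all(ch in alphabet.letters for ch in sequence)


# Assumed example: a mixed-case DNA matrix using "-" and "?".
nexus_dna = make_file_alphabet(DNAAlphabet, "ACGTacgt")
```

The object is still an instance of the generic DNA class, so type checks keep
working, but no encoder wrapping is needed to accept "-" or "?".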
