[Biopython-dev] [Bug 2597] Enforce alphabet letters in Seq objects

bugzilla-daemon at portal.open-bio.org
Mon Nov 23 06:37:46 EST 2009


http://bugzilla.open-bio.org/show_bug.cgi?id=2597





------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2009-11-23 06:37 EST -------
As recently noted on the mailing list, making the Seq alphabet check strict
could be useful for file format validation with Bio.SeqIO or Bio.AlignIO, since
the parse and read functions can be given an alphabet.

For example, this would still be allowed, because the gap character is
declared explicitly via Gapped:

>>> from Bio import SeqIO
>>> from Bio.Alphabet.IUPAC import extended_protein
>>> from Bio.Alphabet import Gapped
>>> from StringIO import StringIO
>>> fasta_str = "\n\n\n>ID\nABCDEFGH-IPX\n"
>>> record = SeqIO.read(StringIO(fasta_str), "fasta",
...                     Gapped(extended_protein, "-"))

If the Seq object checked the alphabet letters, the following would fail,
because the minus sign is not a letter of the extended protein alphabet:

>>> record = SeqIO.read(StringIO(fasta_str), "fasta", extended_protein)
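
To make the proposed behaviour concrete, here is a minimal standalone sketch
of such a letter check. It is not Biopython code: the helper check_letters and
the constant EXTENDED_PROTEIN_LETTERS are hypothetical names introduced only
for illustration, and the letter set is assumed to match the extended IUPAC
protein alphabet.

```python
# Hypothetical helper, not part of Biopython: a strict letter check of the
# kind proposed for Seq objects in this bug.

# Assumed letter set for the extended IUPAC protein alphabet.
EXTENDED_PROTEIN_LETTERS = "ACDEFGHIKLMNPQRSTVWYBXZJUO"


def check_letters(sequence, letters, gap_char=None):
    """Raise ValueError if sequence has a character outside letters.

    If gap_char is given it is accepted too, mimicking a gapped alphabet.
    The comparison is case-insensitive.
    """
    allowed = set(letters.upper())
    if gap_char is not None:
        allowed.add(gap_char)
    bad = sorted(set(sequence.upper()) - allowed)
    if bad:
        raise ValueError(
            "Letters not in alphabet: %s" % ", ".join(repr(c) for c in bad)
        )
    return sequence


# Accepted: the gap character is explicitly allowed.
check_letters("ABCDEFGH-IPX", EXTENDED_PROTEIN_LETTERS, gap_char="-")

# Rejected: "-" is not an extended protein letter.
try:
    check_letters("ABCDEFGH-IPX", EXTENDED_PROTEIN_LETTERS)
except ValueError as err:
    print(err)  # Letters not in alphabet: '-'
```

A parser given a strict alphabet could run a check like this on each record,
so that unexpected characters surface as an error at read time.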

If the user doesn't care about the precise letters, they can use the default
generic alphabet, e.g.

>>> record = SeqIO.read(StringIO(fasta_str), "fasta")

or, to at least specify this is a protein sequence:

>>> from Bio.Alphabet import generic_protein
>>> record = SeqIO.read(StringIO(fasta_str), "fasta", generic_protein)



