[Biopython-dev] [Bug 2443] New: Specifying the alphabet in Bio.SeqIO.parse()

Tue Feb 5 13:36:16 UTC 2008

http://bugzilla.open-bio.org/show_bug.cgi?id=2443

           Summary: Specifying the alphabet in Bio.SeqIO.parse()
           Product: Biopython
           Version: 1.44
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk

Currently, when reading sequences using Bio.SeqIO, all the records are given a
generic alphabet unless the alphabet can be determined from the file format.

This can be a handicap if you later want to work with "strict" functions that
check for a particular alphabet (e.g. a gapped alphabet when working with
alignments), or with the Bio.Translate module.
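To illustrate the problem, here is a minimal, self-contained sketch that does not use Biopython itself. The Alphabet and Seq classes, the generic_alphabet and generic_dna objects, and the strict_reverse_complement() function are all hypothetical stand-ins for a "strict" function that rejects a sequence whose alphabet is merely generic, even when its letters look like DNA.

```python
class Alphabet:
    """Hypothetical stand-in for an alphabet object."""
    def __init__(self, name):
        self.name = name


generic_alphabet = Alphabet("generic")  # what a parser assigns when it cannot tell
generic_dna = Alphabet("DNA")


class Seq:
    """Hypothetical stand-in for a sequence carrying an alphabet."""
    def __init__(self, data, alphabet=generic_alphabet):
        self.data = data
        self.alphabet = alphabet


def strict_reverse_complement(seq):
    """Hypothetical 'strict' function: only accepts sequences labelled as DNA."""
    if seq.alphabet is not generic_dna:
        raise TypeError("expected a DNA alphabet, got %r" % seq.alphabet.name)
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    # Complement each base, reading the sequence from the 3' end
    return "".join(pairs[base] for base in reversed(seq.data.upper()))
```

With the generic alphabet, the call fails; once the alphabet is overridden by hand, it succeeds:

```python
seq = Seq("AACG")                  # generic alphabet, as if read from a FASTA file
seq.alphabet = generic_dna         # the manual override this report wants to avoid
strict_reverse_complement(seq)     # returns "CGTT"
```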

For an example of this, see Dalloliogm's question on the SeqIO wiki talk page,
http://biopython.org/wiki/Talk:SeqIO

Currently the user may need to use a tedious workaround to override the
alphabet of each sequence, e.g.

from Bio import SeqIO
from Bio.Alphabet import generic_dna

# Load every record into memory, then overwrite each generic alphabet by hand
records = list(SeqIO.parse(open("data.txt"), "fasta"))
for record in records:
    record.seq.alphabet = generic_dna
record_dict = SeqIO.to_dict(records)

Instead, I want to add an optional argument to the parse() and read()
functions, allowing this example to be shortened:

from Bio import SeqIO
from Bio.Alphabet import generic_dna
# Proposed: pass the alphabet as an optional third argument to parse()
record_dict = SeqIO.to_dict(SeqIO.parse(open("data.txt"), "fasta",
                                        generic_dna))
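One way such an optional argument could work is to wrap the record iterator produced by the format parser and, only when an alphabet is given, set it on each record as it is yielded. This keeps parsing lazy, so no intermediate list is needed. The sketch below avoids Biopython entirely: Seq, Record, fasta_records(), parse_with_alphabet() and to_dict() are hypothetical stand-ins (alphabets are modelled as plain strings), not the suggested patch.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class Seq:
    data: str
    alphabet: str = "generic"  # hypothetical: alphabets modelled as strings


@dataclass
class Record:
    id: str
    seq: Seq


def fasta_records(lines: Iterable[str]) -> Iterator[Record]:
    """Tiny hypothetical FASTA reader; every record gets the generic alphabet."""
    ident, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if ident is not None:
                yield Record(ident, Seq("".join(chunks)))
            parts = line[1:].split()
            ident, chunks = (parts[0] if parts else ""), []
        elif line:
            chunks.append(line)
    if ident is not None:
        yield Record(ident, Seq("".join(chunks)))


def parse_with_alphabet(lines: Iterable[str],
                        alphabet: Optional[str] = None) -> Iterator[Record]:
    """Hypothetical parse(): override each record's alphabet only if one is given."""
    for record in fasta_records(lines):
        if alphabet is not None:
            record.seq.alphabet = alphabet
        yield record


def to_dict(records: Iterable[Record]) -> dict:
    """Key records by id, refusing duplicate ids (in the spirit of SeqIO.to_dict)."""
    result = {}
    for record in records:
        if record.id in result:
            raise ValueError("Duplicate key %r" % record.id)
        result[record.id] = record
    return result
```

Usage, with the default of None leaving the parser's own alphabet untouched:

```python
lines = [">seq1 first", "ACGT", ">seq2", "GGCC", "TT"]
records = to_dict(parse_with_alphabet(lines, "DNA"))
records["seq2"].seq.data      # "GGCCTT"
records["seq2"].seq.alphabet  # "DNA"
```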

Suggested patch to follow...
