[Biopython-dev] [Bug 2443] New: Specifying the alphabet in Bio.SeqIO.parse()
bugzilla-daemon at portal.open-bio.org
Tue Feb 5 13:36:16 UTC 2008
http://bugzilla.open-bio.org/show_bug.cgi?id=2443
Summary: Specifying the alphabet in Bio.SeqIO.parse()
Product: Biopython
Version: 1.44
Platform: All
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk
Currently, when reading sequences with Bio.SeqIO, every record gets a generic
alphabet unless the alphabet can be determined from the file format.
This can be a handicap if later on you want to work with "strict" functions
which check for a particular alphabet (e.g. a gapped alphabet when working with
alignments), or perhaps the Bio.Translate module.
For an example of this, see Dalloliogm's question on the SeqIO wiki talk page,
http://biopython.org/wiki/Talk:SeqIO
Currently the user may need to use a tedious workaround to override the
alphabet of each sequence, e.g.
from Bio import SeqIO
from Bio.Alphabet import generic_dna
records = list(SeqIO.parse(open("data.txt"), "fasta"))
for record in records:
    record.seq.alphabet = generic_dna
record_dict = SeqIO.to_dict(records)
Instead, I want to add an optional argument to the parse() and read()
functions, allowing this example to be shortened:
from Bio import SeqIO
from Bio.Alphabet import generic_dna
record_dict = SeqIO.to_dict(SeqIO.parse(open("data.txt"), "fasta",
                                        generic_dna))
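To illustrate the idea independently of the eventual patch, here is a minimal
sketch of a parser that takes an optional alphabet argument. It does not use
Biopython: Seq, SeqRecord, parse_fasta and _make_record are hypothetical
stand-ins written for this example, and the alphabet is just a string. It only
shows how the override could be applied to each record as it is yielded, so no
second pass over the records is needed.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional


@dataclass
class Seq:
    """Stand-in sequence type (NOT Biopython's Bio.Seq.Seq)."""
    data: str
    alphabet: str = "generic"


@dataclass
class SeqRecord:
    """Stand-in record type (NOT Biopython's SeqRecord)."""
    id: str
    seq: Seq


def _make_record(name: str, chunks: List[str],
                 alphabet: Optional[str]) -> SeqRecord:
    seq = Seq("".join(chunks))
    if alphabet is not None:
        # Apply the caller's override; otherwise keep the generic default.
        seq.alphabet = alphabet
    return SeqRecord(name, seq)


def parse_fasta(lines: Iterable[str],
                alphabet: Optional[str] = None) -> Iterator[SeqRecord]:
    """Yield records from FASTA-style lines (hypothetical helper).

    With alphabet=None the records keep the generic alphabet, which
    mirrors the current behaviour described in this report.
    """
    name: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield _make_record(name, chunks, alphabet)
            # The identifier is the first word of the header line.
            name = (line[1:].split() or [""])[0]
            chunks = []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield _make_record(name, chunks, alphabet)


# Usage: build a dictionary in one step, with the alphabet already set.
fasta_lines = [">alpha first record", "ACGT", "AC", ">beta", "GGTT"]
record_dict = {rec.id: rec for rec in parse_fasta(fasta_lines, alphabet="dna")}
```

Making the argument optional keeps existing calls working unchanged, which is
the same property the proposed parse() and read() change is meant to have.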
Suggested patch to follow...