[Biopython-dev] [Bug 2425] New: Fasta ID parsing error
bugzilla-daemon at portal.open-bio.org
Fri Dec 28 16:18:54 UTC 2007
http://bugzilla.open-bio.org/show_bug.cgi?id=2425
Summary: Fasta ID parsing error
Product: Biopython
Version: 1.44
Platform: PC
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P2
Component: BioSQL
AssignedTo: biopython-dev at biopython.org
ReportedBy: dtomso at athenixcorp.com
Loader.py raises the following error when given a FASTA header line whose ID
contains more than one '.':
>region1.fasta.screen.Contig1
ACAGGATAGGCGGGAGCCATTGAAACCGGAGCGCTAGCTTCGGTGGAGGC
GCTGGTGGGATACCGCCCTGACTGTATTGAAATTCTAACCTACGGGTCTT
Traceback (most recent call last):
  File "biosql_driver.py", line 28, in <module>
    db.load(SeqIO.parse(sfile, 'fasta'))
  File "/home/dtomso/repository/biopython/build/lib.linux-i686-2.5/BioSQL/BioSeqDatabase.py", line 412, in load
    db_loader.load_seqrecord(cur_record)
  File "/usr/lib/python2.5/site-packages/BioSQL/Loader.py", line 30, in load_seqrecord
    bioentry_id = self._load_bioentry_table(record)
  File "/usr/lib/python2.5/site-packages/BioSQL/Loader.py", line 214, in _load_bioentry_table
    accession, version = record.id.split('.')
ValueError: too many values to unpack
The code appears to assume that any '.' in the record ID separates an
accession from a version number, and splits on it to obtain the version.
However, this only works for NCBI-style IDs. IDs that deviate from this
pattern (e.g. those produced by phrap, which generated the file above)
trigger the error.
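The failure can be reproduced without BioSQL or a database. The snippet below is only an illustration using the ID from the report; the variable names are made up for the example and are not taken from Loader.py.

```python
# The ID from the FASTA header in the report (three '.' characters).
record_id = "region1.fasta.screen.Contig1"

# Splitting on '.' yields four pieces, not the two that the unpacking expects.
parts = record_id.split(".")

try:
    # Same pattern as the failing line in the traceback.
    accession, version = record_id.split(".")
    error_message = None
except ValueError as err:
    # Unpacking four values into two names raises ValueError.
    error_message = str(err)

print(parts)
print(error_message)
```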
I bolted on an inelegant fix by having the code check for multiple '.'
characters, in which case the version defaults to zero. Other solutions may be
preferable.
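A minimal sketch of this kind of workaround, not the actual patch: the helper name split_accession_version is hypothetical and does not exist in BioSQL. It also goes slightly beyond the fix described above by treating IDs with no '.' or with a non-numeric suffix as unversioned, which is an added assumption.

```python
def split_accession_version(record_id):
    """Return (accession, version) for a record ID.

    Hypothetical helper for illustration. The ID is split only when it
    contains exactly one '.' followed by digits (NCBI-style, such as
    "NM_000546.5"). Any other ID is kept whole with a version of 0.
    """
    if record_id.count(".") == 1:
        accession, _, version = record_id.rpartition(".")
        # Assumption: a real version suffix is purely numeric.
        if accession and version.isdigit():
            return accession, int(version)
    # Fallback: zero dots, several dots, or a non-numeric suffix.
    return record_id, 0


if __name__ == "__main__":
    for example in ("NM_000546.5", "region1.fasta.screen.Contig1", "Contig1"):
        print(example, split_accession_version(example))
```

Falling back to version 0 keeps the full ID as the accession, so phrap-style names are stored unchanged rather than truncated at the first '.'.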