[Biopython-dev] [Bug 1942] GenBank RecordParser fails on particular qualifier structure
bugzilla-daemon at portal.open-bio.org
Thu Feb 9 13:52:27 EST 2006
http://bugzilla.open-bio.org/show_bug.cgi?id=1942
------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2006-02-09 13:52 -------
This does seem to work for me using a freshly downloaded NC_007633.gbk that
starts:
LOCUS NC_007633 1010023 bp DNA circular BCT 18-JAN-2006
It has the blank line you reported at line 7114, in locus MCAP_0327.
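To check that a local copy really has the blank line in the same place, a minimal Python 3 sketch could list every empty or whitespace-only line with its 1-based line number. The function name `blank_lines_with_locus` and the "most recent /locus_tag seen" heuristic are illustrative assumptions, not part of Biopython.

```python
import re

# Matches a /locus_tag="..." qualifier and captures its value
LOCUS_TAG = re.compile(r'/locus_tag="([^"]+)"')


def blank_lines_with_locus(text):
    """Return (line_number, locus_tag) for each empty or whitespace-only line.

    line_number is 1-based. locus_tag is the most recent /locus_tag value
    seen before that line (None if there was none yet); this is only a
    rough hint as to which feature the blank line falls in.
    """
    results = []
    current_tag = None
    # splitlines() treats LF, CR and CRLF all as line breaks, so numbering
    # can differ from tools that only count LF if the file has lone CRs
    for number, line in enumerate(text.splitlines(), start=1):
        match = LOCUS_TAG.search(line)
        if match:
            current_tag = match.group(1)
        if not line.strip():
            results.append((number, current_tag))
    return results


# Invented snippet for illustration, not taken from NC_007633.gbk
sample = (
    "     CDS             391217..391771\n"
    '                     /locus_tag="MCAP_0327"\n'
    "\n"
    "                     /codon_start=1\n"
)
print(blank_lines_with_locus(sample))  # [(3, 'MCAP_0327')]
```

On a real file this would be called as `blank_lines_with_locus(open("NC_007633.gbk").read())`; if the two machines report different line numbers for the same file, that in itself hints at different line endings.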
Python 2.3.3 (#51, Dec 18 2003, 20:22:39) [MSC v.1200 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from Bio.GenBank import RecordParser
>>> parser = RecordParser()
>>> record = parser.parse(file('NC_007633.gbk'))
WARNING - Ignoring an unknown line type, PROJECT found:
PROJECT GenomeProject:16208
>>> print record.features[644]
     CDS             391217..391771
                     /locus_tag="MCAP_0327"
                     /note="Similar non-mycoplasma proteins have and additional
                     120 amino acids at the COOH end; identified by similarity
                     to SP:P54575; match to protein family HMM PF06574"
                     /codon_start=1
                     /transl_table=4
                     /product="riboflavin kinase (flavokinase) domain protein"
                     /protein_id="YP_424312.1"
                     /db_xref="GI:83319941"
                     /db_xref="GeneID:3828958"
                     /translation="MIYINESFNKLKKLNIKKAIITIGNFDGFHIYHQKIINKVIQIA
                     KQENLTSIVMSFDKKIKDNITYTNLATKKQKLDFINNNLSDLDYFFDIKVDDSLIKTT
                     KDQFIDVLINKLNVIKIVEGQDFKFGYLSQGNIDDLIKAFSKKNVIIFKRDNDISSTK
                     IKKLLDENLVDKAQELLGIDLKLK"
The warning about the PROJECT line is due to a recent change; see bug 1946.
I am using the latest version of Bio/GenBank/__init__.py, which is revision 1.57,
checked in on 6 Feb 2006. This should be the same as yours if you downloaded it
on 8 Feb.
Assuming you have the same GenBank file (same date in the LOCUS line) and the
same Bio/GenBank/__init__.py as me, maybe something else differs between our
machines, perhaps in another part of Biopython.
Or could it be a Windows/Unix line-ending problem? Or worse, a mix of LF, CR and
CRLF? Did you download the file by FTP or via the website? This might make a
difference if the original file contained a mixture of CR and CRLF line endings.
So far I have only tried this on Windows (downloading the file via the NCBI
website), and Biopython copes with the GenBank file in either Windows or Unix
format.
I have not (yet) tried it on Linux...
Could you check what happens if you use dos2unix and/or unix2dos on your
GenBank file?
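Where dos2unix/unix2dos are not at hand, a rough Python 3 stand-in can write a copy of the GenBank file with uniform line endings, which can then be given to the parser again. This is a sketch, and the names `normalize_line_endings` and `normalize_file` are hypothetical. Unlike a plain CRLF-to-LF conversion, it also converts lone CR characters; that is a deliberate assumption about what a mixed file needs.

```python
def normalize_line_endings(data, newline=b"\n"):
    """Return a copy of data with every CRLF, lone CR and lone LF set to newline."""
    # Replace CRLF first so it becomes one LF, not two
    unix = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return unix if newline == b"\n" else unix.replace(b"\n", newline)


def normalize_file(src, dst, newline=b"\n"):
    # Work in binary mode so Python does no newline translation of its own
    with open(src, "rb") as handle:
        data = handle.read()
    with open(dst, "wb") as handle:
        handle.write(normalize_line_endings(data, newline))


print(normalize_line_endings(b"a\r\nb\rc\nd"))           # b'a\nb\nc\nd'
print(normalize_line_endings(b"a\nb", newline=b"\r\n"))  # b'a\r\nb'
```

For example, `normalize_file("NC_007633.gbk", "NC_007633_unix.gbk")` (output file name invented here) followed by parsing the new file would show whether uniform LF endings make the problem go away; passing `newline=b"\r\n"` gives the Windows-style variant.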
Thanks