[Biopython-dev] [Bug 1942] GenBank RecordParser fails on particular qualifier structure

bugzilla-daemon at portal.open-bio.org bugzilla-daemon at portal.open-bio.org
Thu Feb 9 13:52:27 EST 2006


------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk  2006-02-09 13:52 -------
This does seem to work for me using a freshly downloaded NC_007633.gbk that
starts:

LOCUS       NC_007633            1010023 bp    DNA     circular BCT 18-JAN-2006

It has the blank line (line 7114) you reported in locus MCAP_0327.

Python 2.3.3 (#51, Dec 18 2003, 20:22:39) [MSC v.1200 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from Bio.GenBank import RecordParser
>>> parser = RecordParser()
>>> record = parser.parse(file('NC_007633.gbk'))
WARNING - Ignoring an unknown line type, PROJECT found:
PROJECT     GenomeProject:16208

>>> print record.features[644]
     CDS             391217..391771
                     /note="Similar non-mycoplasma proteins have and additional
                     120 amino acids at the COOH end; identified by similarity
                     to SP:P54575; match to protein family HMM PF06574"
                     /product="riboflavin kinase (flavokinase) domain protein"

The warning about the PROJECT line is a recent change; see bug 1946.

I am using the latest version of Bio/GenBank/__init__.py which is revision 1.57
checked in 6 Feb 2006.  This should be the same as yours if you downloaded it
on 8 Feb...

Assuming you have the same GenBank file (same date in the LOCUS line) and the
same Bio/GenBank/__init__.py as me, then maybe something else differs between
our machines, perhaps in another part of Biopython.

Or, it could be a Windows/Unix line ending problem?  Or worse, LF vs CR vs
CRLF.  Did you download the file by FTP or via the website?  This might make a
difference if the original file contained a mixture of CR and CRLF.
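
One way to check for mixed line endings is to count each kind of terminator
in the raw bytes of the file.  The following is only a rough sketch, not
anything from Biopython itself: the function name count_line_endings is made
up for illustration, and it is written for a current Python rather than the
2.3 interpreter shown above.

```python
def count_line_endings(data):
    """Count CRLF, lone LF, and lone CR terminators in a bytes object."""
    crlf = data.count(b"\r\n")
    # Every CRLF pair also contains one LF and one CR, so subtract the
    # pairs to leave only the terminators that stand alone.
    lone_lf = data.count(b"\n") - crlf
    lone_cr = data.count(b"\r") - crlf
    return {"CRLF": crlf, "LF": lone_lf, "CR": lone_cr}


def count_line_endings_in_file(path):
    """Read the file in binary mode so no newline translation happens."""
    with open(path, "rb") as handle:
        return count_line_endings(handle.read())
```

Calling count_line_endings_in_file('NC_007633.gbk') and finding more than one
non-zero count would suggest the file has mixed line endings.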

So far I have only tried this on Windows (I downloaded the file via the NCBI
website), and Biopython copes with the GenBank file with either Windows or Unix
line endings.

I have not (yet) tried it on Linux...

Could you check what happens if you use dos2unix and/or unix2dos on your
GenBank file?
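
If dos2unix and unix2dos are not to hand, roughly the same conversion can be
done in Python.  This is a minimal sketch under a few assumptions: the helper
names to_unix, to_dos and convert_file are hypothetical, the code targets a
current Python, and unlike what I believe the dos2unix tool does by default it
also converts lone CR characters.

```python
def to_unix(data):
    """Return data with CRLF and lone CR terminators replaced by LF."""
    # Replace CRLF first so the CR of each pair is not converted twice.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def to_dos(data):
    """Return data with every line terminator replaced by CRLF."""
    # Normalise to LF first so existing CRLF pairs do not become CR CR LF.
    return to_unix(data).replace(b"\n", b"\r\n")


def convert_file(src, dst, converter):
    """Write a converted copy of src to dst, leaving src untouched."""
    with open(src, "rb") as handle:
        data = handle.read()
    with open(dst, "wb") as handle:
        handle.write(converter(data))
```

For example, convert_file('NC_007633.gbk', 'NC_007633_unix.gbk', to_unix)
writes a copy with Unix line endings, which could then be parsed in place of
the original file to see whether the failure goes away.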

