[Biopython-dev] [Bug 1946] Parsing GenBank Files - unknown line type PROJECT
bugzilla-daemon at portal.open-bio.org
Sun Feb 5 07:00:15 EST 2006
http://bugzilla.open-bio.org/show_bug.cgi?id=1946
biopython-bugzilla at maubp.freeserve.co.uk changed:
       What|Removed                     |Added
----------------------------------------------------------------------------
     Status|NEW                         |ASSIGNED
  Component|Martel/Mindy                |Main Distribution
 OS/Version|Mac OS                      |All
    Summary|Parsing GenBank Files -     |Parsing GenBank Files -
           |ParserPositionException:    |unknown line type PROJECT
------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2006-02-05 07:00 -------
The non-Martel GenBank parser in CVS is also unaware of the PROJECT line in
GenBank files.
I would expect it to fail with an assertion error:
    Unknown line type, PROJECT found:
    PROJECT GenomeProject:14204
This looks like an easy fix; however, we need to decide how to store the project
information. Maybe a simple string for now, "GenomeProject:14204".
Also, maybe unknown line types in the header should trigger warnings rather than
errors that stop the parsing...
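To make both suggestions concrete, here is a rough sketch of the idea. It is
not the actual Bio.GenBank code: parse_header and KNOWN_HEADER_KEYS are
hypothetical names invented for this illustration, and the keyword list is
deliberately incomplete. The PROJECT value is kept as a plain string, and an
unrecognised header keyword produces a warning instead of stopping the parse.

```python
import warnings

# Hypothetical, incomplete set of header keywords for this sketch only;
# it is not the list used by the real Bio.GenBank parser.
KNOWN_HEADER_KEYS = {"LOCUS", "DEFINITION", "ACCESSION", "VERSION", "PROJECT"}


def parse_header(lines):
    """Collect GenBank header lines into a dict of keyword -> string.

    Unknown line types trigger a warning and are skipped rather than
    raising an error that stops the parsing.
    """
    header = {}
    current = None  # keyword that a continuation line would extend
    for line in lines:
        if not line.strip():
            continue
        if line[0] == " ":
            # Continuation line: extend the previous known keyword, if any.
            if current is not None:
                header[current] += " " + line.strip()
            continue
        parts = line.split(None, 1)
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        if key not in KNOWN_HEADER_KEYS:
            warnings.warn("Unknown line type, %s found: %s" % (key, line.rstrip()))
            current = None  # also skip continuation lines of the unknown type
            continue
        header[key] = value  # PROJECT is stored as a simple string
        current = key
    return header


record = [
    "ACCESSION   CH476840 AACU02000000",
    "VERSION     CH476840.1  GI:77022292",
    "PROJECT     GenomeProject:14204",
]
header = parse_header(record)
print(header["PROJECT"])
```

With this approach a record containing some future line type would still
parse, and the warning tells the user which line was ignored.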
---------------------------------------
Quoting from
ftp://ftp.ncbi.nih.gov/genbank/gbrel.txt
---------------------------------------
1.4.1 New Linetype for Genome Project Identifier
DDBJ, EMBL, and GenBank are working to create a collaborative system that
will assign a unique numeric identifier to genome projects. The purpose of
this new identifier is to provide a link among sequence records that pertain
to a specific genome sequencing project.
At GenBank, this new identifier will be presented in the flatfile format
via a new linetype : PROJECT . Here is a mocked-up example demonstrating
the new linetype's use:
LOCUS       CH476840             1669278 bp    DNA     linear   CON 05-OCT-2005
DEFINITION  Magnaporthe grisea 70-15 supercont5.200 genomic scaffold, whole
            genome shotgun sequence.
ACCESSION   CH476840 AACU02000000
VERSION     CH476840.1  GI:77022292
PROJECT     GENOME_PROJECT:12345
The integer 12345 represents the value of a possible genome project
identifier.
There is a possibility that the contents of the PROJECT line might change
somewhat from this example by the time the new identifier is implemented.
We will keep you posted of any such changes via these release notes and the
GenBank listserv.
These Genome Project identifiers will be searchable within NCBI's
Entrez: Genome-Project database:
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=genomeprj
The earliest date on which this new linetype will appear in the GenBank
flatfile format is February 15 2006.
---------------------------------------
Looks like they are ahead of schedule in releasing this new line type.
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.