[BioPython] Re: GenBank Format & Parsing (was: Why would this
GenBank file choke the GB parser?)
Chris Lasher
chris.lasher at gmail.com
Fri Aug 26 10:17:09 EDT 2005
That was tremendously helpful! Thank you very much, Dr. Kim! Should
this change be added to the CVS of Bio/expressions/genbank.py, and if
so, is that something I should do, or something one of the active
developers should do?
Thanks again, very much,
Chris Lasher
On 8/26/05, Jan T. Kim <jtk at cmp.uea.ac.uk> wrote:
> I ran into similar problems a while ago; the parser is rather picky
> about certain things.
>
> In your case, AY499671 gives "ENV" as the division in the LOCUS line
> (first line of the file), and it turns out that BioPython doesn't know
> about this division. Specifically, this is in Bio/expressions/genbank.py:
>
> valid_divisions = ["PRI", "ROD", "MAM", "VRT", "INV", "PLN", "BCT", "RNA",
> "VRL", "PHG", "SYN", "UNA", "EST", "PAT", "STS", "GSS",
> "HTG", "HTC", "CON"]
>
> Chances are very good that by adding "ENV" to that list, you'll fix your
> problem. I've tried changing ENV to BCT in the GenBank file and that
> fixed it.
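
A minimal sketch of the check described above, for illustration only. It is
not the actual code in Bio/expressions/genbank.py. The function names are
hypothetical, the example LOCUS line uses a made-up length and date, and the
sketch assumes the division code is the second-to-last whitespace-separated
field of the first line of the file (the LOCUS line).

```python
# Division codes copied from the list quoted above.
VALID_DIVISIONS = ["PRI", "ROD", "MAM", "VRT", "INV", "PLN", "BCT", "RNA",
                   "VRL", "PHG", "SYN", "UNA", "EST", "PAT", "STS", "GSS",
                   "HTG", "HTC", "CON"]


def locus_division(locus_line):
    """Return the division code from a LOCUS line.

    Assumption: the division is the second-to-last whitespace-separated
    field, immediately before the date.
    """
    fields = locus_line.split()
    if len(fields) < 3 or fields[0] != "LOCUS":
        raise ValueError("not a LOCUS line: %r" % locus_line)
    return fields[-2]


def is_known_division(locus_line, divisions=VALID_DIVISIONS):
    """Report whether the division on this LOCUS line is in `divisions`."""
    return locus_division(locus_line) in divisions


# Hypothetical LOCUS line; the length and date are made up for illustration.
example = "LOCUS       AY499671     1000 bp    DNA     linear   ENV 01-JAN-2005"

print(is_known_division(example))                              # original list
print(is_known_division(example, VALID_DIVISIONS + ["ENV"]))   # with "ENV" added
```

A check like this can be run over a file before handing it to the parser, to
tell an unknown division apart from other causes of a parse failure.
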
>
> While we're at it: the yeast chromosome GenBank files I downloaded
> recently have
>
> ACCESSION NC_001133 REGION: 1..230208
>
> which the GenBank parser doesn't like either. I've patched my
> Bio/expressions/genbank.py to accept this, but I haven't been able to
> find any documentation of this -- I just checked the GenBank release
> notes (ftp://ftp.ncbi.nih.gov/genbank/gbrel.txt) again. Can anyone comment
> on this?
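
For illustration, a stand-alone sketch of reading an ACCESSION line that
carries the REGION suffix shown above. It does not touch the grammar in
Bio/expressions/genbank.py and is not the patch mentioned in the message.
The function name is hypothetical, and the sketch assumes the region always
has the simple form start..end, since the format of this suffix is not
documented in the release notes cited above.

```python
import re

# Assumed region form: two integers separated by "..", e.g. 1..230208.
_REGION_RE = re.compile(r"^(\d+)\.\.(\d+)$")


def parse_accession_line(line):
    """Split an ACCESSION line into (accessions, region).

    `accessions` is a list of accession strings; `region` is a
    (start, end) tuple of ints, or None if no REGION: suffix is present.
    """
    fields = line.split()
    if not fields or fields[0] != "ACCESSION":
        raise ValueError("not an ACCESSION line: %r" % line)
    fields = fields[1:]

    region = None
    if "REGION:" in fields:
        i = fields.index("REGION:")
        if i + 1 >= len(fields):
            raise ValueError("REGION: without a range: %r" % line)
        match = _REGION_RE.match(fields[i + 1])
        if match is None:
            raise ValueError("unrecognised region: %r" % fields[i + 1])
        region = (int(match.group(1)), int(match.group(2)))
        fields = fields[:i]  # everything before REGION: is an accession

    return fields, region


print(parse_accession_line("ACCESSION NC_001133 REGION: 1..230208"))
```

Anything other than a plain start..end range raises ValueError rather than
being guessed at.
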
>
> Personally, I can't help but wonder whether it would not be possible for
> the GenBank format to converge to stability after so many years...
>
> Best regards, Jan
> --
> +- Jan T. Kim -------------------------------------------------------+
> | *NEW* email: jtk at cmp.uea.ac.uk |
> | *NEW* WWW: http://www.cmp.uea.ac.uk/people/jtk |
> *-----=< hierarchical systems are for files, not for humans >=-----*
>