[BioPython] Re: GenBank Format & Parsing (was: Why would this GenBank file choke the GB parser?)

Chris Lasher chris.lasher at gmail.com
Fri Aug 26 10:17:09 EDT 2005


That was tremendously helpful! Thank you very much, Dr. Kim! Should
this change be added to the CVS of Bio/expressions/genbank.py, and if
so, is that something I should do, or something one of the active
developers should do?

Thanks again, very much,
Chris Lasher

On 8/26/05, Jan T. Kim <jtk at cmp.uea.ac.uk> wrote:
> I ran into similar problems a while ago; the parser is rather picky
> about certain things.
> 
> In your case, AY499671 gives "ENV" as the division in the LOCUS line
> (first line of the file), and it turns out that BioPython doesn't know
> about this division. Specifically, this is in Bio/expressions/genbank.py:
> 
>     valid_divisions = ["PRI", "ROD", "MAM", "VRT", "INV", "PLN", "BCT", "RNA",
>                        "VRL", "PHG", "SYN", "UNA", "EST", "PAT", "STS", "GSS",
>                        "HTG", "HTC", "CON"]
> 
> Chances are very good that by adding "ENV" to that list, you'll fix your
> problem. I've tried changing ENV to BCT in the GenBank file and that
> fixed it.
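
For anyone who cannot patch their installed copy, the manual workaround described above (swapping the unknown division for a known one in the file itself) can be scripted. This is only a minimal sketch and not part of BioPython: patch_division and the "BCT" fallback are hypothetical names and choices, and it assumes the division is the three-letter code directly before the DD-MON-YYYY date that ends the first line of the record.

```python
import re

# Divisions listed in the Bio/expressions/genbank.py excerpt quoted above.
KNOWN_DIVISIONS = {"PRI", "ROD", "MAM", "VRT", "INV", "PLN", "BCT", "RNA",
                   "VRL", "PHG", "SYN", "UNA", "EST", "PAT", "STS", "GSS",
                   "HTG", "HTC", "CON"}

# Assumption: the division is the three-letter code directly before the
# DD-MON-YYYY date that ends the LOCUS line.
_LOCUS_TAIL = re.compile(r"\b([A-Z]{3})(\s+\d{2}-[A-Z]{3}-\d{4}\s*)$")


def patch_division(locus_line, fallback="BCT"):
    """Replace an unknown division code in a LOCUS line with `fallback`.

    Hypothetical helper: lines that are not LOCUS lines, or that already
    carry a known division, are returned unchanged.
    """
    if not locus_line.startswith("LOCUS"):
        return locus_line
    match = _LOCUS_TAIL.search(locus_line)
    if match is None or match.group(1) in KNOWN_DIVISIONS:
        return locus_line
    # Same-length substitution, so the column layout of the line is kept.
    return locus_line[:match.start(1)] + fallback + locus_line[match.end(1):]
```

Run it over the first line of each record before handing the text to the parser. Note that the record then claims a division it does not really belong to, so adding "ENV" to valid_divisions remains the cleaner fix.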
> 
> While we're at this: Yeast chromosome GenBank files which I downloaded
> recently have
> 
>     ACCESSION   NC_001133 REGION: 1..230208
> 
> which the GenBank parser doesn't like either. I've patched my
> Bio/expressions/genbank.py to accept this, but I haven't been able to
> find any documentation of this -- I just checked the GenBank release
> notes (ftp://ftp.ncbi.nih.gov/genbank/gbrel.txt) again. Can anyone comment
> on this?
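
Until the format question is settled, the same preprocessing idea could be applied to the ACCESSION line. The sketch below is not the patch to Bio/expressions/genbank.py mentioned above; strip_region is a hypothetical helper, and it assumes the qualifier always has the form "REGION: <coordinates>" at the end of the line.

```python
import re

# Assumption: the qualifier is "REGION:" followed by a single
# whitespace-free coordinate token at the end of the line.
_REGION = re.compile(r"\s+REGION:\s*\S+\s*$")


def strip_region(line):
    """Drop a trailing 'REGION: start..end' qualifier from an ACCESSION line.

    Hypothetical helper: any other line is returned unchanged.
    """
    if not line.startswith("ACCESSION"):
        return line
    # Preserve the original line ending, if there was one.
    newline = "\n" if line.endswith("\n") else ""
    return _REGION.sub("", line.rstrip("\n")) + newline
```

With the example above, strip_region("ACCESSION   NC_001133 REGION: 1..230208\n") gives "ACCESSION   NC_001133\n". The coordinates are discarded, so keep them elsewhere if they matter downstream.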
> 
> Personally, I can't help but wonder whether it would not be possible for
> the GenBank format to converge to stability after so many years...
> 
> Best regards, Jan
> --
>  +- Jan T. Kim -------------------------------------------------------+
>  |    *NEW*    email: jtk at cmp.uea.ac.uk                               |
>  |    *NEW*    WWW:   http://www.cmp.uea.ac.uk/people/jtk             |
>  *-----=<  hierarchical systems are for files, not for humans  >=-----*
>


