[Biopython-dev] [Patch] Genbank Parser

Peter Cock p.j.a.cock at googlemail.com
Thu Oct 4 05:11:01 EDT 2012


On Mon, Oct 1, 2012 at 10:44 PM, Björn Grüning <bjoern at gruenings.eu> wrote:
> Hi Peter,
>
>> >
>> > The tbl2asn tool from NCBI creates GenBank files that do not have a
>> > version number. Unfortunately, that version number is used to fill
>> > consumer.data.id.
>> > I implemented the following fall-back:
>> > If there is no version information available, then it takes
>> > consumer.data.name for consumer.data.id. Does that make sense?
>> >
>> > Thanks!
>> > Bjoern
>>
>> Can you share some example output from tbl2asn that shows
>> this problem? Ideally something small we could include as a
>> unit test.
>
> Please find attached a small, stripped-down version of such a GenBank file.
>
> Thanks,
> Bjoern

$ python
Python 2.7.2 (default, Jun 20 2012, 16:23:33)
[GCC 4.2.1 Compatible Apple Clang 4.0 (tags/Apple/clang-418.0.60)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from Bio import SeqIO
>>> r = SeqIO.read("tbl1asn_output.gb", "gb")
/Library/Python/2.7/site-packages/Bio/GenBank/__init__.py:1158: BiopythonParserWarning: Expected sequence length 300246, found 2220 ().
  BiopythonParserWarning)
>>> r.id
''
>>> r.name
'Seq1'
>>> r.description
'Glarea strain lozoyensis.'
>>> quit()

The warning appears because this test file contains only the first 2220 bp
of the sequence, yet the LOCUS line still gives the original length of
300246 bp.

$ head tbl1asn_output.gb
LOCUS       Seq1                  300246 bp    DNA     linear       10-MAY-2012
DEFINITION  Glarea strain lozoyensis.
ACCESSION
VERSION
KEYWORDS    .
SOURCE      Glarea
  ORGANISM  Glarea
            Unclassified.
REFERENCE   1
  AUTHORS   Test
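
To make the length mismatch concrete, here is a rough sketch of that kind
of check. It is not how Bio.GenBank actually parses the LOCUS line; the
helper name declared_length and the whitespace-splitting approach are
assumptions made purely for illustration.

```python
def declared_length(locus_line):
    """Return the length stated on a GenBank LOCUS line, or None.

    Hypothetical helper: looks for a number followed by a 'bp' or 'aa'
    unit token after splitting the line on whitespace.
    """
    tokens = locus_line.split()
    for i, token in enumerate(tokens):
        if token in ("bp", "aa") and i > 0 and tokens[i - 1].isdigit():
            return int(tokens[i - 1])
    return None


locus = "LOCUS       Seq1                  300246 bp    DNA     linear       10-MAY-2012"
stated = declared_length(locus)
actual = 2220  # number of bases actually present in the stripped test file

# Report a mismatch between the stated and the actual sequence length.
if stated is not None and stated != actual:
    print("Expected sequence length %i, found %i" % (stated, actual))
```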

I didn't use your patch. Looking over the code, the intent was already
that record.name would be used if there was no record.id. Sadly, the
check was too strict about None versus an empty string (here the id is
'', not None). Fixed:
https://github.com/biopython/biopython/commit/e67d22e4b4f344a5a3c15b6e939c82f58986d87f
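
For anyone following along, the None-versus-empty-string distinction can
be sketched as below. This is only an illustration of the idea, not the
code from the commit; the function name pick_record_id and the default
placeholder value are hypothetical.

```python
def pick_record_id(version_id, name, default="<unknown id>"):
    """Choose an identifier, falling back to the name (hypothetical helper).

    A check such as 'version_id is None' misses the empty string that an
    empty VERSION line produces. Testing truthiness covers both cases.
    """
    if version_id:  # False for both None and ''
        return version_id
    if name:
        return name
    return default


print(pick_record_id("", "Seq1"))    # empty id, so the name is used
print(pick_record_id(None, "Seq1"))  # missing id, so the name is used
```

Anyone stuck on an older release could apply the same idea after parsing,
for example by assigning r.id = r.name whenever r.id is empty.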

Thanks for your help,

Peter


