[Biopython-dev] SwissProt fails to parse the current uniprot_sprot data file?
Peter Cock
p.j.a.cock at googlemail.com
Tue Oct 21 06:16:29 UTC 2014
Hi Jinghua,
Yes, this problem has already been reported but has not been fixed yet:
https://github.com/biopython/biopython/issues/369
It shouldn't be too complicated to modify the code to
cope with both the old and new style lines - do you want
to try?
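
As a rough sketch of that idea (not the actual Biopython fix), a more
tolerant RN parser could accept both "[1]" and "[1] {ECO:...}". The
names read_rn, the stand-in Reference class and its evidence attribute
are hypothetical and used here for illustration only; they are not part
of Biopython 1.64.

```python
import re

# Matches "[1]" optionally followed by an evidence block such as
# "{ECO:0000305, ECO:0000312|EMBL:AAK11482.1}".
_RN_PATTERN = re.compile(r"^\[(\d+)\]\s*(?:\{(.*)\})?\s*$")


class Reference(object):
    """Minimal stand-in for Bio.SwissProt.Reference (illustration only)."""

    def __init__(self):
        self.number = None
        self.evidence = []  # hypothetical attribute, an assumption of this sketch


def read_rn(reference, rn):
    """Parse an RN line value in either the old or the new style."""
    match = _RN_PATTERN.match(rn.strip())
    if match is None:
        raise ValueError("Unexpected RN line format: %r" % rn)
    # The reference number is always the integer inside the brackets.
    reference.number = int(match.group(1))
    # New-style lines carry a comma-separated evidence list in braces.
    evidence = match.group(2)
    if evidence:
        reference.evidence = [item.strip() for item in evidence.split(",")]


old_style = Reference()
read_rn(old_style, "[1]")

new_style = Reference()
read_rn(new_style, "[1] {ECO:0000305, ECO:0000312|EMBL:AAK11482.1}")
```

Raising ValueError rather than using assert keeps the check active even
when Python runs with -O; whether to keep or discard the evidence tags
is a separate design decision.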
Thanks for reporting this,
Peter
On Tue, Oct 21, 2014 at 4:45 AM, Jinghua (Frank) Feng
<Jinghua.Feng at adelaide.edu.au> wrote:
> Hello,
>
> It looks like SwissProt can parse an older version of the uniprot_sprot
> data file, but fails with the current version. Below is how to replicate
> the error (Biopython version is '1.64').
>
> Regards,
>
> Jinghua
> ----------------------
>
> First download the current uniprot_sprot data file (~72 MB in size) at
> ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/taxonomic_divisions/uniprot_sprot_human.dat.gz
>
> Then, in IPython, use SwissProt to parse the downloaded data file:
>
> In [1]: from Bio import SwissProt
>
> In [2]: import gzip
>
> In [3]: inhandle = gzip.open('./uniprot_sprot_human.dat.gz')
>
> In [4]: reader = SwissProt.parse(inhandle)
>
> In [5]: for r in reader:
>    ...:     pass
>    ...:
> ---------------------------------------------------------------------------
> AssertionError                            Traceback (most recent call last)
> <ipython-input-5-c04351d992d2> in <module>()
> ----> 1 for r in reader:
>       2     pass
>       3
>
> /usr/local/lib/python2.7/dist-packages/Bio/SwissProt/__init__.pyc in parse(handle)
>     115 def parse(handle):
>     116     while True:
> --> 117         record = _read(handle)
>     118         if not record:
>     119             return
>
> /usr/local/lib/python2.7/dist-packages/Bio/SwissProt/__init__.pyc in _read(handle)
>     182         elif key == 'RN':
>     183             reference = Reference()
> --> 184             _read_rn(reference, value)
>     185             record.references.append(reference)
>     186         elif key == 'RP':
>
> /usr/local/lib/python2.7/dist-packages/Bio/SwissProt/__init__.pyc in _read_rn(reference, rn)
>     407
>     408 def _read_rn(reference, rn):
> --> 409     assert rn[0] == '[' and rn[-1] == ']', "Missing brackets %s" % rn
>     410     reference.number = int(rn[1:-1])
>     411
>
> AssertionError: Missing brackets [1] {ECO:0000305, ECO:0000312|EMBL:AAK11482.1}