[Biopython-dev] Problem with SeqIO uniprot-xml on older XML files?

Peter Cock p.j.a.cock at googlemail.com
Fri Sep 27 15:47:11 UTC 2013


Hi all,

There seems to be a problem parsing older UniProt XML files,
see http://seqanswers.com/forums/showthread.php?t=33921

Could anyone have a look at this? The start/end of each
record does not seem to be recognised here:

>>> from Bio import SeqIO
>>> r = next(SeqIO.parse("uniref90.xml", "uniprot-xml"))
(takes ages, presumably scanning the whole file)
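
One way to see which elements the file actually contains, without
reading all of it, might be a streaming pass with the standard
library's xml.etree.ElementTree.iterparse. This is only a diagnostic
sketch: the helper name first_tags and the default limit of 10 are
made up for illustration and are not part of Biopython.

```python
import itertools
import xml.etree.ElementTree as ET


def first_tags(handle, limit=10):
    """Return the tags of the first `limit` elements opened in an XML stream.

    `handle` is a binary file-like object. Hypothetical helper, not a
    Biopython function. Namespaced tags come back as "{namespace}name".
    """
    # iterparse reads the stream incrementally, so stopping after `limit`
    # start events avoids scanning the rest of a very large file.
    events = ET.iterparse(handle, events=("start",))
    return [elem.tag for _event, elem in itertools.islice(events, limit)]
```

For example, calling first_tags(open("uniref90.xml", "rb")) should
list the root element and the first few child elements, which can
then be compared with the tags the "uniprot-xml" parser looks for.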

Note the indexing code also breaks:

>>> from Bio import SeqIO
>>> d = SeqIO.index("uniref90.xml", "uniprot-xml")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/pc40583/lib/python2.7/site-packages/Bio/SeqIO/__init__.py", line 808, in index
    key_function, repr, "SeqRecord")
  File "/home/pc40583/lib/python2.7/site-packages/Bio/File.py", line 250, in __init__
    for key, offset, length in offset_iter:
  File "/home/pc40583/lib/python2.7/site-packages/Bio/SeqIO/_index.py", line 401, in __iter__
    % (start_offset, end_offset))
ValueError: Did not find <accession> line in bytes 283 to 38649
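
To look at what the indexer saw, the byte range from the error
message can be read back directly. A minimal sketch follows; the
helper names are hypothetical, and treating the end offset as
exclusive is an assumption on my part, not something the traceback
confirms.

```python
def read_span(handle, start, end):
    """Return the raw bytes in [start, end) from a binary file handle."""
    handle.seek(start)
    return handle.read(end - start)


def has_accession(chunk):
    """Return True if the chunk contains an opening <accession> tag."""
    return b"<accession>" in chunk
```

With the file opened in binary mode, read_span(handle, 283, 38649)
gives the span named in the ValueError, and printing the first few
hundred bytes of it should show which elements are there instead of
<accession>.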

Thanks,

Peter


