[Biopython-dev] Problem with SeqIO uniprot-xml on older XML files?
Peter Cock
p.j.a.cock at googlemail.com
Fri Sep 27 15:47:11 UTC 2013
Hi all,
There seems to be a problem parsing older UniProt XML files,
see http://seqanswers.com/forums/showthread.php?t=33921
Could anyone have a look at this? Somehow the start/end
of each record does not seem to be recognised here:
>>> from Bio import SeqIO
>>> r = next(SeqIO.parse("uniref90.xml", "uniprot-xml"))
(takes ages, presumably scanning whole file)
Note the indexing code also breaks:
>>> from Bio import SeqIO
>>> d = SeqIO.index("uniref90.xml", "uniprot-xml")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/pc40583/lib/python2.7/site-packages/Bio/SeqIO/__init__.py", line 808, in index
    key_function, repr, "SeqRecord")
  File "/home/pc40583/lib/python2.7/site-packages/Bio/File.py", line 250, in __init__
    for key, offset, length in offset_iter:
  File "/home/pc40583/lib/python2.7/site-packages/Bio/SeqIO/_index.py", line 401, in __iter__
    % (start_offset, end_offset))
ValueError: Did not find <accession> line in bytes 283 to 38649
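As a possible first diagnostic, one could tally the element names near the start of the file to see whether <entry> and <accession> elements appear at all. This is only a rough sketch using the standard library. The helper name count_tags and the tiny sample XML are made up for illustration and are not part of Biopython:

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def count_tags(handle, limit=1000):
    """Count local element names among the first `limit` start events."""
    counts = Counter()
    for n, (event, elem) in enumerate(ET.iterparse(handle, events=("start",))):
        if n >= limit:
            break
        # Strip any "{namespace}" prefix so only the local tag name is kept.
        counts[elem.tag.rsplit("}", 1)[-1]] += 1
    return counts


# Tiny made-up sample standing in for the real file (hypothetical content).
sample = io.BytesIO(
    b'<root xmlns="http://example.org/ns">'
    b"<entry><name>A</name></entry>"
    b"<entry><name>B</name></entry>"
    b"</root>"
)
counts = count_tags(sample)
print(counts["entry"], counts["accession"])
```

On the real file, the same function could be called as count_tags(open("uniref90.xml", "rb")). A zero count for accession would suggest the records simply do not carry the element the indexer is looking for.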
Thanks,
Peter