[Biopython] KOBAS - KEGG Orthology Based Annotation System XML file empty problem

Peter biopython at maubp.freeserve.co.uk
Wed Oct 28 10:48:24 UTC 2009


On Tue, Oct 27, 2009 at 10:23 PM, Laszlo Kun <laszlo at vpac.org> wrote:
> Dear All,
>
> I am trying to install the KOBAS software for a user. The
> install apparently succeeded, but after about 3 hours the
> run fell over with the error message:
>
> ======================
> [rossh at tango Ov_KOBAS]$ cat NY.e789941
> Traceback (most recent call last):
>   File "/usr/local/python/2.6.2-gcc/bin/blast2ko.py", line 90, in <module>
>     annots = dict([ (i.query, i) for i in annotator.annotate() ])
>   File "/usr/local/python/2.6.2-gcc/lib/python2.6/site-packages/kobas/annot.py", line 151, in annotate
>     for record in self.reader:
>   File "/usr/local/python/2.6.2-gcc/lib/python2.6/site-packages/Bio/Blast/NCBIXML.py", line 605, in parse
>     raise ValueError("Your XML file was empty")
> ValueError: Your XML file was empty
>
> =============================
>
> The script appears to have completed the blast section
> against the KOBAS database, but has fallen over on
> the annotation pass.
>
> I haven't come across this error before.
>
> Thanks again for your help.
>
> cheers,
> Laszlo

Hi Laszlo,

Have you ever had KOBAS working before? I would
guess this is your first attempt...

The error message from Biopython seems quite clear:
KOBAS is trying to parse an empty XML file. This may
have been due to a problem calling BLAST, which
they probably do via Biopython. Have you checked
that your installation of standalone NCBI BLAST (i.e.
the command line tool blastall) is working? I don't
know what NCBI databases are needed, probably nr.
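
It may be worth looking at the BLAST XML output by hand
before blaming the parser. The sketch below uses only the
standard library and reports whether a file is missing,
zero bytes long, or does not start like XML. The function
name is made up for illustration and is not part of KOBAS
or Biopython, and I am assuming you can find where KOBAS
writes its BLAST output:

```python
import os


def describe_blast_xml(path):
    """Return a short diagnosis for a would-be BLAST XML file.

    Hypothetical helper for illustration only.
    """
    if not os.path.isfile(path):
        return "missing"
    if os.path.getsize(path) == 0:
        # A zero byte file is the most likely reason for
        # ValueError("Your XML file was empty").
        return "empty"
    handle = open(path, "rb")
    try:
        head = handle.read(200).lstrip()
    finally:
        handle.close()
    if not head.startswith(b"<?xml"):
        # BLAST may have written an error message instead of XML.
        return "not xml"
    return "looks ok"
```

If this says "empty" or "not xml", the problem is upstream
of Biopython, in the BLAST step itself.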

Unfortunately, there is another issue here too...

KOBAS is described here:

Mao et al. (2005) Bioinformatics 21(19) pp. 3787-93
http://dx.doi.org/10.1093/bioinformatics/bti430

Wu et al. (2006) Nucleic Acids Research 34
http://dx.doi.org/10.1093/nar/gkl167

The link given in the original paper seems to be dead now:
http://genome.cbi.pku.edu.cn/download.html

Their second paper gives http://kobas.cbi.pku.edu.cn/
which includes links to download their source code.
I had a quick look at this (KOBAS 1.1.0), and it has
not been updated recently. As you are using Python
2.6, you'll see some harmless deprecation warnings
about the sets module (a trivial issue to fix).
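
For what it's worth, the usual way to silence that warning
is to prefer the built-in set type (available since Python
2.4) and only fall back to the old sets module on ancient
Pythons. I have not checked exactly how KOBAS imports it,
so take this as the general pattern rather than a patch.
The identifiers in the example are made up:

```python
try:
    set  # built-in type since Python 2.4
except NameError:
    # Only reached on Python 2.3 or older.
    from sets import Set as set

# Made-up identifiers, just to show the type in use.
unique_ids = set(["K00001", "K00002", "K00001"])
print(len(unique_ids))
```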

What version of Biopython do you have installed?
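
If you are not sure, something like this should tell you
(a small sketch; the helper name is mine, and very old
releases may not define __version__ at all, in which case
it reports "unknown"):

```python
def biopython_version():
    """Return the installed Biopython version string, or None."""
    try:
        import Bio
    except ImportError:
        # Biopython is not importable from this Python at all.
        return None
    # Fall back to "unknown" if the attribute is absent.
    return getattr(Bio, "__version__", "unknown")


print(biopython_version())
```

Run it with the same Python that runs blast2ko.py, in
case there is more than one installed.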

Their website says they need Biopython 1.24 or later,
but this isn't true. Their file fasta.py uses Biopython's
Bio.SeqIO module, which was added in Biopython 1.43.
Their file annot.py uses the Bio.Blast.NCBIXML.parse
function, which was also added in Biopython 1.43.

Also, and perhaps most importantly (as mentioned in
the first paper) they are using Martel for parsing KEGG.
We have dropped Martel, and Biopython 1.50 was
the last release to include it. I'm not sure at what
point in the pipeline they use KEGG, but I guess
this will cause trouble after the BLAST step. We
*could* provide the final version of Martel as a
separate standalone package - I'd need to find
half a day free. Note I would strongly recommend
using mxTextTools version 2 (not version 3), as
something about the Unicode-related API changes
is known to cause subtle problems with Martel as
used in older versions of Biopython.
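
To see which of these pieces are importable on your
machine, a rough probe like the one below might help. The
module list is my guess from the points above, not a list
published by the KOBAS authors, and the function name is
made up:

```python
def probe(names):
    """Map each dotted module name to True if it imports."""
    found = {}
    for name in names:
        try:
            __import__(name)
            found[name] = True
        except Exception:
            # Old packages can fail in other ways than
            # ImportError, so treat any failure as missing.
            found[name] = False
    return found


# Guessed list of modules KOBAS relies on.
wanted = ["Bio.SeqIO", "Bio.Blast.NCBIXML", "Martel", "mx.TextTools"]
for name, ok in sorted(probe(wanted).items()):
    print("%-20s %s" % (name, "found" if ok else "MISSING"))
```

This only checks that the modules import. It does not tell
you which mxTextTools version is installed.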

I think you (or Biopython) need to get in touch with
the KOBAS authors. They can at least tell us what
version of Biopython they used to develop KOBAS
1.1.0. Also, they may have already updated their
code for the webservice, and just not updated the
download files.

Regards,

Peter


