[Biopython-dev] KEGG support

Renato Alves rjalves at igc.gulbenkian.pt
Thu Feb 11 00:44:59 UTC 2010


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

- From Peter on 02/10/2010 10:27 PM:
> Excellent news. Have you looked at the existing KEGG parsers in
> Biopython, and do you think the current style is suitable? (I haven't
> looked at the code recently myself, but will do).

The style seems good enough, but I was thinking of a more functional
approach, at least for the parser, to get away from the massive
if/elif/else cascades. The writer would come as a second priority and
would be similar, although I would also try to keep code duplication
lower than what we see in the Enzyme/__init__.py file. I would also
consider using Genes.py instead of Genes/__init__.py ... I don't see
the need for packages here.
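
As a rough illustration of the table-driven idea (only a sketch, not the
existing Bio.KEGG code), a parser could map each keyword to a small
handler function instead of branching on it. It assumes the usual KEGG
flat file layout: a keyword in the first 12 columns, continuation lines
indented by 12 spaces, and "///" closing each entry. The names HANDLERS,
parse and the _set_*/_append_text helpers are hypothetical, and the
sample entry is made up.

```python
import io

FIELD_WIDTH = 12  # assumed width of the keyword column


def _set_entry(record, value):
    # Keep only the identifier, the first token on the ENTRY line.
    record["entry"] = value.split()[0]


def _set_name(record, value):
    # Names are treated as comma separated (an assumption of this sketch).
    names = [n.strip() for n in value.split(",") if n.strip()]
    record.setdefault("name", []).extend(names)


def _append_text(field):
    # Build a handler that collects each line of a free-text field.
    def handler(record, value):
        record.setdefault(field, []).append(value)
    return handler


# Keyword -> handler table; supporting a new field means adding one row.
HANDLERS = {
    "ENTRY": _set_entry,
    "NAME": _set_name,
    "DEFINITION": _append_text("definition"),
}


def parse(handle):
    """Yield one dict per entry found in a KEGG-style flat file handle."""
    record, keyword = {}, None
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith("///"):
            # End of entry: emit it and start a fresh one.
            if record:
                yield record
            record, keyword = {}, None
            continue
        if not line.strip():
            continue
        head = line[:FIELD_WIDTH].strip()
        if head:
            keyword = head  # otherwise a continuation of the last keyword
        handler = HANDLERS.get(keyword)
        if handler is not None:  # unknown keywords are silently skipped
            handler(record, line[FIELD_WIDTH:].strip())


# Made-up entry, for illustration only.
SAMPLE = """\
ENTRY       x0001             CDS       T00000
NAME        exaA, exaB
DEFINITION  made-up example
            protein
///
"""

records = list(parse(io.StringIO(SAMPLE)))
```

The writer could reuse the same kind of table in the other direction,
which should help with the duplication mentioned above.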

> Regarding the SeqIO interface (for KEGG GENES only?), I would be
> happy to advise. Initially I suggest you work on adding a parser much
> like the other KEGG parsers, returning gene records. Then we can
> add a Bio/SeqIO/KeggGeneIO.py wrapper to turn these into SeqRecord
> objects.

Yes, for now my main goal would be GENES. The other formats can probably
grow from there. Your suggestion on SeqIO seems reasonable. I'll try to
have a prototype in the next few days or over the weekend, and we can
discuss from there.

> I have not used SOAP, and have a personal preference for REST style
> APIs. However, if that is what KEGG offers, this is worth considering.
> I think Brad has some experience with (other) SOAP services in Python.
> Note the KEGG documentation suggests using SOAPpy for Python.

According to the http://www.genome.jp/kegg/docs/weblink.html page, they
do mention a REST-like URL for generic entries, pathways and brite, but
it seems more useful for external linking than as an API. I couldn't
even figure out how to return the information in plain text instead of
the default HTML. As for SOAPpy, I have nothing against it besides the
fact that when I first tried it I had a few problems. Anyway, it was a
long time ago... I've only played with suds since.

> Interestingly, KEGG are however looking into providing RDF (and
> perhaps one day SPARQL endpoints). I will try and find out what sort
> of time scale they have in mind while I am at the BioHackathon 2010
> this week - http://hackathon3.dbcls.jp/

We'll be waiting on your feedback on this :)

> For now, I would prioritise the KEGG flat file parsers.

Agreed.

> Peter
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (GNU/Linux)

iEYEARECAAYFAktzUwgACgkQYh11EUYTX9SPcwCfSrNkIovs1vnPinuAtMFZQJYn
pmAAnjHAAro2Ls/c1Nq4DCuliReaPm64
=Dohn
-----END PGP SIGNATURE-----


