[Biopython-dev] Working with the new SearchIO API

Kai Blin kai.blin at biotech.uni-tuebingen.de
Mon Oct 29 20:43:49 UTC 2012


Hi Bow,

I've been looking more closely at the SearchIO API changes introduced in
August. I think there is still a design problem with the object model,
at least judging by how it affects the hmmer3 parser (and the hmmer2
parser as well).

Possibly I'm not seeing the big picture here, so let me explain what I'm
seeing, and then you can tell me what I missed. :)

So, the hmmer2 and hmmer3 file formats basically look like this:

# header
# ...
# ...

information about the query

list of hits

list of hsps

(alignments for hsps)

(some statistics)
//

Now, when parsing this file line-wise, you obviously run into the hits
first. However, with the new API, you can't create a Hit object without
knowing its HSPs, and at that point you haven't read them yet.

To work around this, you need to create a fake hit object
(https://github.com/bow/biopython/blob/searchio/Bio/SearchIO/HmmerIO/hmmer3_text.py#L201).
Then, in the loop that creates the fake hit objects, one of the exit
conditions parses the HSP entries and replaces the fake hit objects
with "real" Hit objects
(https://github.com/bow/biopython/blob/searchio/Bio/SearchIO/HmmerIO/hmmer3_text.py#L188).
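
To make the pattern concrete, here is a minimal Python sketch of that
two-pass approach. The names (PlaceholderHit, Hit, HSP, build_hits) are
made up for illustration and are not the actual SearchIO classes; the
only thing carried over is the assumption that a Hit cannot be
constructed without its HSPs.

```python
from collections import namedtuple

# Hypothetical stand-ins, not the real Bio.SearchIO classes.
HSP = namedtuple("HSP", ["hit_id", "score"])
PlaceholderHit = namedtuple("PlaceholderHit", ["id", "evalue"])


class Hit:
    """Toy Hit that, like the API under discussion, needs its HSPs up front."""

    def __init__(self, hit_id, evalue, hsps):
        if not hsps:
            raise ValueError("a Hit needs at least one HSP")
        self.id = hit_id
        self.evalue = evalue
        self.hsps = list(hsps)


def build_hits(hit_rows, hsp_rows):
    """Build Hit objects from rows given in file order (hit table first)."""
    # Pass 1: the hit table comes first, so only placeholders can be made.
    hits = [PlaceholderHit(hit_id, evalue) for hit_id, evalue in hit_rows]

    # Pass 2: the HSP table is read later; group its rows by hit id.
    hsps_by_hit = {}
    for hit_id, score in hsp_rows:
        hsps_by_hit.setdefault(hit_id, []).append(HSP(hit_id, score))

    # Swap every placeholder for a real Hit now that its HSPs are known.
    return [Hit(p.id, p.evalue, hsps_by_hit.get(p.id, [])) for p in hits]


hits = build_hits(
    hit_rows=[("PF00001", 1e-10), ("PF00002", 3e-5)],
    hsp_rows=[("PF00001", 50.2), ("PF00002", 20.1), ("PF00001", 12.7)],
)
```

Every hit is represented twice here: once as a placeholder and once as
the real object that replaces it.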

By the way, that code is a bit misleading. It took me a while to notice
that the list's contents get swapped. Anyway, back to business.

So basically you need to create two hit objects for every hit you're
looking at. What's the advantage of forcing HSP objects to be passed to
the Hit constructor? Just to make sure your Hit objects have a valid HSP
at some later point?
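
For comparison, here is a rough sketch of a Hit that accepts its HSPs
after construction, which would let a line-wise parser create each hit
exactly once. LateHit and its append method are invented names for the
sake of the example, not a proposal for the exact interface, and HSPs
are plain dicts here just to keep it short.

```python
class LateHit:
    """Hypothetical Hit whose HSPs may be attached after construction."""

    def __init__(self, hit_id, hsps=None):
        self.id = hit_id
        self.hsps = list(hsps) if hsps else []

    def append(self, hsp):
        # Reject HSPs that belong to a different hit.
        if hsp["hit_id"] != self.id:
            raise ValueError(
                "HSP belongs to hit %r, not %r" % (hsp["hit_id"], self.id)
            )
        self.hsps.append(hsp)


# Hits are created once, as soon as the hit table is read ...
hits = {hit_id: LateHit(hit_id) for hit_id in ("PF00001", "PF00002")}

# ... and filled in when the HSP table is reached later in the file.
hsp_rows = (
    {"hit_id": "PF00001", "score": 50.2},
    {"hit_id": "PF00002", "score": 20.1},
)
for hsp in hsp_rows:
    hits[hsp["hit_id"]].append(hsp)
```

The obvious cost is that a Hit can then exist, at least temporarily,
with an empty HSP list.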

I'm aware that I'm just looking at the SearchIO design from the
perspective of the hmmer2 parser, but I'd like to understand the reasons
for the API being the way it currently is.

Hope you can shed some light on this,
Kai

-- 
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-University of Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Deutschland
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben


