[Biopython-dev] GSoC Project Update -- 11

Wibowo Arindrarto w.arindrarto at gmail.com
Tue Aug 7 17:56:26 UTC 2012

Hello everyone,

I have just posted my latest update on my project here:

It has been quite a while since my last update because there has been
a considerable change to the SearchIO object model I'm using. The
details are in my blog post, but in short, the previous model
(QueryResult, Hit, and HSP) could not adequately handle files that
have multiple sequences in a single HSP (so far seen in files output
by BLAT and Exonerate). In my previous updates, I used simple Python
lists to store attributes related to these multiple sequences, but
that turned out to be problematic because it can leave an object with
inconsistent attributes.

After trying out several different implementations and discussing them
with Peter, we've finally settled on a new model. The new model
changes the HSP object into a container that stores a new object:
HSPFragment. HSPFragment represents a single, contiguous alignment of
the hit and query sequences. It stores only the sequences,
coordinates, frames, and strands. Other attributes reported by the
search program (such as e-values or scores) are stored in the HSP
object.
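
To make the split concrete, here is a minimal sketch of the
container/fragment idea. It is not the actual SearchIO code: the
field names, defaults, and the query_span helper are assumptions
chosen for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class HSPFragment:
    """One contiguous alignment block (field names are hypothetical)."""
    query_seq: str
    hit_seq: str
    query_start: int
    query_end: int
    hit_start: int
    hit_end: int
    query_strand: int = 1
    hit_strand: int = 1
    query_frame: int = 0
    hit_frame: int = 0


@dataclass
class HSP:
    """Container of fragments plus statistics from the search program."""
    fragments: List[HSPFragment] = field(default_factory=list)
    evalue: Optional[float] = None
    score: Optional[float] = None

    def __len__(self) -> int:
        return len(self.fragments)

    def __iter__(self):
        return iter(self.fragments)

    @property
    def query_span(self) -> Tuple[int, int]:
        # Derived from the fragments, so it cannot drift out of sync
        # the way parallel lists of coordinates could.
        starts = [f.query_start for f in self.fragments]
        ends = [f.query_end for f in self.fragments]
        return (min(starts), max(ends))


# A BLAT- or Exonerate-style HSP made of two separate blocks.
hsp = HSP(
    fragments=[
        HSPFragment("ATGC", "ATGC", 0, 4, 100, 104),
        HSPFragment("GGTA", "GGTA", 10, 14, 250, 254),
    ],
    score=42.0,
)
```

Because each fragment carries its own sequences and coordinates, there
is no way for, say, the list of starts to end up a different length
from the list of sequences.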

This change required some modifications to all of the current parsers,
but from the perspective of a user working with file formats other
than BLAT or Exonerate, the changes should be minimal.

Aside from this, there's also a small update to the main API that
lets it accept keyword arguments. The arguments modify the behavior of
the parser, and they are different for each parser. Currently, this is
only used by the BLAST tabular parser, but I imagine more parsers will
use this in the future.
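
As a rough illustration of how format-specific keyword arguments can
pass through a single entry point, here is a self-contained sketch.
The function names, the "blast-tab" key, and the comments flag are
assumptions for this example, not a description of the real parser
code.

```python
import io


def parse_blast_tab(handle, comments=False):
    """Hypothetical tabular parser; `comments` allows '#' header lines."""
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            if comments:
                continue  # skip comment lines when the caller expects them
            raise ValueError("comment line found; pass comments=True")
        yield line.split("\t")


# Registry mapping a format name to its parser (hypothetical).
_PARSERS = {"blast-tab": parse_blast_tab}


def parse(handle, fmt, **kwargs):
    """Look up the parser for `fmt` and forward any extra keyword arguments."""
    try:
        parser = _PARSERS[fmt]
    except KeyError:
        raise ValueError("unknown format: %r" % fmt) from None
    return parser(handle, **kwargs)


data = "# BLASTN\nq1\th1\t98.5\nq1\th2\t91.0\n"
rows = list(parse(io.StringIO(data), "blast-tab", comments=True))
```

The main function does not need to know which options exist; each
parser declares its own, and an unsupported keyword fails with the
usual TypeError from that parser.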

Finally, having settled on a firmer object model, I'll spend the rest
of my time focusing on the documentation. There may still be small
fixes to the code, but I expect nothing as major as this change.

