[Biopython-dev] GSoC Project Update -- 4

Wibowo Arindrarto w.arindrarto at gmail.com
Wed May 30 21:44:04 UTC 2012


Hi everyone,

I just posted my latest GSoC update here:
http://bow.web.id/blog/2012/05/assembling-the-parsers/

To summarize:

Last week I worked on more SearchIO parsers, adding support for more
formats. We now have a SearchIO-specific BLAST+ XML parser
(it was first implemented on top of NCBIXML). It uses ElementTree as
the base XML parser, with promising performance gains. I've also
completed SearchIO's BLAST tabular parser, which takes in the BLAST+
tabular output files with or without headers. If the tabular file has
headers, it can parse any number of columns in any order as long as the
columns with hit and query IDs are present. Finally, I've finished
writing the HMMER plain text parser. For now, the parser can handle
outputs from hmmscan and hmmsearch, single and multiple queries. All
these parsers have been tested using the test cases I've generated
previously.
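
As a rough illustration of the header-driven column handling described
above, here is a minimal standalone sketch. It is not the actual SearchIO
code: the function name parse_blast_tab is hypothetical, and it assumes
the commented header carries a "# Fields:" line with comma-separated
column names that include "query id" and "subject id" (newer BLAST+
releases may label these columns differently).

```python
import io


def parse_blast_tab(handle):
    """Yield one dict per hit row, keyed by the names in the '# Fields:' line.

    Hypothetical helper for illustration only; assumes commented
    BLAST+ tabular output with tab-separated data rows.
    """
    fields = None
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            # Only the '# Fields:' comment matters here; it names the columns.
            if line.startswith("# Fields:"):
                names = line[len("# Fields:"):].split(",")
                fields = [name.strip() for name in names]
                # Assumed column labels; both ID columns must be present.
                for required in ("query id", "subject id"):
                    if required not in fields:
                        raise ValueError("missing column: %s" % required)
            continue
        if fields is None:
            raise ValueError("data row found before a '# Fields:' line")
        values = line.split("\t")
        if len(values) != len(fields):
            raise ValueError("row has %d columns, header names %d"
                             % (len(values), len(fields)))
        yield dict(zip(fields, values))


# Columns deliberately in a non-default order.
sample = (
    "# BLASTN 2.2.26+\n"
    "# Query: q1\n"
    "# Fields: subject id, evalue, query id\n"
    "# 1 hits found\n"
    "s1\t1e-5\tq1\n"
)
rows = list(parse_blast_tab(io.StringIO(sample)))
print(rows[0]["query id"], rows[0]["subject id"])  # prints: q1 s1
```

Because each row is keyed by the column names read from the header, the
order and number of columns do not matter as long as the two ID columns
are there.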

I also had a public discussion with Peter on GitHub
regarding SearchIO objects here:
https://github.com/bow/biopython/commit/69a0ab64dfa7718f7455ca4c3961e95277fb4dbc#-P0,
if anyone is interested. It started as a discussion on some behaviors
of the HSP object, but also relates to other issues raised earlier
(the dynamic SeqRecord coordinates Peter brought up earlier and
Biopython's platform support).

That's it for this week :).

cheers,
Bow
