[Biopython-dev] Conserved Domains Database Support

Tue Jul 3 19:19:04 UTC 2012

Hi everyone,

I'm new to the Biopython library and was wondering whether there is any
support for NCBI's Conserved Domain Database (CDD), in particular for the
superfamily batch files that its web tool produces.  A Google search turned
up some interest in this back in 2008, but that discussion was mainly about
parsing the HTML output of CDD searches.  Now that CDD offers a nice,
regular, downloadable format, has any Biopython support been implemented
for it?

If not, I'd like to contribute.

The data is in a simple tab-delimited format of domain alignments, e.g.:

Q#10000    0    >WHL22.364604.0    superfamily    212291    7    290    1.01528e-138    401.1    cl09099    P-loop_NTPase    superfamily    0

I had envisioned a simple class consisting mainly of getters/setters, plus a
few methods such as sorting by query batch.
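
A rough sketch of what such a class might look like is below.  It is only
an illustration: DomainHit, parse_hit and group_by_query are made-up names
(not existing Biopython API), and the field names are guesses at what each
column in the example row means, not taken from the CDD documentation.  It
assumes one record per line with real tab separators.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainHit:
    """One row of the batch file. Field names are guesses, not official."""
    query_id: str      # e.g. "Q#10000"
    query_name: str    # e.g. "WHL22.364604.0" (leading ">" stripped)
    hit_type: str      # e.g. "superfamily"
    pssm_id: int
    start: int
    end: int
    evalue: float
    bitscore: float
    accession: str     # e.g. "cl09099"
    short_name: str    # e.g. "P-loop_NTPase"


def parse_hit(line):
    """Parse one tab-delimited row into a DomainHit (hypothetical helper)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("expected at least 11 columns, got %d" % len(fields))
    # Column 1 and any columns after the short name are ignored here,
    # since their meaning is not clear from the example row alone.
    return DomainHit(
        query_id=fields[0],
        query_name=fields[2].lstrip(">"),
        hit_type=fields[3],
        pssm_id=int(fields[4]),
        start=int(fields[5]),
        end=int(fields[6]),
        evalue=float(fields[7]),
        bitscore=float(fields[8]),
        accession=fields[9],
        short_name=fields[10],
    )


def group_by_query(hits):
    """Group hits by query ID, sorting each batch by ascending E-value."""
    groups = defaultdict(list)
    for hit in hits:
        groups[hit.query_id].append(hit)
    for batch in groups.values():
        batch.sort(key=lambda h: h.evalue)
    return dict(groups)


# The example row from above, rebuilt with explicit tabs.
row = "\t".join([
    "Q#10000", "0", ">WHL22.364604.0", "superfamily", "212291", "7", "290",
    "1.01528e-138", "401.1", "cl09099", "P-loop_NTPase", "superfamily", "0",
])
hit = parse_hit(row)
batches = group_by_query([hit])
```

A frozen dataclass gives read-only attribute access without hand-written
getters; setters could be added by dropping frozen=True if mutable records
turn out to be useful.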

~Adam