[Biopython-dev] Conserved Domains Database Support

Tue Jul 3 22:03:29 UTC 2012

On Tue, Jul 3, 2012 at 8:19 PM, Adam Hughes <hughesadam87 at gmail.com> wrote:
> Hi everyone,
>
> I'm new to the Biopython library and was wondering if there is any support
> for the Conserved Domains Database (CDD) from NCBI?  In particular, the
> superfamily batch files that their web tool releases.  Doing a Google
> search, there was some interest in this back in 2008; however, that
> discussion was mainly about parsing the HTML output of CDD searches.

HTML scrapers were always a bit of a pain :(

> Now that CDD
> offers a nice, regular downloadable data format, has any Biopython support
> been implemented to work with this?
>
> If not, I'd like to contribute.
>
> The data is in a simple tab-delimited format of domain alignments, e.g.:
>
> Q#10000    0    >WHL22.364604.0    superfamily    212291 7 290
> 1.01528e-138    401.1    cl09099    P-loop_NTPase    superfamily
> 0
>
> I had envisioned a simple class of mainly getters/setters with a few
> methods such as sorting by Query batches.
>
> ~Adam
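
To make the proposal concrete, here is a minimal sketch of what such a
parser might look like. Everything in it is an assumption for illustration:
the names CDDHit, parse_cdd_hits and group_by_query are hypothetical (they
are not part of Biopython), and the column names are guessed from the single
example row quoted above, not taken from NCBI's documentation. The sample
row is rebuilt with explicit tabs, since the email wrapped it.

```python
import io
from collections import defaultdict, namedtuple

# ASSUMPTION: column order guessed from the one example row in this thread.
# These names are illustrative only and are not NCBI's official headers.
FIELDS = ("query", "query_index", "query_name", "hit_type", "pssm_id",
          "start", "end", "evalue", "bitscore", "accession", "short_name")

# Any trailing columns we cannot name are kept untouched in "extra".
CDDHit = namedtuple("CDDHit", FIELDS + ("extra",))

# The example row from the email, re-joined with tabs (the split is a guess).
SAMPLE_ROW = "\t".join([
    "Q#10000", "0", ">WHL22.364604.0", "superfamily", "212291", "7", "290",
    "1.01528e-138", "401.1", "cl09099", "P-loop_NTPase", "superfamily", "0",
]) + "\n"


def parse_cdd_hits(handle):
    """Yield one CDDHit per tab-delimited data line in handle."""
    for line in handle:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        if len(cols) < len(FIELDS):
            continue  # too few columns to be a data row
        named = dict(zip(FIELDS, cols))
        try:
            named["start"] = int(named["start"])
            named["end"] = int(named["end"])
            named["evalue"] = float(named["evalue"])
            named["bitscore"] = float(named["bitscore"])
        except ValueError:
            continue  # probably a header line, not a hit
        yield CDDHit(extra=tuple(cols[len(FIELDS):]), **named)


def group_by_query(hits):
    """Return {query: [hits sorted by start position]}."""
    grouped = defaultdict(list)
    for hit in hits:
        grouped[hit.query].append(hit)
    for query_hits in grouped.values():
        query_hits.sort(key=lambda h: h.start)
    return dict(grouped)


if __name__ == "__main__":
    for query, query_hits in group_by_query(
            parse_cdd_hits(io.StringIO(SAMPLE_ROW))).items():
        for hit in query_hits:
            print(query, hit.accession, hit.short_name, hit.start, hit.end)
```

An immutable namedtuple is used here instead of a class of getters and
setters, since parsed hits rarely need to change after loading; that is a
design preference, not a requirement of the format.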

That is interesting - and offers to work on Biopython are always
nice. Is this a file giving domain definitions (HMM or whatever
CDD uses), or precomputed search results for different query
sequences? Maybe a URL would help - I've not looked at this
resource for quite a while. I used to use the rpsblast tool to
run local (offline) searches against CDD databases, and that
offered several BLAST output flavours.

Peter

P.S. I'll be away with intermittent email access for the rest of the week.