[Biopython-dev] Conserved Domains Database Support
Peter Cock
p.j.a.cock at googlemail.com
Tue Jul 3 22:03:29 UTC 2012
On Tue, Jul 3, 2012 at 8:19 PM, Adam Hughes <hughesadam87 at gmail.com> wrote:
> Hi everyone,
>
> I'm new to the Biopython library and was wondering whether there is any support
> for the Conserved Domains Database (CDD) from NCBI, in particular for the
> superfamily batch files that their web tool produces. Searching Google, I found
> there was some interest in this back in 2008; however, that was mainly about
> parsing the HTML output of CDD searches.
HTML scrapers were always a bit of a pain :(
> Now that CDD
> offers a nice, regular downloadable data format, has any Biopython support
> been implemented to work with it?
>
> If not, I'd like to contribute.
>
> The data is in a simple tab-delimited format of domain alignments, e.g.:
>
> Q#10000 0 >WHL22.364604.0 superfamily 212291 7 290 1.01528e-138 401.1 cl09099 P-loop_NTPase superfamily 0
>
> I had envisioned a simple class consisting mainly of getters/setters, with a
> few methods such as sorting by query batch.
>
> ~Adam
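A minimal sketch of the kind of class Adam describes, using only the Python standard library. Everything here is an assumption for illustration: `DomainHit`, `parse_hits` and `group_by_query` are hypothetical names, and the column order (query, hit type, PSSM-ID, from, to, E-value, bit score, accession, short name) is only a guess from the example row above, so check it against a real batch file before relying on it. Instead of explicit getters/setters it uses a dataclass, which gives plain attribute access for free.

```python
import csv
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List


@dataclass
class DomainHit:
    """One domain hit row (hypothetical field names and column order)."""
    query: str
    hit_type: str
    pssm_id: int
    start: int
    end: int
    evalue: float
    bitscore: float
    accession: str
    short_name: str


def parse_hits(lines: Iterable[str]) -> Iterator[DomainHit]:
    """Yield DomainHit objects from tab-delimited lines.

    Assumes the first nine columns follow the dataclass field order;
    any extra columns are ignored.
    """
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        # Skip blank, '#' comment, header or truncated lines.
        if len(row) < 9 or row[0].startswith("#") or not row[2].isdigit():
            continue
        query, hit_type, pssm, start, end, ev, bits, acc, name = row[:9]
        yield DomainHit(query, hit_type, int(pssm), int(start), int(end),
                        float(ev), float(bits), acc, name)


def group_by_query(hits: Iterable[DomainHit]) -> Dict[str, List[DomainHit]]:
    """Group hits by query ID, each group sorted by ascending E-value."""
    groups: Dict[str, List[DomainHit]] = defaultdict(list)
    for hit in hits:
        groups[hit.query].append(hit)
    for group in groups.values():
        group.sort(key=lambda h: h.evalue)
    return dict(groups)


# Synthetic tab-separated input built from values in the example above.
rows = [
    "Q#10000\tsuperfamily\t212291\t7\t290\t1.01528e-138\t401.1\tcl09099\tP-loop_NTPase",
    "",
]
by_query = group_by_query(parse_hits(rows))
print(by_query["Q#10000"][0].accession)  # prints cl09099
```

Since `parse_hits` accepts any iterable of lines, an open file handle can be passed to it directly.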
That is interesting - and offers to work on Biopython are always
nice. Is this a file giving domain definitions (HMM or whatever
CDD uses), or precomputed search results for different query
sequences? Maybe a URL would help - I've not looked at this
resource for quite a while. I used to use the rpsblast tool to
run local (offline) searches against CDD databases, and that
offered several BLAST output flavours.
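For the rpsblast route, the tabular flavour (`-outfmt 6` in BLAST+) is simple enough to handle with the standard library alone. This sketch assumes the default 12 columns (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore); with a custom column list the type table would need to change. `TabularHit`, `parse_outfmt6` and `best_hit_per_query` are hypothetical helpers, not part of Biopython.

```python
from typing import Dict, Iterable, Iterator, NamedTuple


class TabularHit(NamedTuple):
    """One row of BLAST+ tabular output with the default 12 columns."""
    qseqid: str
    sseqid: str
    pident: float
    length: int
    mismatch: int
    gapopen: int
    qstart: int
    qend: int
    sstart: int
    send: int
    evalue: float
    bitscore: float


# Type converters, one per column, in the same order as the fields above.
_CASTS = (str, str, float, int, int, int, int, int, int, int, float, float)


def parse_outfmt6(lines: Iterable[str]) -> Iterator[TabularHit]:
    """Yield one TabularHit per data line, skipping blank and '#' lines."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != len(_CASTS):
            raise ValueError(
                f"expected {len(_CASTS)} columns, got {len(fields)}: {line!r}")
        yield TabularHit(*(cast(value) for cast, value in zip(_CASTS, fields)))


def best_hit_per_query(hits: Iterable[TabularHit]) -> Dict[str, TabularHit]:
    """Keep only the lowest E-value hit for each query sequence."""
    best: Dict[str, TabularHit] = {}
    for hit in hits:
        current = best.get(hit.qseqid)
        if current is None or hit.evalue < current.evalue:
            best[hit.qseqid] = hit
    return best
```

Typical use would be `best_hit_per_query(parse_outfmt6(handle))`, where `handle` is an open rpsblast output file.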
Peter
P.S. I'll be away with intermittent email access for the rest of the week.