[BioPython] [Biopython-dev] Bio.CDD, anyone?

Thu Jun 19 10:44:00 EDT 2008

Michiel de Hoon wrote:
>> I wonder if the NCBI make any of this available as XML via Entrez?  I
>> had a quick look and couldn't find anything.
>>     
>
> Actually I already asked this question to NCBI. Their answer was that a subset of the information shown on the web page is available as XML via Entrez's ESummary and EFetch (and thus available from Biopython). The full CDD records are stored as one large file, which is obtainable from NCBI's ftp site, but currently it is not possible to get individual CDD records except in HTML form through the NCBI website.
>
> --Michiel.
>
>
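As a rough illustration of the ESummary route Michiel describes above, here is a minimal sketch that builds (and optionally fetches) an ESummary request for a single CDD record using only the Python standard library. The helper names `cdd_esummary_url` and `fetch_cdd_summary` are hypothetical. The sketch also assumes that the `uid` in the cddsrv.cgi link quoted in this thread (29475) is the same identifier Entrez uses for its "cdd" database, which is not confirmed here.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Public NCBI E-utilities base URL (assumption: the current HTTPS endpoint).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def cdd_esummary_url(uid):
    """Build an ESummary request URL for one record in the Entrez "cdd" database.

    Hypothetical helper: `uid` is assumed to be the Entrez UID of the CDD record.
    """
    query = urlencode({"db": "cdd", "id": str(uid)})
    return EUTILS_BASE + "esummary.fcgi?" + query


def fetch_cdd_summary(uid):
    """Download the ESummary XML for a CDD record (requires network access)."""
    with urlopen(cdd_esummary_url(uid)) as handle:
        return handle.read().decode("utf-8")


if __name__ == "__main__":
    # Only prints the URL; call fetch_cdd_summary(29475) to retrieve the XML.
    print(cdd_esummary_url(29475))
```

Within Biopython, `Bio.Entrez.esummary(db="cdd", id="29475")` should reach the same endpoint. Either way, only the summary subset comes back, not the full CDD record.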
> Peter <biopython at maubp.freeserve.co.uk> wrote:
>> Bio.CDD is a module with a parser for CDD (NCBI's Conserved Domain Database)
>> records. The parser parses HTML pages from CDD's web site. Since the parser
>> was written about six years ago, the CDD web site has changed considerably.
>> Bio.CDD therefore cannot parse current HTML pages from CDD.
>
> A couple of years ago, I wanted to get the CDD domain name and
> description and ended up writing my own very simple and crude parser
> to extract just this information.  Doing a proper job would mean
> extracting lots and lots of fields, e.g.
> http://www.ncbi.nlm.nih.gov/Structure/cdd/cddsrv.cgi?uid=29475
>
> I wonder if the NCBI make any of this available as XML via Entrez?  I
> had a quick look and couldn't find anything.
>
> Peter
>
Hi,
Do you know how the test files were created? If there is no easy
answer, that makes the decision easier.

Anyhow, I vote to remove this module as, in addition to the things
previously mentioned, it would be far better to support InterProScan
(http://www.ebi.ac.uk/Tools/InterProScan/) than just a single tool.

Bruce