[Biopython] accessing superfamilies (putative conserved domains) via biopython

Peter Cock p.j.a.cock at googlemail.com
Tue Nov 12 15:12:50 UTC 2013


On Tue, Nov 12, 2013 at 2:55 PM, Anna Kostikova
<anna.kostikova at gmail.com> wrote:
> Hello everyone,
>
> Is there any way of getting putative conserved domain information
> (such as superfamilies, specific hits, multidomains) with biopython?
> When running (e.g.) BLASTX on NCBI this information typically appears
> in a Conserved Domain section above Distribution of Blast Hits. Is
> there a way to extract or access it via biopython?
>
> I also found the Web CD-search tool, but this one only takes protein
> sequences as an input and doesn't seem to have a biopython API.
>
> Is there any solution to search for/map CDs automatically (if not via NCBI)?
>
> Thanks,
> Anna

I think you are looking for the rpsblast tool, usually used with the NCBI
Conserved Domain Database (CDD) or one of the sub-databases
such as Pfam (which you can also search with HMMER). rpsblast is part of
both the standalone legacy BLAST and the BLAST+ applications from the NCBI.
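A minimal sketch of running rpsblast from Python using only the standard library. It assumes BLAST+ is installed with `rpsblast` on the PATH and that a preformatted CDD database has already been downloaded. The helper names `build_rpsblast_cmd` and `run_rpsblast`, the database path, and the default E-value cut-off are illustrative choices, not part of Biopython or BLAST+. `-outfmt 5` asks for BLAST XML output.

```python
import shutil
import subprocess


def build_rpsblast_cmd(query_fasta, db, out_xml, evalue=0.01):
    """Return the argument list for a BLAST+ rpsblast search with XML output.

    Hypothetical helper; the option names are standard BLAST+ flags.
    """
    return [
        "rpsblast",
        "-query", query_fasta,   # protein FASTA file to search
        "-db", db,               # name/path of a formatted CDD (or sub-database) DB
        "-evalue", str(evalue),  # only report hits at or below this E-value
        "-outfmt", "5",          # 5 = BLAST XML, which Biopython can parse
        "-out", out_xml,         # where to write the XML report
    ]


def run_rpsblast(query_fasta, db, out_xml, evalue=0.01):
    """Run rpsblast and return the XML path (hypothetical helper)."""
    if shutil.which("rpsblast") is None:
        raise FileNotFoundError("rpsblast (NCBI BLAST+) not found on PATH")
    cmd = build_rpsblast_cmd(query_fasta, db, out_xml, evalue)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return out_xml
```

For example, `run_rpsblast("proteins.fasta", "/data/cdd/Cdd", "proteins_vs_cdd.xml")`, where both file paths are placeholders for your own query file and local database.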

Biopython should happily parse the XML output from rpsblast.
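Once the XML file exists, one way to read it is Biopython's `Bio.Blast.NCBIXML` parser. The sketch below is not an official recipe: `summarize_domain_hits` and `parse_rpsblast_xml` are hypothetical helpers that keep each domain hit whose best HSP E-value passes an assumed threshold. The domain description is taken from each alignment's `hit_def`.

```python
def summarize_domain_hits(records, max_evalue=0.01):
    """Return (query, domain description, best E-value) tuples.

    `records` is any iterable of parsed BLAST records, e.g. from
    Bio.Blast.NCBIXML.parse(). Hypothetical helper for illustration.
    """
    rows = []
    for record in records:
        for alignment in record.alignments:
            if not alignment.hsps:
                continue  # nothing to score for this hit
            best = min(hsp.expect for hsp in alignment.hsps)
            if best <= max_evalue:
                rows.append((record.query, alignment.hit_def, best))
    return rows


def parse_rpsblast_xml(xml_path, max_evalue=0.01):
    """Parse an rpsblast XML report (-outfmt 5) with Biopython."""
    from Bio.Blast import NCBIXML  # Biopython is only needed to read a real file

    with open(xml_path) as handle:
        # NCBIXML.parse yields one record per query; consume it inside the
        # with-block so the file is still open while iterating.
        return summarize_domain_hits(NCBIXML.parse(handle), max_evalue)
```

Keeping the filtering in a separate function means it can be reused with records obtained some other way, for example from an online BLAST search.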

Peter
