[Biopython] Pulling Alignment From PSI-Blast Output

Brett Bowman bnbowman at gmail.com
Tue Feb 8 17:48:37 UTC 2011


Sadly no - I tried lining up the output sequence alignments, but the result
is meaningless because they are all just aligned pair-wise to the query.
I'm wondering if maybe I just need to go back and use blastpgp somehow?  I
know they cut out a lot of features when making the standalone psiblast
executable, though why they would remove things from the output makes no
sense to me...
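
For concreteness, lining the pairwise alignments up on the query amounts to
something like the sketch below. It is only an illustration: the function name
stack_on_query, the tuple layout and the toy sequences are all made up here,
and it assumes each HSP supplies a 1-based query start plus the two gapped
strings. Hits are placed in query columns only, so residues inserted relative
to the query are dropped and the hits are never aligned to each other, which
is why the result falls short of a real multiple alignment.

```python
def stack_on_query(query_len, hsps):
    """Project pairwise HSPs onto query coordinates.

    hsps: iterable of (name, query_start, aligned_query, aligned_subject),
    where query_start is 1-based, as in BLAST output.
    Returns a list of (name, row); every row has length query_len.
    """
    rows = []
    for name, q_start, q_aln, s_aln in hsps:
        row = ["-"] * query_len
        pos = q_start - 1  # 0-based index into the query
        for q_char, s_char in zip(q_aln, s_aln):
            if q_char == "-":
                # Insertion relative to the query: there is no query
                # column to hold it, so the residue is dropped.
                continue
            row[pos] = s_char
            pos += 1
        rows.append((name, "".join(row)))
    return rows


# Toy data, not real PSI-BLAST output.
query = "MKTAYIAKQR"
hsps = [
    ("hit1", 1, "MKTAYIAKQR", "MKSAYLAKQR"),
    ("hit2", 3, "TAY-IAKQ", "TGYPIA-Q"),
]
rows = stack_on_query(len(query), hsps)
for name, row in [("query", query)] + rows:
    print(f"{name:6s} {row}")
```

In the toy data the P that hit2 has between query positions 5 and 6 simply
disappears from the stacked row.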

So I guess that goes back to my previous question - if parsing the PSIBlast
XML output only gives me a Bio.Blast.Record.Blast object, then where do the
Bio.Blast.Record.PSIBlast objects, which are supposed to have that alignment
built in, come from?
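
Reading the XML directly shows the same picture as the parsed records: one
Iteration element per PSI-BLAST round, each holding nothing but pairwise HSPs.
The sketch below uses only the standard library. The element names are the
ones used in NCBI BLAST XML output; the SAMPLE_XML document is hand-written
and heavily trimmed for illustration (not real output), and the helper name
pairwise_hsps is made up.

```python
import xml.etree.ElementTree as ET

# Hand-made, heavily trimmed stand-in for a BLAST XML file.
SAMPLE_XML = """<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_iter-num>1</Iteration_iter-num>
      <Iteration_hits>
        <Hit>
          <Hit_id>hypothetical|hit1</Hit_id>
          <Hit_hsps>
            <Hsp>
              <Hsp_query-from>3</Hsp_query-from>
              <Hsp_qseq>TAY-IAKQ</Hsp_qseq>
              <Hsp_hseq>TGYPIA-Q</Hsp_hseq>
            </Hsp>
          </Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""


def pairwise_hsps(xml_text):
    """Yield (round, hit_id, query_start, query_aln, subject_aln) tuples."""
    root = ET.fromstring(xml_text)
    for iteration in root.iter("Iteration"):
        round_number = int(iteration.findtext("Iteration_iter-num"))
        for hit in iteration.iter("Hit"):
            hit_id = hit.findtext("Hit_id")
            for hsp in hit.iter("Hsp"):
                yield (
                    round_number,
                    hit_id,
                    int(hsp.findtext("Hsp_query-from")),
                    hsp.findtext("Hsp_qseq"),
                    hsp.findtext("Hsp_hseq"),
                )


records = list(pairwise_hsps(SAMPLE_XML))
for record in records:
    print(record)
```

Each tuple is a query-versus-hit pair; nothing in these elements relates one
hit to another.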

-Brett

On Tue, Feb 8, 2011 at 4:05 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:

> I am surprised that the multiple alignment is not in the XML at all. Can it
> not be constructed from the information in the XML? Anyway, if it is in
> there, I would suggest using Bio.Entrez to parse the XML instead of the
> parser in Bio.Blast. The Bio.Entrez parser will give you all the information
> in the XML; the parser in Bio.Blast is more polished but may not give you
> all the information present in the PSI-Blast output.
>
> --Michiel.
>
>
> --- On Tue, 2/8/11, Brett Bowman <bnbowman at gmail.com> wrote:
>
>
> From: Brett Bowman <bnbowman at gmail.com>
> Subject: Re: [Biopython] Pulling Alignment From PSI-Blast Output
> To: "Michiel de Hoon" <mjldehoon at yahoo.com>
> Cc: biopython at biopython.org
> Date: Tuesday, February 8, 2011, 2:40 AM
>
>
> I thought about that, but there doesn't appear to be any multiple-alignment
> data in the XML file - just pair-wise alignments of the query with each hit.
> In addition, when I parse the output file with NCBIXML I get a
> Bio.Blast.Record.Blast object, instead of a Bio.Blast.Record.PSIBlast
> object.  The Biopython cookbook describes how to work with a PSIBlast
> object, but it doesn't really cover how to make one...
>
> On Mon, Feb 7, 2011 at 5:20 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>
> One option you could try is to let PSI-Blast generate its output in XML and
> check if the information you need is present in the XML. If it is, you can
> parse the XML with the read() function in Bio.Entrez. You may find that
> Bio.Entrez needs an additional DTD file to be able to parse the PSI-Blast
> XML output (Bio.Entrez will tell you which one and where to store it). If
> so, please let us know, so we can include the required DTDs in the next
> release of Biopython.
>
> --Michiel.
>
> --- On Mon, 2/7/11, Brett Bowman <bnbowman at gmail.com> wrote:
>
> > From: Brett Bowman <bnbowman at gmail.com>
> > Subject: [Biopython] Pulling Alignment From PSI-Blast Output
> > To: biopython at biopython.org
> > Date: Monday, February 7, 2011, 5:30 PM
> >
> > I'm trying to use the PSI-Blast results from a series of proteins to detect
> > distant homologues, using HMMs of various sorts.  Currently I'm pulling down
> > the sequence IDs with PSI-Blast, downloading the full sequences from NCBI,
> > then aligning everything with ClustalW or Muscle.  However this is eating up
> > way more processor time than I have to spare, so I want to just pull the
> > full multi-sequence alignment from the PSI-blast results if possible (OUTFMT
> > option #3 or 4), for use in building the HMMs.  But it doesn't look like
> > AlignIO has a module for reading the peculiar format that PSI-Blast
> > generates...
> >
> > Has this been done before, or will I need to write my own parser?
> >
> > Brett Bowman
> > Woelk Lab
> > UCSD School of Medicine
> > _______________________________________________
> > Biopython mailing list  -  Biopython at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/biopython