[Biopython] Pulling Alignment From PSI-Blast Output
Ruchira Datta
ruchira.datta at gmail.com
Tue Feb 8 17:59:28 UTC 2011
On Tue, Feb 8, 2011 at 4:05 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> I am surprised that the multiple alignment is not in the XML at all. It can
> not be constructed from the information in the XML?
It might be there implicitly in HSPs (high-scoring segment pairs). Trying
to make a multiple sequence alignment out of these is a huge pain.
Brett, might you not run all of HHmake (which includes running PSI-BLAST)?
It extracts the multiple sequence alignment. (It might "clean up" after
itself by deleting it -- you might need to go in and take that line out.)
Then, if you wanted to do more stuff to the MSA before running HHmake, you
could just discard the rest of the results of the first run and run HHmake
again at the point that you want.
--Ruchira
Anyway, if it is in there, I would suggest to use Bio.Entrez to parse the
> XML instead of the parser in Bio.Blast. The Bio.Entrez parser will give you
> all the information in the XML; the parser in Bio.Blast is more polished but
> may not give you all the information present in the PSI-Blast output.
>
> --Michiel.
>
> --- On Tue, 2/8/11, Brett Bowman <bnbowman at gmail.com> wrote:
>
> From: Brett Bowman <bnbowman at gmail.com>
> Subject: Re: [Biopython] Pulling Alignment From PSI-Blast Output
> To: "Michiel de Hoon" <mjldehoon at yahoo.com>
> Cc: biopython at biopython.org
> Date: Tuesday, February 8, 2011, 2:40 AM
>
> I thought about that, but there doesn't appear to be any multiple-alignment
> data in the XML file - just pair-wise alignments of the query with each hit.
> In addition, when I parse the output file with NCBIXML I get a
> Bio.Blast.Record.Blast object, instead of a Bio.Blast.Record.PSIBlast
> object. The Biopython cookbook describes how to work with a PSIBlast
> object, but it doesn't really cover how to make one...
>
>
>
> On Mon, Feb 7, 2011 at 5:20 PM, Michiel de Hoon <mjldehoon at yahoo.com>
> wrote:
>
>
> One option you could try is to let PSI-Blast generate its output in XML and
> check if the information you need is present in the XML. If it is, you can
> parse the XML with the read() function in Bio.Entrez. You may find that
> Bio.Entrez needs an additional DTD file to be able to parse the PSI-Blast
> XML output (Bio.Entrez will tell you which one and where to store it). If
> so, please let us know, so we can include the required DTDs in the next
> release of Biopython.
>
>
>
>
>
> --Michiel.
>
>
>
> --- On Mon, 2/7/11, Brett Bowman <bnbowman at gmail.com> wrote:
>
>
>
> > From: Brett Bowman <bnbowman at gmail.com>
>
> > Subject: [Biopython] Pulling Alignment From PSI-Blast Output
>
> > To: biopython at biopython.org
>
> > Date: Monday, February 7, 2011, 5:30 PM
>
> > I'm trying to use the PSI-Blast
>
> > results from a series of proteins to detect
>
> > distant homologues, using HMMs of various sorts.
>
> > Currently I'm pulling down
>
> > the sequence IDs with PSI-Blast, downloading the full
>
> > sequences from NCBI,
>
> > then aligning everything with ClustalW or Muscle.
>
> > However this is eating up
>
> > way more processor time than I have to spare, so I want to
>
> > just pull the
>
> > full multi-sequence alignment from the PSI-blast results if
>
> > possible (OUTFMT
>
> > option #3 or 4), for use in building the HMMs. But it
>
> > doesn't look like
>
> > AlignIO has a module for reading the peculiar format that
>
> > PSI-Blast
>
> > generates...
>
> >
>
> > Has this been done before, or will I need to write my own
>
> > parser?
>
> >
>
> > Brett Bowman
>
> > Woelk Lab
>
> > UCSD School of Medicine
>
> > _______________________________________________
>
> > Biopython mailing list - Biopython at lists.open-bio.org
>
> > http://lists.open-bio.org/mailman/listinfo/biopython
>
> >
>
>
>
>
>
>
>
>
>
>
>
>
>
> _______________________________________________
> Biopython mailing list - Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>
More information about the Biopython
mailing list