[Biopython] Pulling Alignment From PSI-Blast Output

Tue Feb 8 20:28:45 UTC 2011

>
> It might be there implicitly in HSPs (high-scoring segment pairs).  Trying
> to make a multiple sequence alignment out of these is a huge pain.
>

It isn't, I checked - the alignments are only pairwise, so combining them
produces non-sense.  Thats why I have been using Muscle and ClustalW to
create an alignment after the Blast, despite the large computational cost.

> Brett, might you not run all of HHmake (which includes running PSI-BLAST)?
> It extracts the multiple sequence alignment.
>

HHmake does not include PSI-Blast - all HHmake does is convert a multiple
sequence alignment to an HMM profile.  The original paper that inspired this
project used PSI-Blast and then used HHmake, but make no mention of
alignments.  I have sent the authors an e-mail for clarification, but have
not received a reply.  But from the way the paper was written, it does not
appear that they used an alignment program, hence my confusion at my
inability to do the same.

-Brett

On Tue, Feb 8, 2011 at 9:59 AM, Ruchira Datta <ruchira.datta at gmail.com>wrote:

>
>
> On Tue, Feb 8, 2011 at 4:05 AM, Michiel de Hoon <mjldehoon at yahoo.com>wrote:
>
>> I am surprised that the multiple alignment is not in the XML at all. It
>> can not be constructed from the information in the XML?
>
>
> It might be there implicitly in HSPs (high-scoring segment pairs).  Trying
> to make a multiple sequence alignment out of these is a huge pain.
>
> Brett, might you not run all of HHmake (which includes running PSI-BLAST)?
> It extracts the multiple sequence alignment.  (It might "clean up" after
> itself by deleting it -- you might need to go in and take that line out.)
> Then, if you wanted to do more stuff to the MSA before running HHmake, you
> could just discard the rest of the results of the first run and run HHmake
> again at the point that you want.
>
> --Ruchira
>
> Anyway, if it is in there, I would suggest to use Bio.Entrez to parse the
>> XML instead of the parser in Bio.Blast. The Bio.Entrez parser will give you
>> all the information in the XML; the parser in Bio.Blast is more polished but
>> may not give you all the information present in the PSI-Blast output.
>>
>> --Michiel.
>>
>> --- On Tue, 2/8/11, Brett Bowman <bnbowman at gmail.com> wrote:
>>
>> From: Brett Bowman <bnbowman at gmail.com>
>> Subject: Re: [Biopython] Pulling Alignment From PSI-Blast Output
>> To: "Michiel de Hoon" <mjldehoon at yahoo.com>
>> Cc: biopython at biopython.org
>> Date: Tuesday, February 8, 2011, 2:40 AM
>>
>> I thought about that, but there doesn't appear to be any
>> multiple-alignment data in the XML file - just pair-wise alignments of the
>> query with each hit.  In addition, when I parse the output file with NCBIXML
>> I get a Bio.Blast.Record.Blast object, instead of a
>> Bio.Blast.Record.PSIBlast object.  The Biopython cookbook describes how to
>> work with a PSIBlast object, but it doesn't really cover how to make one...
>>
>>
>>
>> On Mon, Feb 7, 2011 at 5:20 PM, Michiel de Hoon <mjldehoon at yahoo.com>
>> wrote:
>>
>>
>> One option you could try is to let PSI-Blast generate its output in XML
>> and check if the information you need is present in the XML. If it is, you
>> can parse the XML with the read() function in Bio.Entrez. You may find that
>> Bio.Entrez needs an additional DTD file to be able to parse the PSI-Blast
>> XML output (Bio.Entrez will tell you which one and where to store it). If
>> so, please let us know, so we can include the required DTDs in the next
>> release of Biopython.
>>
>>
>>
>>
>>
>> --Michiel.
>>
>>
>>
>> --- On Mon, 2/7/11, Brett Bowman <bnbowman at gmail.com> wrote:
>>
>>
>>
>> > From: Brett Bowman <bnbowman at gmail.com>
>>
>> > Subject: [Biopython] Pulling Alignment From PSI-Blast Output
>>
>> > To: biopython at biopython.org
>>
>> > Date: Monday, February 7, 2011, 5:30 PM
>>
>> > I'm trying to use the PSI-Blast
>>
>> > results from a series of proteins to detect
>>
>> > distant homologues, using HMMs of various sorts.
>>
>> > Currently I'm pulling down
>>
>> > the sequence IDs with PSI-Blast, downloading the full
>>
>> > sequences from NCBI,
>>
>> > then aligning everything with ClustalW or Muscle.
>>
>> > However this is eating up
>>
>> > way more processor time than I have to spare, so I want to
>>
>> > just pull the
>>
>> > full multi-sequence alignment from the PSI-blast results if
>>
>> > possible (OUTFMT
>>
>> > option #3 or 4), for use in building the HMMs.  But it
>>
>> > doesn't look like
>>
>> > AlignIO has a module for reading the peculiar format that
>>
>> > PSI-Blast
>>
>> > generates...
>>
>> >
>>
>> > Has this been done before, or will I need to write my own
>>
>> > parser?
>>
>> >
>>
>> > Brett Bowman
>>
>> > Woelk Lab
>>
>> > UCSD School of Medicine
>>
>> > _______________________________________________
>>
>> > Biopython mailing list  -  Biopython at lists.open-bio.org
>>
>> > http://lists.open-bio.org/mailman/listinfo/biopython
>>
>> >
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> _______________________________________________
>> Biopython mailing list  -  Biopython at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biopython
>>
>
>