[Biopython] Pulling Alignment From PSI-Blast Output

Ruchira Datta ruchira.datta at gmail.com
Tue Feb 8 20:54:34 UTC 2011


Hi Brett,

I have run HHmake numerous times myself and the overall script does include
running PSI-BLAST.  I suspect they didn't include it in the paper because
it's not part of their algorithm which they invented.

>From the hh_1.5.1 version, from the script buildali.pl, here are some
relevant lines:

my $blastpgp=$ncbidir."/blastpgp -I T -s T"; # blastpgp executable

our $blastpgp = $blastpgp . " -I T -s T"; # show gi's in defline; use
Smith-Waterman

    # Fast psiblast search for very similar sequences
    &System("$blastpgp -I T -b 10 -v 10 -e 1e-6 -A 10 -f 15 -d $dbfile -i
$tmp.seq &> $tmp.bla");

  if ($nseqin<=1) {
      &blastpgp("-e $Eult -d $db -i $seqfile","$blafile");
  } else {
      if ($core==1) {
    &blastpgp("-e $Eult -d $db -i $seqfile","$blafile");
    &System("perl $perl/alignhits.pl -cov $covcore -e $E2 $qid $bcore
$pmaxopt -best -psi -q $seqfile $blafile $coreali",$v2);
    if ($bcore ne "") {$bcore.=" -B $coreali";} # From here on use $coreali
for end pruning of $coreali
      }
      &blastpgp("-e $Eult -d $db -i $seqfile -B $psifile","$blafile");
  }

It's possible you could *only* use their script alignhits.pl:

#! /usr/bin/perl -w
# Extract a multiple alignment of hits from Blast or PsiBlast output
# Usage:   alignhits.pl [options] blast.out alignment-file

Very sorry for subjecting you to Perl, but it does seem like it might be
helpful in your case. :-)

--Ruchira

On Tue, Feb 8, 2011 at 12:28 PM, Brett Bowman <bnbowman at gmail.com> wrote:

>  It might be there implicitly in HSPs (high-scoring segment pairs).
>> Trying to make a multiple sequence alignment out of these is a huge pain.
>>
>
>  It isn't, I checked - the alignments are only pairwise, so combining them
> produces non-sense.  Thats why I have been using Muscle and ClustalW to
> create an alignment after the Blast, despite the large computational cost.
>
>
>> Brett, might you not run all of HHmake (which includes running
>> PSI-BLAST)?  It extracts the multiple sequence alignment.
>>
>
> HHmake does not include PSI-Blast - all HHmake does is convert a multiple
> sequence alignment to an HMM profile.  The original paper that inspired this
> project used PSI-Blast and then used HHmake, but make no mention of
> alignments.  I have sent the authors an e-mail for clarification, but have
> not received a reply.  But from the way the paper was written, it does not
> appear that they used an alignment program, hence my confusion at my
> inability to do the same.
>
> -Brett
>
> On Tue, Feb 8, 2011 at 9:59 AM, Ruchira Datta <ruchira.datta at gmail.com>wrote:
>
>>
>>
>> On Tue, Feb 8, 2011 at 4:05 AM, Michiel de Hoon <mjldehoon at yahoo.com>wrote:
>>
>>> I am surprised that the multiple alignment is not in the XML at all. It
>>> can not be constructed from the information in the XML?
>>
>>
>> It might be there implicitly in HSPs (high-scoring segment pairs).  Trying
>> to make a multiple sequence alignment out of these is a huge pain.
>>
>> Brett, might you not run all of HHmake (which includes running
>> PSI-BLAST)?  It extracts the multiple sequence alignment.  (It might "clean
>> up" after itself by deleting it -- you might need to go in and take that
>> line out.)  Then, if you wanted to do more stuff to the MSA before running
>> HHmake, you could just discard the rest of the results of the first run and
>> run HHmake again at the point that you want.
>>
>> --Ruchira
>>
>> Anyway, if it is in there, I would suggest to use Bio.Entrez to parse the
>>> XML instead of the parser in Bio.Blast. The Bio.Entrez parser will give you
>>> all the information in the XML; the parser in Bio.Blast is more polished but
>>> may not give you all the information present in the PSI-Blast output.
>>>
>>> --Michiel.
>>>
>>> --- On Tue, 2/8/11, Brett Bowman <bnbowman at gmail.com> wrote:
>>>
>>> From: Brett Bowman <bnbowman at gmail.com>
>>> Subject: Re: [Biopython] Pulling Alignment From PSI-Blast Output
>>> To: "Michiel de Hoon" <mjldehoon at yahoo.com>
>>> Cc: biopython at biopython.org
>>> Date: Tuesday, February 8, 2011, 2:40 AM
>>>
>>> I thought about that, but there doesn't appear to be any
>>> multiple-alignment data in the XML file - just pair-wise alignments of the
>>> query with each hit.  In addition, when I parse the output file with NCBIXML
>>> I get a Bio.Blast.Record.Blast object, instead of a
>>> Bio.Blast.Record.PSIBlast object.  The Biopython cookbook describes how to
>>> work with a PSIBlast object, but it doesn't really cover how to make one...
>>>
>>>
>>>
>>> On Mon, Feb 7, 2011 at 5:20 PM, Michiel de Hoon <mjldehoon at yahoo.com>
>>> wrote:
>>>
>>>
>>> One option you could try is to let PSI-Blast generate its output in XML
>>> and check if the information you need is present in the XML. If it is, you
>>> can parse the XML with the read() function in Bio.Entrez. You may find that
>>> Bio.Entrez needs an additional DTD file to be able to parse the PSI-Blast
>>> XML output (Bio.Entrez will tell you which one and where to store it). If
>>> so, please let us know, so we can include the required DTDs in the next
>>> release of Biopython.
>>>
>>>
>>>
>>>
>>>
>>> --Michiel.
>>>
>>>
>>>
>>> --- On Mon, 2/7/11, Brett Bowman <bnbowman at gmail.com> wrote:
>>>
>>>
>>>
>>> > From: Brett Bowman <bnbowman at gmail.com>
>>>
>>> > Subject: [Biopython] Pulling Alignment From PSI-Blast Output
>>>
>>> > To: biopython at biopython.org
>>>
>>> > Date: Monday, February 7, 2011, 5:30 PM
>>>
>>> > I'm trying to use the PSI-Blast
>>>
>>> > results from a series of proteins to detect
>>>
>>> > distant homologues, using HMMs of various sorts.
>>>
>>> > Currently I'm pulling down
>>>
>>> > the sequence IDs with PSI-Blast, downloading the full
>>>
>>> > sequences from NCBI,
>>>
>>> > then aligning everything with ClustalW or Muscle.
>>>
>>> > However this is eating up
>>>
>>> > way more processor time than I have to spare, so I want to
>>>
>>> > just pull the
>>>
>>> > full multi-sequence alignment from the PSI-blast results if
>>>
>>> > possible (OUTFMT
>>>
>>> > option #3 or 4), for use in building the HMMs.  But it
>>>
>>> > doesn't look like
>>>
>>> > AlignIO has a module for reading the peculiar format that
>>>
>>> > PSI-Blast
>>>
>>> > generates...
>>>
>>> >
>>>
>>> > Has this been done before, or will I need to write my own
>>>
>>> > parser?
>>>
>>> >
>>>
>>> > Brett Bowman
>>>
>>> > Woelk Lab
>>>
>>> > UCSD School of Medicine
>>>
>>> > _______________________________________________
>>>
>>> > Biopython mailing list  -  Biopython at lists.open-bio.org
>>>
>>> > http://lists.open-bio.org/mailman/listinfo/biopython
>>>
>>> >
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> Biopython mailing list  -  Biopython at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biopython
>>>
>>
>>
>



More information about the Biopython mailing list