[Bioperl-l] Experiences from a newbie

Brian Osborne brian_osborne at cognia.com
Sat Dec 13 00:24:45 EST 2003


Andreas,

>When I told blast, not to omit the DETAIL list, everything worked fine.

Interesting. I'd guess that when you just ask BLAST for hits the parsing
would simply not work correctly. But for an accurate understanding of the
internals we'll need to hear from someone like Jason!

Brian O.

-----Original Message-----
From: bioperl-l-bounces at portal.open-bio.org
[mailto:bioperl-l-bounces at portal.open-bio.org]On Behalf Of Andreas Bernauer
Sent: Friday, December 12, 2003 2:39 PM
To: bioperl-l at bioperl.org
Subject: Re: [Bioperl-l] Experiences from a newbie

Brian Osborne wrote:
> Andreas,
>
> >As I din't want to look at the actual sequences and only needed the
> >hits themselves, I told blast to omit the detailed listing of the
> >alignments.  This turned out to be a mistake, as for some reason that
> >I can never think of, I can only get HITS in $result, when I include
> >the alignments (which are accessed via HSP which I never use).  Again
> >something that I cannot access with my logic.
>
> I couldn't understand this. Could you restate this? or tell us exactly
what
> information you wanted to get from the Hit object?

0Sure:

Let's have a look at this excerpt from a BLAST search result:


BLASTP 2.2.1 [Apr-13-2001]

## header omitted

## This is the HIT LIST:
                                                                   Score
E
Sequences producing significant alignments:                        (bits)
Value

gi|33236978|ref|AAP99047.1| DNA polymerase III beta subunit [ Pr...    63
2e-10
gi|33860561|ref|NP_892122.1| DNA polymerase III, beta chain [Pro...    63
2e-10
## ... etc. etc.   ...
gi|33237342|ref|AAP99410.1| Septum formation inhibitor [ Prochlo...    31
0.60


## this is the DETAIL LIST:

>gi|33236978|ref|AAP99047.1| DNA polymerase III beta subunit [
           Prochlorococcus marinus subsp. marinus str. CCMP1375 ]
          Length = 385

 Score = 62.8 bits (151), Expect = 2e-10
 Identities = 71/348 (20%), Positives = 153/348 (43%), Gaps = 45/348 (12%)

Query: 38  VKCNLNKNIDILEQGSLIVKGKIFNDLINGIKEE---IITIQEKDQTLLVKTKKTSINLN 94
           ++ +L+ +I+    G++ V  K+F ++I+ +  E    ++  +  + + +K+K  +  +
Sbjct: 55  IQTSLSASIE--SSGAITVPSKLFGEIISKLSSESSITLSTDDSSEQVNLKSKSGNYQVR 112

## etc. etc.

Query: 315 NISFNPSSLLDHIESFESNEINFDFQGNSKYFLITSKSEPELKQILVP 362
            I+FN   LL+ ++  E+N I   F   +   + T   E     +++P
Sbjct: 333 QIAFNSRYLLEGLKIIETNTILLKFNAPTTPAIFTPNDETNFVYLVMP 380


>gi|33860561|ref|NP_892122.1| DNA polymerase III, beta chain
           [Prochlorococcus marinus subsp. pastoris str. CCMP1378]
          Length = 385

 Score = 62.8 bits (151), Expect = 2e-10
 Identities = 74/323 (22%), Positives = 150/323 (45%), Gaps = 38/323 (11%)

Query: 36  FSVKCNLNKNID--ILEQGSLIVKGKIFNDLINGIKEEI---ITIQEKDQTLLVKTKKTS 90
           F +   +  + D  + + G++ +  K+ ++++N +  E    + + E    +L+K+ + S
Sbjct: 49  FDLNLGIQTSFDATVNKSGAITIPSKLLSEIVNKLPSETPVSLDVDESSDNILIKSDRGS 108

## etc. etc.


I only need the gi number of the protein that was hit, the score and
the evalue.  I don't need the alignment or anything else from the
DETAIL list.  So I told blast to omit the detail list.  (Idea: I am
searching through several whole genomes, so this would also save me a
significant amount of drive space.)  But when I read the blast file
with the hit list omitted, bioperl didn't report any hits at all,
although the file still contained the hit list.

I found this counterintuitive, as I expected the HIT object to give me
the members of the HIT list, and the HSP object to give me more
information about the alignment, i.e. informations about the members
of the DETAIL list.

When I told blast, no to omit the DETAIL list, everything worked fine.

As I've written previously, I don't use the HSP object at all (get_gi
extracts the gi number, $blastId refers to the blast $file):

use Bio::SearchIO;
# ...omitted for breviety...
my $report = new Bio::SearchIO (-format => 'blast',
                          -file   => $file);

while (my $result = $report->next_result ) {
  print "Next ($blastId) result (" . $result->num_hits . " hits)\n";
  my $hitNo = 0;
  while (my $hit = $result->next_hit) {
    $hitNo++;
    printf "%s\t%s\t%s\t%d\t%d\n",
      get_gi($result->query_name), get_gi($hit->name),
          $hit->significance, $blastId, $hitNo;
  }
}




Andreas.




More information about the Bioperl-l mailing list