[Bioperl-l] Experiences from a newbie

Andreas Bernauer andreas.bernauer at gmx.de
Fri Dec 12 14:39:19 EST 2003


Brian Osborne wrote:
> Andreas,
> 
> >As I din't want to look at the actual sequences and only needed the
> >hits themselves, I told blast to omit the detailed listing of the
> >alignments.  This turned out to be a mistake, as for some reason that
> >I can never think of, I can only get HITS in $result, when I include
> >the alignments (which are accessed via HSP which I never use).  Again
> >something that I cannot access with my logic.
> 
> I couldn't understand this. Could you restate this? or tell us exactly what
> information you wanted to get from the Hit object?

0Sure:

Let's have a look at this excerpt from a BLAST search result:


BLASTP 2.2.1 [Apr-13-2001]

## header omitted

## This is the HIT LIST:
                                                                   Score     E
Sequences producing significant alignments:                        (bits)  Value

gi|33236978|ref|AAP99047.1| DNA polymerase III beta subunit [ Pr...    63  2e-10
gi|33860561|ref|NP_892122.1| DNA polymerase III, beta chain [Pro...    63  2e-10
## ... etc. etc.   ... 
gi|33237342|ref|AAP99410.1| Septum formation inhibitor [ Prochlo...    31  0.60


## this is the DETAIL LIST:

>gi|33236978|ref|AAP99047.1| DNA polymerase III beta subunit [
           Prochlorococcus marinus subsp. marinus str. CCMP1375 ]
          Length = 385

 Score = 62.8 bits (151), Expect = 2e-10
 Identities = 71/348 (20%), Positives = 153/348 (43%), Gaps = 45/348 (12%)

Query: 38  VKCNLNKNIDILEQGSLIVKGKIFNDLINGIKEE---IITIQEKDQTLLVKTKKTSINLN 94
           ++ +L+ +I+    G++ V  K+F ++I+ +  E    ++  +  + + +K+K  +  + 
Sbjct: 55  IQTSLSASIE--SSGAITVPSKLFGEIISKLSSESSITLSTDDSSEQVNLKSKSGNYQVR 112

## etc. etc. 

Query: 315 NISFNPSSLLDHIESFESNEINFDFQGNSKYFLITSKSEPELKQILVP 362
            I+FN   LL+ ++  E+N I   F   +   + T   E     +++P
Sbjct: 333 QIAFNSRYLLEGLKIIETNTILLKFNAPTTPAIFTPNDETNFVYLVMP 380


>gi|33860561|ref|NP_892122.1| DNA polymerase III, beta chain
           [Prochlorococcus marinus subsp. pastoris str. CCMP1378]
          Length = 385

 Score = 62.8 bits (151), Expect = 2e-10
 Identities = 74/323 (22%), Positives = 150/323 (45%), Gaps = 38/323 (11%)

Query: 36  FSVKCNLNKNID--ILEQGSLIVKGKIFNDLINGIKEEI---ITIQEKDQTLLVKTKKTS 90
           F +   +  + D  + + G++ +  K+ ++++N +  E    + + E    +L+K+ + S
Sbjct: 49  FDLNLGIQTSFDATVNKSGAITIPSKLLSEIVNKLPSETPVSLDVDESSDNILIKSDRGS 108

## etc. etc.


I only need the gi number of the protein that was hit, the score and
the evalue.  I don't need the alignment or anything else from the
DETAIL list.  So I told blast to omit the detail list.  (Idea: I am
searching through several whole genomes, so this would also save me a
significant amount of drive space.)  But when I read the blast file
with the hit list omitted, bioperl didn't report any hits at all,
although the file still contained the hit list.

I found this counterintuitive, as I expected the HIT object to give me
the members of the HIT list, and the HSP object to give me more
information about the alignment, i.e. informations about the members
of the DETAIL list.

When I told blast, no to omit the DETAIL list, everything worked fine.

As I've written previously, I don't use the HSP object at all (get_gi
extracts the gi number, $blastId refers to the blast $file):

use Bio::SearchIO;
# ...omitted for breviety...
my $report = new Bio::SearchIO (-format => 'blast',
      			  -file   => $file);

while (my $result = $report->next_result ) {
  print "Next ($blastId) result (" . $result->num_hits . " hits)\n";
  my $hitNo = 0;
  while (my $hit = $result->next_hit) {
    $hitNo++;
    printf "%s\t%s\t%s\t%d\t%d\n",
      get_gi($result->query_name), get_gi($hit->name), 
          $hit->significance, $blastId, $hitNo;
  }
}




Andreas.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 232 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20031212/42bd40b8/attachment-0001.bin


More information about the Bioperl-l mailing list