[Bioperl-l] Parsing strand and frame for blastx results using SearchIO

michael watson (IAH-C) michael.watson at bbsrc.ac.uk
Wed Jul 23 15:49:21 UTC 2008


Hello,

I am currently using bioperl-1.5.2_102 and BLASTX 2.2.17 [Aug-26-2007],
but the same behaviour also occurs with 1.5.1 and 1.4.0.

I have a results file and a script, both pasted below. I am just
wondering why SearchIO reports the strand as 0 and the frame as 2 when
the BLASTX report itself says "Frame = -3"?

Result:
BLASTX 2.2.17 [Aug-26-2007]


Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, 
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), 
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs",  Nucleic Acids Res. 25:3389-3402.

Query= lcl|chr2 42223489  - 44537930
         (2,314,442 letters)

Database: ../chemokine_receptors_aa.txt 
           15 sequences; 5543 total letters

Searching..................................................done



                                                                 Score    E
Sequences producing significant alignments:                      (bits) Value

chCCRc                                                             727   0.0

>chCCRc 
          Length = 377

 Score =  727 bits (1877), Expect = 0.0
 Identities = 362/377 (96%), Positives = 362/377 (96%)
 Frame = -3

Query: 496563 MEQRKKLTGRHTRALYLWFFPSQESKMNPTDLFLSTTEYDYGYDENTAPCNEGNSFPRFK 496384
              MEQRKKLTGRHTRALYLWFFPSQESKMNPTDLFLSTTEYDYGYDENTAPCNEGNSFPRFK
Sbjct: 1      MEQRKKLTGRHTRALYLWFFPSQESKMNPTDLFLSTTEYDYGYDENTAPCNEGNSFPRFK 60

Query: 496383 SLFLPILYCLVFVFCLLGNSLVLWILLTRKRLMTMTDICLLNLAASDLLFIVPLPFQAYY 496204
              SLFLPILYCLVFVFCLLGNSLVLWILLTRKRLMTMTDICLLNLAASDLLFIVPLPFQAYY
Sbjct: 61     SLFLPILYCLVFVFCLLGNSLVLWILLTRKRLMTMTDICLLNLAASDLLFIVPLPFQAYY 120

Query: 496203 ASDQWVFGNALCKIMGGIYYTGFYSSIFFITLMSIDRYIAIVHAVYAMKIRTASCGTMIS 496024
              ASDQWVFGNALCKIMGGIYYTGFYSSIFFITLMSIDRYIAIVHAVYAMKIRTASCGTMIS
Sbjct: 121    ASDQWVFGNALCKIMGGIYYTGFYSSIFFITLMSIDRYIAIVHAVYAMKIRTASCGTMIS 180

Query: 496023 LVLWLVAGLASVPNIVFNQQLEIEQSVQCVPVYPPGNNIWKVTTQFAANILGLLIPFSIL 495844
              LVLWLVAGLASVPNIVFNQQLEIEQSVQCVPVYPPGNNIWKVTTQFAANILGLLIPFSIL
Sbjct: 181    LVLWLVAGLASVPNIVFNQQLEIEQSVQCVPVYPPGNNIWKVTTQFAANILGLLIPFSIL 240

Query: 495843 IHCYAQILRNLRKCKNQNXXXXXXXXXXXXXXXFLFWTPFNVVLFLDSLQSLLIIDNCQA 495664
              IHCYAQILRNLRKCKNQN               FLFWTPFNVVLFLDSLQSLLIIDNCQA
Sbjct: 241    IHCYAQILRNLRKCKNQNKIKAIKMIFIIVIVFFLFWTPFNVVLFLDSLQSLLIIDNCQA 300

Query: 495663 SSQITLALQLTETISFIHCCLNPVIYAFAGVTFKAHLKRLLQPCARILWSPTRGSGVTQS 495484
              SSQITLALQLTETISFIHCCLNPVIYAFAGVTFKAHLKRLLQPCARILWSPTRGSGVTQS
Sbjct: 301    SSQITLALQLTETISFIHCCLNPVIYAFAGVTFKAHLKRLLQPCARILWSPTRGSGVTQS 360

Query: 495483 SLVLSQISGCSDSAGVL 495433
              SLVLSQISGCSDSAGVL
Sbjct: 361    SLVLSQISGCSDSAGVL 377


  Database: ../chemokine_receptors_aa.txt
    Posted date:  Jul 23, 2008  2:43 PM
  Number of letters in database: 5543
  Number of sequences in database:  15
  
Lambda     K      H
   0.318    0.134    0.401 

Gapped
Lambda     K      H
   0.267   0.0410    0.140 


Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Number of Sequences: 15
Number of Hits to DB: 28,029,184
Number of extensions: 721812
Number of successful extensions: 2651
Number of sequences better than 1.0e-05: 15
Number of HSP's gapped: 2358
Number of HSP's successfully gapped: 148
Length of query: 771480
Length of database: 5543
Length adjustment: 102
Effective length of query: 771378
Effective length of database: 4013
Effective search space: 3095539914
Effective search space used: 3095539914
Neighboring words threshold: 12
Window for multiple hits: 40
X1: 16 ( 7.3 bits)
X2: 38 (14.6 bits)
X3: 64 (24.7 bits)
S1: 41 (21.7 bits)
S2: 33 (17.3 bits)


Script:
#!/usr/bin/perl

use strict;
use warnings;

use lib "/usr/local/bioperl-1.5.2_102";
use Bio::SearchIO;

my $searchio = Bio::SearchIO->new(-format => 'blast', -file => "test.blast");

# Assign in scalar context. "my ($result) = ..." is a list assignment, whose
# value in a while condition is the number of elements returned, so a
# returned undef still keeps the loop running after the last result.
while (my $result = $searchio->next_result()) {
	while (my $hit = $result->next_hit) {
		while (my $hsp = $hit->next_hsp) {
			# one line per HSP: hit name, hit strand, frame
			print $hit->name, "\t";
			print $hsp->strand('hit'), "\t";
			print $hsp->frame, "\n";
		}
	}
}

Output:
-bash-3.00$ perl parse_test_blast.pl
chCCRc  0       2
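
A possible reading of the output above, offered as an assumption rather
than something confirmed against the BioPerl source: in a BLASTX search
the hit is a protein, so a hit strand of 0 would be expected, and the
direction would then sit on the query side. BioPerl's HSP documentation
describes frame as a GFF-style value (0, 1 or 2), so BLAST's signed
"Frame = -3" may be getting split into a strand of -1 and a frame of 2.
The helper below is hypothetical (it is not a BioPerl function) and only
sketches that conversion in plain Perl:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, NOT part of BioPerl: split a BLAST-style signed
# frame (-3..-1 or 1..3) into a strand (+1/-1) and a GFF-style frame (0..2).
# This assumes the GFF-style frame is simply abs(BLAST frame) - 1.
sub blast_frame_to_strand_and_gff_frame {
    my ($blast_frame) = @_;
    die "expected a frame in -3..-1 or 1..3, got '$blast_frame'\n"
        unless $blast_frame =~ /^([+-]?)([123])$/;
    my $strand    = $1 eq '-' ? -1 : 1;   # the sign gives the strand
    my $gff_frame = $2 - 1;               # 1..3 becomes 0..2
    return ($strand, $gff_frame);
}

# The HSP in the report above has "Frame = -3".
my ($strand, $frame) = blast_frame_to_strand_and_gff_frame(-3);
print "strand=$strand\tframe=$frame\n";
```

If that reading is right, printing $hsp->strand('query') instead of
$hsp->strand('hit') in the script should give -1 for this HSP.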


Head of Informatics
Institute for Animal Health
Compton
Berks
RG20 7NN
01635 578411 

http://www.iah.ac.uk/research/bioinformatics/bioinf.shtml
