[Bioperl-l] Parsing strand and frame for blastx results using SearchIO

Roy Chaudhuri roy.chaudhuri at gmail.com
Wed Jul 23 16:13:41 UTC 2008


There's an explanation in the BioPerl FAQ:
http://www.bioperl.org/wiki/FAQ#How_do_I_get_the_frame_for_a_translated_search.3F

The problem with the strand being reported as 0 is because you are 
asking for the strand of the hit (protein) rather than the query (DNA).

Roy.
--
Dr. Roy Chaudhuri
Department of Veterinary Medicine
University of Cambridge, U.K.

michael watson (IAH-C) wrote:
> Hello
> 
> Am currently using bioperl-1.5.2_102 and BLASTX 2.2.17 [Aug-26-2007],
> but this also holds true in 1.5.1 and 1.4.0.
> 
> I have a results file that is pasted below, and a script that is also
> pasted below - just wondering how searchIO thinks the strand is 0 and
> the frame is 2?
> 
> Result:
> BLASTX 2.2.17 [Aug-26-2007]
> 
> 
> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.
> Schaffer, 
> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), 
> "Gapped BLAST and PSI-BLAST: a new generation of protein database search
> programs",  Nucleic Acids Res. 25:3389-3402.
> 
> Query= lcl|chr2 42223489  - 44537930
>          (2,314,442 letters)
> 
> Database: ../chemokine_receptors_aa.txt 
>            15 sequences; 5543 total letters
> 
> Searching..................................................done
> 
> 
> 
>                                                                  Score
> E
> Sequences producing significant alignments:                      (bits)
> Value
> 
> chCCRc
> 727   0.0  
> 
>> chCCRc 
>           Length = 377
> 
>  Score =  727 bits (1877), Expect = 0.0
>  Identities = 362/377 (96%), Positives = 362/377 (96%)
>  Frame = -3
> 
> Query: 496563
> MEQRKKLTGRHTRALYLWFFPSQESKMNPTDLFLSTTEYDYGYDENTAPCNEGNSFPRFK 496384
>  
> MEQRKKLTGRHTRALYLWFFPSQESKMNPTDLFLSTTEYDYGYDENTAPCNEGNSFPRFK
> Sbjct: 1
> MEQRKKLTGRHTRALYLWFFPSQESKMNPTDLFLSTTEYDYGYDENTAPCNEGNSFPRFK 60
> 
> Query: 496383
> SLFLPILYCLVFVFCLLGNSLVLWILLTRKRLMTMTDICLLNLAASDLLFIVPLPFQAYY 496204
>  
> SLFLPILYCLVFVFCLLGNSLVLWILLTRKRLMTMTDICLLNLAASDLLFIVPLPFQAYY
> Sbjct: 61
> SLFLPILYCLVFVFCLLGNSLVLWILLTRKRLMTMTDICLLNLAASDLLFIVPLPFQAYY 120
> 
> Query: 496203
> ASDQWVFGNALCKIMGGIYYTGFYSSIFFITLMSIDRYIAIVHAVYAMKIRTASCGTMIS 496024
>  
> ASDQWVFGNALCKIMGGIYYTGFYSSIFFITLMSIDRYIAIVHAVYAMKIRTASCGTMIS
> Sbjct: 121
> ASDQWVFGNALCKIMGGIYYTGFYSSIFFITLMSIDRYIAIVHAVYAMKIRTASCGTMIS 180
> 
> Query: 496023
> LVLWLVAGLASVPNIVFNQQLEIEQSVQCVPVYPPGNNIWKVTTQFAANILGLLIPFSIL 495844
>  
> LVLWLVAGLASVPNIVFNQQLEIEQSVQCVPVYPPGNNIWKVTTQFAANILGLLIPFSIL
> Sbjct: 181
> LVLWLVAGLASVPNIVFNQQLEIEQSVQCVPVYPPGNNIWKVTTQFAANILGLLIPFSIL 240
> 
> Query: 495843
> IHCYAQILRNLRKCKNQNXXXXXXXXXXXXXXXFLFWTPFNVVLFLDSLQSLLIIDNCQA 495664
>               IHCYAQILRNLRKCKNQN
> FLFWTPFNVVLFLDSLQSLLIIDNCQA
> Sbjct: 241
> IHCYAQILRNLRKCKNQNKIKAIKMIFIIVIVFFLFWTPFNVVLFLDSLQSLLIIDNCQA 300
> 
> Query: 495663
> SSQITLALQLTETISFIHCCLNPVIYAFAGVTFKAHLKRLLQPCARILWSPTRGSGVTQS 495484
>  
> SSQITLALQLTETISFIHCCLNPVIYAFAGVTFKAHLKRLLQPCARILWSPTRGSGVTQS
> Sbjct: 301
> SSQITLALQLTETISFIHCCLNPVIYAFAGVTFKAHLKRLLQPCARILWSPTRGSGVTQS 360
> 
> Query: 495483 SLVLSQISGCSDSAGVL 495433
>               SLVLSQISGCSDSAGVL
> Sbjct: 361    SLVLSQISGCSDSAGVL 377
> 
> 
>   Database: ../chemokine_receptors_aa.txt
>     Posted date:  Jul 23, 2008  2:43 PM
>   Number of letters in database: 5543
>   Number of sequences in database:  15
>   
> Lambda     K      H
>    0.318    0.134    0.401 
> 
> Gapped
> Lambda     K      H
>    0.267   0.0410    0.140 
> 
> 
> Matrix: BLOSUM62
> Gap Penalties: Existence: 11, Extension: 1
> Number of Sequences: 15
> Number of Hits to DB: 28,029,184
> Number of extensions: 721812
> Number of successful extensions: 2651
> Number of sequences better than 1.0e-05: 15
> Number of HSP's gapped: 2358
> Number of HSP's successfully gapped: 148
> Length of query: 771480
> Length of database: 5543
> Length adjustment: 102
> Effective length of query: 771378
> Effective length of database: 4013
> Effective search space: 3095539914
> Effective search space used: 3095539914
> Neighboring words threshold: 12
> Window for multiple hits: 40
> X1: 16 ( 7.3 bits)
> X2: 38 (14.6 bits)
> X3: 64 (24.7 bits)
> S1: 41 (21.7 bits)
> S2: 33 (17.3 bits)
> 
> 
> Script:
> #!/usr/bin/perl
> 
> use lib "/usr/local/bioperl-1.5.2_102";
> use Bio::SearchIO;
> 
> my $searchio = Bio::SearchIO->new(-format => 'blast', -file =>
> "test.blast");
> 
> while (my($result) = $searchio->next_result()) {
> 	unless (defined $result) {
> 		# there is an undefined result at the end of SearchIO
> objects
> 		last;
> 	}
> 
> 	while(my $hit = $result->next_hit) {
> 		while (my $hsp = $hit->next_hsp) {
> 		
> 			print $hit->name, "\t";
> 			print $hsp->strand('hit'), "\t";
> 			print $hsp->frame, "\n";
> 		}
> 	}
> }
> 
> Output:
> -bash-3.00$ perl parse_test_blast.pl
> chCCRc  0       2
> 
> 
> Head of Informatics
> Institute for Animal Health
> Compton
> Berks
> RG20 7NN
> 01635 578411 
> 
> http://www.iah.ac.uk/research/bioinformatics/bioinf.shtml
> 
> The information contained in this message may be confidential or legally
> privileged and is intended solely for the addressee. 
> If you have received this message in error please delete it & notify the
> originator immediately.
> Unauthorised use, disclosure, copying or alteration of this message is
> forbidden & may be unlawful. 
> The contents of this e-mail are the views of the sender and do not
> necessarily represent the views of the Institute. 
> This email and associated attachments has been checked locally for
> viruses but we can accept no responsibility once it has left our
> systems.
> Communications on Institute computers are monitored to secure the
> effective operation of the systems and for other lawful purposes. 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l




More information about the Bioperl-l mailing list