[Bioperl-l] how to get the information of Strand = Plus / Plus from blastn report by bioperl.

Frank Schwach fs5 at sanger.ac.uk
Tue Jan 24 11:43:18 UTC 2012


I think that's an option that you can set when you ask for a new BLAST 
parser:
my $searchio = new Bio::SearchIO(
   -format => 'blast',
   -file   => 't/data/ecolitst.bls',
   -best_hit_only => 1,
);

now when you use the same script that you have been using so far (loop 
over all hits), there will only be one hit per result.

Frank


On 24/01/12 11:31, kakchingtabam pawankumar sharma wrote:
> Hi,
>        Thanks a lot for help Frank. It works for every Blast output.
> One more question is that i want to best hit only(top hit of every query).
> I show there is option called
> $obj->best_hit_only; in Bio::SearchIO module.
> So help to add this to my script.
> I could not do. Its confusing.
>
> Thanks in Advanced.
>
> With best regards,
> Pawan
>
>
> On Mon, Jan 16, 2012 at 9:05 PM, Frank Schwach<fs5 at sanger.ac.uk>  wrote:
>> Excellent, well done!
>> No, this is the way to do it. In BioPerl modules that use strand information
>> you will find the values +1/-1 or undef. If you want to display those as
>> PLUS/MINUS,+/-,Watson/Crick,Laurel/Hardy whatever, you have to convert it,
>> but you know now how to do it.
>> You have a syntax error in your code where you retrieve the query name:
>>
>>
>> my $QueryName = $result->query_name(), my $QueryDescript =
>> $result->query_description();
>>
>> should be two lines and the comma should be a semicolon.
>>
>> Good luck!
>>
>> Frank
>>
>>
>>
>>
>>
>> On 16/01/12 15:14, kakchingtabam pawankumar sharma wrote:
>>>
>>> So By using the if else conditon function, I have solve Frank.
>>> I mean is there anyway in bioperl we can get directly using other
>>> module! I hope u got it!
>>>
>>>
>>> So my second Question have not replied that is
>>>
>>> i have blastn report as below:
>>>
>>> BLASTN 2.2.18 [Mar-02-2008]
>>>
>>>
>>> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
>>> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
>>> "Gapped BLAST and PSI-BLAST: a new generation of protein database search
>>> programs",  Nucleic Acids Res. 25:3389-3402.
>>>
>>> Query= ORB_1210001_hsa-miR-548aa#5_1
>>>          (24 letters)
>>>
>>> Database: hsa-mmu-rno_miRNA.fa
>>>            3524 sequences; 76,424 total letters
>>>
>>> Searching..................................................done
>>>
>>>
>>>
>>>                                                                  Score    E
>>> Sequences producing significant alignments:                      (bits)
>>> Value
>>>
>>> hsa-miR-548aa
>>> 48   2e-009
>>> hsa-miR-548d-5p
>>> 36   9e-006
>>> hsa-miR-548b-5p
>>> 36   9e-006
>>> hsa-miR-548z
>>> 34   3e-005
>>> hsa-miR-548q
>>> 30   5e-004
>>> hsa-miR-548n
>>> 30   5e-004
>>> hsa-miR-548ab
>>> 28   0.002
>>> hsa-miR-548v
>>> 28   0.002
>>> hsa-miR-548c-5p
>>> 28   0.002
>>> hsa-miR-548ag
>>> 26   0.008
>>> hsa-miR-548u
>>> 26   0.008
>>> hsa-miR-548c-3p
>>> 26   0.008
>>> hsa-miR-603
>>> 26   0.008
>>> hsa-miR-548a-3p
>>> 26   0.008
>>> hsa-miR-548ac
>>> 24   0.033
>>> hsa-miR-548an
>>> 22   0.13
>>> hsa-miR-548aj
>>> 22   0.13
>>> hsa-miR-548i
>>> 22   0.13
>>> hsa-miR-548g
>>> 22   0.13
>>> hsa-miR-548j
>>> 22   0.13
>>> hsa-miR-548a-5p
>>> 22   0.13
>>>
>>>> hsa-miR-548aa
>>>
>>>           Length = 25
>>>
>>>   Score = 48.1 bits (24), Expect = 2e-009
>>>   Identities = 24/24 (100%)
>>>   Strand = Plus / Minus
>>>
>>>
>>> Query: 1  tggtgcaaaagtaattgtggtttt 24
>>>           ||||||||||||||||||||||||
>>> Sbjct: 25 tggtgcaaaagtaattgtggtttt 2
>>>
>>>
>>>> hsa-miR-548d-5p
>>>
>>>           Length = 22
>>>
>>>   Score = 36.2 bits (18), Expect = 9e-006
>>>   Identities = 18/18 (100%)
>>>   Strand = Plus / Plus
>>>
>>>
>>> Query: 7  aaaagtaattgtggtttt 24
>>>           ||||||||||||||||||
>>> Sbjct: 1  aaaagtaattgtggtttt 18
>>>
>>>
>>>
>>> in this result i could not parse my code. i think my code does not
>>> accept the Query header that is
>>> "ORB_1210001_hsa-miR-548aa#5_1" as it is in the above example blast
>>> output.
>>>
>>> kindly help me out.
>>>
>>> with regards,
>>> Pawan.
>>>
>>> On 1/16/12, Frank Schwach<fs5 at sanger.ac.uk>    wrote:
>>>>
>>>> Hi Pawan ,
>>>>
>>>> Please always "reply to all", so that you keep the discussion on the
>>>> bioperl mailing list and more people can help you.
>>>> What you need is a very basic Perl command. I could give you the code
>>>> but I think you get more out of it if you experiment with it on your own
>>>> because it is very fundamental. I'll point you in the right direction:
>>>> you want an if-then-else conditional construct.
>>>>
>>>> Perl's documentation about this is here:
>>>> http://perldoc.perl.org/perlintro.html#Conditional-and-looping-constructs
>>>>
>>>> if strand is 1 you want to print "PLUS" else if it is -1 you want to
>>>> print "MINUS", or else you might want to print "no strand" or something,
>>>> or even treat it as an error and make the script abort.
>>>>
>>>> Give it a go and let us know if you need help. For basic (non-bio) Perl
>>>> question, please also check out the community at
>>>> http://www.perlmonks.org/.
>>>>
>>>> Hope that helps,
>>>>
>>>> Frank
>>>>
>>>>
>>>> On 14/01/12 05:59, kakchingtabam pawankumar sharma wrote:
>>>>>
>>>>> Hi frank,
>>>>>
>>>>> Thanks for your kind reply.
>>>>> I could get the vale for query as 1 value if it is plus.
>>>>> and for hit = -1 if it is minus.
>>>>> But i would like to print out as PLUS or MINUS not 1 or -1 my friend.
>>>>>
>>>>> you can see my code as below:
>>>>>
>>>>> while ( my $result = $searchio->next_result() ) {
>>>>>       my $QueryName = $result->query_name(), my $QueryDescript =
>>>>> $result->query_description();
>>>>>       my $QueryLength = $result->query_length;
>>>>>       my $NoHits = $result->num_hits;
>>>>>
>>>>>       while( my $hit = $result->next_hit ) {
>>>>>           my $HitName = $hit->name();
>>>>>           my $HitDescrip = $hit->description();
>>>>>         my $HitLength = $hit->length;
>>>>>           my $Score = $hit->raw_score();
>>>>>         my $Bits = $hit->bits;
>>>>>
>>>>>           my $hsp = $hit->next_hsp; # Only check first (= best) hsp
>>>>>         my $Evalue =  $hsp->evalue();
>>>>>         my $AlnLen = $hsp->num_identical();
>>>>>         my $TotalLen = $hsp->hsp_length;
>>>>>         my $QueryStrand = $hsp->strand('query');
>>>>>         my $HitStrand = $hsp->strand('hit');
>>>>>
>>>>>         if($Evalue<     $cutoff){
>>>>>             print "$QueryName $QueryDescript\t";
>>>>>             print "$QueryLength\t";
>>>>>             print "$NoHits\t";
>>>>>             print "$HitName $HitDescrip\t";
>>>>>             print "$HitLength\t";
>>>>>             print "$Score\t";
>>>>>             print "$Bits\t";
>>>>>             print "$Evalue\t";
>>>>>             print "$AlnLen\t";
>>>>>             print "$TotalLen\t";
>>>>>             print "$QueryStrand\t";
>>>>>             print "$HitStrand\n";
>>>>>         }
>>>>>       }
>>>>>       print "\n";
>>>>> }
>>>>>
>>>>>
>>>>> This is a part of my code.
>>>>>
>>>>> i have blastn report as below:
>>>>>
>>>>> BLASTN 2.2.18 [Mar-02-2008]
>>>>>
>>>>>
>>>>> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.
>>>>> Schaffer,
>>>>> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
>>>>> "Gapped BLAST and PSI-BLAST: a new generation of protein database search
>>>>> programs",  Nucleic Acids Res. 25:3389-3402.
>>>>>
>>>>> Query= ORB_1210001_hsa-miR-548aa#5_1
>>>>>            (24 letters)
>>>>>
>>>>> Database: hsa-mmu-rno_miRNA.fa
>>>>>              3524 sequences; 76,424 total letters
>>>>>
>>>>> Searching..................................................done
>>>>>
>>>>>
>>>>>
>>>>>                                                                    Score
>>>>> E
>>>>> Sequences producing significant alignments:                      (bits)
>>>>> Value
>>>>>
>>>>> hsa-miR-548aa
>>>>> 48   2e-009
>>>>> hsa-miR-548d-5p
>>>>> 36   9e-006
>>>>> hsa-miR-548b-5p
>>>>> 36   9e-006
>>>>> hsa-miR-548z
>>>>> 34   3e-005
>>>>> hsa-miR-548q
>>>>> 30   5e-004
>>>>> hsa-miR-548n
>>>>> 30   5e-004
>>>>> hsa-miR-548ab
>>>>> 28   0.002
>>>>> hsa-miR-548v
>>>>> 28   0.002
>>>>> hsa-miR-548c-5p
>>>>> 28   0.002
>>>>> hsa-miR-548ag
>>>>> 26   0.008
>>>>> hsa-miR-548u
>>>>> 26   0.008
>>>>> hsa-miR-548c-3p
>>>>> 26   0.008
>>>>> hsa-miR-603
>>>>> 26   0.008
>>>>> hsa-miR-548a-3p
>>>>> 26   0.008
>>>>> hsa-miR-548ac
>>>>> 24   0.033
>>>>> hsa-miR-548an
>>>>> 22   0.13
>>>>> hsa-miR-548aj
>>>>> 22   0.13
>>>>> hsa-miR-548i
>>>>> 22   0.13
>>>>> hsa-miR-548g
>>>>> 22   0.13
>>>>> hsa-miR-548j
>>>>> 22   0.13
>>>>> hsa-miR-548a-5p
>>>>> 22   0.13
>>>>>
>>>>>> hsa-miR-548aa
>>>>>
>>>>>             Length = 25
>>>>>
>>>>>    Score = 48.1 bits (24), Expect = 2e-009
>>>>>    Identities = 24/24 (100%)
>>>>>    Strand = Plus / Minus
>>>>>
>>>>>
>>>>> Query: 1  tggtgcaaaagtaattgtggtttt 24
>>>>>             ||||||||||||||||||||||||
>>>>> Sbjct: 25 tggtgcaaaagtaattgtggtttt 2
>>>>>
>>>>>
>>>>>> hsa-miR-548d-5p
>>>>>
>>>>>             Length = 22
>>>>>
>>>>>    Score = 36.2 bits (18), Expect = 9e-006
>>>>>    Identities = 18/18 (100%)
>>>>>    Strand = Plus / Plus
>>>>>
>>>>>
>>>>> Query: 7  aaaagtaattgtggtttt 24
>>>>>             ||||||||||||||||||
>>>>> Sbjct: 1  aaaagtaattgtggtttt 18
>>>>>
>>>>>
>>>>>
>>>>> in this result i could not parse my code. i think my code does not
>>>>> accept the Query header that is
>>>>> "ORB_1210001_hsa-miR-548aa#5_1" as it is in the above example blast
>>>>> output.
>>>>>
>>>>> kindly help me out.
>>>>>
>>>>> with regards,
>>>>> Pawan.
>>>>>
>>>>>
>>>>> On Sat, Jan 14, 2012 at 3:13 AM, Frank Schwach<fs5 at sanger.ac.uk>
>>>>> wrote:
>>>>>>
>>>>>> Hi Pawan,
>>>>>>
>>>>>> Can you show your code? Is it basically following the structure shown
>>>>>> in
>>>>>> http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_SearchIO
>>>>>> ?
>>>>>>
>>>>>> If that is the case
>>>>>>
>>>>>> $hsp->strand('query')
>>>>>>
>>>>>>
>>>>>> is exactly what you need.
>>>>>> To check if hit and query are on different strands you can do:
>>>>>>
>>>>>> if ( $hsp->strand('query')
>>>>>> * $hsp->strand('hit') == -1){
>>>>>>
>>>>>>    # do whatever you need to do if they are on opposite strands
>>>>>>
>>>>>> }
>>>>>>
>>>>>> Hope that helps
>>>>>>
>>>>>> Frank
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 13/01/12 16:46, kakchingtabam pawankumar sharma wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>                Using Bio::SearchIO module I am parsing the following
>>>>>>> Blast
>>>>>>> result.
>>>>>>> I have used the option- $hsp->strand('query').
>>>>>>>
>>>>>>> But I cannot get detail of alignment.
>>>>>>>
>>>>>>> I need to know if my hit is forward (Strand = Plus / Plus)
>>>>>>> or reverse ( Strand = Plus / Minus)...
>>>>>>>    Can anyone help me to get report as Plus or Minus for query  or hit.
>>>>>>>
>>>>>>> thanks in advanced.
>>>>>>>
>>>>>>> With regards,
>>>>>>> Pawan
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> BLASTN 2.2.18 [Dec-23-2011]
>>>>>>>
>>>>>>>
>>>>>>> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.
>>>>>>> Schaffer,
>>>>>>> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
>>>>>>> "Gapped BLAST and PSI-BLAST: a new generation of protein database
>>>>>>> search
>>>>>>> programs",  Nucleic Acids Res. 25:3389-3402.
>>>>>>>
>>>>>>> Query= 000013_c10079-9984
>>>>>>>            (50 letters)
>>>>>>>
>>>>>>> Database: Cyano_Probe.fasta
>>>>>>>              4760 sequences; 238,000 total letters
>>>>>>>
>>>>>>> Searching..................................................done
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Score
>>>>>>>    E
>>>>>>> Sequences producing significant alignments:
>>>>>>>   (bits)
>>>>>>> Value
>>>>>>>
>>>>>>> 000013_c10079-9984
>>>>>>> 100   7e-024
>>>>>>> 002619_2689273-2690037
>>>>>>> 24   0.36
>>>>>>> 001126_c1123720-1123385
>>>>>>> 24   0.36
>>>>>>> 003211_c3326737-3326480
>>>>>>> 22   1.4
>>>>>>> 002415_2471082-2471420
>>>>>>> 22   1.4
>>>>>>> 002269_2321276-2322463
>>>>>>> 22   1.4
>>>>>>> 001328_c1326535-1326164
>>>>>>> 22   1.4
>>>>>>>
>>>>>>>> 000013_c10079-9984
>>>>>>>
>>>>>>>             Length = 50
>>>>>>>
>>>>>>>    Score = 99.6 bits (50), Expect = 7e-024
>>>>>>>    Identities = 50/50 (100%)
>>>>>>>    Strand = Plus / Plus
>>>>>>>
>>>>>>>
>>>>>>> Query: 1  agtcaacaccaatctgagtttaatcactatcttgatcatgttagatatca 50
>>>>>>>             ||||||||||||||||||||||||||||||||||||||||||||||||||
>>>>>>> Sbjct: 1  agtcaacaccaatctgagtttaatcactatcttgatcatgttagatatca 50
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Bioperl-l mailing list
>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>
>>>>>>
>>>>>> --
>>>>>> The Wellcome Trust Sanger Institute is operated by Genome Research
>>>>>> Limited,
>>>>>> a charity registered in England with number 1021457 and a company
>>>>>> registered
>>>>>> in England with number 2742969, whose registered office is 215 Euston
>>>>>> Road,
>>>>>> London, NW1 2BE.
>>>>
>>>>
>>>> --
>>>>   The Wellcome Trust Sanger Institute is operated by Genome Research
>>>>   Limited, a charity registered in England with number 1021457 and a
>>>>   company registered in England with number 2742969, whose registered
>>>>   office is 215 Euston Road, London, NW1 2BE.
>>>>
>>
>>
>> --
>> The Wellcome Trust Sanger Institute is operated by Genome Research Limited,
>> a charity registered in England with number 1021457 and a company registered
>> in England with number 2742969, whose registered office is 215 Euston Road,
>> London, NW1 2BE.


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 



More information about the Bioperl-l mailing list