[Bioperl-l] While loop - SearchIO for BioPerl

Mark A. Jensen maj at fortinbras.us
Thu Jul 9 01:51:40 UTC 2009


Allow me to shamelessly plug the following:

http://www.bioperl.org/wiki/HOWTO:Tiling#Quick_and_Dirty_.22Tiling.22

MAJ
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: "Rytsareva, Inna (I)" <IRytsareva at dow.com>; <bioperl-l at lists.open-bio.org>
Sent: Wednesday, July 08, 2009 9:41 PM
Subject: Re: [Bioperl-l] While loop - SearchIO for BioPerl


> Yep, that's what I was thinking.  The fragment in question is fairly  
> short.
> 
> Inna, if you want the best HSP you could just grab the one that best  
> fits what you expect (best eval, score, whatever).
> 
> chris
> 
> On Jul 8, 2009, at 8:31 PM, Mark A. Jensen wrote:
> 
>> A lack of low-complexity filtering  (as seems apparent from this  
>> report snippet, if
>> I understand that concept correctly) could explain the multiple  
>> query hits on a
>> short (24bp) region of the same subject...
>> ----- Original Message ----- From: "Chris Fields" <cjfields at illinois.edu 
>> >
>> To: "Rytsareva, Inna (I)" <IRytsareva at dow.com>
>> Cc: <bioperl-l at lists.open-bio.org>
>> Sent: Wednesday, July 08, 2009 9:08 PM
>> Subject: Re: [Bioperl-l] While loop - SearchIO for BioPerl
>>
>>
>>> I'm curious as to what this report looks like.  The example report  
>>> you  posted to the gbrowse list had serious issues (different  
>>> problem, 'No  midline' error which I replicated); mainly there were  
>>> no blank lines  making it pretty much invalid, so the parser had  
>>> issues with it.   Example lines from one HSP:
>>>
>>> > gnl|DAS|24699 pDAB101580
>>>          Length = 12942
>>> Score = 50.1 bits (25), Expect = 5e-06
>>> Identities = 37/41 (90%)
>>> Strand = Plus / Plus
>>> Query: 10   ccaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 50
>>>            ||||||||||||||| ||| |||||||| ||||| ||||||
>>> Sbjct: 4619 ccaaaaaaaaaaaaagaaagaaaaaaaagaaaaagaaaaaa 4659
>>> Score = 46.1 bits (23), Expect = 8e-05
>>> Identities = 35/39 (89%)
>>> Strand = Plus / Plus
>>> Query: 13   aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 51
>>>            ||||||||||||| ||| |||||||| ||||| ||||||
>>> Sbjct: 4621 aaaaaaaaaaaaagaaagaaaaaaaagaaaaagaaaaaa 4659
>>> Score = 46.1 bits (23), Expect = 8e-05
>>> Identities = 35/39 (89%)
>>> Strand = Plus / Plus
>>> Query: 14   aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 52
>>>            ||||||||||||| ||| |||||||| ||||| ||||||
>>> Sbjct: 4621 aaaaaaaaaaaaagaaagaaaaaaaagaaaaagaaaaaa 4659
>>>
>>> ...
>>>
>>> chris
>>>
>>>
>>>
>>>
>>> On Jul 8, 2009, at 2:42 PM, Rytsareva, Inna (I) wrote:
>>>
>>>> Hello,
>>>>
>>>> I have a follow script to parse the BLAST  report:
>>>>
>>>> my $in = Bio::SearchIO->new (  -file =>$out_file,
>>>> -format =>'blast') or die $!;
>>>>
>>>> while (my $result = $in->next_result) {
>>>> while (my $hit = $result->next_hit)
>>>> {
>>>> while (my $hsp = $hit->next_hsp) {
>>>> $qhit = $hit->name;
>>>> $start = $hsp->hit->start;
>>>> $end = $hsp->hit->end;
>>>> }
>>>>
>>>>
>>>> } print "Hit= ", $qhit,
>>>> ",Start = ", $start,
>>>> ",End = ", $end,"\n"; }
>>>>
>>>> Usually, the report has a number of the same hsp for each hit.
>>>> Using "print" command it gives me a hit name, start and end  
>>>> positions
>>>> for each hit, except last on. For last one it prints all the hsps.
>>>> Something like this:
>>>>
>>>> Hit= gnl|DAS|22386,Start = 7578,End = 7601
>>>> Hit= gnl|DAS|25627,Start = 2824,End = 2863
>>>> Hit= gnl|DAS|25328,Start = 8864,End = 8887
>>>> Hit= gnl|DAS|4890,Start = 1896,End = 1919
>>>> Hit= gnl|DAS|12191,Start = 1898,End = 1921
>>>> Hit= gnl|DAS|4276,Start = 557,End = 580
>>>> Hit= gnl|DAS|12959,Start = 801,End = 824
>>>> Hit= gnl|DAS|4092,Start = 2266,End = 2304
>>>> Hit= gnl|DAS|19740,Start = 13572,End = 13610
>>>> Hit= gnl|DAS|12393,Start = 3901,End = 3924
>>>> Hit= gnl|DAS|25687,Start = 10415,End = 10438
>>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>>>
>>>> Where Hit= gnl|DAS|12277,Start = 7410,End = 7433 is the last one.
>>>> I don't need these duplicates.
>>>> How can I fix that?
>>>>
>>>> Thanks,
>>>> Inna Rytsareva
>>>> Discovery Information Management
>>>> Dow AgroSciences
>>>> Indianapolis, IN
>>>> 317-337-4716
>>>>
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
> 
> 
>



More information about the Bioperl-l mailing list