[Bioperl-l] While loop - SearchIO for BioPerl

Mark A. Jensen maj at fortinbras.us
Thu Jul 9 01:31:27 UTC 2009


A lack of low-complexity filtering  (as seems apparent from this report snippet, 
if
I understand that concept correctly) could explain the multiple query hits on a
short (24bp) region of the same subject...
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Rytsareva, Inna (I)" <IRytsareva at dow.com>
Cc: <bioperl-l at lists.open-bio.org>
Sent: Wednesday, July 08, 2009 9:08 PM
Subject: Re: [Bioperl-l] While loop - SearchIO for BioPerl


> I'm curious as to what this report looks like.  The example report you  posted 
> to the gbrowse list had serious issues (different problem, 'No  midline' error 
> which I replicated); mainly there were no blank lines  making it pretty much 
> invalid, so the parser had issues with it.   Example lines from one HSP:
>
> > gnl|DAS|24699 pDAB101580
>           Length = 12942
>  Score = 50.1 bits (25), Expect = 5e-06
>  Identities = 37/41 (90%)
>  Strand = Plus / Plus
> Query: 10   ccaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 50
>             ||||||||||||||| ||| |||||||| ||||| ||||||
> Sbjct: 4619 ccaaaaaaaaaaaaagaaagaaaaaaaagaaaaagaaaaaa 4659
>  Score = 46.1 bits (23), Expect = 8e-05
>  Identities = 35/39 (89%)
>  Strand = Plus / Plus
> Query: 13   aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 51
>             ||||||||||||| ||| |||||||| ||||| ||||||
> Sbjct: 4621 aaaaaaaaaaaaagaaagaaaaaaaagaaaaagaaaaaa 4659
>  Score = 46.1 bits (23), Expect = 8e-05
>  Identities = 35/39 (89%)
>  Strand = Plus / Plus
> Query: 14   aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 52
>             ||||||||||||| ||| |||||||| ||||| ||||||
> Sbjct: 4621 aaaaaaaaaaaaagaaagaaaaaaaagaaaaagaaaaaa 4659
>
> ...
>
> chris
>
>
>
>
> On Jul 8, 2009, at 2:42 PM, Rytsareva, Inna (I) wrote:
>
>> Hello,
>>
>> I have a follow script to parse the BLAST  report:
>>
>> my $in = Bio::SearchIO->new (  -file =>$out_file,
>> -format =>'blast') or die $!;
>>
>> while (my $result = $in->next_result) {
>> while (my $hit = $result->next_hit)
>> {
>> while (my $hsp = $hit->next_hsp) {
>> $qhit = $hit->name;
>> $start = $hsp->hit->start;
>> $end = $hsp->hit->end;
>> }
>>
>>
>> } print "Hit= ", $qhit,
>> ",Start = ", $start,
>> ",End = ", $end,"\n"; }
>>
>> Usually, the report has a number of the same hsp for each hit.
>> Using "print" command it gives me a hit name, start and end positions
>> for each hit, except last on. For last one it prints all the hsps.
>> Something like this:
>>
>> Hit= gnl|DAS|22386,Start = 7578,End = 7601
>> Hit= gnl|DAS|25627,Start = 2824,End = 2863
>> Hit= gnl|DAS|25328,Start = 8864,End = 8887
>> Hit= gnl|DAS|4890,Start = 1896,End = 1919
>> Hit= gnl|DAS|12191,Start = 1898,End = 1921
>> Hit= gnl|DAS|4276,Start = 557,End = 580
>> Hit= gnl|DAS|12959,Start = 801,End = 824
>> Hit= gnl|DAS|4092,Start = 2266,End = 2304
>> Hit= gnl|DAS|19740,Start = 13572,End = 13610
>> Hit= gnl|DAS|12393,Start = 3901,End = 3924
>> Hit= gnl|DAS|25687,Start = 10415,End = 10438
>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>
>> Where Hit= gnl|DAS|12277,Start = 7410,End = 7433 is the last one.
>> I don't need these duplicates.
>> How can I fix that?
>>
>> Thanks,
>> Inna Rytsareva
>> Discovery Information Management
>> Dow AgroSciences
>> Indianapolis, IN
>> 317-337-4716
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> 




More information about the Bioperl-l mailing list