[Bioperl-l] SearchIO to GFF (was: Getting 'features'fromSearchIO?)

Mark A. Jensen maj at fortinbras.us
Tue May 12 16:06:56 EDT 2009


Patch below (to SearchUtils.pm) fixes the non-numeric warnings on Dan's data, 
but something deeper may be going on.
Also get many of the following warnings, haven't looked at it closely:

--------------------- WARNING ---------------------
MSG: Removing score value(s)
---------------------------------------------------

PATCH:

Index: SearchUtils.pm
===================================================================
--- SearchUtils.pm (revision 15674)
+++ SearchUtils.pm (working copy)
@@ -252,8 +252,8 @@
         }

         $qctg_dat{ "$frame$strand" }->{'length_aln_query'} += $_->{'stop'} - 
$_->{'start'} + 1;
-        $qctg_dat{ "$frame$strand" }->{'totalIdentical'}   += $_->{'iden'};
-        $qctg_dat{ "$frame$strand" }->{'totalConserved'}   += $_->{'cons'};
+        $qctg_dat{ "$frame$strand" }->{'totalIdentical'}   += $_->{'iden'} || 
0;
+        $qctg_dat{ "$frame$strand" }->{'totalConserved'}   += $_->{'cons'} || 
0;
         $qctg_dat{ "$frame$strand" }->{'qstrand'}   = $strand;
     }

@@ -407,9 +407,12 @@
             };
             if($@) { warn "\a\n$@\n"; }
             else {
+  # make sure numerical
+  $_->{'iden'} ||= 0;
+  $_->{'cons'} ||= 0;
                 $_->{'start'} = $start; # Assign a new start coordinate to the 
contig
-                $_->{'iden'} += $numID; # and add new data to #identical, 
#conserved.
-                $_->{'cons'} += $numCons;
+                $_->{'iden'} += ($numID||0); # and add new data to #identical, 
#conserved.
+                $_->{'cons'} += ($numCons||0);
                 push(@{$_->{hsps}}, $hsp);
                 $overlap     = 1;
             }
@@ -424,9 +427,13 @@
             };
             if($@) { warn "\a\n$@\n"; }
             else {
+  # make sure numerical
+  $_->{'iden'} ||= 0;
+  $_->{'cons'} ||= 0;
+
                 $_->{'stop'}  = $stop; # Assign a new stop coordinate to the 
contig
-                $_->{'iden'} += $numID; # and add new data to #identical, 
#conserved.
-                $_->{'cons'} += $numCons;
+                $_->{'iden'} += ($numID||0); # and add new data to #identical, 
#conserved.
+                $_->{'cons'} += ($numCons||0);
                 push(@{$_->{hsps}}, $hsp);
                 $overlap    = 1;
             }
@@ -461,8 +468,8 @@
                         };
                         if($@) { warn "\a\n$@\n"; }
                         else {
-                            $ids  += $these_ids;
-                            $cons += $these_cons;
+                            $ids  += ($these_ids||0);
+                            $cons += ($these_cons||0);
                         }

                         last if $hsp_start == $u_start;
@@ -490,8 +497,8 @@
                         };
                         if($@) { warn "\a\n$@\n"; }
                         else {
-                            $ids  += $these_ids;
-                            $cons += $these_cons;
+                            $ids  += ($these_ids||0);
+                            $cons += ($these_cons||0);
                         }

                         last if $hsp_end == $u_stop;

----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: "BioPerl List" <bioperl-l at lists.open-bio.org>; "Dan Bolser" 
<dan.bolser at gmail.com>
Sent: Tuesday, May 12, 2009 10:04 AM
Subject: Re: [Bioperl-l] SearchIO to GFF (was: Getting 'features'fromSearchIO?)


> More complicated than that, I'm afraid.  We should try to fix that at  the 
> source of the problem.
>
> This appears to stem from SearchUtils HSP tiling, which in turn  utilizes 
> HSPI::matches(), which in turn checks num_identical/ num_conserved.  My guess 
> is, since this is blasttable format, one of  these isn't set and thus is 
> returning the wrong value.  I'll attempt  to track it down today, but it may 
> take some time.
>
> chris
>
> On May 12, 2009, at 8:29 AM, Mark A. Jensen wrote:
>
>> This sounds like a
>>
>> $sum = eval join( '+', @a);
>>
>> problem, which can be fixed with
>>
>> $sum = eval join('+', map { $_ || () } @a) ;
>>
>> MAJ
>> ----- Original Message ----- From: "Dan Bolser" <dan.bolser at gmail.com>
>> To: "Chris Fields" <cjfields at illinois.edu>
>> Cc: "BioPerl List" <bioperl-l at lists.open-bio.org>
>> Sent: Tuesday, May 12, 2009 9:17 AM
>> Subject: Re: [Bioperl-l] SearchIO to GFF (was: Getting 'features' 
>> fromSearchIO?)
>>
>>
>> 2009/5/12 Chris Fields <cjfields at illinois.edu>:
>>> Fixed that in svn. We're all still learning the ropes...
>>
>> In that case, I'm seeing multiple instances of...
>>
>> Argument "" isn't numeric in addition (+) at Bio/Search/ SearchUtils.pm line 
>> 256
>> Argument "" isn't numeric in addition (+) at Bio/Search/ SearchUtils.pm line 
>> 412
>> Argument "" isn't numeric in addition (+) at Bio/Search/ SearchUtils.pm line 
>> 429
>> Argument "" isn't numeric in addition (+) at Bio/Search/ SearchUtils.pm line 
>> 465
>> Argument "" isn't numeric in addition (+) at Bio/Search/ SearchUtils.pm line 
>> 473
>> Argument "" isn't numeric in addition (+) at Bio/Search/ SearchUtils.pm line 
>> 494
>> Argument "" isn't numeric in addition (+) at Bio/Search/ SearchUtils.pm line 
>> 502
>>
>>
>> Hmm... I was about to go on to complain about the weird GFF that I was
>> seeing, but suddenly it looks OK. My bioperl install must think your
>> standing over my shoulder and is therefore behaving itself!
>>
>>
>> Thanks again for all the help,
>> Dan.
>>
>>
>>
>>
>>> chris
>>>
>>> On May 12, 2009, at 5:55 AM, Dan Bolser wrote:
>>>
>>>> 2009/5/12 Dan Bolser <dan.bolser at gmail.com>:
>>>>>
>>>>> Unfortunately bp_search2gff.pl is giving me errors:
>>>>>
>>>>> bp_search2gff.pl --version 3 -i BlastResults/blast_table_filtered  -f
>>>>> blasttable -o BlastResults/blast_table_filtered.gff -t hit
>>>>> --match --target --component
>>>>>
>>>>> --------------------- WARNING ---------------------
>>>>> MSG: Removing score value(s)
>>>>> ---------------------------------------------------
>>>>> Can't locate object method "remove_tags" via package
>>>>> "Bio::SeqFeature::Similarity" at
>>>>> /local/Scratch/dbolser/perl5/lib/perl5/Bio/SeqFeature/Generic.pm  line
>>>>> 393, <GEN1> line 5.
>>>>
>>>>
>>>> I'm just learning the ropes...
>>>>
>>>> --- ~/perl5/lib/perl5/Bio/SeqFeature/Generic.pm~ 2009-05-11
>>>> 15:25:55.000000000 +0100
>>>> +++ ~/perl5/lib/perl5/Bio/SeqFeature/Generic.pm 2009-05-12
>>>> 11:52:41.000000000 +0100
>>>> @@ -390,7 +390,7 @@
>>>> }
>>>> if ($self->has_tag('score')) {
>>>> $self->warn("Removing score value(s)");
>>>> - $self->remove_tags('score');
>>>> + $self->remove_tag('score');
>>>> }
>>>> $self->add_tag_value('score',$value);
>>>> }
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>> Anyone seen this before?
>>>>>
>>>>> Cheers,
>>>>> Dan.
>>>>>
>>>>>
>>>>>
>>>>> 2009/5/12 Dan Bolser <dan.bolser at gmail.com>:
>>>>>>
>>>>>> Thanks for the info guys, I think I was naively hoping that the
>>>>>> feature would know how to cast itself as a 'SeqFeature' (GFF).
>>>>>>
>>>>>> I think I understand the problem better now, so I'll try to  summarise:
>>>>>>
>>>>>> There is no standard way to encode a HSP as a feature (not least
>>>>>> because there are two choices about which sequence (query or the  hit)
>>>>>> it should be attached to). BioPerl will try, but the result will  not
>>>>>> be "well structured" SeqFeatures or "well formed" GFF.
>>>>>>
>>>>>>
>>>>>> From what I read I guess it should be possible to standardize this
>>>>>> mapping (based on something in one of the examples or the  'search2gff'
>>>>>> script), assuming you specify weather you want features put on the
>>>>>> query or on the hit.
>>>>>>
>>>>>> At some point last year I was trying out the bp_search2gff.pl  and my
>>>>>> own code to write a GFF file for loading and viewing by Gbrowse.  At
>>>>>> that time I gave up, as nothing seemed to be working. I was hoping
>>>>>> that doing this at a lower level (i.e. never writing any GFF  myself)
>>>>>> it would stand a better chance of working.
>>>>>>
>>>>>> Also I was thinking that Gbrowse, if given a SeqFeature::Store,  could
>>>>>> autoconfigure its interface to some degree. I guess its back to  the
>>>>>> docs ;-)
>>>>>>
>>>>>>
>>>>>>
>>>>>> I'll keep trying and see if I can get anywhere.
>>>>>>
>>>>>> Thanks again,
>>>>>> Dan.
>>>>>>
>>>>>>
>>>>>>
>>>>>> References for the above:
>>>>>>
>>>>>> 2009/5/11 Jason Stajich <jason at bioperl.org>:
>>>>>>
>>>>>>> otherwise you need to be converting the HSPs into seqfeatures  with the
>>>>>>> right associated information (i.e. the tag/value pairs that are  in the 
>>>>>>> 9th
>>>>>>> column) in order to have well structured data in the database.
>>>>>>
>>>>>>> You can get the individual features from the feature pair with
>>>>>>> $hsp->query or $hsp->hit which can also be passed to a GFF  writer (or 
>>>>>>> call
>>>>>>> $hsp->hit->gff_string). Note that since the data storage is not 
>>>>>>> structured
>>>>>>> in a GFF3 like-way this won't immediately produce well formed  GFF3 for 
>>>>>>> the
>>>>>>> 9th column.
>>>>>>
>>>>>>
>>>>>> 2009/5/11 Chris Fields <cjfields at illinois.edu>:
>>>>>>
>>>>>>> The main problem is the mapping is subjective based on what your
>>>>>>> reference sequence is within the BLAST run (e.g. whether it is  the 
>>>>>>> query or
>>>>>>> the hit), and is something that can't be automatically  discerned. I 
>>>>>>> ended
>>>>>>> up rolling my own with SeqFeature::Store (just mapped the  relevant data 
>>>>>>> to
>>>>>>> Bio::DB::SeqFeatures), but I have long wanted to fix up the  relevant 
>>>>>>> scripts
>>>>>>> to integrate my changes in, just haven't had the time
>>>>>>
>>>>>
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> 



More information about the Bioperl-l mailing list