[Bioperl-l] SearchIO to GFF (was: Getting 'features'fromSearchIO?)
Mark A. Jensen
maj at fortinbras.us
Tue May 12 20:06:56 UTC 2009
Patch below (to SearchUtils.pm) fixes the non-numeric warnings on Dan's data,
but something deeper may be going on.
Also get many of the following warnings, haven't looked at it closely:
--------------------- WARNING ---------------------
MSG: Removing score value(s)
---------------------------------------------------
PATCH:
Index: SearchUtils.pm
===================================================================
--- SearchUtils.pm (revision 15674)
+++ SearchUtils.pm (working copy)
@@ -252,8 +252,8 @@
}
$qctg_dat{ "$frame$strand" }->{'length_aln_query'} += $_->{'stop'} -
$_->{'start'} + 1;
- $qctg_dat{ "$frame$strand" }->{'totalIdentical'} += $_->{'iden'};
- $qctg_dat{ "$frame$strand" }->{'totalConserved'} += $_->{'cons'};
+ $qctg_dat{ "$frame$strand" }->{'totalIdentical'} += $_->{'iden'} ||
0;
+ $qctg_dat{ "$frame$strand" }->{'totalConserved'} += $_->{'cons'} ||
0;
$qctg_dat{ "$frame$strand" }->{'qstrand'} = $strand;
}
@@ -407,9 +407,12 @@
};
if($@) { warn "\a\n$@\n"; }
else {
+ # make sure numerical
+ $_->{'iden'} ||= 0;
+ $_->{'cons'} ||= 0;
$_->{'start'} = $start; # Assign a new start coordinate to the
contig
- $_->{'iden'} += $numID; # and add new data to #identical,
#conserved.
- $_->{'cons'} += $numCons;
+ $_->{'iden'} += ($numID||0); # and add new data to #identical,
#conserved.
+ $_->{'cons'} += ($numCons||0);
push(@{$_->{hsps}}, $hsp);
$overlap = 1;
}
@@ -424,9 +427,13 @@
};
if($@) { warn "\a\n$@\n"; }
else {
+ # make sure numerical
+ $_->{'iden'} ||= 0;
+ $_->{'cons'} ||= 0;
+
$_->{'stop'} = $stop; # Assign a new stop coordinate to the
contig
- $_->{'iden'} += $numID; # and add new data to #identical,
#conserved.
- $_->{'cons'} += $numCons;
+ $_->{'iden'} += ($numID||0); # and add new data to #identical,
#conserved.
+ $_->{'cons'} += ($numCons||0);
push(@{$_->{hsps}}, $hsp);
$overlap = 1;
}
@@ -461,8 +468,8 @@
};
if($@) { warn "\a\n$@\n"; }
else {
- $ids += $these_ids;
- $cons += $these_cons;
+ $ids += ($these_ids||0);
+ $cons += ($these_cons||0);
}
last if $hsp_start == $u_start;
@@ -490,8 +497,8 @@
};
if($@) { warn "\a\n$@\n"; }
else {
- $ids += $these_ids;
- $cons += $these_cons;
+ $ids += ($these_ids||0);
+ $cons += ($these_cons||0);
}
last if $hsp_end == $u_stop;
----- Original Message -----
From: "Chris Fields" <cjfields at illinois.edu>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: "BioPerl List" <bioperl-l at lists.open-bio.org>; "Dan Bolser"
<dan.bolser at gmail.com>
Sent: Tuesday, May 12, 2009 10:04 AM
Subject: Re: [Bioperl-l] SearchIO to GFF (was: Getting 'features'fromSearchIO?)
> More complicated than that, I'm afraid. We should try to fix that at the
> source of the problem.
>
> This appears to stem from SearchUtils HSP tiling, which in turn utilizes
> HSPI::matches(), which in turn checks num_identical/ num_conserved. My guess
> is, since this is blasttable format, one of these isn't set and thus is
> returning the wrong value. I'll attempt to track it down today, but it may
> take some time.
>
> chris
>
> On May 12, 2009, at 8:29 AM, Mark A. Jensen wrote:
>
>> This sounds like a
>>
>> $sum = eval join( '+', @a);
>>
>> problem, which can be fixed with
>>
>> $sum = eval join('+', map { $_ || () } @a) ;
>>
>> MAJ
>> ----- Original Message ----- From: "Dan Bolser" <dan.bolser at gmail.com>
>> To: "Chris Fields" <cjfields at illinois.edu>
>> Cc: "BioPerl List" <bioperl-l at lists.open-bio.org>
>> Sent: Tuesday, May 12, 2009 9:17 AM
>> Subject: Re: [Bioperl-l] SearchIO to GFF (was: Getting 'features'
>> fromSearchIO?)
>>
>>
>> 2009/5/12 Chris Fields <cjfields at illinois.edu>:
>>> Fixed that in svn. We're all still learning the ropes...
>>
>> In that case, I'm seeing multiple instances of...
>>
>> Argument "" isn't numeric in addition (+) at Bio/Search/ SearchUtils.pm line
>> 256
>> Argument "" isn't numeric in addition (+) at Bio/Search/ SearchUtils.pm line
>> 412
>> Argument "" isn't numeric in addition (+) at Bio/Search/ SearchUtils.pm line
>> 429
>> Argument "" isn't numeric in addition (+) at Bio/Search/ SearchUtils.pm line
>> 465
>> Argument "" isn't numeric in addition (+) at Bio/Search/ SearchUtils.pm line
>> 473
>> Argument "" isn't numeric in addition (+) at Bio/Search/ SearchUtils.pm line
>> 494
>> Argument "" isn't numeric in addition (+) at Bio/Search/ SearchUtils.pm line
>> 502
>>
>>
>> Hmm... I was about to go on to complain about the weird GFF that I was
>> seeing, but suddenly it looks OK. My bioperl install must think your
>> standing over my shoulder and is therefore behaving itself!
>>
>>
>> Thanks again for all the help,
>> Dan.
>>
>>
>>
>>
>>> chris
>>>
>>> On May 12, 2009, at 5:55 AM, Dan Bolser wrote:
>>>
>>>> 2009/5/12 Dan Bolser <dan.bolser at gmail.com>:
>>>>>
>>>>> Unfortunately bp_search2gff.pl is giving me errors:
>>>>>
>>>>> bp_search2gff.pl --version 3 -i BlastResults/blast_table_filtered -f
>>>>> blasttable -o BlastResults/blast_table_filtered.gff -t hit
>>>>> --match --target --component
>>>>>
>>>>> --------------------- WARNING ---------------------
>>>>> MSG: Removing score value(s)
>>>>> ---------------------------------------------------
>>>>> Can't locate object method "remove_tags" via package
>>>>> "Bio::SeqFeature::Similarity" at
>>>>> /local/Scratch/dbolser/perl5/lib/perl5/Bio/SeqFeature/Generic.pm line
>>>>> 393, <GEN1> line 5.
>>>>
>>>>
>>>> I'm just learning the ropes...
>>>>
>>>> --- ~/perl5/lib/perl5/Bio/SeqFeature/Generic.pm~ 2009-05-11
>>>> 15:25:55.000000000 +0100
>>>> +++ ~/perl5/lib/perl5/Bio/SeqFeature/Generic.pm 2009-05-12
>>>> 11:52:41.000000000 +0100
>>>> @@ -390,7 +390,7 @@
>>>> }
>>>> if ($self->has_tag('score')) {
>>>> $self->warn("Removing score value(s)");
>>>> - $self->remove_tags('score');
>>>> + $self->remove_tag('score');
>>>> }
>>>> $self->add_tag_value('score',$value);
>>>> }
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>> Anyone seen this before?
>>>>>
>>>>> Cheers,
>>>>> Dan.
>>>>>
>>>>>
>>>>>
>>>>> 2009/5/12 Dan Bolser <dan.bolser at gmail.com>:
>>>>>>
>>>>>> Thanks for the info guys, I think I was naively hoping that the
>>>>>> feature would know how to cast itself as a 'SeqFeature' (GFF).
>>>>>>
>>>>>> I think I understand the problem better now, so I'll try to summarise:
>>>>>>
>>>>>> There is no standard way to encode a HSP as a feature (not least
>>>>>> because there are two choices about which sequence (query or the hit)
>>>>>> it should be attached to). BioPerl will try, but the result will not
>>>>>> be "well structured" SeqFeatures or "well formed" GFF.
>>>>>>
>>>>>>
>>>>>> From what I read I guess it should be possible to standardize this
>>>>>> mapping (based on something in one of the examples or the 'search2gff'
>>>>>> script), assuming you specify weather you want features put on the
>>>>>> query or on the hit.
>>>>>>
>>>>>> At some point last year I was trying out the bp_search2gff.pl and my
>>>>>> own code to write a GFF file for loading and viewing by Gbrowse. At
>>>>>> that time I gave up, as nothing seemed to be working. I was hoping
>>>>>> that doing this at a lower level (i.e. never writing any GFF myself)
>>>>>> it would stand a better chance of working.
>>>>>>
>>>>>> Also I was thinking that Gbrowse, if given a SeqFeature::Store, could
>>>>>> autoconfigure its interface to some degree. I guess its back to the
>>>>>> docs ;-)
>>>>>>
>>>>>>
>>>>>>
>>>>>> I'll keep trying and see if I can get anywhere.
>>>>>>
>>>>>> Thanks again,
>>>>>> Dan.
>>>>>>
>>>>>>
>>>>>>
>>>>>> References for the above:
>>>>>>
>>>>>> 2009/5/11 Jason Stajich <jason at bioperl.org>:
>>>>>>
>>>>>>> otherwise you need to be converting the HSPs into seqfeatures with the
>>>>>>> right associated information (i.e. the tag/value pairs that are in the
>>>>>>> 9th
>>>>>>> column) in order to have well structured data in the database.
>>>>>>
>>>>>>> You can get the individual features from the feature pair with
>>>>>>> $hsp->query or $hsp->hit which can also be passed to a GFF writer (or
>>>>>>> call
>>>>>>> $hsp->hit->gff_string). Note that since the data storage is not
>>>>>>> structured
>>>>>>> in a GFF3 like-way this won't immediately produce well formed GFF3 for
>>>>>>> the
>>>>>>> 9th column.
>>>>>>
>>>>>>
>>>>>> 2009/5/11 Chris Fields <cjfields at illinois.edu>:
>>>>>>
>>>>>>> The main problem is the mapping is subjective based on what your
>>>>>>> reference sequence is within the BLAST run (e.g. whether it is the
>>>>>>> query or
>>>>>>> the hit), and is something that can't be automatically discerned. I
>>>>>>> ended
>>>>>>> up rolling my own with SeqFeature::Store (just mapped the relevant data
>>>>>>> to
>>>>>>> Bio::DB::SeqFeatures), but I have long wanted to fix up the relevant
>>>>>>> scripts
>>>>>>> to integrate my changes in, just haven't had the time
>>>>>>
>>>>>
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
More information about the Bioperl-l
mailing list