[Bioperl-l] Aggressive aggregation?

Aaron J. Mackey amackey at pcbi.upenn.edu
Mon Mar 14 12:39:26 EST 2005


In the "FWIW" category:

This is what I did to break up the "aggressive aggregation" (patch
attached). It relies on the fact that, when aggregation occurs, the base
feature's range always (at least in my use cases so far) contains, or at
least overlaps, the ranges of its subfeatures. So in the code below,
range checking kicks in only when more than one base feature is found
for the same group on the same reference sequence: each subfeature is
then attached only to the base features it overlaps. This won't help you
if, for instance, you're saving separate HSP linking information as
different hits (because the hits will still overlap), but it does solve
the more common case of one protein/EST matching at multiple, distinct
locations on the genome.

-Aaron


-------------- next part --------------
diff -u -r1.30 Aggregator.pm
--- Aggregator.pm       3 Aug 2004 09:17:23 -0000       1.30
+++ Aggregator.pm       14 Mar 2005 17:45:35 -0000
@@ -303,7 +303,7 @@
           ? join ($;,$feature->group,$feature->refseq,$feature->source)
           : join ($;,$feature->group,$feature->refseq);
       if ($main_method && lc $feature->method eq lc $main_method) {
-       $aggregates{$key}{base} ||= $feature->clone;
+       push @{$aggregates{$key}{base}}, $feature->clone;
       } else {
        push @{$aggregates{$key}{subparts}},$feature;
       }
@@ -321,18 +321,29 @@
     if ($require_whole_object && $self->components) {
       next unless $aggregates{$_}{base}; # && $aggregates{$_}{subparts};
     }
-    my $base = $aggregates{$_}{base};
+
+    my $base = shift @{$aggregates{$_}{base} || []};
     unless ($base) { # no base, so create one
       my $first = $aggregates{$_}{subparts}[0];
       $base = $first->clone;     # to inherit parent coordinate system, etc
       $base->score(undef);
       $base->phase(undef);
     }
-    $base->method($pseudo_method);
-    $base->add_subfeature($_) foreach @{$aggregates{$_}{subparts}};
-    $base->adjust_bounds;
-    $base->compound(1);  # set the compound flag
-    push @result,$base;
+    my $multiple = @{$aggregates{$_}{base} || []} > 0;  # any bases left after the first shift?
+    while ($base) {
+      $base->method($pseudo_method);
+      if ($multiple) {  # several bases: only capture subfeatures that overlap this one
+       for my $part (@{$aggregates{$_}{subparts}}) {
+         $base->add_subfeature($part) if $part->overlaps($base, "strong");
+       }
+      } else {
+       $base->add_subfeature($_) foreach @{$aggregates{$_}{subparts}};
+      }
+      $base->adjust_bounds;
+      $base->compound(1);  # set the compound flag
+      push @result,$base;
+      $base = shift @{$aggregates{$_}{base} || []};
+    }
   }
   @$features = @result;
 }
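For readers who want the range-checking idea from the message in isolation, here is a minimal, self-contained Perl sketch that runs without BioPerl. It is not the Bio::DB::GFF code: the feature records are plain hashes with hypothetical `name`, `start`, `end` and `strand` keys, and `overlaps_strong` and `split_group` are illustrative helpers. The strand check is modeled on the 'strong' strand test of `Bio::RangeI::overlaps` (both strands equal and non-zero). The sketch assumes each group already has at least one base feature; the patch handles the no-base case separately by cloning the first subpart.

```perl
use strict;
use warnings;

# Hypothetical feature records: plain hashes with name, start, end and
# strand keys. They stand in for Bio::DB::GFF features; this is not that API.

# Strand-aware overlap test, modeled on the 'strong' strand test of
# Bio::RangeI::overlaps: both strands must be equal and non-zero.
sub overlaps_strong {
    my ($x, $y) = @_;
    return 0 unless defined $x->{strand} && defined $y->{strand};
    return 0 unless $x->{strand} == $y->{strand} && $x->{strand} != 0;
    return ($x->{start} <= $y->{end} && $y->{start} <= $x->{end}) ? 1 : 0;
}

# Split one group's subparts among its base features. With a single base,
# every subpart is attached; with several bases, each subpart is attached
# only to the base(s) it overlaps. Assumes @$bases is non-empty.
sub split_group {
    my ($bases, $subparts) = @_;
    my $multiple = @$bases > 1;
    my @result;
    for my $base (@$bases) {
        my @parts = $multiple
            ? grep { overlaps_strong($_, $base) } @$subparts
            : @$subparts;
        # Shallow copy of the base, so the caller's records are left untouched.
        my %agg = (%$base, subparts => [@parts]);
        # Analogue of adjust_bounds: widen the range to cover every part.
        for my $p (@parts) {
            $agg{start} = $p->{start} if $p->{start} < $agg{start};
            $agg{end}   = $p->{end}   if $p->{end}   > $agg{end};
        }
        push @result, \%agg;
    }
    return @result;
}

# One EST-like query hitting two distinct loci on the same strand.
my @bases = (
    { name => 'hitA', start => 100,  end => 500,  strand => 1 },
    { name => 'hitB', start => 9000, end => 9600, strand => 1 },
);
my @parts = (
    { name => 'hsp1', start => 120,  end => 200,  strand => 1 },
    { name => 'hsp2', start => 300,  end => 480,  strand => 1 },
    { name => 'hsp3', start => 9100, end => 9550, strand => 1 },
);

for my $agg (split_group(\@bases, \@parts)) {
    printf "%s: %d..%d with %s\n", $agg->{name}, $agg->{start}, $agg->{end},
        join(',', map { $_->{name} } @{ $agg->{subparts} });
}
```

This prints `hitA: 100..500 with hsp1,hsp2` followed by `hitB: 9000..9600 with hsp3`. Under this sketch, with several bases a subpart that overlaps none of them ends up in no aggregate, and one that overlaps two bases is attached to both; whether either outcome is acceptable depends on the data being loaded.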

