[Biojava-dev] Proposal to change FeatureFilter implementation

Thomas Down thomas at derkholm.net
Wed Jul 9 13:36:41 EDT 2003


Hi David,

I've been reading through this proposal, but I'm not 100% clear
about what the advantages of this approach would be.  It might
be clearer if you gave some ideas about what the SQLSupport
interface will look like.  My concern is that if this system
is going to work well with multiple different databases
(BioSQL, Ensembl, Gadfly, probably some others), either the
SQLSupport interface will have to be something very complex
and semantically rich, or it's going to have to effectively
special-case all the supported FeatureFilter types.  In the
latter case, I'd prefer to see FeatureFilter -> SQL compilers
maintained separately from the basic, very lightweight, FeatureFilter 
objects.

How about a scheme like:

   import java.util.Map;
   import org.biojava.bio.seq.FeatureFilter;

   public interface FilterCompiler {
       public SQLQuery compileFilter(FeatureFilter ff);
   }

   public interface SQLQuery {
      public Map<String,String> getTables(); // map of table aliases
                                             // to table names, so
                                             // that a given table can
                                             // be used more than once
      public String getWhereClause();

      /**
       * Get a FeatureFilter which should be applied to all
       * features produced by this Query.  This provides any
       * extra constraints which couldn't be compiled to SQL
       */

      public FeatureFilter getResidualFilter();
   } 

(not entirely thought out, but you get the idea).
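To make the compiler idea a little more concrete, here is a minimal sketch of a FilterCompiler for a single feature table. Feature, FeatureFilter and FeatureFilter.ALL are cut-down stand-ins for the BioJava types, and SimpleQuery, SimpleFilterCompiler, toSql() and the seq_feature/type schema (borrowed from the Gadfly example quoted below) are hypothetical names chosen for illustration. The tables map is keyed by alias (alias to table name), since that is what lets the same table appear more than once.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical stand-ins for BioJava's Feature and FeatureFilter, cut down
// to what this sketch needs so that it compiles on its own.
interface Feature {
    String getType();
}

interface FeatureFilter {
    boolean accept(Feature f);

    /** Accepts every feature; used here to mean "no residual filtering". */
    FeatureFilter ALL = f -> true;

    final class ByType implements FeatureFilter {
        final String type;
        ByType(String type) { this.type = type; }
        public boolean accept(Feature f) { return type.equals(f.getType()); }
    }
}

interface SQLQuery {
    Map<String, String> getTables();   // alias -> table name
    String getWhereClause();
    FeatureFilter getResidualFilter(); // constraints left for accept()
}

interface FilterCompiler {
    SQLQuery compileFilter(FeatureFilter ff);
}

/** A simple immutable SQLQuery. */
final class SimpleQuery implements SQLQuery {
    private final Map<String, String> tables;
    private final String where;
    private final FeatureFilter residual;

    SimpleQuery(Map<String, String> tables, String where, FeatureFilter residual) {
        this.tables = tables;
        this.where = where;
        this.residual = residual;
    }

    public Map<String, String> getTables() { return tables; }
    public String getWhereClause() { return where; }
    public FeatureFilter getResidualFilter() { return residual; }
}

/** Hypothetical compiler for one table, "seq_feature", aliased as "f". */
final class SimpleFilterCompiler implements FilterCompiler {
    public SQLQuery compileFilter(FeatureFilter ff) {
        Map<String, String> tables = new LinkedHashMap<>();
        tables.put("f", "seq_feature");
        if (ff instanceof FeatureFilter.ByType) {
            // Double any single quotes; real code should prefer bound parameters.
            String type = ((FeatureFilter.ByType) ff).type.replace("'", "''");
            return new SimpleQuery(tables, "f.type = '" + type + "'", FeatureFilter.ALL);
        }
        // Unknown filter type: fetch every feature and let the residual
        // filter (the original filter) do all the work in Java.
        return new SimpleQuery(tables, "1 = 1", ff);
    }

    /** Assembles "SELECT alias.* FROM table alias, ... WHERE ...". */
    static String toSql(SQLQuery q, String selectAlias) {
        StringJoiner from = new StringJoiner(", ");
        for (Map.Entry<String, String> e : q.getTables().entrySet()) {
            from.add(e.getValue() + " " + e.getKey());
        }
        return "SELECT " + selectAlias + ".* FROM " + from + " WHERE " + q.getWhereClause();
    }
}
```

Compiling a ByType("transcript") filter gives `SELECT f.* FROM seq_feature f WHERE f.type = 'transcript'` with no residual; any filter the compiler doesn't recognise degrades to a full scan plus accept(), so the result is always correct, just slower.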

In cases where it is possible to share logic between several
different database back ends (AND and OR operators are an obvious
possibility), this could be provided as an AbstractFilterCompiler
or something.
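Sharing the AND/OR logic might look roughly like the sketch below. AbstractFilterCompiler, compileLeaf(), matchAll(), FeatureTableCompiler and the stand-in And/Or classes are hypothetical names, not existing BioJava API. AND is easy because the residual filters simply combine. OR is only exact when neither side needs a residual, so otherwise the sketch keeps the SQL OR as a coarse pre-filter and re-checks the whole OR filter in Java.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-ins for BioJava's Feature and FeatureFilter.
interface Feature { String getType(); }

interface FeatureFilter {
    boolean accept(Feature f);
    FeatureFilter ALL = f -> true; // means "no residual filtering"
    final class ByType implements FeatureFilter {
        final String type;
        ByType(String type) { this.type = type; }
        public boolean accept(Feature f) { return type.equals(f.getType()); }
    }
    final class And implements FeatureFilter {
        final FeatureFilter a, b;
        And(FeatureFilter a, FeatureFilter b) { this.a = a; this.b = b; }
        public boolean accept(Feature f) { return a.accept(f) && b.accept(f); }
    }
    final class Or implements FeatureFilter {
        final FeatureFilter a, b;
        Or(FeatureFilter a, FeatureFilter b) { this.a = a; this.b = b; }
        public boolean accept(Feature f) { return a.accept(f) || b.accept(f); }
    }
}

interface SQLQuery {
    Map<String, String> getTables(); // alias -> table name
    String getWhereClause();
    FeatureFilter getResidualFilter();
}
interface FilterCompiler { SQLQuery compileFilter(FeatureFilter ff); }

final class SimpleQuery implements SQLQuery {
    private final Map<String, String> tables;
    private final String where;
    private final FeatureFilter residual;
    SimpleQuery(Map<String, String> tables, String where, FeatureFilter residual) {
        this.tables = tables; this.where = where; this.residual = residual;
    }
    public Map<String, String> getTables() { return tables; }
    public String getWhereClause() { return where; }
    public FeatureFilter getResidualFilter() { return residual; }
}

/** Shared AND/OR logic; subclasses only compile leaf filters. */
abstract class AbstractFilterCompiler implements FilterCompiler {
    /** Compiles a leaf filter, or returns null if this back end cannot. */
    protected abstract SQLQuery compileLeaf(FeatureFilter ff);
    /** A query matching every feature, used for unsupported leaves. */
    protected abstract SQLQuery matchAll();

    public final SQLQuery compileFilter(FeatureFilter ff) {
        if (ff instanceof FeatureFilter.And) {
            FeatureFilter.And and = (FeatureFilter.And) ff;
            SQLQuery l = compileFilter(and.a), r = compileFilter(and.b);
            // (sqlA AND resA) AND (sqlB AND resB): the residuals just AND together.
            return combine(l, "AND", r, andResiduals(l.getResidualFilter(), r.getResidualFilter()));
        }
        if (ff instanceof FeatureFilter.Or) {
            FeatureFilter.Or or = (FeatureFilter.Or) ff;
            SQLQuery l = compileFilter(or.a), r = compileFilter(or.b);
            // SQL OR returns a superset of the matches; unless both sides
            // are exact, re-check the whole OR filter in Java.
            boolean exact = l.getResidualFilter() == FeatureFilter.ALL
                    && r.getResidualFilter() == FeatureFilter.ALL;
            return combine(l, "OR", r, exact ? FeatureFilter.ALL : ff);
        }
        SQLQuery leaf = compileLeaf(ff);
        if (leaf != null) return leaf;
        SQLQuery all = matchAll(); // unsupported leaf: fetch all, filter in Java
        return new SimpleQuery(all.getTables(), all.getWhereClause(), ff);
    }

    // Merging by alias assumes equal aliases mean the same row (one table here).
    private static SQLQuery combine(SQLQuery l, String op, SQLQuery r, FeatureFilter residual) {
        Map<String, String> tables = new LinkedHashMap<>(l.getTables());
        tables.putAll(r.getTables());
        String where = "(" + l.getWhereClause() + ") " + op + " (" + r.getWhereClause() + ")";
        return new SimpleQuery(tables, where, residual);
    }

    private static FeatureFilter andResiduals(FeatureFilter a, FeatureFilter b) {
        if (a == FeatureFilter.ALL) return b;
        if (b == FeatureFilter.ALL) return a;
        return new FeatureFilter.And(a, b);
    }
}

/** Hypothetical back end: one "seq_feature" table aliased as "f". */
final class FeatureTableCompiler extends AbstractFilterCompiler {
    protected SQLQuery compileLeaf(FeatureFilter ff) {
        if (!(ff instanceof FeatureFilter.ByType)) return null;
        String type = ((FeatureFilter.ByType) ff).type.replace("'", "''");
        return new SimpleQuery(table(), "f.type = '" + type + "'", FeatureFilter.ALL);
    }
    protected SQLQuery matchAll() { return new SimpleQuery(table(), "1 = 1", FeatureFilter.ALL); }
    private static Map<String, String> table() { return Collections.singletonMap("f", "seq_feature"); }
}
```

For example, ByType("exon") OR ByType("CDS") compiles to `(f.type = 'exon') OR (f.type = 'CDS')` with no residual, while ANDing ByType("exon") with an arbitrary filter yields `(f.type = 'exon') AND (1 = 1)` and leaves that filter as the residual.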

I wrote a FeatureFilter -> SQL compiler which sort of followed
this pattern (although without nice interfaces or anything)
for running FeatureFilters on whole BioSQLSequenceDBs.  It was
a bit limited, but did work well for many interesting cases.
Presumably it's still around somewhere.



      Thomas.


Once upon a time, David Huen wrote:
> 
> I wish to propose a change to FeatureFilter to better support SQL-based 
> SequenceDB implementations.
> 
> The proposal is to extend FeatureFilter with an interface SQLFeatureFilter.
> interface SQLFeatureFilter extends FeatureFilter
> {
>     public interface Query
>     {
>         public Set getColumns();
>         public Set getTables();
>         public String getWhereClause();
>     }
> 
>     public Query getQuery(SQLSupport support) 
>         throws UnsupportedOperationException;
> }
> 
> SQLSupport is adaptor code used by the SQLFeatureFilter to generate SQL 
> Query objects.
> 
> The next thing is to modify the existing FeatureFilters to transparently 
> support this interface.
> 
> For the current (non-SQL) implementations, this proposal would change nothing.
> 
> With an SQL-backed implementation, the implementation could pass a 
> SQLSupport object to the getQuery method that would be used to generate the 
> SQL statement.
> 
> For example, in Gadfly, FeatureFilter.ByType("transcript") would return a 
> Query object that yielded {"seq_feature"} from getTables() and 
> "seq_feature.type='transcript'" from getWhereClause().
> 
> The FeatureFilter logical operator classes would just join the results of 
> the getWhereClause() calls on their child filters and add the tables used 
> by them to the Set returned by getTables().
> 
> The final SQL statement can then be constructed from the Query object.
> 
> Advantages
> 1) transparent to existing code
> 2) allows existing FeatureFilter optimisers to work on the FeatureFilters 
> transparently
> 3) allows generation of SQL to directly filter on the tables in a manner 
> suitable to that particular implementation
> 4) where appropriate SQL cannot be generated for that particular 
> FeatureFilter combination, the SQL implementation will get the 
> UnsupportedOperationException and can revert to the conventional accept() 
> method.
> 5) it could allow quite considerable speedups in SQL implementations
> 
> Does anyone have comments? objections? suggestions as to better ways of 
> doing it (like adding a getQueryString(SQLSupport support) method that when 
> called generates the final string too)?
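For comparison, the quoted SQLFeatureFilter proposal can be sketched along the same lines. The proposal leaves SQLSupport open, so its methods (featureTable(), idColumn(), typeColumn()), GadflySupport, the "id" column, SimpleQuery and SqlFilters.toSql() are all assumptions; Feature and FeatureFilter are stand-ins for the BioJava types. An Or whose child is not an SQLFeatureFilter throws UnsupportedOperationException, and toSql() then returns null so the caller can revert to accept().

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-ins for BioJava's Feature and FeatureFilter.
interface Feature { String getType(); }
interface FeatureFilter { boolean accept(Feature f); }

// Hypothetical adaptor: the proposal does not specify SQLSupport's methods.
interface SQLSupport {
    String featureTable(); // "seq_feature" in the Gadfly example
    String idColumn();     // assumed primary-key column
    String typeColumn();   // "type" in the Gadfly example
}

interface SQLFeatureFilter extends FeatureFilter {
    interface Query {
        Set<String> getColumns();
        Set<String> getTables();
        String getWhereClause();
    }
    Query getQuery(SQLSupport support) throws UnsupportedOperationException;
}

final class SimpleQuery implements SQLFeatureFilter.Query {
    private final Set<String> columns, tables;
    private final String where;
    SimpleQuery(Set<String> columns, Set<String> tables, String where) {
        this.columns = columns; this.tables = tables; this.where = where;
    }
    public Set<String> getColumns() { return columns; }
    public Set<String> getTables() { return tables; }
    public String getWhereClause() { return where; }
}

final class ByType implements SQLFeatureFilter {
    private final String type;
    ByType(String type) { this.type = type; }
    public boolean accept(Feature f) { return type.equals(f.getType()); }
    public Query getQuery(SQLSupport s) {
        String t = s.featureTable();
        return new SimpleQuery(Collections.singleton(t + "." + s.idColumn()),
                Collections.singleton(t),
                t + "." + s.typeColumn() + "='" + type.replace("'", "''") + "'");
    }
}

/** OR: joins the children's WHERE clauses and unions their columns and tables. */
final class Or implements SQLFeatureFilter {
    private final FeatureFilter a, b;
    Or(FeatureFilter a, FeatureFilter b) { this.a = a; this.b = b; }
    public boolean accept(Feature f) { return a.accept(f) || b.accept(f); }
    public Query getQuery(SQLSupport s) {
        if (!(a instanceof SQLFeatureFilter) || !(b instanceof SQLFeatureFilter)) {
            throw new UnsupportedOperationException("child filter has no SQL form");
        }
        Query qa = ((SQLFeatureFilter) a).getQuery(s);
        Query qb = ((SQLFeatureFilter) b).getQuery(s);
        Set<String> cols = new LinkedHashSet<>(qa.getColumns());
        cols.addAll(qb.getColumns());
        Set<String> tables = new LinkedHashSet<>(qa.getTables());
        tables.addAll(qb.getTables());
        return new SimpleQuery(cols, tables,
                "(" + qa.getWhereClause() + ") OR (" + qb.getWhereClause() + ")");
    }
}

/** Hypothetical Gadfly adaptor; the "id" column name is an assumption. */
final class GadflySupport implements SQLSupport {
    public String featureTable() { return "seq_feature"; }
    public String idColumn() { return "id"; }
    public String typeColumn() { return "type"; }
}

final class SqlFilters {
    /** Returns SQL for ff, or null when the caller must fall back to accept(). */
    static String toSql(FeatureFilter ff, SQLSupport s) {
        if (!(ff instanceof SQLFeatureFilter)) return null;
        try {
            SQLFeatureFilter.Query q = ((SQLFeatureFilter) ff).getQuery(s);
            return "SELECT " + String.join(", ", q.getColumns())
                    + " FROM " + String.join(", ", q.getTables())
                    + " WHERE " + q.getWhereClause();
        } catch (UnsupportedOperationException e) {
            return null;
        }
    }
}
```

With GadflySupport, ByType("transcript") produces `SELECT seq_feature.id FROM seq_feature WHERE seq_feature.type='transcript'`. The plain Set union of table names only works while every leaf refers to the same feature row; once a filter needs the same table twice, some alias scheme becomes necessary.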

