[Bioperl-l] Bio::DB::SeqFeature::Store::Pg index on "name": why varchar_pattern_ops?

Robert Buels rmb32 at cornell.edu
Wed Apr 7 20:23:14 EDT 2010


What's the thinking behind the BDBSFS::Pg (ha) index on the "name" table 
at Pg.pm line 282:

     CREATE INDEX name_name_varchar_patt_ops_idx ON name USING BTREE
     (lower(name) varchar_pattern_ops);

Why the varchar_pattern_ops operator class?

I'm testing an installation (backing gbrowse, of course) on Pg 8.3, and 
performance was really stinking on the SFS query:

     SELECT f.id,f.object,f.typeid,f.seqid,f.start,f.end,f.strand
     FROM feature as f, name as n
     WHERE (n.id=f.id AND lower(n.name) = lower($1) AND n.display_name>0)

so I explained it, giving

                                       QUERY PLAN 

--------------------------------------------------------------------------------------
  Nested Loop  (cost=0.00..130159.60 rows=11059 width=556) 

    ->  Seq Scan on name n  (cost=0.00..53489.41 rows=11059 width=4) 

          Filter: ((display_name > 0) AND (lower((name)::text) = 
'foo'::text))
    ->  Index Scan using feature_pkey on feature f  (cost=0.00..6.92 
rows=1 width=556)
          Index Cond: (f.id = n.id)


Seq scan, not good.  It's not using the name_name_varchar_patt_ops_idx. 
  So I added an index on just lower(name) without the 
varchar_pattern_ops business (create index lowername on name ( 
lower(name) )) and performance started getting fine again, with explain:


                                       QUERY PLAN
------------------------------------------------------------------------------------
  Nested Loop  (cost=640.81..175775.44 rows=20521 width=501)
    ->  Bitmap Heap Scan on name n  (cost=640.81..31557.42 rows=20521 
width=4)
          Recheck Cond: (lower((name)::text) = 'foo'::text)
          Filter: (display_name > 0)
          ->  Bitmap Index Scan on lowername  (cost=0.00..635.68 
rows=20597 width=0)
                Index Cond: (lower((name)::text) = 'foo'::text)
    ->  Index Scan using feature_pkey on feature f  (cost=0.00..7.02 
rows=1 width=501
          Index Cond: (f.id = n.id)
(8 rows)


This has much better performance in practice, since it's not 
sequentially scanning the 4.1M row "name" table.

This suggests to me that the varchar_pattern_ops class should be 
removed, but could one of the original BDBSFS::Pg authors (like Lincoln 
or Scott) weigh in on this?

Rob







More information about the Bioperl-l mailing list