[Bioperl-l] Bio::DB::SeqFeature::Store::Pg index on "name": why varchar_pattern_ops?
Robert Buels
rmb32 at cornell.edu
Thu Apr 8 00:23:14 UTC 2010
What's the thinking behind the BDBSFS::Pg (ha) index on the "name" table
at Pg.pm line 282:
CREATE INDEX name_name_varchar_patt_ops_idx ON name USING BTREE
(lower(name) varchar_pattern_ops);
Why the varchar_pattern_ops operator class?
I'm testing an installation (backing gbrowse, of course) on Pg 8.3, and
performance was really stinking on the SFS query:
SELECT f.id,f.object,f.typeid,f.seqid,f.start,f.end,f.strand
FROM feature as f, name as n
WHERE (n.id=f.id AND lower(n.name) = lower($1) AND n.display_name>0)
so I explained it, giving
QUERY PLAN
--------------------------------------------------------------------------------------
Nested Loop (cost=0.00..130159.60 rows=11059 width=556)
-> Seq Scan on name n (cost=0.00..53489.41 rows=11059 width=4)
Filter: ((display_name > 0) AND (lower((name)::text) =
'foo'::text))
-> Index Scan using feature_pkey on feature f (cost=0.00..6.92
rows=1 width=556)
Index Cond: (f.id = n.id)
Seq scan, not good. It's not using the name_name_varchar_patt_ops_idx.
So I added an index on just lower(name) without the
varchar_pattern_ops business (create index lowername on name (
lower(name) )) and performance started getting fine again, with explain:
QUERY PLAN
------------------------------------------------------------------------------------
Nested Loop (cost=640.81..175775.44 rows=20521 width=501)
-> Bitmap Heap Scan on name n (cost=640.81..31557.42 rows=20521
width=4)
Recheck Cond: (lower((name)::text) = 'foo'::text)
Filter: (display_name > 0)
-> Bitmap Index Scan on lowername (cost=0.00..635.68
rows=20597 width=0)
Index Cond: (lower((name)::text) = 'foo'::text)
-> Index Scan using feature_pkey on feature f (cost=0.00..7.02
rows=1 width=501
Index Cond: (f.id = n.id)
(8 rows)
This has much better performance in practice, since it's not
sequentially scanning the 4.1M row "name" table.
This suggests to me that the varchar_pattern_ops class should be
removed, but could one of the original BDBSFS::Pg authors (like Lincoln
or Scott) weigh in on this?
Rob
More information about the Bioperl-l
mailing list