[Bioperl-l] Bio::DB::SeqFeature::Store::Pg index on "name": why varchar_pattern_ops?
Scott Cain
scott at scottcain.net
Thu Apr 8 15:40:47 EDT 2010
>From what I've read, if you have both vanilla and varch_pattern_ops,
the query planner will decide (presumably correctly) which index to
use.
Scott
On Thu, Apr 8, 2010 at 3:05 PM, Robert Buels <rmb32 at cornell.edu> wrote:
> Hmm, well, the vanilla index seems necessary for my pg 8.3.9/ubuntu
> hardy/utf8 locales/UTF8 encoding installation. If both a
> varchar_pattern_ops and a vanilla index are present, will installations that
> need the varchar_pattern_ops one use it? Randall, do you have an
> installation you can test that on?
>
> Rob
>
>
> Scott Cain wrote:
>>
>> Hi Rob,
>>
>> I was a little concerned that this might happen. This change was
>> suggested by Randall on the GBrowse mailing list a few weeks ago, and
>> after some testing on my machine, it seemed to work well. The problem
>> was that wild card searches on the name column were seqscanning, and
>> the suggested index fixed the problem. I wonder if the query planner
>> in postgres would be smart enough to use the appropriate index if we
>> created both one with and one without varchar_pattern_ops.
>>
>> The reason for using varchar_pattern_ops specifically was to allow for
>> searching in a non-C locale. Another option would be to make the
>> creation of the index conditional when the database is created. It
>> could use the standard C locale setting and if a flag is set, create
>> the index to allow non-C locale searches (and presumably, the default
>> C locale would be the right thing for most users).
>>
>> Scott
>>
>>
>> On Wed, Apr 7, 2010 at 8:23 PM, Robert Buels <rmb32 at cornell.edu> wrote:
>>>
>>> What's the thinking behind the BDBSFS::Pg (ha) index on the "name" table
>>> at
>>> Pg.pm line 282:
>>>
>>> CREATE INDEX name_name_varchar_patt_ops_idx ON name USING BTREE
>>> (lower(name) varchar_pattern_ops);
>>>
>>> Why the varchar_pattern_ops operator class?
>>>
>>> I'm testing an installation (backing gbrowse, of course) on Pg 8.3, and
>>> performance was really stinking on the SFS query:
>>>
>>> SELECT f.id,f.object,f.typeid,f.seqid,f.start,f.end,f.strand
>>> FROM feature as f, name as n
>>> WHERE (n.id=f.id AND lower(n.name) = lower($1) AND n.display_name>0)
>>>
>>> so I explained it, giving
>>>
>>> QUERY PLAN
>>>
>>> --------------------------------------------------------------------------------------
>>> Nested Loop (cost=0.00..130159.60 rows=11059 width=556)
>>> -> Seq Scan on name n (cost=0.00..53489.41 rows=11059 width=4)
>>> Filter: ((display_name > 0) AND (lower((name)::text) =
>>> 'foo'::text))
>>> -> Index Scan using feature_pkey on feature f (cost=0.00..6.92 rows=1
>>> width=556)
>>> Index Cond: (f.id = n.id)
>>>
>>>
>>> Seq scan, not good. It's not using the name_name_varchar_patt_ops_idx.
>>> So
>>> I added an index on just lower(name) without the varchar_pattern_ops
>>> business (create index lowername on name ( lower(name) )) and performance
>>> started getting fine again, with explain:
>>>
>>>
>>> QUERY PLAN
>>>
>>> ------------------------------------------------------------------------------------
>>> Nested Loop (cost=640.81..175775.44 rows=20521 width=501)
>>> -> Bitmap Heap Scan on name n (cost=640.81..31557.42 rows=20521
>>> width=4)
>>> Recheck Cond: (lower((name)::text) = 'foo'::text)
>>> Filter: (display_name > 0)
>>> -> Bitmap Index Scan on lowername (cost=0.00..635.68 rows=20597
>>> width=0)
>>> Index Cond: (lower((name)::text) = 'foo'::text)
>>> -> Index Scan using feature_pkey on feature f (cost=0.00..7.02 rows=1
>>> width=501
>>> Index Cond: (f.id = n.id)
>>> (8 rows)
>>>
>>>
>>> This has much better performance in practice, since it's not sequentially
>>> scanning the 4.1M row "name" table.
>>>
>>> This suggests to me that the varchar_pattern_ops class should be removed,
>>> but could one of the original BDBSFS::Pg authors (like Lincoln or Scott)
>>> weigh in on this?
>>>
>>> Rob
>>>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
>>
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
--
------------------------------------------------------------------------
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research
More information about the Bioperl-l
mailing list