How to optimize IN query on indexed columnOptimizing ORDER BY in a full text search queryPostgreSQL 9.2...

Can GPL and BSD licensed applications be used for government work?

How do I run a game when my PCs have different approaches to combat?

Is the apartment I want to rent a scam?

Sitecore Powershell extensions module compatibility with Sitecore 9.2

How can I prevent corporations from growing their own workforce?

Who has jurisdiction for a crime committed in an embassy?

What was the rationale behind 36 bit computer architectures?

Why did NASA use U.S customary units?

What exactly makes a General Products hull nearly indestructible?

How can I make sure my players' decisions have consequences?

Where to place an artificial gland in the human body?

What is the purpose of the fuel shutoff valve?

High income, sudden windfall

Extrapolation v. Interpolation

Why do people say "I am broke" instead of "I am broken"?

How can I create a shape in Illustrator which follows a path in descending order size?

How can I receive packages while in France?

Strange Cron Job takes up 100% of CPU Ubuntu 18 LTS Server

Can I paint a load center cover?

Inverse Colombian Function

Why did modems have speakers?

Problem loading expl3 in plain TeX

Sextortion with actual password not found in leaks

Can two figures have the same area, perimeter, and same number of segments have different shape?

How to optimize IN query on indexed column

Optimizing ORDER BY in a full text search queryPostgreSQL 9.2 (PostGIS) performance problemHow to index WHERE (start_date >= '2013-12-15')How can I speed up a Postgres query containing lots of Joins with an ILIKE conditionpostgres explain plan with giant gaps between operationsSlow fulltext search due to wildly inaccurate row estimatesIndex for numeric field is not usedpostgresql 9.2 hash join issueSorting killing my postgresql queryWhy is this query with WHERE, ORDER BY and LIMIT so slow?

.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ margin-bottom:0;
}

I have a table with over 50M records. One of the fields is COLOR_CODE. I've set an index on the column COLOR_CODE like this:

"mytable_colorcode_idx" btree (color_code)

I'm noticing that when I run the query below, the execution time is higher

SELECT count(total_amount) FROM mytable 

WHERE color_code in ('red','green') and sale_date = '1970'

However, the execution time is faster with OR clause:

SELECT count(total_amount) FROM mytable 

WHERE color_code = 'red' or color_code = 'green' and sale_date = '1970'

Query plan for IN

explain analyze SELECT count(total_amount) FROM mytable 

WHERE color_code in ('red','green') and sale_date = '1970'

                                                                            QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------

 Aggregate  (cost=2074238.07..2074238.08 rows=1 width=8) (actual time=63520.150..63520.150 rows=1 loops=1)

   ->  Bitmap Heap Scan on mytable  (cost=53504.73..2069923.27 rows=1725919 width=6) (actual time=3509.920..63080.519 rows=1727037 loops=1)

         Recheck Cond: ((color_code)::text = ANY ('{red,green}'::text[]))

         Rows Removed by Index Recheck: 5067635

         Filter: (sale_date = 1970)

         Heap Blocks: exact=38679 lossy=496680

         ->  Bitmap Index Scan on mytable_colorcode_idx  (cost=0.00..53073.26 rows=1725919 width=0) (actual time=3501.777..3501.777 rows=1727037 loops=1)

               Index Cond: ((color_code)::text = ANY ('{red,green}'::text[]))

 Planning time: 0.165 ms

 Execution time: 63524.100 ms

(10 rows)

Query plan for OR

explain analyze SELECT count(total_amount) FROM mytable 

    WHERE color_code = 'red' or color_code = 'green' and sale_date = '1970'



    QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------

 Aggregate  (cost=2081265.36..2081265.37 rows=1 width=8) (actual time=18895.998..18895.998 rows=1 loops=1)

   ->  Bitmap Heap Scan on mytable  (cost=56223.06..2076956.39 rows=1723588 width=6) (actual time=161.335..18468.146 rows=1727037 loops=1)

         Recheck Cond: (((color_code)::text = 'red'::text) OR ((color_code)::text = 'green'::text))

         Rows Removed by Index Recheck: 5067635

         Filter: (((color_code)::text = 'red'::text) OR (((color_code)::text = 'green'::text) AND (sale_date = 1970)))

         Heap Blocks: exact=38679 lossy=496680

         ->  BitmapOr  (cost=56223.06..56223.06 rows=1725919 width=0) (actual time=153.683..153.684 rows=0 loops=1)

               ->  Bitmap Index Scan on mytable_colorcode_idx  (cost=0.00..663.35 rows=20655 width=0) (actual time=3.935..3.935 rows=26768 loops=1)

                     Index Cond: ((color_code)::text = 'red'::text)

               ->  Bitmap Index Scan on mytable_colorcode_idx  (cost=0.00..54697.91 rows=1705264 width=0) (actual time=149.745..149.746 rows=1700269 loops=1)

                     Index Cond: ((color_code)::text = 'green'::text)

 Planning time: 0.162 ms

 Execution time: 18896.785 ms

(13 rows)

Update

If I add an index (color_code, total_count, and sale_date) I notice that no index is used at all. Rather it does a partial scan.

"mytable_color_total_count_sale_Date_idx" btree (color_code, total_count, sale_date)  







                                                                      QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------------

 Finalize Aggregate  (cost=2099755.26..2099755.27 rows=1 width=8) (actual time=97066.585..97066.586 rows=1 loops=1)

   ->  Gather  (cost=2099755.04..2099755.25 rows=2 width=8) (actual time=97063.512..97069.838 rows=3 loops=1)

         Workers Planned: 2

         Workers Launched: 2

         ->  Partial Aggregate  (cost=2098755.04..2098755.05 rows=1 width=8) (actual time=97061.531..97061.532 rows=1 loops=3)

               ->  Parallel Seq Scan on mytable  (cost=0.00..2096119.69 rows=1054140 width=6) (actual time=27782.491..96730.232 rows=841604 loops=3)

                     Filter: ((sale_date = 1970) AND ((color_code)::text = ANY ('{red,green}'::text[])))

                     Rows Removed by Filter: 4196103

 Planning time: 0.161 ms

 Execution time: 97069.896 ms

(10 rows)

Question

Is there a way I can optimize by IN clause query other than turning it into an OR clause?

edited 8 hours ago

asked 9 hours ago

Anthony

1303 bronze badges

Sorry, I assumed it wasn't using the index since I didn't see the additional row and the execution time was higher. I've modified the question to ask how the IN query can be optimized rather than missing index. Thanks

– Anthony
9 hours ago

Is this fully repeatable? It looks to me like the second query is faster simply because the first query already read all the data from disk into memory, so the 2nd one didn't need to. Swap back and forth between the two queries repeatedly and see how the timings compare. Also, it would be good to do explain (analyze, buffers), especially if you can turn track_io_timing on first.

– jjanes
9 hours ago

@Lennart Can you please explain further. Not sure I understand.

– Anthony
8 hours ago

@Anthony, I've added a little longer explanation as an answer

– Lennart
7 hours ago

It is unlikely that adding that new index would directly cause a different index to stop being used. But if you look at the cost estimate of the first and last executions plans, they are almost identical to each other. Since the statistics collector uses random sampling, small differences in the sample use could cause different plans to come out on top when the estimated costs are so similar.

– jjanes
5 hours ago

add a comment |

I have a table with over 50M records. One of the fields is COLOR_CODE. I've set an index on the column COLOR_CODE like this:

"mytable_colorcode_idx" btree (color_code)

I'm noticing that when I run the query below, the execution time is higher

SELECT count(total_amount) FROM mytable 

WHERE color_code in ('red','green') and sale_date = '1970'

However, the execution time is faster with OR clause:

SELECT count(total_amount) FROM mytable 

WHERE color_code = 'red' or color_code = 'green' and sale_date = '1970'

Query plan for IN

explain analyze SELECT count(total_amount) FROM mytable 

WHERE color_code in ('red','green') and sale_date = '1970'

                                                                            QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------

 Aggregate  (cost=2074238.07..2074238.08 rows=1 width=8) (actual time=63520.150..63520.150 rows=1 loops=1)

   ->  Bitmap Heap Scan on mytable  (cost=53504.73..2069923.27 rows=1725919 width=6) (actual time=3509.920..63080.519 rows=1727037 loops=1)

         Recheck Cond: ((color_code)::text = ANY ('{red,green}'::text[]))

         Rows Removed by Index Recheck: 5067635

         Filter: (sale_date = 1970)

         Heap Blocks: exact=38679 lossy=496680

         ->  Bitmap Index Scan on mytable_colorcode_idx  (cost=0.00..53073.26 rows=1725919 width=0) (actual time=3501.777..3501.777 rows=1727037 loops=1)

               Index Cond: ((color_code)::text = ANY ('{red,green}'::text[]))

 Planning time: 0.165 ms

 Execution time: 63524.100 ms

(10 rows)

Query plan for OR

explain analyze SELECT count(total_amount) FROM mytable 

    WHERE color_code = 'red' or color_code = 'green' and sale_date = '1970'



    QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------

 Aggregate  (cost=2081265.36..2081265.37 rows=1 width=8) (actual time=18895.998..18895.998 rows=1 loops=1)

   ->  Bitmap Heap Scan on mytable  (cost=56223.06..2076956.39 rows=1723588 width=6) (actual time=161.335..18468.146 rows=1727037 loops=1)

         Recheck Cond: (((color_code)::text = 'red'::text) OR ((color_code)::text = 'green'::text))

         Rows Removed by Index Recheck: 5067635

         Filter: (((color_code)::text = 'red'::text) OR (((color_code)::text = 'green'::text) AND (sale_date = 1970)))

         Heap Blocks: exact=38679 lossy=496680

         ->  BitmapOr  (cost=56223.06..56223.06 rows=1725919 width=0) (actual time=153.683..153.684 rows=0 loops=1)

               ->  Bitmap Index Scan on mytable_colorcode_idx  (cost=0.00..663.35 rows=20655 width=0) (actual time=3.935..3.935 rows=26768 loops=1)

                     Index Cond: ((color_code)::text = 'red'::text)

               ->  Bitmap Index Scan on mytable_colorcode_idx  (cost=0.00..54697.91 rows=1705264 width=0) (actual time=149.745..149.746 rows=1700269 loops=1)

                     Index Cond: ((color_code)::text = 'green'::text)

 Planning time: 0.162 ms

 Execution time: 18896.785 ms

(13 rows)

Update

If I add an index (color_code, total_count, and sale_date) I notice that no index is used at all. Rather it does a partial scan.

"mytable_color_total_count_sale_Date_idx" btree (color_code, total_count, sale_date)  







                                                                      QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------------

 Finalize Aggregate  (cost=2099755.26..2099755.27 rows=1 width=8) (actual time=97066.585..97066.586 rows=1 loops=1)

   ->  Gather  (cost=2099755.04..2099755.25 rows=2 width=8) (actual time=97063.512..97069.838 rows=3 loops=1)

         Workers Planned: 2

         Workers Launched: 2

         ->  Partial Aggregate  (cost=2098755.04..2098755.05 rows=1 width=8) (actual time=97061.531..97061.532 rows=1 loops=3)

               ->  Parallel Seq Scan on mytable  (cost=0.00..2096119.69 rows=1054140 width=6) (actual time=27782.491..96730.232 rows=841604 loops=3)

                     Filter: ((sale_date = 1970) AND ((color_code)::text = ANY ('{red,green}'::text[])))

                     Rows Removed by Filter: 4196103

 Planning time: 0.161 ms

 Execution time: 97069.896 ms

(10 rows)

Question

Is there a way I can optimize by IN clause query other than turning it into an OR clause?

edited 8 hours ago

asked 9 hours ago

Anthony

1303 bronze badges

Sorry, I assumed it wasn't using the index since I didn't see the additional row and the execution time was higher. I've modified the question to ask how the IN query can be optimized rather than missing index. Thanks

– Anthony
9 hours ago

Is this fully repeatable? It looks to me like the second query is faster simply because the first query already read all the data from disk into memory, so the 2nd one didn't need to. Swap back and forth between the two queries repeatedly and see how the timings compare. Also, it would be good to do explain (analyze, buffers), especially if you can turn track_io_timing on first.

– jjanes
9 hours ago

@Lennart Can you please explain further. Not sure I understand.

– Anthony
8 hours ago

@Anthony, I've added a little longer explanation as an answer

– Lennart
7 hours ago

It is unlikely that adding that new index would directly cause a different index to stop being used. But if you look at the cost estimate of the first and last executions plans, they are almost identical to each other. Since the statistics collector uses random sampling, small differences in the sample use could cause different plans to come out on top when the estimated costs are so similar.

– jjanes
5 hours ago

add a comment |

I have a table with over 50M records. One of the fields is COLOR_CODE. I've set an index on the column COLOR_CODE like this:

"mytable_colorcode_idx" btree (color_code)

I'm noticing that when I run the query below, the execution time is higher

SELECT count(total_amount) FROM mytable 

WHERE color_code in ('red','green') and sale_date = '1970'

However, the execution time is faster with OR clause:

SELECT count(total_amount) FROM mytable 

WHERE color_code = 'red' or color_code = 'green' and sale_date = '1970'

Query plan for IN

explain analyze SELECT count(total_amount) FROM mytable 

WHERE color_code in ('red','green') and sale_date = '1970'

                                                                            QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------

 Aggregate  (cost=2074238.07..2074238.08 rows=1 width=8) (actual time=63520.150..63520.150 rows=1 loops=1)

   ->  Bitmap Heap Scan on mytable  (cost=53504.73..2069923.27 rows=1725919 width=6) (actual time=3509.920..63080.519 rows=1727037 loops=1)

         Recheck Cond: ((color_code)::text = ANY ('{red,green}'::text[]))

         Rows Removed by Index Recheck: 5067635

         Filter: (sale_date = 1970)

         Heap Blocks: exact=38679 lossy=496680

         ->  Bitmap Index Scan on mytable_colorcode_idx  (cost=0.00..53073.26 rows=1725919 width=0) (actual time=3501.777..3501.777 rows=1727037 loops=1)

               Index Cond: ((color_code)::text = ANY ('{red,green}'::text[]))

 Planning time: 0.165 ms

 Execution time: 63524.100 ms

(10 rows)

Query plan for OR

explain analyze SELECT count(total_amount) FROM mytable 

    WHERE color_code = 'red' or color_code = 'green' and sale_date = '1970'



    QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------

 Aggregate  (cost=2081265.36..2081265.37 rows=1 width=8) (actual time=18895.998..18895.998 rows=1 loops=1)

   ->  Bitmap Heap Scan on mytable  (cost=56223.06..2076956.39 rows=1723588 width=6) (actual time=161.335..18468.146 rows=1727037 loops=1)

         Recheck Cond: (((color_code)::text = 'red'::text) OR ((color_code)::text = 'green'::text))

         Rows Removed by Index Recheck: 5067635

         Filter: (((color_code)::text = 'red'::text) OR (((color_code)::text = 'green'::text) AND (sale_date = 1970)))

         Heap Blocks: exact=38679 lossy=496680

         ->  BitmapOr  (cost=56223.06..56223.06 rows=1725919 width=0) (actual time=153.683..153.684 rows=0 loops=1)

               ->  Bitmap Index Scan on mytable_colorcode_idx  (cost=0.00..663.35 rows=20655 width=0) (actual time=3.935..3.935 rows=26768 loops=1)

                     Index Cond: ((color_code)::text = 'red'::text)

               ->  Bitmap Index Scan on mytable_colorcode_idx  (cost=0.00..54697.91 rows=1705264 width=0) (actual time=149.745..149.746 rows=1700269 loops=1)

                     Index Cond: ((color_code)::text = 'green'::text)

 Planning time: 0.162 ms

 Execution time: 18896.785 ms

(13 rows)

Update

If I add an index (color_code, total_count, and sale_date) I notice that no index is used at all. Rather it does a partial scan.

"mytable_color_total_count_sale_Date_idx" btree (color_code, total_count, sale_date)  







                                                                      QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------------

 Finalize Aggregate  (cost=2099755.26..2099755.27 rows=1 width=8) (actual time=97066.585..97066.586 rows=1 loops=1)

   ->  Gather  (cost=2099755.04..2099755.25 rows=2 width=8) (actual time=97063.512..97069.838 rows=3 loops=1)

         Workers Planned: 2

         Workers Launched: 2

         ->  Partial Aggregate  (cost=2098755.04..2098755.05 rows=1 width=8) (actual time=97061.531..97061.532 rows=1 loops=3)

               ->  Parallel Seq Scan on mytable  (cost=0.00..2096119.69 rows=1054140 width=6) (actual time=27782.491..96730.232 rows=841604 loops=3)

                     Filter: ((sale_date = 1970) AND ((color_code)::text = ANY ('{red,green}'::text[])))

                     Rows Removed by Filter: 4196103

 Planning time: 0.161 ms

 Execution time: 97069.896 ms

(10 rows)

Question

Is there a way I can optimize by IN clause query other than turning it into an OR clause?

edited 8 hours ago

asked 9 hours ago

Anthony

1303 bronze badges

I have a table with over 50M records. One of the fields is COLOR_CODE. I've set an index on the column COLOR_CODE like this:

"mytable_colorcode_idx" btree (color_code)

I'm noticing that when I run the query below, the execution time is higher

SELECT count(total_amount) FROM mytable 

WHERE color_code in ('red','green') and sale_date = '1970'

However, the execution time is faster with OR clause:

SELECT count(total_amount) FROM mytable 

WHERE color_code = 'red' or color_code = 'green' and sale_date = '1970'

Query plan for IN

explain analyze SELECT count(total_amount) FROM mytable 

WHERE color_code in ('red','green') and sale_date = '1970'

                                                                            QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------

 Aggregate  (cost=2074238.07..2074238.08 rows=1 width=8) (actual time=63520.150..63520.150 rows=1 loops=1)

   ->  Bitmap Heap Scan on mytable  (cost=53504.73..2069923.27 rows=1725919 width=6) (actual time=3509.920..63080.519 rows=1727037 loops=1)

         Recheck Cond: ((color_code)::text = ANY ('{red,green}'::text[]))

         Rows Removed by Index Recheck: 5067635

         Filter: (sale_date = 1970)

         Heap Blocks: exact=38679 lossy=496680

         ->  Bitmap Index Scan on mytable_colorcode_idx  (cost=0.00..53073.26 rows=1725919 width=0) (actual time=3501.777..3501.777 rows=1727037 loops=1)

               Index Cond: ((color_code)::text = ANY ('{red,green}'::text[]))

 Planning time: 0.165 ms

 Execution time: 63524.100 ms

(10 rows)

Query plan for OR

explain analyze SELECT count(total_amount) FROM mytable 

    WHERE color_code = 'red' or color_code = 'green' and sale_date = '1970'



    QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------

 Aggregate  (cost=2081265.36..2081265.37 rows=1 width=8) (actual time=18895.998..18895.998 rows=1 loops=1)

   ->  Bitmap Heap Scan on mytable  (cost=56223.06..2076956.39 rows=1723588 width=6) (actual time=161.335..18468.146 rows=1727037 loops=1)

         Recheck Cond: (((color_code)::text = 'red'::text) OR ((color_code)::text = 'green'::text))

         Rows Removed by Index Recheck: 5067635

         Filter: (((color_code)::text = 'red'::text) OR (((color_code)::text = 'green'::text) AND (sale_date = 1970)))

         Heap Blocks: exact=38679 lossy=496680

         ->  BitmapOr  (cost=56223.06..56223.06 rows=1725919 width=0) (actual time=153.683..153.684 rows=0 loops=1)

               ->  Bitmap Index Scan on mytable_colorcode_idx  (cost=0.00..663.35 rows=20655 width=0) (actual time=3.935..3.935 rows=26768 loops=1)

                     Index Cond: ((color_code)::text = 'red'::text)

               ->  Bitmap Index Scan on mytable_colorcode_idx  (cost=0.00..54697.91 rows=1705264 width=0) (actual time=149.745..149.746 rows=1700269 loops=1)

                     Index Cond: ((color_code)::text = 'green'::text)

 Planning time: 0.162 ms

 Execution time: 18896.785 ms

(13 rows)

Update

If I add an index (color_code, total_count, and sale_date) I notice that no index is used at all. Rather it does a partial scan.

"mytable_color_total_count_sale_Date_idx" btree (color_code, total_count, sale_date)  







                                                                      QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------------

 Finalize Aggregate  (cost=2099755.26..2099755.27 rows=1 width=8) (actual time=97066.585..97066.586 rows=1 loops=1)

   ->  Gather  (cost=2099755.04..2099755.25 rows=2 width=8) (actual time=97063.512..97069.838 rows=3 loops=1)

         Workers Planned: 2

         Workers Launched: 2

         ->  Partial Aggregate  (cost=2098755.04..2098755.05 rows=1 width=8) (actual time=97061.531..97061.532 rows=1 loops=3)

               ->  Parallel Seq Scan on mytable  (cost=0.00..2096119.69 rows=1054140 width=6) (actual time=27782.491..96730.232 rows=841604 loops=3)

                     Filter: ((sale_date = 1970) AND ((color_code)::text = ANY ('{red,green}'::text[])))

                     Rows Removed by Filter: 4196103

 Planning time: 0.161 ms

 Execution time: 97069.896 ms

(10 rows)

Question

Is there a way I can optimize by IN clause query other than turning it into an OR clause?

postgresql index query-performance optimization

edited 8 hours ago

asked 9 hours ago

Anthony

1303 bronze badges

edited 8 hours ago

asked 9 hours ago

Anthony

1303 bronze badges

edited 8 hours ago

asked 9 hours ago

Anthony

1303 bronze badges

asked 9 hours ago

Anthony

1303 bronze badges

asked 9 hours ago

Anthony

1303 bronze badges

Sorry, I assumed it wasn't using the index since I didn't see the additional row and the execution time was higher. I've modified the question to ask how the IN query can be optimized rather than missing index. Thanks

– Anthony
9 hours ago

Is this fully repeatable? It looks to me like the second query is faster simply because the first query already read all the data from disk into memory, so the 2nd one didn't need to. Swap back and forth between the two queries repeatedly and see how the timings compare. Also, it would be good to do explain (analyze, buffers), especially if you can turn track_io_timing on first.

– jjanes
9 hours ago

@Lennart Can you please explain further. Not sure I understand.

– Anthony
8 hours ago

@Anthony, I've added a little longer explanation as an answer

– Lennart
7 hours ago

It is unlikely that adding that new index would directly cause a different index to stop being used. But if you look at the cost estimate of the first and last executions plans, they are almost identical to each other. Since the statistics collector uses random sampling, small differences in the sample use could cause different plans to come out on top when the estimated costs are so similar.

– jjanes
5 hours ago

add a comment |

Sorry, I assumed it wasn't using the index since I didn't see the additional row and the execution time was higher. I've modified the question to ask how the IN query can be optimized rather than missing index. Thanks

– Anthony
9 hours ago

Is this fully repeatable? It looks to me like the second query is faster simply because the first query already read all the data from disk into memory, so the 2nd one didn't need to. Swap back and forth between the two queries repeatedly and see how the timings compare. Also, it would be good to do explain (analyze, buffers), especially if you can turn track_io_timing on first.

– jjanes
9 hours ago

@Lennart Can you please explain further. Not sure I understand.

– Anthony
8 hours ago

@Anthony, I've added a little longer explanation as an answer

– Lennart
7 hours ago

It is unlikely that adding that new index would directly cause a different index to stop being used. But if you look at the cost estimate of the first and last executions plans, they are almost identical to each other. Since the statistics collector uses random sampling, small differences in the sample use could cause different plans to come out on top when the estimated costs are so similar.

– jjanes
5 hours ago

Sorry, I assumed it wasn't using the index since I didn't see the additional row and the execution time was higher. I've modified the question to ask how the IN query can be optimized rather than missing index. Thanks

– Anthony
9 hours ago

Is this fully repeatable? It looks to me like the second query is faster simply because the first query already read all the data from disk into memory, so the 2nd one didn't need to. Swap back and forth between the two queries repeatedly and see how the timings compare. Also, it would be good to do explain (analyze, buffers), especially if you can turn track_io_timing on first.

– jjanes
9 hours ago

@Lennart Can you please explain further. Not sure I understand.

– Anthony
8 hours ago

@Anthony, I've added a little longer explanation as an answer

– Lennart
7 hours ago

It is unlikely that adding that new index would directly cause a different index to stop being used. But if you look at the cost estimate of the first and last executions plans, they are almost identical to each other. Since the statistics collector uses random sampling, small differences in the sample use could cause different plans to come out on top when the estimated costs are so similar.

– jjanes
5 hours ago

add a comment |

2 Answers
2

active

oldest

votes

You can't compare performance of:

WHERE color_code in ('red','green') and sale_date = '1970'

with:

WHERE color_code = 'red' or color_code = 'green' and sale_date = '1970'

because they are not logically equivalent (will return different results). A trivial example:

 with T (color_code, sale_date) as ( 

     values ('red', '1970'), ('green','1969')

 ) 

 select * from T 

 where color_code in ('green', 'red') 

   and sale_date = '1970';



 color_code | sale_date 

------------+-----------

 red        | 1970

(1 row)

However:

with T (color_code, sale_date) as ( 

    values ('red', '1970'), ('green','1969')

) 

select * from T 

where color_code = 'green' or color_code = 'red' 

  and sale_date = '1970';



color_code | sale_date 

------------+-----------

 red        | 1970

 green      | 1969

(2 rows)

For the comparison to be meaningful, you need to change your query to:

select * from T 

where (color_code = 'green' or color_code = 'red') 

  and sale_date = '1970';

That said, I would suggest an index like:

CREATE INDEX ... ON ... (sale_date, color_code)

answered 7 hours ago

Lennart

14.1k2 gold badges13 silver badges43 bronze badges

2

I think you hit the point that the two queries in OP's post are not the same logically speaking, so the question from the OP is somewhat not valid (based on the results of the two logically different queries).

– jyao
6 hours ago

add a comment |

I think the difference in timing you see is simply a caching effect, based on which query you ran first. It is probably not a real difference caused by how you specify the query (Although as Lennart described, your queries are not really equivalent as you are missing parentheses around the OR part--although all your rows seem to meet sale_date = '1970' anyway, so this difference is important in general, but doesn't make a difference in the exact example)

There some things you can do that might make both specifications of this query faster.

For one, look at this line:

     Heap Blocks: exact=38679 lossy=496680

This means that your work_mem is not large enough to hold the entire bitmap. All of those lossy blocks need to have every row in them rechecked, which takes time. Increasing work_mem would prevent this and should speed up the query. Ideally the lossy blocks would drop to zero (at which point the 'lossy' label wouldn't be displayed anymore.

Second, having an index on mytable (color_code, sale_date, total_count) could allow for index-only scans, because all needed data would be in the index and it wouldn't have to visit the table at all (assuming the table is kept well-vacuumed).

These are mutually exclusive: if you do the index-only scan rather than the bitmap scan, then work_mem no longer matters.

edited 5 hours ago

answered 8 hours ago

jjanes

16k1 gold badge10 silver badges19 bronze badges

My PostgreSQL runs on RDS. Should I just use a bigger RDS instance or is there a way to specifically increase work_mem. On that note...how would Heap Blocks: exact=38679 lossy=496680 look if I had a bigger work_mem would lossy number go down? Asking so that I can check after increasing work_mem. Can you please explain what you mean by well-vacuumed. Does that mean that no updates are done to the table? Also, after creating the index on (color_code, total_count, and sale_date) I'm noticing that no index is being used. It is only doing partial scan

– Anthony
8 hours ago

I've updated the question with results of adding an index on mytable(color_code, sale_date, total_count)

– Anthony
8 hours ago

You can change work_mem in RDS without changing your instance class. Of course you need to have enough memory to support the increased value, but the default setting of work_mem is conservative--unless you have a lot of simultaneous connections you should not have a problem increasing it.

– jjanes
5 hours ago

well-vacuumed here would mean that select relallvisible, relpages, relallvisible/relpages as ratio from pg_class where relname='mytable' is high. Say, above 0.90. Frequent updates might be OK, as long as it also gets frequent vacuums.

– jjanes
5 hours ago

add a comment |

Your Answer

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "182"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f243790%2fhow-to-optimize-in-query-on-indexed-column%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

You can't compare performance of:

WHERE color_code in ('red','green') and sale_date = '1970'

with:

WHERE color_code = 'red' or color_code = 'green' and sale_date = '1970'

because they are not logically equivalent (will return different results). A trivial example:

 with T (color_code, sale_date) as ( 

     values ('red', '1970'), ('green','1969')

 ) 

 select * from T 

 where color_code in ('green', 'red') 

   and sale_date = '1970';



 color_code | sale_date 

------------+-----------

 red        | 1970

(1 row)

However:

with T (color_code, sale_date) as ( 

    values ('red', '1970'), ('green','1969')

) 

select * from T 

where color_code = 'green' or color_code = 'red' 

  and sale_date = '1970';



color_code | sale_date 

------------+-----------

 red        | 1970

 green      | 1969

(2 rows)

For the comparison to be meaningful, you need to change your query to:

select * from T 

where (color_code = 'green' or color_code = 'red') 

  and sale_date = '1970';

That said, I would suggest an index like:

CREATE INDEX ... ON ... (sale_date, color_code)

answered 7 hours ago

Lennart

14.1k2 gold badges13 silver badges43 bronze badges

2

I think you hit the point that the two queries in OP's post are not the same logically speaking, so the question from the OP is somewhat not valid (based on the results of the two logically different queries).

– jyao
6 hours ago

add a comment |

You can't compare performance of:

WHERE color_code in ('red','green') and sale_date = '1970'

with:

WHERE color_code = 'red' or color_code = 'green' and sale_date = '1970'

because they are not logically equivalent (will return different results). A trivial example:

 with T (color_code, sale_date) as ( 

     values ('red', '1970'), ('green','1969')

 ) 

 select * from T 

 where color_code in ('green', 'red') 

   and sale_date = '1970';



 color_code | sale_date 

------------+-----------

 red        | 1970

(1 row)

However:

with T (color_code, sale_date) as ( 

    values ('red', '1970'), ('green','1969')

) 

select * from T 

where color_code = 'green' or color_code = 'red' 

  and sale_date = '1970';



color_code | sale_date 

------------+-----------

 red        | 1970

 green      | 1969

(2 rows)

For the comparison to be meaningful, you need to change your query to:

select * from T 

where (color_code = 'green' or color_code = 'red') 

  and sale_date = '1970';

That said, I would suggest an index like:

CREATE INDEX ... ON ... (sale_date, color_code)

answered 7 hours ago

Lennart

14.1k2 gold badges13 silver badges43 bronze badges

2

I think you hit the point that the two queries in OP's post are not the same logically speaking, so the question from the OP is somewhat not valid (based on the results of the two logically different queries).

– jyao
6 hours ago

add a comment |

You can't compare performance of:

WHERE color_code in ('red','green') and sale_date = '1970'

with:

WHERE color_code = 'red' or color_code = 'green' and sale_date = '1970'

because they are not logically equivalent (will return different results). A trivial example:

 with T (color_code, sale_date) as ( 

     values ('red', '1970'), ('green','1969')

 ) 

 select * from T 

 where color_code in ('green', 'red') 

   and sale_date = '1970';



 color_code | sale_date 

------------+-----------

 red        | 1970

(1 row)

However:

with T (color_code, sale_date) as ( 

    values ('red', '1970'), ('green','1969')

) 

select * from T 

where color_code = 'green' or color_code = 'red' 

  and sale_date = '1970';



color_code | sale_date 

------------+-----------

 red        | 1970

 green      | 1969

(2 rows)

For the comparison to be meaningful, you need to change your query to:

select * from T 

where (color_code = 'green' or color_code = 'red') 

  and sale_date = '1970';

That said, I would suggest an index like:

CREATE INDEX ... ON ... (sale_date, color_code)

answered 7 hours ago

Lennart

14.1k2 gold badges13 silver badges43 bronze badges

You can't compare performance of:

WHERE color_code in ('red','green') and sale_date = '1970'

with:

WHERE color_code = 'red' or color_code = 'green' and sale_date = '1970'

because they are not logically equivalent (will return different results). A trivial example:

 with T (color_code, sale_date) as ( 

     values ('red', '1970'), ('green','1969')

 ) 

 select * from T 

 where color_code in ('green', 'red') 

   and sale_date = '1970';



 color_code | sale_date 

------------+-----------

 red        | 1970

(1 row)

However:

with T (color_code, sale_date) as ( 

    values ('red', '1970'), ('green','1969')

) 

select * from T 

where color_code = 'green' or color_code = 'red' 

  and sale_date = '1970';



color_code | sale_date 

------------+-----------

 red        | 1970

 green      | 1969

(2 rows)

For the comparison to be meaningful, you need to change your query to:

select * from T 

where (color_code = 'green' or color_code = 'red') 

  and sale_date = '1970';

That said, I would suggest an index like:

CREATE INDEX ... ON ... (sale_date, color_code)

answered 7 hours ago

Lennart

14.1k2 gold badges13 silver badges43 bronze badges

answered 7 hours ago

Lennart

14.1k2 gold badges13 silver badges43 bronze badges

answered 7 hours ago

Lennart

14.1k2 gold badges13 silver badges43 bronze badges

answered 7 hours ago

Lennart

14.1k2 gold badges13 silver badges43 bronze badges

2

I think you hit the point that the two queries in OP's post are not the same logically speaking, so the question from the OP is somewhat not valid (based on the results of the two logically different queries).

– jyao
6 hours ago

add a comment |

2

I think you hit the point that the two queries in OP's post are not the same logically speaking, so the question from the OP is somewhat not valid (based on the results of the two logically different queries).

– jyao
6 hours ago

I think you hit the point that the two queries in OP's post are not the same logically speaking, so the question from the OP is somewhat not valid (based on the results of the two logically different queries).

– jyao
6 hours ago

add a comment |

There some things you can do that might make both specifications of this query faster.

For one, look at this line:

     Heap Blocks: exact=38679 lossy=496680

These are mutually exclusive: if you do the index-only scan rather than the bitmap scan, then work_mem no longer matters.

edited 5 hours ago

answered 8 hours ago

jjanes

16k1 gold badge10 silver badges19 bronze badges

My PostgreSQL runs on RDS. Should I just use a bigger RDS instance or is there a way to specifically increase work_mem. On that note...how would Heap Blocks: exact=38679 lossy=496680 look if I had a bigger work_mem would lossy number go down? Asking so that I can check after increasing work_mem. Can you please explain what you mean by well-vacuumed. Does that mean that no updates are done to the table? Also, after creating the index on (color_code, total_count, and sale_date) I'm noticing that no index is being used. It is only doing partial scan

– Anthony
8 hours ago

I've updated the question with results of adding an index on mytable(color_code, sale_date, total_count)

– Anthony
8 hours ago

You can change work_mem in RDS without changing your instance class. Of course you need to have enough memory to support the increased value, but the default setting of work_mem is conservative--unless you have a lot of simultaneous connections you should not have a problem increasing it.

– jjanes
5 hours ago

well-vacuumed here would mean that select relallvisible, relpages, relallvisible/relpages as ratio from pg_class where relname='mytable' is high. Say, above 0.90. Frequent updates might be OK, as long as it also gets frequent vacuums.

– jjanes
5 hours ago

add a comment |

There some things you can do that might make both specifications of this query faster.

For one, look at this line:

     Heap Blocks: exact=38679 lossy=496680

These are mutually exclusive: if you do the index-only scan rather than the bitmap scan, then work_mem no longer matters.

edited 5 hours ago

answered 8 hours ago

jjanes

16k1 gold badge10 silver badges19 bronze badges

My PostgreSQL runs on RDS. Should I just use a bigger RDS instance or is there a way to specifically increase work_mem. On that note...how would Heap Blocks: exact=38679 lossy=496680 look if I had a bigger work_mem would lossy number go down? Asking so that I can check after increasing work_mem. Can you please explain what you mean by well-vacuumed. Does that mean that no updates are done to the table? Also, after creating the index on (color_code, total_count, and sale_date) I'm noticing that no index is being used. It is only doing partial scan

– Anthony
8 hours ago

I've updated the question with results of adding an index on mytable(color_code, sale_date, total_count)

– Anthony
8 hours ago

You can change work_mem in RDS without changing your instance class. Of course you need to have enough memory to support the increased value, but the default setting of work_mem is conservative--unless you have a lot of simultaneous connections you should not have a problem increasing it.

– jjanes
5 hours ago

well-vacuumed here would mean that select relallvisible, relpages, relallvisible/relpages as ratio from pg_class where relname='mytable' is high. Say, above 0.90. Frequent updates might be OK, as long as it also gets frequent vacuums.

– jjanes
5 hours ago

add a comment |

There some things you can do that might make both specifications of this query faster.

For one, look at this line:

     Heap Blocks: exact=38679 lossy=496680

These are mutually exclusive: if you do the index-only scan rather than the bitmap scan, then work_mem no longer matters.

edited 5 hours ago

answered 8 hours ago

jjanes

16k1 gold badge10 silver badges19 bronze badges

There some things you can do that might make both specifications of this query faster.

For one, look at this line:

     Heap Blocks: exact=38679 lossy=496680

These are mutually exclusive: if you do the index-only scan rather than the bitmap scan, then work_mem no longer matters.

edited 5 hours ago

answered 8 hours ago

jjanes

16k1 gold badge10 silver badges19 bronze badges

edited 5 hours ago

answered 8 hours ago

jjanes

16k1 gold badge10 silver badges19 bronze badges

answered 8 hours ago

jjanes

16k1 gold badge10 silver badges19 bronze badges

answered 8 hours ago

jjanes

16k1 gold badge10 silver badges19 bronze badges

My PostgreSQL runs on RDS. Should I just use a bigger RDS instance or is there a way to specifically increase work_mem. On that note...how would Heap Blocks: exact=38679 lossy=496680 look if I had a bigger work_mem would lossy number go down? Asking so that I can check after increasing work_mem. Can you please explain what you mean by well-vacuumed. Does that mean that no updates are done to the table? Also, after creating the index on (color_code, total_count, and sale_date) I'm noticing that no index is being used. It is only doing partial scan

– Anthony
8 hours ago

I've updated the question with results of adding an index on mytable(color_code, sale_date, total_count)

– Anthony
8 hours ago

You can change work_mem in RDS without changing your instance class. Of course you need to have enough memory to support the increased value, but the default setting of work_mem is conservative--unless you have a lot of simultaneous connections you should not have a problem increasing it.

– jjanes
5 hours ago

well-vacuumed here would mean that select relallvisible, relpages, relallvisible/relpages as ratio from pg_class where relname='mytable' is high. Say, above 0.90. Frequent updates might be OK, as long as it also gets frequent vacuums.

– jjanes
5 hours ago

add a comment |

My PostgreSQL runs on RDS. Should I just use a bigger RDS instance or is there a way to specifically increase work_mem. On that note...how would Heap Blocks: exact=38679 lossy=496680 look if I had a bigger work_mem would lossy number go down? Asking so that I can check after increasing work_mem. Can you please explain what you mean by well-vacuumed. Does that mean that no updates are done to the table? Also, after creating the index on (color_code, total_count, and sale_date) I'm noticing that no index is being used. It is only doing partial scan

– Anthony
8 hours ago

I've updated the question with results of adding an index on mytable(color_code, sale_date, total_count)

– Anthony
8 hours ago

You can change work_mem in RDS without changing your instance class. Of course you need to have enough memory to support the increased value, but the default setting of work_mem is conservative--unless you have a lot of simultaneous connections you should not have a problem increasing it.

– jjanes
5 hours ago

well-vacuumed here would mean that select relallvisible, relpages, relallvisible/relpages as ratio from pg_class where relname='mytable' is high. Say, above 0.90. Frequent updates might be OK, as long as it also gets frequent vacuums.

– jjanes
5 hours ago

My PostgreSQL runs on RDS. Should I just use a bigger RDS instance or is there a way to specifically increase work_mem. On that note...how would Heap Blocks: exact=38679 lossy=496680 look if I had a bigger work_mem would lossy number go down? Asking so that I can check after increasing work_mem. Can you please explain what you mean by well-vacuumed. Does that mean that no updates are done to the table? Also, after creating the index on (color_code, total_count, and sale_date) I'm noticing that no index is being used. It is only doing partial scan

– Anthony
8 hours ago

I've updated the question with results of adding an index on mytable(color_code, sale_date, total_count)

– Anthony
8 hours ago

You can change work_mem in RDS without changing your instance class. Of course you need to have enough memory to support the increased value, but the default setting of work_mem is conservative--unless you have a lot of simultaneous connections you should not have a problem increasing it.

– jjanes
5 hours ago

well-vacuumed here would mean that select relallvisible, relpages, relallvisible/relpages as ratio from pg_class where relname='mytable' is high. Say, above 0.90. Frequent updates might be OK, as long as it also gets frequent vacuums.

– jjanes
5 hours ago

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Database Administrators Stack Exchange!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Mdthbs