How to optimize IN query on indexed columnOptimizing ORDER BY in a full text search queryPostgreSQL 9.2...

Can GPL and BSD licensed applications be used for government work?

How do I run a game when my PCs have different approaches to combat?

Is the apartment I want to rent a scam?

Sitecore Powershell extensions module compatibility with Sitecore 9.2

How can I prevent corporations from growing their own workforce?

Who has jurisdiction for a crime committed in an embassy?

What was the rationale behind 36 bit computer architectures?

Why did NASA use U.S customary units?

What exactly makes a General Products hull nearly indestructible?

How can I make sure my players' decisions have consequences?

Where to place an artificial gland in the human body?

What is the purpose of the fuel shutoff valve?

High income, sudden windfall

Extrapolation v. Interpolation

Why do people say "I am broke" instead of "I am broken"?

How can I create a shape in Illustrator which follows a path in descending order size?

How can I receive packages while in France?

Strange Cron Job takes up 100% of CPU Ubuntu 18 LTS Server

Can I paint a load center cover?

Inverse Colombian Function

Why did modems have speakers?

Problem loading expl3 in plain TeX

Sextortion with actual password not found in leaks

Can two figures have the same area, perimeter, and same number of segments have different shape?



How to optimize IN query on indexed column


Optimizing ORDER BY in a full text search queryPostgreSQL 9.2 (PostGIS) performance problemHow to index WHERE (start_date >= '2013-12-15')How can I speed up a Postgres query containing lots of Joins with an ILIKE conditionpostgres explain plan with giant gaps between operationsSlow fulltext search due to wildly inaccurate row estimatesIndex for numeric field is not usedpostgresql 9.2 hash join issueSorting killing my postgresql queryWhy is this query with WHERE, ORDER BY and LIMIT so slow?






.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ margin-bottom:0;
}







4















I have a table with over 50M records. One of the fields is COLOR_CODE. I've set an index on the column COLOR_CODE like this:



"mytable_colorcode_idx" btree (color_code)


I'm noticing that when I run the query below, the execution time is higher



SELECT count(total_amount) FROM mytable 
WHERE color_code in ('red','green') and sale_date = '1970'


However, the execution time is faster with OR clause:



SELECT count(total_amount) FROM mytable 
WHERE color_code = 'red' or color_code = 'green' and sale_date = '1970'


Query plan for IN



explain analyze SELECT count(total_amount) FROM mytable 
WHERE color_code in ('red','green') and sale_date = '1970'
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=2074238.07..2074238.08 rows=1 width=8) (actual time=63520.150..63520.150 rows=1 loops=1)
-> Bitmap Heap Scan on mytable (cost=53504.73..2069923.27 rows=1725919 width=6) (actual time=3509.920..63080.519 rows=1727037 loops=1)
Recheck Cond: ((color_code)::text = ANY ('{red,green}'::text[]))
Rows Removed by Index Recheck: 5067635
Filter: (sale_date = 1970)
Heap Blocks: exact=38679 lossy=496680
-> Bitmap Index Scan on mytable_colorcode_idx (cost=0.00..53073.26 rows=1725919 width=0) (actual time=3501.777..3501.777 rows=1727037 loops=1)
Index Cond: ((color_code)::text = ANY ('{red,green}'::text[]))
Planning time: 0.165 ms
Execution time: 63524.100 ms
(10 rows)


Query plan for OR



explain analyze SELECT count(total_amount) FROM mytable 
WHERE color_code = 'red' or color_code = 'green' and sale_date = '1970'

QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=2081265.36..2081265.37 rows=1 width=8) (actual time=18895.998..18895.998 rows=1 loops=1)
-> Bitmap Heap Scan on mytable (cost=56223.06..2076956.39 rows=1723588 width=6) (actual time=161.335..18468.146 rows=1727037 loops=1)
Recheck Cond: (((color_code)::text = 'red'::text) OR ((color_code)::text = 'green'::text))
Rows Removed by Index Recheck: 5067635
Filter: (((color_code)::text = 'red'::text) OR (((color_code)::text = 'green'::text) AND (sale_date = 1970)))
Heap Blocks: exact=38679 lossy=496680
-> BitmapOr (cost=56223.06..56223.06 rows=1725919 width=0) (actual time=153.683..153.684 rows=0 loops=1)
-> Bitmap Index Scan on mytable_colorcode_idx (cost=0.00..663.35 rows=20655 width=0) (actual time=3.935..3.935 rows=26768 loops=1)
Index Cond: ((color_code)::text = 'red'::text)
-> Bitmap Index Scan on mytable_colorcode_idx (cost=0.00..54697.91 rows=1705264 width=0) (actual time=149.745..149.746 rows=1700269 loops=1)
Index Cond: ((color_code)::text = 'green'::text)
Planning time: 0.162 ms
Execution time: 18896.785 ms
(13 rows)


Update



If I add an index (color_code, total_count, and sale_date) I notice that no index is used at all. Rather it does a partial scan.



"mytable_color_total_count_sale_Date_idx" btree (color_code, total_count, sale_date)  



QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=2099755.26..2099755.27 rows=1 width=8) (actual time=97066.585..97066.586 rows=1 loops=1)
-> Gather (cost=2099755.04..2099755.25 rows=2 width=8) (actual time=97063.512..97069.838 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=2098755.04..2098755.05 rows=1 width=8) (actual time=97061.531..97061.532 rows=1 loops=3)
-> Parallel Seq Scan on mytable (cost=0.00..2096119.69 rows=1054140 width=6) (actual time=27782.491..96730.232 rows=841604 loops=3)
Filter: ((sale_date = 1970) AND ((color_code)::text = ANY ('{red,green}'::text[])))
Rows Removed by Filter: 4196103
Planning time: 0.161 ms
Execution time: 97069.896 ms
(10 rows)


Question



Is there a way I can optimize by IN clause query other than turning it into an OR clause?










share|improve this question

























  • Sorry, I assumed it wasn't using the index since I didn't see the additional row and the execution time was higher. I've modified the question to ask how the IN query can be optimized rather than missing index. Thanks

    – Anthony
    9 hours ago











  • Is this fully repeatable? It looks to me like the second query is faster simply because the first query already read all the data from disk into memory, so the 2nd one didn't need to. Swap back and forth between the two queries repeatedly and see how the timings compare. Also, it would be good to do explain (analyze, buffers), especially if you can turn track_io_timing on first.

    – jjanes
    9 hours ago











  • @Lennart Can you please explain further. Not sure I understand.

    – Anthony
    8 hours ago











  • @Anthony, I've added a little longer explanation as an answer

    – Lennart
    7 hours ago











  • It is unlikely that adding that new index would directly cause a different index to stop being used. But if you look at the cost estimate of the first and last executions plans, they are almost identical to each other. Since the statistics collector uses random sampling, small differences in the sample use could cause different plans to come out on top when the estimated costs are so similar.

    – jjanes
    5 hours ago


















4















I have a table with over 50M records. One of the fields is COLOR_CODE. I've set an index on the column COLOR_CODE like this:



"mytable_colorcode_idx" btree (color_code)


I'm noticing that when I run the query below, the execution time is higher



SELECT count(total_amount) FROM mytable 
WHERE color_code in ('red','green') and sale_date = '1970'


However, the execution time is faster with OR clause:



SELECT count(total_amount) FROM mytable 
WHERE color_code = 'red' or color_code = 'green' and sale_date = '1970'


Query plan for IN



explain analyze SELECT count(total_amount) FROM mytable 
WHERE color_code in ('red','green') and sale_date = '1970'
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=2074238.07..2074238.08 rows=1 width=8) (actual time=63520.150..63520.150 rows=1 loops=1)
-> Bitmap Heap Scan on mytable (cost=53504.73..2069923.27 rows=1725919 width=6) (actual time=3509.920..63080.519 rows=1727037 loops=1)
Recheck Cond: ((color_code)::text = ANY ('{red,green}'::text[]))
Rows Removed by Index Recheck: 5067635
Filter: (sale_date = 1970)
Heap Blocks: exact=38679 lossy=496680
-> Bitmap Index Scan on mytable_colorcode_idx (cost=0.00..53073.26 rows=1725919 width=0) (actual time=3501.777..3501.777 rows=1727037 loops=1)
Index Cond: ((color_code)::text = ANY ('{red,green}'::text[]))
Planning time: 0.165 ms
Execution time: 63524.100 ms
(10 rows)


Query plan for OR



explain analyze SELECT count(total_amount) FROM mytable 
WHERE color_code = 'red' or color_code = 'green' and sale_date = '1970'

QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=2081265.36..2081265.37 rows=1 width=8) (actual time=18895.998..18895.998 rows=1 loops=1)
-> Bitmap Heap Scan on mytable (cost=56223.06..2076956.39 rows=1723588 width=6) (actual time=161.335..18468.146 rows=1727037 loops=1)
Recheck Cond: (((color_code)::text = 'red'::text) OR ((color_code)::text = 'green'::text))
Rows Removed by Index Recheck: 5067635
Filter: (((color_code)::text = 'red'::text) OR (((color_code)::text = 'green'::text) AND (sale_date = 1970)))
Heap Blocks: exact=38679 lossy=496680
-> BitmapOr (cost=56223.06..56223.06 rows=1725919 width=0) (actual time=153.683..153.684 rows=0 loops=1)
-> Bitmap Index Scan on mytable_colorcode_idx (cost=0.00..663.35 rows=20655 width=0) (actual time=3.935..3.935 rows=26768 loops=1)
Index Cond: ((color_code)::text = 'red'::text)
-> Bitmap Index Scan on mytable_colorcode_idx (cost=0.00..54697.91 rows=1705264 width=0) (actual time=149.745..149.746 rows=1700269 loops=1)
Index Cond: ((color_code)::text = 'green'::text)
Planning time: 0.162 ms
Execution time: 18896.785 ms
(13 rows)


Update



If I add an index (color_code, total_count, and sale_date) I notice that no index is used at all. Rather it does a partial scan.



"mytable_color_total_count_sale_Date_idx" btree (color_code, total_count, sale_date)  



QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=2099755.26..2099755.27 rows=1 width=8) (actual time=97066.585..97066.586 rows=1 loops=1)
-> Gather (cost=2099755.04..2099755.25 rows=2 width=8) (actual time=97063.512..97069.838 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=2098755.04..2098755.05 rows=1 width=8) (actual time=97061.531..97061.532 rows=1 loops=3)
-> Parallel Seq Scan on mytable (cost=0.00..2096119.69 rows=1054140 width=6) (actual time=27782.491..96730.232 rows=841604 loops=3)
Filter: ((sale_date = 1970) AND ((color_code)::text = ANY ('{red,green}'::text[])))
Rows Removed by Filter: 4196103
Planning time: 0.161 ms
Execution time: 97069.896 ms
(10 rows)


Question



Is there a way I can optimize by IN clause query other than turning it into an OR clause?










share|improve this question

























  • Sorry, I assumed it wasn't using the index since I didn't see the additional row and the execution time was higher. I've modified the question to ask how the IN query can be optimized rather than missing index. Thanks

    – Anthony
    9 hours ago











  • Is this fully repeatable? It looks to me like the second query is faster simply because the first query already read all the data from disk into memory, so the 2nd one didn't need to. Swap back and forth between the two queries repeatedly and see how the timings compare. Also, it would be good to do explain (analyze, buffers), especially if you can turn track_io_timing on first.

    – jjanes
    9 hours ago











  • @Lennart Can you please explain further. Not sure I understand.

    – Anthony
    8 hours ago











  • @Anthony, I've added a little longer explanation as an answer

    – Lennart
    7 hours ago











  • It is unlikely that adding that new index would directly cause a different index to stop being used. But if you look at the cost estimate of the first and last executions plans, they are almost identical to each other. Since the statistics collector uses random sampling, small differences in the sample use could cause different plans to come out on top when the estimated costs are so similar.

    – jjanes
    5 hours ago














4












4








4


1






I have a table with over 50M records. One of the fields is COLOR_CODE. I've set an index on the column COLOR_CODE like this:



"mytable_colorcode_idx" btree (color_code)


I'm noticing that when I run the query below, the execution time is higher



SELECT count(total_amount) FROM mytable 
WHERE color_code in ('red','green') and sale_date = '1970'


However, the execution time is faster with OR clause:



SELECT count(total_amount) FROM mytable 
WHERE color_code = 'red' or color_code = 'green' and sale_date = '1970'


Query plan for IN



explain analyze SELECT count(total_amount) FROM mytable 
WHERE color_code in ('red','green') and sale_date = '1970'
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=2074238.07..2074238.08 rows=1 width=8) (actual time=63520.150..63520.150 rows=1 loops=1)
-> Bitmap Heap Scan on mytable (cost=53504.73..2069923.27 rows=1725919 width=6) (actual time=3509.920..63080.519 rows=1727037 loops=1)
Recheck Cond: ((color_code)::text = ANY ('{red,green}'::text[]))
Rows Removed by Index Recheck: 5067635
Filter: (sale_date = 1970)
Heap Blocks: exact=38679 lossy=496680
-> Bitmap Index Scan on mytable_colorcode_idx (cost=0.00..53073.26 rows=1725919 width=0) (actual time=3501.777..3501.777 rows=1727037 loops=1)
Index Cond: ((color_code)::text = ANY ('{red,green}'::text[]))
Planning time: 0.165 ms
Execution time: 63524.100 ms
(10 rows)


Query plan for OR



explain analyze SELECT count(total_amount) FROM mytable 
WHERE color_code = 'red' or color_code = 'green' and sale_date = '1970'

QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=2081265.36..2081265.37 rows=1 width=8) (actual time=18895.998..18895.998 rows=1 loops=1)
-> Bitmap Heap Scan on mytable (cost=56223.06..2076956.39 rows=1723588 width=6) (actual time=161.335..18468.146 rows=1727037 loops=1)
Recheck Cond: (((color_code)::text = 'red'::text) OR ((color_code)::text = 'green'::text))
Rows Removed by Index Recheck: 5067635
Filter: (((color_code)::text = 'red'::text) OR (((color_code)::text = 'green'::text) AND (sale_date = 1970)))
Heap Blocks: exact=38679 lossy=496680
-> BitmapOr (cost=56223.06..56223.06 rows=1725919 width=0) (actual time=153.683..153.684 rows=0 loops=1)
-> Bitmap Index Scan on mytable_colorcode_idx (cost=0.00..663.35 rows=20655 width=0) (actual time=3.935..3.935 rows=26768 loops=1)
Index Cond: ((color_code)::text = 'red'::text)
-> Bitmap Index Scan on mytable_colorcode_idx (cost=0.00..54697.91 rows=1705264 width=0) (actual time=149.745..149.746 rows=1700269 loops=1)
Index Cond: ((color_code)::text = 'green'::text)
Planning time: 0.162 ms
Execution time: 18896.785 ms
(13 rows)


Update



If I add an index (color_code, total_count, and sale_date) I notice that no index is used at all. Rather it does a partial scan.



"mytable_color_total_count_sale_Date_idx" btree (color_code, total_count, sale_date)  



QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=2099755.26..2099755.27 rows=1 width=8) (actual time=97066.585..97066.586 rows=1 loops=1)
-> Gather (cost=2099755.04..2099755.25 rows=2 width=8) (actual time=97063.512..97069.838 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=2098755.04..2098755.05 rows=1 width=8) (actual time=97061.531..97061.532 rows=1 loops=3)
-> Parallel Seq Scan on mytable (cost=0.00..2096119.69 rows=1054140 width=6) (actual time=27782.491..96730.232 rows=841604 loops=3)
Filter: ((sale_date = 1970) AND ((color_code)::text = ANY ('{red,green}'::text[])))
Rows Removed by Filter: 4196103
Planning time: 0.161 ms
Execution time: 97069.896 ms
(10 rows)


Question



Is there a way I can optimize by IN clause query other than turning it into an OR clause?










share|improve this question
















I have a table with over 50M records. One of the fields is COLOR_CODE. I've set an index on the column COLOR_CODE like this:



"mytable_colorcode_idx" btree (color_code)


I'm noticing that when I run the query below, the execution time is higher



SELECT count(total_amount) FROM mytable 
WHERE color_code in ('red','green') and sale_date = '1970'


However, the execution time is faster with OR clause:



SELECT count(total_amount) FROM mytable 
WHERE color_code = 'red' or color_code = 'green' and sale_date = '1970'


Query plan for IN



explain analyze SELECT count(total_amount) FROM mytable 
WHERE color_code in ('red','green') and sale_date = '1970'
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=2074238.07..2074238.08 rows=1 width=8) (actual time=63520.150..63520.150 rows=1 loops=1)
-> Bitmap Heap Scan on mytable (cost=53504.73..2069923.27 rows=1725919 width=6) (actual time=3509.920..63080.519 rows=1727037 loops=1)
Recheck Cond: ((color_code)::text = ANY ('{red,green}'::text[]))
Rows Removed by Index Recheck: 5067635
Filter: (sale_date = 1970)
Heap Blocks: exact=38679 lossy=496680
-> Bitmap Index Scan on mytable_colorcode_idx (cost=0.00..53073.26 rows=1725919 width=0) (actual time=3501.777..3501.777 rows=1727037 loops=1)
Index Cond: ((color_code)::text = ANY ('{red,green}'::text[]))
Planning time: 0.165 ms
Execution time: 63524.100 ms
(10 rows)


Query plan for OR



explain analyze SELECT count(total_amount) FROM mytable 
WHERE color_code = 'red' or color_code = 'green' and sale_date = '1970'

QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=2081265.36..2081265.37 rows=1 width=8) (actual time=18895.998..18895.998 rows=1 loops=1)
-> Bitmap Heap Scan on mytable (cost=56223.06..2076956.39 rows=1723588 width=6) (actual time=161.335..18468.146 rows=1727037 loops=1)
Recheck Cond: (((color_code)::text = 'red'::text) OR ((color_code)::text = 'green'::text))
Rows Removed by Index Recheck: 5067635
Filter: (((color_code)::text = 'red'::text) OR (((color_code)::text = 'green'::text) AND (sale_date = 1970)))
Heap Blocks: exact=38679 lossy=496680
-> BitmapOr (cost=56223.06..56223.06 rows=1725919 width=0) (actual time=153.683..153.684 rows=0 loops=1)
-> Bitmap Index Scan on mytable_colorcode_idx (cost=0.00..663.35 rows=20655 width=0) (actual time=3.935..3.935 rows=26768 loops=1)
Index Cond: ((color_code)::text = 'red'::text)
-> Bitmap Index Scan on mytable_colorcode_idx (cost=0.00..54697.91 rows=1705264 width=0) (actual time=149.745..149.746 rows=1700269 loops=1)
Index Cond: ((color_code)::text = 'green'::text)
Planning time: 0.162 ms
Execution time: 18896.785 ms
(13 rows)


Update



If I add an index (color_code, total_count, and sale_date) I notice that no index is used at all. Rather it does a partial scan.



"mytable_color_total_count_sale_Date_idx" btree (color_code, total_count, sale_date)  



QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=2099755.26..2099755.27 rows=1 width=8) (actual time=97066.585..97066.586 rows=1 loops=1)
-> Gather (cost=2099755.04..2099755.25 rows=2 width=8) (actual time=97063.512..97069.838 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=2098755.04..2098755.05 rows=1 width=8) (actual time=97061.531..97061.532 rows=1 loops=3)
-> Parallel Seq Scan on mytable (cost=0.00..2096119.69 rows=1054140 width=6) (actual time=27782.491..96730.232 rows=841604 loops=3)
Filter: ((sale_date = 1970) AND ((color_code)::text = ANY ('{red,green}'::text[])))
Rows Removed by Filter: 4196103
Planning time: 0.161 ms
Execution time: 97069.896 ms
(10 rows)


Question



Is there a way I can optimize by IN clause query other than turning it into an OR clause?







postgresql index query-performance optimization






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited 8 hours ago







Anthony

















asked 9 hours ago









AnthonyAnthony

1303 bronze badges




1303 bronze badges













  • Sorry, I assumed it wasn't using the index since I didn't see the additional row and the execution time was higher. I've modified the question to ask how the IN query can be optimized rather than missing index. Thanks

    – Anthony
    9 hours ago











  • Is this fully repeatable? It looks to me like the second query is faster simply because the first query already read all the data from disk into memory, so the 2nd one didn't need to. Swap back and forth between the two queries repeatedly and see how the timings compare. Also, it would be good to do explain (analyze, buffers), especially if you can turn track_io_timing on first.

    – jjanes
    9 hours ago











  • @Lennart Can you please explain further. Not sure I understand.

    – Anthony
    8 hours ago











  • @Anthony, I've added a little longer explanation as an answer

    – Lennart
    7 hours ago











  • It is unlikely that adding that new index would directly cause a different index to stop being used. But if you look at the cost estimate of the first and last executions plans, they are almost identical to each other. Since the statistics collector uses random sampling, small differences in the sample use could cause different plans to come out on top when the estimated costs are so similar.

    – jjanes
    5 hours ago



















  • Sorry, I assumed it wasn't using the index since I didn't see the additional row and the execution time was higher. I've modified the question to ask how the IN query can be optimized rather than missing index. Thanks

    – Anthony
    9 hours ago











  • Is this fully repeatable? It looks to me like the second query is faster simply because the first query already read all the data from disk into memory, so the 2nd one didn't need to. Swap back and forth between the two queries repeatedly and see how the timings compare. Also, it would be good to do explain (analyze, buffers), especially if you can turn track_io_timing on first.

    – jjanes
    9 hours ago











  • @Lennart Can you please explain further. Not sure I understand.

    – Anthony
    8 hours ago











  • @Anthony, I've added a little longer explanation as an answer

    – Lennart
    7 hours ago











  • It is unlikely that adding that new index would directly cause a different index to stop being used. But if you look at the cost estimate of the first and last executions plans, they are almost identical to each other. Since the statistics collector uses random sampling, small differences in the sample use could cause different plans to come out on top when the estimated costs are so similar.

    – jjanes
    5 hours ago

















Sorry, I assumed it wasn't using the index since I didn't see the additional row and the execution time was higher. I've modified the question to ask how the IN query can be optimized rather than missing index. Thanks

– Anthony
9 hours ago





Sorry, I assumed it wasn't using the index since I didn't see the additional row and the execution time was higher. I've modified the question to ask how the IN query can be optimized rather than missing index. Thanks

– Anthony
9 hours ago













Is this fully repeatable? It looks to me like the second query is faster simply because the first query already read all the data from disk into memory, so the 2nd one didn't need to. Swap back and forth between the two queries repeatedly and see how the timings compare. Also, it would be good to do explain (analyze, buffers), especially if you can turn track_io_timing on first.

– jjanes
9 hours ago





Is this fully repeatable? It looks to me like the second query is faster simply because the first query already read all the data from disk into memory, so the 2nd one didn't need to. Swap back and forth between the two queries repeatedly and see how the timings compare. Also, it would be good to do explain (analyze, buffers), especially if you can turn track_io_timing on first.

– jjanes
9 hours ago













@Lennart Can you please explain further. Not sure I understand.

– Anthony
8 hours ago





@Lennart Can you please explain further. Not sure I understand.

– Anthony
8 hours ago













@Anthony, I've added a little longer explanation as an answer

– Lennart
7 hours ago





@Anthony, I've added a little longer explanation as an answer

– Lennart
7 hours ago













It is unlikely that adding that new index would directly cause a different index to stop being used. But if you look at the cost estimate of the first and last executions plans, they are almost identical to each other. Since the statistics collector uses random sampling, small differences in the sample use could cause different plans to come out on top when the estimated costs are so similar.

– jjanes
5 hours ago





It is unlikely that adding that new index would directly cause a different index to stop being used. But if you look at the cost estimate of the first and last executions plans, they are almost identical to each other. Since the statistics collector uses random sampling, small differences in the sample use could cause different plans to come out on top when the estimated costs are so similar.

– jjanes
5 hours ago










2 Answers
2






active

oldest

votes


















5














You can't compare performance of:



WHERE color_code in ('red','green') and sale_date = '1970'


with:



WHERE color_code = 'red' or color_code = 'green' and sale_date = '1970'


because they are not logically equivalent (will return different results). A trivial example:



 with T (color_code, sale_date) as ( 
values ('red', '1970'), ('green','1969')
)
select * from T
where color_code in ('green', 'red')
and sale_date = '1970';

color_code | sale_date
------------+-----------
red | 1970
(1 row)


However:



with T (color_code, sale_date) as ( 
values ('red', '1970'), ('green','1969')
)
select * from T
where color_code = 'green' or color_code = 'red'
and sale_date = '1970';

color_code | sale_date
------------+-----------
red | 1970
green | 1969
(2 rows)


For the comparison to be meaningful, you need to change your query to:



select * from T 
where (color_code = 'green' or color_code = 'red')
and sale_date = '1970';


That said, I would suggest an index like:



CREATE INDEX ... ON ... (sale_date, color_code)





share|improve this answer



















  • 2





    I think you hit the point that the two queries in OP's post are not the same logically speaking, so the question from the OP is somewhat not valid (based on the results of the two logically different queries).

    – jyao
    6 hours ago



















1














I think the difference in timing you see is simply a caching effect, based on which query you ran first. It is probably not a real difference caused by how you specify the query (Although as Lennart described, your queries are not really equivalent as you are missing parentheses around the OR part--although all your rows seem to meet sale_date = '1970' anyway, so this difference is important in general, but doesn't make a difference in the exact example)



There some things you can do that might make both specifications of this query faster.



For one, look at this line:



     Heap Blocks: exact=38679 lossy=496680


This means that your work_mem is not large enough to hold the entire bitmap. All of those lossy blocks need to have every row in them rechecked, which takes time. Increasing work_mem would prevent this and should speed up the query. Ideally the lossy blocks would drop to zero (at which point the 'lossy' label wouldn't be displayed anymore.



Second, having an index on mytable (color_code, sale_date, total_count) could allow for index-only scans, because all needed data would be in the index and it wouldn't have to visit the table at all (assuming the table is kept well-vacuumed).



These are mutually exclusive: if you do the index-only scan rather than the bitmap scan, then work_mem no longer matters.






share|improve this answer


























  • My PostgreSQL runs on RDS. Should I just use a bigger RDS instance or is there a way to specifically increase work_mem. On that note...how would Heap Blocks: exact=38679 lossy=496680 look if I had a bigger work_mem would lossy number go down? Asking so that I can check after increasing work_mem. Can you please explain what you mean by well-vacuumed. Does that mean that no updates are done to the table? Also, after creating the index on (color_code, total_count, and sale_date) I'm noticing that no index is being used. It is only doing partial scan

    – Anthony
    8 hours ago











  • I've updated the question with results of adding an index on mytable(color_code, sale_date, total_count)

    – Anthony
    8 hours ago











  • You can change work_mem in RDS without changing your instance class. Of course you need to have enough memory to support the increased value, but the default setting of work_mem is conservative--unless you have a lot of simultaneous connections you should not have a problem increasing it.

    – jjanes
    5 hours ago











  • well-vacuumed here would mean that select relallvisible, relpages, relallvisible/relpages as ratio from pg_class where relname='mytable' is high. Say, above 0.90. Frequent updates might be OK, as long as it also gets frequent vacuums.

    – jjanes
    5 hours ago














Your Answer








StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "182"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














draft saved

draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f243790%2fhow-to-optimize-in-query-on-indexed-column%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























2 Answers
2






active

oldest

votes








2 Answers
2






active

oldest

votes









active

oldest

votes






active

oldest

votes









5














You can't compare performance of:



WHERE color_code in ('red','green') and sale_date = '1970'


with:



WHERE color_code = 'red' or color_code = 'green' and sale_date = '1970'


because they are not logically equivalent (will return different results). A trivial example:



 with T (color_code, sale_date) as ( 
values ('red', '1970'), ('green','1969')
)
select * from T
where color_code in ('green', 'red')
and sale_date = '1970';

color_code | sale_date
------------+-----------
red | 1970
(1 row)


However:



with T (color_code, sale_date) as ( 
values ('red', '1970'), ('green','1969')
)
select * from T
where color_code = 'green' or color_code = 'red'
and sale_date = '1970';

color_code | sale_date
------------+-----------
red | 1970
green | 1969
(2 rows)


For the comparison to be meaningful, you need to change your query to:



select * from T 
where (color_code = 'green' or color_code = 'red')
and sale_date = '1970';


That said, I would suggest an index like:



CREATE INDEX ... ON ... (sale_date, color_code)





share|improve this answer



















  • 2





    I think you hit the point that the two queries in OP's post are not the same logically speaking, so the question from the OP is somewhat not valid (based on the results of the two logically different queries).

    – jyao
    6 hours ago
















5














You can't compare performance of:



WHERE color_code in ('red','green') and sale_date = '1970'


with:



WHERE color_code = 'red' or color_code = 'green' and sale_date = '1970'


because they are not logically equivalent (will return different results). A trivial example:



 with T (color_code, sale_date) as ( 
values ('red', '1970'), ('green','1969')
)
select * from T
where color_code in ('green', 'red')
and sale_date = '1970';

color_code | sale_date
------------+-----------
red | 1970
(1 row)


However:



with T (color_code, sale_date) as ( 
values ('red', '1970'), ('green','1969')
)
select * from T
where color_code = 'green' or color_code = 'red'
and sale_date = '1970';

color_code | sale_date
------------+-----------
red | 1970
green | 1969
(2 rows)


For the comparison to be meaningful, you need to change your query to:



select * from T 
where (color_code = 'green' or color_code = 'red')
and sale_date = '1970';


That said, I would suggest an index like:



CREATE INDEX ... ON ... (sale_date, color_code)





share|improve this answer



















  • 2





    I think you hit the point that the two queries in OP's post are not the same logically speaking, so the question from the OP is somewhat not valid (based on the results of the two logically different queries).

    – jyao
    6 hours ago














5












5








5







You can't compare performance of:



WHERE color_code in ('red','green') and sale_date = '1970'


with:



WHERE color_code = 'red' or color_code = 'green' and sale_date = '1970'


because they are not logically equivalent (will return different results). A trivial example:



 with T (color_code, sale_date) as ( 
values ('red', '1970'), ('green','1969')
)
select * from T
where color_code in ('green', 'red')
and sale_date = '1970';

color_code | sale_date
------------+-----------
red | 1970
(1 row)


However:



with T (color_code, sale_date) as ( 
values ('red', '1970'), ('green','1969')
)
select * from T
where color_code = 'green' or color_code = 'red'
and sale_date = '1970';

color_code | sale_date
------------+-----------
red | 1970
green | 1969
(2 rows)


For the comparison to be meaningful, you need to change your query to:



select * from T 
where (color_code = 'green' or color_code = 'red')
and sale_date = '1970';


That said, I would suggest an index like:



CREATE INDEX ... ON ... (sale_date, color_code)





share|improve this answer













You can't compare performance of:



WHERE color_code in ('red','green') and sale_date = '1970'


with:



WHERE color_code = 'red' or color_code = 'green' and sale_date = '1970'


because they are not logically equivalent (will return different results). A trivial example:



 with T (color_code, sale_date) as ( 
values ('red', '1970'), ('green','1969')
)
select * from T
where color_code in ('green', 'red')
and sale_date = '1970';

color_code | sale_date
------------+-----------
red | 1970
(1 row)


However:



with T (color_code, sale_date) as ( 
values ('red', '1970'), ('green','1969')
)
select * from T
where color_code = 'green' or color_code = 'red'
and sale_date = '1970';

color_code | sale_date
------------+-----------
red | 1970
green | 1969
(2 rows)


For the comparison to be meaningful, you need to change your query to:



select * from T 
where (color_code = 'green' or color_code = 'red')
and sale_date = '1970';


That said, I would suggest an index like:



CREATE INDEX ... ON ... (sale_date, color_code)






share|improve this answer












share|improve this answer



share|improve this answer










answered 7 hours ago









LennartLennart

14.1k2 gold badges13 silver badges43 bronze badges




14.1k2 gold badges13 silver badges43 bronze badges








  • 2





    I think you hit the point that the two queries in OP's post are not the same logically speaking, so the question from the OP is somewhat not valid (based on the results of the two logically different queries).

    – jyao
    6 hours ago














  • 2





    I think you hit the point that the two queries in OP's post are not the same logically speaking, so the question from the OP is somewhat not valid (based on the results of the two logically different queries).

    – jyao
    6 hours ago








2




2





I think you hit the point that the two queries in OP's post are not the same logically speaking, so the question from the OP is somewhat not valid (based on the results of the two logically different queries).

– jyao
6 hours ago





I think you hit the point that the two queries in OP's post are not the same logically speaking, so the question from the OP is somewhat not valid (based on the results of the two logically different queries).

– jyao
6 hours ago













1














I think the difference in timing you see is simply a caching effect, based on which query you ran first. It is probably not a real difference caused by how you specify the query (Although as Lennart described, your queries are not really equivalent as you are missing parentheses around the OR part--although all your rows seem to meet sale_date = '1970' anyway, so this difference is important in general, but doesn't make a difference in the exact example)



There some things you can do that might make both specifications of this query faster.



For one, look at this line:



     Heap Blocks: exact=38679 lossy=496680


This means that your work_mem is not large enough to hold the entire bitmap. All of those lossy blocks need to have every row in them rechecked, which takes time. Increasing work_mem would prevent this and should speed up the query. Ideally the lossy blocks would drop to zero (at which point the 'lossy' label wouldn't be displayed anymore.



Second, having an index on mytable (color_code, sale_date, total_count) could allow for index-only scans, because all needed data would be in the index and it wouldn't have to visit the table at all (assuming the table is kept well-vacuumed).



These are mutually exclusive: if you do the index-only scan rather than the bitmap scan, then work_mem no longer matters.






share|improve this answer


























  • My PostgreSQL runs on RDS. Should I just use a bigger RDS instance or is there a way to specifically increase work_mem. On that note...how would Heap Blocks: exact=38679 lossy=496680 look if I had a bigger work_mem would lossy number go down? Asking so that I can check after increasing work_mem. Can you please explain what you mean by well-vacuumed. Does that mean that no updates are done to the table? Also, after creating the index on (color_code, total_count, and sale_date) I'm noticing that no index is being used. It is only doing partial scan

    – Anthony
    8 hours ago











  • I've updated the question with results of adding an index on mytable(color_code, sale_date, total_count)

    – Anthony
    8 hours ago











  • You can change work_mem in RDS without changing your instance class. Of course you need to have enough memory to support the increased value, but the default setting of work_mem is conservative--unless you have a lot of simultaneous connections you should not have a problem increasing it.

    – jjanes
    5 hours ago











  • well-vacuumed here would mean that select relallvisible, relpages, relallvisible/relpages as ratio from pg_class where relname='mytable' is high. Say, above 0.90. Frequent updates might be OK, as long as it also gets frequent vacuums.

    – jjanes
    5 hours ago
















1














I think the difference in timing you see is simply a caching effect, based on which query you ran first. It is probably not a real difference caused by how you specify the query (Although as Lennart described, your queries are not really equivalent as you are missing parentheses around the OR part--although all your rows seem to meet sale_date = '1970' anyway, so this difference is important in general, but doesn't make a difference in the exact example)



There some things you can do that might make both specifications of this query faster.



For one, look at this line:



     Heap Blocks: exact=38679 lossy=496680


This means that your work_mem is not large enough to hold the entire bitmap. All of those lossy blocks need to have every row in them rechecked, which takes time. Increasing work_mem would prevent this and should speed up the query. Ideally the lossy blocks would drop to zero (at which point the 'lossy' label wouldn't be displayed anymore.



Second, having an index on mytable (color_code, sale_date, total_count) could allow for index-only scans, because all needed data would be in the index and it wouldn't have to visit the table at all (assuming the table is kept well-vacuumed).



These are mutually exclusive: if you do the index-only scan rather than the bitmap scan, then work_mem no longer matters.






share|improve this answer


























  • My PostgreSQL runs on RDS. Should I just use a bigger RDS instance or is there a way to specifically increase work_mem. On that note...how would Heap Blocks: exact=38679 lossy=496680 look if I had a bigger work_mem would lossy number go down? Asking so that I can check after increasing work_mem. Can you please explain what you mean by well-vacuumed. Does that mean that no updates are done to the table? Also, after creating the index on (color_code, total_count, and sale_date) I'm noticing that no index is being used. It is only doing partial scan

    – Anthony
    8 hours ago











  • I've updated the question with results of adding an index on mytable(color_code, sale_date, total_count)

    – Anthony
    8 hours ago











  • You can change work_mem in RDS without changing your instance class. Of course you need to have enough memory to support the increased value, but the default setting of work_mem is conservative--unless you have a lot of simultaneous connections you should not have a problem increasing it.

    – jjanes
    5 hours ago











  • well-vacuumed here would mean that select relallvisible, relpages, relallvisible/relpages as ratio from pg_class where relname='mytable' is high. Say, above 0.90. Frequent updates might be OK, as long as it also gets frequent vacuums.

    – jjanes
    5 hours ago














1












1








1







I think the difference in timing you see is simply a caching effect, based on which query you ran first. It is probably not a real difference caused by how you specify the query (Although as Lennart described, your queries are not really equivalent as you are missing parentheses around the OR part--although all your rows seem to meet sale_date = '1970' anyway, so this difference is important in general, but doesn't make a difference in the exact example)



There some things you can do that might make both specifications of this query faster.



For one, look at this line:



     Heap Blocks: exact=38679 lossy=496680


This means that your work_mem is not large enough to hold the entire bitmap. All of those lossy blocks need to have every row in them rechecked, which takes time. Increasing work_mem would prevent this and should speed up the query. Ideally the lossy blocks would drop to zero (at which point the 'lossy' label wouldn't be displayed anymore.



Second, having an index on mytable (color_code, sale_date, total_count) could allow for index-only scans, because all needed data would be in the index and it wouldn't have to visit the table at all (assuming the table is kept well-vacuumed).



These are mutually exclusive: if you do the index-only scan rather than the bitmap scan, then work_mem no longer matters.






share|improve this answer















I think the difference in timing you see is simply a caching effect, based on which query you ran first. It is probably not a real difference caused by how you specify the query (Although as Lennart described, your queries are not really equivalent as you are missing parentheses around the OR part--although all your rows seem to meet sale_date = '1970' anyway, so this difference is important in general, but doesn't make a difference in the exact example)



There some things you can do that might make both specifications of this query faster.



For one, look at this line:



     Heap Blocks: exact=38679 lossy=496680


This means that your work_mem is not large enough to hold the entire bitmap. All of those lossy blocks need to have every row in them rechecked, which takes time. Increasing work_mem would prevent this and should speed up the query. Ideally the lossy blocks would drop to zero (at which point the 'lossy' label wouldn't be displayed anymore.



Second, having an index on mytable (color_code, sale_date, total_count) could allow for index-only scans, because all needed data would be in the index and it wouldn't have to visit the table at all (assuming the table is kept well-vacuumed).



These are mutually exclusive: if you do the index-only scan rather than the bitmap scan, then work_mem no longer matters.







share|improve this answer














share|improve this answer



share|improve this answer








edited 5 hours ago

























answered 8 hours ago









jjanesjjanes

16k1 gold badge10 silver badges19 bronze badges




16k1 gold badge10 silver badges19 bronze badges













  • My PostgreSQL runs on RDS. Should I just use a bigger RDS instance or is there a way to specifically increase work_mem. On that note...how would Heap Blocks: exact=38679 lossy=496680 look if I had a bigger work_mem would lossy number go down? Asking so that I can check after increasing work_mem. Can you please explain what you mean by well-vacuumed. Does that mean that no updates are done to the table? Also, after creating the index on (color_code, total_count, and sale_date) I'm noticing that no index is being used. It is only doing partial scan

    – Anthony
    8 hours ago











  • I've updated the question with results of adding an index on mytable(color_code, sale_date, total_count)

    – Anthony
    8 hours ago











  • You can change work_mem in RDS without changing your instance class. Of course you need to have enough memory to support the increased value, but the default setting of work_mem is conservative--unless you have a lot of simultaneous connections you should not have a problem increasing it.

    – jjanes
    5 hours ago











  • well-vacuumed here would mean that select relallvisible, relpages, relallvisible/relpages as ratio from pg_class where relname='mytable' is high. Say, above 0.90. Frequent updates might be OK, as long as it also gets frequent vacuums.

    – jjanes
    5 hours ago



















  • My PostgreSQL runs on RDS. Should I just use a bigger RDS instance or is there a way to specifically increase work_mem. On that note...how would Heap Blocks: exact=38679 lossy=496680 look if I had a bigger work_mem would lossy number go down? Asking so that I can check after increasing work_mem. Can you please explain what you mean by well-vacuumed. Does that mean that no updates are done to the table? Also, after creating the index on (color_code, total_count, and sale_date) I'm noticing that no index is being used. It is only doing partial scan

    – Anthony
    8 hours ago











  • I've updated the question with results of adding an index on mytable(color_code, sale_date, total_count)

    – Anthony
    8 hours ago











  • You can change work_mem in RDS without changing your instance class. Of course you need to have enough memory to support the increased value, but the default setting of work_mem is conservative--unless you have a lot of simultaneous connections you should not have a problem increasing it.

    – jjanes
    5 hours ago











  • well-vacuumed here would mean that select relallvisible, relpages, relallvisible/relpages as ratio from pg_class where relname='mytable' is high. Say, above 0.90. Frequent updates might be OK, as long as it also gets frequent vacuums.

    – jjanes
    5 hours ago

















My PostgreSQL runs on RDS. Should I just use a bigger RDS instance or is there a way to specifically increase work_mem. On that note...how would Heap Blocks: exact=38679 lossy=496680 look if I had a bigger work_mem would lossy number go down? Asking so that I can check after increasing work_mem. Can you please explain what you mean by well-vacuumed. Does that mean that no updates are done to the table? Also, after creating the index on (color_code, total_count, and sale_date) I'm noticing that no index is being used. It is only doing partial scan

– Anthony
8 hours ago





My PostgreSQL runs on RDS. Should I just use a bigger RDS instance or is there a way to specifically increase work_mem. On that note...how would Heap Blocks: exact=38679 lossy=496680 look if I had a bigger work_mem would lossy number go down? Asking so that I can check after increasing work_mem. Can you please explain what you mean by well-vacuumed. Does that mean that no updates are done to the table? Also, after creating the index on (color_code, total_count, and sale_date) I'm noticing that no index is being used. It is only doing partial scan

– Anthony
8 hours ago













I've updated the question with results of adding an index on mytable(color_code, sale_date, total_count)

– Anthony
8 hours ago





I've updated the question with results of adding an index on mytable(color_code, sale_date, total_count)

– Anthony
8 hours ago













You can change work_mem in RDS without changing your instance class. Of course you need to have enough memory to support the increased value, but the default setting of work_mem is conservative--unless you have a lot of simultaneous connections you should not have a problem increasing it.

– jjanes
5 hours ago





You can change work_mem in RDS without changing your instance class. Of course you need to have enough memory to support the increased value, but the default setting of work_mem is conservative--unless you have a lot of simultaneous connections you should not have a problem increasing it.

– jjanes
5 hours ago













well-vacuumed here would mean that select relallvisible, relpages, relallvisible/relpages as ratio from pg_class where relname='mytable' is high. Say, above 0.90. Frequent updates might be OK, as long as it also gets frequent vacuums.

– jjanes
5 hours ago





well-vacuumed here would mean that select relallvisible, relpages, relallvisible/relpages as ratio from pg_class where relname='mytable' is high. Say, above 0.90. Frequent updates might be OK, as long as it also gets frequent vacuums.

– jjanes
5 hours ago


















draft saved

draft discarded




















































Thanks for contributing an answer to Database Administrators Stack Exchange!


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f243790%2fhow-to-optimize-in-query-on-indexed-column%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

Hudson River Historic District Contents Geography History The district today Aesthetics Cultural...

The number designs the writing. Feandra Aversely Definition: The act of ingrafting a sprig or shoot of one...

Ayherre Geografie Demografie Externe links Navigatiemenu43° 23′ NB, 1° 15′ WL43° 23′ NB, 1°...