Indexing Functions With PostgreSQL
In PostgreSQL, it’s possible to create expressional indexes. In other words, indexes such as:
CREATE INDEX varchar_column_idx ON bar (UPPER(varchar_column));
It is possible to do this on any function that is marked as IMMUTABLE. IMMUTABLE just means that the function will always return the same result when given the same input. This is very powerful, as you can create indexes on user created functions, which can dramatically improve performance.
For instance, lets say I need to group some data by week, with every week starting on Sunday. I could do:
CREATE OR REPLACE FUNCTION find_previous_sunday
(TIMESTAMP) RETURNS DATE AS $$
BEGIN
RETURN
($1 + (-extract(dow from $1) || ' days')::interval)::date;
END;
$$ LANGUAGE plpgsql IMMUTABLE;
Note, that you can always do extract(week from timestamp), but there isn’t much flexibility there, and clients are often picky about when they want their weeks to start, plus this is only an example. Onwards to the benchmark:
---
--- Test without indexing
---
EXPLAIN ANALYZE SELECT
find_previous_sunday(entry_time) AS week_start,
SUM(distance)
FROM tbl
WHERE user_id = 1
GROUP BY week_start
ORDER BY week_start DESC limit 1;
QUERY PLAN
-------------------------------
Limit (... loops=1)
-> GroupAggregate (...)(actual time=1.561..1.561...)
-> Sort (...) (actual time=1.556..1.556 rows=2...)
Sort Key: find_previous_sunday(entry_time)
-> Index Scan using tbl_user_id on tbl
(...) (actual time=0.060..1.388 rows=128...)
Index Cond: (user_id = 1)
Total runtime: 1.651 ms
(7 rows)
---
--- Index find_previous_sunday function, and run test again.
---
CREATE INDEX tbl_week_start ON tbl
(find_previous_sunday(entry_time));
VACUUM FULL ANALYZE;
EXPLAIN ANALYZE SELECT
find_previous_sunday(entry_time) AS week_start,
SUM(distance)
FROM tbl
WHERE user_id = 1
GROUP BY week_start
ORDER BY week_start DESC limit 1;
QUERY PLAN
-------------------------------
Limit (...) (actual time=0.062..0.062 rows=1 loops=1)
-> GroupAggregate (...) (actual time=0.062..0.062...)
-> Index Scan Backward using tbl_week_start on tbl
(...) (actual time=0.042..0.057 rows=2...)
Filter: (user_id = 1)
Total runtime: 0.096 ms
(5 rows)
1.651ms without indexing, 0.096ms with. Note that there’s only about 200 rows in the test table, and performance without indexing scales very poorly as the number of rows increases.
