Managing Row Counts Using Triggers
It seems like this is a subject that I keep coming back to. Every situation is unique, and demands it’s own solution; so far, there’s no one-size-fits all solution for working around COUNT(*) in PostgreSQL. It some situations, there is no need for a workaround, and the performance hit of COUNT(*) is neglectable (this is generally true if you have under 2000 rows, but it mainly depends on the server).
This time, the problem I ran into is how to maintain the row count for how many friends a person has. Let’s say we have 2000 users, and each user has 30 friends. That would be 60,000 rows, and out of the range of what COUNT(*) can handle. The solution here is to create a count index table, that tracks how many friends each user has. The friends table looks something like this:
CREATE TABLE friends (
your_id BIGINT NOT NULL,
friend_id BIGINT NOT NULL,
CONSTRAINT friends_chk_your_id FOREIGN KEY
(your_id) REFERENCES members (id) DEFERRABLE,
CONSTRAINT friends_chk_friend_id FOREIGN KEY
(friend_id) REFERENCES members (id) DEFERRABLE
);
CREATE INDEX friends_your_id_idx ON friends (your_id);
CREATE INDEX friends_friend_id_idx ON friends (friend_id);
CREATE UNIQUE INDEX friends_unq_id_idx ON friends (your_id, friend_id);
Nothing special, just your member id, and your friends member id, with a unique index to make sure there’s no duplicate entries, and some foreign keys to make sure the user id’s are valid. I want to keep track of how many friends a user has, so that would be a simple two column table:
CREATE TABLE friends_count_idx (
num_rows INTEGER NOT NULL,
your_id BIGINT NOT NULL,
PRIMARY KEY(your_id),
CONSTRAINT friends_count_idx_chkcount CHECK (num_rows >= 0)
);
CREATE INDEX friends_count_idx_num_rows ON friends_count_idx (num_rows);
Pretty simple, right? A primary key for your_id, and a check constraint to make sure you don’t go under 0 rows. Now here’s where the fun begins.
I want a function to initialize the friends_count_idx table, as well as clean it up if something ever happens and it starts to drift. So basically, we’re looking at a maintenance function that deletes any existing records, as well as inserts records based on what’s in friends:
CREATE OR REPLACE FUNCTION friends_count_idx_materialize ()
RETURNS VARCHAR AS $$
DECLARE
rows INTEGER;
BEGIN
-- Delete old records
DELETE FROM friends_count_idx;
INSERT INTO friends_count_idx (num_rows, your_id)
SELECT COUNT(*), your_id FROM friends
GROUP BY your_id;
GET DIAGNOSTICS rows = ROW_COUNT;
RETURN rows || ' Rows Processed';
END;
$$ LANGUAGE plpgsql;
SELECT friends_count_idx_materialize();
--------------------
53 Rows Processed
(1 row)
SELECT * FROM friends_count_idx LIMIT 1;
num_rows | your_id
----------+---------
23 | 14
(1 row)
Now we’re getting somewhere… But I also need a set of triggers that handle inserts and deletes. They also need to be able to auto-create a record in friends_count_idx for new users, who are adding their first friend. Still, nothing too hard, but time consuming to write and debug:
CREATE OR REPLACE FUNCTION friends_count_idx_insert ()
RETURNS TRIGGER AS $$
DECLARE
r RECORD;
BEGIN
SELECT INTO r num_rows FROM friends_count_idx
WHERE your_id = NEW.your_id;
-- Auto Create Record
IF NOT FOUND THEN
INSERT INTO friends_count_idx (num_rows, your_id)
SELECT COUNT(*), your_id FROM friends
WHERE your_id = NEW.your_id GROUP BY your_id;
RETURN NEW;
END IF;
-- Update Existing Record
UPDATE friends_count_idx SET num_rows = (r.num_rows + 1)
WHERE your_id = NEW.your_id;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION friends_count_idx_delete ()
RETURNS TRIGGER AS $$
DECLARE
r RECORD;
rows INTEGER;
BEGIN
SELECT INTO r num_rows FROM friends_count_idx
WHERE your_id = OLD.your_id;
-- Auto Create Record
IF NOT FOUND THEN
INSERT INTO friends_count_idx (num_rows, your_id)
SELECT COUNT(*), your_id FROM friends
WHERE your_id = OLD.your_id GROUP BY your_id;
RETURN OLD;
END IF;
rows = r.num_rows - 1;
-- Check for row count drift
IF rows <= 0 THEN
RETURN OLD;
END IF;
-- Update Existing Record
UPDATE friends_count_idx SET num_rows = (r.num_rows - 1)
WHERE your_id = OLD.your_id;
RETURN OLD;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER friends_count_idx_insert AFTER INSERT ON friends
FOR EACH ROW EXECUTE PROCEDURE friends_count_idx_insert();
CREATE TRIGGER friends_count_idx_delete AFTER DELETE ON friends
FOR EACH ROW EXECUTE PROCEDURE friends_count_idx_delete();
SELECT * FROM friends_count_idx WHERE your_id = 3;
num_rows | your_id
----------+---------
3 | 3
(1 row)
INSERT INTO friends (your_id, friend_id) VALUES (3, 7);
SELECT * FROM friends_count_idx WHERE your_id = 3;
num_rows | your_id
----------+---------
4 | 3
(1 row)
DELETE FROM friends WHERE your_id = 3 AND friend_id = 7;
SELECT * FROM friends_count_idx WHERE your_id = 3;
num_rows | your_id
----------+---------
3 | 3
(1 row)
So, it’s running smoothly. It could still probably use some more debugging, just to make sure every possible situation is covered, but the skeleton is working fine.
