PostgreSQL Slow Count() Workaround
You may also be interested in this. It’s a far more elegant solution, if you only need the entire number of rows in a table.
MySQL zealots tend to point to PostgreSQL’s slow count() as a weakness, however, in the real world, count() isn’t used very often, and if it’s really needed, most good database systems provide a framework for you to build a workaround. The reason MySQL has fast counts, is because, they’re cached. This works in MySQL because MySQL uses serialized INSERTs, but higher grade DBs are transactional, and PostgreSQL uses MVCC, so caching row counts produces inaccurate results.
Here’s an example count() workaround for PostgreSQL, using a combination of PL/PgSQL and TRIGGERs. You could certainly extend this model, adding more columns to the cached rows table, so you can simulate SELECT count(id) WHERE foo = ‘bar’, although you’d have to add a TRIGGER for UPDATEs.
--- This table will be used as a dummy table,
--- that we wan't to get a count() from.
CREATE TABLE tbl1 (
id INTEGER
);
--- This table is where the cached counts will be stored
CREATE TABLE tbl1_num_rows (
num_rows INTEGER NOT NULL
);
--- Function to update the count in tbl1_num_rows,
--- on an INSERT
CREATE OR REPLACE FUNCTION tbl1_up_rows_insert ()
RETURNS TRIGGER AS $$
DECLARE
row RECORD;
BEGIN
--- Auto-add record
SELECT INTO row num_rows FROM tbl1_num_rows;
IF NOT FOUND THEN
INSERT INTO tbl1_num_rows VALUES (0);
END IF;
UPDATE tbl1_num_rows SET num_rows =
((SELECT num_rows FROM tbl1_num_rows) + 1);
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
--- Function to update the count in tbl1_num_rows,
--- on a DELETE
CREATE OR REPLACE FUNCTION tbl1_up_rows_delete ()
RETURNS TRIGGER AS $$
DECLARE
row RECORD;
BEGIN
--- Auto-add record
SELECT INTO row num_rows FROM tbl1_num_rows;
IF NOT FOUND THEN
INSERT INTO tbl1_num_rows VALUES (0);
ELSE
IF row.num_rows > 0 THEN
UPDATE tbl1_num_rows SET num_rows =
((SELECT num_rows FROM tbl1_num_rows) - 1);
END IF;
END IF;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
--- Add the TRIGGERS to tbl1
CREATE TRIGGER tbl1_up_rows_on_insert AFTER INSERT ON tbl1 FOR EACH ROW
EXECUTE PROCEDURE tbl1_up_rows_insert();
CREATE TRIGGER tbl1_up_rows_on_delete AFTER DELETE ON tbl1 FOR EACH ROW
EXECUTE PROCEDURE tbl1_up_rows_delete();

December 20th, 2007 at 5:37 am
Hi! I’m currently learning PL/PGSQL for the very same issue as well. It should be possible to reduce your code into one function by making use of the variable TG_OP which is on of INSERT, UPDATE or DELETE (depending on your create function definition):
IF TG_OP = ‘INSERT’ THEN
… do the insert stuff
ELSE
… do the delete stuff
END IF;
Furthermore in your delete function, you’re fetching the row record twice. You could use the first row record to determine whether num_rows is already at 0 (if found) and then skipping the update statement at all. Thereby the second select into row statement wouldn’t be needed:
SELECT INTO row num_rows …
IF NOT FOUND THEN
INSERT …
ELSE
IF row.num_rows > 0 THEN
UPDATE tbl1_num_rows SET num_rows = ((SELECT …
END IF;
END IF;
Yours,
Lars
December 20th, 2007 at 10:19 am
Thanks for the suggestions, I cleaned up the delete function! Like I said in the post, it’s merely an example. I wrote the entire thing in like 10 minutes
If I was writing this for an actual production database, I would certainly clean it up.
I actually wrote a cool little system, using PL/PGSQL and TRIGGERS just last week, to keep track of ratings using the same basic logic. It’s pretty damn cool, it basically works like this:
new_rating = (((current_rating * number_of_votes) + submitted_rating) / (number_of_votes + 1))
It keeps an average rating, along with the number of votes, cached in a table without even touching the actual ratings table.
January 29th, 2008 at 8:21 am
A simpler solution consists on using VACUUM and the catalog tables.
SELECT reltuples FROM pg_tables WHERE relname = ‘tbl1′
You have to VACUUM ANALYZE the table in question in order to update the catalog.
January 29th, 2008 at 8:23 am
Wrong query, it has to say:
SELECT reltuples FROM pg_class WHERE relname = ‘tbl1′ AND relkind = ‘r’
January 29th, 2008 at 9:46 am
Ricardo,
Yes, I am aware of that, and if you look around my blog, you’ll see that query at least once.
However, there’s several problems with using that method:
1) The row count isn’t updated until you run VACUUM ANALYZE, so it can’t be used for anything that requires the exact number of rows. Also, I have projects where people add/delete rows every couple minutes, so that produces wildly inaccurate results.
2) That query can’t be used to emulate a “WHERE foo = ‘bar’” query, ie, you can’t do something similar to SELECT COUNT(*) AS num_rows FROM tbl1 WHERE foo = ‘bar’, whereas with the other method, it can be extended to do that fairly easily.
I do use that query sometimes, it’s just not feasible for all situations.
March 22nd, 2008 at 12:02 am
I read on http://www.wikivs.com/wiki/MySQL_vs_PostgreSQL (which is how I got here) that “it goes through the entire table sequentially”.
Can you give a database idiot (that would be me) an idea of what quantity of information it’s pulling from disk? Is it pulling in headers, or PKs from each record, or does COUNT(*) read the entire contents of the table from disk?
thanks!
March 22nd, 2008 at 8:20 am
Ken,
It depends on the situation. If you give it a qualifier (ie. WHERE col1 = ‘foo’), then it will read an index, however, it will still have to check each row to see if it’s visible to the current transaction, since that information isn’t stored in index, plus if the WHERE qualifies too many rows, say 600, then it will drop down to a sequential scan. If you do simply SELECT COUNT(*) FROM table, then it will perform a sequential scan on the entire table (see below).
There’s essentially two ways to read information from a table, you can do a sequential scan, or an index scan. What a sequential scan does, is scan every single row in the table, which as you may expect, is a lot of information and thus causes the query to be slow. The other type of scan is an index scan, which reads the information from an index, which is a lot smaller than the actual table, and is much faster. For more information on index scans and sequential scans, wikipedia is your friend.
Also, note that not all tables will need a trigger to emulate COUNT(*). In my experience, for Web Applications, as long as you have under 2000-8000 rows (depending on the server), you’re in the clear, and COUNT(*) really shouldn’t be an issue. Tip: use EXPLAIN ANALYZE to see if you really need a trigger. As long as you don’t go over ~10-20ms for the query, there shouldn’t be a problem for a Web Application. FYI, I have a 3000 row table that takes ~1.2ms to scan with COUNT(*), there’s never been a problem with speed in the place it’s used, and all of the queries combined take ~15ms. Typically, there’s only a problem once you start getting more than 10,000 rows.
June 19th, 2008 at 3:29 pm
I couldn’t understand some parts of this article PostgreSQL Slow Count() Workaround, but I guess I just need to check some more resources regarding this, because it sounds interesting.