Archive for December, 2007

PostgreSQL Slow Count() Workaround

Wednesday, December 12th, 2007

You may also be interested in this. It’s a far more elegant solution, if you only need the entire number of rows in a table.


MySQL zealots tend to point to PostgreSQL’s slow count() as a weakness, however, in the real world, count() isn’t used very often, and if it’s really needed, most good database systems provide a framework for you to build a workaround. The reason MySQL has fast counts, is because, they’re cached. This works in MySQL because MySQL uses serialized INSERTs, but higher grade DBs are transactional, and PostgreSQL uses MVCC, so caching row counts produces inaccurate results.

Here’s an example count() workaround for PostgreSQL, using a combination of PL/PgSQL and TRIGGERs. You could certainly extend this model, adding more columns to the cached rows table, so you can simulate SELECT count(id) WHERE foo = ‘bar’, although you’d have to add a TRIGGER for UPDATEs.


--- This table will be used as a dummy table,
--- that we wan't to get a count() from.
CREATE TABLE tbl1 (
    id INTEGER
);

--- This table is where the cached counts will be stored
CREATE TABLE tbl1_num_rows (
    num_rows INTEGER NOT NULL
);

--- Function to update the count in tbl1_num_rows,
--- on an INSERT
CREATE OR REPLACE FUNCTION tbl1_up_rows_insert ()
RETURNS TRIGGER AS $$
    DECLARE
        row RECORD;
    BEGIN
        --- Auto-add record
        SELECT INTO row num_rows FROM tbl1_num_rows;
        IF NOT FOUND THEN
            INSERT INTO tbl1_num_rows VALUES (0);
        END IF;

        UPDATE tbl1_num_rows SET num_rows =
		((SELECT num_rows FROM tbl1_num_rows) + 1);

        RETURN NEW;
    END;
$$ LANGUAGE plpgsql;

--- Function to update the count in tbl1_num_rows,
--- on a DELETE
CREATE OR REPLACE FUNCTION tbl1_up_rows_delete ()
RETURNS TRIGGER AS $$
    DECLARE
        row RECORD;
    BEGIN
        --- Auto-add record
        SELECT INTO row num_rows FROM tbl1_num_rows;
        IF NOT FOUND THEN
            INSERT INTO tbl1_num_rows VALUES (0);
        ELSE
            IF row.num_rows > 0 THEN
                UPDATE tbl1_num_rows SET num_rows =
                    ((SELECT num_rows FROM tbl1_num_rows) - 1);
            END IF;
        END IF;

        RETURN NEW;
    END;
$$ LANGUAGE plpgsql;

--- Add the TRIGGERS to tbl1
CREATE TRIGGER tbl1_up_rows_on_insert AFTER INSERT ON tbl1 FOR EACH ROW
EXECUTE PROCEDURE tbl1_up_rows_insert();

CREATE TRIGGER tbl1_up_rows_on_delete AFTER DELETE ON tbl1 FOR EACH ROW
EXECUTE PROCEDURE tbl1_up_rows_delete();

Why The Statement “MySQL Is Faster Than PostgreSQL” Is Wrong

Friday, December 7th, 2007

I’ve come across a lot of hardcore MySQL fanboys who say they use MySQL because it’s faster than PostgreSQL. On a certain level, this statement is wrong. While I agree that, generally speaking, MySQL is faster than PostgreSQL, you have to understand that developers like that don’t take load into account. Most of the benchmarks they cite aren’t real world benchmarks; they test how fast data is delivered from the database, but don’t test how well the database performs under high load. In that respect, PostgreSQL beats MySQL hands down on multi-core, multi-processor, real-world servers, while MySQL crumbles under the load. This is due to the way PostgreSQL is built; PostgreSQL is an excellent example of good multi-threaded programming, and it scales almost linearly up to about 16 cores (as of version 8.0). And honestly, who cares if MySQL can deliver a simple query 0.01ms faster than PostgreSQL under no load? Promoting your database of choice with that argument is childish.

But that’s not the reason I like PostgreSQL. I like it because it offers a wealth of features that make my life a whole lot easier, it allows for better data integrity than MySQL, and it does all that with far less “gotcha’s” than MySQL.