PostgreSQL: Handling Ratings, Without Aggregates
September 26th, 2008This is something you shouldn’t have to learn the hard way: How to create a ratings system without using aggregate functions. Specifically, what I’m talking about by an aggregate, is the AVG() SQL function. The problem with using AVG(), is it reads every row in the table, which works well when you’re not dealing with very many rows, but as your data grows, the query speed progressively slows down. This is because aggregate functions essentially perform a sequential scan on the table. The solution for this, is to store the average for each individual item in a table, and use simple math calculations for updates.
Such a table may look something like this:
CREATE TABLE item_ratings (
item_id INTEGER NOT NULL PRIMARY KEY,
rating DOUBLE PRECISION,
votes INTEGER NOT NULL DEFAULT 0
);
CREATE INDEX item_ratings_rating_key ON item_ratings (rating);
CREATE INDEX item_ratings_votes_key ON item_ratings (votes);
There’s a number of options for adding the initial entry, that updates will eventually be done on, such as doing it in code or via a trigger/rule. For the sake of simplicity, lets say there’s already a blank entry for item #1.
INSERT INTO item_ratings (item_id, rating, votes) VALUES (1, 0, 0);
Now, instead of adding a new entry for each vote cast, you simply update the previous entry. Lets say someone casts a vote, giving it a 4 star rating. That would be handled as such:
UPDATE item_ratings SET
rating = ((rating * votes) + 4) / (votes + 1),
votes = (votes + 1)
WHERE item_id = 1;
SELECT * FROM item_ratings WHERE item_id = 1;
item_id | rating | votes
---------+--------+-------
1 4 1
That’s pretty much it. Now all ratings are stored per item, instead of generated by an aggregate. This can save major headaches when dealing with large datasets, since AVG() in PostgreSQL tends to be slow when working with such datasets.
