Archive for November, 2007

Preserving Database Compatibiliy

Friday, November 30th, 2007

There’s always a lot of talk about preserving database compatibility, creating database abstraction layers, providing syntax compatibility, etc. At some point, you have to ask yourself, is it worth the effort? In most cases, it’s not. Although I highly recommend database abstraction layers, like PEAR::MDB2, unless you really need to have a project run on more than one database system, trying to keep compatibility is just silly. The reason is, MySQL, PostgreSQL, and Oracle all differ in several major ways. For instance, PostgreSQL and Oracle use Sequences, and MySQL uses it’s own auto_increment for primary keys, and the name of datatypes in Oracle differ greatly from the names of datatypes in MySQL and PostgreSQL. Also, each of those databases differ in behavior, such as MySQL having loose standards, and PostgreSQL and Oracle having strict standards, which can be seen with INSERTs on MySQL, where it will accept data that doesn’t meet the restrictions put on the column.

Keep in mind that in order to keep compatibility, you have to sacrifice speed. Now, for most small projects, sacrificing speed isn’t that big of an issue, but for large high traffic websites, sacrificing speed costs a lot, in server requirements, scalability, and user experience.

If you’re thinking of making a cross-database project, ask yourself these questions:

  • Are you expecting your project to be used on a wide variety of servers?
  • Are you expecting a large base of people who rely on your project?
  • Will making your project only compatible with a single database hurt it?
  • Will your project be publicly available, or sold to several different clients?

If you answer yes to one of these, than making your project cross-database compatible may be a good idea. Also, if you’re making something like a forum or blog package, than cross-database compatibility is probably a good idea.

PostgreSQL And Timestamps Inside Transactions

Sunday, November 18th, 2007

Calling now(), or current_timestamp within a transaction always returns the same value (the time the transaction started I think). Here’s a workaround that I’m using:

BEGIN;

SELECT timeofday()::timestamptz;

COMMIT;

Working Around ORDER BY RANDOM()

Monday, November 5th, 2007

Here’s what I’m doing to work around ORDER BY RANDOM() in PostgreSQL, which is generally slow:

  1. Get the maximum value for the primary key, eg, “SELECT MAX(id) FROM tbl”
  2. Get a random number within that range. In PHP, you would do “$rand_id = rand(1, $max_id);”
  3. Use the random number to fetch data, eg, “SELECT * FROM tbl WHERE id >= $max_id LIMIT 1″

I use “>=” as opposed to “=” to ensure that, if there’s any holes in the ID’s, it wont return an empty dataset.

Of course, this is only useful when obtaining a single record, as the starting position is the only thing that’s random. If you need any more records, you’d have to get a new random number and perform another query.

A quick and dirty benchmark shows a pretty good performance increase. On a 600 row table with PostgreSQL 8.2, ORDER BY RANDOM takes ~2.1ms, where the other method takes between 0.200ms and 0.300ms.

Optionally, you could replace “SELECT MAX(id) FROM tbl” with “SELECT last_value FROM tbl_seq” for a performance increase, assuming you know the name of the sequence for the table you’re working on.