Archive for the ‘Tech’ Category

Aggressive Caching With Memcached

Sunday, August 17th, 2008

There’s many different methods to use caching, with memcached. I call this one the Aggressive approach.

The basic logic is this: if you don’t even touch the database, you will be able to serve far more requests, than if you did. The biggest downside is, you can’t really do things too dynamically, and really, this only works well for flat pages (ie., it’s not for anything that displays content, based on the user).

In my case, I added the following code (shown as pseudo code) to the main/controller/dispatch script:


$memcache = new Memcache;
$memcache->connect(...);
$data = $memcache->get($_SERVER['REQUEST_URI'])

if ($data) {
    echo $data;
    die();
 }


$db = new PDO(...);
ob_start();
ob_implicit_flush(false);
...
... do whatever ...
...
$data = ob_get_contents();
$memcache->set($_SERVER['REQUEST_URI'], $data, 0, 3600);
ob_end_flush(); 

One of the biggest downsides to this approach, is figuring out how to delete cached objects when something’s updated. If you have a single table that holds the pages that you’re caching, however, you can just add a hook that deletes the cached object, by URI, when anything is written that would affect it.

This really doesn’t work well for anything super dynamic, but it’s great for increasing performance in areas that would otherwise waste resources.

Comcasts HD Lineup Sucks

Sunday, February 17th, 2008

It’s plain and simple: Comcasts HD lineup sucks.I live in the Tacoma, WA area, and Comcast’s HD lineup here is absolute crap. I have, maybe, 20 HD channels. For Christ’s sake, Dish Network has more HD than Comcast, and I am pissed off because Comcast keeps running these commercials, saying they have more HD than anyone else. LIES!Let’s run through a couple HD channels that DirectTV has, and Comcast doesn’t:

  • CNN HD
  • TLC HD
  • Discovery HD
  • Nickelodeon HD
  • Cartoon Network HD
  • FX HD
  • Spike HD
  • SciFi HD
  • CMT HD
  • MTV HD
  • TBS HD
  • CNBC HD
  • Bravo HD
  • Food Network HD
  • Versus HD
  • Science Channel HD
  • MGM HD
  • Speed HD
  • Movie Channel HD
  • Fox Business HD

Comcast started rolling out new HD channels in the Philadelphia area, and they promote those channels in their advertisements, even though that’s the only area that has those channels. I’m fed up with this BS.

PHP And pg_pconnect - Persistent Connections Suck

Thursday, January 10th, 2008

Lately, a server I’ve been maintaining has been running out of database connections. Upon further investigation, the culprit is clearly PHP’s pg_pconnect function. Persistent connections with PHP is buggy to say the least, and it’s often mistaken for connection pooling. Connection pooling is different from the persistent connection implementation that PHP uses, since persistent connections are only per Apache process. That means that each Apache process (likely per user), has it’s own pool of connections that only it can use. What’s worse is the connection isn’t closed until the Apache process times out, so most of the open connections are idle. The flaw in this design is astonishing. Right now, there’s 38 open database connections and only two of them are actively being used. Also, by default, PHP has pgsql.max_persistent set to -1, meaning each Apache process is allowed unlimited database connections. If you’re set on using persistent connections over pooling, you should set that to no more than 1 or 2.

The real solution is to use a “middleman” process to provide true connection pooling. This can easily be done easily with pgpool.

This is the pgpool configuration I’m using right now:

listen_addresses = ‘localhost’
port = 5433
socket_dir = ‘/tmp’
backend_host_name = ‘localhost’
backend_port = 5432
backend_socket_dir = ‘/var/run/postgresql’

Now, the only change you have to make to your PHP projects, is to use pg_connect instead of pg_pconnect, and connect to port 5433, instead of 5432. That’s it! Pgpool will automatically pool your database connections, and you will avoid running out of connections!

hmnos ocea
uogu rfsinlduhbg

PostgreSQL Slow Count() Workaround

Wednesday, December 12th, 2007

You may also be interested in this. It’s a far more elegant solution, if you only need the entire number of rows in a table.


MySQL zealots tend to point to PostgreSQL’s slow count() as a weakness, however, in the real world, count() isn’t used very often, and if it’s really needed, most good database systems provide a framework for you to build a workaround. The reason MySQL has fast counts, is because, they’re cached. This works in MySQL because MySQL uses serialized INSERTs, but higher grade DBs are transactional, and PostgreSQL uses MVCC, so caching row counts produces inaccurate results.

Here’s an example count() workaround for PostgreSQL, using a combination of PL/PgSQL and TRIGGERs. You could certainly extend this model, adding more columns to the cached rows table, so you can simulate SELECT count(id) WHERE foo = ‘bar’, although you’d have to add a TRIGGER for UPDATEs.


--- This table will be used as a dummy table,
--- that we wan't to get a count() from.
CREATE TABLE tbl1 (
    id INTEGER
);

--- This table is where the cached counts will be stored
CREATE TABLE tbl1_num_rows (
    num_rows INTEGER NOT NULL
);

--- Function to update the count in tbl1_num_rows,
--- on an INSERT
CREATE OR REPLACE FUNCTION tbl1_up_rows_insert ()
RETURNS TRIGGER AS $$
    DECLARE
        row RECORD;
    BEGIN
        --- Auto-add record
        SELECT INTO row num_rows FROM tbl1_num_rows;
        IF NOT FOUND THEN
            INSERT INTO tbl1_num_rows VALUES (0);
        END IF;

        UPDATE tbl1_num_rows SET num_rows =
		((SELECT num_rows FROM tbl1_num_rows) + 1);

        RETURN NEW;
    END;
$$ LANGUAGE plpgsql;

--- Function to update the count in tbl1_num_rows,
--- on a DELETE
CREATE OR REPLACE FUNCTION tbl1_up_rows_delete ()
RETURNS TRIGGER AS $$
    DECLARE
        row RECORD;
    BEGIN
        --- Auto-add record
        SELECT INTO row num_rows FROM tbl1_num_rows;
        IF NOT FOUND THEN
            INSERT INTO tbl1_num_rows VALUES (0);
        ELSE
            IF row.num_rows > 0 THEN
                UPDATE tbl1_num_rows SET num_rows =
                    ((SELECT num_rows FROM tbl1_num_rows) - 1);
            END IF;
        END IF;

        RETURN NEW;
    END;
$$ LANGUAGE plpgsql;

--- Add the TRIGGERS to tbl1
CREATE TRIGGER tbl1_up_rows_on_insert AFTER INSERT ON tbl1 FOR EACH ROW
EXECUTE PROCEDURE tbl1_up_rows_insert();

CREATE TRIGGER tbl1_up_rows_on_delete AFTER DELETE ON tbl1 FOR EACH ROW
EXECUTE PROCEDURE tbl1_up_rows_delete();

Why The Statement “MySQL Is Faster Than PostgreSQL” Is Wrong

Friday, December 7th, 2007

I’ve come across a lot of hardcore MySQL fanboys who say they use MySQL because it’s faster than PostgreSQL. On a certain level, this statement is wrong. While I agree that, generally speaking, MySQL is faster than PostgreSQL, you have to understand that developers like that don’t take load into account. Most of the benchmarks they cite aren’t real world benchmarks; they test how fast data is delivered from the database, but don’t test how well the database performs under high load. In that respect, PostgreSQL beats MySQL hands down on multi-core, multi-processor, real-world servers, while MySQL crumbles under the load. This is due to the way PostgreSQL is built; PostgreSQL is an excellent example of good multi-threaded programming, and it scales almost linearly up to about 16 cores (as of version 8.0). And honestly, who cares if MySQL can deliver a simple query 0.01ms faster than PostgreSQL under no load? Promoting your database of choice with that argument is childish.

But that’s not the reason I like PostgreSQL. I like it because it offers a wealth of features that make my life a whole lot easier, it allows for better data integrity than MySQL, and it does all that with far less “gotcha’s” than MySQL.

Preserving Database Compatibiliy

Friday, November 30th, 2007

There’s always a lot of talk about preserving database compatibility, creating database abstraction layers, providing syntax compatibility, etc. At some point, you have to ask yourself, is it worth the effort? In most cases, it’s not. Although I highly recommend database abstraction layers, like PEAR::MDB2, unless you really need to have a project run on more than one database system, trying to keep compatibility is just silly. The reason is, MySQL, PostgreSQL, and Oracle all differ in several major ways. For instance, PostgreSQL and Oracle use Sequences, and MySQL uses it’s own auto_increment for primary keys, and the name of datatypes in Oracle differ greatly from the names of datatypes in MySQL and PostgreSQL. Also, each of those databases differ in behavior, such as MySQL having loose standards, and PostgreSQL and Oracle having strict standards, which can be seen with INSERTs on MySQL, where it will accept data that doesn’t meet the restrictions put on the column.

Keep in mind that in order to keep compatibility, you have to sacrifice speed. Now, for most small projects, sacrificing speed isn’t that big of an issue, but for large high traffic websites, sacrificing speed costs a lot, in server requirements, scalability, and user experience.

If you’re thinking of making a cross-database project, ask yourself these questions:

  • Are you expecting your project to be used on a wide variety of servers?
  • Are you expecting a large base of people who rely on your project?
  • Will making your project only compatible with a single database hurt it?
  • Will your project be publicly available, or sold to several different clients?

If you answer yes to one of these, than making your project cross-database compatible may be a good idea. Also, if you’re making something like a forum or blog package, than cross-database compatibility is probably a good idea.

This is why I love PostgreSQL

Saturday, August 18th, 2007

I’ve always loved PostgreSQL ever since I started really playing around with it’s potential. Now I always find myself stumbling upon new tricks and extremely elegant solutions to problems, that would be a pain in the ass to handle on the application side. Here’s a cool little feature I found while working on some logging stuff.

SELECT * FROM stats WHERE age(current_timestamp, created) <= ‘30 minutes’;

Very useful in some situations. I still haven’t found a MySQL equivalent.