Archive for the ‘General’ Category

PostgreSQL Quick Tip: Indexing Large Columns

Monday, July 14th, 2008

When working with large columns, you can run into some indexing issues. For instance, there’s a limit on how many characters a btree index entry can use, which poses problems for indexing text fields, since inserts and updates will fail if they use more characters than the index can use. Also, you can run into performance and disk consumption issues with large columns. The solution for this is to index columns using the hashtext() function. so, ie:

CREATE INDEX table1_column1_hash ON table1 hashtext(column1);

Then in your queries, do:

SELECT * FROM table1 WHERE hashtext(column1) = hashtext('foo');

There’s some issues with this solution, mainly being that you can only have exact matches for the column, so you can’t do range scans or LIKE queries, however, if you need searching features you should probably be using full text searching, which is available in PostgreSQL 8.2+.

PostgreSQL Planing Built In Replication

Wednesday, June 18th, 2008

Yes, that’s right, the PostgreSQL team is planning to add built in replication. This has come, in part, do to the demand of replication solutions, and the lack of easy-to-setup, easy-to-maintain replication for PostgreSQL. Many projects have moved to MySQL, simply because developers don’t want to deal with the headaches of Slony-I or PGCluster.

The features are planned to ship with PostgreSQL 8.4, however, do to technical issues, read-only queries to slaves may not be ready until 8.5.

This is going to be a huge leap for PostgreSQL.

The full announcement is on the PostgreSQL mailing list.

Comcast PowerBoost Is PR BS

Monday, June 16th, 2008

Comcast PowerBoost is useless. They ramble on about how Comcast Internet is so much faster than DSL, with PowerBoost giving you speeds up to 12mbit. Read the fine print, you’ll notice that PowerBoost only works for the first 10MB of a file (attached), and even that isn’t guaranteed. To put this into perspective, if I downloaded a 10MB file on my 7mbit DSL, and did the same on Comcast Internet with PowerBoost, Comcast would be a whopping 5 seconds faster, assuming both services work as advertised. Now if I were to download a 700MB iso with my 7mbit DSL, that would take ~14 minutes. If I did the same with 6mbit Comcast, compensating for PowerBoost, it would take ~16 minutes, DSL being about 2 minutes faster. Even if you had 8mbit Comcast Internet, the speed difference with DSL would still only be about 2 minutes.

Comcast relies on trickery to try and win over customers, by saying their internet service is way faster than DSL. In reality, PowerBoost does nothing to make downloads faster, there’s almost no difference between DSL and Cable download speeds (assuming that both services deliver exactly what they advertise), and Comcast’s methods of advertising are shady, to say the least.

A Simple Way To Deal With Web Form Spam

Wednesday, June 11th, 2008

I’ve been getting hit with a lot of web form spam lately, but I hate Captcha, so I thought of a simple solution for it. All I did was add an extra field to my form, and I changed the CSS so it’s set as “display: none;“. Then when the form is submitted, the validator checks to see if there’s anything in the extra field. If there is, the submitted fields are stored in a separate table, just for spam, along with information like IP address, User Agent, etc. It’s working pretty well, so far; hopefully the Spam Bots won’t adapt, but if they do, I’ll be ready.

I’m hoping to eventually collect enough information from these Spam Bots, to create a spam filtering system for web forms

One Of PostgreSQL’s Biggest Weaknesses

Sunday, April 13th, 2008

One of PostgreSQL’s biggest weaknesses is easy replication and clustering. Note, that I said easy. There’s absolutely replication solutions for PostgreSQL, but many of them have serious downfalls, and are a serious pain in the ass to setup.

Slony-I is a decent single-master multi-slave solution, if you don’t need LOBs replicated. It’s a little hard to setup, and you have to learn it’s special syntax for the config files you will have to create. A serious pitfall is it is ascynchronous, and therefore doesn’t guarantee all nodes will have the same information at any given time. Writes don’t scale very well. Also, if the master server goes down, you’re pretty much screwed unless you have a really cleaver way of failing over, since:

  • Sequence replication isn’t always reliable.
  • Triggers are disabled on all nodes, except for the master, to make sure they’re not called twice.
  • Incomplete failover results in errors being thrown on all nodes, including the failover master, with writes.

When using Slony-I, you need something like pgpool-II so you can transparently load balance nodes and redirect writes to the master server.

pgpool-II also has a multi-master replication mode, however, it absolutely sucks. The reason? It’s statement based, so instead of doing an insert and copying the result to all other nodes, it runs the same statement on all nodes, resulting in drift on all fields that use default values.

PgCluster seems to fix just about all of the problems with Slony-I and pgpool-II, by providing synchronous multi-master replication, however:

  • It’s basically a patchset to PostgreSQL, so it tends to run a release behind.
  • It’s a major, major, MAJOR, pain in the ass to setup.

Mozilla Projects - Getting On My Nerves

Thursday, April 10th, 2008

I’m growing sick of Mozilla projects. The problem I have with them? Bugs, bugs, bugs, bugs, bugs.

Just for instance, in Mozilla Thunderbird there’s this bug that has been around for years and is driving me mad. After it’s been running for a while, it suddenly starts sucking up 100% of my CPU, and I have to restart it. It has been reported by many users, and it still doesn’t seem to have been fixed.

Also, with Firefox, after it’s been running for a while (maybe a couple days or so) it starts slowing down. And I mean really slowing down. Sometimes it takes ten seconds for it to respond to a single click. Other times, I have to kill it because it won’t respond. What’s worse, is it also crashes completely on some sites; I’ve run into this many times on del.icio.us. Firefox 3 beta is even worse!

PostgreSQL Database Size Stats

Tuesday, April 1st, 2008

A while ago, I saw this post on Paul’s blog, about a query to get database size. I looked into it, but I was pretty unhappy with the query, as pg_database_size appears to include the size of pg_catalog.

The actual query in question is this:

SELECT
    pg_database.datname,
    pg_size_pretty(pg_database_size(pg_database.datname))
        AS size FROM pg_database;

The query I came up with, has a much more accurate usage of what a user can actually control, the downside being you have to connect to each database to get usage:

SELECT
    CURRENT_DATABASE(),
    pg_size_pretty(sum(pg_total_relation_size(tablename))::int8)
        AS size
    FROM pg_tables
    WHERE schemaname != 'pg_catalog'
    AND schemaname != 'information_schema';

For a better breakdown of which tables are hogging disk space, I came up with this:

SELECT
    tablename,
    pg_size_pretty(pg_total_relation_size(tablename)) AS total_usage,
    pg_size_pretty((pg_total_relation_size(tablename)
        - pg_relation_size(tablename))) AS external_table_usage
    FROM pg_tables
    WHERE schemaname != 'pg_catalog'
        AND schemaname != 'information_schema'
    ORDER BY pg_total_relation_size(tablename);

The external_table_usage column shows how much space is being used for things such as indexes, very useful if you’re looking to free up some space by dropping unneeded indexes.

Programming Language Rundown

Friday, March 21st, 2008

Ruby

A lot of hype, useless in the real world.

Python

A bunch of self-punishing nuns who whip themselves with chain every time they don’t follow strict syntax standards.

Perl

Used to write scripts that automate server functions, by people who don’t realize you can do the exact same things with a bash script in 1/5th the time and with half the bugs. Also: impossible to read.

ColdFusion

The bad guy in the movie that never seems to die.

PHP

Written by the largest group of schizophrenics with obsessive-compulsive disorder. They try to keep backwards compatibility where it’s not needed, and remove it where it is. Also: has had more API changes than Michael Jackson has had nose jobs.

LOLCODE

The best language ever written. There’s even PL/LOLCODE for PostgreSQL.

HAI

CAN HAS STDIO?

VISIBLE “HAI WORLD!”

KTHXBYE