July 14th, 2008
When working with large columns, you can run into some indexing issues. For instance, there’s a limit on how many characters a btree index entry can use, which poses problems for indexing text fields, since inserts and updates will fail if they use more characters than the index can use. Also, you can run into performance and disk consumption issues with large columns. The solution for this is to index columns using the hashtext() function. so, ie:
CREATE INDEX table1_column1_hash ON table1 hashtext(column1);
Then in your queries, do:
SELECT * FROM table1 WHERE hashtext(column1) = hashtext('foo');
There’s some issues with this solution, mainly being that you can only have exact matches for the column, so you can’t do range scans or LIKE queries, however, if you need searching features you should probably be using full text searching, which is available in PostgreSQL 8.2+.
Posted in PostgreSQL, SQL, General | No Comments »
July 4th, 2008
Since everyone else seems to have a list of how to speed up PHP Apps, I thought it was about time I made one. Here’s some basic rules for speeding up PHP Applications:
- Avoid including large files/classes. The more PHP has to process, the slower it is. A lot of frameworks have needlessly large classes. It’s recommended that you split things up into smaller “modules” and include them as needed. Also, only include what you need. PHP has to process everything, even the stuff you don’t use. With some sites, I’ve seen the number requests per second increase, 4 times, simply by moving away from needlessly large frameworks and unneeded includes.
- Use PHP’s built-in functions whenever you can. PHP has a lot of functions for working with strings, arrays, etc. A lot of times, they can save you headaches, and improve performance over user created functions.
- Avoid Regular Expressions. Instead, use functions like str_replace, in place of preg_replace, or strstr, in place of preg_match, when possible.
- Be careful of the way you reference objects. PHP is known to have trouble with garbage collection under certain situations, such as when you’re dealing with classes that are circular-referenced. These types of situations usually go unnoticed, until PHP start exceeding it’s memory limit.
- Avoid functions like file_get_contents(), and file_put_contents(), when working with large files… Unless you want to run out of memory.
- Use memcached, aggressively if possible. Hmm, lets see, 100 requests/second vs 1000 requests/second… On the same server. It’s a no brainer.
- Don’t ever use persistent connections. They may seem like a good idea, and they may appear to make your app faster in benchmarks, but it will severely cripple your ability to handle high traffic… Let me explain, persistent connections and connection pooling aren’t the same. Persistent connections are per user, and users will often end up with 2 or 3 persistent connections open for them. Nobody else can use these connections; this leads to your server running out of database connections with a relatively small number of users (think 40-60). If you really need this functionality, use an actual connection pooler.
- Avoid making your database grumpy; most bottlenecks are related to the database. This typically means avoiding sequential scans, aggregate functions, etc. With MySQL, avoid complex JOINs, as well as date range scans, and “WHERE column IN (SELECT …)” queries. With PostgreSQL, avoid “SELECT COUNT(*)”, and keep the total number of queries to a minimum.
- Get more servers. Single servers can only handle so many requests per second. When you get to a point where your server is struggling under the load, and there’s no clear bottlenecks, the only thing you can do is scale to your needs.
Posted in PHP | 2 Comments »
June 27th, 2008
I’ve searched high and low, but haven’t found any solid documentation on how to work with BLOB/BYTEA columns with PDO. Here’s what I’ve figured out.
By default, when using the query method, PDO will return a Resource ID for binary objects. The Resource ID must be used together with fread(). So, ie:
$c = $db->query('SELECT blob FROM table WHERE id = 21');
$res = $c->fetch(PDO::FETCH_ASSOC);
$buf = null;
while (!feof($res['blob'])) {
$buf .= fread($res['blob'], 2048);
}
fclose($res['blob']);
You can also fetch binary objects as a string, by binding the column with PDO::PARAM_STR:
$buf = null;
$st = $db->query('SELECT blob FROM table WHERE id = 21');
$st->bindColumn('blob', $buf, PDO::PARAM_STR);
$st->fetch();
It’s hard to believe there’s no documentation on this. I really hope I just missed it; this would be a pretty big thing to have no documentation on.
Update: The documentation is here, hidden by obscurity. It sounds like it’s talking about handling external large objects, and not internal blob columns, inside the table.
Posted in PostgreSQL, PHP, SQL | No Comments »
June 23rd, 2008
I was bored, and I’ve been learning Python, so I decided to write a Unix Daemon Skeleton, complete with comments explaining why things are done the way they’re done. It’s attached at the bottom of this entry.
I’ve also been playing around with GUI programming in Python, with wxWidgets, on my MacBook Pro. Hopefully that will turn up something useful; I’ve been considering writing a project management tool, to simplify some operations.
Python Daemon Skeleton
Posted in Python | No Comments »
June 18th, 2008
Yes, that’s right, the PostgreSQL team is planning to add built in replication. This has come, in part, do to the demand of replication solutions, and the lack of easy-to-setup, easy-to-maintain replication for PostgreSQL. Many projects have moved to MySQL, simply because developers don’t want to deal with the headaches of Slony-I or PGCluster.
The features are planned to ship with PostgreSQL 8.4, however, do to technical issues, read-only queries to slaves may not be ready until 8.5.
This is going to be a huge leap for PostgreSQL.
The full announcement is on the PostgreSQL mailing list.
Posted in PostgreSQL, General | No Comments »
June 16th, 2008
Comcast PowerBoost is useless. They ramble on about how Comcast Internet is so much faster than DSL, with PowerBoost giving you speeds up to 12mbit. Read the fine print, you’ll notice that PowerBoost only works for the first 10MB of a file (attached), and even that isn’t guaranteed. To put this into perspective, if I downloaded a 10MB file on my 7mbit DSL, and did the same on Comcast Internet with PowerBoost, Comcast would be a whopping 5 seconds faster, assuming both services work as advertised. Now if I were to download a 700MB iso with my 7mbit DSL, that would take ~14 minutes. If I did the same with 6mbit Comcast, compensating for PowerBoost, it would take ~16 minutes, DSL being about 2 minutes faster. Even if you had 8mbit Comcast Internet, the speed difference with DSL would still only be about 2 minutes.
Comcast relies on trickery to try and win over customers, by saying their internet service is way faster than DSL. In reality, PowerBoost does nothing to make downloads faster, there’s almost no difference between DSL and Cable download speeds (assuming that both services deliver exactly what they advertise), and Comcast’s methods of advertising are shady, to say the least.
Posted in Comcast, Rantings, General | 10 Comments »
June 11th, 2008
I’ve been getting hit with a lot of web form spam lately, but I hate Captcha, so I thought of a simple solution for it. All I did was add an extra field to my form, and I changed the CSS so it’s set as “display: none;“. Then when the form is submitted, the validator checks to see if there’s anything in the extra field. If there is, the submitted fields are stored in a separate table, just for spam, along with information like IP address, User Agent, etc. It’s working pretty well, so far; hopefully the Spam Bots won’t adapt, but if they do, I’ll be ready.
I’m hoping to eventually collect enough information from these Spam Bots, to create a spam filtering system for web forms
Posted in Spam, General | 1 Comment »
June 11th, 2008
Regular expressions are pretty easy to use, however, different DBMS’s handle them differently.
In PostgreSQL, regular expression searches are done via the “~” character. For instance:
SELECT * FROM table1 WHERE column1 ~ '^(.?)unq([a-z]{4})([1-9]+)$';
In MySQL, this is done via REGEXP:
SELECT * FROM table1 WHERE column1 REGEXP '^S([c-z])n(.*)$';
In Oracle, this is done via the REGEXP_LIKE function:
SELECT * FROM table1 WHERE REGEXP_LIKE(column1, '[[:digit:]]{2}$');
Posted in MySQL, PostgreSQL, SQL | 1 Comment »