My Qwest Internet Nightmare

August 9th, 2008

So, when it came time to move, I called Qwest to have them transfer my services. They were supposed to be set up when I arrived, and guess what… They weren’t. On top of that, I had to wait on the phone for 2 hours, listening to “Your call is important to us”, just to schedule them to come by, at their convenience, to fix the problem. Yeah right, more like your money is important to us.

Then it took them a week just to show up, and when they finally got around to hooking everything up, they routed me to the wrong ISP. So here I am, 10 days and counting, waiting for them to fix the problem. The only reason they can get away with this is that they are a monopoly.

Lack Of Solutions To Deter SSH Brute Force Attacks

July 17th, 2008

I’ve been really discouraged by the overall lack of decent solutions for minimizing SSH brute force attacks. There are the typical and obvious ones that just about every article mentions, like setting AllowUsers/DenyUsers and setting PermitRootLogin no. The problem with those is that, while they may strengthen security, they don’t stop bots from trying to brute force your server, so they do nothing from a performance standpoint. Since having multiple bots hammering your server can send the load through the roof, this can be very important.

Disabling passwords and only accepting key-based logins can make you more secure; however, it still doesn’t deal with the performance issues. Also, if you have a drive failure and lose your SSH key, how do you log in? If nobody else can make you a new key pair and log in to copy it over, then you’re locked out of your own server. And how do you log in from someone else’s computer?

You can move SSH to a different port, but then you have to update the config for all your users. This can be a big pain in the ass to handle, but it can solve both the security and performance issues with brute force attacks. Overall it’s pretty weak, though, since some bots have started probing other ports, like 2222, to see if SSH is running on them. It surely wouldn’t work for anything big.

Then there’s log-based learning, like fail2ban. I have a problem with this too, as log injection attacks can be used to trick the learning daemon into blocking legitimate clients, which amounts to a denial of service attack. Overall, this is a pretty weak solution too. There have already been proof-of-concept log injection attacks for just about every log-based learning daemon, and there are sure to be more in the future.
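To make the log injection idea concrete, here’s a toy sketch in Python. The two patterns and the log line are my own assumptions, loosely modeled on sshd’s “Failed password” messages; they are not fail2ban’s actual filters, and the addresses are documentation placeholders.

```python
import re

# Toy patterns for illustration only; these are NOT fail2ban's real filters.
# NAIVE assumes the username never contains spaces or the word "from".
NAIVE = re.compile(r"Failed password for (?:invalid user )?\S+ from (\S+)")
# ANCHORED ties the address to the fixed tail at the very end of the line.
ANCHORED = re.compile(r"Failed password for .* from (\S+) port \d+ ssh2$")


def offending_ip(pattern, line):
    """Return the address a log watcher would ban for this line, or None."""
    match = pattern.search(line)
    return match.group(1) if match else None


# The attacker at 203.0.113.9 tries to log in as the user "x from 192.0.2.10",
# hoping the watcher bans the innocent host 192.0.2.10 instead.
line = ("Failed password for invalid user x from 192.0.2.10 "
        "from 203.0.113.9 port 4242 ssh2")

print(offending_ip(NAIVE, line))     # the injected address
print(offending_ip(ANCHORED, line))  # the real source address
```

The naive pattern trusts the first “from” it sees, which the attacker controls through the username. Anchoring to the end of the line helps in this toy case, but it’s only as good as the assumptions about what the daemon writes to its log.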

And on to iptables-based solutions. By filtering at the firewall level, you can see clients that are trying to connect to SSH. By far the biggest weakness of this setup is that it can’t determine whether a login was successful or not; it can only see SYN packets. If you have it set up to block clients that try to connect 3 or more times in a minute, then three successful logins would cause a legitimate client to be blocked. While it may sound absurd that a client would log in that frequently, it isn’t when you consider that tab-completion is available in SSH and SCP, and for each tab-completion event, SSH logs in and then back out; the limit can easily be exhausted, causing legit clients to be blocked.
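Here’s a rough Python sketch of why counting connections alone goes wrong. It isn’t iptables; the class name, the limit of 3, and the 60 second window are just assumptions that mirror the example above.

```python
from collections import defaultdict, deque


class ConnectionRateLimiter:
    """Block a client after `limit` new connections within `window` seconds.

    Like a firewall rule that only sees SYN packets, it has no idea whether
    a connection led to a successful login.
    """

    def __init__(self, limit=3, window=60.0):
        self.limit = limit
        self.window = window
        self.attempts = defaultdict(deque)  # client address -> timestamps

    def allow(self, client, now):
        recent = self.attempts[client]
        # Forget connections that have aged out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        recent.append(now)
        return len(recent) < self.limit


limiter = ConnectionRateLimiter()
# Three tab completions in ten seconds, each one a perfectly valid login.
results = [limiter.allow("198.51.100.7", t) for t in (0.0, 5.0, 10.0)]
print(results)
```

The third connection is refused even though nothing about it was hostile, because the limiter never learns how the first two ended.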

There really needs to be a better solution. It seems like the only real way to solve this is to build some kind of configurable, adaptive firewall into SSH itself, to detect and deny brute force attacks.

PostgreSQL Quick Tip: Indexing Large Columns

July 14th, 2008

When working with large columns, you can run into some indexing issues. For instance, there’s a limit on how large a btree index entry can be, which poses problems for indexing text fields, since inserts and updates will fail if a value is bigger than the index can hold. You can also run into performance and disk consumption issues with large columns. One solution is to index the column through the hashtext() function. For example:

CREATE INDEX table1_column1_hash ON table1 (hashtext(column1));

Then in your queries, do:

SELECT * FROM table1 WHERE hashtext(column1) = hashtext('foo') AND column1 = 'foo';

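There are some issues with this solution, the main one being that you can only have exact matches for the column, so you can’t do range scans or LIKE queries. Also, two different values can hash to the same number, so it’s safest to compare the column itself as well as its hash. If you need searching features, you should probably be using full text searching, which is built into PostgreSQL 8.3+ (and available through the tsearch2 contrib module in 8.2).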
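To show why a hash-based lookup wants a comparison on the full value too, here’s a small Python model of the idea. It’s only a sketch: the HashIndex class is made up for illustration, and CRC32 stands in for hashtext(), which it does not reproduce.

```python
import zlib
from collections import defaultdict


def short_hash(text):
    # Stand-in for hashtext(); CRC32 is an assumption chosen for the demo.
    return zlib.crc32(text.encode("utf-8"))


class HashIndex:
    """Maps a fixed-size hash to row ids, roughly as an expression index would."""

    def __init__(self, hash_fn=short_hash):
        self.hash_fn = hash_fn
        self.rows = []                    # row id -> full column value
        self.buckets = defaultdict(list)  # hash -> row ids

    def insert(self, value):
        self.rows.append(value)
        self.buckets[self.hash_fn(value)].append(len(self.rows) - 1)

    def lookup(self, value, recheck=True):
        candidates = self.buckets.get(self.hash_fn(value), [])
        if not recheck:
            return [self.rows[i] for i in candidates]
        # Compare the full value to discard hash collisions.
        return [self.rows[i] for i in candidates if self.rows[i] == value]


# A deliberately terrible hash (string length) makes collisions easy to see.
index = HashIndex(hash_fn=len)
index.insert("foo")
index.insert("bar")

print(index.lookup("foo", recheck=False))  # collision lets "bar" through
print(index.lookup("foo"))                 # recheck keeps only "foo"
```

With a decent hash the collisions are rare, but the index entries stay small no matter how large the column values get, which is the whole point.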

Speeding Up PHP Applications

July 4th, 2008

Since everyone else seems to have a list of how to speed up PHP apps, I thought it was about time I made one. Here are some basic rules for speeding up PHP applications:

  • Avoid including large files/classes. The more PHP has to process, the slower it is. A lot of frameworks have needlessly large classes. It’s recommended that you split things up into smaller “modules” and include them as needed. Also, only include what you need. PHP has to process everything, even the stuff you don’t use. With some sites, I’ve seen the number of requests per second increase fourfold, simply by moving away from needlessly large frameworks and unneeded includes.
  • Use PHP’s built-in functions whenever you can. PHP has a lot of functions for working with strings, arrays, etc. A lot of times, they can save you headaches, and improve performance over user created functions.
  • Avoid Regular Expressions. Instead, use functions like str_replace, in place of preg_replace, or strstr, in place of preg_match, when possible.
  • Be careful of the way you reference objects. PHP is known to have trouble with garbage collection in certain situations, such as when you’re dealing with objects that reference each other in a circle. These situations usually go unnoticed until PHP starts exceeding its memory limit.
  • Avoid functions like file_get_contents(), and file_put_contents(), when working with large files… Unless you want to run out of memory.
  • Use memcached, aggressively if possible. Hmm, let’s see, 100 requests/second vs 1000 requests/second… On the same server. It’s a no-brainer.
  • Don’t ever use persistent connections. They may seem like a good idea, and they may appear to make your app faster in benchmarks, but they will severely cripple your ability to handle high traffic… Let me explain: persistent connections and connection pooling aren’t the same. Persistent connections are per user, and users will often end up with 2 or 3 persistent connections open for them. Nobody else can use these connections; this leads to your server running out of database connections with a relatively small number of users (think 40-60). If you really need this functionality, use an actual connection pooler.
  • Avoid making your database grumpy; most bottlenecks are related to the database. This typically means avoiding sequential scans, aggregate functions, etc. With MySQL, avoid complex JOINs, as well as date range scans, and “WHERE column IN (SELECT …)” queries. With PostgreSQL, avoid “SELECT COUNT(*)”, and keep the total number of queries to a minimum.
  • Get more servers. A single server can only handle so many requests per second. When you get to a point where your server is struggling under the load, and there are no clear bottlenecks, the only thing you can do is scale to your needs.
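The memcached rule boils down to “check the cache first, and only do the expensive work on a miss”. Here’s that pattern sketched in Python rather than PHP, with a small in-process class standing in for a real memcached client; TTLCache, cached, and slow_query are all hypothetical names for illustration.

```python
import time


class TTLCache:
    """In-process stand-in for a memcached client (get/set with expiry)."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]
            return None
        return value

    def set(self, key, value, ttl):
        self._store[key] = (value, time.monotonic() + ttl)


def cached(cache, key, ttl, compute):
    """Return the cached value for key, computing and storing it on a miss."""
    value = cache.get(key)
    if value is None:
        value = compute()
        cache.set(key, value, ttl)
    return value


calls = []


def slow_query():
    # Pretend this is an expensive database query.
    calls.append(1)
    return ["row1", "row2"]


cache = TTLCache()
first = cached(cache, "front_page", 30, slow_query)
second = cached(cache, "front_page", 30, slow_query)
print(len(calls))  # how many times the expensive query actually ran
```

One caveat with this sketch: a result of None is treated as a miss, so it never gets cached.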

PHP PDO and Bytea/Blob Columns

June 27th, 2008

I’ve searched high and low, but haven’t found any solid documentation on how to work with BLOB/BYTEA columns with PDO. Here’s what I’ve figured out.

By default, when using the query method, PDO will return a resource ID for binary objects. The resource must be read with fread(). For example:


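// Binary columns come back as a stream resource rather than a string.
$c = $db->query('SELECT blob FROM table WHERE id = 21');
$res = $c->fetch(PDO::FETCH_ASSOC);
$buf = '';

// Read the stream in 2KB chunks until it is exhausted.
while (!feof($res['blob'])) {
    $buf .= fread($res['blob'], 2048);
}

fclose($res['blob']);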

You can also fetch binary objects as a string, by binding the column with PDO::PARAM_STR:

$buf = null;
$st = $db->query('SELECT blob FROM table WHERE id = 21');
$st->bindColumn('blob', $buf, PDO::PARAM_STR);
// FETCH_BOUND assigns the column value to the bound variable $buf.
$st->fetch(PDO::FETCH_BOUND);

It’s hard to believe there’s no documentation on this. I really hope I just missed it; this would be a pretty big thing to have no documentation on.

Update: The documentation is here, hidden by obscurity. It sounds like it’s talking about handling external large objects, and not internal blob columns, inside the table.

Python Daemon Skeleton

June 23rd, 2008

I was bored, and I’ve been learning Python, so I decided to write a Unix Daemon Skeleton, complete with comments explaining why things are done the way they’re done. It’s attached at the bottom of this entry.
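For anyone who just wants the gist, here’s a minimal sketch of the classic double-fork approach. This is not the attached skeleton, only an outline of the usual steps as I understand them; the function name daemonize is my own, and it assumes a Unix system where os.fork() is available.

```python
import os
import sys


def daemonize():
    """Detach the current process from its terminal using a double fork."""
    # First fork: the parent exits so the shell gets its prompt back.
    if os.fork() > 0:
        os._exit(0)

    # Become a session leader, dropping the controlling terminal.
    os.setsid()

    # Second fork: the session leader exits, so the daemon can never
    # reacquire a controlling terminal.
    if os.fork() > 0:
        os._exit(0)

    # Don't keep any directory busy, and don't inherit a restrictive umask.
    os.chdir("/")
    os.umask(0)

    # Point stdin, stdout and stderr at /dev/null.
    sys.stdout.flush()
    sys.stderr.flush()
    devnull = os.open(os.devnull, os.O_RDWR)
    for fd in (0, 1, 2):
        os.dup2(devnull, fd)
    if devnull > 2:
        os.close(devnull)


# Typical use: call daemonize() once at startup, then run the main loop.
```

A real daemon would also want a PID file, signal handlers, and logging, which this sketch leaves out.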

I’ve also been playing around with GUI programming in Python, with wxWidgets, on my MacBook Pro. Hopefully that will turn up something useful; I’ve been considering writing a project management tool, to simplify some operations.

Python Daemon Skeleton

PostgreSQL Planning Built-In Replication

June 18th, 2008

Yes, that’s right, the PostgreSQL team is planning to add built-in replication. This has come, in part, due to the demand for replication solutions, and the lack of easy-to-set-up, easy-to-maintain replication for PostgreSQL. Many projects have moved to MySQL simply because developers don’t want to deal with the headaches of Slony-I or PGCluster.

The features are planned to ship with PostgreSQL 8.4; however, due to technical issues, read-only queries to slaves may not be ready until 8.5.

This is going to be a huge leap for PostgreSQL.

The full announcement is on the PostgreSQL mailing list.

Comcast PowerBoost Is PR BS

June 16th, 2008

Comcast PowerBoost is useless. They ramble on about how Comcast Internet is so much faster than DSL, with PowerBoost giving you speeds up to 12mbit. Read the fine print and you’ll notice that PowerBoost only works for the first 10MB of a file (attached), and even that isn’t guaranteed. To put this into perspective, if I downloaded a 10MB file on my 7mbit DSL, and did the same on Comcast Internet with PowerBoost, Comcast would be a whopping 5 seconds faster, assuming both services work as advertised. Now if I were to download a 700MB iso with my 7mbit DSL, that would take ~14 minutes. If I did the same with 6mbit Comcast, compensating for PowerBoost, it would take ~16 minutes, DSL being about 2 minutes faster. Even if you had 8mbit Comcast Internet, the speed difference with DSL would still only be about 2 minutes.
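If you want to check the math, here’s a quick Python sketch of it. It assumes a “mbit” is 1,000,000 bits per second, a “MB” is 1024 × 1024 bytes, and that PowerBoost delivers the full 12mbit for exactly the first 10MB; with those assumptions it comes out at roughly the same 5 seconds, ~14 minutes, and ~16 minutes as above.

```python
MBIT = 1_000_000      # bits per second in one "mbit" (decimal, assumed)
MB = 1024 * 1024 * 8  # bits in one "MB" (binary megabyte, assumed)


def download_seconds(size_mb, base_mbit, boost_mbit=None, boost_mb=10):
    """Seconds to fetch size_mb, with an optional boost on the first boost_mb."""
    if boost_mbit is None:
        return size_mb * MB / (base_mbit * MBIT)
    boosted = min(size_mb, boost_mb)
    return (boosted * MB / (boost_mbit * MBIT)
            + (size_mb - boosted) * MB / (base_mbit * MBIT))


dsl_small = download_seconds(10, 7)
cable_small = download_seconds(10, 6, boost_mbit=12)
dsl_iso = download_seconds(700, 7)
cable6_iso = download_seconds(700, 6, boost_mbit=12)
cable8_iso = download_seconds(700, 8, boost_mbit=12)

print(f"10MB:  DSL {dsl_small:.1f}s, Comcast {cable_small:.1f}s")
print(f"700MB: DSL {dsl_iso / 60:.1f}min, "
      f"Comcast 6mbit {cable6_iso / 60:.1f}min, "
      f"Comcast 8mbit {cable8_iso / 60:.1f}min")
```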

Comcast relies on trickery to try and win over customers, by saying their internet service is way faster than DSL. In reality, PowerBoost does next to nothing to make large downloads faster, and there’s almost no difference between DSL and cable download speeds (assuming that both services deliver exactly what they advertise). Comcast’s methods of advertising are shady, to say the least.