PHP Is Slow… Really Slow
Lately, I’ve been trying to figure out what’s slowing down a site that’s currently in development… It didn’t take long to figure out what it was.
There’s a script on the server that handles image requests transparently, i.e., via mod_rewrite. Basically, what it does is this:
- Connect to database.
- Query the database for the meta info of the requested file (created, modified, filetype, etc).
- Check whether the client has a cached copy of the image; if it does, send a 304 Not Modified along with an ETag, and return.
- If not, grab the file from server-side cache if it exists, send it, and return.
- Else, fetch the image from the database, cache it locally on disk, and send the data.
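To make the flow above concrete, here is a rough Python sketch of just the decision logic. It is not the site’s actual script: the names `ImageMeta`, `make_etag`, and `choose_action`, and the ETag format itself, are assumptions made up for illustration, and the sketch only picks a branch rather than touching the database or the disk.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ImageMeta:
    """Hypothetical metadata row for one image."""
    image_id: int
    modified: int   # last-modified time as a Unix timestamp
    filetype: str   # MIME type, e.g. "image/png"


def make_etag(meta: ImageMeta) -> str:
    # Assumed ETag scheme: image id plus modification time, quoted.
    return '"%d-%d"' % (meta.image_id, meta.modified)


def choose_action(meta: ImageMeta,
                  if_none_match: Optional[str],
                  in_disk_cache: bool) -> str:
    """Return which branch of the flow applies to this request."""
    # The client already holds the current version: nothing to send.
    if if_none_match is not None and if_none_match == make_etag(meta):
        return "304-not-modified"
    # The server-side cache has the file: no need to read it from the database.
    if in_disk_cache:
        return "send-from-disk-cache"
    # Slow path: read from the database, cache on disk, send.
    return "fetch-from-db-and-cache"
```

Note that even the cheapest branch (the 304) still needs the metadata, which is why every request pays for a database connection.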
99% of the time, all it does is grab the metadata and send a 304 Not Modified, yet it is extremely slow. So I decided to do a quick benchmark of PHP and Python, comparing how fast they deliver a small image (about 10 KB) from a PostgreSQL database… The results were shocking.
The setup is very modest: an Athlon 64 3200+, 2 GB of RAM, and a 500 GB 7200.10 SATA II hard drive. The test is very simple: connect to the database (the connections are pooled by pgpool), fetch and send the image, then close the database connection.
The code used is as follows.
PHP:
<?php
// Note it's port 5433. That's the port that pgpool listens on.
$con = pg_connect("dbname=testdb user=testuser host=127.0.0.1"
    . " password=password port=5433");
// Fetch the image bytes and MIME type for a fixed test row.
$res = pg_query($con, "SELECT data, filetype FROM images WHERE id = 1");
$row = pg_fetch_assoc($res);
header('Content-Type: ' . $row['filetype']);
// bytea comes back escaped as text, so decode it before sending.
echo pg_unescape_bytea($row['data']);
pg_close($con);
And now Python:
import psycopg2
from mod_python import apache
def handler(req):
    con = psycopg2.connect("dbname='testdb' user='testuser'"
        + " host='127.0.0.1' password='password' port='5433'")
    cur = con.cursor()
    cur.execute("SELECT data, filetype FROM images WHERE id = 1")
    row = cur.fetchone()
    con.close()
    req.content_type = row[1]
    req.write(row[0])
    return apache.OK
Pretty simple, right? I ran the test using ab, or Apache Benchmark, with the command:
ab -c 10 -t 15 http://localhost/<scriptname>
I ran the test three times for each script; here are the results:
PHP - Maximum Requests/Second: 145.32
Python - Maximum Requests/Second: 1088.64
Apache2 Static Image - Maximum Requests/Second: 2778.54
WOW! Python is about 7.5 times faster than PHP in this test! I was so intrigued that I decided to dig deeper to see exactly where the bottleneck is in PHP.
I created a new test: connect to the database and close the connection immediately, without sending any data whatsoever.
Here are the results from that:
PHP - Max Requests/Second: 205.00
Python - Max Requests/Second: 1089.32
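For anyone who wants to isolate the connect/close cost without going through Apache at all, a small timing harness along these lines might help. This is a sketch, not what I ran for the numbers above: `mean_call_time` and `connect_and_close` are names invented here, and the connection factory is passed in so the harness itself makes no assumptions about the driver.

```python
import time


def mean_call_time(fn, repeats=100):
    """Return the mean wall-clock seconds per call of fn over `repeats` calls."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


def connect_and_close(connect):
    """Open a connection with the given factory and close it immediately."""
    con = connect()
    con.close()


# Example use with psycopg2 (assumes the same DSN as the test script):
#   import psycopg2
#   dsn = "dbname='testdb' user='testuser' host='127.0.0.1' password='password' port='5433'"
#   print(mean_call_time(lambda: connect_and_close(lambda: psycopg2.connect(dsn))))
```

Timing it this way leaves out the web server and the HTTP round trip, so the figure will not match the ab numbers directly, but it should show whether connecting is the expensive part.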
So PHP gains ~60 requests/second, and Python shows no improvement at all. But what if we just leave the scripts blank, so that they simply return and do nothing at all? Here are the results from that:
PHP - Max Requests/Second: 1566.77
Python - Max Requests/Second: 1819.46
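Requests per second can be turned into a rough per-request cost, which makes the gap easier to see. The sketch below uses only the figures from this post; the helper names are made up, and "milliseconds per request" here means 1000 divided by requests/second, i.e. the mean across all concurrent requests, not the latency of a single request.

```python
# Requests/second figures reported in this post.
results = {
    "PHP":    {"blank": 1566.77, "connect": 205.00, "image": 145.32},
    "Python": {"blank": 1819.46, "connect": 1089.32, "image": 1088.64},
}


def ms_per_request(rps):
    """Mean milliseconds per request across all concurrent requests."""
    return 1000.0 / rps


def connect_overhead_ms(lang):
    """Extra milliseconds per request added by connecting and closing."""
    r = results[lang]
    return ms_per_request(r["connect"]) - ms_per_request(r["blank"])


for lang in results:
    print("%s: %.2f ms" % (lang, connect_overhead_ms(lang)))
```

By this estimate, connecting and closing costs about 4.24 ms per request in PHP versus about 0.37 ms in Python, so the connection step alone is roughly 11 times more expensive in PHP.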
So basically, PHP sucks, at least when it comes to PostgreSQL connections. What’s strange is that PHP doesn’t seem to have the same problem (a significant performance drop from simply connecting to the database) with MySQL connections, so something is certainly wrong with the way PHP handles PostgreSQL. I bet there are a lot of “MySQL vs. PostgreSQL” benchmarks out there that yield wildly inaccurate results because of this. Actually, if you compare MySQL and PostgreSQL purely in Python, you’ll find that PostgreSQL is quite a bit faster at handling binary data, such as images. I will be back if I find more information or a fix for this behavior.
UPDATE: It looks like the problem lies specifically with the pg_connect function. The problem doesn’t exist with pg_pconnect; however, you shouldn’t use pg_pconnect to connect through a connection pooler like pgpool/pgbouncer, as that degrades performance.
