PHP And pg_connect: Post Mortem

In my previous post, I tried to find out why a PHP script was so slow. Here’s an update.

Basically, pg_connect is slow. There’s probably a lot of PHP overhead in connecting to a database, and it probably expects to open a fresh connection every time, since connecting through a connection pool doesn’t seem to speed it up very much (it goes from ~50 requests a second to ~200). Persistent connections seem to be much faster than that, though still much slower than connecting to a pool with Python (450 requests a second compared to 1000). However, if you understand the inner workings of persistent connections, you will know that using them with PostgreSQL is seriously broken: a small number of users connecting to different sites on the same server can easily exhaust max_connections. Using pg_pconnect to connect to a connection pool doesn’t work either. It’s the worst of both worlds: the slowness of pg_connect and the brokenness of pg_pconnect.
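To put rough numbers on the max_connections problem, here is a minimal back-of-the-envelope sketch in Python. It assumes each web server worker process can end up holding one persistent connection per distinct connection string (roughly one per site), and that none of them are shared between processes. The function names and the figures in the example (50 workers, 3 sites, a limit of 100) are made up for illustration, not measurements from my server.

```python
def worst_case_connections(worker_processes: int, distinct_conn_strings: int) -> int:
    """Upper bound on open persistent connections.

    Assumption: every worker process may keep one persistent connection
    per distinct connection string, and nothing is shared across processes.
    """
    return worker_processes * distinct_conn_strings


def exhausts(worker_processes: int, distinct_conn_strings: int,
             max_connections: int, reserved: int = 0) -> bool:
    """True if the worst case exceeds what the database will accept.

    `reserved` stands for slots the server holds back (for example for
    superusers); it is a hypothetical knob for this sketch.
    """
    available = max_connections - reserved
    return worst_case_connections(worker_processes, distinct_conn_strings) > available


if __name__ == "__main__":
    # Illustrative numbers only: 50 workers serving 3 sites against a limit of 100.
    print(worst_case_connections(50, 3))
    print(exhausts(50, 3, max_connections=100))
```

The point is that the worst case grows with the number of worker processes times the number of sites, not with the number of visitors, which is why a handful of users can be enough to hit the limit.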

So there’s pretty much no solution to this problem at this point in time, except maybe to port the script to Python.


2 Responses to “PHP And pg_connect: Post Mortem”

  1. Ian Eure Says:

    Persistent connections should be avoided at all costs. Because the connections are held open after they’re no longer needed, the DB server ends up with a much larger number of connections, and (at least with MySQL) will start refusing them when it hits its limit.

    You might try experimenting with other authentication methods on the DB server to speed up the pg_connect() call. The PHP function is a very thin wrapper around the PostgreSQL client library’s PQconnectdb(), so there shouldn’t be much difference between PHP and any other binding. Perhaps PyDB is lazy-connecting, or shuffling that latency in some other way?

    I’d be interested to see how the PHP pg_connect() compares with C.

    Oh, and if you’re connecting by hostname (db.foo.com) rather than by IP address, switch to the IP. DNS lookups could be slowing things down. You could also try connecting via a Unix socket instead of TCP/IP.

  2. admin Says:

    Yeah, connecting through /var/lib/postgresql has half the overhead of 127.0.0.1, but you would either have to set up an ident user or reconfigure pg_hba.conf.

    I’m starting to think the reason Python was so much faster is that PostgreSQL was running out of connections. After giving pgpool a hard limit of 90 open connections, performance of the Python script dropped significantly, to ~300 requests a second (still faster than PHP). It may just be that my setup isn’t up to the task of testing this, since there really shouldn’t be too many open connections with a concurrency of ten. I’ll have to give it a shot on a Core 2 Duo when I get some time.

    I know all about the evils of persistent connections. Even with a very small number of users, you can easily run out of connections. Since connections are kept open per Apache process, and not pooled across all Apache processes as many believe, benchmarks should be taken with a grain of salt, as persistent connections work in favor of the benchmark.
