Improving Dynamic Images [Theoretically]
Something I’ve been thinking about lately is how to improve the performance of images served from a database. The model I’ve deployed in the past looks like this:
- Apache rewrites all requests for /Images to /image-script.php?image=$1
- The script then includes all base files, some of them unnecessary, and performs a sanity check.
- The script grabs the meta-info for the requested file from the database.
- An ETag and other HTTP headers (Cache-Control, Content-Disposition, etc.) are generated.
- The script checks for a cached copy of the data (using Cache_Lite), and if it’s not found or has expired, it fetches the image data from the database and caches it.
- Either the headers and data are sent, or a 304 Not Modified is sent.
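To make the last step concrete, here is a minimal sketch of the ETag and 304 decision. It is in Python rather than the PHP the original script used, and the function names (make_etag, respond), the metadata fields, and the max-age value are all assumptions made for illustration. It only handles an If-None-Match header carrying a single ETag; a real handler would also need to cope with lists and weak validators.

```python
import hashlib


def make_etag(image_id: int, modified_ts: int, size: int) -> str:
    """Build an ETag from metadata alone, so the image bytes need not be read."""
    raw = f"{image_id}-{modified_ts}-{size}".encode()
    return '"' + hashlib.md5(raw).hexdigest() + '"'


def respond(request_headers: dict, etag: str, body: bytes, mime: str):
    """Return (status, headers, body) for a conditional GET."""
    # Hypothetical cache policy; pick a max-age that suits the site.
    headers = {"ETag": etag, "Cache-Control": "public, max-age=3600"}
    if request_headers.get("If-None-Match") == etag:
        # The client already has this version: send headers only.
        return 304, headers, b""
    headers["Content-Type"] = mime
    headers["Content-Length"] = str(len(body))
    return 200, headers, body
```

Because the ETag here depends only on the meta-info, the 304 path never has to touch the image data itself.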
The flaw with this design: things are included but never used, and the database is touched on every request. The new design that I’m thinking about would be a very minimalistic class, and it would only access the database if the cache doesn’t exist or has expired. It would look more like this:
- The class is initialized, and sanitizes all input.
- When called upon, the class checks the cache for the serialized database result, and unserialize()’s it if it’s found.
- If not found, it opens the database, fetches the data, then caches the serialize()’d result.
- Either a 304 Not Modified is then sent, or the headers are generated and sent along with the data.
- The backend administration panel clears the cache [per virtual file] when UPDATEs or DELETEs occur. This avoids having an updated image not go live until the cache expires (the previous model used the meta info from the database to find out whether this had happened).
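The steps above could be sketched roughly as follows. This is not the actual class, just an illustration under stated assumptions: it is in Python, pickle stands in for serialize()/unserialize(), plain files stand in for Cache_Lite, and ImageCache and fetch_from_db are hypothetical names.

```python
import os
import pickle
import tempfile
import time


class ImageCache:
    """File-backed cache that only calls the database on a miss or expiry."""

    def __init__(self, cache_dir, ttl=3600):
        self.cache_dir = cache_dir
        self.ttl = ttl  # seconds before a cached record counts as expired
        os.makedirs(cache_dir, exist_ok=True)

    def _path(self, image_id):
        # int() doubles as input sanitization: the id can never hold path parts.
        return os.path.join(self.cache_dir, f"{int(image_id)}.cache")

    def get(self, image_id, fetch_from_db):
        path = self._path(image_id)
        try:
            if time.time() - os.path.getmtime(path) < self.ttl:
                with open(path, "rb") as fh:
                    # Only unpickle files this process wrote itself.
                    return pickle.load(fh)
        except OSError:
            pass  # no cache file yet, so fall through to the database
        record = fetch_from_db(int(image_id))
        # Write to a temp file and rename, so readers never see a partial file.
        tmp = f"{path}.{os.getpid()}.tmp"
        with open(tmp, "wb") as fh:
            pickle.dump(record, fh)
        os.replace(tmp, path)
        return record

    def invalidate(self, image_id):
        """Called by the admin panel after an UPDATE or DELETE."""
        try:
            os.remove(self._path(image_id))
        except FileNotFoundError:
            pass


# Usage with a stand-in for the real database query:
cache = ImageCache(tempfile.mkdtemp())
row = cache.get(7, lambda image_id: {"mime": "image/png", "data": b"\x89PNG"})
```

The explicit invalidate() call is what lets the request path skip the database entirely on a hit.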
My initial tests show that I can get around 200-500 requests a second on a real-world server by using this model. Not too bad, in my opinion.

March 21st, 2008 at 1:30 pm
The problem with JIT in this case is that when you deploy in a high-traffic environment with a cold cache, you end up with race conditions, duplicated work, and a database stampede.
So, say it takes 0.3s to do your transformation and cache your data. At 500 requests/second, that means roughly the first 150 of those requests (0.3 × 500) happen with a cold cache, which causes a rush on your DB (150 reads of the original data), a CPU spike as those 150 are transformed, and the same data being written to cache 150 times over. You can use a mutex to block until the first transform is complete, but mutexes carry their own set of problems (what if your code crashes before releasing the lock?). They also don’t help you once you need to scale beyond one webserver.
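A small sketch of both points, in Python and with hypothetical names (cold_requests, SingleFlightCache). The first function is just the arithmetic: requests per second times transform time gives the number of requests that arrive before the cache is warm. The class shows the mutex idea within a single process, using a per-key lock so that only the first caller does the work. Using "with lock" means the lock is released even if the transform raises, but none of this helps across processes or across webservers, which is exactly the limitation described above.

```python
import threading
import time


def cold_requests(rate_per_s, transform_s):
    """Requests that arrive before the first transform has finished."""
    return round(rate_per_s * transform_s)


class SingleFlightCache:
    """In-process cache where each key is computed by one caller only."""

    def __init__(self):
        self._data = {}
        self._locks = {}
        self._guard = threading.Lock()  # protects the _locks dict itself

    def get(self, key, compute):
        if key in self._data:  # warm cache: no locking needed
            return self._data[key]
        with self._guard:
            lock = self._locks.setdefault(key, threading.Lock())
        with lock:  # released automatically, even if compute() raises
            if key not in self._data:  # re-check after waiting for the lock
                self._data[key] = compute(key)
            return self._data[key]


def slow_transform(key):
    time.sleep(0.05)  # stand-in for the scaling/cropping work
    return key.upper()


cache = SingleFlightCache()
threads = [
    threading.Thread(target=cache.get, args=("avatar", slow_transform))
    for _ in range(20)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

All twenty threads hit a cold cache at once here, yet slow_transform runs only once; the other nineteen wait and then read the cached value.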
The solution, in my opinion, is to recognize that Apache is far better at serving static content than PHP will ever be. Your best bet is really to front-load the operation, performing whatever scaling you need on the data when it gets uploaded, and writing it to disk. Then you can just link to those files and let Apache take care of the rest.
Clearly, this works for unchanging data, such as thumbnails, overlays, and the like. If you’re generating truly dynamic images (say, text of the current user’s name overlaid on an image), that front-loading isn’t going to work. But for most operations (scaling, optimizing, matting, cropping), it’s the way to go.
March 21st, 2008 at 1:41 pm
Good observation.
I have been considering using Python to handle this. Basically letting it grab files from the database, write them locally, then use the req.sendfile() method to deliver them. Or maybe having the backend write the image to an area above the htdocs directory. The sendfile() method is much faster than the PHP way of doing it, BTW.
One of my concerns about letting regular users upload images to a server is that if someone finds a hole in the code, they could potentially upload a PHP script to the server and exploit it from there.
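One common way to reduce that risk, sketched here in Python with hypothetical names (SIGNATURES, sniff_extension, safe_upload_name), is to ignore the client-supplied filename entirely: check the leading bytes of the upload against known image signatures and store it under a server-generated name with a matching extension. This is only a first line of defence. A file can start with valid image bytes and still carry a payload, so keeping uploads outside the web root and re-encoding them is still worthwhile.

```python
import secrets

# Leading "magic" bytes for the formats we are willing to accept.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": ".png",
    b"\xff\xd8\xff": ".jpg",
    b"GIF87a": ".gif",
    b"GIF89a": ".gif",
}


def sniff_extension(data: bytes):
    """Return the extension matching the file's signature, or None."""
    for magic, ext in SIGNATURES.items():
        if data.startswith(magic):
            return ext
    return None


def safe_upload_name(data: bytes) -> str:
    """Pick a random server-side filename; the client's name is never used."""
    ext = sniff_extension(data)
    if ext is None:
        raise ValueError("not a recognised image")
    return secrets.token_hex(16) + ext
```

With this in place, an upload named avatar.php that contains PHP source is rejected, and even an accepted file can never end up on disk with a .php extension.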
And yeah, I realize Apache is far better at handling this, and I use it in most cases, but when you have users uploading their own avatars that need to be scaled and cropped to various sizes, you have to process them in some kind of dynamic language, and keep track of the images in the database.