Archive for the ‘PostGIS’ Category

PostGIS: Bounding Box Indexing

Friday, December 19th, 2008

You may have seen from my previous post, that I’ve been playing around with PostGIS. I had some more time to dig deeper into it, and I found that, although functions like distance_sphere/ST_distance_sphere aren’t indexable, bounding-box lookups are. Bounding box indexing can be used in conjunction with those functions, to improve the performance of queries, fairly easily.

This only works for GiST indexes, which can be confusing if you’re not familiar with indexes in PostgreSQL. Say the GEOMETRY column you want to index is geom on the table zip. Basically, you just do:

CREATE INDEX zip_geom_idx ON zip USING GIST (geom);

For searching for points that are within a certain radius of another, PostGIS provides a function called expand/ST_expand, which expands a single point to a box, which can then be used in an index lookup.


The queries essentially break down to:

SELECT ... FROM ... WHERE
geom && expand('some point', 1)
AND distance_sphere('some point', geom) <= (15 * 1609.344);

The first part of the WHERE clause, geom && expand(’some point’, 1), expands “some point” 1 degree in all directions to create a box, and finds points that are inside or on the border of the expanded box. This is the part that indexing is used on.

The second part of the WHERE clause, distance_sphere(’some point’, geom) <= (15 * 1609.344), refines the selection (the rows that were found in the index scan), down to points that are within 15 miles of 'some point'.

I usually just use subqueries and a join, rather than doing multiple fetches, so in my case it looks more like:

SELECT t.* FROM mytable t
JOIN zip z ON (z.zip = t.zip)
WHERE z.geom && expand(
    (SELECT geom FROM zip WHERE zip = '10001'), 1)
AND distance_sphere(geom,
    (SELECT geom FROM zip WHERE zip = '10001')) <= (15 * 1609.344);

PostGIS: Calculate Distance Between ZIP Codes

Friday, November 14th, 2008

Followup Post: PostGIS: Bounding Box Indexing

Recently, I’ve been playing around with PostGIS (that’s geographic information system). The function, distance_sphere (or ST_distance_sphere, depending on the version), is of particular interest to me. Here’s what the documentation has to say about it:

“ST_distance_sphere(point, point), Returns linear distance in meters between two lat/lon points. Uses a spherical earth and radius of 6370986 meters. Faster than distance_spheroid(), but less accurate. Only implemented for points.”

This function is especially useful for calculating the approx. distance between zip codes. Lets say you have a table like this:

CREATE TABLE zip (
    zip CHAR(5) NOT NULL PRIMARY KEY,
    latitude FLOAT8 NOT NULL,
    longitude FLOAT8 NOT NULL
);

You could simply add a new column like so:

ALTER TABLE zip ADD COLUMN geom GEOMETRY;

Then populate the new column:

UPDATE zip SET geom = makepoint(longitude, latitude);

The function distance_sphere, calculates the distance in meters. In order to convert the result to miles, it needs to by divided by 1609.344. Lets say, you want to get all zip codes within 15 miles another zip code, say 10001. You could simply do:

SELECT zip FROM zip WHERE
    distance_sphere(geom,
        (SELECT geom FROM zip WHERE zip = '10001'))
    <= (15 * 1609.344);

There doesn’t seem to be a way to use an index to speed the query up, but it is still significantly faster than other methods. On the application side, you could use something like Memcached to cache the results, if applicable.

There are more accurate ways to find the distance between lat/lon points, however, they also require more work, and are slower.