Archive for the ‘General’ Category

Merging Tables To Kill JOINs

Thursday, June 25th, 2009

I was working on this project that has several large tables, and many JOINs. It had reached the point where the JOINs were becoming too costly, but fortunately, some of the tables could be merged. The problem, though, was figuring out how to get the data merged. I came up with a solution that uses views and rules… Let’s say there’s 2 tables, “main_table” and “tbl1″, and “tbl1″ needs to be merged with “main_table”:

-- Create new columns in main_table
ALTER TABLE main_table ADD COLUMN col1 VARCHAR(32);
ALTER TABLE main_table ADD COLUMN col2 VARCHAR(32);

-- Create view for tbl1
CREATE VIEW tbl1_view AS SELECT * FROM tbl1;

-- Create rule for view, tbl1_view
CREATE RULE tbl1_view_upd AS ON UPDATE TO tbl1_view DO INSTEAD (
    UPDATE main_table SET col1 = NEW.col1, col2 = NEW.col2
        WHERE id = NEW.refid;
);

The view comes in useful, because it doesn’t affect updates to tbl1 while you’re working. Now, if you simply do an update to the view, it will update the newly created columns in “main_table” to reflect the ones in “tbl1″. The view and old table can then be safely dropped once the data is merged.

UPDATE tbl1_view SET id = id;
DROP VIEW tbl_view;
DROP TABLE tbl1;

Another view with rules can then be created to represent the data that was previously in “tbl1″ and redirect writes to “main_table”, so things don’t break while backend code is being updated. My solution turned out to be a simple way to merge the tables. I would definitely consider a different approach if you’re working with millions of rows, though.

More On Comcast’s Move To Digital

Thursday, April 9th, 2009

Comcast’s plan to transition to digital is well on it’s way in Seattle/Tacoma. Infrastructure is being upgraded in many areas to increase HDTV capacity, which mostly involves upgrading old systems to an 860Mhz system. In some areas, analog channels are already in the process of being shut off. They’re not shut off yet; currently they display a screen with a phone number to call to order a digital cable box, but they’re sure to be shut off in the near future.

The space freed by the analog channels will be used primarily for HDTV. This is something Comcast desperately needs in order to compete with DirecTV and Verizon FiOS. Brier Dudley of the Seattle Times, posted a blog entry that explains which areas are expected to have their analog shut off.

Asking Who For What?

Thursday, April 2nd, 2009

I recently stumbled across this gem, from a website called Minda News… I think the Moro Islamic Liberation Front should consider a name change. Click the image for full view.

Moro Islamic Liberation Front (MILF)

Note that “cell phone” is also horribly misspelled.

PostgreSQL And SELECT DISTINCT On Large Tables

Thursday, January 22nd, 2009

I’ve been noticing lately that PostgreSQL’s SELECT DISTINCT operation seems to be un-optimized for large tables. Take a look:

# EXPLAIN ANALYZE SELECT DISTINCT mycol FROM mytable;
QUERY PLAN
------------------------
 Unique (time=3393.085..3615.420 rows=20799 loops=1)
 ->  Sort  (time=3393.082..3514.374 rows=204494 loops=1)
       Sort Key: mycol
       ->  Seq Scan (time=9.455..289.479 rows=204494)
 Total runtime: 3626.724 ms
(5 rows)

Now compare DISTINCT to GROUP BY:

# EXPLAIN ANALYZE SELECT mycol FROM mytable GROUP BY mycol;
QUERY PLAN
------------------------
 HashAggregate (time=415.129..427.497 rows=20799)
   ->  Seq Scan (time=8.127..269.190 rows=204494)
 Total runtime: 433.836 ms
(3 rows)

That’s quite an improvement. 3626.724 ms compared to 433.836 ms. Even with ordering, GROUP BY is still faster:

# EXPLAIN ANALYZE SELECT mycol FROM mytable GROUP BY mycol
ORDER BY mycol;
QUERY PLAN
------------------------
 Sort  (time=642.126..654.545 rows=20799 loops=1)
   Sort Key: mycol
   ->  HashAggregate  (time=408.922..423.040 rows=20799)
         ->  Seq Scan (time=7.519..263.577 rows=204494)
 Total runtime: 661.537 ms
(5 rows)

The main problem seems to be that PostgreSQL is choosing a bad query plan. Lets try forcing PostgreSQL to use an index scan on SELECT DISTINCT:

# SET enable_seqscan = off;
SET
# EXPLAIN ANALYZE SELECT DISTINCT mycol FROM mytable;
QUERY PLAN
------------------------
Unique (actual time=0.013..361.917 rows=20799 loops=1)
   ->  Index Scan (actual time=0.012..266.856 rows=204494)
 Total runtime: 368.296 ms
(3 rows)

# SET enable_seqscan = on;
SET

Wow! Big improvement. Lets see how that compares to GROUP BY with sequential scans turned off:

# SET enable_seqscan = off;
SET
# EXPLAIN ANALYZE SELECT mycol FROM mytable GROUP BY mycol
ORDER BY mycol;
QUERY PLAN
------------------------
 Group (actual time=0.014..362.013 rows=20799)
   ->  Index Scan (actual time=0.011..267.001 rows=204494)
 Total runtime: 368.201 ms
(3 rows)

# SET enable_seqscan = on;
SET

So, at this point, the two queries are comparable. Tweaking the configuration sometimes causes PostgreSQL to choose a bad plan on other queries, so it seems like if PostgreSQL is choosing a bad plan for a query like this, you can simply turn sequential scans off for it.

Versus / Golf Split, And Comcast Phasing Out Analog

Monday, December 8th, 2008

The Versus / Golf split is complete. For Comcast, in Seattle / Tacoma, the new channels are:

  • 625 Golf HD
  • 626 Versus HD 

The former hybrid channel Versus/Golf, is pending removal.

Also I read an interesting article about Comcast shutting off analog in Seattle / Tacoma (link). Basically, it says channels 2-29 will remain broadcasting in analog for the time being, but everything over 29 will be phased out over the next year. They could start the transition as soon as sometime this month, around the time they roll out their new Wideband Internet service. They are also lowering the price of their Digital Cable service, to the price of the current Extended-Basic service (the analog package, which is being phased out).

Comcast Channel Shuffling

Saturday, December 6th, 2008

If you’re wondering why there’s nothing on MOJO HD, it’s because MOJO HD has been cancelled (iN DEMAND confirms). Now the question is, what will it be replaced with.

In the Seattle/Tacoma area, Comcast has been shuffling channels around.

  • MHD has been renamed to Palladia.
  • ESPN HD is now being mapped to both 173 (old) and 623 (new), the old channel presumably pending removal.
  • ESPN2 HD is now being mapped to both 174 (old) and 624 (new), the old channel presumably pending removal.
  • FSN HD has been rolled out (using the space freed by MOJO???). It’s channel 627.
  • The hybrid channel, VS/Golf, is going to be split into two separate channels (VS and Golf). Whether or not both (or which, if any) will be carried in this area, is unconfirmed at this point.
  • King-5 Weather Plus (channel 115) is pending removal, scheduled to be removed on 12-31-2008. The reason is, the parent company has decided to pull the plug on it.

Spurious Rumors: Comcast Ramping Up HD?

Wednesday, November 26th, 2008

I’ve been following some spurious rumors, that Comcast could be preparing to ramp up their HD capacity in the Seattle/Tacoma area. A technician said that next month, they’re supposedly going to cut the number of analog channels they provide. Why is this important? Because the old analog channels take up twice as much space as the average HD channel.

A single analog channel consumes a 6mhz slot (while 2-3 HD channels can occupy the same slot). If they’re running 70 analog channels on a 860mhz system, then the analog channels consume roughly half of it’s capacity. Even removing a few of those channels frees up a lot of space for more HD channels. In some cities that are all digital, Comcast runs 50+ HD channels.

I have my fingers crossed.

Confusing Airport Message

Thursday, October 30th, 2008

I found this while setting up my Airport Extreme.

Airport

It’s a little unclear why having the status light blink amber for a couple seconds, is such a big issue, that it needs it’s own confirmation dialog. I’m sure they could have come up with an explanation that, you know, actually elaborates on what the actual problem is, or at least conveys some sort of semi-useful message.