Archive for the ‘General’ Category

Confusing Airport Message

Thursday, October 30th, 2008

I found this while setting up my Airport Extreme.

Airport

It’s a little unclear why having the status light blink amber for a couple seconds, is such a big issue, that it needs it’s own confirmation dialog. I’m sure they could have come up with an explanation that, you know, actually elaborates on what the actual problem is, or at least conveys some sort of semi-useful message.

PostgreSQL: Handling Ratings, Without Aggregates

Friday, September 26th, 2008

This is something you shouldn’t have to learn the hard way: How to create a ratings system without using aggregate functions. Specifically, what I’m talking about by an aggregate, is the AVG() SQL function. The problem with using AVG(), is it reads every row in the table, which works well when you’re not dealing with very many rows, but as your data grows, the query speed progressively slows down. This is because aggregate functions essentially perform a sequential scan on the table. The solution for this, is to store the average for each individual item in a table, and use simple math calculations for updates.

Such a table may look something like this:

CREATE TABLE item_ratings (
    item_id INTEGER NOT NULL PRIMARY KEY,
    rating DOUBLE PRECISION,
    votes INTEGER NOT NULL DEFAULT 0
);

CREATE INDEX item_ratings_rating_key ON item_ratings (rating);
CREATE INDEX item_ratings_votes_key ON item_ratings (votes);

There’s a number of options for adding the initial entry, that updates will eventually be done on, such as doing it in code or via a trigger/rule. For the sake of simplicity, lets say there’s already a blank entry for item #1.

INSERT INTO item_ratings (item_id, rating, votes) VALUES (1, 0, 0);

Now, instead of adding a new entry for each vote cast, you simply update the previous entry. Lets say someone casts a vote, giving it a 4 star rating. That would be handled as such:

UPDATE item_ratings SET
    rating = ((rating * votes) + 4) / (votes + 1),
    votes = (votes + 1)
    WHERE item_id = 1;

SELECT * FROM item_ratings WHERE item_id = 1;
 item_id | rating | votes
---------+--------+-------
      1          4        1

That’s pretty much it. Now all ratings are stored per item, instead of generated by an aggregate. This can save major headaches when dealing with large datasets, since AVG() in PostgreSQL tends to be slow when working with such datasets.

My Qwest Internet Nightmare Part III

Saturday, August 23rd, 2008

I’m experiencing a number of different issues. For one, the connection speed is wildly inconsistent. Sometimes I get 5mbps, other times I get under 1mbps. The connection is always dropping, and reconnecting. Sometimes it spends 30+ minutes disconnecting and reconnecting, before it finally gets a stable connection. Large downloads are constantly being interrupted because of the connection dropping.

This is crazy. I’m giving them another angry phone call on Monday.

My Qwest Internet Nightmare Part II

Wednesday, August 13th, 2008

Qwest finally got my Internet working, *phew*, after an agonizing 13 days. Now a new problem has reared it’s ugly head: My internet connection is ridiculously slow. I mean seriously slow. My gateway shows the downlink speed at 1100kbps. At my old apartment, it was usually around 6900. What’s worse: my actual connection speed is slower than what the gateway displays.

SpeedTest.net gives me an average of ~800kbps, just barely faster than my uplink speed! The meter jumped wildly between 1200kbps and 200kbps, during the test. This is absolutely ridiculous. I haven’t called yet, but I’m hoping it will end up being a simple fix. This whole experience has left me dismayed and flabbergasted. This is the first time I’ve ever considered Comcast as a serious alternative.

Lack Of Solutions To Deter SSH Brute Force Attacks

Thursday, July 17th, 2008

I’ve been really discouraged by the overall lack of decent solutions for minimizing SSH brute force attacks. There’s the typical and obvious one’s, that just about every article states, like setting AllowUsers/DenyUsers, and setting PermitRootLogin = no. The problem with those, is while they may strengthen security, they don’t stop bots from trying to brute force your server, which does nothing from a performance standpoint. Since having multiple bots trying to brute force your server can cause server load to go through the roof, this can be very important.

Disabling passwords, and only accepting key based logins can make you more secure, however, it still doesn’t deal with performance issues. And also, if you have a drive failure, and loose your SSH key, how do you login? If nobody else can make you a new key pair, and login to copy it over, then you’re locked out of your own server. Also, how do you login from someone else’s computer?

You can move SSH to a different port, but then you have to update the config for all your users. This can be a big pain in the ass to handle, but it can solve both the security and performance issues with brute force attacks. Overall it’s pretty weak, though, since some bots have started probing other ports, like 2222, to see if SSH is running on it. It surely wouldn’t work for anything big.

Then there’s log-based learning, like fail2ban. I have a problem with this too, as log injection attacks can be used to trick the learning daemon into causing a denial of service attack. Overall, this is a pretty weak solution too. There’s already been proof-of-concept log injection attacks for just about every log based learning daemon, and there’s sure to by more in the future.

And on to iptables based solutions. By filtering at firewall level, you can see clients that are trying to connect to SSH. By far, the biggest weakness of this setup is it can’t determine if a login was successful or not, it can only see SYN packets. If you have it setup to block clients that try to connect 3 or more times in a minute, then three successful logins would cause a legitimet client to be blocked. While it may sound absurd that a client would login that frequently, it isn’t when you consider that tab-completion is available in SSH and SCP, and for each tab-completion event, SSH logs in then back out; the limit can easily be exhausted, and cause legit clients to be blocked.

There really needs to be a better solution. It seems like the only real way to solve this, is to build some kind of configurable adaptive firewall into SSH, itself, to detect and deny brute force attacks.

PostgreSQL Quick Tip: Indexing Large Columns

Monday, July 14th, 2008

When working with large columns, you can run into some indexing issues. For instance, there’s a limit on how many characters a btree index entry can use, which poses problems for indexing text fields, since inserts and updates will fail if they use more characters than the index can use. Also, you can run into performance and disk consumption issues with large columns. The solution for this is to index columns using the hashtext() function. so, ie:

CREATE INDEX table1_column1_hash ON table1 hashtext(column1);

Then in your queries, do:

SELECT * FROM table1 WHERE hashtext(column1) = hashtext('foo');

There’s some issues with this solution, mainly being that you can only have exact matches for the column, so you can’t do range scans or LIKE queries, however, if you need searching features you should probably be using full text searching, which is available in PostgreSQL 8.2+.

PostgreSQL Planing Built In Replication

Wednesday, June 18th, 2008

Yes, that’s right, the PostgreSQL team is planning to add built in replication. This has come, in part, do to the demand of replication solutions, and the lack of easy-to-setup, easy-to-maintain replication for PostgreSQL. Many projects have moved to MySQL, simply because developers don’t want to deal with the headaches of Slony-I or PGCluster.

The features are planned to ship with PostgreSQL 8.4, however, do to technical issues, read-only queries to slaves may not be ready until 8.5.

This is going to be a huge leap for PostgreSQL.

The full announcement is on the PostgreSQL mailing list.

Comcast PowerBoost Is PR BS

Monday, June 16th, 2008

Comcast PowerBoost is useless. They ramble on about how Comcast Internet is so much faster than DSL, with PowerBoost giving you speeds up to 12mbit. Read the fine print, you’ll notice that PowerBoost only works for the first 10MB of a file (attached), and even that isn’t guaranteed. To put this into perspective, if I downloaded a 10MB file on my 7mbit DSL, and did the same on Comcast Internet with PowerBoost, Comcast would be a whopping 5 seconds faster, assuming both services work as advertised. Now if I were to download a 700MB iso with my 7mbit DSL, that would take ~14 minutes. If I did the same with 6mbit Comcast, compensating for PowerBoost, it would take ~16 minutes, DSL being about 2 minutes faster. Even if you had 8mbit Comcast Internet, the speed difference with DSL would still only be about 2 minutes.

Comcast relies on trickery to try and win over customers, by saying their internet service is way faster than DSL. In reality, PowerBoost does nothing to make downloads faster, there’s almost no difference between DSL and Cable download speeds (assuming that both services deliver exactly what they advertise), and Comcast’s methods of advertising are shady, to say the least.