Profile

This is the personal website of Paul Annesley, senior developer at 99designs in Melbourne, Australia. You can follow Paul on Twitter.

Recent Bookmarks

  • Semantic Versioning » Simple version number specification for systems which expose a public API. The format is major.minor.patch (e.g. 3.0.12); major indicates backwards incompatible, minor indicates backwards compatible, and 0.x.x indicates rapid development.
  • The Go Programming Language » New programming language from Google: performance like C, dynamic like Python, concurrent like Erlang.
  • node.js » Event driven network IO for V8 JavaScript.
  • v8 JavaScript Engine » Google's JavaScript engine as seen in Chrome, runs standalone or embedded in C++
  • jaml - GitHub » Jaml tries to emulate Ruby’s Haml library, making it easy to generate HTML in your JavaScript projects.
  • proxymachine - GitHub » Awesome looking Ruby/EventMachine TCP proxy from GitHub that does content-based routing to a backend. Opens a proxy to a backend once the read buffer contains enough information for a ruby block to return the desired backend address.
  • sstrudeau's jquery-dropper at master - GitHub » jQuery plugin, uses Canvas to provide an "eye dropper" color picker for same-domain images based on their pixel data.
  • Cloudvox - API-driven phone calls » Make and receive calls with JSON over RESTful HTTP, with flexible pricing.

People

  • James Annesley » Maker and purveyor of fine jazz saxophone music in Melbourne, Australia

Conflict free DNS and routes with multiple DHCP interfaces

27 October 2009

The Problem

Running DHCP on two or more network interfaces inevitably leads to conflicting or unpredictable DNS and default route settings.

For development at home and work, I use an Ubuntu virtual machine running on Mac OS. To ensure I have a predictable IP address regardless of what network I'm on, the VM's primary network interface is NATed, so it gets an IP address from VMware's DHCP server. To let my co-workers access HTTP on my virtual machine, I have a second network interface which is bridged.

The biggest symptom of the problem is complete loss of connectivity when I switch between home and office, and the default route from the previous location is retained.

The Solution That Should Work

DHCP client configuration lets you specify which details you want to request from the DHCP server.

The request statement

request [ option ] [, ... option ];

The request statement causes the client to request that any server responding to the client send the client its values for the specified options. Only the option names should be specified in the request statement - not option parameters. By default, the DHCP client requests the subnet-mask, broadcast-address, time-offset, routers, domain-name, domain-name-servers, host-name, nis-domain, nis-servers, and ntp-servers options.

So it should be possible to omit 'routers' and 'domain-name-servers' from the 'request' statement of the bridged interface, and all should be good. However, it seems that some DHCP servers (like the one in my Linksys router at home) send the 'routers' option anyway, and the DHCP client respects it despite not having requested it.
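For reference, that attempt could be sketched roughly like this. It assumes eth1 is the bridged interface (as in the hook script further down) and that the client configuration lives at /etc/dhcp3/dhclient.conf, a path guessed from the hook directory rather than stated anywhere above. The option list is simply the default list quoted above with the two unwanted options left out.

```
# /etc/dhcp3/dhclient.conf (sketch; file path is an assumption)
interface "eth1" {
        # default request list, minus routers and domain-name-servers
        request subnet-mask, broadcast-address, time-offset,
                domain-name, host-name, nis-domain, nis-servers, ntp-servers;
}
```

As described, this only controls what the client asks for, so a server that volunteers 'routers' regardless will still get its way.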

The Solution That Does Work

The solution that seems to work reliably is to write a simple dhclient-enter-hook to unset any unwanted details before they are processed.


# /etc/dhcp3/dhclient-enter-hooks.d/bridged-eth1
# Sourced by dhclient-script before it applies a lease; unsetting a
# new_* variable here makes the script ignore that option.

if [ "$interface" = eth1 ] && [ -n "$new_routers" ]; then
        echo "Discarding eth1 routers: $new_routers"
        unset new_routers
fi
if [ "$interface" = eth1 ] && [ -n "$new_domain_name_servers" ]; then
        echo "Discarding eth1 dns servers: $new_domain_name_servers"
        unset new_domain_name_servers
fi

A quote from Guido van Rossum

23 February 2009

Although I had hoped to provide something similar in Python, it quickly became clear that such an approach would be impossible because there was no way to elegantly distinguish instance variables from local variables in a language without variable declarations.

— Guido van Rossum, The History of Python: Adding Support for User-defined Classes


# The Greeter class
class Greeter
  def initialize(name)
    @name = name.capitalize
  end 
  def salute
    puts "Hello #{@name}!"
  end
end

— Ruby example, ruby-lang.org front page

Owned.

A quote from _why

23 January 2009

On a related note, I would like to say something to all of the parties working on languages atop abstracted runtimes, such as the JVM and .Net and Parrot.

The first thing is: you should try targetting x86. It's very popular!

— why the lucky stiff, potion INTERNALS

Visual Refreshment 2009

09 January 2009

To bring in the new year on a bright note, I decided to turf my old, tired, dark, noisy blue design. Welcome to paul.annesley.cc — 2009 edition, with a refreshed, dare I say "cleaner" white design.

Dropping a column and removing all the graphics, backgrounds, borders and so on was a simple task — a testament to CSS and the separation of structure and presentation. Adjustments to the navigation and the addition of an about page were quick thanks to Django.

Enjoy, and happy new year!

[Screenshots: the old, very blue design and the new, very white design]

CLI History Meme

15 December 2008

I just came across Rob Hudson's Command Line History, which in turn reminded me of Mark Pilgrim's History meme from back in April.

After waiting a fashionably-late eight months, I'll use this as an excuse to find out if I can still publish to my site since hastily patching it up to Django 1.0 three months ago. I suspect that says something about my technology vs. writing priorities.


paul@macbuntupro:~$ uname -a
Linux macbuntupro 2.6.24-21-server #1 SMP Wed Oct 22 00:18:13 UTC 2008 i686 GNU/Linux

paul@macbuntupro:~$ cat .bash_history | awk '{print $1}' | sort | uniq -c | sort -rn | head
    371 git
     32 ruby
     27 cd
     15 ls
     13 mysql
      8 php
      6 less
      5 irb
      4 vi
      4 chmod

Git wins.

Flexihash - Consistent Hashing for PHP

30 April 2008

After reading Tom Kleinpeter's article on consistent hashing last month, I was inspired to write Flexihash, an open source consistent hashing implementation for PHP, as I couldn't see anything decent around that fit the bill.

But first, a little bit of background and a use case example. C'mon, humor me…

Balancing Act

There are a few reasons why a large website would want to spread its images (and other resources) out across several URLs. One reason is that web browsers limit the number of concurrent connections per hostname, and the speed benefits of greater concurrency can be achieved by using multiple hostnames throughout a page. Another reason is so that the requests can be spread across multiple servers without funneling through a single load balancer.

To spread the images across multiple URLs, there needs to be a way to map a given URL path to one of the available servers. Once downloaded, the image will be cached in users' browsers based on its URL, so every effort must be made to serve it from the same hostname in future. This rules out picking one of the servers at random each time.

Hashing

A hash function consistently maps given data (anything, but in this case an image path) to a relatively small integer. Hash functions are consistent by definition, but don't confuse them with consistent hashing - read on!

Using a hash function (or a hash-like function such as CRC32), it would seem the goal is easily achievable. Hash the image path to an integer, take it modulo the number of servers, and look up the server at that index.

servers = [s1, s2, s3]
servers[HASH("/path/to/image.png") % servers.length] # => s2
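A runnable take on that pseudocode, as a minimal Python sketch. It assumes CRC32 (via zlib.crc32) as the hash-like function, and the hostnames are made up for illustration.

```python
import zlib

# Hypothetical image hostnames, for illustration only.
SERVERS = ["img1.example.com", "img2.example.com", "img3.example.com"]


def pick_server(path, servers=SERVERS):
    """Hash the path to an integer, take it modulo the server count, index the list."""
    return servers[zlib.crc32(path.encode("utf-8")) % len(servers)]


# The same path always lands on the same hostname while the list is unchanged.
print(pick_server("/path/to/image.png"))
```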

Thrashing

The trouble with hashing is that it turns into thrashing the moment the list of servers changes. If another server is added to handle increased load, it'll certainly get a good workout, as suddenly most of the image URLs are hashing to a different hostname, missing the users' browser caches, missing the front proxy cache and pummeling the servers. If a server is removed, it's even worse, as there'll be less cannon fodder for the flood of HTTP requests.
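To put a number on the thrashing, here is a small Python sketch that counts how many lookups change when an eleventh server joins ten. The MD5-based hash, the server names and the image paths are all assumptions chosen for the demonstration, not anything from a real setup.

```python
import hashlib


def hash_int(key):
    """Hash a string to a 32-bit integer (MD5 is an arbitrary choice here)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:4], "big")


def pick(path, servers):
    """Plain modulo placement: hash, then index into the server list."""
    return servers[hash_int(path) % len(servers)]


def fraction_moved(paths, before, after):
    """Share of paths whose server differs between two server lists."""
    moved = sum(1 for p in paths if pick(p, before) != pick(p, after))
    return moved / len(paths)


paths = ["/images/%d.png" % i for i in range(10000)]
ten = ["s%d" % i for i in range(1, 11)]
eleven = ten + ["s11"]

print("moved after adding a server: %.0f%%" % (100 * fraction_moved(paths, ten, eleven)))
```

With an evenly spread hash, a path keeps its server only when its hash gives the same remainder modulo 10 and modulo 11, which works out to about 1 path in 11. The other 10 in 11 move.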

Consistent Hashing

Onto the good stuff…

As with most problems in life, the logical solution starts with drawing a ring to represent your hash address space as a continuum. I solve just about everything that way. Randomly placing a dot somewhere on the enormous ring for each of a handful of servers will likely result in uneven distribution, which can be fixed by adding another hundred or so randomly placed dots for each server.

The whole idea sounds pretty strange, until you realise that you can now hash any of the image paths to a point on that ring, and then moving clockwise, pick the closest server. If a server is added, it won't get in the way of many lookups, and if one is removed it won't be sorely missed.
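The ring can be sketched in a few lines of Python. This is an illustration of the idea only, not Flexihash's implementation: the Ring class, the MD5-based position function and the "name#index" scheme for scattering each server's dots are all assumptions made for the example.

```python
import bisect
import hashlib


def position(key):
    """Hash a string to a point on the ring (a 32-bit integer)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:4], "big")


class Ring:
    def __init__(self, replicas=100):
        self.replicas = replicas  # dots placed on the ring per server
        self.points = []          # sorted ring positions
        self.owner = {}           # ring position -> server name

    def add(self, server):
        for i in range(self.replicas):
            p = position("%s#%d" % (server, i))
            if p not in self.owner:  # skip the rare collision
                bisect.insort(self.points, p)
                self.owner[p] = server

    def remove(self, server):
        self.points = [p for p in self.points if self.owner[p] != server]
        self.owner = {p: s for p, s in self.owner.items() if s != server}

    def lookup(self, key):
        """Move clockwise from the key's position to the first server dot."""
        i = bisect.bisect_left(self.points, position(key))
        return self.owner[self.points[i % len(self.points)]]


ring = Ring()
for name in ["s%d" % i for i in range(1, 11)]:
    ring.add(name)

paths = ["/images/%d.png" % i for i in range(10000)]
before = {p: ring.lookup(p) for p in paths}

ring.add("s11")
moved = [p for p in paths if ring.lookup(p) != before[p]]
print("moved after adding a server: %.0f%%" % (100 * len(moved) / len(paths)))
```

Every lookup that moves goes to the new server, since adding dots never changes which existing dot comes first for the remaining keys. That is what keeps the damage to roughly the newcomer's fair share.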

paulbookpro:flexihash paul$ php tests/runtests.php --with-benchmark
All Tests
NonConsistentHash: 92% of lookups changed after adding a target to the existing 10
NonConsistentHash: 90% of lookups changed after removing 1 of 10 targets
ConsistentHash: 6% of lookups changed after adding a target to the existing 10
ConsistentHash: 9% of lookups changed after removing 1 of 10 targets
Distribution of 100 lookups per target (min/max/median/avg): 71/126/109/100
…

Of course, consistent hashing has far more useful applications than distributing URLs. In applications like server-side distributed caching (e.g. memcached), consistent hashing has another great property - redundant writes. When picking a server to write a cache item to, rather than stopping at the closest server on the ring, continuing to move clockwise will find the second best server for that item. Writing the cache item to the first and second best server means removing the first one causes few, if any, cache misses.

// write cache item to both targets in case cache-2 is removed
$hash->lookupList('object', 2); // [cache-2, cache-4]
$hash->removeTarget('cache-2');
$hash->lookup('object'); // cache-4
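The same walk-clockwise trick can be sketched in Python. The function names, the MD5-based position function and the 100 dots per server are assumptions for the example; it shows the idea rather than how Flexihash does it internally.

```python
import bisect
import hashlib


def position(key):
    """Hash a string to a point on the ring (a 32-bit integer)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:4], "big")


def build_ring(servers, replicas=100):
    """Return a sorted list of (ring position, server) pairs."""
    return sorted((position("%s#%d" % (s, i)), s)
                  for s in servers for i in range(replicas))


def lookup_list(ring, key, count):
    """Walk clockwise from the key, collecting the first `count` distinct servers."""
    start = bisect.bisect_left(ring, (position(key), ""))
    found = []
    for step in range(len(ring)):
        server = ring[(start + step) % len(ring)][1]
        if server not in found:
            found.append(server)
            if len(found) == count:
                break
    return found


servers = ["cache-1", "cache-2", "cache-3", "cache-4"]
ring = build_ring(servers)
first, second = lookup_list(ring, "object", 2)

# Remove the first choice; the old second choice is now the closest server.
smaller = build_ring([s for s in servers if s != first])
print(first, second, lookup_list(smaller, "object", 1))
```

Whichever two servers come out for a given key, the second is by construction the first one reached once the first server's dots are gone, so an item written to both survives the removal.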

Check It Out

Update Feb 2009: ignore the following paragraph, find Flexihash at GitHub!

You can grab Flexihash from Google Code, either as a source archive, a single bundled PHP file, or via Subversion (I'm not cool enough to be using Git just yet). There are also some pretty graphs, stats and information at Flexihash on Ohloh which suggest Flexihash is crap. Don't believe the lies!

<?php
require_once('flexihash.php');

$hash = new Flexihash();

// bulk add
$hash->addTargets(array('cache-1', 'cache-2', 'cache-3'));

// simple lookup
$hash->lookup('object-a'); // cache-1
$hash->lookup('object-b'); // cache-2

// add and remove
$hash
	->addTarget('cache-4')
	->removeTarget('cache-1');

// lookup with next-best fallback (for redundant writes)
$hash->lookupList('object', 2); // [cache-2, cache-4]

// remove cache-2, expect object to hash to cache-4
$hash->removeTarget('cache-2');
$hash->lookup('object'); // cache-4

CSS Naked Day 2008

08 April 2008

Tomorrow is CSS Naked Day '08, and websites around the intertubes will be stripping their styles and showing off their semantic markup.

I figured I'd get the ball rolling a little early, to balance the fact that I'll inevitably forget to re-enable the styles for at least a week afterwards.

WordPress Upgraded, Atom Feed via FeedBurner

23 December 2007

I've just upgraded WordPress, and while I was there it seemed a good time to switch from RSS to Atom feeds, and deliver them via FeedBurner.

I've always been interested to see how different feed readers handle a feed switching from RSS to Atom - I guess I'll find out. The new Atom entry IDs are the same as the old RSS item IDs, so hopefully we won't see duplicated posts...

Django and Python Imaging Library (PIL) on Leopard

19 November 2007

Chapter 5 of the excellent Django Book makes use of django.db.models.ImageField().

When running under the default Python 2.5 environment shipped with Mac OS X 10.5 Leopard, this causes an error something along these lines:

$ ./manage.py validate
Error: One or more models did not validate:
books.author: "headshot": To use ImageFields, you need to install the Python Imaging Library. Get it at http://www.pythonware.com/products/pil/ .

Assuming you've installed Xcode (go on, you know you want to..) this can be solved like so:

Install libjpeg

wget http://www.ijg.org/files/jpegsrc.v6b.tar.gz
tar zxvf jpegsrc.v6b.tar.gz
cd jpeg-6b/
./configure
make
sudo make install-lib

Install Python Imaging Library (PIL)

wget http://effbot.org/media/downloads/Imaging-1.1.6.tar.gz
tar zxvf Imaging-1.1.6.tar.gz 
cd Imaging-1.1.6
sudo python setup.py install

Not too hard, but still, sucks to be us Apple users sometimes. If my Ubuntu Server virtual machine didn't make my laptop run so damn hot, I'd be using that. Are tickless kernels and PowerTOP helping anyone on this front yet?

If it all looks too hard, I'm sure MacPorts (or is it DarwinPorts? who knows) will hold your hand.

Great Webate - Tables Have No Place In Web Page Layout

25 August 2007

As promised, I'm finally posting my notes from my second debate in the Melbourne Web Standards Group's first ever Great Webate.

The proposition was that Tables Still Have A Place In Web Page Layout, and it was up to James Edwards and me to convince the crowd that such a statement just wasn't going to fly in the 21st century. Our unfortunate opponents were Kevin Yank and Andrew Krespanis, and I must say they made a brave albeit sometimes desperate attempt, given the proposition and the audience.

My argument was along these lines…

Once upon a time, when most sites boasted "Best viewed in Netscape 4" or "Powered by Frontpage", tables were great. Possibly even as good as framesets! With just a handful of table rows, plenty of data cells, and a sprinkling of spacer GIFs, you could place your spinning mailbox icon anywhere you liked!

But thankfully those days are gone, and with them, the need to use tables for web page layout. Jump forwards a few years, and we're in a whole new era.

"The use of tables is now actually interfering with building a better, more accessible, flexible, and functional Web."

http://hotdesign.com/seybold/everything.html

Semantics

HTML, in its various forms and versions, is quite rich with semantics. The correct use of heading levels and paragraphs alone can introduce valuable structure to a document. Correct use of tables—for tabular data, rather than layout—is also an invaluable tool. But when all of the content is contained in layers of nested tables, determining the document structure based on the markup alone becomes near impossible. When you have a rich set of tags for describing your content, why try to jam it all into tables?

Design—Locked In, Inflexible

"Forgetting semantics—because that's essentially what you're doing when you use tables for page layout—CSS based layout still has the massive benefit of separating content and presentation. A table will lock you into a design."

http://www.htmldog.com/ptg/archives/000049.php

Page Size & Cacheability

Using tables for layout means very verbose markup that needs to be defined again and again on every page, rather than in a central (and cacheable) style sheet. Apart from making a site very expensive to maintain and redesign, this also means that every page load will be larger. Now, you might say that a few extra kilobytes per page doesn't matter, but this is important for mobile Internet devices. Coupled with the fact that table-based layouts do not scale well on small screens, they are clearly inappropriate for mobile devices.

Markup as an API

More and more, the value of cleanly structured markup is becoming apparent. The standardization of various microformats allows web authors to add small pieces of metadata to their markup to introduce rich semantics which are easily discovered and interpreted by machines. This has led to the view of markup as an API. Personally, I don't fancy working with any API that lives in seven layers of nested tables.

User Style Sheets and User Scripts, Accessibility

Even if you don't plan on utilizing microformats in your content, you are giving your users much greater value by providing clean, semantic markup. Users with special needs will be able to easily apply user style sheets and scripts to make your website accessible.

User style sheets aside, WCAG 1.0 states: "Authors should use style sheets for layout and positioning." The message is quite clear: Do not use tables for layout...