This is the personal website of Paul Annesley, senior developer at 99designs in Melbourne, Australia.
You can follow Paul on Twitter.
27 October 2009
Running DHCP on two or more network interfaces inevitably leads to conflicting or unpredictable DNS and default route settings.
For development at home and work, I use an Ubuntu virtual machine running on Mac OS. To ensure I have a predictable IP address regardless of what network I'm on, the VM's primary network interface is NATed, so it gets an IP address from VMware's DHCP server. To let my co-workers access HTTP on my virtual machine, I have a second network interface which is bridged to the physical network and also configured via DHCP.
The most obvious symptom of the problem is a complete loss of connectivity when I move between home and the office, because the default route from the previous location is retained.
The DHCP client configuration file, dhclient.conf, lets you specify which options you want to request from the DHCP server.
The request statement
request [ option ] [, ... option ];
The request statement causes the client to request that any server responding to the client send the client its values for the specified options. Only the option names should be specified in the request statement - not option parameters. By default, the DHCP server requests the subnet-mask, broadcast-address, time-offset, routers, domain-name, domain-name-servers, host-name, nis-domain, nis-servers, and ntp-servers options.
So it should be possible to omit 'routers' and 'domain-name-servers' from the 'request' statement for the bridged interface, and all should be good. However, it seems that some DHCP servers (like the one in my Linksys router at home) send 'routers' anyway, and the DHCP client applies them despite not having requested them.
The solution that seems to work reliably is to write a simple dhclient-enter-hook to unset any unwanted details before they are processed.
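# /etc/dhcp3/dhclient-enter-hooks.d/bridged-eth1
# Sourced by dhclient-script before a new lease is applied, so unsetting
# the new_* variables here stops eth1 from installing a route or DNS servers.
if [ "$interface" = eth1 ] && [ -n "$new_routers" ]; then
    echo "Discarding eth1 routers: $new_routers"
    unset new_routers
fi
if [ "$interface" = eth1 ] && [ -n "$new_domain_name_servers" ]; then
    echo "Discarding eth1 dns servers: $new_domain_name_servers"
    unset new_domain_name_servers
fi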
23 February 2009
Although I had hoped to provide something similar in Python, it quickly became clear that such an approach would be impossible because there was no way to elegantly distinguish instance variables from local variables in a language without variable declarations.
— Guido van Rossum, The History of Python: Adding Support for User-defined Classes
# The Greeter class
class Greeter
  def initialize(name)
    @name = name.capitalize
  end

  def salute
    puts "Hello #{@name}!"
  end
end
— Ruby example, ruby-lang.org front page
Owned.
23 January 2009
On a related note, I would like to say something to all of the parties working on languages atop abstracted runtimes, such as the JVM and .Net and Parrot.
The first thing is: you should try targetting x86. It's very popular!
— why the lucky stiff, potion INTERNALS
09 January 2009
To bring in the new year on a bright note, I decided to turf my old, tired, dark, noisy blue design. Welcome to paul.annesley.cc — 2009 edition, with a refreshed, dare I say "cleaner" white design.
Dropping a column and removing all the graphics, backgrounds, borders and so on was a simple task — a testament to CSS and the separation of structure and presentation. Adjustments to the navigation and the addition of an about page were quick thanks to Django.
Enjoy, and happy new year!
15 December 2008
I just came across Rob Hudson's Command Line History, which in turn reminded me of Mark Pilgrim's History meme from back in April.
After waiting a fashionably-late eight months, I'll use this as an excuse to find out if I can still publish to my site since hastily patching it up to Django 1.0 three months ago. I suspect that says something about my technology vs. writing priorities.
paul@macbuntupro:~$ uname -a
Linux macbuntupro 2.6.24-21-server #1 SMP Wed Oct 22 00:18:13 UTC 2008 i686 GNU/Linux
paul@macbuntupro:~$ cat .bash_history | awk '{print $1}' | sort | uniq -c | sort -rn | head
371 git
32 ruby
27 cd
15 ls
13 mysql
8 php
6 less
5 irb
4 vi
4 chmod
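For anyone wondering what that pipeline actually does, here's a rough Python equivalent: take the first word of every history line, count each one, and print the ten most common. It's only a sketch. It assumes bash's default ~/.bash_history location, it skips blank lines (which awk would pass through as empty fields), and ties may come out in a different order than sort -rn produces.

```python
from collections import Counter
from pathlib import Path


def top_commands(lines, n=10):
    """Tally the first word of each line, most common first.

    Roughly: awk '{print $1}' | sort | uniq -c | sort -rn | head
    """
    counts = Counter()
    for line in lines:
        words = line.split()
        if words:  # skip blank lines rather than counting an empty command
            counts[words[0]] += 1
    return counts.most_common(n)


if __name__ == "__main__":
    # Assumes bash's default history file; adjust if HISTFILE points elsewhere.
    history = Path.home() / ".bash_history"
    if history.exists():
        with history.open(errors="replace") as f:
            for command, count in top_commands(f):
                print(f"{count:7d} {command}")
```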
30 April 2008
After reading Tom Kleinpeter's article on consistent hashing last month, I was inspired to write Flexihash, an open source consistent hashing implementation for PHP, as I couldn't see anything decent around that fit the bill.
But first, a little bit of background and a use case example. C'mon, humor me…
There are a few reasons why a large website would want to spread its images (and other resources) across several hostnames. One is that web browsers limit the number of concurrent connections per hostname, so using multiple hostnames throughout a page lets the browser download more in parallel. Another is that the requests can be spread across multiple servers without funneling through a single load balancer.
To spread the images across multiple hostnames, there needs to be a way to map a given URL path to one of the available servers. Once downloaded, an image will be cached in users' browsers based on its URL, so every effort must be made to serve it from the same hostname in future. This rules out picking a server at random each time.
A hash function consistently maps given data (anything, but in this case an image path) to a relatively small integer. Hash functions are consistent by definition, but don't confuse them with consistent hashing - read on!
Using a hash function (or a hash-like function such as CRC32), the goal seems easily achievable: hash the image path to an integer, take it modulo the number of servers, and look up the server at that index.
servers = [s1, s2, s3]
servers[HASH(/path/to/image.png) % servers.length] # => s2
The trouble with hashing is that it turns into thrashing the moment the list of servers changes. If another server is added to handle increased load, it'll certainly get a good workout, as suddenly most of the image URLs hash to a different hostname, missing users' browser caches, missing the front proxy cache and pummeling the servers. If a server is removed, it's even worse, as there'll be less cannon fodder for the flood of HTTP requests.
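To see the thrashing for yourself, here's a small sketch of the naive scheme in Python (rather than PHP), using zlib.crc32 as the hash-like function. The server names and image paths are made up for illustration, and fraction_remapped is a hypothetical helper that reports how many paths land on a different server when an eleventh server joins the existing ten.

```python
import zlib


def pick_server(path, servers):
    """Naive scheme: hash the path, then take it modulo the number of servers."""
    return servers[zlib.crc32(path.encode("utf-8")) % len(servers)]


def fraction_remapped(paths, before, after):
    """Fraction of paths whose server changes between two server lists."""
    moved = sum(1 for p in paths if pick_server(p, before) != pick_server(p, after))
    return moved / len(paths)


if __name__ == "__main__":
    print(pick_server("/path/to/image.png", ["s1", "s2", "s3"]))
    paths = [f"/images/{i}.png" for i in range(10000)]
    ten = [f"s{i}" for i in range(1, 11)]
    print(f"{fraction_remapped(paths, ten, ten + ['s11']):.0%} of paths moved")
```

Since a path only stays put when its hash gives the same remainder for both 10 and 11 servers, the vast majority of paths move, which is the behaviour the NonConsistentHash benchmark figures describe.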
Onto the good stuff…
As with most problems in life, the logical solution starts with drawing a ring to represent your hash address space as a continuum. I solve just about everything that way. Randomly placing a dot somewhere on the enormous ring for each of a handful of servers will likely result in uneven distribution, which can be fixed by adding another hundred or so randomly placed dots for each server.
The whole idea sounds pretty strange, until you realise that you can now hash any of the image paths to a point on that ring, and then, moving clockwise, pick the closest server. If a server is added, it won't get in the way of many lookups, and if one is removed it won't be sorely missed.
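Here's a minimal Python sketch of the ring, assuming MD5 (truncated to 32 bits) as the hash and 100 points per target. The HashRing class and its method names are hypothetical; this is the same idea, not Flexihash's actual implementation. The sorted list of points is the ring, and bisect finds the first point clockwise from a key, wrapping around past the end.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring sketch: each target gets `replicas` points."""

    def __init__(self, targets=(), replicas=100):
        self.replicas = replicas
        self._points = []  # sorted ring positions
        self._owners = {}  # ring position -> target
        for target in targets:
            self.add_target(target)

    @staticmethod
    def _hash(key):
        # First 8 hex digits of MD5 give a position in [0, 2**32).
        return int(hashlib.md5(key.encode("utf-8")).hexdigest()[:8], 16)

    def add_target(self, target):
        # Scatter the target around the ring at `replicas` hashed positions.
        for i in range(self.replicas):
            pos = self._hash(f"{target}#{i}")
            if pos not in self._owners:  # on a (rare) collision, first owner wins
                self._owners[pos] = target
                bisect.insort(self._points, pos)
        return self

    def remove_target(self, target):
        self._points = [p for p in self._points if self._owners[p] != target]
        self._owners = {p: t for p, t in self._owners.items() if t != target}
        return self

    def lookup(self, key):
        if not self._points:
            raise LookupError("no targets on the ring")
        # Hash the key onto the ring, then move clockwise to the next point,
        # wrapping around to the start if we fall off the end.
        i = bisect.bisect_left(self._points, self._hash(key))
        return self._owners[self._points[i % len(self._points)]]


if __name__ == "__main__":
    ring = HashRing(f"cache-{n}" for n in range(1, 11))
    keys = [f"/images/{n}.png" for n in range(10000)]
    before = {k: ring.lookup(k) for k in keys}
    ring.add_target("cache-11")
    moved = sum(1 for k in keys if ring.lookup(k) != before[k])
    print(f"{moved / len(keys):.0%} of lookups changed after adding a target")
```

A handy property to check: after add_target, every key that changed, changed to the new target, and removing that target again puts every key back where it was.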
paulbookpro:flexihash paul$ php tests/runtests.php --with-benchmark
All Tests
NonConsistentHash: 92% of lookups changed after adding a target to the existing 10
NonConsistentHash: 90% of lookups changed after removing 1 of 10 targets
ConsistentHash: 6% of lookups changed after adding a target to the existing 10
ConsistentHash: 9% of lookups changed after removing 1 of 10 targets
Distribution of 100 lookups per target (min/max/median/avg): 71/126/109/100
…
Of course, consistent hashing has far more useful applications than distributing URLs. In applications like server-side distributed caching (e.g. memcached), consistent hashing has another great property: redundant writes. When picking a server to write a cache item to, rather than stopping at the closest server on the ring, continuing to move clockwise will find the second best server for that item. Writing the cache item to both the first and second best servers means that removing the first causes few, if any, cache misses.
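The redundant-write idea can be sketched in Python too. This version keeps the ring as a sorted list of (position, target) tuples and walks clockwise until it has collected the requested number of distinct targets. build_ring and lookup_list are hypothetical names loosely mirroring Flexihash's lookupList, and MD5 with 100 points per target is an assumption.

```python
import bisect
import hashlib


def _pos(key):
    # First 8 hex digits of MD5 give a ring position in [0, 2**32).
    return int(hashlib.md5(key.encode("utf-8")).hexdigest()[:8], 16)


def build_ring(targets, replicas=100):
    """Hypothetical helper: the ring as a sorted list of (position, target)."""
    return sorted((_pos(f"{t}#{i}"), t) for t in targets for i in range(replicas))


def lookup_list(ring, key, count):
    """Walk clockwise from the key, collecting up to `count` distinct targets."""
    if not ring or count <= 0:
        return []
    # (pos,) sorts before any (pos, target), so this finds the first point
    # at or after the key's position.
    start = bisect.bisect_left(ring, (_pos(key),))
    found = []
    for step in range(len(ring)):
        target = ring[(start + step) % len(ring)][1]
        if target not in found:
            found.append(target)
            if len(found) == count:
                break
    return found


if __name__ == "__main__":
    ring = build_ring(["cache-1", "cache-2", "cache-3", "cache-4"])
    first, second = lookup_list(ring, "object", 2)
    # Write to both; if `first` disappears, lookups fall through to `second`.
    survivors = [point for point in ring if point[1] != first]
    assert lookup_list(survivors, "object", 1) == [second]
    print(first, second)
```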
// write cache item to both targets in case cache-2 is removed
$hash->lookupList('object', 2); // [cache-2, cache-4]
$hash->removeTarget('cache-2');
$hash->lookup('object'); // cache-4
Update Feb 2009: ignore the following paragraph, find Flexihash at GitHub!
You can grab Flexihash from Google Code, either as a source archive, a single bundled PHP file, or via Subversion (I'm not cool enough to be using Git just yet). There are also some pretty graphs, stats and information at Flexihash on Ohloh which suggest Flexihash is crap. Don't believe the lies!
<?php
require_once('flexihash.php');
$hash = new Flexihash();
// bulk add
$hash->addTargets(array('cache-1', 'cache-2', 'cache-3'));
// simple lookup
$hash->lookup('object-a'); // cache-1
$hash->lookup('object-b'); // cache-2
// add and remove
$hash
->addTarget('cache-4')
->removeTarget('cache-1');
// lookup with next-best fallback (for redundant writes)
$hash->lookupList('object', 2); // [cache-2, cache-4]
// remove cache-2, expect object to hash to cache-4
$hash->removeTarget('cache-2');
$hash->lookup('object'); // cache-4
08 April 2008
Tomorrow is CSS Naked Day '08, and websites around the intertubes will be stripping their styles and showing off their semantic markup.
I figured I'd get the ball rolling a little early, to balance the fact that I'll inevitably forget to re-enable the styles for at least a week afterwards.
23 December 2007
I've just upgraded WordPress, and while I was there it seemed a good time to switch from RSS to Atom feeds, and deliver them via FeedBurner.
I've always been interested to see how different feed readers handle a feed switching from RSS to Atom, and I guess I'll find out. The new Atom entry IDs are the same as the old RSS item IDs, so hopefully we won't see duplicated posts...
19 November 2007
Chapter 5 of the excellent Django Book makes use of django.db.models.ImageField().
When running under the default Python 2.5 environment shipped with Mac OS X 10.5 Leopard, this causes an error along these lines:
$ ./manage.py validate
Error: One or more models did not validate:
books.author: "headshot": To use ImageFields, you need to install the Python Imaging Library. Get it at http://www.pythonware.com/products/pil/ .
Assuming you've installed Xcode (go on, you know you want to...), this can be solved like so:
Install libjpeg
wget http://www.ijg.org/files/jpegsrc.v6b.tar.gz
tar zxvf jpegsrc.v6b.tar.gz
cd jpeg-6b/
./configure
make
sudo make install-lib
Install Python Imaging Library (PIL)
wget http://effbot.org/media/downloads/Imaging-1.1.6.tar.gz
tar zxvf Imaging-1.1.6.tar.gz
cd Imaging-1.1.6
sudo python setup.py install
Not too hard, but still, sucks to be us Apple users sometimes. If my Ubuntu Server virtual machine didn't make my laptop run so damn hot, I'd be using that. Are tickless kernels and PowerTOP helping anyone on this front yet?
If it all looks too hard, I'm sure MacPorts (or is it DarwinPorts? who knows) will hold your hand.
25 August 2007
As promised, I'm finally posting my notes from my second debate in the Melbourne Web Standards Group's first ever Great Webate.
The proposition was that "Tables Still Have A Place In Web Page Layout", and it was up to James Edwards and me to convince the crowd that such a statement just wasn't going to fly in the 21st century. Our unfortunate opponents were Kevin Yank and Andrew Krespanis, and I must say they made a brave, albeit sometimes desperate, attempt, given the proposition and the audience.
My argument was along these lines…
Once upon a time, when most sites boasted "Best viewed in Netscape 4" or "Powered by Frontpage", tables were great. Possibly even as good as framesets! With just a handful of table rows, plenty of data cells, and a sprinkling of spacer GIFs, you could place your spinning mailbox icon anywhere you liked!
But thankfully those days are gone, and with them, the need to use tables for web page layout. Jump forwards a few years, and we're in a whole new era.
"The use of tables is now actually interfering with building a better, more accessible, flexible, and functional Web."
HTML, in its various forms and versions, is quite rich with semantics. The correct use of heading levels and paragraphs alone can introduce valuable structure to a document. Correct use of tables (for tabular data, rather than layout) is also an invaluable tool. But when all of the content is contained in layers of nested tables, determining the document structure from the markup alone becomes nearly impossible. When you have a rich set of tags for describing your content, why try to jam it all into tables?
"Forgetting semantics—because that's essentially what you're doing when you use tables for page layout—CSS based layout still has the massive benefit of separating content and presentation. A table will lock you into a design."
Using tables for layout means very verbose markup that must be repeated on every page, rather than defined once in a central (and cacheable) style sheet. Apart from making a site very expensive to maintain and redesign, this also means that every page load will be larger. Now, you might say that a few extra kilobytes per page doesn't matter, but it does matter for mobile Internet devices. Coupled with the fact that table-based layouts do not scale well to small screens, they are clearly inappropriate for mobile devices.
More and more, the value of cleanly structured markup is becoming apparent. The standardization of various microformats allows web authors to add small pieces of metadata to their markup to introduce rich semantics which are easily discovered and interpreted by machines. This has led to the view of markup as an API. Personally, I don't fancy working with any API that lives in seven layers of nested tables.
Even if you don't plan on utilizing microformats in your content, you are giving your users much greater value by providing clean, semantic markup. Users with special needs will be able to easily apply user style sheets and scripts to make your website accessible.
User style sheets aside, WCAG 1.0 states: "Authors should use style sheets for layout and positioning."
The message is quite clear: Do not use tables for layout...