paul.annesley.cc

Mac OS X: Launchd Is Cool

One of the core components of Mac OS X is launchd, and it turns out it can do some cool things.

I particularly like the idea of using QueueDirectories to monitor and act upon files dropped into a directory, without having to run any extra daemons. The files could be uploaded to S3, transcoded to a different video format, gzipped… anything.

Anyway, I recently fell into the launchd documentation, and came out with this write-up. Let me know if you find it useful.

Overview

The first thing that the Mac OS kernel runs on boot is launchd, which bootstraps the rest of the system by loading and managing various daemons, agents, scripts and other processes. The launchd man page clarifies the difference between a daemon and an agent:

In the launchd lexicon, a “daemon” is, by definition, a system-wide service of which there is one instance for all clients. An “agent” is a service that runs on a per-user basis. Daemons should not attempt to display UI or interact directly with a user’s login session. Any and all work that involves interacting with a user should be done through agents.

Daemons and agents are declared and configured by creating .plist files in various locations of the system:

~/Library/LaunchAgents         Per-user agents provided by the user.
/Library/LaunchAgents          Per-user agents provided by the administrator.
/Library/LaunchDaemons         System-wide daemons provided by the administrator.
/System/Library/LaunchAgents   Per-user agents provided by OS X.
/System/Library/LaunchDaemons  System-wide daemons provided by OS X.

Perhaps best of all, launchd is open source under the Apache License 2.0. You can currently find the latest source code on the Apple Open Source site.

launchd as cron

The Mac OS crontab man page says:

Although cron(8) and crontab(5) are officially supported under Darwin,
their functionality has been absorbed into launchd(8), which provides a
more flexible way of automatically executing commands.

Turns out launchd has a simple StartInterval <integer> property, which starts the job every N seconds. However, the true cron-like power lies in StartCalendarInterval:

StartCalendarInterval <dictionary of integers or array of dictionary of integers>

This optional key causes the job to be started every calendar interval as
specified. Missing arguments are considered to be wildcard. The semantics
are much like crontab(5).  Unlike cron which skips job invocations when the
computer is asleep, launchd will start the job the next time the computer
wakes up.  If multiple intervals transpire before the computer is woken,
those events will be coalesced into one event upon wake from sleep.

     Minute <integer>
     The minute on which this job will be run.

     Hour <integer>
     The hour on which this job will be run.

     Day <integer>
     The day on which this job will be run.

     Weekday <integer>
     The weekday on which this job will be run (0 and 7 are Sunday).

     Month <integer>
     The month on which this job will be run.

Let’s find the shortest example of this in action:

pda@paulbook ~ > grep -rl StartCalendarInterval \
                   /Library/Launch* /System/Library/Launch* | \
                   xargs wc -l | sort -n | head -n1 | awk '{print $2}' | xargs cat

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
        <key>Label</key>
        <string>com.apple.gkreport</string>
        <key>ProgramArguments</key>
        <array>
                <string>/usr/libexec/gkreport</string>
        </array>
        <key>StartCalendarInterval</key>
        <dict>
                <key>Minute</key><integer>52</integer>
                <key>Hour</key><integer>3</integer>
                <key>WeekDay</key><integer>5</integer>
        </dict>
</dict>
</plist>

Better than cron? Apart from better handling of skipped jobs after system wake, it also supports per-job environment variables, which can save writing wrapper scripts around your cron jobs:

EnvironmentVariables <dictionary of strings>

This optional key is used to specify additional environmental variables to
be set before running the job.

So, anything XML is obviously worse than 52 3 * * 5 /path/to/command, but launchd is packing more features than cron, so it can pull it off.
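To make the wildcard semantics concrete, here is a rough Ruby sketch of how a StartCalendarInterval dictionary could be matched against a point in time. It only illustrates the man page text quoted above and is not launchd's actual implementation. The calendar_match? helper is a made-up name, and it simply requires every listed key to match, which may differ from launchd when both Day and Weekday are given.

```ruby
# Maps each documented StartCalendarInterval key to the matching part of a Time.
CALENDAR_FIELDS = {
  "Minute"  => ->(t) { t.min },
  "Hour"    => ->(t) { t.hour },
  "Day"     => ->(t) { t.day },
  "Weekday" => ->(t) { t.wday }, # Ruby counts Sunday as 0
  "Month"   => ->(t) { t.month },
}.freeze

# Hypothetical helper: true when every key present in the interval matches.
# Keys that are absent act as wildcards, as the man page describes.
def calendar_match?(interval, time)
  interval.all? do |key, wanted|
    wanted = 0 if key == "Weekday" && wanted == 7 # 0 and 7 both mean Sunday
    CALENDAR_FIELDS.fetch(key).call(time) == wanted
  end
end

# The gkreport schedule: 03:52 on weekday 5.
gkreport = { "Minute" => 52, "Hour" => 3, "Weekday" => 5 }
friday   = Time.new(2024, 1, 5, 3, 52) # 5 January 2024 was a Friday
puts calendar_match?(gkreport, friday)
```

An empty dictionary matches every minute, which is the "everything is a wildcard" case.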

launchd as a filesystem watcher

Apart from having an awesome daemon/agent manager, Mac OS X also has an excellent Mail Transport Agent called postfix. There’s a good chance your ISP runs the same software to handle millions of emails every day. We’ll be using it as an example of how launchd can start jobs based on filesystem changes.

Because your laptop isn’t, and shouldn’t be, a mail server, you don’t want postfix running all the time. But when messages are injected into it, e.g. by a script shelling out to /usr/sbin/sendmail or /usr/bin/mail, you want them to be delivered straight away.

Here’s how Mac OS X does it (/System/Library/LaunchDaemons/org.postfix.master.plist):

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>org.postfix.master</string>
    <key>Program</key>
    <string>/usr/libexec/postfix/master</string>
    <key>ProgramArguments</key>
    <array>
        <string>master</string>
        <string>-e</string>
        <string>60</string>
    </array>
    <key>QueueDirectories</key>
    <array>
        <string>/var/spool/postfix/maildrop</string>
    </array>
    <key>AbandonProcessGroup</key>
    <true/>
</dict>
</plist>

We’ll start with the simple part. ProgramArguments passes -e 60 to postfix, described thusly:

-e exit_time
              Terminate the master process after exit_time seconds.
              Child processes terminate at their convenience.

So postfix is told to exit after running for 60 seconds. The mystery (to me, earlier today, at least) is how it gets started. It could be on a cron-like schedule, but (a) it isn’t, (b) that would suck, and (c) it would result in delayed mail delivery. It turns out the magic lies in QueueDirectories, which I initially overlooked thinking it was a postfix option. The launchd.plist man page says:

WatchPaths <array of strings>
This optional key causes the job to be started if any one of the listed
paths are modified.

QueueDirectories <array of strings>
Much like the WatchPaths option, this key will watch the paths for
modifications. The difference being that the job will only be started if
the path is a directory and the directory is not empty.

The Launchd Wikipedia page actually goes into more detail:

QueueDirectories
Watch a directory for new files. The directory must be empty to begin with,
and must be returned to an empty state before QueueDirectories will launch
its task again.

So launchd can monitor a directory for new files, and then trigger an agent/daemon to consume them. In this case, the postfix sendmail(1) man page tells us that “Postfix sendmail(1) relies on the postdrop(1) command to create a queue file in the maildrop directory”, and the man page for postdrop(1) tells us that /var/spool/postfix/maildrop is the maildrop queue. launchd sees new mail there, fires up postfix, and then stops it after 60 seconds. This might cause deferred mail to stay deferred for quite some time, but again, your laptop isn’t a mail server.
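The same pattern works for your own jobs. Below is a minimal Ruby sketch of a consumer that a QueueDirectories job could run: it handles each file and then deletes it, so the directory ends up empty again and launchd can trigger the job next time. The drain_queue name and the block-based interface are assumptions for illustration; what "handling" means (upload, transcode, gzip) is up to the block.

```ruby
# Hypothetical consumer for a directory watched via QueueDirectories.
# Yields each regular file to the caller, then removes it from the queue.
# Returns the names of the files that were processed.
def drain_queue(queue_dir)
  processed = []
  Dir.children(queue_dir).sort.each do |name|
    path = File.join(queue_dir, name)
    next unless File.file?(path) # subdirectories are left alone in this sketch

    yield path        # do the real work here
    File.delete(path) # leave the queue empty for the next trigger
    processed << name
  end
  processed
end

# Example use from the script named in ProgramArguments:
#   drain_queue("/path/to/queue") { |path| puts "processing #{path}" }
```

If the block raises, the file stays in the queue, so the job should be started again while the directory is non-empty.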

launchd as inetd

Traditionally the inetd and later xinetd “super-server daemon” were used to listen on various ports (e.g. FTP, telnet, …) and launch daemons on-demand to handle inbound connections, keeping them out of memory at other times. Sounds like something launchd could do…

Let’s create a simple inetd-style server at ~/Library/LaunchAgents/my.greeter.plist:

<plist version="1.0">
<dict>
  <key>Label</key><string>my.greeter</string>
  <key>ProgramArguments</key>
  <array>
    <string>/usr/bin/ruby</string>
    <string>-e</string>
    <string>puts "Hi #{gets.match(/(\w+)\W*\z/)[1]}, happy #{Time.now.strftime("%A")}!"</string>
  </array>
  <key>inetdCompatibility</key><dict><key>Wait</key><false/></dict>
  <key>Sockets</key>
  <dict>
    <key>Listeners</key>
    <dict>
      <key>SockServiceName</key><string>13117</string>
    </dict>
  </dict>
</dict>
</plist>

Load it up and give it a shot:

pda@paulbook ~ > launchctl load ~/Library/LaunchAgents/my.greeter.plist
pda@paulbook ~ > echo "My name is Paul." | nc localhost 13117
Hi Paul, happy Friday!
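The Ruby one-liner in that plist is dense, so here is the same logic spread out as a sketch. In inetd style the accepted connection is the job's standard input and output, so reading a line and printing a reply is all the "server" has to do. The greeting method name and the "stranger" fallback are additions for illustration; the one-liner above would raise if the input contained no word.

```ruby
# Builds the reply for one line of client input.
# The regex captures the last word, ignoring trailing punctuation and newline.
def greeting(line, now = Time.now)
  match = line.match(/(\w+)\W*\z/)
  name  = match ? match[1] : "stranger" # fallback is an addition, not in the plist
  "Hi #{name}, happy #{now.strftime("%A")}!"
end

# As the launchd job body, the socket arrives as stdin/stdout:
#   puts greeting($stdin.gets.to_s)
```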

launchd as god!

You can use launchd to ensure a process stays alive forever using <key>KeepAlive</key><true/>, or only while certain conditions hold:

  • SuccessfulExit — the previous run exited successfully (or if false, unsuccessful exit).
  • NetworkState — network (other than localhost) is up (or if false, down).
  • PathState — list of file paths exists (or if false, do not exist).
  • OtherJobEnabled — the other named job is enabled (or if false, disabled).

These can be combined with various other properties, for example:

  • WorkingDirectory
  • EnvironmentVariables
  • Umask
  • ThrottleInterval
  • StartOnMount
  • StandardInPath
  • StandardOutPath
  • StandardErrorPath
  • SoftResourceLimits and HardResourceLimits
  • Nice
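As a sketch of how these fit together, the Ruby below renders a job description as plist XML: a job that is restarted only after an unsuccessful exit, throttled to one start per 30 seconds, with output sent to a log file. The label, program path and log path are hypothetical, and the tiny launchd_plist generator is an illustration that handles only the value types used here, not a general plist library.

```ruby
# Escape the characters that are special in XML text.
def xml_escape(text)
  text.gsub("&", "&amp;").gsub("<", "&lt;").gsub(">", "&gt;")
end

# Render a Ruby value as a plist element, indented by depth.
def plist_element(value, depth = 0)
  pad = "  " * depth
  case value
  when true, false
    "#{pad}<#{value}/>\n"
  when Integer
    "#{pad}<integer>#{value}</integer>\n"
  when String
    "#{pad}<string>#{xml_escape(value)}</string>\n"
  when Array
    items = value.map { |item| plist_element(item, depth + 1) }.join
    "#{pad}<array>\n#{items}#{pad}</array>\n"
  when Hash
    entries = value.map do |key, item|
      "#{pad}  <key>#{xml_escape(key)}</key>\n" + plist_element(item, depth + 1)
    end.join
    "#{pad}<dict>\n#{entries}#{pad}</dict>\n"
  else
    raise ArgumentError, "unsupported plist value: #{value.inspect}"
  end
end

# Wrap the rendered dictionary in a minimal plist document.
def launchd_plist(job)
  %(<?xml version="1.0" encoding="UTF-8"?>\n<plist version="1.0">\n) +
    plist_element(job) + "</plist>\n"
end

job = {
  "Label"             => "my.worker",                   # hypothetical label
  "ProgramArguments"  => ["/usr/local/bin/my-worker"],  # hypothetical program
  "KeepAlive"         => { "SuccessfulExit" => false }, # restart after failures only
  "ThrottleInterval"  => 30,
  "StandardOutPath"   => "/tmp/my-worker.log",
  "StandardErrorPath" => "/tmp/my-worker.log",
}
puts launchd_plist(job)
```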

More?

There’s some more information at developer.apple.com, and the launchd and launchd.plist man pages are worth reading.

Let me know if you find any of this useful… I’m @pda on Twitter.

You can leave comments on Hacker News if that’s more your thing.

Simple Dependency Injection and MiniTest::Mock

I recently wrote a Ruby client for Amazon Alexa’s APIs, and thought I’d pull out an example of nice, simple dependency injection to facilitate unit testing. Nothing revolutionary or complicated, just good practice.

The example is based around a UriSigner class, normally used by the calling code like this:

UriSigner.new(*credentials).sign_uri(uri)

The calling code doesn’t know or care that UriSigner depends on Base64, OpenSSL::Digest::SHA256 and OpenSSL::HMAC. The unit test for UriSigner, however, cares for two reasons.

Firstly, they’re external dependencies, and only need to be tested for correct usage, not correct implementation.

Secondly, these dependencies represent an encoder and a cryptographic hash function; they’re deterministic, but they return very opaque data which can make tests and their failure messages difficult to understand.

So instead of testing against magical (computed in advance) Base64 strings and HMAC hashes, I’ve used simple attr_writer dependency injectors:
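Roughly what that could look like is sketched below. This is an illustration rather than the original class: the signing scheme is deliberately simplified (one secret, an HMAC over the whole URI, a Signature query parameter), and those details are assumptions. The part that matters is the attr_writer line and the private readers that fall back to the real implementations.

```ruby
require "base64"
require "openssl"
require "uri"

class UriSigner
  # Injection points: tests can swap in doubles, callers never need to.
  attr_writer :base64, :hmac, :digest

  def initialize(secret)
    @secret = secret
  end

  # Simplified signing, for illustration only.
  def sign_uri(uri)
    raw       = hmac.digest(digest, @secret, uri.to_s)
    signature = URI.encode_www_form_component(base64.strict_encode64(raw))
    separator = uri.to_s.include?("?") ? "&" : "?"
    URI("#{uri}#{separator}Signature=#{signature}")
  end

  private

  # Each reader defaults to the real dependency unless one was injected.
  def base64
    @base64 ||= Base64
  end

  def hmac
    @hmac ||= OpenSSL::HMAC
  end

  def digest
    @digest ||= OpenSSL::Digest::SHA256.new
  end
end

puts UriSigner.new("s3cret").sign_uri("http://example.com/?a=1")
```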

The unit test can then inject MiniTest::Mock instances in place of the real Base64 and HMAC implementations, setting expected method calls and their return values:

As simple as that.

The same approach is used by the HTTP Client class to stub out actual HTTP calls via Net::HTTP. There are great libraries like VCR, WebMock and FakeWeb for handling this, but sometimes it’s easier to keep it lo-fi:
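A lo-fi version might look like the sketch below. The Client class, its get_body method and the FakeHTTP double are hypothetical names for illustration; the only real API relied on is Net::HTTP.get_response, which the fake imitates just enough for the test.

```ruby
require "net/http"
require "uri"

# Hypothetical client: uses Net::HTTP unless something else is injected.
class Client
  attr_writer :http

  def get_body(url)
    http.get_response(URI(url)).body
  end

  private

  def http
    @http ||= Net::HTTP
  end
end

# Hand-rolled double: records requests and returns a canned response.
FakeResponse = Struct.new(:body)

class FakeHTTP
  attr_reader :requested

  def initialize(body)
    @body = body
    @requested = []
  end

  def get_response(uri)
    @requested << uri.to_s
    FakeResponse.new(@body)
  end
end

fake   = FakeHTTP.new("<ok/>")
client = Client.new
client.http = fake
puts client.get_body("http://example.com/info") # no network traffic involved
```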

This kind of dependency injection is one of many basic techniques which aren’t fancy enough to get a lot of press, but go a long way to keeping your objects and tests in order.

Got any thoughts? Hit me up, I’m @pda on Twitter, where I generally write about this kind of thing.

Fast RSpec/Rails: Tiered spec_helper.rb

Slow Rails startup time is the TDD killer.

paul@paulbookpro ~/project ⸩ time rspec spec/lib/method_hunting_delegator_spec.rb
..
Finished in 0.00078 seconds
2 examples, 0 failures
rspec spec/lib/method_hunting_delegator_spec.rb -f d  6.76s user 1.64s system 91% cpu 9.225 total

Holy crap, that’s 9 seconds of Rails startup, for 0.00078 seconds worth of RSpec. And this class/test doesn’t even use Rails! We can do better.

The culprit? That require "spec_helper" at the top of every spec file which loads the whole of Rails:

# This file is copied to spec/ when you run 'rails generate rspec:install'
ENV["RAILS_ENV"] ||= 'test'
require File.expand_path("../../config/environment", __FILE__)
require 'rspec/rails'
require 'rspec/autorun'
# etc ...

There are a few ways to deal with this, each with their own pitfalls. After trying many approaches, I’ve settled on a tiered RSpec initializer (spec_helper.rb and friends) which I can choose when invoking RSpec.

All the spec files still require "spec_helper", but it looks more like this:
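A minimal sketch of such a dispatching spec_helper.rb is below. The file naming (spec_helper_unit.rb, spec_helper_model.rb, spec_helper_full.rb) and the "full" default follow the names used in this post; the warning for a missing tier is an addition for illustration.

```ruby
# spec/spec_helper.rb
# Pick an initializer tier from the SPEC environment variable.
# With no SPEC set, fall back to the full Rails-loading helper.
tier   = ENV.fetch("SPEC", "full")
helper = File.expand_path("spec_helper_#{tier}.rb", __dir__)

if File.exist?(helper)
  require helper
else
  warn "No spec helper for SPEC=#{tier} (expected #{helper})"
end
```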

Which means we can select different initializers using the SPEC environment variable. The following spec_helper_unit.rb is perfect for the method_hunting_delegator_spec which took 9 seconds earlier, because there are no dependencies on Rails.

The result?

paul@paulbookpro ~/project ⸩ time SPEC=unit rspec spec/lib/method_hunting_delegator_spec.rb
..
Finished in 0.00079 seconds
2 examples, 0 failures
SPEC=unit rspec spec/lib/method_hunting_delegator_spec.rb  0.81s user 0.08s system 99% cpu 0.890 total

Under a second (0.890) is much more like it, and we still get class autoloading provided by ActiveSupport. I use this mode for just about everything except subclasses of Rails components, and those I keep as slim as possible. Moving logic into SOLID classes is something you’ll benefit from anyway, and these faster tests provide extra incentive. This example was a spec for a standalone class living in RAILS_ROOT/lib/ but I use it for all sorts of classes under app/models/, app/presenters/, app/forms/ etc.

But this zero-Rails initializer doesn’t help with testing your ORM-subclasses (we’ll begrudgingly call them “models”) which depend on ActiveRecord:

paul@paulbookpro ~/project ⸩ SPEC=unit rspec spec/models/book_spec.rb
/Users/paul/project/app/models/book.rb:4:in `<top (required)>': uninitialized constant ActiveRecord (NameError)

And having tasted sub-second tests, 12 seconds is clearly unacceptable:

paul@paulbookpro ~/project ⸩ time rspec spec/models/book_spec.rb
.................
Finished in 0.67016 seconds
17 examples, 0 failures
rspec spec/models/book_spec.rb  8.08s user 1.85s system 78% cpu 12.698 total

But if you write your classes carefully, they don’t need to depend on much from Rails except ActiveRecord. So let’s write a spec_helper which loads & configures ActiveRecord, plus a few other bits and pieces useful for testing database-persisted models.

You’ll have to excuse the Devise hackery; it was the one component tightly coupled into a model (User), and like in most Rails apps, that particular model is at the center of the whole relationship graph. Perhaps there’s a better solution, but this got me fast model tests for all but the user_spec itself.

Let’s run that model spec again, this time boosted by SPEC=model:

paul@paulbookpro ~/project ⸩ time SPEC=model rspec spec/models/book_spec.rb
.................
Finished in 0.58512 seconds
17 examples, 0 failures
SPEC=model rspec spec/models/book_spec.rb  6.50s user 0.21s system 98% cpu 6.844 total

Note that your model classes can still depend on external gems, but they’ll need to e.g. require "money" at the top. I suspect this explicit declaration of dependency isn’t a bad idea anyway.

Of course, there are always going to be specs which depend on the whole stack, such as acceptance tests. For those, here’s the default spec_helper_full.rb; basically like the original spec_helper.rb:

God speed.

MethodHuntingDelegator

I was implementing search in a Ruby app where the results objects were instances of a mix of model classes. Each one had what could be considered a title and a description, but the method names were inconsistent.

Wrapping each result in a SearchResult decorator to normalize the interface seemed like a good idea. Ruby provides an abstract Delegator and a concrete SimpleDelegator which gets most of the way there.

To normalize the method interface, I wrote an extension of SimpleDelegator with a #hunt_and_call(*candidates) method, which finds and calls the first method of the candidate list which the delegate responds to.

Here’s the example calling code:

And the MethodHuntingDelegator implementation:
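The idea can be sketched in a few lines of Ruby on top of the standard library's SimpleDelegator. This is a reconstruction from the description above, not the original code: the candidate method names, the Book and Article structs, and raising NoMethodError when nothing matches are all assumptions.

```ruby
require "delegate"

class MethodHuntingDelegator < SimpleDelegator
  # Calls the first candidate method the wrapped object responds to.
  def hunt_and_call(*candidates)
    name = candidates.find { |candidate| __getobj__.respond_to?(candidate) }
    unless name
      raise NoMethodError, "none of #{candidates.inspect} found on #{__getobj__.class}"
    end
    __getobj__.public_send(name)
  end
end

# Decorator that gives every search result the same interface.
class SearchResult < MethodHuntingDelegator
  def title
    hunt_and_call(:title, :name, :headline)
  end

  def description
    hunt_and_call(:description, :summary, :body)
  end
end

# Hypothetical model classes with inconsistent method names.
Book    = Struct.new(:name, :summary)
Article = Struct.new(:headline, :body)

results = [
  Book.new("Dune", "Sand, mostly."),
  Article.new("Launchd Is Cool", "It can do some cool things."),
].map { |record| SearchResult.new(record) }

results.each { |result| puts "#{result.title}: #{result.description}" }
```

Because it is still a SimpleDelegator, any other method on the wrapped record remains callable through the decorator.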

And of course:

Completely insane? Useful enough for a gem? Better way of going about it? Give me hell in the gist comments.