A Beginner's Guide to Practical Syntactic Magic: the tale of Hpricot's pseudo-constructor

12 June, 2007

I spent much of today working with Hpricot. And so, as when spending significant solo time with any of why the lucky stiff's code, I found myself admiring all the neat little syntactic knickknacks strewn about to cozy up the place.

One of the best is the way you get started. Hpricot is a toolkit for parsing and manipulating XHTML. So, obviously enough, just about every time you invoke it, you're going to want to pass it an XHTML document so it can, you know, prep it for parsing and manipulation. And how do you do that? What's the syntax?

Hpricot(my_document)

That's it. There's no "Hpricot::Base.new(my_document).parse" nonsense, or any of the other more or less torturous alternatives you commonly see. Not a single character of syntax is wasted.

But, if you're a mere Ruby mortal, like me, you're probably looking at that code and going: 'Huh?' Isn't Hpricot a constant? It's capitalized. But it's taking an argument like a method. How is that even valid Ruby? How can the parser tell if it's a constant or a method?

Well, it turns out that there's no rule against capitalized method names; the parser can tell it's a method call because the name is followed by parentheses or an argument. That's all it takes for it to be sent off to method dispatch instead of constant lookup (as Chris pointed out, this is one advantage of Ruby not being "turtles all the way down"; Smalltalk couldn't do this).
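To make the rule concrete, here's a minimal sketch using a hypothetical name, Greeting, bound both as a constant and as a method. Ruby keeps constants and methods in separate namespaces, so the two definitions don't clobber each other; parentheses, or an argument on its own, are what tip the parser toward a method call. Ruby's core library leans on the same rule for its conversion functions such as Integer() and Float(), which are ordinary capitalized methods.

```ruby
# Hypothetical example: one capitalized name bound twice, once as a
# constant and once as a method. Constants and methods live in separate
# namespaces, so neither definition overwrites the other.
Greeting = "I am the constant"

def Greeting(name)
  "Hello, #{name}, from the method"
end

bare     = Greeting             # no parens, no argument: constant lookup
called   = Greeting("reader")   # parentheses: method call
no_paren = Greeting "reader"    # an argument alone also means method call

puts bare      # => I am the constant
puts called    # => Hello, reader, from the method
puts no_paren  # => Hello, reader, from the method

# Ruby's own Kernel uses the same trick for its conversion functions.
puts Integer("42") + 1   # => 43
puts Float("2.5") * 2    # => 5.0
p Array(nil)             # => []
```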

Beyond providing fodder for a Language Nerd Attack, though, what's the upshot? How does this fact help the man on the street? Well: there's nothing actually sophisticated going on here. So: you can do it too.

Here's an admittedly contrived (and useless) example:

class Dogger
  def initialize
    puts "dog"
  end
end

def Dogger()
  Dogger.new
end

That's a simple class definition followed by a simple method that instantiates it.

Which leaves us with the ability to write two snippets of code that, while they may look nearly the same, do very different things:

>> Dogger
=> Dogger
>> Dogger()
dog
=> #<Dogger:0x15d2478>

And that is exactly where _why's use of this little quirk gets its leverage. This trick makes you feel like you're invoking a constructor or calling some other kind of class method when you are, in fact, doing nothing of the sort. Just as our Dogger() method above needn't have done anything remotely related to the Dogger class, _why could have named his method Clown() or ChunkyBacon() while still calling Hpricot.parse(input, opts) inside it (which is exactly what Hpricot() does).
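Here's a minimal sketch of that shape: a capitalized method that shares a module's name and simply forwards to the module's real entry point, the way Hpricot() forwards to Hpricot.parse(input, opts). TinyDoc, its Doc struct, and the :skip_blank option are all hypothetical, and the "parsing" is a trivial placeholder that has nothing to do with Hpricot's internals.

```ruby
# Hypothetical module standing in for a library like Hpricot. The real
# entry point is an ordinary module method, TinyDoc.parse.
module TinyDoc
  Doc = Struct.new(:lines, :opts)

  def self.parse(input, opts = {})
    lines = input.lines.map(&:chomp)             # placeholder "parsing"
    lines = lines.reject(&:empty?) if opts[:skip_blank]
    Doc.new(lines, opts)
  end
end

# The pseudo-constructor: a top-level method that merely borrows the
# module's name and forwards everything to TinyDoc.parse.
def TinyDoc(input, opts = {})
  TinyDoc.parse(input, opts)
end

doc = TinyDoc("first\n\nsecond\n", skip_blank: true)
p doc.lines   # => ["first", "second"]
p TinyDoc     # => TinyDoc (the bare name is still the module)
```

The forwarding method adds no behavior of its own; all it buys is the friendlier call site.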

But his chosen usage is particularly inspired. In one fell swoop, he gives his whole complex, full-featured library a single welcoming point of entry. You need never concern yourself with the internal machinery; just heave a document over the transom and let the library figure out what to do with it. And this is the wider lesson of _why: real power comes from combining the playfulness (better: the insouciance) needed to probe, question, and even bend the limits of the language with the discipline and aesthetic sense required to use what you find not to obfuscate and confuse, but to write elegant and, above all, more humane code.
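One way a single entry point can "figure out what to do" with whatever you throw at it is plain duck typing. The sketch below assumes that approach (I haven't traced exactly how Hpricot normalizes its input): the hypothetical Welcome() accepts either a String or anything that responds to #read, such as a File or StringIO, and the caller never has to pick a code path.

```ruby
require "stringio"

# Hypothetical library with one welcoming entry point. It accepts a
# String, or any IO-like object that responds to #read (File, StringIO,
# socket...), and normalizes both to text before "parsing".
module Welcome
  def self.parse(input)
    text = input.respond_to?(:read) ? input.read : input.to_s
    text.split   # placeholder "parsing": just split into words
  end
end

def Welcome(input)
  Welcome.parse(input)
end

p Welcome("hello there world")               # => ["hello", "there", "world"]
p Welcome(StringIO.new("hello from an IO"))  # => ["hello", "from", "an", "IO"]
```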

I mean, Hpricot would definitely not be a better library if that method were called ChunkyBacon(). Right?
