This Saturday, I attended the LL2 conference at MIT. LL2 is dedicated to "lightweight" programming languages, a deliberately loose category including (1) any pleasant, easy-to-use scripting language and (2) any academic language which makes it easier to prototype and write software quickly. LL2 is a small, informal workshop with audience participation. The attendees are a diverse bunch, and enjoy goring each other's sacred cows. You have been warned.

A short summary of each talk follows. If I've misunderstood the point of your argument--or misspelled your name--please let me know.

Concurrency-Oriented Programming in Erlang

Joe Armstrong presented Erlang, a "concurrency-oriented programming language" used by Ericsson and Nortel to create some highly successful (and incredibly reliable) telecom equipment. Armstrong defined concurrency-oriented programming to include the following tenets:

  • Processes are totally independent, and share no data.
  • All communication between processes involves sending and receiving messages.
  • If you know the name of a process, you can send it a message.
  • You can't guess the name of a process; you need to be told.
  • Message passing is unreliable because networks are unreliable.
  • You can monitor the status of another process to see if it dies. This is the distributed equivalent of exception-handling.
  • Processes are your fundamental unit of abstraction--an Erlang programmer uses processes to model entities in the world in much the same way an object-oriented programmer uses objects.

Erlang's runtime supports around 30,000 threads with message-passing times of a few microseconds. Erlang programs are often distributed across many machines in a wide area network. The most reliable Erlang-based systems process millions of telephone calls a day with 99.9999999% uptime (that's right--nine 9's reliability, or about thirty milliseconds a year to perform software upgrades).

Erlang is basically a dynamically-typed functional programming language with single-assignment and a number of ML features. Erlang also appears to have some mechanism for transmitting code between machines. However, it looks as if the concurrency features of Erlang could be incorporated into more traditional languages without much work--certainly with less work than is required to support multithreading with shared data.
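
To make the model concrete, here's a rough sketch (mine, not Armstrong's) of the share-nothing, message-passing style in Python, using OS processes and queues; the "counter" process and its messages are invented for illustration:

from multiprocessing import Process, Queue

def counter(inbox, outbox):
    # All state is local to this process; nothing is shared.
    count = 0
    while True:
        message = inbox.get()       # receive a message
        if message == "increment":
            count += 1
        elif message == "report":
            outbox.put(count)       # reply with a message
        elif message == "stop":
            break

if __name__ == "__main__":
    inbox, outbox = Queue(), Queue()
    Process(target=counter, args=(inbox, outbox)).start()
    for _ in range(3):
        inbox.put("increment")
    inbox.put("report")
    print(outbox.get())             # prints 3
    inbox.put("stop")

It won't scale to 30,000 processes, but the shape--isolated state, explicit messages--is the same.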

Lightweight Languages as Lightweight Operating Systems

Matthew Flatt of the PLT project argued that operating systems and programming languages serve similar purposes (they're both interfaces to the machine), but with an important difference: Operating system designers have typically focused on isolation, whereas language designers have often been more interested in co-operation. He argued that programming languages should provide better support for isolation, but in a flexible way.

Matthew described the techniques used by the DrScheme IDE to isolate student-written code from the IDE itself. His recipe for flexible isolation included the following ingredients:

  • A "safe" programming language (i.e., one which can't scribble on random memory).
  • Control of the APIs available to untrusted code (see the sketch after this list).
  • Multiple threads with thread-local variables.
  • Multiple "event spaces", each of which receives a separate stream of events from the underlying OS.
  • A hierarchy of "custodians". Custodians know how to kill threads, clean up OS resources, and ask nested custodians to do the same. Custodians are generally used to shut down runaway user code.
  • A hierarchy of "inspectors", providing precise control of which code can introspect which data structures.
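
Here's a rough Python analogue (again mine, not PLT's) of the second ingredient, handing untrusted code a hand-picked API instead of the full set of builtins. It's only a sketch of the idea--exec is famously leaky, and real isolation needs all the other ingredients too:

SAFE_API = {"len": len, "range": range, "sum": sum}

def run_untrusted(source):
    # Hide the standard builtins and expose only a chosen API.
    namespace = {"__builtins__": {}}
    namespace.update(SAFE_API)
    exec(source, namespace)
    return namespace

print(run_untrusted("total = sum(range(10))")["total"])   # 45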

As always, Matthew's demos were impressive and flawless. His presentation software is written in Scheme, and actually demonstrates all the features he describes.

(I use the PLT Scheme tools at work, and will probably find ways to use some of these isolation features.)

Supporting Persistent Objects in Python

Jeremy Hylton spoke about the Zope Object Database (ZODB), and how it achieved (semi-)transparent persistence for Python objects. This material is well-covered elsewhere on the web, but I'll mention a few highlights:

  • ZODB is used by the Zope application server, a Python system for building elaborate web applications.
  • ZODB is a persistent object database with full support for distributed applications and atomic transactions.
  • Python objects can either be serialized automatically, or they can inherit from the Persistent class and implement more elaborate storage strategies and conflict-resolution policies.
  • Objects are initially loaded as "ghost" objects. When the programmer tries to access the object, ZODB loads the rest of the object as a regular object, and all the objects it points to as "ghosts". When memory runs low, an object may be ghostified and garbage collected.

This strategy works well for Zope, because each HTTP request can be treated as a separate transaction. If an error occurs while rendering the web page, the transaction will fail, and all objects will automatically revert to their old states. If the transaction succeeds, the changes will be committed to the database, and will become visible to the other web servers. In other words, the "request-response" pattern of web software works well with transactional databases (more on this in a later talk).
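
To make the "ghost" idea concrete, here's a toy sketch of lazy loading via __getattr__ (the hook ZODB overloads); the fake store and object id are invented for illustration, and this is emphatically not ZODB's actual code:

FAKE_STORE = {"obj-1": {"title": "Hello", "count": 42}}   # stand-in for the database

class Ghost:
    def __init__(self, oid):
        self._oid = oid
        self._loaded = False

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, i.e. before
        # the object's real state has been loaded from storage.
        if not self.__dict__.get("_loaded"):
            self.__dict__.update(FAKE_STORE[self.__dict__["_oid"]])
            self.__dict__["_loaded"] = True
            return getattr(self, name)
        raise AttributeError(name)

doc = Ghost("obj-1")     # cheap: nothing loaded yet
print(doc.title)         # first access loads the state: prints "Hello"

The real thing also tracks writes, resolves conflicts, and ghostifies objects under memory pressure, but the lookup hook is the heart of the trick.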

Safe, Asynchronous Exceptions for Python

Stephen Freund and Mark Mitchell described a problem with Python's asynchronous exception-handling, and presented an elegant solution. An "asynchronous exception" is typically generated when another process signals a Python program, or when a user hits Control-C. As the name implies, an asynchronous exception can occur almost anywhere, which makes it very hard to write robust code.

For example, this code from their slides is safe in the absence of asynchronous exceptions:

f = open("file.txt")
try:
  data = f.read()
finally:
  f.close()

But what if the user hits Control-C at an inconvenient time?

f = open("file.txt")
# Not a good place for Control-C.
try:
  # We'd like to support Control-C here.
  data = f.read()
finally:
  # Another awkward place for Control-C.
  f.close()

Freund and Mitchell experimented with various ways of temporarily blocking signals and other asynchronous events, and concluded that new syntax would help considerably:

block:
  f = open("file.txt")
  try:
    unblock:
      data = f.read()
  finally:
    f.close()

They proposed several alternative approaches, including:

initially:
  # Signals blocked here...
  f = open("file.txt")
try:
  data = f.read()
finally:
  # ...and here.
  f.close()
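
For comparison, here's a sketch (mine, not the authors') of how far you can get in plain Python by temporarily deferring SIGINT with an ordinary signal handler; as the comments note, it still leaves holes, which is exactly their point:

import signal

class block:
    """Defer SIGINT (Control-C) until the protected region exits."""
    def __enter__(self):
        self.pending = False
        # Only legal in the main thread, which is where Python
        # delivers signals anyway.
        self.previous = signal.signal(signal.SIGINT, self._remember)
        return self

    def _remember(self, signum, frame):
        self.pending = True            # note it, but don't interrupt

    def __exit__(self, exc_type, exc, tb):
        signal.signal(signal.SIGINT, self.previous)
        if self.pending:
            raise KeyboardInterrupt    # deliver the deferred interrupt
        return False

# Control-C during open() or close() is deferred, but if the deferred
# interrupt fires right after the first "with", the file still leaks--
# which is why new syntax (or macros) beats ad-hoc workarounds.
with block():
    f = open("file.txt")
try:
    data = f.read()                    # Control-C is welcome here
finally:
    with block():
        f.close()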

At this point, several audience members stood up and pointed out that LISP-style macros make it much easier to build new features such as block and unblock without having to actually hack your Python interpreter. (Audience participation at the LL workshops tends to be intense, with lots of passionate discussion and the occasional flame war. This is a Good Thing<tm>.)

Interlude 1: Programmatic Macros

C programmers are familiar with preprocessor macros, and many modern C projects use them to good effect. Preprocessor macros are typically used to encapsulate hairy definitions or tricky control flow, neither of which can be isolated inside a function. Here's an example using the wxWindows toolkit, where a series of macros builds an event-handling table for a class named Stage:

// Stage inherits from wxWindow.
BEGIN_EVENT_TABLE(Stage, wxWindow)
    // Map each event to the member function which handles it.
    EVT_IDLE(Stage::OnIdle)
    EVT_MOTION(Stage::OnMouseMove)
    EVT_ERASE_BACKGROUND(Stage::OnEraseBackground)
    EVT_PAINT(Stage::OnPaint)
    EVT_LEFT_DOWN(Stage::OnLeftDown)

    // Install a handler for a child widget with ID
    // FIVEL_TEXT_ENTRY, so we don't have to subclass it.
    EVT_TEXT_ENTER(FIVEL_TEXT_ENTRY, Stage::OnTextEnter)
END_EVENT_TABLE()

This code sets up a fairly hairy event-dispatching table and registers it appropriately, but I don't need to know the details--wxWindows uses BEGIN_EVENT_TABLE to provide a specialized language for declaring event handlers. In essence, macros allow library developers to temporarily turn a general-purpose language into a domain-specific language.

Once you start looking for macros, you see them everywhere. Lex and Yacc are basically giant, programmatic macros that transform lexer and parser descriptions into C code. Autoconf is a mess of M4 macros that translate a configuration language into highly-portable sh scripts. Boost::Python uses C++ template hacks to transform a declaration language into Python bindings. ZODB transforms Python member-variable lookup into database queries by overloading __getattr__ and friends. JSP transforms HTML plus embedded code snippets into Java servlets. It seems that systems hackers have a ubiquitous and unquenchable desire to write programs which write programs.

Unfortunately, many of these invaluable tools are massive hacks. They involve running an external translator over source files, which gives the programmer two separate parsers with subtly different semantics, and all the problems that entails. If we really believed that programmers should write programs to write programs, then we could invent an extensible compiler. This compiler could pass source code to special subroutines, and those subroutines could transform the code as they pleased before handing it back to the compiler. This is the idea behind LISP macros.

Here's an example of block in Scheme:

(define-syntax block
  (syntax-rules ()
    [;; The pattern to transform.
     (block body ...) 
     ;; The ugly code to transform it into.
     ;; (dynamic-wind is basically try/finally,
     ;; but with an "initially" clause, and
     ;; lambda creates an anonymous function).
     (dynamic-wind
       (lambda () (block-interrupts))
       (lambda () body ...)
       (lambda () (unblock-interrupts)))]))

(block
  (do-something)
  (do-something-else))

The arguments for macros are simple:

  1. Our group's code would be much cleaner with a few good domain-specific languages.
  2. We can learn to use power without abusing it.

The arguments against macros are equally simple:

  1. My co-workers are morons, and should never be allowed to add new control structures to a programming language under any circumstances.
  2. Macros are surprisingly hard to implement well in languages using traditional infix syntax.

I have a fair bit of sympathy for both sides.

Disruptive Programming Language Technologies

Todd Proebsting from Microsoft Research gave a talk on "disruptive programming language technologies", inspired by the book Innovator's Dilemma. He argued that most compiler implementors are obsessed with making their compilers generate 10% faster code (which Moore's law will do in a few months), and relatively few are interested in making their languages more useful in other areas. Following the book, he defined a disruptive language as any language which:

  • had lower performance than the existing alternatives, but
  • offered significantly more useful features in a low-profile niche market.

He suggested a variety of niche markets:

  • Compilers which log all function calls and results into a buffer (expiring data over time), with the goal of making users' core files far more useful to developers. (A toy sketch follows this list.)
  • Languages with built-in support for transactions, which would make Undo much easier to implement. (Actually, ZODB and Zope are a great example of this.)
  • Languages with excellent support for concurrency, such as Erlang.
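
Here's a toy sketch of the first idea--a decorator (invented for illustration, and far cruder than anything Proebsting proposed) that records recent calls and results in a bounded buffer, so a crash report has some history to show:

import functools
from collections import deque

CALL_LOG = deque(maxlen=1000)   # old entries expire automatically

def logged(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        result = fn(*args, **kwargs)
        CALL_LOG.append((fn.__name__, args, kwargs, result))
        return result
    return wrapper

@logged
def add(x, y):
    return x + y

add(2, 3)
print(list(CALL_LOG))           # [('add', (2, 3), {}, 5)]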

By selling a language to a poorly-served niche market, an organization could gain market share (and money), which could later be used to enter larger markets.

Interlude 2: Python and Perl are Based on Real Research

Todd Proebsting took a couple of potshots at Perl and Python, suggesting they were based on decades-old research. Some members of the Scheme community made similar remarks at LL1 (but mostly refrained from doing so at LL2). In general, this is a pretty typical attitude among academic language designers (at least those who haven't worked extensively with Perl and Python). In fact, both languages are related to some serious research.

Perl was designed by Larry Wall, who is a linguist (and missionary!) by training. Wall supports tagmemics, a linguistic theory which claims that human languages are irregular and non-orthogonal because the human brain is good at extracting plausible meanings from grammatically odd phrases, and doesn't need a rigidly orthogonal language. Perl is similar to a human language in this respect--very few people understand the grammar, but millions of people can program in it, because a significant portion of the language consists of special cases for the things people frequently want to say.

Python was designed by Guido van Rossum, whose research goal is to make programming languages generally accessible (much the same goal as the Teach Scheme project, actually). He's done an excellent job--I can teach basic Python to non-programmers in hours, and they can be writing useful hacks almost immediately. (Guido's language isn't just for beginners, however: many talented LISP hackers have a soft spot for Python, despite the lack of closures and macros. May we all design languages with such a broad range of appeal.)

The Needle Programming Language

Neel Krishnaswami gave a short talk on his soon-to-be-released Needle language. Like Dylan, Needle is an object-oriented, infix language with roots in LISP. Unlike Dylan, Needle is a statically-typed programming language with parameterized types (a.k.a. templates).

Neel may have figured out some clever ways to make a Dylan-like language statically typed without littering it with type declarations. If you're into languages which are extremely efficient, but which are still well-suited to rapid prototyping, this may be work to watch. We'll see how well his type inferencer holds up once the LL2 attendees start banging on it.

(I have to like any talk which (1) mentions Dylan and (2) uses the Rock-Paper-Scissors example from my thesis to explain generic function dispatch.)

Leveraging Libraries in Lightweight Languages

Kenneth Anderson, Timothy Hickey, Geoffrey Knauth and Gary Kratkiewicz described a real-world application of multiple lightweight languages. They've been working on a logistics application for the DOD which has saved $100 million to date. Their application consists of many modules, written in C, Java, Perl, etc., and tied together with Scheme and some Scheme macros.

Some of their Scheme code looked decidedly odd to me--and their Scheme macros reminded me of the kind of C++ code people were writing in 1994--but their system is a smashing, real-world success. I drew the following conclusions from this talk: (1) lightweight languages are a huge win in general, (2) Scheme is a great glue language, (3) Scheme macros, even if applied in very funky ways, are an enormous win, and (4) we need more books on advanced LISP/Scheme style.

The Ruby Programming Language

Yukihiro Matsumoto gave an excellent introduction to the Ruby programming language. Ruby is something like a cross between Perl and Python: it has a clean, orthogonal syntax with special features for text processing. It has primitive iterators (not as powerful as Python's) and full continuations, but no macros.

Yukihiro argued that a lightweight language should require minimal brain power from programmers. He measured brain power in two ways: average brain power over time, and maximum brain power at any one time. He claimed that LISP did an especially good job of reducing average brain power--it allowed programmers to build very complicated systems with surprisingly little total effort--but that it occasionally required the brief expenditure of great intelligence.

One of the PLT Scheme researchers asked why Yukihiro included continuations (one of the highest-brain-power features you can put in a language), but not macros, which are easier to understand and much more useful. Yukihiro said that the people who'd make an awful mess with macros wouldn't even dare to touch continuations, provoking great laughter from the PLT folks.

Yukihiro is a sharp and funny guy--he can provoke great laughter from a tough audience while arguing (in a foreign language) against the locally-prevailing conventional wisdom.

IBM Lightweight Services

Christopher Vincent presented a server-side JavaScript environment used to glue together various transactional legacy systems and a variety of web services. Again, he found that it's quite feasible to add semi-transparent persistence and transactions to event-based applications.

Why Extension Programmers Should Stop Worrying About Parsing and Start Thinking About Type Systems

David Beazley is the primary author of SWIG, a tool which reads C and C++ headers and generates extension modules for Perl, Python and other scripting languages. His talk included both a presentation of his work so far, and a plea for help from the language design community.

Beazley said that the authors of extension-module-generating tools were typically application developers (not language designers or compiler hackers), and that they tended to focus too much on the actual parsing of headers. He felt this focus was dangerously misleading and would suck up unimaginable amounts of developer time.

His recent work on SWIG, however, has de-emphasized parsing and focused on translating between the type systems of C++ and common scripting languages. For example, there are now tools which map C++ smart pointers onto regular Python objects, and which allow C++ classes to be subclassed by scripting languages. Beazley feels that this is an extremely beneficial direction to pursue.

Many developers would love to migrate from C and C++ to a high-level language, but to make the transition, they'll need excellent tools for accessing their old code from their new language (throwing out working code is often an expensive mistake). Mapping between different type systems is an intrinsically interesting--if extremely ugly--problem, and lots of application developers are interested in seeing it solved.

If you have any good ideas (or useful war stories), please get in touch with Beazley. And if you want people to adopt your sexy new language, start thinking about how to interface with legacy code.

The Laszlo Application Description Language, LZX

Oliver Steele gave a short talk on LZX, a soon-to-be-released tool for generating interactive web content. LZX is an XML-and-JavaScript-based language for building three-tier applications. The LZX environment actually generates SWF files for use with the Flash plugin, a welcome change from similar tools, which typically require a custom plugin. LZX is likely to be an enterprise development product with a price tag to match. (But it's also further incentive for someone to write an open-source implementation of the Flash plugin, which actually has a documented file format and a few nice features.)

Summary

LL2 has been another smashing success. Greg Sullivan's timer (with the air raid siren) kept speakers to their allotted times this year, and the Perl and Scheme communities didn't spend quite so much time taking potshots at each other. Audience participation continued to be robust, and the debates extremely vigorous--"we can have five more minutes of macro discussion after the next speaker"--despite this year's adequate seating and ventilation.

I had interesting talks with Dan Sugalski, the PLT crew, the author of Rotor (Microsoft's shared-source implementation of the CLR for FreeBSD) and one of the Mono hackers from Ximian. I didn't get a chance to talk with Joe Armstrong about Erlang (does it really transmit code between nodes, and how do you spawn 80,000 threads?) or with David Beazley about SWIG.

There's a movement afoot to resurrect the ACM's SIGPLAN in the Boston area for a quarterly lecture series, and I've promised to visit lots of people down in Boston.

I'm going to have to learn Erlang, enhance my project at work to use PLT Scheme custodians, download Needle, think about how tools like SWIG should work, and find time to do some research into macros for infix languages.

Many thanks to Greg for chairing such an excellent workshop.