-
Jul 19, 2015
Proving sorted lists correct using the Coq proof assistant
About 15 years ago, I was hanging out at the MIT AI Lab, and there was an ongoing seminar on the Coq proof assistant. The idea was that you wouldn't have to guess whether your programs were correct; you could prove that they worked correctly.
There were just two little problems:
- It looked ridiculously intimidating.
- Rumor said that it took a grad student all summer to implement and prove the greatest common divisor algorithm, which sounded rather impractical.
So I decided to stick to Lispy languages, which is what I was officially supposed to be hacking on, anyway, and I never did try to sit in on the seminar.
Taking another look
I should have taken a look much sooner. This stuff provides even more twisted fun than Haskell! Also, projects like the CompCert C compiler are impressive: Imagine a C compiler where every optimization has been proven correct.
Even better, we can write code in Coq, prove it correct, then export it to Haskell or several other functional languages.
Here's an example Coq proof. Let's start with a basic theorem that says "If we know `A` is true, and we know `B` is true, then we know `A /\ B` (both `A` and `B`) is true."

```coq
Theorem basic_conj : forall (A B : Prop), A -> B -> A /\ B.
Proof.
  (* Give names to our inputs. *)
  intros A B H_A_True H_B_True.
  (* Specify that we want to prove each half of /\ separately. *)
  split.
  - apply H_A_True. (* Prove the left half. *)
  - apply H_B_True. (* Prove the right half. *)
Qed.
```

But Coq proofs are intended to be read interactively, using a tool like CoqIDE or Emacs Proof General. Let me walk you through how this proof would really look.
```coq
Proof.
```

At this point, the right-hand pane will show the theorem that we're trying to prove:

```
1 subgoals, subgoal 1 (ID 1)

  ============================
   forall A B : Prop, A -> B -> A /\ B
```
-
May 17, 2015
Unscientific column store benchmarking in Rust
I've been fooling around with some natural language data from OPUS, the “open parallel corpus.” This contains many gigabytes of movie subtitles, UN documents and other text, much of it tagged by part-of-speech and aligned across multiple languages. In total, there's over 50 GB of data, compressed.
“50 GB, compressed” is an awkward quantity of data:
- It's large enough so that Pandas can't suck it all into memory.
- It's large enough that PostgreSQL stops being fun, and starts feeling like work. (Although cstore_fdw might help.)
- It's too small to justify cloud-based tools like Hadoop. As the saying goes, “If it fits on your laptop’s SSD, it’s not big data.” I have USB sticks large enough to hold 50 GB!
Let's look at various ways to tackle this.
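As a taste of the column-oriented idea, here is a minimal sketch (my code, not from any of the tools discussed): storing one field contiguously lets an aggregation scan touch only the bytes it needs, instead of striding over whole records.

```rust
// Row-oriented layout: each record carries all of its fields.
struct Row {
    word: String,
    count: u32,
}

// Column-oriented layout: each field lives in its own contiguous vector.
struct Columns {
    words: Vec<String>,
    counts: Vec<u32>,
}

// Summing counts row-wise strides over every `Row`, including the strings.
fn total_rowwise(rows: &[Row]) -> u64 {
    rows.iter().map(|r| r.count as u64).sum()
}

// Summing counts column-wise scans one dense `Vec<u32>`.
fn total_columnwise(cols: &Columns) -> u64 {
    cols.counts.iter().map(|&c| c as u64).sum()
}

fn main() {
    let rows = vec![
        Row { word: "hello".into(), count: 3 },
        Row { word: "world".into(), count: 4 },
    ];
    let cols = Columns {
        words: vec!["hello".into(), "world".into()],
        counts: vec![3, 4],
    };
    assert_eq!(total_rowwise(&rows), 7);
    assert_eq!(total_columnwise(&cols), 7);
}
```

Both functions compute the same answer; the difference is purely in memory layout and cache behavior, which is where column stores win on scans like this.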
-
Apr 05, 2015
Migrating from Heroku (and Linode) to Docker on AWS
I've long been a huge fan of Heroku. They've made it super easy to deploy and scale web applications without getting bogged down in server administration. Also, their free tier has been very generous, which made Heroku a perfect place to run weekend projects. (And my clients have happily paid plenty of money to Heroku over the years, so nobody's been losing out.)
Heroku's costs and limitations
Lately, the costs of using Heroku for weekend projects have been creeping upwards:
-
Sep 19, 2014
Rust lifetimes: Getting away with things that would be reckless in C++
Over the years, I've learned to be cautious with C++ pointers. In particular, I'm always very careful about who owns a given pointer, and who's in charge of calling `delete` on it. But my caution often forces me to write deliberately inefficient functions. For example:

```cpp
vector<string> tokenize_string(const string &text);
```

Here, we have a large string `text`, and we want to split it into a vector of tokens. This function is nice and safe, but it allocates one `string` for every token in the input. Now, if we were feeling reckless, we could avoid these allocations by returning a vector of pointers into `text`:

```cpp
vector<pair<const char *, const char *>> tokenize_string2(const string &text);
```

In this version, each token is represented by two pointers into `text`: one pointing to the first character, and one pointing just beyond the last character.1 But this can go horribly wrong:

```cpp
// Disaster strikes!
auto v = tokenize_string2(get_input_string());
munge(v);
```

Why does this fail? The function `get_input_string` returns a temporary string, and `tokenize_string2` builds an array of pointers into that string. Unfortunately, the temporary string only lives until the end of the current expression, and then the underlying memory is released. And so all our pointers in `v` now point into oblivion—and our program just wound up getting featured in a CERT advisory. So personally, I'm going to prefer the inefficient `tokenize_string` function almost every time.

Rust lifetimes to the rescue!
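Before getting to the post's own `Token` type, here is a minimal sketch in post-1.0 Rust (my addition, not the original code) of the same zero-copy tokenizer. The returned slices borrow from `text`, and the compiler ties their lifetime to it, so the C++ disaster above simply cannot compile.

```rust
// Zero-copy tokenizing: each token is a &str slice borrowing from `text`,
// not a freshly allocated String.
fn tokenize(text: &str) -> Vec<&str> {
    text.split_whitespace().collect()
}

fn main() {
    let text = String::from("hello brave new world");
    let tokens = tokenize(&text);
    assert_eq!(tokens, ["hello", "brave", "new", "world"]);
    // drop(text); // Uncommenting this is a compile error: `tokens` still
    // borrows from `text`, so we can't free the string out from under it.
    println!("{:?}", tokens);
}
```

The equivalent of `tokenize_string2(get_input_string())` is rejected at compile time, because the temporary string would not live as long as the borrowed tokens.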
Going back to our original design, let's declare a type `Token`. Each token is either a `Word` or an `Other`, and each token contains pointers into a pre-existing string. In Rust, we can declare this as follows:

```rust
#[deriving(Show, PartialEq)]
enum Token<'a> {
    Word(&'a str),
    Other(&'a str)
}
```
-
Sep 17, 2014
Deploying Rust applications to Heroku, with example code for Iron
Now with support for Iron, Cargo and `Cargo.lock`!

You can deploy an example Rust application to Heroku using this button:

If you'd prefer to use the command line, you'll need `git` and the Heroku toolbelt. Once these are installed, run:

```sh
git clone https://github.com/emk/heroku-rust-cargo-hello.git
cd heroku-rust-cargo-hello
heroku create --buildpack https://github.com/emk/heroku-buildpack-rust.git
git push heroku master
```

This will download the example application, create a new Heroku project, and deploy the code to Heroku. That's it!
How it works
Our server is based on the Iron middleware library. We parse URL parameters and dispatch HTTP requests to the appropriate handler routine using Iron's router module:
```rust
fn main() {
    let mut router = Router::new();
    router.get("/", hello);
    router.get("/:name", hello_name);
    Iron::new(router).listen(Ipv4Addr(0, 0, 0, 0), get_server_port());
}
```

The `hello` function returns an HTTP status code and the content to send to the user:

```rust
fn hello(_: &mut Request) -> IronResult<Response> {
    Ok(Response::with(status::Ok, "Hello world!"))
}
```

The `hello_name` function is similar, but we look up the value of the `:name` parameter that we declared to the router, above:

```rust
fn hello_name(req: &mut Request) -> IronResult<Response> {
    let params = req.extensions.find::<Router,Params>().unwrap();
    let name = params.find("name").unwrap();
    Ok(Response::with(status::Ok, format!("Hello, {}!", name)))
}
```

The final piece needed to run on Heroku is a function to look up our port number:

```rust
fn get_server_port() -> Port {
    getenv("PORT")
        .and_then(|s| from_str::<Port>(s.as_slice()))
        .unwrap_or(8080)
}
```

The full source code is available on GitHub. To learn more about Rust, see the excellent Rust guide.
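The `getenv`/`from_str` API shown here predates Rust 1.0. For reference, here is a sketch of the same `Option` chaining in modern Rust (my adaptation, not the post's code), with the environment lookup factored out so the fallback logic is easy to test:

```rust
// Parse an optional PORT value, falling back to 8080 when it is unset
// or unparsable. Factored out of the env lookup for testability.
fn parse_port(val: Option<&str>) -> u16 {
    val.and_then(|s| s.parse::<u16>().ok()).unwrap_or(8080)
}

fn get_server_port() -> u16 {
    parse_port(std::env::var("PORT").ok().as_deref())
}

fn main() {
    assert_eq!(parse_port(None), 8080);         // PORT unset
    assert_eq!(parse_port(Some("3000")), 3000); // PORT=3000
    assert_eq!(parse_port(Some("junk")), 8080); // unparsable falls back
    println!("port: {}", get_server_port());
}
```

The shape of the code is unchanged: `and_then` still plays the role of `bind` in Haskell's `Maybe` monad, threading the "might be absent" case through the chain.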
Keep reading for notes on building the program locally and on configuring your own programs to run on Heroku.
-
May 30, 2014
Deploying Rust applications to Heroku, with example code for Rustful
Update, 24 June 2014: I'm planning to add support for building projects with Cargo shortly. I'll announce it here and on Twitter when it's ready.
Update, 12 July 2014: Highly experimental Cargo support is now available.
Update, 17 September 2014: This post is obsolete. Please see Deploying Rust to Heroku, with example code for Iron.
If you have `git` and the Heroku toolbelt on your system, this should be easy:

```sh
git clone https://github.com/emk/heroku-rust-hello.git
cd heroku-rust-hello
heroku create --buildpack https://github.com/emk/heroku-buildpack-rust.git
git push heroku master
```

If you want to create your own Rust web server from scratch, keep reading. But fair warning: Rust is not yet ready for serious web development.
-
May 30, 2014
Installing Rust nightly builds on Ubuntu 10.04 Lucid
Rust is a systems programming language designed around speed and safety. It sits roughly halfway between Go and Haskell. In particular, it combines precise, safe control over memory with high-level functional programming. Haskell programmers, for example, will notice that Rust's `and_then` works much like `bind` in Haskell's `Maybe` monad:

```rust
use std::os::getenv;
use std::io::net::ip::Port;

/// Look up our server port number in PORT.
fn get_server_port() -> Port {
    getenv("PORT")
        .and_then(|s| from_str::<Port>(s.as_slice()))
        .unwrap_or(8080)
}
```

Anyway, I spent this morning trying to get Rust working on Ubuntu 10.04 Lucid, as part of a larger effort to deploy a Rust application on Heroku. (More on that soon.) On Ubuntu 10.04, `rustc` fails looking for `libstdc++.so.6`:
-
May 28, 2014
My personal tool choices for rich JavaScript applications (May 2014)
Today, it's possible to build rich, sophisticated applications in the browser. Everybody's familiar with GMail and Google Maps, of course, but have you seen stuff like Mozilla's PopcornMaker?
This just blows me away. And applications like this are appearing all over, and they're well within the reach of any tech startup.
Of course, the major problem with building sophisticated applications is that now you need to maintain them. And just hacking everything together with jQuery is probably going to make a mess.
I've built a few rich applications during the last twelve months, both for clients and for myself. Based on that experience, here's a list of tools that have worked well, and tools that look promising for the coming year. This list is shamelessly subjective, and probably already obsolete. Please feel free to contact me with better suggestions; I'll add a bunch of them to the article.
Personal choices: Table of contents
- Model View Controller: Ember.js (or Angular.js)
- Asynchronous callback handling: Promises/A+
- Module system: ECMAScript 6 modules and a transpiler
- Package management: Bower.js
- Unit testing: QUnit if I must, Jasmine or Mocha if I can
- DOM manipulation and compatibility: jQuery
- Syntax: CoffeeScript (OK, OK) or a couple of other alternatives
- The bad news: These recommendations will be obsolete quickly
The standard disclaimer applies: In 95% of all cases, it's better to stick with what you're already using, unless something else will offer you an order of magnitude better productivity.
Keep reading for code snippets, rationales and some promising alternatives.
-
May 22, 2014
Learning Middle Egyptian with Anki, slowly
Although I don't usually mention it here, one of my hobbies is learning languages. French is my strongest by far, but I've been experimenting with seeing just how slowly I can learn Middle Egyptian. Normally, I need to reach a certain minimum degree of obsession to actually make progress, but it turns out that software can help a bit, as I explain in this post on the Beeminder blog.
But when I decided to learn Egyptian, I was faced with a dilemma: I couldn't justify spending more than an hour per week on it. Hieroglyphs are cool, but come on—it's a dead language. Unfortunately, it's hard to learn a language in slow motion, because two things always go wrong:
- I get distracted, and I never actually put in that hour per week…
- I forget everything I learn between lessons…
Of course, one key tool here is Anki, which cleverly exploits the spacing effect of human memory. To oversimplify, if I'm forced to recall something shortly before I would have otherwise forgotten it, I'll remember it at least twice as long the next time. This allows remembering things for O(2^N) time for N effort, which is a nice trick.
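The arithmetic behind that claim can be sketched in a few lines (hypothetical numbers, not Anki's actual scheduling algorithm): if each successful review at least doubles the interval until the next one, then N reviews cover an interval on the order of 2^N days.

```rust
// Hypothetical doubling schedule: after `reviews` successful recalls,
// the interval until the next review is first_interval_days * 2^reviews.
fn interval_after(reviews: u32, first_interval_days: u64) -> u64 {
    first_interval_days * 2u64.pow(reviews)
}

fn main() {
    assert_eq!(interval_after(0, 1), 1);  // day one
    assert_eq!(interval_after(3, 1), 8);  // about a week after 3 reviews
    // Ten reviews already push the interval past two years:
    assert_eq!(interval_after(10, 1), 1024);
}
```

So a handful of well-timed reviews buys years of retention, which is exactly what makes an hour a week viable.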
Hierogloss
On a related note, I have a new toy up on GitHub: hierogloss, which extends Markdown with support for interlinear glosses rendered using JSesh:
```
H: z:A1*Z1 | 𓊃:𓏏*𓁐 | 𓂋:𓏤-𓊪𓅱
L: s       | s.t     | r-pw
G: man     | woman   | whichever
T: either [a] man or [a] woman
```
I love being able to toss together a tool like this on a whim. Programming's fun.
-
May 21, 2014
Scraping your Fitocracy score with capybara-webkit
Fitocracy is a great site for tracking exercise, one which manages to have both a very friendly culture and an impressively gung-ho attitude. But they've never gotten around to implementing any kind of official API. If you want to look up your Fitocracy score from inside a script, you need to jump through a surprising number of hoops.
What we need is a generic web scraping tool like mechanize, but with the ability to deal with a rich JavaScript UI. It turns out the easiest way to do this is to use a headless web browser.
First, let's create a Ruby `Gemfile`. We'll use `capybara-webkit`, which is normally used for testing Ruby websites:

```ruby
source "https://rubygems.org"

gem "capybara"
gem "capybara-webkit"

# Optional, if you want to debug using save_and_open_page.
#gem 'launchy'
```
- May 21, 2014 Site update in progress
- May 21, 2014 "Build Your Own Probability Monads" paper back online
- Jan 18, 2012 Best article I've seen on SOPA
- Jun 05, 2011 Screencast: Use Rails and RDF.rb to parse Best Buy product reviews
- Jun 03, 2011 Heroku "Celadon Cedar" review
- May 20, 2011 Derivatives of algebraic data structures: An excellent tutorial
- May 12, 2011 What do these fixed points have in common?
- Apr 25, 2011 AWS outage timeline & downtimes by recovery strategy
- Dec 20, 2010 The state of Ruby, RDF and Rails 3
- Oct 13, 2010 Feedhose demo: Real-time RSS using Node.js and Socket.io
- Dec 29, 2009 Visualizing WordNet relationships as graphs
- Dec 28, 2009 Experimenting with NLTK
- Dec 28, 2009 Interesting Python libraries for natural language processing
- Nov 21, 2009 Wave Hackathon
- Sep 12, 2009 Upgrading randomhacks.net
- Sep 01, 2009 Real-time text annotation with Google Wave
- May 08, 2009 Write a 32-line chat client using Ruby, AMQP & EventMachine (and a GUI using Shoes)
- May 05, 2009 Financial crisis background and Munger on the banks
- Apr 30, 2009 Designing programs with RSpec and Cucumber (plus a book recommendation)
- Apr 30, 2009 Remote root holes reported as "denial of service"
- Jan 11, 2009 Installing TortoiseGit
- Sep 01, 2008 Ubiquitous Hoogle
- Oct 02, 2007 Probability monads at Hac 07 II
- Sep 18, 2007 Freiburg in October: Scheme, Dylan, and probability monads
- Sep 18, 2007 September 8th, 2007
- Jul 01, 2007 Ruby-style metaprogramming in JavaScript (plus a port of RSpec)
- Apr 28, 2007 Bowling in Haskell: A response to Ron Jeffries
- Apr 19, 2007 Robot localization using a particle system monad
- Mar 15, 2007 How to make Data.Set a monad
- Mar 12, 2007 Monads in 15 minutes: Backtracking and Maybe
- Mar 10, 2007 8 ways to report errors in Haskell
- Mar 07, 2007 Jim Hefferon's Linear Algebra: A free textbook with fascinating applications
- Mar 05, 2007 Three things I don't understand about monads
- Mar 03, 2007 Smart classification using Bayesian monads in Haskell
- Feb 22, 2007 Bayes' rule in Haskell, or why drug tests don't work
- Feb 21, 2007 Refactoring probability distributions, part 2: Random sampling
- Feb 21, 2007 Refactoring probability distributions, part 1: PerhapsT
- Feb 10, 2007 Probabilistic Functional Programming is cool
- Feb 10, 2007 Map fusion: Making Haskell 225% faster
- Feb 09, 2007 The first Carnival of Mathematics
- Feb 08, 2007 Haskell: Queues without pointers
- Feb 07, 2007 Do early adopters use IE?
- Feb 02, 2007 Haskell: What happens when you divide infinity by 2?
- Feb 01, 2007 Some useful closures, in Ruby
- Jan 22, 2007 High-Performance Haskell
- Jan 20, 2007 13 Ways of Looking at a Ruby Symbol
- Feb 15, 2006 Selenium on Rails, Reloaded: Client-Side Tests in Ruby
- Dec 03, 2005 Why Ruby is an acceptable LISP
- Nov 15, 2005 Moving a blog to Typo
- Nov 13, 2005 Typo sidebars: Recent Comments and Tagged Articles
- Oct 11, 2005 Random Hacks is back online
- Oct 11, 2005 McCarthy's Ambiguous Operator
- Jul 07, 2003 Preparing for the Winter Garden
- Jun 30, 2003 Tomato Update: Weeding and Irrigation
- Jun 30, 2003 Mason Update: The Weaver Has Woven
- Jun 27, 2003 Comments on "Putting open source on trial"
- Jun 27, 2003 Responses to "The Missing Future"
- Jun 22, 2003 The Missing Future
- Jun 22, 2003 PLUG Protests at SCO
- Jun 22, 2003 About the Author
- Jun 20, 2003 15 Minutes and 150MB of RAM to Compare Unix and Linux
- May 30, 2003 Tomato Progress
- May 30, 2003 Checking Code Ownership
- Mar 14, 2003 Update on the Strange SCO Case
- Mar 10, 2003 SCO Goes Nuclear
- Jan 21, 2003 Tomato Advice
- Jan 20, 2003 wxWindows Multimedia Work
- Jan 20, 2003 Last of the Tomatoes
- Jan 19, 2003 Winter Weather
- Jan 19, 2003 Back on the Slopes
- Dec 31, 2002 Wireless Weblogging
- Dec 31, 2002 Hardware Fun With Linux
- Dec 15, 2002 Open Source Consultants
- Dec 11, 2002 Contractor Hiring Tips
- Dec 10, 2002 wxWindows Experiences
- Nov 10, 2002 Lightweight Languages 2 Conference (MIT, 2002)
- Sep 30, 2002 Fromberger spam filtering paper
- Sep 29, 2002 Bayesian Whitelisting: Finding the Good Mail Among the Spam
- Sep 24, 2002 Macintosh Developer Pain
- Sep 23, 2002 Using Bogofilter with Spam Assassin
- Sep 23, 2002 Machine Learning Links
- Sep 23, 2002 FTC Spam Archive
- Sep 22, 2002 How To Test a Trainable Spam Filter
- Sep 20, 2002 Things I Hate About CodeWarrior, Part I
- Sep 19, 2002 EU free software study
- Sep 16, 2002 Weekend Spam Update
- Sep 13, 2002 Why Hygienic Macros Rock
- Sep 13, 2002 Bogofilter: A New Spam Filter
- Sep 13, 2002 Back from France
- Aug 27, 2002 Busy for a While
- Aug 13, 2002 RedHat Bill Update
- Aug 12, 2002 California Open Source Bill: A Really Bad Idea
- Aug 06, 2002 SpamAssassin: A Decent Spam Filter
- Jul 23, 2002 Internet Explorer still broken
- Jul 22, 2002 Yet Another PHP Security Hole
- Jul 22, 2002 Panopticon
- Jul 22, 2002 On the Air
- Jul 22, 2002 Common RSS Bugs
- Jul 21, 2002 Ogg Theora link
- Apr 12, 2002 Unix is a zombie
- Apr 12, 2002 Random Hacks Site Design
- Dec 07, 1998 IMAP multiplexing
- Dec 04, 1998 IMAP engine progress
- Dec 03, 1998 IMAP command loop
- Dec 02, 1998 IMAP and gperf
- Dec 01, 1998 IMAP progress report
- Nov 30, 1998 Linuxconf and IMAP
- Nov 25, 1998 More Gwydion Dylan and BlitzMail
- Nov 23, 1998 Gwydion Dylan and BlitzMail
- Jul 15, 1998 Parsing C headers with Gwydion Dylan's Melange
- Jul 06, 1998 BlitzMail architecture
- Jul 04, 1998 Trying out Glade
- Jul 03, 1998 Enlightenment and Guile
- Jul 02, 1998 Guile's looking much better
- Jul 01, 1998 MathMap custom spread filter
- Jun 30, 1998 Random Hacks goes online! (and a randomized emboss in MathMap)
Subscribe via RSS.

