13 Ways of Looking at a Ruby Symbol

Posted by Eric Kidd Sat, 20 Jan 2007 03:20:00 GMT

New Ruby programmers often ask, “What, exactly, is a symbol? And how does it differ from a string?” No one answer works for everybody, so–with apologies to Wallace Stevens–here are 13 ways of looking at a Ruby symbol.

A Ruby symbol is:

  1. the name of something, not just a blob of text
  2. a label in a free-form enumeration
  3. a constant, unique name
  4. an “interned” string
  5. an object with O(1) comparison
  6. a Lisp identifier
  7. a Ruby identifier
  8. the keyword for a keyword argument
  9. an excellent choice for a hash key
  10. like a Mac OSType
  11. a memory leak
  12. a clever way to store only a single copy of a string
  13. a C typedef named “ID”

1. A Ruby symbol is the name of something, not just a blob of text

In Ruby, we would generally use symbols when referring to things by name:

find_speech(:gettysburg_address)

But to represent large chunks of text, we would use strings:

"Four score and seven years ago..."

2. A Ruby symbol is a label in a free-form enumeration

In C++ (and many other languages), we can use “enumerations” to represent families of related constants:

enum BugStatus { OPEN, CLOSED };
BugStatus original_status = OPEN;
BugStatus current_status  = CLOSED;

But because Ruby is a dynamic language, we don’t worry about declaring a BugStatus type, or keeping track of the legal values. Instead, we represent the enumeration values as symbols:

original_status = :open
current_status  = :closed
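
As a minimal sketch of how such a free-form enumeration might be consumed, the hypothetical describe_status helper below (not part of the original example) branches on the symbol with a case expression and rejects anything it does not recognize, since Ruby itself will not check the value for us:

```ruby
# Hypothetical helper: maps a status symbol to a human-readable label.
def describe_status(status)
  case status
  when :open   then "still needs work"
  when :closed then "all done"
  else raise ArgumentError, "unknown status: #{status.inspect}"
  end
end

current_status = :closed
puts describe_status(current_status)   # prints "all done"
```

The else branch is the price of skipping the type declaration: a misspelled symbol such as :close is only caught when the code runs.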

3. A Ruby symbol is a constant, unique name

In Ruby, we can change the contents of a string:

"foo"[0] = ?b # "boo"

But we can’t change the contents of a symbol:

:foo[0]  = ?b # Raises an error

Similarly, we can have two different strings with the same contents:

# Same string contents, different strings.
"open".object_id != "open".object_id

But two symbols with the same name are always the same underlying object:

# Same symbol name, same object.
:open.object_id == :open.object_id
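
The same point can be made with equal?, which tests object identity. This is a small sketch; the strings are built with String.new so that each one is guaranteed to be a fresh object regardless of how the interpreter treats string literals:

```ruby
# Two String objects built from the same characters.
first  = String.new("open")
second = String.new("open")

puts first == second          # prints true: the contents match
puts first.equal?(second)     # prints false: two distinct objects

# Two routes to the same Symbol.
puts "open".to_sym.equal?(:open)   # prints true: one shared object
puts :open.frozen?                 # prints true: a symbol cannot be modified
```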

4. A Ruby symbol is an “interned” string

In Ruby, we can convert a string to a symbol using intern:

"foo".intern # returns :foo

intern maintains a hash table mapping strings to the corresponding symbols. The first time intern sees a string, it creates a new symbol and stores it in the hash table. The next time intern sees the same string, it retrieves the original object.

We could implement our own version of Symbol and intern as follows:

class MySymbol
  TABLE={}
  def initialize(str) @str = str end
  def to_s() @str end
  def ==(other)
    self.object_id == other.object_id
  end
end

class String
  def my_intern
    table = MySymbol::TABLE
    unless table.has_key?(self)
      table[self] = MySymbol.new(self)
    end
    table[self]
  end
end

"foo".my_intern
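
For comparison, here is a short sketch of the same lookup behaviour using Ruby's built-in intern (to_sym is another name for it). The name is assembled at run time, so neither call can rely on a symbol literal in the source:

```ruby
# Build the name at run time so it is not a literal in the source.
name = ["gettysburg", "address"].join("_")

first  = name.intern       # enters the name into the symbol table
second = name.dup.to_sym   # a different String object, same contents

puts first.inspect          # prints :gettysburg_address
puts first.equal?(second)   # prints true: both calls return one object
puts first.to_s == name     # prints true: the text round-trips
```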

5. A Ruby symbol is an object with O(1) comparison

To compare two strings, we potentially need to look at every character. For two strings of length N, this will require N+1 comparisons (which computer scientists refer to as “O(N) time”).

def string_comp str1, str2
  return false if str1.length != str2.length
  for i in 0...str1.length
    return false if str1[i] != str2[i]
  end
  return true
end
string_comp "foo", "foo"

But since every appearance of :foo refers to the same object, we can compare symbols by looking at object IDs. We can do this with a single comparison (which computer scientists refer to as “O(1) time”).

def symbol_comp sym1, sym2
  sym1.object_id == sym2.object_id
end
symbol_comp :foo, :foo

6. A Ruby symbol is a Lisp identifier

The earliest ancestors of Ruby symbols are Lisp symbols. In Lisp, symbols are used to represent “identifiers” (variable and function names) in a parsed program. Let’s say we have a file named double.l containing a single function:

(defun double (x)
  (* x 2))

We can parse this file using read:

;; In Common Lisp, read takes a stream rather than a file name.
(with-open-file (in "double.l")
  (read in))
;; Returns '(defun double (x) (* x 2))

This returns a nested list containing the symbols defun, double, *, x (twice) and the number 2.
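
To connect this back to Ruby, the same parsed structure could be written as nested arrays of symbols. This is only an illustration of the data shape, not a Lisp reader:

```ruby
# The parsed form of double.l, written as nested Ruby arrays.
parsed = [:defun, :double, [:x], [:*, :x, 2]]

# Collect every symbol in the tree, in order of appearance.
symbols = parsed.flatten.grep(Symbol)

p symbols        # prints [:defun, :double, :x, :*, :x]
p symbols.uniq   # prints [:defun, :double, :x, :*]
```

Both occurrences of :x are the same object, which is what lets a Lisp system treat them as the same variable.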

7. A Ruby symbol is a Ruby identifier

In Ruby, we can look up identifiers (variable, function and constant names) while the program is running. This is typically done using symbols.

class Demo
  # The stuff we'll look up.
  DEFAULT = "Hello"
  def initialize
    @message = DEFAULT
  end
  def say() @message end

  # Use symbols to look up identifiers.
  def look_up_with_symbols
    [Demo.const_get(:DEFAULT),
     method(:say),
     instance_variable_get(:@message)]
  end
end

Demo.new.look_up_with_symbols
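
A slightly more practical sketch of the same idea picks a method by name at run time. The Greeter class and the call_by_name helper are made up for illustration; respond_to?, public_send and instance_methods are standard Ruby, and this assumes Ruby 1.9 or later, where instance_methods returns symbols:

```ruby
# Hypothetical class with two methods we can look up by name.
class Greeter
  def hello() "Hello" end
  def goodbye() "Goodbye" end
end

# Hypothetical helper: calls the named method if the object has one.
def call_by_name(object, name)
  return "no method named #{name}" unless object.respond_to?(name)
  object.public_send(name)
end

greeter = Greeter.new
[:hello, :goodbye, :shrug].each do |name|
  puts call_by_name(greeter, name)
end

# Ruby reports method names as symbols, too.
p Greeter.instance_methods(false).sort   # prints [:goodbye, :hello]
```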

8. A Ruby symbol is the keyword for a keyword argument

When passing keyword arguments to a Ruby function, we specify the keywords using symbols:

# Build a URL for 'bug' using Rails.
url_for :controller => 'bug',
        :action => 'show',
        :id => bug.id
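
Under the hood, a call like this hands the method a single Hash whose keys are symbols. The sketch below uses a made-up describe_route method as a stand-in for the Rails helper, and then shows the dedicated keyword-argument syntax available in Ruby 2.1 and later (describe_bug is likewise hypothetical):

```ruby
# Hypothetical stand-in for a Rails-style helper: the trailing
# key/value pairs arrive as one Hash keyed by symbols.
def describe_route(options = {})
  "#{options[:controller]}/#{options[:action]}/#{options[:id]}"
end

puts describe_route(:controller => 'bug', :action => 'show', :id => 42)
# prints "bug/show/42"

# Ruby 2.1 and later: real keyword arguments, one required, one optional.
def describe_bug(id:, action: 'show')
  "bug/#{action}/#{id}"
end

puts describe_bug(id: 42)   # prints "bug/show/42"
```

In both styles the caller names each argument with a symbol; only the way the method receives them differs.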

9. A Ruby symbol is an excellent choice for a hash key

Typically, we’ll use symbols to represent the keys of a hash table:

options = {}
options[:auto_save]     = true
options[:show_comments] = false
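
One detail worth seeing in action: a plain Ruby Hash treats the symbol :auto_save and the string "auto_save" as different keys. The sketch below also uses fetch to supply a fallback for a key that was never set (the :theme key is invented for the example):

```ruby
options = { :auto_save => true, :show_comments => false }

puts options[:auto_save]             # prints true
puts options["auto_save"].inspect    # prints nil: a string is a different key
puts options.fetch(:theme, :plain)   # prints plain: fallback for a missing key
```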

10. A Ruby symbol is like a Mac OSType

The MacOS uses four-character abbreviations to represent open-ended enumerations:

enum {
  kSystemFolderType  = 'macs',
  kDesktopFolderType = 'desk',
  // ...and so on...
  kTrashFolderType   = 'trsh'
};
OSType folder = kSystemFolderType;

In Ruby, we’d typically use symbols for the same purpose:

:system_folder
:desktop_folder
:trash_folder

11. A Ruby symbol is a memory leak

Because of the way Ruby 1.8 stores symbols, they can never be garbage collected. So if we create 10,000 one-off symbols that we’ll never use again, we’ll never get the memory back.
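
A rough way to watch the symbol table grow is Symbol.all_symbols, a standard method that lists every symbol the interpreter currently knows about. The label names below are invented for the sketch, and the exact counts vary between interpreters, so no output is shown. Note that Ruby 2.2 and later can collect symbols created at run time once nothing references them, so the permanent leak applies mainly to older interpreters:

```ruby
before = Symbol.all_symbols.size

# Create 1,000 one-off symbols from strings built at run time.
one_offs = (1..1000).map { |i| "one_off_label_#{i}".to_sym }

after = Symbol.all_symbols.size
puts "symbol table grew by #{after - before} entries"
```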

Some Scheme implementations use a clever version of intern that looks up symbols using a weak hash table. This allows symbols to be garbage collected without destroying their uniqueness properties.

12. A Ruby symbol is a clever way to store only a single copy of a string

(For a similar idea, see this article.)

Let’s say we’re working on a natural language parser that tries to understand breakfast orders. We have a corpus of 30,000 sentences that represent real-world breakfast orders, and we’re trying to find the patterns.

But even though we have a huge number of sentences, the actual vocabulary is fairly limited. We don’t want to store 15,000 copies of the word “bacon” in memory! Instead, we can use symbols to represent the individual words:

corpus = [
  [:i, :want, :some, :bacon],
  [:i, :want, :some, :eggs],
  [:give, :me, :some, :bacon],
  [:chunky, :bacon],
  # ... 29,995 more phrases ...
  [:some, :toast, :please]
]
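
A tiny sketch of the saving, using three made-up sentences: splitting them yields one String object per word occurrence, while converting to symbols leaves only one object per distinct word.

```ruby
sentences = ["i want some bacon", "give me some bacon", "chunky bacon"]

# One String object per word occurrence.
strings = sentences.flat_map { |sentence| sentence.split }

# One Symbol object per distinct word.
words = strings.map(&:to_sym)

puts strings.map(&:object_id).uniq.size   # prints 10
puts words.map(&:object_id).uniq.size     # prints 7
```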

In the early days of AI, many Lisp programs used exactly this strategy for representing text.

13. A Ruby symbol is a C typedef named “ID”

Internally, Ruby 1.8 represents symbols using the type ID. This is a typedef for an unsigned integer. An ID represents an entry in Ruby’s symbol table.

typedef unsigned long ID;

Some interesting symbol-related functions include:

// Enter a C string into symbol table.
ID rb_intern(const char *name);
// Convert an ID to a Symbol object.
#define ID2SYM(x)
// Convert a String to a Symbol object.
VALUE rb_str_intern(VALUE s);

Other explanations of Ruby symbols

If none of these explanations work for you, you might have luck with one of the following:

  1. Symbols Are Not Immutable Strings
  2. Using Symbols for the Wrong Reasons
  3. Yet Another Blog About Ruby Symbols
  4. Digging into Ruby Symbols
  5. Understanding Ruby Symbols

(Update: Markus Prinz has translated this article into German. Thanks!)

Comments

  1. Michael Chermside said 13 days later:

    “In Ruby, we can convert a symbol to a string using intern:” should read “In Ruby, we can convert a string to a symbol using intern:” instead.

  2. Eric Kidd said 13 days later:

    Thanks! Fixed it.

  3. Peter Burns said 13 days later:

    You might want to go into more detail about using symbols for keyword arguments. That’s just using a convenient way that Ruby deals with hashes at the end of argument lists.

    For example: http://api.rubyonrails.org/classes/ActionController/Base.html#M000261

    The way you have it seems misleading to someone new to Ruby, suggesting that

    def foo(bar=1,baz=2)
      puts baz
    end
    foo :baz => 3 #=> prints:   2

    would output 3. Rails is just using an options hash at the end of parameter lists to do something like named arguments.

  4. Eric Kidd said 13 days later:

    Thanks for the suggestion! Here’s a sample function with keyword arguments in Ruby:

    def foo opts={}
      bar = opts[:bar] || 1
      baz = opts[:baz] || 2
      puts baz
    end
    
    foo :baz => 3

    This is a little clunky, but it works nicely enough in practice.

  5. Murray Spork said 16 days later:

    Regarding no.7 – not sure if I got the point entirely – but shouldn’t the 2nd item in the array returned by look_up_with_symbols be:

    method(:say).call

    Otherwise you get the following array returned:

    => [“Hello”, #, “Hello”]

  6. Murray Spork said 16 days later:

    2nd array element is (now correctly escaped):
    #<Method: Demo#say>

  7. Eric Kidd said 16 days later:

    Murray: Yes, the code in number 7 is extremely useless. :-)

  8. Anonymous Cow said 16 days later:

    But I hear tell that in 2.0, Symbols will be strings, oh my!

    (Though, they will be thoroughly frozen, as to not harm the children of the world).

  9. Austin Ziegler said 17 days later:

    Um. I think that this post makes Symbol objects far more complex and magical than they really are. They’re not magic. There’s a few interesting facts about them (there’s only one copy of a given symbol; they’re not currently garbage-collectable; how they’re implemented in C), but nothing else really helps people understand Symbols.

    I said it in the O’Reilly blog months ago, there’s nothing magical or special about Symbols. It’s in the intent of the call you’re making. What’s magic in “attr_accessor :foo” is not :foo, but “attr_accessor”.

    Anonymous Cow, matz has backed off from that proposal, if I remember correctly.

  10. Michael said 17 days later:

    Actually, Austin, I think Eric’s post goes a long way to removing a sense of complexity and “magic” from some people’s interpretation of the symbol as a data structure, and I would argue he provides some very helpful perspectives. To some, these views may seem obvious; but everything is obvious once you’ve seen it.

    As with many of the basic features of any programming language, symbols are a simple idea that gains a kind of complexity by virtue of its generality. Therefore, I think there is real value in exploring many different views of the same simple concept, even though doing so does not change its intrinsic simplicity.

  11. Austin Ziegler said 18 days later:

    I can’t disagree more, Michael. The reality is that the more attention that is paid to what a Symbol is, the less attention is paid to the intent. Symbols aren’t magic. They’re not complex. Talking about how they’re implemented is, for most people and especially Rails folks, unnecessary and confusing.

    Symbols are names. That’s all that matters. Saying anything more than that while trying to define a Symbol is a fruitless exercise. What’s in a name? The meaning one gives to it. You could call me “Betty”, but that doesn’t mean that I’d respond to it—it doesn’t have meaning to me.

    Ten of the ways (1, 2, 5 – 10, 12) are simply ways of saying exactly what #3 says, and they’re less useful because they try to make comparisons that deal with HOW Symbols are used, not WHAT they are. #11 is important to know about dynamically generating symbols (an implication, if you will), and #12 matters ONLY to people who are diving under the covers of Ruby into its implementation. (It’s not true in any case, for JRuby, which is a damned good argument for not including it.)

  12. Charles Oliver Nutter said 18 days later:

    Austin: I think you must have meant #13 is not true for JRuby, because obviously #12 must be. Just wanted to clarify that for anyone reading…

    And I agree that Symbols are names. That’s how I describe them, and it’s a core reason I argued against making them < String.

  13. Eric Kidd said 19 days later:

    Over the years, quite a few people have asked me, “What is a symbol?” I usually tell them, “A symbol is a name.” And for some people, this clicks instantly.

    Other people quite sensibly reply, “Well, why is there one kind of strings for ‘names,’ and another kind of strings for, well, strings? How do I decide when to use which?”

    And those aren’t such easy questions to answer. Rails confuses the question further by making symbols and strings pretty much interchangeable.

    What I’ve found is that different explanations work for different people. One programmer said, “Oh! They’re just enumeration labels! That makes sense.” And when I was a novice Lisp programmer, #10 was the one that made things click for me.

    So please, take whatever explanation works for you, and ignore the rest. And if you’d prefer a simpler explanation, one which only focuses on a single aspect of symbols, please feel free to link to it from the comment thread!
