Symbols and Internalising Strings

Before we talk about Symbols…

Let’s first get familiar with how strings behave (instantiation, storage, etc.) in all these languages. Let x be "afsal", a value of string type (not language specific). x is allocated a memory space M and is identified by an id (sometimes called object_id). Invoking x means fetching the contents stored under that id.

Some code in Python

Let us use Python to show how to get the id of a variable. Not every language exposes this feature, but the concept remains the same.
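A minimal sketch of that check, assuming CPython (the reference interpreter). The variable names follow the text; whether the two ids match is an implementation detail, not something the language promises:

```python
# Two names bound to string literals with the same content.
x = "afsal"
y = "afsal"

# id() returns an integer that identifies the object a name refers to.
print(id(x), id(y))

# In CPython these two literals normally end up as one shared object,
# so the ids match. Other interpreters may behave differently.
same_object = id(x) == id(y)
print(same_object)
```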

Do strings with the same contents share the same id?

As seen above, the strings with the same content share the same id.

  • This means that when we defined y with the same content as x, the runtime looked up its memory and found that y could reuse the existing contents of x.
  • If y reuses x, they must have the same object_id: id(x) == id(y).
  • No extra copy is created for y, so we save some space.

In Java/Scala?

  • In computer science, this behaviour is called string interning. Interning ensures that all strings with the same contents share the same memory.
  • One drawback is that string interning may be problematic when mixed with multithreading, but that discussion is out of scope here.
  • In the Python example above, the string x is interned automatically, and the same holds for string literals in Java/Scala. (However, you will soon see that the behaviour differs across these languages.)
  • In Java, String.intern() forces a string to be interned, but you rarely need to call it yourself.
  • The intern() method returns a canonical representation of the string object.
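To make "canonical representation" concrete, here is a toy intern pool. It is only a sketch of the idea: the `InternPool` class and its `intern` method are hypothetical names for illustration, not how the JVM or any other runtime actually implements interning.

```python
class InternPool:
    """A toy intern pool (hypothetical, for illustration only)."""

    def __init__(self):
        self._pool = {}  # maps string content -> canonical string object

    def intern(self, s: str) -> str:
        # setdefault stores s the first time its content is seen and
        # returns the previously stored object on every later call.
        return self._pool.setdefault(s, s)


pool = InternPool()

# Build two equal strings at runtime so they start as separate objects.
a = "".join(["af", "sal"])
b = "".join(["af", "sal"])

canonical_a = pool.intern(a)
canonical_b = pool.intern(b)

print(a == b)                      # same content
print(canonical_a is canonical_b)  # one shared canonical object
```

Every caller that goes through the pool gets back the same object for equal content, which is why comparing interned strings can be reduced to comparing ids.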

Automatic String interning: naive testing in Scala (Skip through if you don’t care)

Let’s do a simple test of automatic string interning. For simplicity, let’s use the Scala console. Let’s create ten strings, each made of 10000000 ‘a’ characters. All ten long strings have the same contents, so ideally they should share the same id.

Is interning a property of Strings?

Yes, it is a property of strings. You won’t find an intern method on an Integer variable. However, you may notice that ids are constant for certain primitives. For example, in CPython the id of the integer 1 stays the same for the lifetime of the process.
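A small sketch of that observation, assuming CPython, which keeps a cache of small integers. The variable names are made up for illustration, and the identity results are implementation details:

```python
# Build the integers at runtime so the compiler cannot merge constants.
a = int("1")
b = int("1")
small_shared = a is b  # CPython reuses one cached object for small ints

big_a = int("1000000000000")
big_b = int("1000000000000")
big_shared = big_a is big_b  # large ints are usually separate objects

print(small_shared, big_shared)
```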

Is this a consistent behaviour for Strings?

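Surprisingly, the answer is NO. In the Java world, only string literals and compile-time constants are guaranteed to be interned; strings built at runtime are not, unless you call intern() yourself. In Python, interning is an implementation detail that depends on the interpreter and on the content itself. CPython, for example, typically does not automatically intern strings that contain special characters.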
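The following sketch, again assuming CPython, shows two strings with identical content that do not share an id because each one is assembled at runtime:

```python
# Two strings with identical content, each assembled at runtime.
parts = ["hello", " ", "world!"]
s1 = "".join(parts)
s2 = "".join(parts)

print(s1 == s2)          # the contents are equal
print(s1 is s2)          # but these are two separate objects
print(id(s1), id(s2))    # so the ids differ
```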

No default string interning in Ruby?

In Ruby, you will see something called object_id. It is the equivalent of id in Python. But as per the documentation, the object_ids of two active objects always differ. Hence, two string variables with the same content "afsal" get different object_ids (or ids). There is no string interning happening here!

How to intern strings in Ruby?

The answer is “Symbol”. Put simply, calling the intern method on a string in Ruby returns a "Symbol".


The concept of a symbol is not language specific, but its details differ from one language to another. Let us explore.

Symbol in Ruby

As mentioned above, Ruby handles interning by converting a string into a symbol, i.e., the Symbol representation of the String "afsal" is :afsal.

Some performance comparison:

Let us define a string 100000000 times and similarly a symbol 100000000 times, and see which one performs better:

Usage of symbols:

Symbols are often used as identifiers. For example, every method name in Ruby is saved as a symbol under the hood. Symbols are also widely used as hash keys. With symbols as keys, Ruby only needs to compare the object_id of the stored key with that of the new one, rather than comparing contents or computing a hash over each value. Symbols can be used anywhere within your application.
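Python has no symbols, but its dictionaries show a similar identity shortcut. The sketch below assumes CPython and uses a hypothetical `CountingKey` class that counts how often its contents are compared. Note that the hash is still computed in both lookups here, so this only illustrates the "same object, no content comparison" part of the argument:

```python
class CountingKey:
    """Hypothetical key type that counts how often __eq__ runs."""

    eq_calls = 0

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        return hash(self.name)

    def __eq__(self, other):
        CountingKey.eq_calls += 1
        return isinstance(other, CountingKey) and self.name == other.name


stored = CountingKey("afsal")
table = {stored: 1}

CountingKey.eq_calls = 0
table[stored]                    # same object: the identity check is enough
calls_same_object = CountingKey.eq_calls

CountingKey.eq_calls = 0
table[CountingKey("afsal")]      # equal but distinct object: contents compared
calls_equal_object = CountingKey.eq_calls

print(calls_same_object, calls_equal_object)
```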

Are symbols always better than strings?

Be aware that excessive use of Symbols can consume a lot of memory. Frequently converting Symbols to Strings can also slow down your application. That said, memory leaks caused by Symbols are no longer a concern in Ruby (since version 2.2), because there is now a symbol garbage collector.

Symbol in Python

There is no Python equivalent of Ruby’s symbols. However, as you have seen from the examples above, many strings are interned by default. We have also seen that some strings are not interned by default in Python. We can force those strings to be interned using the intern function (sys.intern in Python 3).
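A short sketch of forcing interning with `sys.intern` from the standard library (Python 3). The strings are built at runtime and contain a special character, so they start out as separate objects in CPython:

```python
import sys

# Equal content, built at runtime, with a special character.
s1 = "".join(["afsal", "!"])
s2 = "".join(["afsal", "!"])

# sys.intern returns the canonical object for the given content.
i1 = sys.intern(s1)
i2 = sys.intern(s2)

print(i1 == s1)   # the content is unchanged
print(i1 is i2)   # both names now refer to one shared object
```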

Symbols in Clojure

Clojurists have Strings, Symbols and Keywords. This trichotomy confuses many people. Part of it may already make sense to you, since we have seen why both String and Symbol are needed.


In dynamic languages, symbols are often used to identify things that carry a stronger meaning than plain string content: identifiers that are typically used more than once. Moreover, in homoiconic languages like Clojure, where code can act as data, the programmer can manipulate functions and variables through Symbols to produce various custom behaviours. In statically typed languages, however, we could argue that the comparison space is already restricted by types, and most of the time the homoiconic nature doesn’t exist. Hence, although symbols have the same meaning in the context of Java/Scala (i.e., guaranteed interning and faster equality operations), they are probably used less in practice than in Ruby or Clojure.



Afsal Thaj

A software engineer and a functional programming enthusiast at Simple-machines, Sydney, and a hardcore hiking fan.