Building a Toy Programming Language in Ruby

Do you find programming languages marvelous yet mysterious tools? What if you could peek under the hood and understand what makes them work? If you like the idea of getting your hands dirty and developing a programming language from scratch, then this blog post and the following posts in this series will be helpful.

In a series of articles, we will build a very simple interpreted, dynamically typed programming language step-by-step. However, for now, don't worry if you're a bit uncertain about the exact meaning of these terms or are feeling a bit intimidated by the task at hand. We will be using the lovely Ruby programming language to implement the interpreter and explain each step clearly to ensure both novice and advanced developers can follow along. We're naming this language 'Stoffle' in tribute to a lovely South African honey badger that goes by this name.

Why Build a Programming Language?

Stoffle will probably never replace Python or Ruby. So, why bother developing it? In addition to being fun, building a language is a great programming exercise, and that is what I hope to show in this series. Multiple benefits can be gained from this experience:

It demystifies programming languages (and developer tools, by proxy) and allows us to see ourselves not only as consumers of these tools but also as creators capable of devising our own if needs or desires dictate;
This exercise will present some uncommon programming challenges that most of us never face in our daily lives;
We will also learn about components that are useful outside the realm of programming language implementation. A parser can be used instead of a million cryptic regular expressions to deal with, for example, that problematic legacy text file your boss says must now be supported as a new mechanism for importing data into the system you work on.

How Does a Programming Language Work?

Before taking a look at what lies ahead of us in this project, let's first zoom out and understand how a program runs on a computer. As you might imagine, a CPU does not directly support a high-level language like Ruby. What a CPU does support is a set of very low-level instructions, and that set depends on the CPU's architecture.

If you're curious, research, for example, the x86 architecture, which powers many PCs and older Macs, or ARM, which powers most phones and newer Macs.

So, the task of a programming language implementation is to get high-level code to the point where the CPU can execute it as machine code. A plethora of strategies are used to accomplish this task. Stoffle will be an interpreted language, which means a separate program, the interpreter, will read Stoffle's source code and carry it out while the program is running, rather than translating it into a machine-code file ahead of time.

Compiled languages, such as C, are a different beast. They have a compilation step that translates the source and produces a binary (i.e., the source converted to machine code) ready to be executed by the target CPU. Another strategy is to compile a source file into another (often, high-level) language that already exists; this strategy is commonly referred to as 'transpilation.'

Keep in mind, though, that things are not as clear cut when dealing with real-world languages. They commonly embrace, in one way or another, aspects and techniques of all these (and other) different implementation strategies. I encourage you to research more about what path your preferred language follows after we get the basics covered.

A Bird's-eye View of Stoffle

As mentioned previously, Stoffle will be an elementary, interpreted, dynamically typed programming language. It will consist of only a few basic data types, the four elementary arithmetic operators, comparison and equality operators, logical operators, if / else conditionals, while loops, functions, and the ability to print to the console.

Trivia: Languages similar to Stoffle whose primary purpose is learning and experimentation are often called toy languages.

Stoffle's interpreter will be implemented using our trusted and beloved Ruby. When we fire up our interpreter (stoffle hello_world.sfe), our source file will go through the following components and phases before running:

Figure: The parts of our interpreter and what happens when a .sfe file is run.

Lexer

Also known as a scanner, the lexer's mission is to convert a plain string of characters into sensible groupings generally called 'tokens.' Imagine we declare a variable called my_var. Our lexer will read these characters and produce a Token::VARIABLE token.
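
To make this concrete, here is a minimal sketch of the idea in Ruby. It is not Stoffle's actual lexer (we'll build that in the next post); the Token struct, the tokenize method, and the symbol token types such as :identifier are all names I made up for illustration, standing in for something like Token::VARIABLE. It only assumes Ruby's standard strscan library.

```ruby
require 'strscan'

# Hypothetical token type for illustration only; Stoffle's real tokens
# will likely look different.
Token = Struct.new(:type, :lexeme)

# Turns a plain string of characters into a flat list of tokens.
def tokenize(source)
  scanner = StringScanner.new(source)
  tokens = []

  until scanner.eos?
    if scanner.skip(/\s+/)
      next # whitespace separates tokens but produces none
    elsif (lexeme = scanner.scan(/[A-Za-z_]\w*/))
      tokens << Token.new(:identifier, lexeme)
    elsif (lexeme = scanner.scan(/\d+/))
      tokens << Token.new(:number, lexeme)
    elsif (lexeme = scanner.scan(%r{[=+\-*/]}))
      tokens << Token.new(:operator, lexeme)
    else
      raise ArgumentError, "Unexpected character #{scanner.getch.inspect}"
    end
  end

  tokens
end

tokenize('my_var = 1 + 2').each do |token|
  puts "#{token.type}: #{token.lexeme}"
end
```

Notice that the result is still a flat list: the lexer groups characters, but it knows nothing yet about how the tokens relate to each other.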

Parser

When we think of source code, its nested nature is undeniable. Consider a conditional expression, for example; its true and false branches are nested and executed depending on the result of the evaluation of the condition.

The main job of the parser is to transform a flat sequence of tokens into a data structure that can represent relationships existing between them. Another important function of the parser is to tell us when we've messed up by reporting syntax errors.
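
As a rough sketch of both jobs, the snippet below turns a flat list of arithmetic tokens into nested arrays and complains when the input is malformed. Everything here is an assumption made for illustration: TinyParser, ParseError, the use of plain strings as tokens, and the nested-array tree are not Stoffle's real design, and the technique shown (a small recursive-descent parser) is just one common way to do it.

```ruby
# Hypothetical parser for illustration only; tokens are plain strings
# such as '1' or '+', and the tree is built from nested arrays.
class TinyParser
  class ParseError < StandardError; end

  def initialize(tokens)
    @tokens = tokens
    @position = 0
  end

  # Parses the whole token list and returns the nested tree.
  def parse
    tree = parse_sum
    raise ParseError, "Unexpected token #{peek.inspect}" unless peek.nil?
    tree
  end

  private

  def peek
    @tokens[@position]
  end

  def advance
    token = peek
    @position += 1
    token
  end

  # sum := product (('+' | '-') product)*
  def parse_sum
    left = parse_product
    while ['+', '-'].include?(peek)
      operator = advance
      left = [operator.to_sym, left, parse_product]
    end
    left
  end

  # product := number (('*' | '/') number)*
  def parse_product
    left = parse_number
    while ['*', '/'].include?(peek)
      operator = advance
      left = [operator.to_sym, left, parse_number]
    end
    left
  end

  def parse_number
    token = advance
    unless token.is_a?(String) && token.match?(/\A\d+\z/)
      raise ParseError, "Expected a number, got #{token.inspect}"
    end
    token.to_i
  end
end

p TinyParser.new(%w[1 + 2 * 3]).parse # prints [:+, 1, [:*, 2, 3]]
```

The multiplication ends up nested inside the addition, which is exactly the kind of relationship a flat token list cannot express. Feeding it something like %w[1 +] raises a ParseError, our stand-in for a syntax error.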

Interpreter

Stoffle's interpreter will be simple and work directly with the data structure produced by the parser. The interpreter will analyze this structure piece-by-piece and execute it as it goes.
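
Here is a minimal sketch of what "analyze piece-by-piece and execute as it goes" can look like. I'm assuming, purely for illustration, that the parser hands us nested arrays of the form [operator, left, right] with integers at the leaves; the evaluate method and SUPPORTED_OPERATORS constant are hypothetical names, and Stoffle's real data structure will be richer than this.

```ruby
# Operators this toy evaluator accepts (an assumption for this sketch).
SUPPORTED_OPERATORS = %i[+ - * /].freeze

# Walks the tree recursively: leaves are integers, every other node is
# an [operator, left, right] array whose children are evaluated first.
def evaluate(node)
  return node if node.is_a?(Integer)

  operator, left, right = node
  unless SUPPORTED_OPERATORS.include?(operator)
    raise ArgumentError, "Unknown operator #{operator.inspect}"
  end

  # Integer#+, Integer#-, and friends are ordinary Ruby methods, so we
  # can dispatch on the operator symbol directly.
  evaluate(left).public_send(operator, evaluate(right))
end

puts evaluate([:+, 1, [:*, 2, 3]]) # prints 7
```

Because each node evaluates its children before applying its own operator, the nesting of the tree alone decides the order of execution; no separate translation step is needed.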

So where does the conversion into machine code happen? Stoffle's interpreter is itself a Ruby program, so Ruby's own interpreter is what ultimately gets it executed on the CPU when we run it (an interpreter running inside an interpreter!).

Wrapping up

In today's article, we presented a rough overview of the steps we'll take to bring Stoffle to life. I hope you're as excited as I am and that I was able to instill confidence in those of you who initially thought implementing a programming language was something out of reach for mere mortals.

In the next blog post in this series, we'll start getting our hands dirty by implementing Stoffle's lexer. By the end of that post, we'll have developed a Ruby program capable of reading Stoffle source code and transforming a bland sequence of characters into a more structured (and interesting!) sequence of tokens.

See you soon in the next installment of this series!