Object-oriented vs. Data-oriented Programming

Object-oriented programming (OOP) is practically gospel in some parts of the programming world today. At its most basic, it means attaching functionality to particular pieces of data. For example, if you have a bunch of information about a person, you might want to group that together and attach some person-specific functionality to it, say, the ability to send a letter to the person. Different pieces of data can represent different kinds of things, so it makes sense to couple functionality with “type.” Ruby is a great object-oriented language (my favorite), and embodies OOP quite well in my opinion. Everything in the Ruby universe is an Object, and things are instances of particular Classes (which are themselves Objects). Every object in the Ruby world has a particular set of operations that you can perform on it, and it usually contains some kind of data. Encapsulation, Polymorphism, and other Good Things arise out of the object-oriented paradigm.

On the other end of the spectrum, we have Lisp. In a Lisp, everything is data, including code. At its most basic, Lisp supports very few kinds of data, with lists being the only way to group data. Clojure (my favorite Lisp) supports a few more types, such as maps and sets, but thanks to the Sequences API, they appear fairly homogenous. Instead of coupling data with functionality with an object system, Lisp treats everything the same: many functions can act on just about any chunk of data. This way of thinking completely separates functionality and data, in stark contrast toOOP. In the Lisp way of doing things, you build general-purpose functions that you compose together and then operate on homogenous data with. You lose Encapsulation, but Polymorphism isn’t even a concept since types have mostly disappeared.

In the process of writing Computer.Build, I’ve used both paradigms to varying degrees of success. In the Ruby implementation, I build an abstract syntax tree (AST) up using a block-based internal DSL. I then ask this tree to generate itself into VHDL, and it recursively calls generate() on each node. Different nodes have different types, allowing them to generate different VHDL. In Clojure, on the other hand, you create the AST directly using a list literal, so you’ve got this giant homogenous data structure. Without an object system, how do you generate different code based on what kinds of nodes you encounter in the AST? With a dispatch function, of course! Clojure (and many other Lisps) supports a concept of “multimethods,” which are sets of functions that are dispatched to based on another function. This allows me to write a dispatch function that analyzes a particular node in the AST and decides what kind of type it is (generally by getting the unqualified name of the first element). The result of the dispatch function is then used to choose one of many different VHDL-generating functions to run against the particular node, some of which recurse through this multimethod again.

Since I started in Java, I still tend to think in an OOP style. This means the Ruby implementation of Computer.Build should be considerably easier for me, but it turns out I’m gravitating more toward the Clojure. It’s just so…clean! The syntax is terse, the structure is pure, and the multimethod-based dispatch seems so much clearer than fancy inheritance dances. As indoctrinated with the OOP style as I am, I’m starting to feel like a data-oriented style (a la Lisp) is better for some applications. When you’re trying to represent real-world objects, then yes, a type system that allows you to represent classes of things like people and trees and whatnot makes sense. Once you get into the abstract, esoteric world of software, though, the OO metaphor isn’t quite as simple. Heavy OO users tend to end up with huge, complex hierarchies of types that can be pretty confusing. If you switch to a data-oriented approach, though, you can focus on exactly what you need to get done. I can represent the AST for Computer.Build as a homogenous list, and then act on each node based on some arbitrary analysis of it. I don’t have to worry about what type things are or where they inherit from; I just dispatch and go.

I’m a firm believer that different problems require different tools to solve, and programming style is no different. At work, I’m encountering asynchronous, evented I/O for the first time, and it’s a great tool for some applications. Some problems are going to lend themselves to an OOP style, while others are going to fit a more data-oriented approach. With Computer.Build, I’m leaning toward favoring the data-oriented approach.