Yehonathan Sharvit
10 May 2021
The purpose of this article is to guide you toward data enlightenment by illustrating the advantages of programming with data instead of objects.
Data enlightenment is a 3-step journey:
1. You code in a language that supports only objects, like C++, Java or C#.
   You code, you suffer, everything is complicated... You don't understand why...
   One day, you become aware of your suffering.
2. You code in a hybrid language like JavaScript, Ruby or Python.
   That's much more fun than before, but objects are still there and they cause you suffering.
   You choose to write as much code as you can using only data.
3. You code in a data language like Clojure.
   There is nothing to say. No words can really express your feelings.
   Your heart is full of gratitude. You are fully enlightened.
Let me start by clarifying what I mean by data and objects.
An object is an entity made of:
In this article, we focus the discussion around members and we do not deal at all with polymorphism and inheritance.
Here is an example of a common object in Java, a Product with name and price:
class Product {
    String name;
    int price;

    Product(String name, int price) {
        this.name = name;
        this.price = price;
    }

    String getName() {
        return this.name; // name is a field, not a method
    }

    void setName(String name) {
        this.name = name;
    }

    int getPrice() {
        return this.price;
    }

    void setPrice(int price) {
        this.price = price;
    }
}

Product pencil = new Product("pencil", 2);
Remark: The code is not always as verbose as in this example; there are various ways to avoid the verbosity of setters and getters, even in Java (see Lombok).
In the context of this article, by Data I mean a dictionary (a.k.a hash map) with arbitrary keys and values (think about JSON).
Here is how we create a piece of Data in JavaScript:
var pencil = {
  name: "pencil",
  price: 2
};
The biggest issue with using objects to represent data is that one has to create a class for each piece of data.
Usually, we have different classes for similar entities in different modules. The fact that similar entities share similar fields is not easy to leverage in the object realm, and there is no generic way to instantiate an object of class A from an object of class B, even when the two classes have the same fields. By a generic way, I mean a piece of code that doesn't depend on classes A and B.
In a typical e-commerce application, we would have classes for users, customers, products etc... Even worse, we would create separate classes to represent a product depending on what module handles the product. For instance:
- ProductInApp for the representation of the product when handled in the application module
- ProductInDb for the representation of the same information in a way that can be handled by our DB driver
ProductInApp and ProductInDb might have the exact same fields, maybe with different names, but that doesn't save us from creating two classes. In addition, there is no generic way in the realm of objects to convert from ProductInApp to ProductInDb. One has to write a specific ProductInDb constructor that receives a ProductInApp as an argument (and another UserInDb constructor that receives a UserInApp as an argument, and so on).
On the other hand, in the realm of data, we manipulate dictionaries. Dictionaries are universal. We can write generic functions to manipulate them. For instance, one can clone a dictionary without any knowledge about the fields in the dictionary. One can also add fields to a dictionary. The only thing that is required is the name of the field and the value that needs to be associated to this field.
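As a minimal sketch of such generic functions, the snippet below adds a field to a dictionary and renames its keys without knowing which entity the dictionary represents. The helper names (withField, renameKeys) and the field name product_name are illustrative assumptions, not part of any library or of the application described above.

```javascript
// Return a shallow copy of `data` with one extra field.
function withField(data, fieldName, value) {
  return { ...data, [fieldName]: value };
}

// Return a copy of `data` whose keys are renamed according to `mapping`.
// Keys that do not appear in `mapping` are kept unchanged.
function renameKeys(data, mapping) {
  const res = {};
  for (const [key, value] of Object.entries(data)) {
    const isMapped = Object.prototype.hasOwnProperty.call(mapping, key);
    res[isMapped ? mapping[key] : key] = value;
  }
  return res;
}

// Hypothetical application-side product and its database representation.
const productInApp = { name: "pencil", price: 2 };
const productInDb = renameKeys(productInApp, { name: "product_name" });

console.log(productInDb.product_name); // pencil
console.log("name" in productInDb);    // false
console.log(productInApp.name);        // pencil (the original is untouched)
```

The same two functions would work for users, customers or any other dictionary, which is the point: nothing in them mentions a product.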
Imagine, for instance, that before sending our data to the database, we want to add a timeStamp field with the current timestamp.
This is how it might look in JavaScript:
function addTimeStamp(data) {
  // Shallow copy, so the caller's dictionary is not modified
  var res = Object.assign({}, data);
  res.timeStamp = new Date();
  return res;
}
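Remark: Plain JavaScript objects have no built-in clone method. A shallow copy can be made with Object.assign or the spread syntax. For deep cloning, recent runtimes provide structuredClone, and several libraries provide their own implementation (see e.g. cloneDeep in lodash).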
addTimeStamp is a generic function: it works with any kind of data, be it users, products etc... It doesn't matter.
We can even generalize our addTimeStamp function by passing to it the name of the field for the timestamp (we might prefer created_at over timeStamp in some cases). The code is still quite trivial:
function addCustomTimeStamp(data, field_name) {
  // Shallow copy, so the caller's dictionary is not modified
  var res = Object.assign({}, data);
  res[field_name] = new Date();
  return res;
}
Imagine writing something like that in a standard Object Oriented language. It would typically involve advanced tricks like reflection, whereas in a data language it is a simple generic function.
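To make the "works with any kind of data" claim concrete, here is a small sketch that applies the same function to two unrelated dictionaries. The function is repeated so that the snippet runs on its own, and the user and product records are made-up examples.

```javascript
// Same idea as addCustomTimeStamp above, repeated so the snippet is self-contained.
function addCustomTimeStamp(data, fieldName) {
  const res = Object.assign({}, data); // shallow copy; `data` is left untouched
  res[fieldName] = new Date();
  return res;
}

// Hypothetical records of two unrelated kinds.
const user = { firstName: "Jane", email: "jane@example.com" };
const product = { name: "pencil", price: 2 };

const userForDb = addCustomTimeStamp(user, "created_at");
const productForDb = addCustomTimeStamp(product, "timeStamp");

console.log(Object.keys(userForDb));    // firstName, email, created_at
console.log(Object.keys(productForDb)); // name, price, timeStamp
console.log("created_at" in user);      // false
```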
Communication between a web frontend and a backend, or between HTTP services over REST, is string based. Usually, we don't pass objects over the wire.
In order to represent the information stored in an object as a string, one has to serialize the object. In order to serialize an object, one has to either:
Both are quite cumbersome.
Remark: libraries like Jackson for Java make it easier to serialize objects.
In the data realm, serialization comes for free and it works with any piece of data. For instance, JavaScript provides a JSON.stringify method:
var pencil = {
  name: "pencil",
  price: 2
};
var pencilStr = JSON.stringify(pencil);
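The receiving side can rebuild the dictionary with JSON.parse. The sketch below shows the round trip, plus one caveat worth keeping in mind: values that are not JSON types, such as a Date, come back as strings. The created_at field is only an illustration.

```javascript
const pencil = { name: "pencil", price: 2 };

const pencilStr = JSON.stringify(pencil);
console.log(pencilStr); // {"name":"pencil","price":2}

// The receiving side rebuilds an equivalent dictionary from the string.
const received = JSON.parse(pencilStr);
console.log(received.name);  // pencil
console.log(received.price); // 2

// Caveat: a Date is serialized as an ISO 8601 string and stays a string after parsing.
const stamped = { name: "pencil", created_at: new Date(0) };
const roundTripped = JSON.parse(JSON.stringify(stamped));
console.log(typeof roundTripped.created_at); // string
console.log(roundTripped.created_at);        // 1970-01-01T00:00:00.000Z
```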
What about testing?
Imagine you want to use Amazon EC2 API to create machine instances programmatically.
Let's take a look at a code sample in Java using the Java SDK for EC2:
RunInstancesRequest runRequest = RunInstancesRequest.builder()
    .imageId(amiId)
    .instanceType(InstanceType.T1_MICRO)
    .maxCount(1)
    .minCount(1)
    .build();
RunInstancesResponse response = ec2.runInstances(runRequest);
How can you write unit tests for this code?
How can you make sure that the various methods (imageId, instanceType, maxCount and minCount) are called with the correct arguments?
Usually, it involves mocking, and the code for the unit tests quickly becomes very complicated.
Let's compare it with a similar code sample in JavaScript using the JavaScript SDK for EC2:
var instanceParams = {
  ImageId: 'AMI_ID',
  InstanceType: 't2.micro',
  MinCount: 1,
  MaxCount: 1
};
var response = ec2.runInstances(instanceParams);
The big difference is that now we are in the data realm: instead of passing an object to runInstances, we pass a dictionary.
Writing a unit test that checks the dictionary has correct keys and values is trivial. It doesn't require any mocking and the code for it is quite simple.
There is still a lot to cover, and there are definitely advantages of Object Oriented programming (like type checking, refactoring tools...) that are difficult to achieve with Data Oriented programming. That might be the topic of a future article.
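We have illustrated three main advantages of the data oriented approach:
- Generic functions work on any piece of data, with no class needed per entity
- Serialization comes for free
- Unit tests need no mocking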
I hope that I was able to motivate you to take a step forward in your data enlightenment journey.
I wish you a happy Data Enlightenment journey!