Adam Gajek
15 Oct 2020
When I coded in Java many years ago, there was a rule to examine every method input to check that it was not null. We knew a null was lurking somewhere, but we did not know where, so the only way was to check everything.
In Scala, there is Option: a data structure that lets us model the absence of data more explicitly and with less boilerplate.
But as Uncle Ben (watch out, not Bob!) once said: with great power comes great responsibility.
In this article, I want to share the common issues with Option that I have run into, mostly over the last two years of working on a roughly 10-year-old system written in Scala. Along the way, I'll describe how I think we can protect ourselves from these problems in the future and be more responsible developers.
To many of you these problems may seem unrealistic and the examples trivial, but I think these situations are present to some degree, and common, even in younger projects.
One of the benefits of decomposing code into a set of functions is the ability to hide their implementation details. We should be able to know what behaviour is implemented inside a function without taking a look at its body every time we spot its invocation.
By creating a function that accepts an Option as a parameter, even one that handles None properly, we risk the situation shown below.
val person: Option[Person] = fetchById(id)
val personWithAccount = person.map(generateEmailAccount)
persist(personWithAccount, id)

def persist(person: Option[Person], id: Id): Option[Person] = {
  // None is silently interpreted as "this person does not exist yet"
  val p = person.getOrElse(createPerson(id))
  validate(p).map(insertIntoDb)
}
We have a value of type Option[Person] and pass it to persist; there, in the case of None, a new Person is created, validation takes place, and then the insert is executed.
Then, a few months later, someone notices that we have to add more logic. Let's say we want to create an e-mail account only for an adult person. So our code now looks like this:
val personWithAccount = person
  .filter(_.age >= 18)
  .map(generateEmailAccount)
That change opens the door to a bug: for a person who is not an adult, persist receives None and tries to create a new object representing that particular person, although that object already exists. We end up with duplicated data, because validate was not prepared for the brand-new meaning of None that has just been introduced into our domain. Hopefully, our DB is able to detect and reject that violation.
This example may be dumb or unrealistic, but its only purpose is to show a mechanism that occurs repeatedly in our code and is much more complicated in real-life situations.
The problem here was that we allowed Option to propagate through different abstraction layers, but we cannot control how None is interpreted in different places.
Initially, None was meant to indicate that some data is missing from our system, but then the same None, in that special context, indicates that a person is not an adult yet. So across the whole scope, one object, None, has two meanings. The first is that the Person does not exist; the second tells us that the object was found but its data does not match some criteria. How can we tell, at a given point, which interpretation applies? To be honest, I don't know.
The way to overcome that issue is either to transform Option as soon as possible into a domain-related data structure that explicitly encodes what getting None meant, or to use Either, which carries information about the nature of the problem (usually in Left). That way, we avoid situations where it is not clear what a None means.
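A minimal sketch of both options, reusing the person example. The names PersonLookup, Rejection, lookup and requireAdult are hypothetical and introduced only for illustration, and fetchById is a stand-in backed by an in-memory Map rather than a real repository.

```scala
object OptionAtTheBoundary {
  final case class Person(id: Int, name: String, age: Int)

  // Stand-in for the repository call (hypothetical sample data).
  private val db = Map(1 -> Person(1, "Ann", 30), 2 -> Person(2, "Tom", 12))
  def fetchById(id: Int): Option[Person] = db.get(id)

  // Variant 1: a domain-specific type that names each outcome explicitly.
  sealed trait PersonLookup
  final case class Found(person: Person) extends PersonLookup
  final case class Missing(id: Int) extends PersonLookup

  def lookup(id: Int): PersonLookup =
    fetchById(id).fold[PersonLookup](Missing(id))(Found(_))

  // Variant 2: Either, where Left says why there is no value.
  sealed trait Rejection
  final case class PersonNotFound(id: Int) extends Rejection
  final case class NotAnAdult(age: Int) extends Rejection

  def requireAdult(id: Int): Either[Rejection, Person] =
    fetchById(id)
      .toRight[Rejection](PersonNotFound(id))
      .flatMap(p => if (p.age >= 18) Right(p) else Left(NotAnAdult(p.age)))
}
```

With this shape, "the person does not exist" and "the person exists but is not an adult" can no longer be confused, because each has its own constructor.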
If you don't want to introduce new classes yet, the alternative is to limit that propagation by accepting only concrete objects in our functions.
personWithAccount.flatMap(persist(_, id))

def persist(person: Person, id: Id): Option[Person] = {
  validate(person)
    .map(insertIntoDb)
}
Initially, the source of Option was the get method of Map. Then it spread to repository methods. Those two use cases are natural and fine: something is missing, we get that feedback and decide how to deal with it. But the real outbreak of Option problems came with the libraries used for processing JSON documents.
All of these Scala libraries can automatically provide serialization and deserialization when we use a case class as the blueprint for deserialization or as the source of data passed to a serializer. That's very handy, but unfortunately it comes at a cost.
Recently at work, I worked on a piece of code responsible for holding the delivery of an order and then releasing it when it is ready to go. This feature was added years after the system went live, so the initial domain model did not contain any information about it. It was added, as you may suspect, as an Option, so as not to bother with migrations or anything else. Easy win.
In the business world, an order is either on hold or approved, go or no-go; even in the UI it was implemented as a toggle. The problem was that in our code we had three states: None, Some(hold) and Some(approved).
And again your suspicions are correct: None did not always mean “lack of value, because someone didn't bother to separate the technical concern of schema evolution and just put it directly into the domain object”. In one place it meant “if you approve an order whose value was None, do not send a notification about it”; in another, it was even forbidden to approve an order that had None as its current approval value.
As you can see, you cannot control how None will be interpreted. It will cross your boundaries, just like null or an exception, and condemn you to live in fear.
What would I advise my younger self in that situation? As mentioned before: take care of good old separation of concerns. If you don't want to write custom deserializers, create a separate class that acts as the blueprint of the data. Then transform every unnecessary Option into a value that has meaning in your domain. In this case, it would really have been enough to use the approved state as the initial value.
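Here is a rough sketch of that separation. OrderRecord, Order, Approval and toDomain are hypothetical names: OrderRecord plays the role of the class a JSON library would derive its codec from, and mapping a missing value to Approved is the assumption from the paragraph above, not a general rule.

```scala
object OrderBoundary {
  // Domain model: exactly the two states the business knows about.
  sealed trait Approval
  case object OnHold extends Approval
  case object Approved extends Approval

  final case class Order(id: Long, approval: Approval)

  // Blueprint for (de)serialization: the Option exists only for schema
  // evolution, because documents stored before the feature lack the field.
  final case class OrderRecord(id: Long, approval: Option[String])

  // The single place where None is interpreted.
  def toDomain(record: OrderRecord): Either[String, Order] =
    record.approval match {
      case None             => Right(Order(record.id, Approved)) // assumed default for old data
      case Some("approved") => Right(Order(record.id, Approved))
      case Some("hold")     => Right(Order(record.id, OnHold))
      case Some(other)      => Left(s"Unknown approval value: $other")
    }
}
```

Everything past toDomain works with Approval only, so no other part of the code gets the chance to invent its own reading of None.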
Another example of the great responsibility resting on our shoulders when dealing with Option: never allow code like the snippet below.
case class PersonUpdated(
  name: Option[String],
  email: Option[String],
  age: Option[Int]
)
What’s wrong with that?
On the plus side, users sometimes do not want to provide all their data, so we have to work with whatever they send, and representing the absent fields as None seems a good way of capturing data from the clients of our code.
But the problem I see here is similar to the one created by a well-known DDD antipattern, the anaemic domain model. We end up as mere data pipes that transfer user input to the database. The practical consequence is that you are often not sure what action to take when receiving such an object; the only thing most people do is update the Person object with the fields they received and leave the rest untouched.
The funny thing is that I've also seen an approach where None in such a case meant: wipe the data for the fields you received as empty. An implicit, out-of-control meaning strikes again.
In reality, we have a very useful code smell here, which says: you have divided the responsibilities between client and server poorly, so refine them!
Let's say our client is a frontend app. The frontend's responsibility is to capture user actions, together with the data users provide, and communicate them to the backend. In that scenario, the frontend can distil what was actually updated, and instead of pushing one big (though very convenient) object in which anything may or may not be present, the update can be broken down into
case class UserAgeUpdated(id: ID, age: Int)
case class UserEmailUpdated(id: ID, email: String)
Of course, we could distil that information on our side as well, e.g. like this
email.map(email => UserEmailUpdated(id, email))
But that way we are only working around the real problem, which will persist and rot your system. What's more, we are adding another implicit layer of assumptions about what the author had in mind (maybe, in fact, this email was deleted).
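For contrast, here is a small sketch of what explicit update messages could look like on the receiving side. The sealed trait UserUpdate, the extra UserEmailRemoved message, the Email type and applyUpdate are all hypothetical additions for illustration; the point is only that deleting an email becomes its own message instead of a None someone has to guess at.

```scala
object UserUpdates {
  type ID = Long

  // The email state is spelled out rather than hidden in an Option.
  sealed trait Email
  case object NoEmail extends Email
  final case class Address(value: String) extends Email

  final case class Person(id: ID, age: Int, email: Email)

  // One message per user intent; nothing inside a message is optional.
  sealed trait UserUpdate { def id: ID }
  final case class UserAgeUpdated(id: ID, age: Int) extends UserUpdate
  final case class UserEmailUpdated(id: ID, email: String) extends UserUpdate
  final case class UserEmailRemoved(id: ID) extends UserUpdate // hypothetical extra message

  def applyUpdate(person: Person, update: UserUpdate): Person =
    update match {
      case UserAgeUpdated(_, age)     => person.copy(age = age)
      case UserEmailUpdated(_, email) => person.copy(email = Address(email))
      case UserEmailRemoved(_)        => person.copy(email = NoEmail)
    }
}
```

Because UserUpdate is sealed, the compiler can warn when a new kind of update is added and a handler forgets to cover it.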
One of the strengths of FP is that an object is correct, initialized and complete at every stage of its lifetime; this is one of the components that add up to what we call local reasoning. What we have is done, and we don't need to worry about any additional work that still has to be done.
When we use mutable data structures, we allow ourselves and our colleagues to create partial objects that are initialized in multiple steps and filled with data during code execution.
Fortunately, in Scala people try to do FP, so by default we assume that we are safe. Unfortunately, as you might expect, the truth is not always aligned with our beliefs, again because of improper use of Option.
I've seen a lot of case classes where some values were Options only because, at initialization time, we didn't have all the data required to fill them in completely, so some fields were left as None and then at some later stage initialized with proper data (or not). In that scenario, I'd tell my younger self: when you cannot initialize your object in one go, consider splitting it into two separate entities.
Additionally, do you feel the temptation to exploit that fact and bind additional meaning to that field? A good candidate would be to check, say, deliveryTime and, when it is None, assume in our code that the Order has not been delivered yet.
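One possible shape for such a split, sketched with hypothetical names (PendingOrder, DeliveredOrder, deliver): the delivery time lives only on the type that is guaranteed to have it, so nobody needs to read meaning into a None.

```scala
import java.time.Instant

object OrderLifecycle {
  // Every instance is complete for its stage; no field waits to be filled in later.
  final case class PendingOrder(id: Long, items: List[String])
  final case class DeliveredOrder(id: Long, items: List[String], deliveryTime: Instant)

  // The transition is an explicit function rather than a late assignment
  // to an Option field.
  def deliver(order: PendingOrder, at: Instant): DeliveredOrder =
    DeliveredOrder(order.id, order.items, at)
}
```

Whether an order has been delivered is then a question about its type, not about whether a field happens to be empty.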
From all of the above we see that, in general, there is one root cause of all these problems with Option: attaching implicit meaning to whether the data in it exists. Making assumptions based on that is the same as assigning meaning to the size of a collection we received from somewhere.
Those observations lead to several actions to take when dealing with Option, and the most important one is to always transform an Option you receive from somewhere, as soon as possible, into a data structure that is meaningful in the domain. We should treat Option as just another primitive in our programming-language toolkit: use it at some level of abstraction, but never let it crop up at the domain level, where everything can matter.
Sooner or later someone on your team, your successors, or even an older you will start to attach domain meaning to the lack of a value, and this meaning will be implicit and forgotten within a few months.