Refactor Away The IF - Code Monkey Projectiles

If you have ever worked with me for any period of time you have definitely heard me say “every time you use and IF statement a kitten dies”. Although it is probably not possible to remove all if statements, it is possible to minimize them. Overuse of IF statements is one of the faster ways to make code harder to read, test, and support.

Dead Cat

So why are IF statements so bad?

IF statements are like GOTO statements. The force the reader to jump around in the code. There is usually an immediate violation of Single Responsibility Principle when an IF statement is introduced. The function is now doing two things: the code on the IF branch, and the code on the ELSE branch. When reading the function, the function must be read twice to follow the logic of the two cases. Instead of a single algorithm to analyze and support, now there are two. It doubles the testing effort that is required. And nesting IF statements increases this complexity exponentially.

IF statements are also a gateway to introducing duplication in the code. When IF statements are used it is important that the same code does not appear in the IF and ELSE branches. It may start out small. Maybe there is only an assignment statement that is duplicate. Or perhaps the same function is called with slightly different parameters. But once you get into the habit of using IF statements to solve these problems, then the IF statement becomes a golden hammer. Soon larger and larger blocks of code can be found behind the IF statements. The more code in these IF statements the more likely there is to be duplication.

Let’s jump into some code examples.


  //v1
public Employee FindOrCreateEmployee(Guid? id)
{
    Employee employee;
    if (id == null)
    {
        employee = new Employee();
    }
    else
    {
        employee = _db.GetEmployee(id.Value);
    }
    return employee;
}

It is a simple little function. It isn’t that difficult to figure out what it is doing. The function does one of two things: creates a new Employee class or looks it up from the database. It is not a bad function, but it could be better. But let’s not worry about that now. Let’s add some logging.

//v2
public Employee FindOrCreateEmployee(Guid? id)
{
    Employee employee;
    if (id == null)
    {
        _log.Information("Creating a new Employee object");
        employee = new Employee();
    }
    else
    {
        _log.Information("Querying employee from the database");
        employee = _db.GetEmployee(id.Value);
    }
    return employee;
}

OK, that’s cool. But what if we don’t find that ID in the database? Let’s change it so that it will automatically create an Employee object in this case.

//v3
public Employee FindOrCreateEmployee(Guid? id)
{
    Employee employee;
    if (id == null)
    {
        _log.Information("Creating a new Employee object");
        employee = new Employee();
    }
    else
    {
        _log.Information("Querying employee from the database");
        employee = _db.GetEmployee(id.Value);
        if (employee == null)
        {
            employee = new Employee();
        }
    }
    return employee;
}

Oh, we forgot that when we create a new Employee we have to assign it a new Guid.

//v4
public Employee FindOrCreateEmployee(Guid? id)
{
    Employee employee;
    if (id == null)
    {
        _log.Information("Creating a new Employee object");
        employee = new Employee { Id = Guid.NewGuid() };
    }
    else
    {
        _log.Information("Querying employee from the database");
        employee = _db.GetEmployee(id.Value);
        if (employee == null)
        {
            employee = new Employee { Id = Guid.NewGuid() };
        }
    }
    return employee;
}

Hmm, version 4 of this function isn’t quite as nice as version 1. In version 4, there is some obvious duplication in the code. Also, there are some inconsistencies (i.e. bugs) in that logging isn’t always done when a new Employee is created.

So where did we go wrong? In which version should we have stopped and said, “let’s clean this up a bit before continuing”. It should have been done in version 1. Here is a better version of that first implementation:

//v1b
public Employee FindOrCreateEmployee(Guid? id)
{
    var employee = (id == null)
        ? new Employee()
        : _db.GetEmployee(id.Value);
    return employee;
}

Don’t roll your eyes at me! I can hear you say, “oh come on, you just replaced the IF with a ternary operator. The ternary operator is the same as an IF.”.

Well, no, the ternary operator is not the same as an IF. The ternary operator is much more limiting than an IF, and that’s a good thing. When using IF statements be sure to get in and out of them as quickly as possible. The ternary operator is good at restricting this because it limits the IF and ELSE branch to a single line of code. When you start adding IF, ELSE, and especially curly braces there is no telling how gnarly the branching will get.

Another great thing about the ternary operator is that helps keep the code expressive about what it needs to do. It does this because the ternary operator requires both code paths to return the same type of object (an Employee in this case). When you are writing IF statements with curly braces there is no telling what the code may do. It might assign to the ‘employee’ variable; it might spawn a bitcoin mining process. With IF statements there is no implication that the code branches should be working toward a similar result.

There already was a tiny bit of duplication in version 1 of the function. Both the if and the else branches had the same assignment statement. In version 1b with the ternary operator there is only one assignment statement. We could even get rid of the assignment statement altogether and return the value immediate or convert the function to a lambda expression.

//v1c
public Employee FindOrCreateEmployee(Guid? id) =>   
    (id == null)
        ? new Employee()
        : _db.GetEmployee(id.Value);

So now let’s replay that requirement to add logging. Surely now we need to undo that ternary operator and add some IF, ELSE, & curly braces. Well, no (and don’t call me Shirley).

The ternary operator is going to make it more difficult to add logging directly to this function. However, that is a good thing. The ternary operator is doing its job. It is trying to keep us on the path of keeping IF branches small (one line) and keeping the code expressive.

To keep the ternary operator and add logging, the we realize that there might be more requirements around creating a new Employee that simply calling “new Employee()”. It is best to put that task in a function instead of embedding it in some curly braces below an IF. That function can have a name to better express the role it is playing in the code. Structure in the code emerges. Yes, it’s a small function, but that is good (see my previous post on Small Functions).

So, before we add the logging we can refactor thusly:

//v1d
public Employee FindOrCreateEmployee(Guid? id) =>
    (id == null)
        ? CreateEmployee()
        : FindEmployee(id.Value);

private Employee CreateEmployee() => new Employee();

private Employee FindEmployee(Guid id) => _db.GetEmployee(id);

Now that we have the functions in place, we have a good spot to add the logging.

//v2b    
public Employee FindOrCreateEmployee(Guid? id) =>  
    (id == null)
        ? CreateEmployee()
        : FindEmployee(id.Value);

private Employee CreateEmployee()
{
    _log.Information("Creating a new Employee object");
    return new Employee();
}

private Employee FindEmployee(Guid id)
{
    _log.Information("Querying employee from the database");
    return _db.GetEmployee(id);
}

The third requirement of creating a new employee if the ID is not found in the database is also an easy change. The FindOrCreateEmployee method will need to change.

//v3b
public Employee FindOrCreateEmployee(Guid? id) =>
    (id == null)
        ? CreateEmployee()
        : FindEmployee(id.Value) ?? CreateEmployee();

After making this change in version 3b it is extremely obvious that we just added some duplication to the code. In version 3 with the larger IF statement we added some duplication too, but it was less obvious there due to the size of the function. Small functions don’t leave places where bad code can hide.

To fix this we overload the FindEmployee function. We can apply the Postel’s law and be liberal with the inputs that FindEmployee can take. We overload it so that it can deal with Nullable as well as Guid.

//v3c
public Employee FindOrCreateEmployee(Guid? id) =>
    FindEmployee(id)
    ?? CreateEmployee();

private Employee FindEmployee(Guid? id) => id == null ? null : FindEmployee(id.Value);

private Employee FindEmployee(Guid id)
{
    _log.Information("Querying employee from the database");
    return _db.GetEmployee(id);
}

private Employee CreateEmployee()
{
    _log.Information("Creating a new Employee object");
    return new Employee();
}

In this refactoring we didn’t need to change the FindEmployee or CreateEmployee methods. I.e. we are staying true to the Open Closed Principle. You can’t break functions if you don’t touch them.

The duplication is gone from the FindOrCreateEmployee method. Furthermore, that high-level function is extremely readable. It is closer to English than C#.

Another key point in this small refactoring is that the ternary operator has moved down a function. It was in the FindOrCreateEmployee function and now it moved to the FindEmployee(Guid?) function. Moving ternary operators (or IF and Switch statements) down to lower-level functions is particularly important. This is what enables the higher-level functions to read like English.

IF statements are technical details that should be buried in low-level functions. This conforms to the “Tell, Don’t Ask” rule. The high-level function tell the FindEmployee function to find an employee. The high-level function doesn’t worry about the low-level details regarding whether or not that low-level function can handle nulls or if querying a database for a null primary key is a good optimization. Implementing knowledge at high-levels about what is safe or optimal in low-levels needlessly couples the high and low-level code together. The low-level functions worry about what they can do.

The fourth requirement to ensure that ID is initialized when a new Employee object is created is an easy change. It is painfully obvious for the maintainer of the code to know where the change needs to be made. And that is the ultimate goal, making the code dead simple to maintain. It is the heart of the KISS principle. In version 1d we have created a spot for this change and saw how that strongly discouraged any duplication in the code in versions 3b/c.

Here is the final version with the change to the CreateEmployee method:

//v4b
public Employee FindOrCreateEmployee(Guid? id) => 
    FindEmployee(id)
    ?? CreateEmployee();

private Employee FindEmployee(Guid? id) => id == null ? null : FindEmployee(id.Value);

private Employee FindEmployee(Guid id)
{
    _log.Information("Querying employee from the database");
    return _db.GetEmployee(id);
}

private Employee CreateEmployee()
{
    _log.Information("Creating a new Employee object");
    return new Employee { Id = Guid.NewGuid() };
}

By refactoring the code in version one to remove the IF statement, there never existed a spot in the code for it ever to get bad. My main reason for avoiding IF statements is that by removing them code gains structure and the design improves. Overuse of IF statements easily leads to ball-of-mud code.

The Ternary operator is just one technique to reduce the number of IF statements in your code. In my next post I’ll explore some more.