Continuing my chapter-by-chapter review of A Philosophy of Software Design by John Ousterhout, today I’m going to review Chapter 4.

Chapter 4: Modules Should Be Deep

Modular programming is a pretty great thing. When you finally start to understand, appreciate, and leverage the two pillars of Encapsulation and Abstraction, you become a much better programmer. The problem with the idea of “modular programming”, like so many other concepts in programming, is defining what, exactly, the words mean. What exactly is a “module”? It’s the same question that arises when we talk about unit testing and are left having to define what a “unit” is. Good luck with that.

Whatever exactly a module is, John explains that it has two basic parts:

We think of each module in two parts: an interface and an implementation. …the interface describes what the module does but not how it does it. The implementation consists of the code that carries out the promises made by the interface.

These two parts map loosely (though they are not exact analogues) onto the abstraction and the encapsulation: the outside and the inside; what you say and what you do. It’s worth noting here, especially for the C# coders among us, that “interface” here is not the same as a C# interface; it instead refers to the public surface of the module, which users of that module interact with:

  • For a subsystem, the interface is the public methods on the Service Layer, Facade, Gateway, and other exposed classes which other subsystems use
  • For a class, the interface is the set of public (and, depending on the design, internal and protected) methods and properties (a quick sketch follows this list)
  • For a method, the interface is the method signature.
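
To make that distinction concrete, here is a minimal, hypothetical C# sketch (the class and its members are invented for illustration, not taken from the book). The public members form the interface; the private list is the encapsulated implementation:

using System.Collections.Generic;
using System.Linq;

public class TemperatureLog
{
    // Implementation detail: callers never see how readings are stored.
    private readonly List<double> _readings = new List<double>();

    // The interface: the public surface that callers interact with.
    public void Record(double celsius) => _readings.Add(celsius);

    public double Average() => _readings.Average();
}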

John also goes on to talk about the “formal” and “informal” (explicit and implicit) parts of an interface, and how having more formal and fewer informal bits in your interface is desirable. Then he talks about the “important” and “unimportant” details in an abstraction, and arrives at the conclusion that interfaces should be as small as possible while still including all the important details. All of this is insightful, well-worded, and uncontroversial. For more of the details I’ll refer the reader to pick up a copy of the book.

Deep Modules

Now that we’ve talked about what an interface is supposed to be (small, with few informal requirements and all the important details), we move on to what the implementation should look like.

The best modules are those that provide powerful functionality yet have simple interfaces.

In other words:

The best modules are deep: they have a lot of functionality hidden behind a simple interface. A deep module is a good abstraction because only a small fraction of its internal complexity is visible to its users.

My concern is the assertion that a good abstraction contains lots of details and hides most of them from the user. I think that the best abstractions are the ones where you don’t care what is behind them. It shouldn’t matter whether the class is deep or shallow, whether it has lots of details or almost none, whether it is brainy and brawny or fluttering in the wind. Consider our old classroom nemesis, the Fibonacci sequence. When I call a method to get a Fibonacci number, it doesn’t matter to me (assuming it doesn’t completely tank my performance) whether it’s making a recursive call, using memoization, or even doing a simple table lookup. Consider this implementation:

private readonly IReadOnlyList<int> _precalculatedValues = new[] { 1, 1, 2, 3, 5, 8, 13, 21, 34, 55 };

public int Fibonacci(int n)
{
    AssertIsInRange(n);
    return _precalculatedValues[n];
}

// Guard: callers may only ask for values we have precalculated.
private void AssertIsInRange(int n)
{
    if (n < 0 || n >= _precalculatedValues.Count)
        throw new ArgumentOutOfRangeException(nameof(n));
}

It’s a great abstraction because it appears to the caller that it is doing work, but the caller doesn’t know, and doesn’t need to know, that it’s actually doing almost nothing. We could just expose the array and have every caller access it directly, but that leaks the information about how the data is stored, which is even worse for encapsulation. So, instead of a good abstraction being one where “lots of details are effectively hidden”, I say a good abstraction is one where “it doesn’t matter what the details are, or whether there are any details at all”.

Now this does come into some conflict with his main thesis about complexity: that the existence of a class adds to complexity, and that the class needs to do enough work to justify its existence. I would suggest that instead of doing “work” to justify itself, the class should “provide value”. The amount of value a class provides is independent of the amount of work it does. If I have a class that implements a large, complicated algorithm, we say that it is deep, does a lot of work, and is a net improvement in system complexity. But if we have a serious “Eureka!” breakthrough and simplify the implementation of our algorithm significantly, the value hasn’t decreased, but the apparent amount of work and the depth of the class have.

Where this starts to become a problem, in my mind, is that the suggestion that deeper modules are better will lead to merging properly abstracted modules together to artificially increase depth. A good class can do a lot of work with a small amount of streamlined code, and the suggestion that we need to jam more code into an elegantly streamlined class to make it “better” is not a good one. It’s not something he says outright, but he does seem to imply it without caveat.

At the class level, several patterns can be, and often should be, thin: Facade, Bridge, Adapter and Decorator (which he does talk about later and I will cover separately). Factory can also often be thin if you are using the factory to abstract away the choice of implementation (subclass) instead of abstracting away complicated initialization logic. The purpose of a good Facade is to organize and simplify, and to insulate the caller from the changing implementation details of the underlying subsystem. If this class is shallow and easy to follow, all the better.
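
As an illustration, here is a hypothetical Adapter sketch in C# (all of these types are invented for the example). The Adapter is about as shallow as a class can be, one real line of code, yet it provides genuine value by insulating every caller from a third-party API that may change:

// Stand-in for an imagined third-party logger with its own API shape.
public class VendorLogger
{
    public void WriteEntry(string text) => System.Console.WriteLine(text);
}

// The abstraction our application codes against.
public interface IAppLogger
{
    void Log(string message);
}

// A deliberately shallow Adapter: trivial to follow, still valuable.
public class VendorLoggerAdapter : IAppLogger
{
    private readonly VendorLogger _inner;

    public VendorLoggerAdapter(VendorLogger inner) => _inner = inner;

    public void Log(string message) => _inner.WriteEntry(message);
}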

He mentions a Linked List as an example of a class where the interface has a similar size to the implementation, and describes this in negative terms. It’s hard to imagine something as powerful, effective, and generally usable as a Linked List being a negative thing. He does say this:

Shallow classes are sometimes unavoidable, but they don’t provide much help in managing complexity.

I strongly disagree. One thing I’ve said before, and will say often again, is that we need to separate classes which should contain logic from classes which should not. An MVC Controller class, for example, or a Service Layer class, often needs to serve the purpose of gathering dependencies and orchestrating them, but the actual logic should be pushed down into pure (and therefore testable!) helper classes. The Controller and the Service, by purposefully being as thin as possible, produce benefits throughout the entire code base:

  1. Large numbers of dependencies can be aggregated in a single place, and their interactions can be effectively and simply orchestrated in a few limited places
  2. We can separate classes which are difficult to test (those with many dependencies) from classes which contain the most testable logic (“deep” pure-logic classes)
  3. We can separate pure logic (testable) from external dependencies (DBs, web services, and so on, which are hard to test)

In other words, having shallow and logic-less services helps manage complexity by separating which types of complexity go where. Complex dependency graphs belong in the service, while complex pure logic belongs in helper classes. That way we’re not dealing with both problems in both places.
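
Here is a hypothetical C# sketch of that division (the domain and all the names are invented for illustration). The service is deliberately shallow: it gathers dependencies and orchestrates them, while the pure helper holds the actual logic and can be unit tested with no mocks at all:

using System;
using System.Collections.Generic;
using System.Linq;

// Invented domain types for this sketch.
public record Invoice(int Id, DateTime DueDate);

public interface IInvoiceRepository { IEnumerable<Invoice> GetUnpaid(); }
public interface IEmailSender { void SendOverdueNotice(Invoice invoice); }

// Shallow orchestrator: a complex dependency graph, but no branching logic.
public class InvoiceService
{
    private readonly IInvoiceRepository _repository;
    private readonly IEmailSender _emailSender;

    public InvoiceService(IInvoiceRepository repository, IEmailSender emailSender)
    {
        _repository = repository;
        _emailSender = emailSender;
    }

    public void SendOverdueNotices(DateTime today)
    {
        var unpaid = _repository.GetUnpaid();
        var overdue = InvoiceRules.SelectOverdue(unpaid, today); // pure logic
        foreach (var invoice in overdue)
            _emailSender.SendOverdueNotice(invoice);
    }
}

// Pure logic: no external dependencies, so it is trivially testable.
public static class InvoiceRules
{
    public static IEnumerable<Invoice> SelectOverdue(
        IEnumerable<Invoice> invoices, DateTime today) =>
        invoices.Where(i => i.DueDate < today);
}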

It’s also worth mentioning that a class can be deep in concept even if it’s not deep in actual lines of code. A deep and complex thought which turns into a relatively small amount of code seems like exactly what we are striving for. A lot of what people think of as “Machine Learning” turns out to be relatively simple linear algebra applied to big data sets. The implementations of some of these algorithms, especially if you have a good linear algebra package at your disposal, might be deceptively simple compared to the value that they provide.
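
As a toy example of that idea, here is a hypothetical C# sketch of fitting a line y ≈ a*x + b by gradient descent (the class and method are invented for illustration). “Learning” parameters from data sounds deep, and conceptually it is, but the code is a handful of lines of straight-line arithmetic:

using System;

public static class LineFitter
{
    // Minimizes mean squared error of y ≈ a*x + b by gradient descent.
    public static (double a, double b) Fit(
        double[] x, double[] y, double learningRate = 0.01, int iterations = 10000)
    {
        double a = 0, b = 0;
        int n = x.Length;
        for (int iter = 0; iter < iterations; iter++)
        {
            double gradA = 0, gradB = 0;
            for (int i = 0; i < n; i++)
            {
                double error = (a * x[i] + b) - y[i]; // prediction minus truth
                gradA += error * x[i];
                gradB += error;
            }
            // Step both parameters against the gradient of the error.
            a -= learningRate * gradA / n;
            b -= learningRate * gradB / n;
        }
        return (a, b);
    }
}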

Classitis

The conventional wisdom in programming is that classes should be small, not deep.

That might be the conventional wisdom, but that’s hardly the rule I follow. Instead, the way I think about it and the rules that I suggest others follow are:

  1. A method is a single thought. It should fit completely on a single screen so that it can fit entirely in your mind at once.
  2. Methods and classes should follow the Single Responsibility Principle: A class should represent a single concept and a method should perform a single operation within that concept.

Besides the rule about fitting on a single screen (and, unless you’re developing in an 80-column terminal, modern screens are quite large and high-res), there’s not much in there about size. Size doesn’t matter much so long as you can reason about the method as a single unit and the method has one responsibility. After all, the most important use of a method is to give a block of code a descriptive name, and descriptive names are quite valuable regardless of how much code they label.
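
As a small, hypothetical illustration of naming being the point (Order and Customer are invented types): the extracted method below is one line long, yet its descriptive name carries the entire thought, which matters far more than its size:

public record Customer(int YearsActive);
public record Order(decimal Total, Customer Customer);

public static class Discounts
{
    // Before: the intent was buried in a boolean expression at the call site:
    //     if (order.Total > 1000 && order.Customer.YearsActive >= 5) { ... }
    // After: a one-line method whose name states the single thought.
    public static bool QualifiesForLoyaltyDiscount(Order order) =>
        order.Total > 1000 && order.Customer.YearsActive >= 5;
}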

Conclusion

I like where he is going with the division between interface and implementation, and the idea that deep classes can be a benefit. I don’t like the prescription that classes should be deep. Instead, I prefer a more nuanced approach: deep logic should live in pure methods and classes which are easy to test completely. Other classes, which orchestrate the deep logic, absolutely should be shallow and straightforward (with a cyclomatic complexity of 1 and a cognitive complexity of 0, ideally). If you do things this way, your high-value, deep, pure logic is easy to test to the hilt, and everything else doesn’t need to be. This strategy, I’ve found, pays dividends in the long run.