Whiteknight: Data Access Patterns

This is a continuation of my previous posts about the data layer and ORMs

There are a number of data access patterns out there, and if you want a good overview of how they work and how they can be set up, you could do much worse than Martin Fowler’s Patterns of Enterprise Application Architecture. I’ll cover some of these here and talk about why I do or do not like them, followed by some of my thoughts on the matter.

Active Record

Popularized by Ruby on Rails and other similar frameworks, the Active Record pattern has never been my favorite. The Active Record object represents an entity in the database and “knows” how to interact with the database for basic CRUD operations. In essence, Active Record intends to be a combination of Domain Model and Data Model (or, at least, the Data Model and the necessary mappings are a responsibility of the Domain Model itself) and all the necessary persistance logic. Here’s an example of Active Record in action:

var product1 = new Product();
/* Make changes to the product properties here */
product1.Save();

var product2 = Product.Load(id);
/* Make changes to the product properties here */
product2.Save();

Active Record has the benefit of being easy and intuitive for the application developer. To do what it does, the Active Record object typically relies on global state, such as a global data layer implementation:

public class Product : ActiveRecord {
    public void Save() {
        var dataModel = MapToDataRecord(this);
        DataLayer.SaveToDatabase(GetTableName(), dataModel);
    }

    public static Product Load(int id) {
        var dataModel = DataLayer.Load(GetTableName(), id, GetMetadata());
        return MapToDomainModel(dataModel);
    }
}

There are a few obvious problems with this pattern:

The reliance on global state instead of Dependency Injection makes it much more difficult to interact with multiple databases, or even to use this pattern in a Unit Test context.
Violation of the Single Responsibility Principle. The Product object has the responsibily to be the Domain Object AND the Data Model, and to handle mapping and other persistance details.

It probably is possible to setup a custom implementation of the data layer for the purposes of a unit test, but you still have a single global instance. Now you either can’t run unit tests concurrently, or you have to make special accomodations that each test doesn’t interfere with the operations of any other tests. Any headaches in this area often lead to having fewer unit tests and less test coverage of necessary code, which leads to compound headaches down the road. Unit testing should be as fast and easy as possible so there’s never an excuse not to do it.

Repository

Martin Fowler describes the Repository pattern:

A *Repository* mediates between the domain and data mapping layers, acting like an in-memory object collection.

So many Repository implementations that I’ve seen (and so many that I’ve previously written) fall short of this ambition, and instead try to implement the following kind of abstraction:

public interface IRepository<TData> {
    TData Load(int id);
    int Insert(TData data);
    void Update(int id, TData data);
    void Delete(int id);
}

As I’ve mentioned before, the IDbSet<T> in Entity framework is a repository implementation so adding an IRepository<T> abstraction over top of that is unnecessary. Where Repository falls short for me, is when you want to have pre-canned queries, or be able to give a descriptive name to queries you use repeatedly. Where do you put that? IDbSet<T> is closed so you’d need to either wrap it in something or you’ll need extension methods to give your repeated queries a descriptive name. (And if you have any hesitancy at all about EF as a long-term data access strategy, you’re still going to want to encapsulate the dependency and hide the whole thing behind an abstraction anyway) What you’re going to end up doing is wrapping your repository in some other layer to give you descriptive names and pre-canned queries, like a “data-layer Service” or a “data gateway” of some sort.

IDbSet<T> is an EntityFramework idea and it exposes some very low-level database concepts. Using it throughout your code creates spaghetti and headaches, so you do need to have it behind some kind of abstraction (unless your program is trivially small, in which case it doesn’t matter what you do).

Query and Command Object

Martin Fowler describes the Query Object as

An object that represents a database query.

I also like the idea of a Command Object (which Fowler doesn’t describe) as a similar concept but a Command Object would represent a database modification (insert, update, delete) which does not return results.

I like the query object and command object patterns a lot for a few reasons:

The class names can be descriptive
The classes themselves are completely reusable anywhere, where methods on a larger data access class are only reusable where the data access class is available
It is easy to parameterize by using properties on query/command object or (if you prefer immutability and dependency injection as I do) through constructor parameters

The Query and Command objects typically need some kind of Runner or Executor class which opens the connection to the database and executes the Query/Command object with all the necessary connection context information. Because the runner typically does not need to contain any immutable state, you can probably share a single instance easily.

var runner = new SqlRunner(connectionString);
var result = runner.Query(new MyQueryObject(params));

Keep in mind that you could have a Factory or Builder to setup a complicated Query object, or even an Abstract Factory to get instances of all the Query Objects for a given subdomain if you need pluggability or to select the specific type of queries to run based on context information.

Internally, your Query or Command objects could be implemented in a number of different ways, including calls to an ORM or Micro-ORM.

The Command and Query Object patterns have really become some of my favorite techniques because they allow you to encapsulate a single database interaction as a single class with a descriptive name and all the parameterization baked in.

Row Data Gateway

Probably one of the patterns I’ve used the least, I’ve never found a compelling use-case where Row Data Gateway is the best option. It acts, in my limited experience, like an Active Record object, with the dependency injection concerns ironed out. The Row Data Gateway isn’t the object itself, but it provides methods to access the persisted data of the object. The problem I have, and this is why I haven’t employed this pattern more, is that I can’t usually justify creating an entire data access object for the purpose of loading out a single domain object.

If your domain object is very complicated and represents an object hierarchy spanning multiple tables (think a Product or an Order for ecommerce situations), a Row Data Gateway could get you all the details from that entity. But then again it seems a bit misleading to call it a “Row Data Gateway” when you’re accessing more than just one row and accessing data from multiple tables. “Object Data Gateway” or “Aggregate Root Data Gateway” seem to be like better ideas, though the names don’t flow off the tongue quite as easily.

Table Data Gateway

Table Data Gateway is a bit more usable in my mind than the Row Data Gateway because it can be used to access one or many objects. Create a single Table Data Gateway and get data from one row or multiple rows as needed, using a single code path to get what you need.

Where I feel that Row Data Gateway and Table Data Gateway fall short is that they force program structure to mirror database structure, when the impediance mismatch between the two usually leads to one or the other being sub-optimal. I much prefer having a single access point over a single logical data store, which may map in a more flexible way to the physical storage of the database.

Data Gateway

I don’t think I’ve seen this pattern written about anywhere, so I’m happy to call it my own. The Data Gateway is a generalization of the underpinnings of the Row Data Gateway and Table Data Gateway patterns (and is similar to but less onerously named than the “Aggregate Root Data Gateway” I mentioned above). You have a single class which provides access to a single logical data store, without concern for the physical layout of the data.

For example, let’s say my application contains lots of data for my ecommerce application: users, products, suppliers, orders, payments, etc.

Products, as an example, are stored across multiple tables: a Product table, tables for attributes and metadata, tables for pricing and specials, tables with image information. Accessing product information is often done separately from other concerns in the application, so we would have a single ProductDataGateway class to be the arbiter of that. Every access to products, be it inserting new products, updating old products, or any of a number of possible queries, would go through the ProductDataGateway. Keep in mind also that it doesn’t matter to the application whether Product data is stored together with other data in a single monolithic database, or if it was split out into its own database, or if it is spread out across multiple small databases (one for unchanging attribute data and one for frequently-changing pricing and shipping information, for example). The application could also easily have OrderDataGateway SupplierDataGateway and UserDataGateway, to name a few. Each DataGateway would be the official source of persistance for a single area of concern.

(As an aside, it’s no accident that this pattern seems to line up nicely with the idea of a bounded subdomain or even of microservice decomposition)

Internally, the Data Gateway provides descriptive method names to do its work, which are implemented by any of a number of patterns: A Repository Pattern could be used to implement a method on the Data Gateway, and the Repository benefits from having repeated, parameterizable queries being given descriptive names as methods on the Data Gateway. We could also implement the internals as a Row or Table Data Gateway as required or even a Query Object

Currently this Data Gateway pattern, which calls Command and Query objects internally, is my favorite data access technique for almost all use cases.

In my next post I’ll talk about CastIron, a new library I’ve been developing which is an implementation of Command and Query object patterns, and which lends itself nicely to being called from a simple Data Gateway.

16 Feb 2022	ParserObjects 4.0 Development projects
13 Feb 2022	Welcome to 2022 Personal
17 Dec 2021	Good Programmers Manage Expectations Philosophy

Programming, Software and Code

About

Links

Data Access Patterns