Monday 20 April 2015

When is a repository not a repository?

I think the answer to this question is “most of the time”; at least, that’s the case judging by a lot of the code I’ve seen lately.

For example, during a recent recruitment campaign we’ve been setting candidates a fairly open technical test that required them to write some code to solve a problem. A number of candidates chose to expose data through something called a ‘repository’, but in almost every case the so-called repository didn’t look like a repository to me at all.

Does the following look like an interface for a repository to you?

public interface ISeatRepository
{
    IEnumerable<Seat> GetAllSeats();

    IEnumerable<Seat> GetSeatsForVenue(Venue venue);

    IEnumerable<Seat> GetSeatsForCustomer(Customer customer);
}

This is typical of the kind of thing I have been seeing.

My concern is that the term ‘repository’ is being applied to anything that exposes data. At best this is misleading, and in most cases it demonstrates a lack of understanding of the original pattern, laziness in not choosing a more appropriate architecture, or both.

Defining the Repository Pattern

Martin Fowler’s web site describes a repository in the following way:

“A Repository mediates between the domain and data mapping layers, acting like an in-memory domain object collection. Client objects construct query specifications declaratively and submit them to Repository for satisfaction. Objects can be added to and removed from the Repository, as they can from a simple collection of objects, and the mapping code encapsulated by the Repository will carry out the appropriate operations behind the scenes.” [1]

I like this definition. The takeaway is that the repository “mediates between the domain and data mapping layers using a collection-like interface for accessing domain objects.” Note the phrase ‘a collection-like interface’.

To me the pattern is quite clear: the repository should look like a collection. For example:

using System;
using System.Collections.Generic;
using System.Linq.Expressions;

public interface IRepository<T>
{
    void Add(T instance);
    void Remove(T instance);
    IEnumerable<T> FindAll();
    T Find(int id);
    IEnumerable<T> Find(Expression<Func<T, bool>> predicate);
}

There are many alternative approaches to repositories but the one above serves well enough as an example.
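To make the ‘collection-like’ idea concrete, here is a minimal in-memory implementation of that interface. It is only a sketch: the IEntity interface with an integer Id, the Seat type and the InMemoryRepository<T> class are hypothetical names introduced for illustration, and a real repository would delegate to a data mapping layer rather than a List<T>.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// The collection-like repository interface from the post.
public interface IRepository<T>
{
    void Add(T instance);
    void Remove(T instance);
    IEnumerable<T> FindAll();
    T Find(int id);
    IEnumerable<T> Find(Expression<Func<T, bool>> predicate);
}

// Hypothetical: gives the repository a way to look objects up by id.
public interface IEntity
{
    int Id { get; }
}

// Hypothetical example entity.
public class Seat : IEntity
{
    public int Id { get; set; }
    public string Venue { get; set; }
}

// A sketch that keeps objects in a List<T>; a real repository would hand these
// operations to the data mapping layer behind the same interface.
public class InMemoryRepository<T> : IRepository<T> where T : class, IEntity
{
    private readonly List<T> _items = new List<T>();

    public void Add(T instance) => _items.Add(instance);

    public void Remove(T instance) => _items.Remove(instance);

    // Returns a copy so callers cannot modify the internal list.
    public IEnumerable<T> FindAll() => _items.ToList();

    // Returns null when no object has the given id.
    public T Find(int id) => _items.FirstOrDefault(x => x.Id == id);

    // The expression is compiled and applied to the in-memory list; an
    // ORM-backed repository could instead translate it into SQL.
    public IEnumerable<T> Find(Expression<Func<T, bool>> predicate) =>
        _items.Where(predicate.Compile()).ToList();
}
```

Client code then reads much like code working with a collection, for example seats.Find(s => s.Venue == venueName), with no mention of SQL anywhere.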

One of the key drivers behind the repository pattern is that it isolates domain objects from details of the database access code. This can be a benefit or a drawback depending on your requirements. If for example you need tight control over the SQL then a repository may not be the best way to go; the pattern is designed to hide that sort of detail from client code.

Eric Evans talks extensively about repositories in his book Domain-Driven Design: Tackling Complexity in the Heart of Software. In DDD the repository is a key concept, used in combination with Aggregates and Aggregate Roots. In fact, repositories should be limited to aggregate roots – you don’t have a repository for every domain type in your system! A discussion of those concepts is beyond the scope of this post, but I’d encourage the reader to investigate further.

Evans provides this definition:

“For each type of object that needs global access, create an object that can provide the illusion of an in-memory collection of all objects of that type. Set up access through a well-known global interface. Provide methods to add and remove objects, which will encapsulate the actual insertion or removal of data in the data store. Provide methods that select objects based on some criteria and return fully instantiated objects or collections of objects whose attribute values meet the criteria, thereby encapsulating the actual storage and query technology.” [2]

If you are interested, Vaughn Vernon goes into some detail about repositories, including collection-oriented repositories, in his book Implementing Domain-Driven Design.

Repositories - how and when

Use repositories if you want to present clients with a simple model for obtaining persistent objects and managing their life cycle. Be aware that using this pattern will provide a bland interface to the data – that’s the whole point. If you need to write optimised SQL then this probably isn’t the way to go.

In my experience, combining repositories with Object Relational Mappers (ORMs) such as Entity Framework is much less successful, but still possible.

Note that if you use repositories you will probably need an alternative method of managing transactionality. The Unit of Work pattern is commonly used in combination with repositories to solve this problem. Rob Conery has an interesting article on why Repositories On Top UnitOfWork Are Not a Good Idea, but I think many of the issues he raises are caused by Entity Framework and the need to pass DbContext objects around.
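As a rough illustration of how the two patterns can divide the work, the sketch below has a repository that only registers changes with a unit of work, which writes them to the backing store when Commit() is called. UnitOfWork<T>, UnitOfWorkRepository<T>, RegisterNew, RegisterRemoved and the ICollection<T> store are hypothetical names chosen for this example; a real implementation would commit against a database inside a transaction.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical unit of work: records pending additions and removals and only
// applies them to the backing store when Commit is called.
public class UnitOfWork<T>
{
    private readonly ICollection<T> _store;
    private readonly List<T> _added = new List<T>();
    private readonly List<T> _removed = new List<T>();

    public UnitOfWork(ICollection<T> store)
    {
        _store = store ?? throw new ArgumentNullException(nameof(store));
    }

    public void RegisterNew(T instance) => _added.Add(instance);

    public void RegisterRemoved(T instance) => _removed.Add(instance);

    // Writes all pending changes in one go. A database-backed version would
    // wrap these writes in a single transaction.
    public void Commit()
    {
        foreach (var item in _added) _store.Add(item);
        foreach (var item in _removed) _store.Remove(item);
        _added.Clear();
        _removed.Clear();
    }

    // Discards pending changes without touching the store.
    public void Rollback()
    {
        _added.Clear();
        _removed.Clear();
    }
}

// Hypothetical repository that routes Add/Remove through the unit of work,
// so nothing reaches the store until the unit of work commits.
public class UnitOfWorkRepository<T>
{
    private readonly UnitOfWork<T> _unitOfWork;

    public UnitOfWorkRepository(UnitOfWork<T> unitOfWork) => _unitOfWork = unitOfWork;

    public void Add(T instance) => _unitOfWork.RegisterNew(instance);

    public void Remove(T instance) => _unitOfWork.RegisterRemoved(instance);
}
```

The repository keeps its collection-like face, while the decision about when changes become permanent moves to whoever owns the unit of work.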

One category of data access technology that I don’t think fits well with repositories is the micro-ORM (e.g. PetaPoco, Dapper). Micro-ORMs tend to have APIs that sit quite close to the SQL and just don’t fit comfortably with repositories. If you use these technologies and your data access API doesn’t have a collection-like interface, then don’t call your data access classes repositories!

An alternative

An alternative that I like is to implement Commands and Queries. Having seen the potential benefits of Command Query Responsibility Segregation (CQRS), as promoted by Udi Dahan, I found the basic concept of separating operations that modify data (Commands) from operations that fetch data (Queries) very attractive. CQRS is quite an involved architecture, but it’s really an extension of Command Query Separation, a principle that’s been around for years.

Note that:

  • Queries: Return a result and do not change the observable state of the system (are free of side effects).
  • Commands: Change the state of a system but do not return a value.
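At the level of a single class, the two rules might look like the sketch below. SeatBookings, IsBooked and Book are hypothetical names for illustration: IsBooked can be called any number of times without changing anything, while Book changes state and reports failure by throwing rather than through a return value.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical example: seat bookings for a venue, following Command Query Separation.
public class SeatBookings
{
    private readonly HashSet<string> _booked = new HashSet<string>();

    // Query: returns a result and leaves the state unchanged.
    public bool IsBooked(string seatNumber) => _booked.Contains(seatNumber);

    // Command: changes state and returns nothing; failure is signalled with an exception.
    public void Book(string seatNumber)
    {
        if (!_booked.Add(seatNumber))
        {
            throw new InvalidOperationException($"Seat {seatNumber} is already booked.");
        }
    }
}
```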


Jimmy Bogard discusses a similar approach in his post Favor query objects over repositories.

By providing some simple interfaces commands and queries can be easily defined. Firstly queries:

/// <summary>
/// This interface encapsulates queries - read-only operations against a backing store.
/// </summary>
/// <typeparam name="TParameters">The query parameters to use</typeparam>
/// <typeparam name="TResult">The type of result to return</typeparam>
public interface IQuery<TParameters, TResult>
{
    /// <summary>
    /// Execute the query against the backing store.
    /// </summary>
    /// <param name="parameters">The query parameters to use</param>
    /// <returns>The result of the query</returns>
    TResult Execute(TParameters parameters);
}

/// <summary>
/// This interface encapsulates queries - read-only operations against a backing store.
/// </summary>
/// <typeparam name="TResult">The type of result to return</typeparam>
public interface IQuery<TResult>
{
    /// <summary>
    /// Execute the query against the backing store.
    /// </summary>
    /// <returns>The result of the query</returns>
    TResult Execute();
}
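As an illustration, the GetSeatsForVenue method from the ISeatRepository example at the top of this post could become a query class of its own. SeatsForVenueQuery, SeatsForVenueParameters and the Seat properties are hypothetical, and the in-memory list stands in for whatever data access technology you choose. A micro-ORM such as Dapper could sit comfortably behind a query like this, since the query makes no pretence of being a collection.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// The parameterised query interface from the post.
public interface IQuery<TParameters, TResult>
{
    TResult Execute(TParameters parameters);
}

// Hypothetical domain type.
public class Seat
{
    public string Number { get; set; }
    public string VenueName { get; set; }
}

// Hypothetical parameter object: a simple POCO holding the query's inputs.
public class SeatsForVenueParameters
{
    public string VenueName { get; set; }
}

// A read-only query replacing a GetSeatsForVenue-style repository method.
// It filters an in-memory sequence here; a real query might run SQL instead.
public class SeatsForVenueQuery : IQuery<SeatsForVenueParameters, IReadOnlyList<Seat>>
{
    private readonly IEnumerable<Seat> _seats;

    public SeatsForVenueQuery(IEnumerable<Seat> seats) => _seats = seats;

    public IReadOnlyList<Seat> Execute(SeatsForVenueParameters parameters) =>
        _seats.Where(s => s.VenueName == parameters.VenueName).ToList();
}
```

Each former repository method becomes a small, single-purpose class with an honest name, which is exactly what makes the approach easy to reason about (and, as noted later, easy to multiply).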

And now commands:

/// <summary>
/// An interface for commands. Commands write to a backing store.
/// </summary>
/// <remarks>
/// Commands do not return values because it is expected that the operation will succeed. If it fails an exception will be thrown.
/// </remarks>
public interface ICommand
{
    /// <summary>
    /// Execute the command.
    /// </summary>
    void Execute();
}

/// <summary>
/// An interface for commands with parameters. Commands write to a backing store.
/// </summary>
/// <remarks>
/// Commands do not return values because it is expected that the operation will succeed. If it fails an exception will be thrown.
/// You should create classes to represent your <typeparamref name="TParameters"/>, simple POCOs with properties for the
/// individual parameters to be passed into the <see cref="Execute"/> method.
/// </remarks>
/// <typeparam name="TParameters">The type of parameters to use</typeparam>
public interface ICommand<TParameters>
{
    /// <summary>
    /// Execute the command using the given parameters.
    /// </summary>
    /// <param name="parameters">The parameters to use</param>
    void Execute(TParameters parameters);
}

/// <summary>
/// An interface for commands with parameters that return a result. Commands write to a backing store.
/// </summary>
/// <remarks>
/// This relaxes the rule that commands do not return values; use it where the caller needs something back
/// from the write, such as a generated identifier. If the command fails an exception will be thrown.
/// You should create classes to represent your <typeparamref name="TParameters"/>, simple POCOs with properties for the
/// individual parameters to be passed into the <see cref="Execute"/> method.
/// </remarks>
/// <typeparam name="TParameters">The type of parameters to use</typeparam>
/// <typeparam name="TResult">The type of result to return</typeparam>
public interface ICommand<TParameters, TResult>
{
    /// <summary>
    /// Execute the command using the given parameters.
    /// </summary>
    /// <param name="parameters">The parameters to use</param>
    /// <returns>The result</returns>
    TResult Execute(TParameters parameters);
}

Interfaces such as these allow transactionality to be implemented on the Execute() methods. For example, a base class could automatically apply transactions to an abstract Execute() method by means of an attribute (I have done this using Spring.NET, which provides a TransactionAttribute).
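Spring.NET is not the only option. The sketch below takes the same base-class idea but uses System.Transactions.TransactionScope from the .NET base class library instead of an attribute. TransactionalCommand<TParameters>, ExecuteCore and RecordingCommand are hypothetical names, and whether the scope actually covers your writes depends on your data access technology enlisting in the ambient transaction.

```csharp
using System;
using System.Transactions;

// The parameterised command interface from the post.
public interface ICommand<TParameters>
{
    void Execute(TParameters parameters);
}

// Hypothetical base class: Execute wraps the abstract ExecuteCore in a
// TransactionScope, so every derived command runs inside a transaction.
public abstract class TransactionalCommand<TParameters> : ICommand<TParameters>
{
    public void Execute(TParameters parameters)
    {
        using (var scope = new TransactionScope())
        {
            ExecuteCore(parameters);
            // Only reached if ExecuteCore did not throw; otherwise disposing the
            // scope without calling Complete() rolls the transaction back.
            scope.Complete();
        }
    }

    // Derived commands put their write logic here.
    protected abstract void ExecuteCore(TParameters parameters);
}

// Hypothetical command used to show the base class in action.
public class RecordingCommand : TransactionalCommand<string>
{
    public string LastValue { get; private set; }
    public bool RanInTransaction { get; private set; }

    protected override void ExecuteCore(string parameters)
    {
        LastValue = parameters;
        // Transaction.Current is the ambient transaction created by the scope.
        RanInTransaction = Transaction.Current != null;
    }
}
```

Making Execute non-virtual and ExecuteCore abstract means a derived command cannot accidentally bypass the transaction.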

A simple command might look like this:

/// <summary>
/// A command for saving a <see cref="User"/>.
/// </summary>
public class SaveUserCommand : ICommand<User>
{
    /// <summary>
    /// Execute the command using the given parameters.
    /// </summary>
    /// <param name="user">The user to save</param>
    public void Execute(User user)
    {
        // Save the user using your technology of choice...
    }
}
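The third interface, ICommand<TParameters, TResult>, is useful when a write has to hand something back, such as a generated identifier. A minimal sketch, assuming a hypothetical CreateUserCommand that stores user names in an IDictionary<int, string> in place of a real database:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// The command-with-result interface from the post.
public interface ICommand<TParameters, TResult>
{
    TResult Execute(TParameters parameters);
}

// Hypothetical parameter POCO.
public class CreateUserParameters
{
    public string Name { get; set; }
}

// Hypothetical command: stores a new user and returns the generated id.
// A dictionary stands in for the real backing store.
public class CreateUserCommand : ICommand<CreateUserParameters, int>
{
    private readonly IDictionary<int, string> _users;

    public CreateUserCommand(IDictionary<int, string> users) => _users = users;

    public int Execute(CreateUserParameters parameters)
    {
        if (string.IsNullOrWhiteSpace(parameters.Name))
        {
            // Failures are reported by throwing, as the interface remarks describe.
            throw new ArgumentException("A user must have a name.", nameof(parameters));
        }

        // Next id is one more than the highest existing id (1 for an empty store).
        int id = _users.Count == 0 ? 1 : _users.Keys.Max() + 1;
        _users.Add(id, parameters.Name);
        return id;
    }
}
```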

One thing I have noticed about this approach is that you quickly build up a library of commands and queries that can become difficult to manage and error-prone, because they all start to look the same. A case of buyer beware.

Wrapping up

If you are writing a data access API and it doesn’t have a collection-like interface then don’t call it a repository. Don’t use the name ‘repository’ as a crutch for poor, lazy design. There are alternatives.

References

[1] Edward Hieatt and Rob Mee, Repository, http://martinfowler.com/eaaCatalog/repository.html
[2] Eric Evans, Domain-Driven Design: Tackling Complexity in the Heart of Software, pp. 151-152