Fighting the Fifty-Method-Repository using Specifications

Recently I have been working on a domain-specific content management system that, like most content management systems, lets users filter information along several dimensions and even has a fancy full-text search.
It sports a web-based UI implemented using an MVC architecture. Essentially, requests are served by controller methods that pull data from the content repository and throw it at a template.

When writing our content repository the traditional way (adding a method per query), we realised that we were adding a lot of methods whose signatures were just different combinations of the same set of parameters, which felt wrong. The next observation was that finding meaningful names for these methods was also quite difficult. The first impulse was to describe the query, which just paraphrases in text what a query language can express more elegantly (on a similar note, my ex-colleague Jay Fields called test-method names a smell). If, on the other hand, you try to name your repository methods after their intent, the names become very similar to the controller actions. This also seems wrong, as it leaks responsibility from the controller (deciding what data to display for a given action) into the repository. Thirdly, we found that when we tried to review what content we display on the respective pages, we had to do a lot of drilling down into the repository to figure out what data gets retrieved.

So what was to be done about this? The first idea was to put SQL or a Hibernate query straight into the controller, which makes it very obvious what data gets retrieved. At first glance the only thing that stands against this is agile folklore. But looking closer at our repository, we could identify a few genuine responsibilities for it. The simple and obvious ones were pagination and abstraction from the underlying data store. The third was translating some of our domain notions into a set of conditions. For example, each piece of content has a start date and an end date, and whether it is published depends on it not being marked as draft and on the current date lying inside that publishing interval.
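To make the "published" notion concrete, here is a minimal sketch of the rule as a plain predicate. The Content type shown is a pared-down stand-in, and the names IsDraft and PublishingRules are assumptions made for illustration; they are not taken from our code base.

```csharp
using System;

// Hypothetical, pared-down content type; the real Content class is not shown in this post.
public class Content
{
    public bool IsDraft { get; set; }
    public DateTime StartDate { get; set; }
    public DateTime EndDate { get; set; }
}

public static class PublishingRules
{
    // A piece of content counts as published when it is not marked as draft
    // and the given date lies inside its publishing interval (both ends inclusive).
    public static bool IsPublished(Content content, DateTime today)
    {
        return !content.IsDraft
            && content.StartDate <= today
            && today <= content.EndDate;
    }
}
```

The point is that this rule is domain knowledge, so it should live in one place rather than being repeated by every caller that needs published content.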

The solution we came up with was to turn our query into a proper object describing the objects we want back. Eric Evans calls this a specification, and it is one of the DDD patterns that should be used more often (as opposed to the abominable service pattern). In C#, our query looks something like this:

public class ContentQuery {

    public ContentQuery() {
        Status = Status.All;
        Sort = new List<ContentSort>();
        Pagination = PaginationStrategy.ALL;
    }

    public long? Id { get; set; }
    public Status Status { get; set; }
    public string FullText { get; set; }
    public Author Author { get; set; }
    public Set<Tag> Tags { get; set; }

    public IList<ContentSort> Sort { get; set; }

    public PaginationStrategy Pagination { get; set; }
}

We are using C# properties to get the syntactic sugar of object initializers. Creating and executing a query comes down to:

var query = new ContentQuery {
    Status = Status.Published,
    FullText = "Java"
};
 
var content = contentRepository.Execute(query);

PaginationStrategy is a simple pair of the current page number and the number of items per page. Sort specifies the sort order. A typical query in our system looks like this:

var query = new ContentQuery {
    FullText = "No more .net",
    Status = Status.Published,

    Pagination = new PaginationStrategy(1, 15),

    Sort = {
        new DescendingSort(ContentSortField.StartDate),
        new AscendingSort(ContentSortField.Title)
    }
};

var content = repository.Execute(query);

Even though this is quite a bit of text, I do prefer it over something like this:

var content = repository.GetPublishedContentOrderedByStartDateAndTitle("No more .net", new PaginationStrategy(1, 15));
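For completeness, here is a rough sketch of what a PaginationStrategy pair could look like. Only the two constructor arguments and the ALL constant appear in the snippets above; the property names, the 1-based page numbering and the FirstResult helper are assumptions.

```csharp
using System;

public class PaginationStrategy
{
    // Stand-in for "no paging": a single page large enough to hold everything.
    public static readonly PaginationStrategy ALL = new PaginationStrategy(1, int.MaxValue);

    public int Page { get; private set; }          // assumed to be 1-based
    public int ItemsPerPage { get; private set; }

    public PaginationStrategy(int page, int itemsPerPage)
    {
        if (page < 1) throw new ArgumentOutOfRangeException("page");
        if (itemsPerPage < 1) throw new ArgumentOutOfRangeException("itemsPerPage");
        Page = page;
        ItemsPerPage = itemsPerPage;
    }

    // Zero-based index of the first item on the requested page.
    public int FirstResult
    {
        get { return (Page - 1) * ItemsPerPage; }
    }
}
```

With a criteria-style API, a repository could then apply the pair via something like SetFirstResult(pagination.FirstResult) and SetMaxResults(pagination.ItemsPerPage), so callers never deal with offsets themselves.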

Also, as the query is now an object, you could introduce factory methods that populate the defaults. You could have two factories, one for admin queries and one for public front-end queries, that default to showing all content and published content respectively.
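A minimal sketch of such factories might look like the following. ContentQuery is trimmed to the two properties needed here, and the names ContentQueries, ForAdmin and ForFrontEnd are hypothetical; only the two defaults come from the paragraph above.

```csharp
public enum Status { All, Published, Draft }

// Trimmed version of the query object; the other properties are omitted for brevity.
public class ContentQuery
{
    public ContentQuery() { Status = Status.All; }

    public Status Status { get; set; }
    public string FullText { get; set; }
}

// Hypothetical factory class holding the per-audience defaults.
public static class ContentQueries
{
    // Admin screens see everything, including drafts and expired content.
    public static ContentQuery ForAdmin()
    {
        return new ContentQuery { Status = Status.All };
    }

    // The public front end only ever sees published content.
    public static ContentQuery ForFrontEnd()
    {
        return new ContentQuery { Status = Status.Published };
    }
}
```

A controller would start from ContentQueries.ForFrontEnd() and only set what is specific to the action, for example FullText, which makes it harder to forget the published filter on a public page.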

The first cut of the implementation of our repository then looks like this:

public IList<Content> Execute(ContentQuery query) {
    var criteria = Session.CreateCriteria(typeof(Content));

    // Published: today lies inside the publishing interval.
    if (query.Status == Status.Published) {
        criteria
            .Add(Restrictions.Le("StartDate", DateTime.Today))
            .Add(Restrictions.Ge("EndDate", DateTime.Today));
    }

    // Draft: today lies outside the publishing interval.
    if (query.Status == Status.Draft) {
        criteria.Add(
            Restrictions.Or(
                Restrictions.Gt("StartDate", DateTime.Today),
                Restrictions.Lt("EndDate", DateTime.Today)));
    }

    if (query.Author != null) {
        criteria.Add(Restrictions.Eq("Author", query.Author));
    }

    if (query.FullText != null) {
        criteria.Add(Restrictions.Like("Body", "%" + query.FullText + "%"));
    }

    ...

    return criteria.List<Content>();
}

There is a lot of conditional logic going on but, believe it or not, these are all “good ifs” (I should write a blog post about good and bad ifs at some point). Even though the approach is based on a simple conjunction of criteria, it covered all our use cases. More importantly, it also helped with optimisation. We actually switched from Hibernate and the database to Lucene for searching content, and the only thing we had to change was this single method inside the repository. So all the special-case optimisations are hidden away in the repository and not exposed to the application. (I recall a project that had a method along the lines of GetAllContentForTagOptimisedForTheFirstPage(Set<Tag> tags) that was called from application code.)
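To illustrate how little the application needs to know, here is a sketch of a second implementation behind the same Execute(query) call. It is an in-memory stand-in using LINQ, not our Lucene code. The IContentRepository interface, the trimmed Content and ContentQuery types, the injected clock and the case-insensitive matching are all assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Trimmed stand-ins for the types used in this post.
public class Content
{
    public string Body { get; set; }
    public DateTime StartDate { get; set; }
    public DateTime EndDate { get; set; }
}

public enum Status { All, Published, Draft }

public class ContentQuery
{
    public ContentQuery() { Status = Status.All; }
    public Status Status { get; set; }
    public string FullText { get; set; }
}

// Hypothetical interface: the application only ever sees Execute(query).
public interface IContentRepository
{
    IList<Content> Execute(ContentQuery query);
}

// In-memory implementation that mirrors the date and full-text conditions shown above.
public class InMemoryContentRepository : IContentRepository
{
    private readonly List<Content> _items;
    private readonly Func<DateTime> _today;   // injected clock, so the date is controllable

    public InMemoryContentRepository(IEnumerable<Content> items, Func<DateTime> today)
    {
        _items = items.ToList();
        _today = today;
    }

    public IList<Content> Execute(ContentQuery query)
    {
        IEnumerable<Content> result = _items;
        var today = _today();

        if (query.Status == Status.Published)
            result = result.Where(c => c.StartDate <= today && c.EndDate >= today);

        if (query.Status == Status.Draft)
            result = result.Where(c => c.StartDate > today || c.EndDate < today);

        // Substring match, ignoring case (an assumption; LIKE semantics depend on the database).
        if (query.FullText != null)
            result = result.Where(c => c.Body != null &&
                c.Body.IndexOf(query.FullText, StringComparison.OrdinalIgnoreCase) >= 0);

        return result.ToList();
    }
}
```

Because the controller only builds a ContentQuery and calls Execute, swapping one implementation for another does not touch application code.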

Another interesting observation is that the query object is a very good candidate for the model object behind a search view/UI, with each field of the query object matching one element in the UI. This reminded me that back in 2006 I did something similar, coming from a UI perspective.
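As a rough illustration of that idea, the sketch below fills a query object from submitted search-form fields. The field names "q" and "status" and the SearchFormBinder class are hypothetical, and ContentQuery is again trimmed; a real MVC framework would typically do this binding for you.

```csharp
using System;
using System.Collections.Generic;

public enum Status { All, Published, Draft }

// Trimmed version of the query object.
public class ContentQuery
{
    public ContentQuery() { Status = Status.All; }
    public Status Status { get; set; }
    public string FullText { get; set; }
}

public static class SearchFormBinder
{
    // Maps each form field onto the matching query property; unknown or empty fields keep the defaults.
    public static ContentQuery Bind(IDictionary<string, string> fields)
    {
        var query = new ContentQuery();

        string text;
        if (fields.TryGetValue("q", out text) && !string.IsNullOrWhiteSpace(text))
            query.FullText = text.Trim();

        string status;
        Status parsed;
        if (fields.TryGetValue("status", out status)
            && Enum.TryParse(status, true, out parsed)
            && Enum.IsDefined(typeof(Status), parsed))
            query.Status = parsed;

        return query;
    }
}
```

Each UI element corresponds to exactly one property, so adding a new filter to the search page means adding one property and one condition in the repository.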

Verdict:
The separation-of-concerns (SoC) score of this solution is pretty high. It decoupled our application from the database and allowed for low-impact optimisations. It also made the application code a lot more readable.


Comments

5 responses to “Fighting the Fifty-Method-Repository using Specifications”

  1. Marco Valtas

    Felix,
    It seems to me that this solution is very similar to the Interpreter pattern. If so, maybe it is possible to avoid some ifs. If not, can you explain the difference so I understand it?

    1. felix

      Marco,
      you could see this as an interpreter. My intent, however, was to show how to reduce the coupling between the application and the data access layer. As you say, you could get rid of the ifs by making the specification a bit smarter and pushing that responsibility into the specification (making the specification a proper composition), or you could introduce a visitor for the query object. While both solutions have their merits, the simple conditionals worked for us and, as I said, they are all good conditionals. And yes, I do owe you the post about “The Good IF, the Bad IF, and the Ugly Case Statement”.

  2. szczepiq

    Cool. I would do the same Felix :D.

    Why didn’t we do it at GU? (Maybe the repositories there were as ugly?)

    The immediate problem is the massive number of overloaded/similar methods in the repository with complex sets of parameters. I’m wondering if you would also suggest this pattern in ‘friendlier’ languages like Python, where you have named parameters, or Groovy/Ruby, where you have nice dictionary literals.

    PS. I also did some stuff in .NET lately and I must say ReSharpered C# is pretty awesome.

  3. Jae

    Well done 🙂 I suppose we often forget to make use of all the method signature (name, parameters) to make the interface expressive enough. I agree that if I have to see the implementation of the method that I’m calling to see what I can expect to be returned, then there’s something wrong.

  4. Stacy

    I think I’d create a query building dsl so I could write

    repository.Execute(new ContentQuery("No more .net")
        .WithStatus(Status.Published)
        .OrderedBy(new DescendingSort(ContentSortField.StartDate),
                   new AscendingSort(ContentSortField.Title))
        .WithPagination(1, 15));

    Each method like ‘WithStatus’ is essentially a combinator, a function taking an immutable object and returning a new immutable object; I don’t see much fuss in ContentQuery being mutable though.

    Of course the ‘combinators’ I’ve provided mirror (and thus add little value to) the fields in ContentQuery, but additional combinators could be added that bridge between the low granularity you had before and the high granularity you have now, eg:

    repository.Execute(new ContentQuery("No more .net").RecentPublications.WithPagination(1, 15));

    Whilst I personally prefer the combinator approach to using CSharp properties it might not be worthwhile to add this until you observe duplication in the queries made.
