Keeping code simple is an art. It may require a reasonable amount of work – refactoring and trying design experiments to keep it as simple as possible. Changes to requirements may result in the current design no longer being the simplest thing any more.
One way to attempt to keep things simple is to reduce complexity by not implementing anything that a current use case does not need. That way, changing the design only has to worry about code that is being used – there is no guessing game.
Another way to attempt to keep things simple is to reduce the paths that can be taken through your code base to reach a given point. Making small cohesive units that can be reasoned about with a well-defined interface to the rest of the code base ensures you can change those cohesive units more easily.
A way to increase complexity is adding unnecessary code. This is often done because “it’s obvious that it is the right thing to do” or “we’ll probably need it” rather than driven from any actual use case.
An example of this can often be seen when using an ORM. Given a model that is composed of attributes and a list of another type of model, a common pattern is for both sides to know about each other. The parent model will have a list of children and the child model has a reference back to the parent model. This is particularly obvious when the design is data driven (driven from the database tables) and the designer feels comfortable just grabbing any table / model and loading something from it and working from that view of the data.
A classic modelling example is an implementation of an Order. An Order may have many Line Items in a list. In the standard way that typical ORMs encourage you, the Order model will know about its Line Items and a Line Item will know about its Order. This means that I can load an Order object and expect to be able to access its Line Items. It also means that I can load an individual Line Item and directly access its Order. And then I could use that Order to get the Line Items, one of which was the original object that started the request.
What’s wrong with that?
In the simplest implementation with limited complexity this may be fine. But it could lead to some problems as the complexity of the solution grows.
One problem that can arise is cyclic dependencies – where the parent can call a child and the child can then call the parent and around we go. This may be hard to reason about – particularly as the object graph grows in size.
Maintaining a cohesive world
These convenience methods increase complexity. Having a design that can be entered from anywhere and needs to remain cohesive in any direction from that entry point increases complexity. Additional code may be written to compensate.
It can lead to needing the world to be cohesive no matter which object I load and manipulate. By allowing the Order to know about the Line Item and the Line Item to know about the Order, allows the option to add a Line Item separate from an Order, which may not be valid in the domain. We may always expect a Line Item to have an Order and this could be violated. In order to resolve this, we need to add validation on the Line Item to ensure there is an Order.
A potential design may be that changes to a Line Item may expect changes on the Order model. If we can load the Line Item first, then we need to solve the consistency of the Order model. Clearly we should add callbacks when the Line Item model saves in order to ensure the Order is up to date.
Maybe we have a tax amount stored on the Order. When this updates, all Line Item’s need their totals to be updated due to the change in the tax percentage. Clearly we should add callbacks when the Order saves to update this.
If I can access a Line Item directly, I can also change its total. This may require the Order’s total to be updated. Clearly we should add callbacks to update the Order when the Line Item is saved.
Now I’ve written a bunch of callbacks in order to keep my domain valid because I can load anything and use it. I also may have introduced a cyclic dependency and a cascade of DB updates that I don’t control very well. When I save the Order, all Line Items will be loaded and saved. Each Line Item may change the total value in the Order… and if we do this badly we’ll get stuck in an unexpected loop.
Increased testing for unsupported use cases
In an ideal world, every use case that you build should have a test. Putting in the back connections between two models should be driven by that use case. If you don’t have a failing test case, don’t write the code. If you don’t have a use for the code… don’t write the code. But if you’re writing the code, write the test. Which increases the testing burden – for unsupported use cases.
Instead of providing inconvenient convenience methods, why not contain the complexity and not provide it?
For a start, could we make all models talk in only one direction? What would that do to the code? Suddenly we have reduced complexity. In order to load one model, it can only be accessed one way. There is no expectation of the other way.
For instance, assume that the Order knows about its Line Items. But the Line Item does not have a back reference to the Order. What would that mean?
A Line Item can never be created without an Order.
A Line Item is updated through the Order, therefore any consistency that the Order needs to maintain due to changes in the Line Item are simply done in the Order.
When the Order’s tax is updated, we now have a design decision – do we update all Line Items at Order save time? Or do we update Line Items as we pass them out of the Order? Suddenly we have control over the performance characteristics of this update.
The Order / Line Item relationship is now easier to maintain and reason about as the expectations of the relationship are simpler.
Line Items may still be read directly, but the design states that there is no expectation of accessing the Order. In order to access the Order, use the order_id that may be in the Line Item object to directly load the Order and get a fresh object.
But that is more work!
Suddenly we have a little more work. The little more work is clear and obvious. We can’t load a Line Item and expect it to just get the Order.
I would argue that the backlinks have similar (and potentially significantly more) complexity – but that isn’t visible until later when you realise that the Order needs updating when you save the Line Item that you loaded directly.
Not putting in the back links acknowledges the real work to be done and the thinking required instead of pretending it isn’t there.
Let’s go further – the Aggregate Pattern
What would happen if we defined a model like the Order / Line Item relationship, but only allow any interaction to the Order / Line Item cohesive unit via the Order? What would happen if when we interact with the model, it was always fully loaded and well defined and hence our expectations would be well defined.
This is the aggregate pattern, a pattern popularised by domain driven design. The Order model would be the root of the aggregate. A Line Item would not be accessible to the domain except via the Order. A design choice could be to only allow new Line Items being created via the Order. Suddenly we have control of all interactions with the models.
There is more to the aggregate pattern and there are more implications and ideas, but I’ll write on that another time.
Keep it simple
Why not try to keep things simple instead of assuming things are necessary? Experiment with patterns that have “extreme” ideas, as maybe there is something to learn. Choosing constraints in your design that help simplicity and reasoning – even when it means slightly more explicit coding – can be a liberating thing.
Maintain your confidence in the face of change. Keep it simple.
A potential caveat
Maybe your ORM needs the backlinks to do some of its magic. If that is the case, it feels like it might be unexpected.