Keeping it Simple – unidirectional models

Keeping code simple is an art.  It may require a reasonable amount of work – refactoring and trying design experiments – to keep the code as simple as possible.  Changes to requirements may mean the current design is no longer the simplest thing.

One way to attempt to keep things simple is to reduce complexity by not implementing anything that a current use case does not need.  That way, when you change the design you only have to worry about code that is actually used – there is no guessing game.

Another way to attempt to keep things simple is to reduce the number of paths that can be taken through your code base to reach a given point.  Small, cohesive units with a well-defined interface to the rest of the code base can be reasoned about – and therefore changed – more easily.

Adding complexity

A sure way to increase complexity is to add unnecessary code.  This is often done because “it’s obvious that it is the right thing to do” or “we’ll probably need it”, rather than being driven by any actual use case.

An example of this can often be seen when using an ORM.  Given a model composed of attributes and a list of another type of model, a common pattern is for both sides to know about each other: the parent model holds a list of children, and each child holds a reference back to its parent.  This is particularly common when the design is data driven (driven from the database tables) and the designer feels comfortable grabbing any table/model, loading something from it, and working from that view of the data.

A classic modelling example is an implementation of an Order.  An Order may have many Line Items in a list.  In the way typical ORMs encourage, the Order model knows about its Line Items and each Line Item knows about its Order.  This means I can load an Order object and expect to access its Line Items.  It also means I can load an individual Line Item and directly access its Order – and then use that Order to get at the Line Items, one of which was the object that started the whole request.
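As a concrete illustration, here is a minimal C# sketch of that bidirectional shape.  The names are mine and no particular ORM’s mapping syntax is assumed – the shape is the point:

```csharp
using System.Collections.Generic;

// The bidirectional shape a typical ORM mapping produces (illustrative names).
public class Order
{
    public int Id { get; set; }
    public decimal Total { get; set; }

    // The parent knows about its children...
    public List<LineItem> LineItems { get; set; } = new List<LineItem>();
}

public class LineItem
{
    public int Id { get; set; }
    public decimal Total { get; set; }

    // ...and each child points back at its parent.
    public Order Order { get; set; }
}
```

Load either side and you can walk to the other: lineItem.Order.LineItems contains the very object you started from.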

What’s wrong with that?

In the simplest implementation this may be fine.  But it can lead to problems as the complexity of the solution grows.

Cyclic dependencies

One problem that can arise is cyclic dependencies – the parent can call a child, the child can call back into the parent, and around we go.  This becomes hard to reason about, particularly as the object graph grows in size.

Maintaining a cohesive world

These convenience methods increase complexity.  A design that can be entered from anywhere, and that must remain cohesive in every direction from every entry point, is harder to keep correct – and additional code gets written to compensate.

It can lead to needing the world to be cohesive no matter which object I load and manipulate.  Allowing the Order to know about the Line Item and the Line Item to know about the Order opens up the option of creating a Line Item separate from an Order – which may not be valid in the domain.  We may always expect a Line Item to have an Order, and this could now be violated; to resolve it, we add validation on the Line Item to ensure there is an Order.

A potential design may be that changes to a Line Item require changes on the Order model.  If the Line Item can be loaded first, we need to solve the consistency of the Order model.  Clearly we should add callbacks when the Line Item model saves, to ensure the Order is kept up to date.

Maybe we have a tax amount stored on the Order.  When it updates, all Line Items need their totals recalculated due to the change in the tax percentage.  Clearly we should add callbacks when the Order saves to do this.

If I can access a Line Item directly, I can also change its total.  This may require the Order’s total to be updated.  Clearly we should add callbacks to update the Order when the Line Item is saved.

Now I’ve written a bunch of callbacks to keep my domain valid, because anything can be loaded and used from anywhere.  I may also have introduced a cyclic dependency and a cascade of DB updates that I don’t control very well.  When I save the Order, all Line Items are loaded and saved; each Line Item may change the total value on the Order… and if we do this badly we get stuck in an unexpected loop.
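A sketch of how that plays out – the Save methods and the guard flag are hypothetical stand-ins for whatever callback mechanism a real ORM provides:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical save callbacks chasing each other in both directions.
public class Order
{
    private bool _saving;   // guard flag written purely to break the cycle

    public decimal TaxRate { get; set; }
    public decimal Total { get; set; }
    public List<LineItem> LineItems { get; } = new List<LineItem>();

    public void Save()
    {
        if (_saving) return;    // without this, the two Saves recurse forever
        _saving = true;

        // Saving the Order cascades into every Line Item...
        foreach (var item in LineItems)
            item.Save();

        _saving = false;
    }
}

public class LineItem
{
    public Order Order { get; set; }
    public decimal Net { get; set; }
    public decimal Total { get; private set; }

    public void Save()
    {
        Total = Net * (1 + Order.TaxRate);

        // ...and saving a Line Item reaches back up to keep the Order
        // consistent, re-entering Order.Save() in the process.
        Order.Total = Order.LineItems.Sum(i => i.Total);
        Order.Save();
    }
}
```

The guard flag is exactly the “additional code written to compensate” from above – code whose only job is to contain a cycle the design created.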

Increased testing for unsupported use cases

In an ideal world, every use case you build has a test.  Putting in the back connections between two models should be driven by such a use case.  If you don’t have a failing test, don’t write the code.  If you don’t have a use for the code… don’t write the code.  But if you do write the code, write the test – which increases the testing burden, for use cases you don’t even support.

Containing complexity

Instead of providing inconvenient convenience methods, why not contain the complexity by not providing them at all?

Unidirectional linkages

For a start, could we make all models talk in only one direction?  What would that do to the code?  Suddenly we have reduced complexity: each model can be reached only one way, and there is no expectation of the other way.

For instance, assume that the Order knows about its Line Items.  But the Line Item does not have a back reference to the Order.  What would that mean?

A Line Item can never be created without an Order.

A Line Item is updated through the Order, so any consistency the Order needs to maintain in response to Line Item changes is simply handled in the Order.

When the Order’s tax is updated, we now have a design decision: do we update all Line Items at Order save time, or as we pass them out of the Order?  Suddenly we have control over the performance characteristics of this update.

The Order / Line Item relationship is now easier to maintain and reason about as the expectations of the relationship are simpler.

Line Items may still be read directly, but the design states that there is no expectation of reaching the Order from one.  If you need the Order, use the order_id that may be in the Line Item object to directly load a fresh Order.
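A minimal sketch of the unidirectional version, again with illustrative names – note that the tax decision from above becomes an explicit, visible choice:

```csharp
using System.Collections.Generic;

// The Line Item carries only an order id, never a reference back.
public class LineItem
{
    public int OrderId { get; set; }   // enough to load a fresh Order if you must
    public decimal Net { get; set; }
    public decimal Gross { get; set; }
}

public class Order
{
    public int Id { get; set; }
    public decimal TaxRate { get; set; }
    public List<LineItem> LineItems { get; } = new List<LineItem>();

    // The design decision made explicit: totals are recalculated as items
    // are handed out of the Order, rather than cascading writes at save time.
    public IEnumerable<LineItem> CurrentLineItems()
    {
        foreach (var item in LineItems)
        {
            item.Gross = item.Net * (1 + TaxRate);
            yield return item;
        }
    }
}
```

No callbacks, no guard flags: consistency lives in one place, because there is only one direction to travel.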

But that is more work!

Suddenly we have a little more work – but it is clear and obvious work.  We can’t load a Line Item and expect it to just hand us the Order.

I would argue that the backlinks carry similar (and potentially significantly more) complexity – it just isn’t visible until later, when you realise the Order needs updating because you saved a Line Item that you loaded directly.

Not putting in the back links acknowledges the real work to be done and the thinking required instead of pretending it isn’t there.

Let’s go further – the Aggregate Pattern

What would happen if we kept a model like the Order / Line Item relationship, but only allowed interaction with the Order / Line Item cohesive unit via the Order?  What if, whenever we interact with the model, it is always fully loaded and well defined – so that our expectations are well defined too?

This is the aggregate pattern, popularised by domain driven design.  The Order model is the root of the aggregate; a Line Item is not accessible to the domain except via the Order.  A further design choice could be to only allow new Line Items to be created via the Order.  Suddenly we have control over all interactions with these models.
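A minimal sketch of what that could look like, assuming we enforce access through the root with ordinary language visibility (names are illustrative):

```csharp
using System.Collections.Generic;
using System.Linq;

// The Order is the aggregate root and the only way in.
public class LineItem
{
    public decimal Net { get; }

    internal LineItem(decimal net)   // internal: only code in the aggregate's
        => Net = net;                // own assembly can construct one
}

public class Order
{
    private readonly List<LineItem> _items = new List<LineItem>();

    public decimal TaxRate { get; private set; }
    public IReadOnlyList<LineItem> Items => _items;
    public decimal Total => _items.Sum(i => i.Net) * (1 + TaxRate);

    // All creation and mutation flows through the root.
    public void AddItem(decimal net) => _items.Add(new LineItem(net));
    public void ChangeTaxRate(decimal rate) => TaxRate = rate;
}
```

An orphaned Line Item is now unrepresentable, and the Order’s total is derived on demand rather than patched up by callbacks.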

There is more to the aggregate pattern and there are more implications and ideas, but I’ll write on that another time.

Keep it simple

Why not try to keep things simple instead of assuming things are necessary?  Experiment with patterns that have “extreme” ideas – maybe there is something to learn.  Choosing constraints in your design that aid simplicity and reasoning – even when they mean slightly more explicit code – can be a liberating thing.

Maintain your confidence in the face of change.  Keep it simple.


A potential caveat

Maybe your ORM needs the backlinks to do some of its magic.  If that is the case, it would be surprising – but worth checking before removing them.


Bitten by Simple

I’ve recently done a piece of work around OAuth and integration with DotNetOpenAuth.  I wanted to build up the system by doing the simplest thing at all times, in the hope of achieving the leanest, meanest code I could for the implementation.  This worked well overall.

But… when building up a certain repository call to get a user – it was only being used by one controller (and still is) – I did the simplest thing and built only what that controller needed.  Life was good.

Then I changed my mind.  And I assumed that because my tests passed, my code was all working.  So I started using another user field that wasn’t being populated – it needed an additional DB call and wasn’t required initially.  And I got an unexpected bug.

But I do TDD.  My tests passed.  All should be awesome.  I have the green light.  I can deploy.

Wrong. Hmmm…

Doing the simplest thing, I violated the principle of least surprise.  I was surprised not to get a fully formed user from the user repository.  That was wrong.  Yes, the tests very clearly show me not testing the specific field, and adding the test showed it wasn’t being set.  But the repository should always return a fully formed user object.  Yes, I should have remembered – but I have a pretty good memory and I still assumed it was all working.
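For illustration, a hypothetical xUnit test of the kind that would have caught this.  The repository, the field names and the test helper are all invented – the idea is to pin the whole contract, not just the fields the first caller happened to use:

```csharp
using Xunit;

public class UserRepositoryTests
{
    [Fact]
    public void GetUser_returns_a_fully_populated_user()
    {
        // UserRepository and TestDatabase are hypothetical stand-ins.
        var repository = new UserRepository(TestDatabase.OpenConnection());

        var user = repository.GetUser("known-user-id");

        // Pin every field the contract promises, not just the ones in use today.
        Assert.NotNull(user.Name);
        Assert.NotNull(user.Email);        // the field that silently stayed unset
        Assert.NotNull(user.OAuthToken);   // the one needing the extra DB call
    }
}
```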

So what should I have done?

a) Not violated the principle of least surprise!
b) Actually checked what the tests did – they are the specification.
c) Tested higher up, at the controller level, where possible.
d) Introduced integration tests.

The problem with integrating with DotNetOpenAuth is that the controllers call it and then it calls us.  There is very little code in the controller that we control in order to make it all happen – but there is some.  In this case the combination of the login page and the resource fetch (after the OAuth interaction) broke.  Setting up a controller with the correct state to hand to DotNetOpenAuth would involve going very deep into understanding DotNetOpenAuth and what it needs.  I don’t actually want to test DotNetOpenAuth; I do want to test my code around it.  As a result the controllers are being kept as simple as possible, but there is no automated testing around them.  And now I’ve been provably bitten for not having that.

And maybe I should want to test the interaction with DotNetOpenAuth – at least at a basic level.  A simple automated happy-flow integration test would be useful here.  I already have a manual test, but of course I got lazy.

So in future I plan to:
– Not violate the principle of least surprise… it isn’t worth it even if it is less code / simpler.
– Dig deeper into the synergy between unit tests (testing the code I’m writing) and integration tests (testing the full-stack integration, particularly the parts that are harder to validate otherwise).

And of course, continue to write the simplest code I can and retrospect when I get bitten next.

A simple failure

Do the simplest thing.
I’ve also heard: Do the simplest thing that you can refactor later.

In TDD, you attempt to do the simplest thing so that the real design can emerge, without adding unnecessary code.  This tends to reduce the complexity of the code.  There is plenty of writing about this – such as http://www.extremeprogramming.org/rules/simple.html and http://martinfowler.com/articles/designDead.html#TheValueOfSimplicity.

More than a year ago I did a new piece of work.  The team completed it, and when I went back to look at the code I saw that I could simplify it in several ways.  One of those was to remove an unused parameter, as it was always the same id for all the objects in the returned list.  We did the work and life was good.  I felt we were achieving simple, changeable code.  I felt good.

A month or so later we needed to change the way we fetched these objects, so that the single id now actually varied.  A developer attempted the task for two days before we gave up on the work, as we had a simpler solution that the client was happy enough with.

I felt like the whole point of developing with tests and being safe was to make the code easier to change and here I failed.  Miserably.

So I retrospected…
Perhaps the removed complexity was important all along – it was needed again very rapidly.
Perhaps the code base isn’t as intuitive as it could be.
Perhaps the dev wasn’t good enough to see how to change the code easily.

All of these may be true.  In particular I observed two things:

  1. Misinterpreting the need: The usage we could see from the UI was simple – needing the objects for only one id at a time.  But the common usage of related lists was often with varying ids.  I had assumed that was complexity added unnecessarily; in retrospect it was the real requirement once you analysed what the callers were doing at a higher level.  The problem is that the callers don’t express that – the code is highly convoluted and hides it – but they should.  Since then I have also changed the way I look at building up things like this in order to keep them simple, and that approach would have led me to the correct need of loading with varying ids initially.
  2. Expectation to change: I knew that the code was likely to change in this direction, and I actively removed the flexibility in pursuit of simpler code.  Perhaps looking ahead a minuscule bit is valuable to support the open/closed principle in the short term: had I left the complexity in, it would already have been able to deal with this change.  I’m not 100% comfortable with this thinking.  In Agile Principles, Patterns and Practices in C#, Uncle Bob suggests only making such a change when the code actually changes in that way; guided by that principle, the flexibility shouldn’t have been there.  But in this case it was there already (it wasn’t test driven initially… this part was inherited).  I suspect misinterpreting the need is closer to the root cause, though the tension here is worth pondering and experimenting with.

So maybe it is useful to think: do the simplest thing that you can refactor later, while keeping in mind where the current design is already telling you it needs to be open for extension.

But more likely, simple is hard, and you need to dig deeper to make sure you understand the real need the code is fulfilling.