Know where you’re going

A core idea in agile software development is that we don’t know enough right now, so we should build only for what we actually know right now.  This is sensible.  This idea underpins evolutionary / emergent / test-driven design, allowing them to thrive.  If we can safely and confidently change the system in any direction from where it is now, then we are in a good place.  We are keeping the XP cost of change curve as flat as possible.

But the cost of change may become high if we need to continuously rework everything. If the knowledge is available to help make more informed design choices earlier, we should use it.

Know where the code is going

Good software developers intentionally try to make the code base better.  This is highly desirable.  Good software developers will refactor confidently and allow new designs to emerge.  But when you have 6 good developers refactoring the same code base towards what each believes is the best solution, it is plausible that you may land up with 6 very different designs going in 6 very different directions.  This might not be ideal.

What would make those 6 good developers into a great team is a shared understanding of how the code is being refactored so that we don’t land up with 6 different designs, but rather one collaborative design that everyone has contributed to – pulling it in the same general direction.

As a team, discussing where we are refactoring the code towards is important.  A given pair might not reach the best design while working in this sprint, and another pair may augment and grow that design in the next sprint.  If we share where we’re going, we can hopefully arrive somewhere in the region of the shared destination.  And we can converse more meaningfully about changes in design as the requirements change, and agree on the next destination.

But don’t inhibit innovation and new ideas

The counter to this is that if a good developer is building something and they choose to build something different to the current design – maybe it is simpler, better, cleaner – this should not be inhibited.  Do not strive for uniformity and accidentally crush the very innovation that you need to channel from a good developer in order to have a great team.

Know where the business is going

A common way of breaking up stories is around CRUD screens / actions.  It is sometimes easy for a Product Owner to define a series of screens that allow them to administer some set of information.  It is plausible that these screens are built without a conversation about why they are being built, possibly because the PO doesn’t yet know the exact details of how these things are going to interact.  But if that is the case, then when the PO finally specifies how these CRUD things are going to interact in a meaningful way, the team may have to turn around and say that, based on how those screens have been built, it isn’t possible – or at least it is very hacky, and we’re already building a legacy system with bad design.

Knowing something about where the business is going in the next sprint or 3 is useful.  Use it to track how the design is evolving and hopefully take this at least somewhat into account when designing the code.

But don’t future proof too soon

Don’t worry too much about planning for future needs that are trivial to add in a future sprint.  We don’t need to future proof the design.  We only need to know where it might go so that we can perhaps go there more easily later.  We need to know that the short/medium term goals will be supportable by the design we are creating today.  If it isn’t, it doesn’t matter too much, but the cost of change will probably go up if we need to massively refactor the entire system every sprint – so what can we do to help avoid that?

Know where the architecture is going

Systems are continuously evolving.  As architectural issues arise, an idea of what should be done to solve them should be discussed.  If we can agree on the direction that we need to move the overall architecture and understand what it could look like, then any team may be able to start building that within existing work that they have, in an opportunistic way.  If we have no clue where we would like to go then any team will probably continue to do exactly what they are doing now.

Alternatively, any team may start to innovate in different directions and again we may land up with the architecture of the system being pulled in several disparate directions, which may not be beneficial in the long run.

If your teams discuss and plan what the architectural changes could look like, then we know where we’re going and we can start working out the baby steps to get there.  Those baby steps can often begin inside current work.  But without any clue of where we’re going, we can’t even start to take them.

But don’t be too constraining

Architecturally, knowing where you’re going is important – but the micro details shouldn’t be constrained.  In terms of an architectural solution, it shouldn’t matter whether we use Redis or Memcached.  Let the teams converse and decide as a group on the actual solution details when they need the solution, based on the real experience and constraints they are encountering in their teams.

Know sort of where you’re going

The precise destination isn’t important – the general one is.  The details will vary.  The overview of the destination over the next 2-3 months will hopefully grow and evolve.  Despite that, it should remain similar enough to still be recognisably the same destination.

How abstract the destination is depends on the need.  For instance, it might be useful to sketch out the boundaries of several micro-services to have some idea of the destination and to give teams an idea of what they could break out of a larger system.  But don’t be married to those decisions if they turn out to be incorrect.

Use the knowledge just in time

Knowing the destination today should inform the code as a potential destination to refactor towards.  We shouldn’t be focusing solely on the road directly in front of our feet.  We should be looking up into the horizon and gleaning any knowledge we can of what is up ahead.

Knowing where we are going will allow the whole team to pull the code and system in a similar direction.

Knowing sort of where we are going should lead to an informed just in time decision about when to do something with that knowledge.


DDD, Aggregates and designing for change

I have been introducing several domain driven design patterns to the teams that I work with over the last while.  In doing so I have been struck by the repetition of a core principle for many of the DDD patterns.

An example – the aggregate pattern
The aggregate pattern focuses on the business domain, attempting to answer the desire for a software developer to work with fully formed domain objects that interact with each other in a meaningful way.

The aggregate pattern defines the aggregate as a combination of objects that interact as a single unit.  The aggregate is only interacted with via the root of the aggregate.  This ensures that the aggregate root can maintain the integrity of the whole aggregate.

If all interactions with the aggregate go via the aggregate root, we can change the implementation of the aggregate behind the aggregate root’s interface.  The aggregate should be designed as a cohesive unit and is decoupled from the rest of the system via the aggregate root’s interface.  The aggregate root’s interface is the API to the aggregate.
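A minimal sketch of the idea in Ruby (the Order / Line Item names here are just an illustration, not a prescribed design): everything goes through the root, so the root can maintain the aggregate’s integrity in one place.

```ruby
# Hypothetical aggregate: an Order whose line items are an internal
# detail, reachable only through the Order (the aggregate root).
class Order
  def initialize
    @line_items = []   # not exposed directly to the outside world
  end

  # The root's interface is the API to the aggregate.
  def add_line_item(description, price)
    @line_items << { description: description, price: price }
  end

  # Invariants (like the total) are maintained by the root.
  def total
    @line_items.sum { |item| item[:price] }
  end

  def line_item_count
    @line_items.size
  end
end

order = Order.new
order.add_line_item("Book", 20)
order.add_line_item("Pen", 5)
order.total  # => 25
```

Because callers only ever see `add_line_item`, `total` and friends, the hash-based internals can later be refactored into real Line Item objects without touching any caller.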

Tensions
The difficulty is in keeping the aggregate small while still useful.  Tension arises as the aggregate gets bigger: the aggregate’s interface can become complex, because all access must occur through the aggregate root.  This tension may push one towards designing smaller, interoperable aggregates rather than allowing a large ball of mud to form.

Unidirectional?
Just as unidirectional models can be a design choice in general, it may also be an interesting design choice to build an aggregate with unidirectional models – with the aggregate root being the model that knows about (potentially) all the child models.  This may be useful.

My personal preference is to try to keep things unidirectional for as long as it is sensible, but as soon as it isn’t sensible allow connections in both directions.  The aggregate root is controlling the access to the objects in the aggregate.  Allowing objects in the aggregate to know about each other bi-directionally increases the complexity of the aggregate as a whole but, assuming a reasonably small aggregate, the principle of small contained messes is still supported behind the interface exposed by the aggregate root.

The core
Circling back to the core principle for many of the DDD patterns – the factory pattern, the repository pattern and the aggregate pattern all define mechanisms for creating a well-defined interface and hiding the implementation details behind the interface.

The factory pattern provides a creational interface to build an object.  We get well formed objects from it and don’t need to know how they were formed.

The repository pattern provides an interface that abstracts away object storage / query.  We ask it a question and get objects back.  We tell it to persist something, and it happens.  We do not need to worry how.
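A sketch of that repository idea (the class and method names are illustrative assumptions, with an in-memory Hash standing in for a real database):

```ruby
# A repository exposes a question-and-answer interface; callers never
# see how storage or querying actually works behind it.
class OrderRepository
  def initialize
    @orders = {}   # in-memory storage stands in for a real database
  end

  # Tell it to persist something, and it happens.
  def save(order)
    @orders[order[:id]] = order
  end

  # Ask it a question and get objects back.
  def find(id)
    @orders[id]
  end

  def with_total_over(amount)
    @orders.values.select { |order| order[:total] > amount }
  end
end

repo = OrderRepository.new
repo.save(id: 1, total: 50)
repo.save(id: 2, total: 500)
repo.find(2)               # => { id: 2, total: 500 }
repo.with_total_over(100)  # => [{ id: 2, total: 500 }]
```

Swapping the Hash for SQL, a document store, or an HTTP service changes nothing for the caller – which is exactly the point.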

The aggregate root in the aggregate pattern provides an interface that abstracts away the implementation of the aggregate.

These patterns make code simpler and reduce complexity by clearly defining what should go behind the interface and what should not.  These patterns allow the caller to not worry about the implementation details behind the interface.  All of these patterns support the idea of smaller messes.  If the implementation behind the interface is a little messy, it doesn’t matter, as long as it can be refactored safely later and the caller is not influenced at all.

Decouple the interface that the outside world uses from the implementation underneath.  And know why the code is placed behind the interface.  Design a cohesive unit behind the interface.  This is also how TDD encourages code to be written, assuming you can design the interface well.

In practice
When discussing domain driven design versus other design ideas and using test driven design to build software, there are often questions around which pattern to use or how they should interact.  These questions are often attempting to get black and white answers to a complex contextual problem that probably has many shades of grey in it.

What has struck me most about introducing DDD after introducing TDD and emergent / evolutionary design is that the core principle is to contain the messes behind the interfaces.  Have good interfaces and decouple them.  And ensure the implementation behind the interface is cohesive and can change freely as needed.  That is the core of software design and most patterns.  How do I change later?  How do I keep in control of the code so that when change comes (and it will come! Especially in unexpected ways) it does not accidentally impact the rest of the system?

Focus on embracing change
Worry less about the patterns than about what the patterns are trying to teach you.  Use the patterns, understand them; they are a language that can be used effectively among developers.  But the patterns are not the end game – the changeable system that they encourage is.

Keeping it Simple – unidirectional models

Keeping code simple is an art.  It may require a reasonable amount of work – refactoring and trying design experiments – to keep it as simple as possible.  Changes to requirements may mean the current design is no longer the simplest thing.

One way to attempt to keep things simple is to reduce complexity by not implementing anything that a current use case does not need.  That way, changing the design only has to worry about code that is being used – there is no guessing game.

Another way to attempt to keep things simple is to reduce the paths that can be taken through your code base to reach a given point.  Making small cohesive units that can be reasoned about with a well-defined interface to the rest of the code base ensures you can change those cohesive units more easily.

Adding complexity

A way to increase complexity is adding unnecessary code.  This is often done because “it’s obvious that it is the right thing to do” or “we’ll probably need it” rather than driven from any actual use case.

An example of this can often be seen when using an ORM.  Given a model that is composed of attributes and a list of another type of model, a common pattern is for both sides to know about each other.  The parent model will have a list of children and the child model has a reference back to the parent model.  This is particularly obvious when the design is data driven (driven from the database tables) and the designer feels comfortable just grabbing any table / model and loading something from it and working from that view of the data.

A classic modelling example is an implementation of an Order.  An Order may have many Line Items in a list.  In the standard way that typical ORMs encourage you, the Order model will know about its Line Items and a Line Item will know about its Order.  This means that I can load an Order object and expect to be able to access its Line Items.  It also means that I can load an individual Line Item and directly access its Order.  And then I could use that Order to get the Line Items, one of which was the original object that started the request.
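In plain Ruby, the bidirectional linkage a typical ORM generates looks roughly like this (a deliberately bare sketch – a real ORM would add persistence, caching, and lazy loading on top):

```ruby
# The Order knows its LineItems, and each LineItem knows its Order --
# the back reference a typical ORM maintains for you automatically.
class Order
  attr_reader :line_items

  def initialize
    @line_items = []
  end

  def add_line_item(line_item)
    @line_items << line_item
    line_item.order = self   # the back reference the ORM would set
  end
end

class LineItem
  attr_accessor :order
end

order = Order.new
item = LineItem.new
order.add_line_item(item)

# Starting from either side we can walk the cycle back to where we began.
item.order.line_items.first.equal?(item)  # => true
```

That last line is the seed of the problems below: every object is an entry point into the whole graph, and the graph loops.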

What’s wrong with that?

In the simplest implementation with limited complexity this may be fine.  But it could lead to some problems as the complexity of the solution grows.

Cyclic dependencies

One problem that can arise is cyclic dependencies – where the parent can call a child and the child can then call the parent and around we go.  This may be hard to reason about – particularly as the object graph grows in size.

Maintaining a cohesive world

These convenience methods increase complexity.  Having a design that can be entered from anywhere and needs to remain cohesive in any direction from that entry point increases complexity.  Additional code may be written to compensate.

It can lead to needing the world to be cohesive no matter which object I load and manipulate.  Allowing the Order to know about the Line Item and the Line Item to know about the Order opens up the option of adding a Line Item separately from an Order, which may not be valid in the domain.  We may always expect a Line Item to have an Order, and this could be violated.  To resolve this, we need to add validation on the Line Item to ensure there is an Order.

A potential design may be that changes to a Line Item may expect changes on the Order model.  If we can load the Line Item first, then we need to solve the consistency of the Order model.  Clearly we should add callbacks when the Line Item model saves in order to ensure the Order is up to date.

Maybe we have a tax amount stored on the Order.  When this updates, all Line Items need their totals to be updated due to the change in the tax percentage.  Clearly we should add callbacks when the Order saves to update this.

If I can access a Line Item directly, I can also change its total.  This may require the Order’s total to be updated.  Clearly we should add callbacks to update the Order when the Line Item is saved.

Now I’ve written a bunch of callbacks in order to keep my domain valid because I can load anything and use it.  I also may have introduced a cyclic dependency and a cascade of DB updates that I don’t control very well.  When I save the Order, all Line Items will be loaded and saved.  Each Line Item may change the total value in the Order… and if we do this badly we’ll get stuck in an unexpected loop.

Increased testing for unsupported use cases

In an ideal world, every use case that you build should have a test.  Putting in the back connections between two models should be driven by that use case.  If you don’t have a failing test case, don’t write the code.  If you don’t have a use for the code… don’t write the code.  But if you’re writing the code, write the test.  Which increases the testing burden – for unsupported use cases.

Containing complexity

Instead of providing inconvenient convenience methods, why not contain the complexity and not provide it?

Unidirectional linkages

For a start, could we make all models talk in only one direction?  What would that do to the code?  Suddenly we have reduced complexity.  In order to load one model, it can only be accessed one way.  There is no expectation of the other way.

For instance, assume that the Order knows about its Line Items.  But the Line Item does not have a back reference to the Order.  What would that mean?

A Line Item can never be created without an Order.

A Line Item is updated through the Order, therefore any consistency that the Order needs to maintain due to changes in the Line Item are simply done in the Order.

When the Order’s tax is updated, we now have a design decision – do we update all Line Items at Order save time?  Or do we update Line Items as we pass them out of the Order?  Suddenly we have control over the performance characteristics of this update.

The Order / Line Item relationship is now easier to maintain and reason about as the expectations of the relationship are simpler.

Line Items may still be read directly, but the design states that there is no expectation of accessing the Order.  In order to access the Order, use the order_id that may be in the Line Item object to directly load the Order and get a fresh object.
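A sketch of that unidirectional design in plain Ruby (the names and the tax behaviour are illustrative assumptions): the Line Item carries only an order_id, never an Order object, and the Order resolves tax consistency in one place.

```ruby
# Unidirectional: Order -> LineItem only.
class LineItem
  attr_reader :order_id, :total

  def initialize(order_id, total)
    @order_id = order_id   # a plain id, not a back reference to an Order
    @total = total
  end

  def with_tax(rate)
    LineItem.new(order_id, total * (1 + rate))
  end
end

class Order
  attr_reader :id
  attr_accessor :tax_rate

  def initialize(id)
    @id = id
    @tax_rate = 0.0
    @line_items = []
  end

  # A Line Item can never be created without an Order.
  def add_line_item(total)
    @line_items << LineItem.new(id, total)
  end

  # Design decision made explicit: tax is applied as items pass out of
  # the Order, rather than by rewriting every Line Item at save time.
  def line_items
    @line_items.map { |item| item.with_tax(tax_rate) }
  end
end

order = Order.new(1)
order.add_line_item(100)
order.tax_rate = 0.5
order.line_items.first.total  # => 150.0
```

No callbacks, no cycle: changing the tax rate is just an Order concern, and the performance trade-off (update at save time vs. on the way out) is a visible choice in one method.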

But that is more work!

Suddenly we have a little more work.  The little more work is clear and obvious.  We can’t load a Line Item and expect it to just get the Order.

I would argue that the backlinks have similar (and potentially significantly more) complexity – but that isn’t visible until later when you realise that the Order needs updating when you save the Line Item that you loaded directly.

Not putting in the back links acknowledges the real work to be done and the thinking required instead of pretending it isn’t there.

Let’s go further – the Aggregate Pattern

What would happen if we defined a model like the Order / Line Item relationship, but only allowed interaction with the Order / Line Item cohesive unit via the Order?  What would happen if, whenever we interacted with the model, it was always fully loaded and well defined, and hence our expectations were well defined?

This is the aggregate pattern, a pattern popularised by domain driven design.  The Order model would be the root of the aggregate.  A Line Item would not be accessible to the domain except via the Order.  A design choice could be to only allow new Line Items to be created via the Order.  Suddenly we have control of all interactions with the models.
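One way to enforce that design choice in Ruby (a sketch, assuming we want construction locked down at the language level) is to nest the Line Item inside the Order and make its constructor private, so only the Order can build one:

```ruby
class Order
  class LineItem
    attr_reader :description, :total

    def initialize(description, total)
      @description = description
      @total = total
    end

    private_class_method :new   # cannot be constructed outside the Order
  end

  def initialize
    @line_items = []
  end

  # The only way a LineItem comes into existence.
  def add_line_item(description, total)
    @line_items << LineItem.send(:new, description, total)
    self
  end

  def line_item_totals
    @line_items.map(&:total)
  end
end

order = Order.new.add_line_item("Tour", 100)
order.line_item_totals        # => [100]
# Order::LineItem.new(...)    # would raise NoMethodError: private method
```

The `send(:new)` inside the Order is the one sanctioned back door; everywhere else, the aggregate root is the only way in.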

There is more to the aggregate pattern and there are more implications and ideas, but I’ll write on that another time.

Keep it simple

Why not try to keep things simple instead of assuming things are necessary?  Experiment with patterns that have “extreme” ideas, as maybe there is something to learn.  Choosing constraints in your design that help simplicity and reasoning – even when it means slightly more explicit coding – can be a liberating thing.

Maintain your confidence in the face of change.  Keep it simple.

 

A potential caveat

Maybe your ORM needs the backlinks to do some of its magic.  If that is the case, be aware that it is the tool, not the domain, forcing the design choice.

Discoverability – A naming choice

When reading or refactoring code, it is important to be able to easily find all callers of a method or to go to the implementation of a method.  In static languages, IDEs make this reasonably easy.  However, if reflection is being used, they may still fail us.  It is simply easier to parse a static language to know the types involved and find the right ones to tell us about.  In dynamic languages that is much harder.

In a dynamic language, a key way to find all references to an object’s method call is “Find in Files”.  This means what we choose to name things may make it harder or easier to change later.  It may also make it harder to discover who is calling the method – or even that the method exists.

A unique name

In order to refactor a uniquely named method on a class:

  • search for the method name as a string
  • rename

As we know it is unique, this will work.  In fact, you might be able to run a simple find and replace in files instead of looking at each result individually.

This scenario however is unlikely.  At least it is unlikely that we emphatically know that a given method name is unique.

A more likely scenario

In order to refactor a descriptively named method such as full_price_for_tour on a Tour class:

  • search for the method name as a string
  • in each search result – check all references to the method name to see if they are in fact using a Tour object
  • if this is a Tour object call, rename the method call to the new name.

This is more work as we need to look at each result.  Hopefully with a descriptively named method the number of usages will not be too high.  Even if the number of usages is high, hopefully all usages of the name will in fact be on the Tour class.

However, we do need to look at each result as this process is potentially error prone.  There could be other method definitions using the same name that we need to NOT rename.  Hopefully there are tests that will tell us if we fail.  And hopefully the number of callers to change isn’t too high due to the descriptiveness of the method so the changes to the callers is clear.

Sometimes the results are less simple

Now imagine repeating the above exercise, but now the name of the method to refactor is name.  Suddenly we may have a huge number of hits with many classes exposing a name method for their instances.  Now the ratio of search result hits that are to be updated is no longer almost 100%.  The probability of error is much higher – the greater the number of hits, the more actual choices that need to be made.

An IDE may help

Immediately the IDE lovers will point out that using an IDE is the solution.  And yes, it could help.  But IDEs for dynamic languages are generally slow and CPU/memory intensive, as the problem is a hard one to solve.  And they won’t always be correct.  So you will still need to employ strategies using a human mind.

Naming things more explicitly can help

A more useful model – even if you’re using an IDE – is to name things descriptively, without being silly.  Things like tour_name and operator_name instead of name may help someone discover where / how a method is being used more easily.

Designing code to only expose a given interface can help

Building cohesive units of code that only interact through a well defined interface makes changing things behind the interface a lot easier.  However, it still doesn’t prevent developers from reaching in behind the curtain and using internals that they should not.  So you will still need to check.  Hopefully code that breaks the design like this will be caught before it gets merged into the mainline, but you never truly know without looking.

Reducing scope where possible can help

Knowing the scope of access of the thing you need to change can make changing it easier, as it reduces the area you need to look in.  For example, if something is a private method then we know that as long as all usages in this class are updated, we are completely free to change it. Unless someone has used send to call the private method from somewhere else…  Or we are mixing in a module that uses the private method from there… Both of which I’d like to think no one would be silly enough to do.
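For anyone who hasn’t been bitten by this: Ruby’s `send` bypasses private visibility entirely, which is why “private, so only this class calls it” is only mostly true (the Tour class here is a made-up example):

```ruby
class Tour
  def describe
    "Tour: #{internal_name}"
  end

  private

  # Private, so in theory only this class uses it.
  def internal_name
    "Garden Route"
  end
end

tour = Tour.new
tour.describe              # => "Tour: Garden Route"
tour.send(:internal_name)  # => "Garden Route" -- send bypasses private
# tour.internal_name       # would raise NoMethodError (private method)
```

So reduced scope shrinks the search area, but a find-in-files for the method name is still the honest check.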

Testing can help

Obviously having a comprehensive test suite that calls all implemented permutations of the usage of the method will help to validate a change.  It will hopefully help us discover when we have missed updating callers.  Or when we’ve accidentally changed a caller that shouldn’t be changed.  However, if there are name clashes for the new name, it is plausible that the suite won’t give the feedback we expect, so it isn’t a silver bullet if you aren’t naming things well.

Think! 

Think about naming.  Think about discoverability.  Is there something that will make changing this easier in the future?

Think about the cost of making discoverability harder.  Be aware of the implications of a naming choice.  Is there something that can be done to make it easier to safely refactor away from this choice later?

Can we make things worse? Discoverability is a design choice: we can make it easier or harder.

Coding with confidence

Change is inevitable. It is the one constant in software development.

I find optimising to be confident in the face of this inevitable change valuable. I want to be confident that what I’ve done – what I’m releasing – really works.

When changing software I’ve come to ask several questions that help me to be confident in the changes that I make.

  • How will I know when something breaks? Does it matter if some given functionality breaks?
  • How easy is it to understand?
  • How localised would changing this be in the future in any new direction?
  • How confident am I? How can I be more confident?

This leads me to a list of principles that I currently value when writing software.

  • Feedback
  • Simplicity
  • Changeability
  • Ease of Understanding
  • Deliberate Intent

These principles lead me towards certain practices that I currently find useful.

Feedback
In order to know when something breaks a feedback mechanism is needed. Waiting for feedback from an end user that you messed up is far, far too late. The best time for that feedback is the instant the change was made. That is the time when the knowledge is known about what the breaking change is and hence how to fix it. This means testing. Testing at the right levels. Robust testing to ensure that the feedback is as clear, direct and immediate as possible.

Emergent design and Test Driven Development combine to provide feedback. The tests confirm that the implemented requirements are still working.

In order to build something that I’m unsure of, I want to do the smallest thing possible to validate whether what I’m trying to do will work. I cannot write tests when I’m unsure of how it will be called or of how it works. I don’t want to build something only to discover that what I thought I was being called with isn’t actually what I’m being called with.

Unexpected things can happen. If I don’t expect them, I may not plan for them. But often I will implement ways to log or notify the unexpected – for when the unexpected happens. For instance using logs to log unexpected errors, or tools like Airbrake (for error catching and logging) or New Relic (for real time stats).

Simplicity
Systems become complex. Large balls of mud become very complex. If one assumes entropy in software is inevitable, any system will get messy. Instead of one large mess I try to focus on creating lots of small messes. Patterns I use to help towards small, contained messes include the Aggregate pattern; valuing Composition over Inheritance; separation of IO and CPU operations; Pipes and Filters; and Anti-Corruption layers.

Emergent design helps me keep a solution simple by only implementing the required functionality and allowing the design to emerge effectively behind the specific interface.

Changeability
I try to isolate the same domain concepts using DRY. But I try not to isolate accidentally similar concepts. As Sandi Metz puts it – prefer duplication over the wrong abstraction.

Test! Tests provide me feedback on the change that is being made. The expected tests should break. Tests that break unexpectedly should be looked at to understand the unexpected coupling in the system. I try to focus on increasing cohesion over reducing coupling. Increased cohesion provides a better-defined system – while also reducing coupling.

I aim to isolate change with design practices and test practices. The goal is to test at the right level. I try to test the interface so that everything below the interface can be allowed to emerge in whatever sensible design is needed for the requirements now. I encapsulate object creation in tests to allow the creation of the object to change over time without having to change a large number of tests. The goal is that any change in any direction is as painless as possible.  This doesn’t mean predicting the change, but rather isolating the obvious changes and hence limiting their impact when they do change.
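Encapsulating object creation in tests can be as simple as a single builder helper with defaults (Booking and its attributes here are invented for the example):

```ruby
class Booking
  attr_reader :tour_name, :seats

  def initialize(tour_name:, seats:)
    @tour_name = tour_name
    @seats = seats
  end
end

module BookingHelpers
  # The one place tests build a Booking; sensible defaults keep tests terse.
  def build_booking(tour_name: "Default Tour", seats: 1)
    Booking.new(tour_name: tour_name, seats: seats)
  end
end

include BookingHelpers

# Each test states only what it cares about.  If the constructor grows a
# new required argument later, only build_booking changes, not every test.
booking = build_booking(seats: 4)
booking.seats      # => 4
booking.tour_name  # => "Default Tour"
```

The same idea scales up to factory libraries, but the principle is just this: one creation point, so creation can change without a mass edit.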

There should be no area that is too scary or painful to change. When there is, the goal becomes finding ways to reduce the fear or pain – assuming it still needs to change.

Ease of understanding
The ability of someone else (or yourself in 6 months time) to understand the existing code is highly underrated. Code is write once, read many. Every time an issue occurs near a given part of code, the code may need to be understood. If it is clear and obvious and is able to be trusted, it will be quick. If not, it could add 30 minutes or more to understand the code even if the code is not relevant to the problem. The cost of not being able to read and understand the code quickly is high over the lifespan of the system when considering the number of people who will read the code over time in their quest to make changes and solve problems.

I use Behaviour Driven Development to drive what tests to write. If I can’t write the test, or the code cannot be built from the test, then I do not understand what I’m doing yet. The tests describe the intent of the code.

Discoverability – how will someone else discover that the code works like this? I try to ensure that breadcrumbs exist for a future developer to follow. If the knowledge exists for a developer to follow, then they can ask the question: do I need this?

I value explicit over implicit. If developers are joining your team, how will they know how the system works? Code is obvious. Implicit rules are not. Frameworks such as Rails come with a lot of implicit rules. These rules can be very opaque no matter how experienced the new developer is. You don’t know what you don’t know. The power of Rails is the weakness of Rails. But good Rails developers can move between Rails projects and know the implicit rules. But what if there are implicit rules that are specific to the code base? Those are hard to know. Again – you don’t know what you don’t know… until you realise that you didn’t know it and someone tells you – or you go really deep. Either way a lot of time can be unnecessarily lost grappling with the unknown.

Deliberate intent
I try to make conscious, well-reasoned design decisions. My goal is to keep the exposed interface tight and deliberately defined. For any solution I try to think of at least three ways to solve the problem and then make a choice based on the pros and cons. This ensures that I’m thinking deeply about the problem instead of simply implementing the first design thought that arose in my mind.

A continued experiment
These are my principles and practices that I’m experimenting with. Over time I imagine they may change or modify as I continue to experiment and learn.

In order to experiment I generally try applying a constraint and see what it does to the code base. Do we like it? Does it make things better? How do we feel about the result? To paraphrase Kent Beck – If it doesn’t help, stop doing it. If it does, do more!

I continue to look for the experiments that I can do to make things better along with the underlying principles and values that they represent. I hope to blog about some of those experiments in the future.

There is a lot more that could be said about each of these principles. I hope to follow up with related posts digging into different implementations and designs that I have derived from these principles.


Credit
Credit goes to Kent Beck’s Extreme Programming Explained, which focuses on Values, Principles and Practices. I find this an incredibly useful way in which to view software development.

Libraries

Libraries are useful. Using a library to accelerate your development is a great thing. It is fantastic when someone has implemented a generic solution so that my team does not need to learn and implement the significant details of that solution. The business goal is usually not for a team to learn the deep details of a solution, but to achieve the goal. Using an existing solution – for example, an Excel library to export data to the Excel format – instead of writing it ourselves meets the business goal faster (hopefully).

But is it always a better idea to use a library than to write it yourself? And how do you choose between one library and another?

Complex vs. Trivial

It feels reasonable to include a library to do something complex that would take a significant amount of time and effort to achieve.

It feels unreasonable to include a library to do something trivial that can quickly and simply be written yourself.

Adding a library introduces a new external dependency to the code base. It will probably remain a dependency for much of the rest of the lifetime of the feature – which could be a significant length of time. Dependencies come with their own dependencies, and those dependencies with theirs, and so on. External dependencies can cause dependency conflicts. They can cause pain when upgrading. They may stop being supported, or stop working, in the future. For example, some Gems (Ruby library dependencies) were no longer supported when moving from Rails 2 to Rails 3 – which caused significant pain in those migrations, as the dependencies needed replacing either with code written by the team or with a new dependency. Either way, lots of testing will occur.

Use a high percentage of the library

It feels unreasonable to include a large library when you only use a very small percentage of it.

Including a library that has many more reasons to change than your reason for using it means the library will be updated and upgraded for reasons unrelated to why it is in your code. This can cause future upgrade churn with no value. The library may even be obsoleted for reasons unrelated to why you are using it.

A software decision is made at a moment in time

Using a library is a software decision made at a moment in time. At that moment, it might have been the correct solution to the problem. At any moment in the future, it may no longer be. That is the reality of supporting changing business requirements and needs. A question that should always be asked is:

What if this decision is the wrong one? How would we refactor out of this decision?

We all know that in software, we learn more as we go. The key question for any software that we develop should be: how do we refactor out of this decision if we learn more and it is no longer the correct one for the business needs now?

Can you get out of the decision to use a given library? How would the code look to protect against the need to stop using the library and easily replace it with another library or your own code? What would it look like if you tried to do that for a framework choice?

Isolate the External Dependency – Know Thy Seam

Michael Feathers talks about seams in his book Working Effectively with Legacy Code (http://www.amazon.com/Working-Effectively-Legacy-Michael-Feathers/dp/0131177052). Seams are a generically useful pattern. The seam is the place that you can test around, which also means you can replace what is behind the seam with any other code that satisfies the tests, and everything should continue to work.

Single use dependency

If this is the first (and, as far as we know right now, only) time that we will use this dependency, then wrap its usage in tests inside the class that uses it.

For instance, if you’re using an Excel exporter library and this is the only place that will ever use it, ensure that it is only used / included in this component / class and not auto-loaded throughout the entire code base. Then ensure that the class using it is covered by tests for the scenarios that use the library. This class is then the seam that allows you to keep the library under control.

To continue with the Excel example – make sure you load Excel files in your tests, and hence test your code’s usage of the library and its expectations of how Excel files will be represented to your code. If you do not, and the library changes, you will not know whether the library has broken your code.
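A minimal sketch of this single-use seam might look like the following. The names here are hypothetical, and `ExcelGem` is a hand-rolled stand-in for whatever third-party Excel library you actually use, so that the example runs on its own.

```ruby
# Hypothetical stand-in for a third-party Excel library --
# in real code this would be `require "some_excel_gem"` instead.
module ExcelGem
  def self.generate(rows)
    rows.map { |r| r.join("\t") }.join("\n")
  end
end

# The library is required and used ONLY inside this class.
# The class is the seam: tests cover it, and replacing the
# gem later means touching only this one file.
class ContactExporter
  def export(contacts)
    rows = contacts.map { |c| [c[:firstname], c[:lastname]] }
    ExcelGem.generate(rows)
  end
end
```

Tests then exercise `ContactExporter` with real inputs, so a change in the library’s behaviour shows up as a failing test here, not as a surprise elsewhere.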

Multi use dependency

If this is a generic piece of code that will be used in many places, wrap it in your own wrapper, ensure the library is only used / included in the wrapper and not auto-loaded throughout the entire code base, and test the wrapper and its interactions. The wrapper is now the seam that allows you to keep the library under control and easily replaceable.

For the Excel library example, make sure representative Excel files that you expect to interact with are covered with tests.
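The multi-use case might be sketched like this. Again the names are hypothetical and `ExcelGem` stands in for the real library; the point is that only `Spreadsheet` mentions it.

```ruby
# Hypothetical stand-in for a third-party Excel library.
module ExcelGem
  def self.generate(rows)
    rows.map { |r| r.join("\t") }.join("\n")
  end
end

# Our own wrapper: the only place in the code base that
# mentions ExcelGem. Everything else depends on Spreadsheet,
# so swapping the gem means rewriting one class.
class Spreadsheet
  def initialize
    @rows = []
  end

  def add_row(values)
    @rows << values
    self
  end

  def to_excel
    ExcelGem.generate(@rows)
  end
end
```

Callers build and export through `Spreadsheet`; the wrapper’s tests pin down the behaviour the rest of the system relies on.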

Don’t be naïve about the altruistic nature of library writers

The thing about frameworks and libraries is that they are written by people who are (most likely) experiencing different forces from the ones you are experiencing in your code base. Their decisions are dictated by the forces they are experiencing, not by the forces experienced by the business you are writing code for.

Things change over time. Decisions that are made now may change in the future. Your needs of a library may change. The direction and intent of the library may change. You may in fact be using some magic of the library in an unanticipated way, which may break for you in the future due to a library code change. Things change, things migrate and upgrade. It isn’t the intent of a library creator to mess you up, but you are making decisions about using their code while not being in control of their code. This may cause you (or, more to the point, the business whose code it is) a bunch of pain in the future. So ensure that you remain in control when their change causes you pain.

Recommended Do Not Do Suggestions…

Do not smear a library around multiple classes. Later you may discover you can no longer use it. In Rails, many Gems provide generic ‘helpful’ solutions, like attachments that hook on to your models. But what if, on migrating to a new Rails version, the Gem is no longer available? If you are not in control, a lot of rework is required to dig yourself out of the hole. I have seen this in a Rails 2 to Rails 3 migration, which resulted in significant problems with Gems that were no longer available.

Do not hook yourself so tightly to the library of choice, in so many places, that it is hard to replace. .NET comes with a reporting library which was completely rewritten by Microsoft between Visual Studio 2005 and Visual Studio 2008, and the old reporting components could no longer be loaded in VS2008. With a lot of reports, a limited abstraction layer and high coupling to the .NET reporting components, a lot of code needed rewriting to migrate to the newer version. Be aware that even when you think the choice is obvious and will never change, the library writer may force you to change and prove your decision not to protect yourself incorrect.

The economics of the issue

Fast now always looks better than marginally slower now for the potential of being faster in 2 years’ time. That may be a valid trade-off for a company that has no money and is desperately attempting to survive. But once you’ve moved from survival mode to needing the agility to change as required, the trade-off becomes easy. And some may argue you want agility in survival mode just as much – or even more.

Stay in control

Keep control of your code and your dependencies. Make it easy to replace things. Any time a decision that was made is no longer a good one, refactor to replace.

And always question how you will refactor away from any given external dependency. Knowing that will ensure that your code will remain agile.

Frameworks

Frameworks are awesome. Web frameworks such as Rails, .NET MVC, Django, Node, Flask and Sinatra are powerful accelerators of software development. They provide scaffolding that you can use but don’t have to write or maintain yourself.

Frameworks are alluring. They focus on providing features that accelerate development and are eminently demoable, and developers are encouraged to use all the features, everywhere. But those demos never seem to discuss the problems that could arise from using a feature as the system grows over the years, nor how to protect the system against how the feature may change.

Experience

My experience over the years, on large and small systems, is that extensive coupling to a framework across the whole code base is almost always a bad idea. For a small enough system it can work. But systems rarely stay small without a lot of premeditated effort.

I’ve seen front-end JavaScript messes where the only way out is a complete rewrite, because the framework wants to own the world and the technology choices are no longer correct for the business requirements. I’ve seen back-end .NET, Java and Ruby implementations where you’re encouraged to build systems tightly coupled to the framework.

Systems make it easier or harder to do the right thing. A system that makes it easy to do the right thing allows its developers to fall into the pit of success, no matter the strength of the developer. Frameworks far too often encourage designs that are tightly coupled to the framework. As a result, better developers are needed in order to ensure a good, sustainable design.

An alternative design

An alternative design is to keep the framework at the edges of the system. This is similar to (but possibly not as hardcore as) the hexagonal architecture / clean architecture / screaming architecture. [https://blog.8thlight.com/uncle-bob/2012/08/13/the-clean-architecture.html]

Controllers, in the Rails and .NET MVC sense, are about converting the web into API calls to your code. Views are about displaying the information that they need. Database objects – active record or inside a repository – are for accessing the database. The business logic that the controllers call, that gets presented to the view, that transforms inputs into models that can be persisted – that doesn’t need a framework. Any code written in your language that does not rely on your framework will not break when the framework is upgraded – or replaced.

When using a database, keep all data access behind an API that returns single objects, or lists of objects, that you can use. Pass those objects to be manipulated by your system’s logic in non-framework code. Save the results back to the DB by passing objects (or hashes) to the database API.

Implementing the database code behind an interface means you can change it at will – even swapping out a database implementation if you must. It is a single seam for data access of the application. Keeping database access at the edges of your code allows optimising database calls in one place instead of having multiple calls spread throughout the business logic code. This may result in loading more data at one time initially, but deal with a memory problem when you have one, rather than smearing your logic and your framework code together across your code base.
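Such a data-access seam might be sketched as follows. The names are hypothetical, and the repository is backed by an in-memory array here so the sketch runs on its own; in a Rails app the same interface would hide the ActiveRecord queries.

```ruby
# Plain value object: business logic works on these,
# never on framework model classes.
Contact = Struct.new(:firstname, :lastname, keyword_init: true)

# Hypothetical data-access API: the single seam for persistence.
# Backed by an in-memory array in this sketch; a real
# implementation would hide ActiveRecord behind the same methods.
class ContactRepository
  def initialize(records = [])
    @records = records
  end

  def fetch_for_name(firstname, lastname)
    @records.select { |c| c.firstname == firstname && c.lastname == lastname }
  end

  def save(contact)
    @records << contact
    contact
  end
end
```

Because everything goes through the repository, the database implementation can be swapped, and query optimisation lives in one place.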

For example:

def controller_export_method
  # Get the parameters from the web
  firstname = params[:firstname]
  lastname = params[:lastname]

  # Fetch objects from the DB
  contacts = Contacts.fetch_for_name(firstname, lastname)

  # Do something with the objects
  results = ContactExportFormat.transform(contacts)

  # Persist to the DB / export and assign the result for the view to display
  @result = ContactExportLog.persist(results)

  # send_file / send_data the export
end

Selectively break the rules, when you must

When you need to optimize some code due to speed or memory constraints, then potentially mix logic, database and other external accesses – but as a last resort. Lean towards enumerable objects that do lazy evaluation and yield to the business logic, so that these dependencies stay clear and separate. Start with clean, clear separations, and only muddle them for a clear and obvious external business goal such as performance.
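The lazy-enumerable approach might look like this sketch. The names are hypothetical, and the data source is a hard-coded array standing in for a streaming DB cursor, so the example runs on its own.

```ruby
# Hypothetical data source: exposes rows as a lazy enumerator,
# so callers pull one row at a time without knowing where the
# rows come from.
def each_contact_row
  return to_enum(:each_contact_row).lazy unless block_given?
  # Stand-in for a DB cursor; imagine rows streaming from the
  # database here rather than a literal array.
  [["Ada", "Lovelace"], ["Grace", "Hopper"]].each { |row| yield row }
end

# Pure business logic: no database, no framework. It only sees
# an enumerable of rows, so the dependency stays at the edge.
def format_names(rows)
  rows.map { |first, last| "#{last}, #{first}" }
end
```

Because `format_names` receives a lazy enumerator, only as many rows as the caller consumes are ever fetched, while the logic itself stays cleanly separated from the data access.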

Focused Usage

Focused functionality provided by a framework is great. For example, routing to a controller is a common pattern implemented by many web frameworks. It is clear, contained and testable. It does a specific thing, and it can be tested and proven to work.

Unfocused, extensive framework usage is a problem. The understanding of the reason for its usage can be lost. Once it is, it becomes incredibly difficult (if not impossible) to get it safely under control again, as no one is really sure what it is doing or why it is there.

Focused usage allows change to be isolated. If the framework changes how params are dealt with, and we know we only deal with params in controllers, then we know where to look to fix it. If we’ve allowed params to be used randomly throughout the system, we’re in trouble. Isolate change. And make sure the expected behaviours are codified in tests, so that if the framework changes, we know things still work.
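The params isolation described above might be sketched like this. The names are hypothetical, and a plain hash stands in for the framework’s params object so the example runs on its own.

```ruby
# Pure business logic: knows nothing about params or the framework,
# so it is testable without a web stack and immune to params changes.
def full_name(firstname, lastname)
  "#{firstname} #{lastname}"
end

# Framework-facing edge: the ONLY place that touches the params
# hash. If the framework changes how params work, only this thin
# layer needs fixing.
def show_controller_action(params)
  full_name(params[:firstname], params[:lastname])
end
```

Everything below the controller receives plain values, so a framework change to params is a one-layer fix rather than a system-wide hunt.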

The Framework Moves

Frameworks move at a steady pace. Bugs are fixed, security patches are applied, features are added and deprecated. All of this happens at a reasonable rate. For most code bases that live for more than a couple of years, the framework will be upgraded multiple times. The less code that is coupled to the framework, the less painful each upgrade will be.

And so

If all your code inherits from a framework class, you are doing something wrong.

Limit frameworks to well understood, localised and controllable things. These can then easily be replaced with another version of the framework.

Frameworks that force you to design in a specific way, smeared around your entire code base, will probably cause a world of pain in the long term. (Or, let’s be honest, probably not for you, but for the business who owns the code and the poor soul who now maintains it.)

It can be faster to just do it the framework way initially – in the short term. But it might not be responsible in the long term. The economics of short-term gain over potential later cost are hard. But if you expect the code to be around for more than 2 years, they should stop being hard.

Frameworks are awesome.

Held at arm’s length. Under your control.

But keep in mind, non-framework code will always survive a framework change.