Getting data out of your system

We have data. It needs sharing.  What are the options?

One must assume that the data is required, so sharing it is necessary.

We could share the database: I’ve written about burning my fingers on doing this as well as the drawbacks of doing it if you’re forced to.  So what are other options?

If we do not provide SQL as our API, then we need to build something to provide that.

We could build an HTTP API to be called.  This API could provide the state of the data now, or we could attempt to provide a solution that provides a flow of events.

We could write data to another database and that is the contract that we share.  This still provides SQL as the API but at least the main application database can change at will as long as the contract into the secondary database is maintained and supported.

We could publish events about the data and another service could listen to those events, read the data and make it available in another database.  That database is then the contract with the consumer(s).
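As a rough illustration of the event option, here is a minimal sketch in Python using only the standard library sqlite3 module.  The event shapes, the shared_customers table and the project function are all invented for this example; a real system would receive events from a message broker rather than a list.

```python
import sqlite3

# Hypothetical events emitted by the publishing application.
events = [
    {"type": "customer_created", "id": 1, "name": "Ada"},
    {"type": "customer_renamed", "id": 1, "name": "Ada Lovelace"},
    {"type": "customer_created", "id": 2, "name": "Grace"},
]


def project(conn, event):
    """Apply one event to the consumer-facing table (the shared contract)."""
    if event["type"] == "customer_created":
        conn.execute(
            "INSERT INTO shared_customers (id, name) VALUES (?, ?)",
            (event["id"], event["name"]),
        )
    elif event["type"] == "customer_renamed":
        conn.execute(
            "UPDATE shared_customers SET name = ? WHERE id = ?",
            (event["name"], event["id"]),
        )
    # Unknown event types are ignored, so the publisher can add new ones freely.


# The listening service owns this database; the main application never touches it.
reporting_db = sqlite3.connect(":memory:")
reporting_db.execute(
    "CREATE TABLE shared_customers (id INTEGER PRIMARY KEY, name TEXT)"
)
for event in events:
    project(reporting_db, event)

rows = reporting_db.execute(
    "SELECT id, name FROM shared_customers ORDER BY id"
).fetchall()
```

The consumer only ever sees shared_customers, so the main application database can change as long as the events keep their shape.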

It is inevitable that something needs to write the data somewhere for the other application that wants it to gain access to it.  It will depend on what that application is as to what makes the most sense. But something will need to be built to provide the data.

There are many solutions that will hopefully keep your system changeable.  Find one that does not require large amounts of up-front work or continuous maintenance, and that provides the feedback needed to tell us if we are breaking our consumers.

What are the drawbacks of database integration?

The simplicity of integrating at the database is very appealing.  I’ve been drawn into the trap and got my fingers burnt in the past.  So what are the real drawbacks?

TL;DR

  • Why isn’t this the norm for all 3rd party integrations? Those reasons all apply.
  • Avoid writing from multiple applications
  • Experimenting to learn is different to maintaining something long term.
  • Be aware that sharing the entire database is exposing the entire database as the public API.
  • Not knowing what you can change freely introduces friction.  Reducing friction is a Good Idea.
  • Provide feedback to know when something has changed that should not have. Contract style tests may help.
  • Take steps to limit the exposed API/contract. Agreeing to share only specific tables may help. Maybe only share tables with a certain prefix.

Why isn’t this the norm for all 3rd party integrations?

Start with thinking about why we generally expose APIs instead of providing SQL database access directly.  All of those reasons will be present whether providing the integration point to a 3rd party company or to any other team – internal or external.

Below are some thoughts on what could be some of those problems.

Avoid writing from multiple applications

There are many patterns that try to help avoid updates from disparate places.  OO discourages direct updates through encapsulation – you cannot touch my data except through my interface.  In DDD you can use the aggregate pattern in order to help reason about changes to data which must all go through the Aggregate Root. The functional solution is to make everything immutable.  When using a pattern like Master Data – we’re trying to solve this problem by making an application / service the single source of truth.  All of these mechanisms are intended to help us reason about software. Software that writes to the database from multiple different codebases can be very hard to reason about.

Writing data from multiple applications violates the Don’t Repeat Yourself principle.  Knowledge about the tables, their defaults and their validations all need to be shared across multiple applications.  It is also hard for a developer to discover that a second application is writing to the table.  Doing a search in the repo they are working in will not find the access.  This can result in applications writing different defaults, validations, data types or not being aware of new columns or the meaning of new data.
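To make the problem concrete, here is a small sketch in Python with the standard library sqlite3 module.  The accounts table and the two "applications" are hypothetical; the point is only that a rule living in one codebase is invisible to the other.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, email TEXT, status TEXT)"
)


def app_a_create_account(email):
    # Application A owns the rule "new accounts start as 'active'" in its code.
    conn.execute(
        "INSERT INTO accounts (email, status) VALUES (?, 'active')", (email,)
    )


def app_b_create_account(email):
    # Application B lives in another repo and never learnt about that rule.
    conn.execute("INSERT INTO accounts (email) VALUES (?)", (email,))


app_a_create_account("a@example.com")
app_b_create_account("b@example.com")

# Anything that filters on status now silently misses B's rows.
active = conn.execute(
    "SELECT email FROM accounts WHERE status = 'active'"
).fetchall()
missing_status = conn.execute(
    "SELECT email FROM accounts WHERE status IS NULL"
).fetchall()
```

Searching application A's repository would never reveal app_b_create_account, which is exactly why this kind of drift is hard to discover.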

A mitigation to this is to build a gem to share.  This is great until we do not keep all the applications on the same gem version and therefore we still have the same problems.  We need to lock step deploy every application using the gem to keep them in sync.  We now are building a distributed monolith.

What about read only – surely that is fine?

Maybe.

If you’re doing it to experiment and learn if something (like a 3rd party tool) is valuable, then definitely learn with direct integration.  After we’ve learnt, consider what the long term implementation is.  The pain that grows with this solution is in the long term maintenance and changeability of the solution, not the short term win.

For a long term solution, if you never change the database schema at all, then this is the quickest, easiest and – if you’re 100% right – most painless mechanism.  If you’re wrong, you’re probably in for pain.  How much pain depends on how rapidly things need to change and potentially how much the consumer is actually interested in.

The biggest problem is that we don’t know what will happen in the future, so assuming that we will never change the database in the future may always be a naïve choice.

What’s the big deal?

When you share your database you share your entire schema to be coupled to.  This is like exposing all of your data via an API where the API is every single SQL query that can be made.  Any change that is made to the schema can impact the consumer. If you give the entire schema that means you may need to ask permission for any change to your database schema as you do not know if the change you are making will impact the consumer.

The database schema is the API contract

Any shared API is a shared contract that the team owning the API guarantees to keep for all consumers.  When changing the API there needs to be negotiation and planning.  When we provide our database to another team to consume, the entire database becomes the contract that we guarantee to maintain. And any changes to that contract must be negotiated.

Why is that bad?

If we need to ask permission to change something it will introduce friction.  We need to remember to talk to another team. We then need to actually talk to the other team.  Other teams have their own priorities.  Given the best will in the world, communication still takes non-zero time.  And it will need to happen for every single change we make.  This slows us down, particularly when we’re making a lot of changes.

Friction slows us down.  Can we solve that?

The friction around needing to ask permission can introduce several responses:

  • We could start to work really, really hard to not change data in the database.
  • We could start to think really, really hard up front before we build anything.
  • We could start to think about mechanisms that can reduce the number of things we need to communicate about.

The first two options introduce negative feedback cycles.  We spend more time in order to achieve them instead of less. And, given that things will change and we can’t know everything up front, we may make even more of a mess of the data when optimising for them.

Communication is important, so making sure the things that we’re communicating about are valuable is useful.  It keeps the signal to noise ratio high.  Starting to think of ways to reduce the number of things we are required to communicate about is useful, especially when that does not increase the time spent doing the new thing.  Ideally we should only communicate when the consumer must change.

Can’t we mitigate these problems?

We are smart developers, of course we can introduce mitigations!

We can introduce rules – like adding columns is always allowed.  If we are allowed to add columns that is one change we don’t need to communicate about.  This is a net win as most likely the consumers would generally not notice additive changes. We need to jointly make this rule as it is plausible that consumers could notice and fail.
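Here is one way a consumer could plausibly notice an additive change, sketched in Python with the standard library sqlite3 module.  The orders table and both consumer functions are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total_cents INTEGER)")
conn.execute("INSERT INTO orders (total_cents) VALUES (1250)")


def fragile_consumer():
    # Depends on the number and order of columns via SELECT *.
    return [(order_id, total) for order_id, total in conn.execute("SELECT * FROM orders")]


def robust_consumer():
    # Names the columns it needs, so extra columns do not matter.
    return [
        (order_id, total)
        for order_id, total in conn.execute("SELECT id, total_cents FROM orders")
    ]


before = fragile_consumer()

# The publishing team makes a purely additive change.
conn.execute("ALTER TABLE orders ADD COLUMN currency TEXT DEFAULT 'EUR'")

try:
    fragile_consumer()
    fragile_broke = False
except ValueError:
    # Three columns no longer unpack into two names.
    fragile_broke = True

after = robust_consumer()
```

An "adding columns is always allowed" rule therefore only holds if the consumers also agree to name the columns they read.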

The consumer could document what they are using.  If we can find out what is being used, then we can self-serve.  The only catch is: what if it isn’t up to date?  Humans are fallible so this may happen.  This solution moves the work from the publishing team, who needed to communicate, to the consuming team, who now always need to document.  But we’re still doing that work.  A failure may result in a negative feedback cycle – be better! Document more!  Spend more time achieving something that will fail any time a human forgets.

Any mitigation that does not involve feedback from code / tests will potentially fail at some point.  If we can introduce automated mechanisms that tell us that we’re breaking something we could get ahead of that.  Automation does not forget.  If it is set up to run, it will run every time.  If we can get automated feedback, then we know what we’re impacting and can get ahead of the game.

Contract Tests

A good mitigation might be to come up with solutions similar to those for HTTP APIs – perhaps some of the contract testing ideas could help.  For contract tests, the consuming team writes tests that test the API how they use it.  The team exposing the API cannot change these tests.  Their system needs to keep them alive.  This is a great feedback cycle for exposed APIs.  And then the two teams can negotiate about how their usage can change.
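A contract test against a database could look something like the following sketch, written in Python with the standard library sqlite3 module.  The customers table, the email_column parameter and both function names are assumptions made for illustration, not an established tool.

```python
import sqlite3


def build_publisher_schema(conn, email_column="email"):
    """Stand-in for the publishing team's migrations (hypothetical schema)."""
    conn.execute(
        f"CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, {email_column} TEXT)"
    )


def consumer_contract_check(conn):
    """Owned by the consuming team: runs the exact query they depend on.

    Returns None when the contract holds, or a message when it is broken.
    """
    try:
        conn.execute("SELECT id, email FROM customers").fetchall()
    except sqlite3.OperationalError as error:
        return f"contract broken: {error}"
    return None


# The schema as it is today satisfies the consumer.
current = sqlite3.connect(":memory:")
build_publisher_schema(current)
ok = consumer_contract_check(current)

# A proposed rename is caught in the publisher's own build, before release.
proposed = sqlite3.connect(":memory:")
build_publisher_schema(proposed, email_column="email_address")
broken = consumer_contract_check(proposed)
```

The publishing team runs consumer_contract_check in their pipeline but does not edit it; a failure is the prompt to go and negotiate.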

The negative part about defining a contract on the base schema of a database is that it represents the base model that we’re trying to build and invest in.  This is where we’re experimenting and learning internally to the team and the system.  Having that coupled to an external consumer makes that experimentation harder.

Is there a better way?

Could we define the contract explicitly?  Could we make it work like an HTTP API?  At that point potentially all the rules and expectations that we have around HTTP APIs come into play.

An option could be to populate tables or provide views that the consumer uses. The internal team does not use these tables or views but signs up to them being the API contract for the consumer. This means that we can now apply all the standard mitigations around APIs that we are exposing outside of our system.  This becomes a known space to work in.  This does not make it easy, but it does define the deliberate place where we can support the integration to our database while allowing us to retain control and freely refactor the rest of the database as we wish.
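A minimal sketch of the view option, in Python with the standard library sqlite3 module, might look like this.  The table names, the shared_ prefix and the consumer query are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Internal schema, version 1: free for the owning team to change.
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, full_name TEXT, internal_notes TEXT);
    INSERT INTO customers (full_name, internal_notes) VALUES ('Ada Lovelace', 'VIP');
    -- The view is the only thing the consumer is allowed to read.
    CREATE VIEW shared_customers AS SELECT id, full_name AS name FROM customers;
""")

CONSUMER_QUERY = "SELECT id, name FROM shared_customers ORDER BY id"
before = conn.execute(CONSUMER_QUERY).fetchall()

# Internal refactor: the name is split across two columns in a new table.
# The view is redefined in the same change so the contract keeps its shape.
conn.executescript("""
    DROP VIEW shared_customers;
    CREATE TABLE people (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    INSERT INTO people (id, first_name, last_name) VALUES (1, 'Ada', 'Lovelace');
    DROP TABLE customers;
    CREATE VIEW shared_customers AS
        SELECT id, first_name || ' ' || last_name AS name FROM people;
""")

after = conn.execute(CONSUMER_QUERY).fetchall()
```

The consumer query is untouched by the refactor, and internal_notes was never part of what the consumer could couple to.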

A key idea around refactoring is to build the simplest solution that we can safely refactor out of.  Coupling to a defined, contained interface as a contract feels like a solution that can enable safe refactoring of the underlying code design while keeping the defined contract working.

What does this solution lead to?

Suddenly we have reduced the scope of questions we need to ask when we change the database.  We now know we are fine if the change in the database does not relate to one of the contracted tables / views.  Suddenly a whole category of friction is removed. We can write tests to give us feedback when we are accidentally changing those tables or views.  We do not use them in our systems so we have no reason to change them unless we are changing them at the request of the consumer. Another radical idea would be to use the Dependency Inversion principle and allow the consumer to define what the data should look like.  It is for them after all.
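One possible shape for such a test, sketched in Python with the standard library sqlite3 module, is to record the agreed columns and compare them with what the database actually exposes.  EXPECTED_CONTRACT, the shared_ prefix and the function names are assumptions for the example.

```python
import sqlite3

# The agreed contract: object name -> ordered column names.
EXPECTED_CONTRACT = {"shared_customers": ["id", "name"]}


def exposed_contract(conn, prefix="shared_"):
    """Read the columns of every table or view whose name starts with the prefix."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type IN ('table', 'view') ORDER BY name"
        )
    ]
    contract = {}
    for name in names:
        if name.startswith(prefix):
            cursor = conn.execute(f'SELECT * FROM "{name}" LIMIT 0')
            contract[name] = [column[0] for column in cursor.description]
    return contract


def contract_violations(conn):
    actual = exposed_contract(conn)
    problems = []
    for name, columns in EXPECTED_CONTRACT.items():
        if name not in actual:
            problems.append(f"{name} is missing")
        elif actual[name] != columns:
            problems.append(f"{name} columns changed: {actual[name]}")
    return problems


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, full_name TEXT);
    CREATE VIEW shared_customers AS SELECT id, full_name AS name FROM customers;
""")
initial = contract_violations(conn)

# Changing an internal table raises no questions.
conn.execute("ALTER TABLE customers ADD COLUMN internal_notes TEXT")
after_internal_change = contract_violations(conn)

# Accidentally changing the shared view is reported.
conn.executescript("""
    DROP VIEW shared_customers;
    CREATE VIEW shared_customers AS SELECT id, full_name AS customer_name FROM customers;
""")
after_view_change = contract_violations(conn)
```

If the consumer defines EXPECTED_CONTRACT, this is also a small step towards the Dependency Inversion idea.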

This sounds like a lot of work.  Can’t we just use database refactoring techniques to solve this?

The mechanisms for database refactoring are awesome.  They make your database more fluid just like code refactoring makes your code more fluid.

When you own all of the code, code refactoring flows in small steps that allow you to incrementally deploy software over and over again.  It is an awesome engineering feat.

When you own all of the database, database refactoring allows changes to the database to happen in small steps that allow you to incrementally deploy changes across the software and database over and over again.  It is another awesome engineering feat.

Code refactoring slows down and has friction when the edge of the code that you are refactoring is shared by another consumer, for instance an HTTP API or a class exposed by a Ruby Gem.  The steps to quickly change the system run into friction around communication, planning and the potential that you might need to wait a long time before you can finally remove the interim code that is helping make the progression from one form of system to another.

Database refactoring slows down and has friction when the database that you are refactoring is shared by another consumer.  The steps to quickly change the system run into friction around communication, planning and the potential that you might need to wait a long time before you can finally remove the interim code that is helping make the progression from one database design to another.

Database refactoring is really useful for helping understand how to change a legacy database shared by another consumer out of your control.  It moves it from static and scary to change, to slow moving and more malleable.

Database refactoring is really useful for helping speed up your changes in data when you own the whole stack.  It allows you to incrementally change the whole system in a far more fluid way.

Database refactoring doesn’t solve the communication and waiting overheads that come with co-ordinating with other consumers.  You can only clean up your refactoring as fast as the consumer follows your changes.  You still need to communicate about every change in case it has impact or introduce mitigations for that as described so far.
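The waiting can be seen in a small column rename, sketched here in Python with the standard library sqlite3 module.  The customers table, the column names and the trigger are hypothetical, and the trigger stands in for whatever interim code a real refactoring would use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, fname TEXT)")
conn.execute("INSERT INTO customers (fname) VALUES ('Ada')")

# Step 1 (expand): add the new column and backfill it from the old one.
conn.execute("ALTER TABLE customers ADD COLUMN first_name TEXT")
conn.execute("UPDATE customers SET first_name = fname")

# Step 2 (interim code): keep the new column filled while old writers remain.
conn.execute("""
    CREATE TRIGGER sync_first_name AFTER INSERT ON customers
    WHEN NEW.first_name IS NULL
    BEGIN
        UPDATE customers SET first_name = NEW.fname WHERE id = NEW.id;
    END
""")

# A consumer outside our control still writes only the old column.
conn.execute("INSERT INTO customers (fname) VALUES ('Grace')")

new_shape = conn.execute("SELECT first_name FROM customers ORDER BY id").fetchall()
old_shape = conn.execute("SELECT fname FROM customers ORDER BY id").fetchall()

# Step 3 (contract): dropping fname and the trigger has to wait until every
# consumer has moved to first_name. That date is not ours to choose.
```

Steps 1 and 2 can be deployed today; step 3 stays on the backlog for as long as the slowest consumer takes.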

Another problem… data is not always meaningful without code

The data in an application’s database might need code to make it mean something to the business domain.  Keeping this in sync across multiple applications accessing the database directly can also lead to pain and violates DRY.  Deliberately providing data that has meaning is far more useful – whether that is written directly into database tables where that interpretation is already applied, or published out in an API that uses the code to do the interpretation.  The temptation here is to introduce a library to share to make sense of the data in the database.  But then we have to ensure every consumer is using the same version of the library…
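As an illustration, consider a table whose columns only make sense alongside application code.  This sketch uses Python with the standard library sqlite3 module; the orders table, the STATUS_LABELS mapping and publish_orders are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, status INTEGER, total_cents INTEGER);
    INSERT INTO orders (status, total_cents) VALUES (0, 1250), (2, 9900);
""")

# Knowledge that lives only in the owning application's code.
STATUS_LABELS = {0: "pending", 1: "paid", 2: "refunded"}


def publish_orders(conn):
    """Interpret raw rows once, in the owning application, before sharing them."""
    return [
        {"id": order_id, "status": STATUS_LABELS[status], "total": total_cents / 100}
        for order_id, status, total_cents in conn.execute(
            "SELECT id, status, total_cents FROM orders ORDER BY id"
        )
    ]


# What a consumer reading the table directly would see.
raw = conn.execute("SELECT status, total_cents FROM orders ORDER BY id").fetchall()

# What the owning application can publish instead.
published = publish_orders(conn)
```

A consumer reading raw has to guess what status 2 means; a consumer reading published does not, and the mapping stays in one place.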

Change is inevitable, how fast you embrace it is up to you.

I work in fluid environments that accept that code will change.  I accept that we don’t know everything.  We will change the systems we build as new knowledge and designs emerge.  Any team that needs to co-ordinate with another team to ask permission to make a change slows down.  There will be friction in refactoring at the shared edge of our systems.  I prefer to live in hope that we will be able to make more and more changes, to experiment and innovate with new products and ideas simply because we choose not to embrace solutions that will slow us down. Solutions that keep generating recurring work aren’t great.  Solutions that stop us from having to do a certain category of work speed us up. Work not done is time available to do other useful work.

Given a new or evolving data schema and database level integration – make sure you keep aware of the drawbacks and protect yourself.

  • Reduce the scope that is consumed.
  • Increase the feedback around what is consumed.
  • Inspect and adapt the pain that you do experience.

Hopefully these thoughts will help some others to avoid getting as many fingers burnt as I have in the past.

 

Burning my fingers with database integration

On the whole integration databases lead to serious problems because the database becomes a point of coupling between the applications that access it. This is usually a deep coupling that significantly increases the risk involved in changing those applications and making it harder to evolve them. As a result most software architects that I respect take the view that integration databases should be avoided https://martinfowler.com/bliki/IntegrationDatabase.html

I have learnt this lesson painfully in the past.  The clearest example is an application which was exposing a new API to replace an older one in a large monolithic app.  The desire was to build a model that better represented the domain. The implementer was going to build it in a new rails application.  It would be the new API and it would represent a better view of the system. At the same time, it was viewed that this API would not change “too much” after it was released.  So when reviewing and conversing about the solution and how it would be maintained, the fateful “it is not going to change, so let’s move forward and not worry about maintainability now” was stated.

The decision in this instance revolved around either sharing the model as a Gem between the two applications, or sharing it as views and having the new application consume the views.  Having a model as a Gem for a concrete implementation of data is not a great idea.  The Gem is the latest representation of the database schema, so all consumers must always have the latest version deployed.  Any time a schema change happens, every consumer containing the Gem should be redeployed – even if the change has no bearing on a given consumer – so that they all interact with the data correctly.  This is particularly true if the model includes code to interpret what is in the database columns meaningfully.

In this instance we chose to share views to share the data.  If we had multiple applications owning the schema we could get into trouble managing changes to the migrations from multiple sources.  To resolve this we chose to make the primary application own the database schema. That meant any view change required for the new API needed to happen in the primary application first.  It would also need to be deployed to live before the API code could consume it.

Remember the “it’s not going to change” comment?  Six months later we decided that this API was useful and started to use it – both directly and from a search service.  Long story short, we did change the data a lot and this design choice became a real pain.  Changes only needed in data for the API caused multiple repos to change and be deployed.  Deployment could be painful as sometimes the different database schemas were out of sync. Deploying an older version of the main application would downgrade the views for the API consuming it.  Testing could be painful as we needed a matching test database based on the main application so that the views were up to date for the tests to run in the test database schema.  There were lots and lots of little niggles due to the coupling of the implementation.

The worst part of this choice was when people moved into the project they struggled to understand the coupling.  They would forget the process as too many things were solved by “we should remember to do XX”.  The solutions were too often “humans be better”, not systems providing feedback when the humans inevitably did the wrong thing.  But it is hard to provide automated feedback about changing code in one repository that is affecting another one.

When we did this I was working with the best group of agile software developers I have worked with.  It caused us a lot of pain and we owned all parts of code accessing the database.  It is hard to get out of this path once you have started down it.

“As a result most software architects that I respect take the view that integration databases should be avoided” – Martin Fowler

In retrospect this really does continue to feel like the smarter choice.

Why do some decisions fail?

Gladys has just received a notification from her company’s security guild that they will be adopting a tool to manage application secrets.  She is happy that someone has made the hard decision.  She will happily use it when she next needs to deal with secrets in the applications she works on.

Zanele has just received a notification that to enhance pairing all developers are required to use emacs with the same environment settings.  Zanele is incensed with the decision.

Gladys may have actually forgotten what the decision was by the time she actually needs to use it.  Hopefully she’ll remember that there is a solution and ask.

Zanele is likely to ignore the decision for as long as possible.  She will resent the decision.  She will complain about how management force developers to do things.  Maybe she will check out what is available on the job market tonight.

Two stories, both fictional, but they both speak true.  What is the difference?  Why can some decisions that come from elsewhere be happily embraced while others really cannot?

Gladys is not being expected to change what she does.  She is just happy that someone has made a decision about something that she may not be doing yet or that she does not do frequently.  Gladys is not invested in the solution that is being presented.  Fundamentally she doesn’t really care though she is happy to be provided with a solution.

Zanele cares very deeply about using vim and has honed 10 years of skill using it.  Her setup is exactly as she wants it and all the hot keys are mapped like she wants them.  She has no desire or reason to change but she is being told to.  She has not been given a voice in the conversation in deciding the solution that fundamentally changes what she does on a day to day basis for a problem she may or may not be aware of.

Instinctively Gladys understands that the solution presented is needed.  It hasn’t been explained in detail (or it might have been) but her response is based on a belief that the request is reasonable.

Zanele has been given a solution to a problem.  She does not like the solution and perhaps she can see other ways to solve the problem.  She resents the solution and does not feel it is reasonable.

Making a decision around what Zanele actually does on a day to day basis without consulting or including her in the problem solving is a recipe for push back.  “Making” the decision was not hard.  We’ll just tell all the developers to change what they do.  Getting acceptance that the decision was a good idea – is something completely different and may turn a “good” decision on paper into a failed decision in reality.

The scenarios that play out are different and these examples may be simplistic but the majority of frustration I see is when autonomy of how I do my work is taken away from me.  Ironically, the more autonomy I have about the way that I work, the touchier I may be about what I expect to be in control of and influence.

Make decisions I don’t care about without me

Gladys didn’t care about the decision hence she was happy it was made.  (Someone had to make it.  Thankfully it wasn’t me.)

Include me in decisions I do care about

If Zanele were involved in the decision-making process it is possible that she would agree that the solution presented was the best possible outcome.  Alternatively, she might have provided some other options.  Being involved with the conversation would have given her insight and understanding.  The outcome of the decision might be the same but the emotional response to the decision may be completely different if she is involved.

So how do we know what decisions people don’t care about?

I don’t think we can know what decisions people will care about and hence it is very difficult to know that any specific decision is going to get push back.  A reasonable baseline may be that if you are going to impact what someone actually does, and they actually have to change their current behaviour and actions from existing ones, they are probably going to have an opinion on it.  Even more so, it will fail if those who need to implement the change do not think the problem is worth solving.

The first step is awareness.  In my early days of doing things more collaboratively, I was often reminded by a colleague “have you asked the team” for a decision that I thought was simple and obvious.  Sometimes we are just conditioned to make the obvious decisions even when they impact others far more than ourselves.  (I think I am better at this now…)

Once we are trying to be aware that decisions influence what people do, we should deliberately structure things that allow people to opt in to the conversation about what they care about and out of what they do not care about.  We need to foster a culture where expectation around autonomy and being involved matches with actually being involved.

The decision could be easy.  Achieving adoption might be less so.  Changing what people do is hard.  Being aware of that is useful.

There is much more to be said here.  This is just one aspect, but it is one that has struck me time and again that is too often not acknowledged.  If we start with actively respecting the people impacted by the decision, maybe things would be a little better for everyone overall.

Building a learning culture

Over the last couple of years, I’ve been focusing on how to push the skill / ability level of the teams that I’ve been working in.  For the last little while that has been focused across multiple teams.

Doing this from inside a team is “easy”.  Time can be spent pairing with anyone willing to pair with me.  To inculcate an attitude of understanding, of questioning, of challenging, and of trying new things to understand what it could look like.  In my experience, keeping an open mind and having respectful conversations enables much learning and growth in any developer – which in turn is more valuable for the company. And I keep on learning new things from developers of every experience level along the way.

Things are different when faced with the challenge of growing a learning culture across multiple teams and not being in any given team.  The following are some of my thoughts and experiments around building a learning culture in an organisation of 4 teams and growing.

Why promote a learning culture?

The software industry has a shortage of skills. It always seems hard to find good software developers to hire.

We have an ever-growing pool of new developers.  A quote going around is “Every 5 years the number of programmers in the world roughly doubles.  So half the programmers in the world have less than 5 years’ experience.” (1)

We are in an industry with a wealth of knowledge that can be consumed – but practical application is needed to truly understand the nuances.

We are in an industry where the learning gained from experience does matter.

Axiom: Experience is valuable.

However

“True learning involves a permanent change in the way you see and act in the world.  The accumulation of information isn’t learning.” – Benjamin Hardy (2)

True experience is more valuable

True experience involves a permanent change in the way you see and act in the world. Experience is not truly valuable unless it is learnt from.  True experience is most valuable when it can be understood in terms of principles and values that were effective (with a good experience) or were broken (for a bad experience).

Experience should be viewed around a common understanding of the values and principles being applied.  The values and principles should be based on the needs of the organisation.  Experience should be respected, discussed and challenged in line with these values and principles.

How can we harness experience to speed up learning?

Given that experience we learn from is valuable, how do we better harness the true experience in any given room / team / company?  How can we learn from our experiences and share those learnings most effectively with those who have not yet had them?  How can we extract the years of learning out of the experienced developers’ heads so that we don’t need to have 10+ years to learn it?

What about shared values and principles?

Hypothesis: Teams that value the same things in software will build software more effectively.

Lemma:
If we know vaguely* where we are going
We can all pull vaguely in the same direction

*Vaguely is important.

If we get too precise it limits a team’s ability to innovate and effectively solve the real problems they face.
If it is too ill defined, the teams have no direction and can waste effort duplicating work or going in different or unexpected directions.

Lemma:
Any decision we make should be based on a mental model that can be expressed.  If you can’t express the reasoning, then the reasoning is flawed.

If you don’t like something in a code review – understand why.  If you can’t express the value or principle that is being violated maybe you don’t know why and are just being opinionated.  Understand your opinion first, before expressing it.  It may be that the values and principles are still being met, just in a different way that you aren’t used to.

We need to move from conversations about how to conversations about why.  What are the intentional trade-offs and design decisions that are taking us in this direction or got us here? The how is important – but driven by a deep understanding of why.  This allows us to ensure there is no cargo culting of solutions and no sacred cows being ignored. How do we elevate the conversation from technology details to software truths?

Build a collaborative learning culture.

If we are self-aware and understand the decisions that we are making, then we can discuss these decisions with our teams and build a common understanding of what is a good decision for the team.  The team needs / context outweighs the individual’s needs.

If the team can understand the decisions it is making collectively, then teams can discuss their decisions with other teams and we can build a common understanding of what is a good decision for the organisation.  The organisation’s needs and context outweigh the teams’.

Context

It’s hard to justify why something is good or bad without context.

Without context – if the software does what was asked for, then it is good.  It doesn’t matter if it is spaghetti code.  It doesn’t matter if it is unmaintainable.  It doesn’t matter if it is inefficient.  Being right enough matters for now.  If the software never changes then it is good.  If the software changes, then we might have wanted to optimise for changeability and the current code is no longer good (enough).

Some things that we have tried

My focus has been on building a common understanding of why we do the things we do.  This provides a space for communicating opinions / values / ideas and increases understanding between the things that developers are valuing.

We tried

  • Code appreciation / Code review sessions
  • Code kata and conversation sessions
  • Kata sessions on Friday morning
  • Coaching – though harder to do cross-team
  • Retrospectives – though usually less focused on code

We introduced guilds to encourage cross-team conversations around specific topics.  These included: architecture, continuous delivery, security, databases.

We have an active tech blog, containing knowledge that is useful to remember or to share cross-team – e.g. security fixes made / to be aware of, continuous delivery pipelines of different teams, architectural knowledge base.  This is also a knowledge repository for learning so that new hires can start to get the context of what the rest of the team has already learnt and what their values are.

Code appreciation / code review sessions

This fluctuated through many different forms.  From once a week rotating through a different developer each week across all teams, to sharing just in your team and discussing and then a less frequent cross-team sharing session.

Code kata and conversations

A facilitated session of one hour a week on different software topics ranging from TDD to DDD, from SOLID to testing practices and design patterns.  The focus is on practices and on discussing and understanding the values and principles that emerge from the exercises.  This is usually done in pairs or small groups working on a problem, and then we retrospect on learnings as a group.  This started with doing some katas to explore certain ideas but then moved on to many different things and possibly shouldn’t have ‘kata’ in the title any more.

Learnings

We have been successfully changing the conversation from right and wrong and syntax to values and principles and design.  This has been noticeable across the teams.

When we agree on the values and principles, “right” and “wrong” become much easier to articulate.

I have relearnt that not everything is a teachable moment.  Different sessions have different focuses and sometimes in a group environment attempting to ask too many probing questions can be intimidating.

Not everyone will engage.  The key is to ensure that enough do and we focus on what the company needs from building a stronger learning culture and a strong software development team.  Hopefully the rest will pick up from the majority.

Where to from here?  Building a more collaborative learning culture

Everything that we’ve done so far has been to enable a collaborative learning culture.

But it has been focused on bringing the group involved up to a common level and I feel that we are now at a place with enough people engaged and interested.  We need to become truly collaborative in our learning.  This year I hope to see more presenters, more sharing and more growth and understanding across the teams.

The end result will hopefully be a competent, young team that can collaborate effectively around shared values and a common understanding about how to experiment and learn and discuss the right solutions for the organisation.

References:

(1) https://twitter.com/web_goddess/status/804452382536912897 – attributed to Robert Martin. Martin Cronje at Agile NZ 2016 quoted a similar figure.

(2) Via a great presentation by Katlyn Parvin – https://speakerdeck.com/katlyn333/am-i-senior-yet-grow-your-career-by-teaching-your-peers

(3) It looks like Martin is doing similar things to what we’re attempting to do (after moving to NZ) – https://speakerdeck.com/martincronje/agilenz-towards-mastery-establishing-craftsmanship-culture-in-a-team