In yesterday’s article I discussed the challenges of co-ordinating activities across a complex set of microservices. I touched on the subject of an end-to-end view of customer journeys and complex business processes, but didn’t really cover it in any detail, so let’s look at what we need, and how it could be achieved.
Perhaps the key objective of a microservices architecture is to break a complex monolithic architecture down into a set of components, each of which can progress and evolve at its own pace, thus accelerating innovation and the development lifecycle. The challenge comes, of course, when the innovation that you require is not contained within a single microservice. The greater the need for collaboration across microservice development teams, the slower development will become. At the extreme, if a significantly large amount of cross-microservice collaboration is needed (i.e. cross-microservice dependencies), then the benefits of the architecture are lost. In fact, because the typical governance structure for microservice architecture teams is very lightweight (e.g. two-pizza teams, the concept of “tribes”), it is then likely that the microservice approach will be less effective than a monolithic approach.
End-to-end business processes and customer journeys are one of the areas where there are clear dependencies across microservices (there are others, e.g. data models and toolchains, that I will come back to in another article).
Let’s take a very standard customer journey process – lead to cash – as a worked example. I’m going to discuss this fairly generically, but if you’d like to look at something more detailed, then I’d recommend something like the Telco eTOM process model, which has a fully worked-out process structure.

The first thing to note is that the overall customer journey passes not through a small number of microservices, but across many applications. It’s safe to say that, as of today, no company of significant size has implemented this entire chain using a microservice architecture, so we really do not have a real example – we will have to work in hypotheticals.
Typically, medium to large companies will have implemented different applications for different functions across this process, and in a large organisation, those applications are often owned by different parts of the organisation – e.g. Marketing by the CMO, Fulfillment and billing by the COO and Payments by the CFO. The challenge is not just a microservices challenge, but a challenge across the whole enterprise.
In the SOA world, this end-to-end process is typically designed in a Business Process Modelling (BPM) tool and implemented on an ESB. The ESB provides a single place for the complex process routing to be held and co-ordinated. The routing rules can be held separately from the application code, and so business users can alter business processes without requiring a development change.

When we move to a microservices-led approach where communication is over pub-sub messaging tools (e.g. Kafka), it’s not clear where the routing rules are defined, other than in the microservice placing the message on a queue. This is definitely not a good place for them to reside, for three main reasons:
- Firstly, similar decisions need to be made in many microservices. For example, in our lead-to-cash process we might require an extra credit check for large orders, and that decision could be needed in different microservices for different products. We end up with the same logic embedded in different microservices, which is clearly bad practice.
- Secondly, significant changes to the process routing would impact multiple microservices, and with each microservice implementing its own rules engine, this would not allow the process flexibility that business users currently have with BPM.
- Thirdly, with a fire-and-forget pub-sub model, it becomes extremely difficult to track the end-to-end process. While single interactions can be delivered with certainty, simple messaging cannot guarantee an end-to-end process.
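The first issue can be made concrete with a small sketch. The service names, queue names and credit-check threshold below are all hypothetical, invented for illustration – the point is that the same business rule ends up hard-coded, separately, in each microservice that needs it:

```python
# Hypothetical sketch of the duplication problem: two microservices in a
# lead-to-cash flow each embed their own copy of the "large orders need a
# credit check" rule. Changing the threshold now needs two code releases,
# and the copies can silently drift apart.

def order_service_route(order_value: float) -> str:
    # Copy 1 of the rule, embedded in the order service
    if order_value > 10_000:
        return "credit-check-queue"
    return "fulfilment-queue"

def quote_service_route(order_value: float) -> str:
    # Copy 2 of the same rule, embedded in the quote service –
    # kept in sync with copy 1 only by developer discipline
    if order_value > 10_000:
        return "credit-check-queue"
    return "pricing-queue"
```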
While microservice-based systems with the business process embedded in the microservices’ logic may be quick to build, they are likely to be difficult to maintain, and slower to react to business process changes than monolithic systems.
However, we can mitigate most of these downsides without necessarily sacrificing the benefits of the microservices approach.
Firstly, we can deal with the first two issues with one change to our model. If we view the routing tables and decision logic for processes not as a central service (as an ESB would be) but as shared data that can be loaded at run-time by all microservices, this gives us back our central control of routing and the business flexibility – those tables can be changed by business users without relying on code changes within applications. There are a number of business rules engines that can be used for this purpose, where the decision routing is encapsulated in a library or even a service and loaded at run-time (thus avoiding the ESB bottleneck).
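A minimal sketch of the idea, with everything hypothetical – the rule schema, the event and queue names, and the use of a JSON document standing in for a shared rules store or rules engine. Each microservice loads the same shared rules at run-time and evaluates them locally, so business users change the data, not the code:

```python
import json

# Routing rules held as shared data (here an inline JSON document; in
# practice a shared store or rules engine refreshed at run-time). All
# names and the threshold are illustrative assumptions.
ROUTING_RULES_JSON = """
{
  "order-placed": [
    {"when": {"field": "order_value", "op": "gt", "value": 10000},
     "route": "credit-check-queue"},
    {"route": "fulfilment-queue"}
  ]
}
"""

# Supported comparison operators for rule conditions
OPS = {
    "gt": lambda a, b: a > b,
    "lt": lambda a, b: a < b,
    "eq": lambda a, b: a == b,
}

def load_rules(doc: str) -> dict:
    # In production this would be re-fetched from a shared store, so a
    # business user's change takes effect without a code release.
    return json.loads(doc)

def route(rules: dict, event_type: str, payload: dict) -> str:
    # First matching rule wins; a rule without a "when" clause is the default.
    for rule in rules[event_type]:
        cond = rule.get("when")
        if cond is None or OPS[cond["op"]](payload[cond["field"]], cond["value"]):
            return rule["route"]
    raise ValueError(f"no route defined for {event_type}")

rules = load_rules(ROUTING_RULES_JSON)
print(route(rules, "order-placed", {"order_value": 25_000}))  # credit-check-queue
print(route(rules, "order-placed", {"order_value": 500}))     # fulfilment-queue
```

The key design choice is that the rules are *data* with a central point of control, while evaluation stays distributed inside each microservice – no run-time call to a central broker is needed.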

The third issue, traceability of the process, is much harder. We can never assume that any code written is 100% correct, so our traceability must cater for the situation where code in a microservice may mis-direct, or even just fail to send, a process onwards. The really tricky part is that this issue also applies to any logging that we might ask the microservice to complete – it’s entirely possible (even likely) that a bug in the code could cause a business process to just stop without warning.
This is no small problem – for example a recent report estimated the loss from fraud and error in the telco industry at c. $30 billion – and much of this is a result of broken processes.
The only way to truly mitigate this is to maintain the full integrity of the business process data at all times. Sometimes the process’s current state resides inside a microservice, and sometimes it resides on a queue; and the detailed process history is federated across all the microservices that it has passed through. This is especially challenging if different microservices use different tools, data schemas, databases and recovery models.
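One common mitigation – my assumption here, not something the worked example prescribes – is to have every hop emit an audit record keyed by a correlation id, so an independent monitor can spot processes whose last recorded step is not a terminal one and which have gone quiet. The step names and timeout below are invented for illustration:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical sketch: each microservice records (step, timestamp) against a
# correlation id; a monitor flags processes that have silently stalled,
# e.g. because a buggy microservice dropped the message onwards.
audit_log: dict[str, list[tuple[str, datetime]]] = {}

def record_step(correlation_id: str, step: str, at: datetime) -> None:
    audit_log.setdefault(correlation_id, []).append((step, at))

def stalled_processes(now: datetime, timeout: timedelta,
                      terminal_step: str = "payment-received") -> list[str]:
    # A process is stalled if its latest recorded step is not terminal and
    # nothing new has been recorded within the timeout window.
    stalled = []
    for cid, steps in audit_log.items():
        last_step, last_at = steps[-1]
        if last_step != terminal_step and now - last_at > timeout:
            stalled.append(cid)
    return stalled
```

Crucially, the monitor detects a stopped process without relying on the failing microservice to log its own failure – absence of the next audit record is itself the signal.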
In one of my previous roles, I worked with a great team of audit and compliance specialists, and their much favoured approach was to always focus on access to the raw data. Their view was that the only true reflection of the current status was the data committed to a database – anything else was transitory.
This was recently backed up by some discussions I had with a customer on how to implement disaster recovery for a Kafka queue – they just couldn’t get near an RPO of zero, and while transaction state existed only on the queue, any failure scenario meant real data loss – which was unacceptable to them.
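One way to honour the auditors’ “the database is the only true reflection of state” view while still using a queue is the transactional outbox pattern – my suggested technique, not one named above. State change and outgoing message are committed in a single database transaction, and a relay publishes from the outbox afterwards, so losing the queue never loses the data. The sketch below uses sqlite as a stand-in for the system of record, with invented table and topic names:

```python
import sqlite3

# Transactional outbox sketch (all names illustrative): the process state
# and the message to be published are written in ONE database transaction,
# so the committed data is the source of truth (RPO of zero for state).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT)")
db.execute("""CREATE TABLE outbox (
    id INTEGER PRIMARY KEY, topic TEXT, payload TEXT,
    published INTEGER DEFAULT 0)""")

def advance_order(order_id: str, new_status: str, topic: str) -> None:
    # Single atomic transaction: the state change and the queued message
    # commit together or not at all.
    with db:
        db.execute("INSERT OR REPLACE INTO orders VALUES (?, ?)",
                   (order_id, new_status))
        db.execute("INSERT INTO outbox (topic, payload) VALUES (?, ?)",
                   (topic, f"{order_id}:{new_status}"))

def relay_outbox(publish) -> None:
    # A separate relay publishes committed messages to the broker (e.g.
    # Kafka). If the broker is lost, messages are replayed from the outbox.
    rows = db.execute(
        "SELECT id, topic, payload FROM outbox WHERE published = 0").fetchall()
    for row_id, topic, payload in rows:
        publish(topic, payload)
        db.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    db.commit()
```

The trade-off is latency (messages are published after the commit, by the relay) in exchange for never having transaction state exist *only* on the queue.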
So, some conclusions:
- Where a microservice-based system is dealing with transactions of significant value, or high volumes of low-value transactions, there are substantial risks of revenue loss without strong business process management.
- Using shared routing engines and rules engines that have a central build-time point of control by business users, but a distributed run-time within microservices, can deliver both flexibility and control.
- Making raw data available for audit, and securing all data (including queued data) with an RPO of zero, is essential to maintaining the integrity of the business process.