Two years ago we started decentralizing an e-commerce project. The project was deployed in production as a cluster of Hybris nodes in Europe, but our customer has worldwide interests, and the latencies between continents were higher than the quality of service we wanted to provide. We also faced the typical issues of a huge monolith: long release cycles, a long and exhaustive testing process, and, with an ever-growing feature set, a steadily growing time to bring new features to market.
Research
Application
As I stated before, here at communicode we have extensive experience working with Java and Spring. This is why choosing the base technology for developing the microservices was straightforward.
From the very beginning we knew that our applications would be based on Spring Boot, because it ships with an embedded servlet container (Tomcat, Jetty, etc.), so this was one of the easiest decisions to agree on with the team.
“Spring Boot makes it easy to create stand-alone, production-grade Spring based Applications that you can ‘just run’. We take an opinionated view of the Spring platform and third-party libraries so you can get started with minimum fuss. Most Spring Boot applications need very little Spring configuration.”
The decision about the framework and technology used for creating the applications is important. It is the cornerstone of the project, and everybody in the team has to feel comfortable with the tools and technologies used.
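To illustrate, here is a minimal sketch of such a "just run" service with its embedded servlet container; the class, package and endpoint names are illustrative, not the project's actual code:

```java
// Minimal Spring Boot application: the embedded servlet container
// (Tomcat by default) is started by SpringApplication.run().
package com.example.demo;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@SpringBootApplication
@RestController
public class DemoApplication {

    // A trivial endpoint, just to show the service is runnable as-is.
    @GetMapping("/health")
    public String health() {
        return "OK";
    }

    public static void main(String[] args) {
        SpringApplication.run(DemoApplication.class, args);
    }
}
```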
Communication
After that first decision, we had to think about the communication between the services. Communication between microservices can be either synchronous or asynchronous. The main communication needed for our use case was to inform other services that some action had been performed by the user. With this publish/subscribe pattern the communication could be asynchronous, allowing us to give a fast response to the user while the processes triggered by the user's interaction completed in the background.
To achieve this, the idea was to use a message queue. We tried out two solutions: Redis (as a message queue) and RabbitMQ. One of the requirements for the new system was to guarantee that no data is lost under any circumstances; since Redis does not provide reliable delivery, the decision was to use RabbitMQ.
RabbitMQ is a message broker that implements AMQP (the Advanced Message Queuing Protocol). It is robust, easy to use, open source and has client libraries for several languages like Java, .NET, Python and Ruby.
To work with RabbitMQ it is important to understand what a message broker is. Using a real-world analogy, a message broker is like the post office. When you send a package, you need the content, the destination address and the source address. Once you hand it over at the post office, they take responsibility for your package and ensure that it is delivered or, otherwise, send you a notification that the delivery failed. A message broker works the same way: it ensures that your message is delivered asynchronously, so you don't have to wait for the acknowledgement from the recipient.
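A hedged sketch of how this looks with Spring Boot and Spring AMQP (the exchange, routing key, queue name and payload below are assumptions for illustration, not the project's actual topology): a producer publishes a user event and hands responsibility to the broker, and any interested service consumes it asynchronously.

```java
// Illustrative publish/subscribe with Spring AMQP (spring-boot-starter-amqp).
// Exchange/queue names and the payload are assumptions for this sketch.
import org.springframework.amqp.rabbit.annotation.RabbitListener;
import org.springframework.amqp.rabbit.core.RabbitTemplate;
import org.springframework.stereotype.Component;
import org.springframework.stereotype.Service;

@Service
class UserEventPublisher {

    private final RabbitTemplate rabbitTemplate;

    UserEventPublisher(RabbitTemplate rabbitTemplate) {
        this.rabbitTemplate = rabbitTemplate;
    }

    // Publish to an exchange; the broker takes over delivery from here,
    // so the caller can respond to the user immediately.
    void userRegistered(String userId) {
        rabbitTemplate.convertAndSend("user-events", "user.registered", userId);
    }
}

@Component
class UserEventListener {

    // Any service bound to this queue processes the event asynchronously.
    @RabbitListener(queues = "newsletter.user-registered")
    public void onUserRegistered(String userId) {
        // ... trigger the follow-up process for this user
    }
}
```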
At this point we have applications that can communicate with each other. Now we have to solve some further problems: deployment, testing and monitoring.
Deployment and distribution
There are different approaches for deploying and distributing an application. In our case, we wanted to be agnostic of the environment where the application would run, we wanted this process to be agile, and it had to be easy to maintain, easy to redeploy for new releases and easy to configure. Therefore, our focus was on Docker. Why Docker? Quoting the official Docker site: “Docker is the world’s leading software containerization platform. Docker allows you to compose your application from microservices without worrying about inconsistencies between development and production environments, and without locking into any platform or language. Docker lets you design the entire cycle of application development, testing and distribution, and manage it with a consistent user interface. Docker offers you the ability to deploy scalable services, securely and reliably, on a wide variety of platforms.”
So Docker allows us to always run our application in the same environment. The only thing we have to do is configure that environment once, and the deployable artifact is then always exactly the same. Since Docker lets us generate an image of the application, we can create containers based on that image and deploy them quickly in any environment capable of running Docker. Docker provides daemons for Windows and Mac and runs natively on GNU/Linux (and, since the latest release, it can also run natively on Mac). This means that, no matter which environment you use for development, the application runs the same way in development and in production.
Testing
Testing was the most interesting topic in this project. Generally we write unit tests, API tests and integration tests (within the application), but this project is composed of several applications that communicate with each other, so we needed to properly test the use cases from the entry point (a user interaction) to the exit point.
The graph shows all the connections that exist between the services. As you may already have imagined, to test all the services we first needed to orchestrate them: configure them and have them up and running with the proper connections. For that we used the “docker-maven-plugin” from “fabric8io”. The plugin manages Docker images and containers: it can build images, run containers, link containers with each other and handle plenty of other Docker functionality such as push, stop, remove, etc.
What we did was to configure the setup exactly as it is used in production, ensuring that the deployment of the delivered services always works, that connectivity from the entry point to the exit point is preserved and that the applications are able to send and receive messages.
In the graph we can see that, apart from RabbitMQ and the entry points, there is an extra box called “JLament”. This is the testing tool/framework that we built. JLament contains the orchestration discussed above, the test suite, mocks of external services not developed by the team (mocks of backends) and some services that allow the tests to communicate with the services under test: reading messages from the message queue, publishing messages, keeping track of the communication, and all sorts of other functionality needed to test every use case; a hypothetical example of such a test is sketched below.
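As a purely hypothetical illustration (the queue and exchange names, payload and connection setup are assumptions, not JLament's actual API), such an end-to-end test publishes a message the way the entry point would and then asserts that the expected message arrives at the exit point:

```java
// Hypothetical end-to-end test against the dockerized setup.
// Queue/exchange names and the payload are illustrative only.
import static org.junit.Assert.assertEquals;

import org.junit.Test;
import org.springframework.amqp.rabbit.connection.CachingConnectionFactory;
import org.springframework.amqp.rabbit.core.RabbitTemplate;

public class OrderFlowIT {

    private final RabbitTemplate rabbit =
            new RabbitTemplate(new CachingConnectionFactory("localhost"));

    @Test
    public void orderEventReachesTheExitPoint() {
        // Simulate the entry point: publish the event a user action would trigger.
        rabbit.convertAndSend("shop-events", "order.created", "order-4711");

        // Wait (up to 10 s) for the message the downstream service should emit.
        Object result = rabbit.receiveAndConvert("export.order-created", 10_000);

        assertEquals("order-4711", result);
    }
}
```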
Storage
This topic was one of the most controversial ones. Choosing the right storage for your application is key, so we invested a good amount of time researching the best solution for us. Our first approach was Redis. Every time a new technology is proposed and discussed, before adopting it we research what that technology can do for us. In the case of Redis, a non-relational database, we made several proposals on how to store the data, drew up a pros-and-cons list for each structure and then measured the performance of each approach. For example, we compared the performance Redis would give us when storing the data as Strings versus as Redis Hashes (stored in the form of a map) and weighed the pros and cons before making our decision. (In the pros-and-cons comparison, Approach 1 is the String approach and Approach 2 is the Hash approach; in the performance chart, the blue line is the String approach and the red line is the Hash approach.)
By gathering all this information we could make decisions that were mature, well thought-out and adequate for the project. The result of testing Redis as a database technology was not successful for us: the performance we got once the amount of stored data grew was outside our boundaries (we had timing constraints to satisfy).
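For reference, a minimal sketch of the two layouts we compared, using the Jedis client; the keys, fields and values are made up for illustration:

```java
// Comparing the two Redis layouts that were measured: one serialized String
// per entity vs. one Redis Hash per entity. Keys and fields are illustrative.
import java.util.Map;
import redis.clients.jedis.Jedis;

public class RedisLayouts {

    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {

            // Approach 1: the whole entity serialized into a single String.
            jedis.set("cart:42", "{\"userId\":\"7\",\"items\":3,\"total\":\"59.90\"}");
            String serializedCart = jedis.get("cart:42");

            // Approach 2: one Redis Hash per entity, one field per attribute.
            jedis.hset("cart:43", "userId", "7");
            jedis.hset("cart:43", "items", "3");
            jedis.hset("cart:43", "total", "59.90");
            Map<String, String> cart = jedis.hgetAll("cart:43");

            System.out.println(serializedCart);
            System.out.println(cart);
        }
    }
}
```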
Then the infrastructure team decided that Amazon AWS would be the provider for hosting the services in the cloud, so we took a look at the solutions Amazon AWS offers. Using Amazon RDS seemed to be a good idea: it provides technologies well known to the developers in the team, like MySQL, Oracle DB or PostgreSQL. We tried that solution only to find out that the data model as we wanted to store it would have required complex queries for simple use cases, making the use of a relational database in the cloud quite costly and more complex than necessary. So we looked at the other service that Amazon AWS provides for non-relational databases: DynamoDB. “Amazon DynamoDB is a fast and flexible NoSQL database service for all applications that need consistent, single-digit millisecond latency at any scale. It is a fully managed cloud database and supports both document and key-value store models. Its flexible data model and reliable performance make it a great fit for mobile, web, gaming, ad tech, IoT, and many other applications.” aws.amazon.com/documentation/dynamodb
Setting it up was easy, and the performance provided by DynamoDB is incredibly high. Amazon also provides a Java SDK for connecting to DynamoDB, which makes development much easier.
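A minimal sketch with the AWS SDK for Java's document API; the table name, key and attributes are assumptions for illustration, not the project's actual schema:

```java
// Writing and reading an item with the DynamoDB document API
// (aws-java-sdk-dynamodb). Table and attribute names are illustrative.
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.document.DynamoDB;
import com.amazonaws.services.dynamodbv2.document.Item;
import com.amazonaws.services.dynamodbv2.document.Table;

public class DynamoDbExample {

    public static void main(String[] args) {
        AmazonDynamoDB client = AmazonDynamoDBClientBuilder.standard().build();
        DynamoDB dynamoDB = new DynamoDB(client);

        Table orders = dynamoDB.getTable("orders");

        // Store an item under its partition key.
        orders.putItem(new Item()
                .withPrimaryKey("orderId", "order-4711")
                .withString("status", "CREATED")
                .withNumber("items", 3));

        // Read it back with a fast lookup by key.
        Item item = orders.getItem("orderId", "order-4711");
        System.out.println(item.toJSONPretty());
    }
}
```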
Monitoring
The last step in every project is to have the right tools for monitoring the services. In a big project like this, where different companies are producing different components, centralized tools are needed that anyone in the project can use to monitor all components. Just imagine if every component (we are talking about more than 15 at the moment) had its own monitoring tools: there would be no way to correlate the data for debugging purposes without a huge overhead. Therefore we all agreed to use ELK (Elasticsearch, Logstash, Kibana) for the logging system and InfluxDB + Grafana for the metrics. The ELK stack allows us to centralize the logs from all components, using Logstash to push the data to Elasticsearch, where it can be read and filtered with Kibana. This is a powerful stack for logging: Kibana provides different visualizations, dashboards and even graphs for monitoring, for example, how many times a log statement occurs. InfluxDB + Grafana is the second monitoring stack. In InfluxDB we store all metrics produced by any component, and Grafana allows us to create graphs to visualize the data. With this stack we have basic monitoring (e.g. CPU, memory usage, threads available/in use), custom monitoring (e.g. the count of events of one type raised in the application) and infrastructure monitoring (e.g. provisioned reads/writes in DynamoDB and how much of that provisioned capacity is used).
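As an illustration of the custom-metrics part, and assuming the influxdb-java client as one way to write points from a Java service (the URL, credentials, database, measurement and tag names are made up), a component can push its own data points like this:

```java
// Pushing a custom application metric to InfluxDB with the influxdb-java client.
// URL, credentials, database and measurement names are illustrative.
import java.util.concurrent.TimeUnit;

import org.influxdb.InfluxDB;
import org.influxdb.InfluxDBFactory;
import org.influxdb.dto.Point;

public class MetricsExample {

    public static void main(String[] args) {
        InfluxDB influxDB = InfluxDBFactory.connect("http://localhost:8086", "metrics", "secret");

        // One point per event, tagged with the component name so Grafana
        // can group and graph it per service.
        Point point = Point.measurement("user_events")
                .time(System.currentTimeMillis(), TimeUnit.MILLISECONDS)
                .tag("component", "checkout-service")
                .tag("event", "order.created")
                .addField("count", 1)
                .build();

        influxDB.write("shop_metrics", "autogen", point);
    }
}
```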
The combination of both stacks provides both the infrastructure team and the component teams with all the information needed to monitor the systems and to debug, trace or track any kind of misbehavior that may occur during normal usage of the platform.