ggirtsou's Blog

Thoughts on software and architecture.

17 February 2018

Microservices Go-live checklist


Photo by Louis Moncouyoux on Unsplash

More and more organisations are breaking up monoliths (formerly known as “the app”) into smaller services and adopting the microservices architecture pattern, with services communicating asynchronously via Event Sourcing. The main benefits are:

Go Live Checklist

Revise architecture, get rid of Single Points of Failure

Make sure everyone is happy with the current architecture. If you need to deliver fast and postpone some fixes, plan when those fixes will be made, make sure everyone is on the same page, and document those decisions.

Additionally, you want your service to be highly available:

And secure (security is a huge topic and the list below is by no means exhaustive):

Documentation

Ensure your documentation (README/Wiki entries) provides enough information about your app:

Your microservice should be fairly simple and do very specific tasks. If you find it does too many things, consider breaking it down into more microservices.

Containerize your app

There are many benefits in containerizing your application, described here.

In a nutshell, your artifact is a Docker image that is deployed and managed by a container orchestration framework, such as Kubernetes.

Healthcheck and readiness endpoints

These endpoints are used by the container orchestration framework to determine whether the app is healthy and ready to accept connections.

Possible healthcheck values:

Degraded performance means some operations will fail.
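
As a rough illustration of how healthcheck and readiness endpoints could look, here is a minimal sketch using only the Python standard library. The paths `/healthz` and `/ready`, the status names `healthy`, `degraded` and `unhealthy`, the port, and the shape of `checks` are all assumptions made for this example, not values prescribed by this post or by any orchestrator.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical status values chosen for this sketch.
HEALTHY, DEGRADED, UNHEALTHY = "healthy", "degraded", "unhealthy"


def health_status(checks):
    """Combine dependency checks into one overall status.

    `checks` maps a dependency name to a tuple (ok, critical).
    """
    if any(not ok and critical for ok, critical in checks.values()):
        return UNHEALTHY
    if any(not ok for ok, _ in checks.values()):
        # A non-critical dependency is down: some operations will fail.
        return DEGRADED
    return HEALTHY


def http_code(status):
    # Degraded still answers 200, so the app is not restarted needlessly.
    return 503 if status == UNHEALTHY else 200


class ProbeHandler(BaseHTTPRequestHandler):
    # Placeholder results; replace with real checks (database ping, ...).
    checks = {"database": (True, True), "cache": (True, False)}
    ready = True

    def do_GET(self):
        if self.path == "/healthz":
            status = health_status(self.checks)
            self._reply(http_code(status), {"status": status})
        elif self.path == "/ready":
            self._reply(200 if self.ready else 503, {"ready": self.ready})
        else:
            self._reply(404, {"error": "not found"})

    def _reply(self, code, body):
        payload = json.dumps(body).encode()
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8080):
    """Start the probe server; blocks until interrupted."""
    HTTPServer(("0.0.0.0", port), ProbeHandler).serve_forever()
```

Calling `serve()` starts the server. The orchestrator's probes would then be pointed at the two paths, with the readiness endpoint returning 503 until the app has finished starting up.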

Metrics endpoint

Metrics let you build meaningful dashboards (for example in Grafana) that show how your app is behaving: memory consumption, GC pauses, error rate, HTTP status codes, and anything else that helps when troubleshooting your application.

A popular tool for metrics collection is Prometheus.
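
To give an idea of what a metrics endpoint returns, the sketch below renders a counter in the Prometheus text exposition format using only the standard library. The metric name `http_requests_total`, the `code` label, and the helper functions are hypothetical. In a real service you would more likely use an official Prometheus client library than render the text by hand.

```python
from collections import Counter

# In-memory request counts keyed by HTTP status code (illustrative only).
REQUESTS = Counter()


def record_request(status_code):
    """Count one handled request under its HTTP status code."""
    REQUESTS[str(status_code)] += 1


def render_metrics():
    """Return the counters as Prometheus text exposition format."""
    lines = [
        "# HELP http_requests_total Total HTTP requests by status code.",
        "# TYPE http_requests_total counter",
    ]
    for code, count in sorted(REQUESTS.items()):
        lines.append(f'http_requests_total{{code="{code}"}} {count}')
    return "\n".join(lines) + "\n"
```

The string returned by `render_metrics()` would be served as plain text on a path such as `/metrics`, which Prometheus scrapes at a regular interval.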

Logging

Structured logging provides useful insights into what your app does, and the logs can be aggregated and parsed in a service like Splunk or SumoLogic.
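
One possible way to get structured logs is to emit one JSON object per line, so that the log aggregator can parse fields without regular expressions. The sketch below uses Python's standard `logging` module. The field names and the `context` key passed through `extra` are assumptions made for this example.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Format each log record as a single JSON object."""

    def format(self, record):
        entry = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Fields passed as extra={"context": {...}} become top-level keys.
        entry.update(getattr(record, "context", {}))
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def build_logger(name, stream=None):
    """Return a logger that writes JSON lines to `stream` (stderr by default)."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

A call such as `build_logger("orders").info("order created", extra={"context": {"order_id": 42}})` would then produce a line containing `message`, `level` and `order_id` as separate, searchable fields.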

Set up alerts

Create alerts based on business and operations logic. Here are some examples:

Note: Finding the right value for a threshold sometimes takes time, and the value should be adjusted as you learn more. You don’t want to be alerted too often, because you’ll end up ignoring alerts or, even worse, missing important ones in the noise.

Think about what threshold is acceptable for your service and investigate why an alert was triggered, and if necessary adjust the threshold.
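
One common way to cut down on noisy alerts is to fire only when the threshold has been exceeded for several consecutive measurement windows, rather than on a single spike. The sketch below illustrates the idea. The class name, its fields, and the example numbers are hypothetical, and in practice this logic usually lives in your alerting tool's configuration rather than in application code.

```python
from dataclasses import dataclass


@dataclass
class ThresholdAlert:
    threshold: float        # e.g. 0.05 for a 5% error rate
    required_breaches: int  # consecutive breaching windows before firing
    _breaches: int = 0

    def observe(self, value):
        """Feed one measurement window; return True if the alert should fire."""
        if value > self.threshold:
            self._breaches += 1
        else:
            # Any window back under the threshold resets the streak.
            self._breaches = 0
        return self._breaches >= self.required_breaches
```

With `ThresholdAlert(threshold=0.05, required_breaches=3)`, a single window at 10% errors does not page anyone, but three breaching windows in a row do. Both numbers are knobs to tune as you learn what is normal for your service.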

Backups and restore plan

Have a backup strategy as well as a way to restore from previous backups. It is very important that your teammates are familiar with these processes and you have tested they work.

You don’t want restoring from a backup to fail in a critical moment.

You also want an alert if backups are not being taken. This can be achieved by having your backup service expose relevant metrics and setting up alerts for all apps that have their data backed up.
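
A simple signal for "backups are not being taken" is the age of the last successful backup. The sketch below assumes the backup service exposes that timestamp. The function name and the 24-hour limit used in the usage note are made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def backup_is_stale(last_success, max_age, now=None):
    """Return True if the last successful backup is older than `max_age`.

    `last_success` and `now` are timezone-aware datetimes;
    `now` defaults to the current UTC time.
    """
    now = now or datetime.now(timezone.utc)
    return now - last_success > max_age
```

For example, `backup_is_stale(last_success, timedelta(hours=24))` could feed an alert for a service that is backed up nightly. This only tells you a backup ran, not that it can be restored, so it complements restore testing and does not replace it.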

Resources

For more on distributed systems, reliable microservices, and event sourcing I recommend the following resources:

Your experience

I’d love to hear your thoughts and about your experience in the microservices world. Feel free to reach out to me on LinkedIn!

tags: microservices - go-live