Kubernetes Health Checks with Readiness and Liveness Probes (Kubernetes Best Practices)


Distributed systems can be hard to manage. A big reason is that there are many moving parts that all need to work in order for the system to function. If a small part breaks, the system has to detect it, route around it, and fix it, and this all needs to be done automatically. In this episode of Kubernetes Best Practices, let's learn how you can set up readiness and liveness probes in your Kubernetes cluster.

Health checks are a simple way to let the system know if an instance of your app is working or not. If an instance of your app is not working, then other services should not access it or send requests to it. Instead, requests should be sent to another instance of the app that is ready, or retried at a later time. The system should also bring your app back to a healthy state.

By default, Kubernetes will start to send traffic to a pod when all the containers inside that pod start, and will restart containers when they crash. While this default behavior can be good enough when you're starting out, you can make your deployments more robust by creating custom health checks. Fortunately, Kubernetes makes this relatively straightforward, so there's no excuse not to.

Kubernetes gives you two types of health checks, and it's important to understand the differences and uses of each. Readiness probes are designed to let Kubernetes know when your app is ready to serve traffic. Kubernetes will make sure the readiness probe passes before allowing a service to send traffic to the pod. If a readiness probe starts to fail, Kubernetes will stop sending traffic to the pod until it passes again. Liveness probes let Kubernetes know if your app is alive or dead. If your app is alive, then Kubernetes will leave it alone. If your app is dead, then Kubernetes will remove the pod and start a new one to replace it.

Let's imagine a scenario where your app takes a minute to warm up and start. Your service won't work until it's up and running, even though the process has already started. You will also have issues if you want to scale up this deployment to have multiple copies. New copies shouldn't receive traffic until they're fully ready, but by default Kubernetes will start sending traffic as soon as the processes inside the container start. By using a readiness probe, Kubernetes will wait until the app is fully started before it allows the service to send traffic to the new copy.

Let's imagine another scenario where you have an app that has a nasty case of deadlock that happens in an edge case, causing it to hang indefinitely and stop serving requests. Because the process continues to run, by default Kubernetes will think that everything is fine and continue to send requests to the broken pod. By using a liveness probe, Kubernetes will detect that the app is no longer serving requests and will restart the offending pod by default.

The next step is actually defining the probes that will test readiness and liveness. There are three types of probes: HTTP, command, and TCP. You can use any of them for liveness and readiness checks.

HTTP probes are probably the most common type of custom probe. Even if your app isn't an HTTP server, you can usually create a lightweight HTTP server inside your app to respond to the liveness probe. Kubernetes will ping a path, and if it gets an HTTP response in the 200 or 300 range, it'll mark the app as healthy. Otherwise, it'll be marked as unhealthy.

For command probes, Kubernetes will run a command inside your container. If the command returns with an exit code of 0, then the container will be marked healthy. Otherwise, it'll be marked unhealthy. This type of probe is useful when you can't or don't want to run an HTTP server, but you can run a command that can check if your app is healthy or not.

The last type of probe is the TCP probe. Kubernetes will try to establish a TCP connection on the specified port. If it can establish a connection, the container is considered healthy. If it can't, it's considered a failure. These can come in handy if you have a scenario where the HTTP probe or the command probe don't work well. For example, a gRPC or FTP service is a prime candidate for this type of probe.

Probes can be configured in many ways. You can specify how often they should run, what the success and failure thresholds are, and how long to wait for responses. See the documentation for more details. However, there's one very important setting that you need to configure when using liveness probes: the initialDelaySeconds setting. As I mentioned before, a liveness probe failure will cause a pod to restart. You need to make sure the probe doesn't start until the app is ready. Otherwise, the app will just constantly restart in a loop and never be ready. I recommend using the p99 startup time as the initialDelaySeconds, or just take the average startup time and add a buffer. As your app startup time gets faster or slower, make sure you update this number.

Most people will tell you that health checks are a requirement for any distributed system, and Kubernetes is no exception. Using health checks gives your Kubernetes services a solid foundation, better reliability, and higher uptime. Thankfully, Kubernetes makes it easy to do. I'll see you on the next episode of Kubernetes Best Practices.
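
To make the "lightweight HTTP server inside your app" idea from the transcript concrete, here is a minimal sketch in Python using only the standard library. It is an illustration, not something shown in the episode: the paths `/healthz` and `/ready`, the port 8080, and the names `app_ready`, `probe_status`, `ProbeHandler`, and `start_probe_server` are all assumptions chosen for this example. The readiness path returns 503 until the app reports that warm-up has finished, while the liveness path returns 200 as long as the process can still answer.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical flag: the app calls app_ready.set() once warm-up is done.
app_ready = threading.Event()


def probe_status(path, ready):
    """Return the HTTP status code a probe request to `path` should get."""
    if path == "/healthz":   # liveness: the process is able to respond
        return 200
    if path == "/ready":     # readiness: pass only after warm-up
        return 200 if ready else 503
    return 404               # anything else is not a probe endpoint


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # An HTTP probe only looks at the status code, so the body is empty.
        self.send_response(probe_status(self.path, app_ready.is_set()))
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep frequent probe requests out of the app's logs


def start_probe_server(port=8080):
    """Serve the probe endpoints on a background thread; return the server."""
    server = HTTPServer(("0.0.0.0", port), ProbeHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

An app would call `start_probe_server()` early during startup and `app_ready.set()` once it can serve traffic. The pod spec would then point an `httpGet` `readinessProbe` at `/ready` and an `httpGet` `livenessProbe` at `/healthz` on the same port, with `initialDelaySeconds` set on the liveness probe as discussed above. One caveat with this sketch: because the endpoint runs on its own thread, it keeps answering even if the main work is deadlocked, so a real liveness check would also need to confirm that the app is still making progress.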

7 comments

  1. Should there not be a Dieness as well? If there is something the existing pod needs to complete before it dies, there should be an opportunity given for that as well. Not seeing anything clear on that. For example, sending logs or post-transaction data.

  2. Well thought on the design. I don't know why AWS Lambda or Azure functions come up with this similar implementation and those computes currently cry for cold start.

  3. I think this is more like the health check feature of AWS for ALB, but I like the idea of keeping it at the Service or Pod level, plus auto reconfiguring/scaling of the whole cluster.

  4. Why not design K8s such that IF the Readiness and Liveness probes are both defined, then 'initialDelaySeconds' is not required; instead the Liveness probe starts after the Readiness probe succeeds or times-out; as defined by the 'failureThreshold' of the Readiness probe?
