After the introduction of Docker, the life of a developer became much easier. Containers offloaded the task of setting up the necessary runtimes, libraries, and servers, and Kubernetes went a step further: it simplified complex deployments and put managing multiple containers for bigger applications within the reach of most developers.
But using Kubernetes also introduces new types of issues that can be difficult to spot and troubleshoot. Memory and resource allocation issues are a good example: they can have a wide impact on application performance and be hard to identify and correct.
Fortunately, a few general tips can help you avoid common issues caused by misconfigured resource allocation in Kubernetes.
So what does all this mean for you as a developer? The short answer: you have fewer things to worry about. You won’t have to manage runtimes, dependencies, and deployments by hand, and you’ll be able to focus on the application itself.
Saving you, the developer, the hassle of managing multiple containers is already an improvement, but there’s another important advantage of using Kubernetes: scalability.
With Kubernetes, it doesn’t matter if your application has fewer than 10 containers or if it has hundreds of them. Kubernetes can manage a cluster of five servers as easily as it manages a cluster of 500-plus servers. One Kubernetes cluster can even consist of different pools of machines in different places.
As a developer, sometimes you need to implement extra logic in the code to make distributed applications bulletproof. Again, the bigger the application (i.e., the more containers it consists of), the more effort has to be put into extra coding.
But don’t worry, Kubernetes can help here, too. It has built-in load-balancing features. It can automatically perform load balancing of requests between a specified set of containers and eliminate containers that can’t handle any more load or aren’t working properly.
Abstraction layers created by Kubernetes and its features are very helpful, but they also complicate troubleshooting, especially when it comes to resource management and allocation.
Kubernetes is designed for distributing containers across multiple nodes in the most effective way possible. But to do it really well, it needs to anticipate how many resources a container will need to function properly. You can provide this information by setting the proper resource requests and resource limits.
While they’re optional, it’s a best practice to set both. But before we go there, let’s review what they are and how they differ.
While it may not sound like rocket science to set requests and limits, there are some pitfalls to avoid.
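For reference, both values are set per container in the pod spec. The manifest below is only an illustrative sketch (the name, image, and numbers are placeholders, not recommendations):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-app              # hypothetical pod name
spec:
  containers:
    - name: demo
      image: nginx:1.25       # any image; nginx is just an example
      resources:
        requests:
          memory: "512Mi"     # scheduler reserves at least this much RAM
          cpu: "250m"         # a quarter of a CPU core
        limits:
          memory: "1Gi"       # container is OOM-killed above this
          cpu: "500m"         # CPU is throttled above this
```

Requests drive scheduling decisions; limits cap what the running container may actually consume.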
How does Kubernetes assign memory to a container? It depends. A pod can run in one of the following scenarios: with no requests or limits set, with only requests set, with only limits set, or with both set.
Without requests and limits set, pods will simply be managed on a first-come, first-served basis. Kubernetes will try to distribute RAM between all running pods equally, but if one pod tries to allocate more and more memory, Kubernetes may kick out other pods from the node to meet the demand. There’s nothing stopping pods from consuming all the free memory on the node. Trust me, you don’t want to have a memory leak in this situation.
You might be thinking “I’ll set those requests to guarantee the amount my pod needs to run properly, but I don’t think I need limits.” Doing this will solve some problems.
By setting resource requests, Kubernetes will make sure to schedule a pod on a node with at least the requested amount of RAM available, so, in theory, you’re safe. But in practice, nothing protects you from a memory-leaking application.
If you have a pod needing only 512 MB of RAM to run properly on a node with 8 GB of RAM, and you set its memory request to 600 MB accordingly, then you should be able to fit more than 10 such pods on the node. But if one of these pods has a memory leak, it can eat into the memory the other pods are counting on, and Kubernetes may start evicting pods from the node.
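Back-of-the-envelope, the scheduler’s view of that node is simple division (the numbers below just mirror the hypothetical example above):

```python
NODE_RAM_MB = 8000   # node with roughly 8 GB of allocatable RAM
REQUEST_MB = 600     # memory request per pod

# The scheduler only counts requests, not actual usage,
# so this many pods fit on the node:
pods_that_fit = NODE_RAM_MB // REQUEST_MB
print(pods_that_fit)  # 13, i.e. "more than 10"
```

The moment actual usage diverges from requests (a leak, for instance), this neat arithmetic no longer reflects reality on the node.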
On the other hand, if you only set limits, nothing guarantees a minimum amount of RAM for the pod. So, depending on overall usage on the node, your application simply may not perform properly.
Setting both memory resource requests and limits for a pod helps Kubernetes manage RAM usage more efficiently. But doing so doesn’t solve all problems.
If an application has a memory leak or tries to use more memory than its set limit, Kubernetes will terminate it with an “OOMKilled—Container limit reached” event and Exit Code 137.
When you see a message like this, you have two choices: increase the limit for the pod or start debugging. If, for example, your website was experiencing an increase in load, then adjusting the limit would make sense. On the other hand, if the memory use was sudden or unexpected, it may indicate a memory leak and you should start debugging immediately.
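As an aside, Exit Code 137 isn’t arbitrary. By Unix convention, a process killed by a signal exits with 128 plus the signal number, and the kill here is a SIGKILL, which is signal 9:

```python
import signal

# Conventional exit code for a signal-killed process: 128 + signal number
oom_exit_code = 128 + signal.SIGKILL  # SIGKILL is signal 9
print(oom_exit_code)  # 137
```

So whenever you see 137 in a container status, you can read it as “killed by SIGKILL,” which in this context usually means the OOM killer.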
Remember, Kubernetes killing a pod like that is a good thing: it protects all the other pods running on the same node.
Kubernetes uses memory requests to determine on which node to schedule a pod. For example, on a node with 8 GB of free RAM, Kubernetes will schedule 10 pods with 800 MB memory requests, five pods with 1600 MB requests, or one pod with an 8 GB request, etc. However, limits can (and should) be higher than requests, and they aren’t considered for scheduling.
So, for example, you can schedule 10 pods on the same node, each with an 800 MB memory request and a 1 GB memory limit. This leads to a situation where, together, the pods may try to use more memory than the node has.
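The overcommit in that example is easy to see when you total requests and limits separately (illustrative numbers only):

```python
NODE_RAM_MB = 8000  # allocatable RAM on the node

# 10 identical pods: 800 MB requested, 1 GB (1024 MB) limit each
pods = [{"request_mb": 800, "limit_mb": 1024}] * 10

total_requests = sum(p["request_mb"] for p in pods)  # 8000 MB
total_limits = sum(p["limit_mb"] for p in pods)      # 10240 MB

# Scheduling succeeds because only requests are counted...
assert total_requests <= NODE_RAM_MB
# ...but if every pod ran up to its limit, the node would run out of memory.
assert total_limits > NODE_RAM_MB
```

This gap between the sum of requests and the sum of limits is exactly the window in which memory-pressure evictions happen.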
In this case, Kubernetes may terminate some pods. Because of this, it’s important to understand how Kubernetes decides which pod to kill.
Here is the hierarchy Kubernetes uses to decide which pods to delete:

1. First to go are pods with no requests or limits set at all (the BestEffort class).
2. Next are pods with requests set but using more memory than they requested (the Burstable class).
3. Last are pods with requests and limits set and equal, staying within them (the Guaranteed class). These are killed only when the node itself is critically short on memory.
And this hierarchy is why it’s so important to set appropriate values for both requests and limits for each workload.
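Kubernetes bases this decision on the pod’s Quality of Service (QoS) class, which it derives from how requests and limits are set. A simplified sketch of that derivation (for a single-container pod; real pods aggregate the rule across all containers and both CPU and memory):

```python
def qos_class(requests: dict, limits: dict) -> str:
    """Simplified QoS classification for a single-container pod."""
    if not requests and not limits:
        return "BestEffort"   # killed first under memory pressure
    if requests and requests == limits:
        return "Guaranteed"   # killed last
    return "Burstable"        # somewhere in between

print(qos_class({}, {}))                                  # BestEffort
print(qos_class({"memory": "512Mi"}, {"memory": "1Gi"}))  # Burstable
print(qos_class({"memory": "1Gi"}, {"memory": "1Gi"}))    # Guaranteed
```

If you want a pod to be among the last candidates for eviction, set requests equal to limits so it lands in the Guaranteed class.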
Setting CPU requests and limits isn’t as straightforward as memory, where you can simply define a specific number of bytes. Kubernetes defines CPU resources as “CPU units.” One unit equals one vCPU/core on cloud providers and one hyperthread on bare-metal machines. In theory, a CPU request of “1” will allow a container to use one vCPU/core (regardless of whether it’s running on a single-core or 24-core machine). Fractional values are also possible, so a value of “0.5” will allow a container to use half of a core.
However, Kubernetes translates these values under the hood into a proportion of CPU cycles, which means if there’s high CPU usage on the node, there’s no guarantee the container will get as much CPU as it requested. So, it’s really a priority setting.
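In manifests, you’ll usually see these values written in millicores with an “m” suffix, where 1000m equals one CPU unit. A tiny conversion sketch (assuming well-formed input, not a full Kubernetes quantity parser) shows the equivalence:

```python
def cpu_units(quantity: str) -> float:
    """Convert a CPU quantity string ("1", "0.5", "500m") to CPU units."""
    if quantity.endswith("m"):            # millicores: 1000m == 1 CPU
        return int(quantity[:-1]) / 1000
    return float(quantity)

print(cpu_units("500m"))  # 0.5 -> half a core
print(cpu_units("0.5"))   # 0.5 -> the same thing
print(cpu_units("2"))     # 2.0 -> two cores
```

Writing “500m” and “0.5” mean the same thing to the scheduler; millicores are simply the more common convention.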
Unlike memory limits, Kubernetes will not kill a container that tries to use more CPU than its limit. Instead, Kubernetes will only throttle down the process and assign less CPU time to it.
Since CPU requests and limits aren’t absolute values but a share of the node’s CPU time, it can be difficult to troubleshoot CPU performance-related issues.
Not setting resource requests and limits can cause problems, and setting the wrong values can also cause problems. The challenge is finding the correct values to use. Your best bet is to start from reasonable estimates and gradually adjust them toward optimal values. To do so, you need a robust monitoring system (and if you think you don’t, read this blog post).
Aggregating and monitoring logs can be very useful in identifying the right values for requests and limits. Most of the events related to requests and limits are emitted as logs. Messages like “OOMKilled—Container limit reached” are pretty straightforward.
You can read more about troubleshooting with Kubernetes logging here. Fortunately, it’s easy to stream logs from Kubernetes into one place. Tools like FluentD can take care of this. From there, you only need a system that can aggregate and directly show you the most important messages.
SolarWinds® Papertrail™ can manage logs from containers and Kubernetes components and nodes, which gives you an even better overview of what’s happening in your cluster. It’s very handy, especially in bigger clusters. The powerful yet simple search syntax offered by Papertrail can drastically reduce debugging time. Moreover, it can show you events with context and pinpoint issues. Even if you prefer real-time troubleshooting, Papertrail has you covered with its live tail feature. If you want to see it yourself, sign up for a trial or request a demo.
This post was written by Dawid Ziolkowski. Dawid has 10 years of experience, first as a network/system engineer, then in DevOps, and most recently as a cloud-native engineer. He’s worked for an IT outsourcing company, a research institute, a telco, a hosting company, and a consultancy, so he’s gathered knowledge from many different perspectives. Nowadays, he helps companies move to the cloud and/or redesign their infrastructure for a more cloud-native approach.