Give EKS Nodes a Dedicated EBS Volume for Containers

Lately I have been hitting situations where containers overload the system disk on my EKS nodes. Once the root disk maxes out its throughput, the operating system freezes, CPU usage spikes, and the node drops into NotReady. Even worse, the node stops accepting SSH, so I cannot inspect what actually went wrong. Kubernetes will eventually reschedule the Pods, but only after the node has stayed in NotReady for five minutes.

To keep this from happening again, I split the system disk and the container data disk into two separate EBS volumes so that container workloads always land on the second volume.
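As a rough illustration of the two-volume layout, here is a minimal sketch that builds the block device mappings an EC2 launch template for the node group might carry. The device names (`/dev/xvda`, `/dev/xvdb`), the sizes, the `gp3` volume type, and the helper name `build_block_device_mappings` are all assumptions for illustration, not values taken from my setup. The sketch also only declares the second volume; formatting it and mounting it on the container runtime's data directory would still have to happen at boot, for example in user data.

```python
import json


def build_block_device_mappings(root_size_gib=20, container_size_gib=100):
    """Return launch-template block device mappings with two EBS volumes.

    All device names, sizes, and the volume type are illustrative assumptions.
    """

    def ebs(device_name, size_gib):
        # Shape follows the BlockDeviceMappings entries of EC2 launch template data.
        return {
            "DeviceName": device_name,
            "Ebs": {
                "VolumeSize": size_gib,
                "VolumeType": "gp3",
                "DeleteOnTermination": True,
            },
        }

    return [
        ebs("/dev/xvda", root_size_gib),       # system (root) disk, assumed device name
        ebs("/dev/xvdb", container_size_gib),  # container data disk, assumed device name
    ]


if __name__ == "__main__":
    # Print the fragment that would go under LaunchTemplateData.
    print(json.dumps({"BlockDeviceMappings": build_block_device_mappings()}, indent=2))
```

Keeping the two volumes as separate entries means each one gets its own EBS throughput and IOPS budget, which is the point of the split: heavy container I/O can exhaust the second volume without starving the operating system on the first.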
