The pods in my application scale out at one pod per user (each user gets their own pod). I set the following limits on the application container:
resources:
  limits:
    cpu: 250m
    memory: 768Mi
  requests:
    cpu: 100m
    memory: 512Mi
Each node in my node pool has 8GB of memory. I started a batch of user instances for testing and watched my resource metrics climb as each instance came up:
[CPU and Memory usage charts not available]
At 15:40 the event log showed this error (note: the first node is excluded):
0/2 nodes are available: 1 Insufficient memory, 1 node(s) didn't match node selector.
Why does this happen when memory/CPU requests are still well below total capacity (CPU around 50%, memory around 60%)?
Here is the relevant part of the kubectl describe node output:
Non-terminated Pods: (12 in total)
Namespace     Name                                                             CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
---------     ----                                                             ------------  ----------  ---------------  -------------  ---
ide           theia-deployment--ac031811--football-6b6d54ddbb-txsd4            110m (5%)     350m (18%)  528Mi (9%)       832Mi (15%)    13m
ide           theia-deployment--ac031811--footballteam-6fb7b68794-cv4c9        110m (5%)     350m (18%)  528Mi (9%)       832Mi (15%)    12m
ide           theia-deployment--ac031811--how-to-play-football-669ddf7c8cjrzl  110m (5%)     350m (18%)  528Mi (9%)       832Mi (15%)    14m
ide           theia-deployment--ac031811--packkide-7bff98d8b6-5twkf            110m (5%)     350m (18%)  528Mi (9%)       832Mi (15%)    9m54s
ide           theia-deployment--ac032611--static-website-8569dd795d-ljsdr      110m (5%)     350m (18%)  528Mi (9%)       832Mi (15%)    16m
ide           theia-deployment--aj090111--spiderboy-6867b46c7d-ntnsb           110m (5%)     350m (18%)  528Mi (9%)       832Mi (15%)    2m36s
ide           theia-deployment--ar041311--tower-defenders-cf8c5dd58-tl4j9      110m (5%)     350m (18%)  528Mi (9%)       832Mi (15%)    14m
ide           theia-deployment--np091707--my-friends-suck-at-coding-fd48ljs7z  110m (5%)     350m (18%)  528Mi (9%)       832Mi (15%)    4m14s
ide           theia-deployment--np091707--topgaming-76b98dbd94-fgdz6           110m (5%)     350m (18%)  528Mi (9%)       832Mi (15%)    5m17s
kube-system   csi-azurefile-node-nhbpg                                         30m (1%)      400m (21%)  60Mi (1%)        400Mi (7%)     12d
kube-system   kube-proxy-knq65                                                 100m (5%)     0 (0%)      0 (0%)           0 (0%)         12d
lens-metrics  node-exporter-57zp4                                              10m (0%)      200m (10%)  24Mi (0%)        100Mi (1%)     6d20h
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource                       Requests      Limits
--------                       --------      ------
cpu                            1130m (59%)   3750m (197%)
memory                         4836Mi (90%)  7988Mi (148%)
ephemeral-storage              0 (0%)        0 (0%)
hugepages-1Gi                  0 (0%)        0 (0%)
hugepages-2Mi                  0 (0%)        0 (0%)
attachable-volumes-azure-disk  0             0
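The percentages in kubectl describe node are taken against the node's Allocatable (it falls back to Capacity only when Allocatable is empty), not against the nominal 8GB. That means you can work backwards to estimate the memory the scheduler actually has available. The sketch below assumes kubectl truncates each percentage to an integer, which is how I read its describe code. It then intersects the ranges implied by the request row and the limit row of the "Allocated resources" block above. The function names are made up for illustration.

```python
def allocatable_range(amount: float, percent: int) -> tuple[float, float]:
    """Allocatable values consistent with `amount` being printed as `percent`%.

    Assumes kubectl printed int(amount / allocatable * 100), i.e. truncated,
    so percent <= amount / allocatable * 100 < percent + 1.
    Returns (exclusive lower bound, inclusive upper bound).
    """
    low = amount * 100 / (percent + 1)
    high = amount * 100 / percent
    return low, high


def intersect(*ranges: tuple[float, float]) -> tuple[float, float]:
    """Narrow the estimate using several rows that share the same denominator."""
    lows, highs = zip(*ranges)
    return max(lows), min(highs)


# Figures copied from the "Allocated resources" block above.
mem = intersect(allocatable_range(4836, 90),   # memory requests, Mi
                allocatable_range(7988, 148))  # memory limits, Mi
cpu = intersect(allocatable_range(1130, 59),   # CPU requests, millicores
                allocatable_range(3750, 197))  # CPU limits, millicores

print(f"memory Allocatable: {mem[0]:.1f}Mi .. {mem[1]:.1f}Mi")
print(f"cpu Allocatable:    {cpu[0]:.1f}m .. {cpu[1]:.1f}m")
```

Under that truncation assumption, memory Allocatable comes out between roughly 5361Mi and 5373Mi, a little over 5.2GiB. CPU Allocatable comes out at about 1.9 cores. Measured against that, the 90% requests figure is what counts, not the usage graphs.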
According to the Kubernetes documentation:
More information on how Pod limits are applied can be found here.
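The scheduler's resource-fit check adds the new Pod's requests to the requests already bound to the node and compares the total with Allocatable. Current usage plays no part, which is how a node whose usage graphs sit around 60% can still report Insufficient memory. Below is a simplified sketch of the memory half of that check. It ignores init containers, Pod overhead and extended resources, and every number in it is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NodeState:
    allocatable_mi: int  # node.status.allocatable["memory"], in Mi
    requested_mi: int    # sum of memory *requests* of Pods already bound to the node
    used_mi: int         # actual usage, i.e. what the metrics charts show


def fits(node: NodeState, pod_request_mi: int) -> bool:
    """Simplified memory part of the scheduler's resource-fit check.

    Only requests are compared with Allocatable; node.used_mi plays no part.
    """
    return node.requested_mi + pod_request_mi <= node.allocatable_mi


# Hypothetical node: usage looks comfortable, requests do not.
node = NodeState(allocatable_mi=5000, requested_mi=4700, used_mi=2900)
print(f"usage {node.used_mi / node.allocatable_mi:.0%}, "
      f"requests {node.requested_mi / node.allocatable_mi:.0%}")
print("room for another 528Mi Pod:", fits(node, 528))
```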
Update:
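Resource consumption can be optimized by readjusting the memory limits and adding an eviction policy that suits your needs. You can find more details in the Kubernetes documentation here and here.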
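Why do the limits matter? The output above shows memory limits at 148% of Allocatable. If several Pods grow toward their limits at the same time, the node can run short of memory. At that point kubelet's eviction thresholds or the kernel OOM killer step in. The sketch below does the arithmetic with the per-Pod figures (528Mi request, 832Mi limit) and the system-Pod figures copied from the kubectl output. The 5300Mi Allocatable is an assumed round number, and `plan` is a hypothetical helper. It compares the current limit with setting the limit equal to the request.

```python
def plan(allocatable_mi: int, other_requests_mi: int, other_limits_mi: int,
         pod_request_mi: int, pod_limit_mi: int) -> tuple[int, int, float]:
    """How many identical Pods fit by request, and the worst-case memory
    if every container on the node then grows to its limit."""
    max_pods = (allocatable_mi - other_requests_mi) // pod_request_mi
    worst_case_mi = other_limits_mi + max_pods * pod_limit_mi
    return max_pods, worst_case_mi, worst_case_mi / allocatable_mi


# Per-Pod and system-Pod figures from the kubectl output above;
# Allocatable (5300Mi) is an assumed round number.
SYSTEM_REQUESTS = 60 + 0 + 24    # csi-azurefile-node, kube-proxy, node-exporter
SYSTEM_LIMITS = 400 + 0 + 100

for limit in (832, 528):         # current limit vs. limit == request
    pods, worst, ratio = plan(5300, SYSTEM_REQUESTS, SYSTEM_LIMITS, 528, limit)
    print(f"limit {limit}Mi: {pods} Pods, worst case {worst}Mi ({ratio:.0%} of Allocatable)")
```

In both scenarios the same number of Pods gets scheduled, because scheduling only looks at requests. Bringing limits down toward requests removes the overcommit that would otherwise trigger evictions under load. The cost is less headroom for any single Pod.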
Update 2:
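To better understand why the scheduler refuses to place a Pod on a node, I suggest enabling resource logs in your AKS cluster; see this guide in the AKS documentation. Among the common logs, look at the kube-scheduler logs for more details.
I found that when looking at available capacity, you need to look at Allocatable, not Capacity. From Azure support: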
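The Kubernetes node-allocatable documentation defines Allocatable as Capacity minus kube-reserved, minus system-reserved, minus the hard eviction threshold. The sketch below applies that definition to a nominal 8GiB node. It assumes the tiered memory reservation that the AKS docs described for older Kubernetes versions: 25% of the first 4GB, 20% of the next 4GB, 10% of the next 8GB, 6% of the next 112GB, 2% above that. It also assumes a 750Mi hard eviction threshold, treats GB as GiB, and assumes no separate system-reserved. Newer AKS versions use a different formula, so treat the tiers as an assumption to check against your version.

```python
# Assumed AKS memory reservation tiers (older Kubernetes versions, per AKS docs):
# (tier size in GiB, fraction of that tier reserved for kube-reserved)
TIERS = [(4, 0.25), (4, 0.20), (8, 0.10), (112, 0.06), (float("inf"), 0.02)]


def kube_reserved_gib(capacity_gib: float) -> float:
    """Memory reserved for Kubernetes daemons under the tiered formula."""
    reserved, remaining = 0.0, capacity_gib
    for size, fraction in TIERS:
        chunk = min(remaining, size)
        reserved += chunk * fraction
        remaining -= chunk
        if remaining <= 0:
            break
    return reserved


def allocatable_mi(capacity_mi: float, eviction_hard_mi: float = 750) -> float:
    """Allocatable = Capacity - kube-reserved - system-reserved - eviction threshold.

    system-reserved is assumed to be 0 here.
    """
    return capacity_mi - kube_reserved_gib(capacity_mi / 1024) * 1024 - eviction_hard_mi


print(f"{allocatable_mi(8 * 1024):.0f}Mi allocatable on a nominal 8GiB node")
```

This lands a few hundred Mi above what the percentages in the question imply, so treat it as a ballpark only. The Capacity and Allocatable sections printed by kubectl describe node are the figures to trust.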