bulletshot60提出的问题 -computer

bulletshot60

Asked: 2024-03-14 23:08:23 +0800 CST

Terraform `aws_eks_node_group` 已准备就绪，但创建从未完成

我有一个 terraform 设置，在其中创建一个新的启动模板和一个节点组。如果没有启动模板，一切都会正常工作。使用启动模板，节点已准备就绪，但节点组永远不会完成创建。

main.tf

...

resource "aws_launch_template" "this" {
  block_device_mappings {
    device_name = "/dev/xvda"

    ebs {
      volume_type           = var.block_device_mappings.type
      volume_size           = var.block_device_mappings.size
      iops                  = var.block_device_mappings.iops
      kms_key_id            = var.block_device_mappings.kms_key_id
      encrypted             = var.block_device_mappings.encrypted
      delete_on_termination = var.block_device_mappings.delete_on_termination
    }
  }

  user_data = base64encode(templatefile("${path.module}/user_data.tpl", {
    cluster_endpoint = var.cluster_endpoint
    certificate_authority_data = var.certificate_authority_data
    bootstrap_extra_args = "--use-max-pods false"
    cluster_name = var.cluster_name
  }))
}

resource "aws_eks_node_group" "this" {
  cluster_name    = var.cluster_name
  node_group_name = var.node_group_name
  node_role_arn   = var.node_group_arn
  instance_types  = [var.instance_type]
  subnet_ids = [
    for subnet in var.subnets : subnet.id
  ]
  capacity_type = var.capacity_type

  scaling_config {
    desired_size = var.desired_capacity
    max_size     = var.max_capacity
    min_size     = var.min_capacity
  }

  update_config {
    max_unavailable = 1
  }

  labels = var.node_group_labels

  dynamic "taint" {
    for_each = toset(var.node_group_taints)

    content {
      key    = taint.value.key
      value  = taint.value.value
      effect = taint.value.effect
    }
  }

  launch_template {
    id      = aws_launch_template.this.id
    version = aws_launch_template.this.latest_version
  }
}

...

user_data.tpl

MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="/:/+++"

--/:/+++
Content-Type: text/x-shellscript; charset="us-ascii"
#!/bin/bash

/etc/eks/bootstrap.sh --apiserver-endpoint '${cluster_endpoint}' --b64-cluster-ca '${certificate_authority_data}' ${bootstrap_extra_args} '${cluster_name}'

--/:/+++--

kubectl get pods

NAME                                          STATUS   ROLES    AGE   VERSION
ip-192-168-1-128.us-west-1.compute.internal   Ready    <none>   13m   v1.29.0-eks-5e0fdde
ip-192-168-1-140.us-west-1.compute.internal   Ready    <none>   13m   v1.29.0-eks-5e0fdde
ip-192-168-1-157.us-west-1.compute.internal   Ready    <none>   13m   v1.29.0-eks-5e0fdde

kubectl describe node ip-192-168-1-128.us-west-1.compute.internal

Name:               ip-192-168-1-128.us-west-1.compute.internal
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=m5.4xlarge
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=us-west-1
                    failure-domain.beta.kubernetes.io/zone=us-west-1a
                    k8s.io/cloud-provider-aws=cff041cdc91d38d182baa77beef8bf9f
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=ip-192-168-1-128.us-west-1.compute.internal
                    kubernetes.io/os=linux
                    node.kubernetes.io/instance-type=m5.2xlarge
                    topology.kubernetes.io/region=us-west-1
                    topology.kubernetes.io/zone=us-west-1a
Annotations:        alpha.kubernetes.io/provided-node-ip: 192.168.1.128
                    csi.volume.kubernetes.io/nodeid: {"csi.tigera.io":"ip-192-168-1-128.us-gov-west-1.compute.internal"}
                    node.alpha.kubernetes.io/ttl: 0
                    projectcalico.org/IPv4Address: 192.168.1.128/24
                    projectcalico.org/IPv4VXLANTunnelAddr: 10.42.7.192
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Thu, 14 Mar 2024 10:40:54 -0400
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  ip-192-168-1-128.us-west-1.compute.internal
  AcquireTime:     <unset>
  RenewTime:       Thu, 14 Mar 2024 10:54:21 -0400
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Thu, 14 Mar 2024 10:41:24 -0400   Thu, 14 Mar 2024 10:41:24 -0400   CalicoIsUp                   Calico is running on this node
  MemoryPressure       False   Thu, 14 Mar 2024 10:52:09 -0400   Thu, 14 Mar 2024 10:40:54 -0400   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Thu, 14 Mar 2024 10:52:09 -0400   Thu, 14 Mar 2024 10:40:54 -0400   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   Thu, 14 Mar 2024 10:52:09 -0400   Thu, 14 Mar 2024 10:40:54 -0400   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True    Thu, 14 Mar 2024 10:52:09 -0400   Thu, 14 Mar 2024 10:41:18 -0400   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:   192.168.1.128
  InternalDNS:  ip-192-168-1-128.us-west-1.compute.internal
  Hostname:     ip-192-168-1-128.us-west-1.compute.internal
Capacity:
  cpu:                16
  ephemeral-storage:  20959212Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             64333324Ki
  pods:               110
Allocatable:
  cpu:                15890m
  ephemeral-storage:  18242267924
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             61334028Ki
  pods:               110
System Info:
  Machine ID:                 ec2821bfac66895c1abc29a47021fe76
  System UUID:                ec2821bf-ac66-895c-1abc-29a47021fe76
  Boot ID:                    356d15db-1436-4c45-af1e-6a668eddd8e0
  Kernel Version:             5.10.210-201.852.amzn2.x86_64
  OS Image:                   Amazon Linux 2
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  containerd://1.7.11
  Kubelet Version:            v1.29.0-eks-5e0fdde
  Kube-Proxy Version:         v1.29.0-eks-5e0fdde
ProviderID:                   aws:///us-west-1a/i-0874068c9ab354407
Non-terminated Pods:          (6 in total)
  Namespace                   Name                                 CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                 ------------  ----------  ---------------  -------------  ---
  calico-apiserver            calico-apiserver-5f98fdb745-cf4xg    0 (0%)        0 (0%)      0 (0%)           0 (0%)         12m
  calico-system               calico-node-6c98k                    0 (0%)        0 (0%)      0 (0%)           0 (0%)         13m
  calico-system               calico-typha-695fb789b5-sfq4n        0 (0%)        0 (0%)      0 (0%)           0 (0%)         13m
  calico-system               csi-node-driver-qtczs                0 (0%)        0 (0%)      0 (0%)           0 (0%)         13m
  ionic-system                tigera-operator-967f9fc76-tghqf      0 (0%)        0 (0%)      0 (0%)           0 (0%)         15m
  kube-system                 kube-proxy-cnlnc                     100m (0%)     0 (0%)      0 (0%)           0 (0%)         13m
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests   Limits
  --------           --------   ------
  cpu                100m (0%)  0 (0%)
  memory             0 (0%)     0 (0%)
  ephemeral-storage  0 (0%)     0 (0%)
  hugepages-1Gi      0 (0%)     0 (0%)
  hugepages-2Mi      0 (0%)     0 (0%)
Events:
  Type     Reason                   Age                From                   Message
  ----     ------                   ----               ----                   -------
  Normal   Starting                 13m                kube-proxy             
  Normal   Synced                   13m                cloud-node-controller  Node synced successfully
  Normal   Starting                 13m                kubelet                Starting kubelet.
  Warning  InvalidDiskCapacity      13m                kubelet                invalid capacity 0 on image filesystem
  Normal   NodeHasSufficientMemory  13m (x2 over 13m)  kubelet                Node ip-192-168-1-128.us-west-1.compute.internal status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure    13m (x2 over 13m)  kubelet                Node ip-192-168-1-128.us-west-1.compute.internal status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID     13m (x2 over 13m)  kubelet                Node ip-192-168-1-128.us-west-1.compute.internal status is now: NodeHasSufficientPID
  Normal   NodeAllocatableEnforced  13m                kubelet                Updated Node Allocatable limit across pods
  Normal   RegisteredNode           13m                node-controller        Node ip-192-168-1-128.us-west-1.compute.internal event: Registered Node ip-192-168-1-128.us-west-1.compute.internal in Controller
  Normal   NodeReady                13m                kubelet                Node ip-192-168-1-128.us-west-1.compute.internal status is now: NodeReady

更令人困惑的是，如果我--apiserver-endpoint '${cluster_endpoint}' --b64-cluster-ca '${certificate_authority_data}'从.tpl文件中删除，一切都会正常工作，除了最大 pod 计数错误（由于实例类型，它会下降到 58）。

笔记：

我们使用 Calico 而不是 AWS 节点 CNI。这是该项目的要求，所以我坚持这一点。
迄今为止唯一突出的奇怪之处是，当我在没有上述参数的情况下运行此命令时，污点会填充，而当我在没有上述参数的情况下运行时，污点不会填充，但这可能是一个转移注意力的事情。

任何建议表示赞赏。

Terraform `aws_eks_node_group` 已准备就绪，但创建从未完成

如何减少“vmmem”进程的消耗？

从 Microsoft Stream 下载视频

Google Chrome DevTools 无法解析 SourceMap：chrome-extension

Windows 照片查看器因为内存不足而无法运行？

支持结束后如何激活 WindowsXP？

远程桌面间歇性冻结

子网掩码 /32 是什么意思？

鼠标指针在 Windows 中按下的箭头键上移动？

VirtualBox 无法以 VERR_NEM_VM_CREATE_FAILED 启动

应用程序不会出现在 MacBook 的摄像头和麦克风隐私设置中

bulletshot60's questions