Questions tagged [etcd] (server)

Jonas
Asked: 2021-08-22 07:54:38 +0800 CST

systemd kills an etcd service started with podman - reception only permitted for main PID

  • 1

I'm trying to start etcd as a systemd service, running in a podman container.

After starting it, I get this error log from systemd:

systemd[1]: etcd.service: Got notification message from PID 4696, but reception only permitted for main PID 4868

But etcd seems to start and tries to notify the init daemon:

21T15:31:08.817Z","caller":"etcdserver/server.go:2500","msg":"cluster version>
Aug 21 15:31:08 ip-10-0-0-71 podman[4696]: {"level":"info","ts":"2021-08-21T15:31:08.817Z","caller":"etcdmain/main.go:47","msg":"notifying init daemon>
Aug 21 15:31:08 ip-10-0-0-71 podman[4696]: {"level":"info","ts":"2021-08-21T15:31:08.818Z","caller":"etcdmain/main.go:53","msg":"successfully notified>

But systemd does not seem to notice this and terminates the etcd service:

Aug 21 15:32:34 ip-10-0-0-71 systemd[1]: etcd.service: start operation timed out. Terminating.
Aug 21 15:32:35 ip-10-0-0-71 podman[4696]: {"level":"info","ts":"2021-08-21T15:32:35.000Z","caller":"osutil/interrupt_unix.go:64","msg":"received sign>
Aug 21 15:32:35 ip-10-0-0-71 podman[4696]: {"level":"info","ts":"2021-08-21T15:32:35.000Z","caller":"embed/etcd.go:367","msg":"closing etcd server","n>

Here is the systemd service status:

$ sudo systemctl status etcd.service
● etcd.service - etcd
     Loaded: loaded (/etc/systemd/system/etcd.service; enabled; vendor preset: enabled)
     Active: failed (Result: timeout) since Sat 2021-08-21 15:32:35 UTC; 8min ago
    Process: 4868 ExecStart=/usr/bin/podman run -p 2380:2380 -p 2379:2379 --volume=/var/lib/etcd:/etcd-data:z --name etcd 842445240665.dkr.ecr.eu-nort>
   Main PID: 4868 (code=exited, status=0/SUCCESS)
        CPU: 3.729s

Here is the systemd unit file I use to start etcd with podman:

cat <<EOF | sudo tee /etc/systemd/system/etcd.service
[Unit]
Description=etcd
After=podman_ecr_login.service mk_etcd_data_dir.service

[Service]
Type=notify
ExecStart=/usr/bin/podman run -p 2380:2380 -p 2379:2379 --volume=/var/lib/etcd:/etcd-data:z \
 --name etcd <my-aws-account>.dkr.ecr.eu-north-1.amazonaws.com/etcd:v3.5.0 \
 /usr/local/bin/etcd --data-dir=/etcd-data \
 --name etcd0 \
 --advertise-client-urls http://127.0.0.1:2379 \
 --listen-client-urls http://0.0.0.0:2379 \
 --initial-advertise-peer-urls http://127.0.0.1:2380 \
 --listen-peer-urls http://0.0.0.0:2380 \
 --initial-cluster etcd0=http://127.0.0.1:2380

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl enable etcd
sudo systemctl start etcd

I suspect this might be related to Type=notify and the way I use podman or etcd. I start etcd in a way similar to what is described in the etcd documentation: Running etcd clusters inside containers - running a single node etcd. I'm running it with Podman 3.0.1 on Debian 11.

Any suggestions on how to start etcd with podman as a systemd service?
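
For context, the two knobs usually involved in this kind of main-PID mismatch are systemd's NotifyAccess= setting and podman's --sdnotify mode. A minimal sketch, not a verified fix:

# Sketch only: NotifyAccess=all lets systemd accept the READY=1 message from
# any process in the unit's cgroup, not only the main PID. Podman's --sdnotify
# mode controls who sends the notification ("conmon" = ready as soon as the
# container starts, "container" = forward the payload's own sd_notify call).
[Service]
Type=notify
NotifyAccess=all
ExecStart=/usr/bin/podman run --sdnotify=container -p 2380:2380 -p 2379:2379 ...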

linux debian systemd podman etcd
  • 1 Answer
  • 221 Views
Jonas
Asked: 2021-08-19 13:07:01 +0800 CST

How do I start etcd in docker from systemd?

  • 1

I want to start etcd (single node) in docker from systemd, but something seems to go wrong - it is terminated about 30 seconds after startup.

It looks like the service starts in the "activating" state, but is terminated after about 30 seconds without ever reaching the "active" state. Is some signal perhaps missing between the docker container and systemd?

Update (see the bottom of the post): the systemd service status reaches failed (Result: timeout) when I remove the Restart=on-failure directive.

When I check the status of the etcd service after startup, I get this result:

$ sudo systemctl status etcd
● etcd.service - etcd
   Loaded: loaded (/etc/systemd/system/etcd.service; enabled; vendor preset: disabled)
   Active: activating (auto-restart) (Result: exit-code) since Wed 2021-08-18 20:13:30 UTC; 4s ago
  Process: 2971 ExecStart=/usr/bin/docker run -p 2380:2380 -p 2379:2379 --volume=etcd-data:/etcd-data --name etcd my-aws-account.dkr.ecr.eu-north-1.amazonaws.com/etcd:v3.5.0 /usr/local/bin/etcd --data-dir=/etcd-data --name etcd0 --advertise-client-urls http://10.0.0.11:2379 --listen-client-urls http://0.0.0.0:2379 --initial-advertise-peer-urls http://10.0.0.11:2380 --listen-peer-urls http://0.0.0.0:2380 --initial-cluster etcd0=http://10.0.0.11:2380 (code=exited, status=125)
 Main PID: 2971 (code=exited, status=125)

I'm running this on an Amazon Linux 2 machine, with a user data script that runs at boot. I have confirmed that docker.service and docker_ecr_login.service run successfully.

Shortly after the machine has started, I can see etcd running:

 sudo systemctl status etcd
● etcd.service - etcd
   Loaded: loaded (/etc/systemd/system/etcd.service; enabled; vendor preset: disabled)
   Active: activating (start) since Wed 2021-08-18 20:30:07 UTC; 1min 20s ago
 Main PID: 1573 (docker)
    Tasks: 9
   Memory: 24.3M
   CGroup: /system.slice/etcd.service
           └─1573 /usr/bin/docker run -p 2380:2380 -p 2379:2379 --volume=etcd-data:/etcd-data --name etcd my-aws-account.dkr.ecr.eu-north-1.amazonaws.com...

Aug 18 20:30:17 ip-10-0-0-11.eu-north-1.compute.internal docker[1573]: {"level":"info","ts":"2021-08-18T20:30:17.690Z","logger":"raft","caller":"...rm 2"}
Aug 18 20:30:17 ip-10-0-0-11.eu-north-1.compute.internal docker[1573]: {"level":"info","ts":"2021-08-18T20:30:17.691Z","caller":"etcdserver/serve..."3.5"}
Aug 18 20:30:17 ip-10-0-0-11.eu-north-1.compute.internal docker[1573]: {"level":"info","ts":"2021-08-18T20:30:17.693Z","caller":"membership/clust..."3.5"}
Aug 18 20:30:17 ip-10-0-0-11.eu-north-1.compute.internal docker[1573]: {"level":"info","ts":"2021-08-18T20:30:17.693Z","caller":"etcdserver/server.go:2...
Aug 18 20:30:17 ip-10-0-0-11.eu-north-1.compute.internal docker[1573]: {"level":"info","ts":"2021-08-18T20:30:17.693Z","caller":"api/capability.g..."3.5"}
Aug 18 20:30:17 ip-10-0-0-11.eu-north-1.compute.internal docker[1573]: {"level":"info","ts":"2021-08-18T20:30:17.693Z","caller":"etcdserver/serve..."3.5"}
Aug 18 20:30:17 ip-10-0-0-11.eu-north-1.compute.internal docker[1573]: {"level":"info","ts":"2021-08-18T20:30:17.693Z","caller":"embed/serve.go:9...ests"}
Aug 18 20:30:17 ip-10-0-0-11.eu-north-1.compute.internal docker[1573]: {"level":"info","ts":"2021-08-18T20:30:17.695Z","caller":"etcdmain/main.go...emon"}
Aug 18 20:30:17 ip-10-0-0-11.eu-north-1.compute.internal docker[1573]: {"level":"info","ts":"2021-08-18T20:30:17.695Z","caller":"etcdmain/main.go...emon"}
Aug 18 20:30:17 ip-10-0-0-11.eu-north-1.compute.internal docker[1573]: {"level":"info","ts":"2021-08-18T20:30:17.702Z","caller":"embed/serve.go:1...2379"}
Hint: Some lines were ellipsized, use -l to show in full.

I get the same behavior whether etcd listens on the node IP (10.0.0.11) or on 127.0.0.1.

I can run etcd locally, started from the command line (where it is not terminated after 30 seconds), using:

sudo docker run -p 2380:2380 -p 2379:2379 --volume=etcd-data:/etcd-data --name etcd-local \
my-aws-account.dkr.ecr.eu-north-1.amazonaws.com/etcd:v3.5.0 \
/usr/local/bin/etcd --data-dir=/etcd-data \
--name etcd0 \
--advertise-client-urls http://127.0.0.1:2379 \
--listen-client-urls http://0.0.0.0:2379 \
--initial-advertise-peer-urls http://127.0.0.1:2380 \
--listen-peer-urls http://0.0.0.0:2380 \
--initial-cluster etcd0=http://127.0.0.1:2380

The etcd arguments are similar to Running a single node etcd in the etcd 3.5 documentation.

Here is the relevant part of the boot script used to start etcd:

sudo docker volume create --name etcd-data

cat <<EOF | sudo tee /etc/systemd/system/etcd.service
[Unit]
Description=etcd
After=docker_ecr_login.service

[Service]
Type=notify
ExecStart=/usr/bin/docker run -p 2380:2380 -p 2379:2379 --volume=etcd-data:/etcd-data \
 --name etcd my-aws-account.dkr.ecr.eu-north-1.amazonaws.com/etcd:v3.5.0 \
 /usr/local/bin/etcd --data-dir=/etcd-data \
 --name etcd0 \
 --advertise-client-urls http://10.0.0.11:2379 \
 --listen-client-urls http://0.0.0.0:2379 \
 --initial-advertise-peer-urls http://10.0.0.11:2380 \
 --listen-peer-urls http://0.0.0.0:2380 \
 --initial-cluster etcd0=http://10.0.0.11:2380
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl enable etcd
sudo systemctl start etcd

When listing all containers on the machine, I can see that it has been running:

sudo docker ps -a
CONTAINER ID   IMAGE                                                       COMMAND                  CREATED          STATUS                      PORTS                          NAMES
a744aed0beb1   my-aws-account.dkr.ecr.eu-north-1.amazonaws.com/etcd:v3.5.0   "/usr/local/bin/etcd…"   25 minutes ago   Exited (0) 24 minutes ago                          etcd

But I suspect it cannot be restarted because the container name already exists.

Why is the etcd container terminated after about 30 seconds when started from systemd? It looks like it starts successfully, but systemd only ever shows it as "activating", never "active", and it seems to be terminated after roughly 30 seconds. Is some signal missing from the etcd docker container to systemd? If so, how can I get that signal right?
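
For context, docker run does not forward the container's sd_notify messages back to systemd, so a Type=notify unit can time out even though etcd itself is healthy. One common workaround, as a sketch only and not a verified fix, is a plain service type plus cleanup of the stale container name:

# Sketch only: with docker there is no sd_notify forwarding, so Type=notify
# never sees READY=1. A plain service type plus removing any leftover
# container named "etcd" before starting avoids both the start timeout and
# the name conflict ("-" tells systemd to ignore a failing ExecStartPre).
[Service]
Type=simple
ExecStartPre=-/usr/bin/docker rm -f etcd
ExecStart=/usr/bin/docker run --rm -p 2380:2380 -p 2379:2379 --volume=etcd-data:/etcd-data --name etcd ...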


Update:

After removing the Restart=on-failure directive from the service unit file, I now get status: failed (Result: timeout):

$ sudo systemctl status etcd
● etcd.service - etcd
   Loaded: loaded (/etc/systemd/system/etcd.service; enabled; vendor preset: disabled)
   Active: failed (Result: timeout) since Wed 2021-08-18 21:35:54 UTC; 5min ago
  Process: 1567 ExecStart=/usr/bin/docker run -p 2380:2380 -p 2379:2379 --volume=etcd-data:/etcd-data --name etcd my-aws-account.dkr.ecr.eu-north-1.amazonaws.com/etcd:v3.5.0 /usr/local/bin/etcd --data-dir=/etcd-data --name etcd0 --advertise-client-urls http://127.0.0.1:2379 --listen-client-urls http://0.0.0.0:2379 --initial-advertise-peer-urls http://127.0.0.1:2380 --listen-peer-urls http://0.0.0.0:2380 --initial-cluster etcd0=http://127.0.0.1:2380 (code=exited, status=0/SUCCESS)
 Main PID: 1567 (code=exited, status=0/SUCCESS)

Aug 18 21:35:54 ip-10-0-0-11.eu-north-1.compute.internal docker[1567]: {"level":"info","ts":"2021-08-18T21:35:54.332Z","caller":"osutil/interrupt...ated"}
Aug 18 21:35:54 ip-10-0-0-11.eu-north-1.compute.internal docker[1567]: {"level":"info","ts":"2021-08-18T21:35:54.333Z","caller":"embed/etcd.go:36...379"]}
Aug 18 21:35:54 ip-10-0-0-11.eu-north-1.compute.internal docker[1567]: WARNING: 2021/08/18 21:35:54 [core] grpc: addrConn.createTransport failed ...ing...
Aug 18 21:35:54 ip-10-0-0-11.eu-north-1.compute.internal docker[1567]: {"level":"info","ts":"2021-08-18T21:35:54.335Z","caller":"etcdserver/serve...6a6c"}
Aug 18 21:35:54 ip-10-0-0-11.eu-north-1.compute.internal docker[1567]: {"level":"info","ts":"2021-08-18T21:35:54.337Z","caller":"embed/etcd.go:56...2380"}
Aug 18 21:35:54 ip-10-0-0-11.eu-north-1.compute.internal docker[1567]: {"level":"info","ts":"2021-08-18T21:35:54.338Z","caller":"embed/etcd.go:56...2380"}
Aug 18 21:35:54 ip-10-0-0-11.eu-north-1.compute.internal docker[1567]: {"level":"info","ts":"2021-08-18T21:35:54.339Z","caller":"embed/etcd.go:36...379"]}
Aug 18 21:35:54 ip-10-0-0-11.eu-north-1.compute.internal systemd[1]: Failed to start etcd.
Aug 18 21:35:54 ip-10-0-0-11.eu-north-1.compute.internal systemd[1]: Unit etcd.service entered failed state.
Aug 18 21:35:54 ip-10-0-0-11.eu-north-1.compute.internal systemd[1]: etcd.service failed.
Hint: Some lines were ellipsized, use -l to show in full.
linux docker systemd amazon-linux-2 etcd
  • 1 Answer
  • 694 Views
mway-niels
Asked: 2021-07-07 22:04:33 +0800 CST

Kubernetes: kubeadm join fails in a private network

  • 0

I'm trying to set up an HA Kubernetes cluster on Hetzner Cloud following this guide. I created 6 servers: 3 control plane hosts and 3 workers. When trying to join the second server to the cluster with kubeadm, I get the following errors:

On k8s-server-1:

Jul 06 14:09:01 k8s-server-1 kubelet[8059]: E0706 14:09:01.430599    8059 controller.go:187] failed to update lease, error: rpc error: code = Unknown desc = context deadline exceeded
Jul 06 14:08:54 k8s-server-1 kubelet[8059]: E0706 14:08:54.370142    8059 controller.go:187] failed to update lease, error: rpc error: code = Unknown desc = context deadline exceeded
Jul 06 14:08:51 k8s-server-1 kubelet[8059]: E0706 14:08:51.762075    8059 kubelet_node_status.go:470] "Error updating node status, will retry" err="error getting node \"k8s-server-1\": Get \"https://my.kubernetes.test:6443/api/v1/nodes/k8s-server-1?resourceVersion=0&timeout=10s\": context deadline exceeded"
Jul 06 14:08:47 k8s-server-1 kubelet[8059]: E0706 14:08:47.325309    8059 event.go:273] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"kube-apiserver-k8s-server-1.168f32516b37209a", GenerateName:"", Namespace:"kube-system", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"kube-apiserver-k8s-server-1", UID:"10b8928a4f8e5e0b449a40ab35a3efdc", APIVersion:"v1", ResourceVersion:"", FieldPath:"spec.containers{kube-apiserver}"}, Reason:"Unhealthy", Message:"Readiness probe failed: HTTP probe failed with statuscode: 500", Source:v1.EventSource{Component:"kubelet", Host:"k8s-server-1"}, FirstTimestamp:v1.Time{Time:time.Time{wall:0xc0312fd0ee49429a, ext:115787424848, loc:(*time.Location)(0x74c3600)}}, LastTimestamp:v1.Time{Time:time.Time{wall:0xc0312fd16f1a0a1d, ext:117801107410, loc:(*time.Location)(0x74c3600)}}, Count:2, Type:"Warning", EventTime:v1.MicroTime{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'Patch "https://my.kubernetes.test:6443/api/v1/namespaces/kube-system/events/kube-apiserver-k8s-server-1.168f32516b37209a": read tcp 192.168.178.2:60934->192.168.178.8:6443: use of closed network connection'(may retry after sleeping)
Jul 06 14:08:47 k8s-server-1 kubelet[8059]: E0706 14:08:47.324053    8059 controller.go:187] failed to update lease, error: rpc error: code = Unknown desc = context deadline exceeded
Jul 06 14:08:46 k8s-server-1 kubelet[8059]: I0706 14:08:46.986663    8059 status_manager.go:566] "Failed to get status for pod" podUID=10b8928a4f8e5e0b449a40ab35a3efdc pod="kube-system/kube-apiserver-k8s-server-1" error="etcdserver: request timed out"

On k8s-server-2:

Jul 06 14:09:04 k8s-server-2 kubelet[6685]: E0706 14:09:04.072247    6685 event.go:264] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"weave-net-9fldg.168f3252093de42e", GenerateName:"", Namespace:"kube-system", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"weave-net-9fldg", UID:"88743b7a-aa81-4948-be9b-78c4bbf436fe", APIVersion:"v1", ResourceVersion:"714", FieldPath:"spec.initContainers{weave-init}"}, Reason:"Pulled", Message:"Successfully pulled image \"docker.io/weaveworks/weave-kube:2.8.1\" in 6.525660057s", Source:v1.EventSource{Component:"kubelet", Host:"k8s-server-2"}, FirstTimestamp:v1.Time{Time:time.Time{wall:0xc0312fd1997fa82e, ext:11173601176, loc:(*time.Location)(0x74c3600)}}, LastTimestamp:v1.Time{Time:time.Time{wall:0xc0312fd1997fa82e, ext:11173601176, loc:(*time.Location)(0x74c3600)}}, Count:1, Type:"Normal", EventTime:v1.MicroTime{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'rpc error: code = Unknown desc = context deadline exceeded' (will not retry!)
Jul 06 14:08:57 k8s-server-2 kubelet[6685]: E0706 14:08:57.993540    6685 controller.go:144] failed to ensure lease exists, will retry in 400ms, error: Get "https://my.kubernetes.test:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/k8s-server-2?timeout=10s": context deadline exceeded
Jul 06 14:08:57 k8s-server-2 kubelet[6685]: I0706 14:08:57.352989    6685 scope.go:111] "RemoveContainer" containerID="9e05ad27088c41bdd02bd0d32a16706fc6eab6e458031f0714c9a56541f8f222"
Jul 06 14:08:56 k8s-server-2 kubelet[6685]: E0706 14:08:56.992481    6685 event.go:273] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"weave-net-9fldg.168f3252093de42e", GenerateName:"", Namespace:"kube-system", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"weave-net-9fldg", UID:"88743b7a-aa81-4948-be9b-78c4bbf436fe", APIVersion:"v1", ResourceVersion:"714", FieldPath:"spec.initContainers{weave-init}"}, Reason:"Pulled", Message:"Successfully pulled image \"docker.io/weaveworks/weave-kube:2.8.1\" in 6.525660057s", Source:v1.EventSource{Component:"kubelet", Host:"k8s-server-2"}, FirstTimestamp:v1.Time{Time:time.Time{wall:0xc0312fd1997fa82e, ext:11173601176, loc:(*time.Location)(0x74c3600)}}, LastTimestamp:v1.Time{Time:time.Time{wall:0xc0312fd1997fa82e, ext:11173601176, loc:(*time.Location)(0x74c3600)}}, Count:1, Type:"Normal", EventTime:v1.MicroTime{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'Post "https://my.kubernetes.test:6443/api/v1/namespaces/kube-system/events": read tcp 192.168.178.3:47722->192.168.178.8:6443: use of closed network connection'(may retry after sleeping)
Jul 06 14:08:56 k8s-server-2 kubelet[6685]: E0706 14:08:56.990109    6685 kubelet_node_status.go:470] "Error updating node status, will retry" err="error getting node \"k8s-server-2\": Get \"https://my.kubernetes.test:6443/api/v1/nodes/k8s-server-2?timeout=10s\": net/http: request canceled (Client.Timeout exceeded while awaiting headers)"
Jul 06 14:08:56 k8s-server-2 kubelet[6685]: I0706 14:08:56.989160    6685 scope.go:111] "RemoveContainer" containerID="9e05ad27088c41bdd02bd0d32a16706fc6eab6e458031f0714c9a56541f8f222"
Jul 06 14:08:56 k8s-server-2 kubelet[6685]: E0706 14:08:56.988865    6685 kubelet.go:1683] "Failed creating a mirror pod for" err="Post \"https://my.kubernetes.test:6443/api/v1/namespaces/kube-system/pods\": read tcp 192.168.178.3:47722->192.168.178.8:6443: use of closed network connection" pod="kube-system/etcd-k8s-server-2"
Jul 06 14:08:54 k8s-server-2 kubelet[6685]: E0706 14:08:54.210098    6685 pod_workers.go:190] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"etcd\" with CrashLoopBackOff: \"back-off 10s restarting failed container=etcd pod=etcd-k8s-server-2_kube-system(22b3a914daf1bef98cb01ddd7868523d)\"" pod="kube-system/etcd-k8s-server-2" podUID=22b3a914daf1bef98cb01ddd7868523d
Jul 06 14:08:54 k8s-server-2 kubelet[6685]: I0706 14:08:54.208472    6685 scope.go:111] "RemoveContainer" containerID="9e05ad27088c41bdd02bd0d32a16706fc6eab6e458031f0714c9a56541f8f222"
Jul 06 14:08:54 k8s-server-2 kubelet[6685]: E0706 14:08:54.208199    6685 kubelet.go:1683] "Failed creating a mirror pod for" err="rpc error: code = Unknown desc = context deadline exceeded" pod="kube-system/etcd-k8s-server-2"
Jul 06 14:08:53 k8s-server-2 kubelet[6685]: E0706 14:08:53.347043    6685 event.go:264] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"kube-proxy-2z5js.168f3250c7fc2120", GenerateName:"", Namespace:"kube-system", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"kube-proxy-2z5js", UID:"0ac8fe5d-7332-4a4d-abee-48c6d4dee38f", APIVersion:"v1", ResourceVersion:"711", FieldPath:"spec.containers{kube-proxy}"}, Reason:"Started", Message:"Started container kube-proxy", Source:v1.EventSource{Component:"kubelet", Host:"k8s-server-2"}, FirstTimestamp:v1.Time{Time:time.Time{wall:0xc0312fd04243d720, ext:5783805064, loc:(*time.Location)(0x74c3600)}}, LastTimestamp:v1.Time{Time:time.Time{wall:0xc0312fd04243d720, ext:5783805064, loc:(*time.Location)(0x74c3600)}}, Count:1, Type:"Normal", EventTime:v1.MicroTime{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'rpc error: code = Unknown desc = context deadline exceeded' (will not retry!)
Jul 06 14:08:53 k8s-server-2 kubelet[6685]: I0706 14:08:53.269542    6685 scope.go:111] "RemoveContainer" containerID="e2664d16d53ff5ae6de27fe52e84651791bca1ca70a6987c9a4e3e7318eaa174"
Jul 06 14:08:47 k8s-server-2 kubelet[6685]: I0706 14:08:47.194425    6685 scope.go:111] "RemoveContainer" containerID="7aaa63419740b5e30cc76770abc92dfbabe1f48d4d812b4abc89168f73e46d51"
Jul 06 14:08:46 k8s-server-2 kubelet[6685]: I0706 14:08:46.987598    6685 status_manager.go:566] "Failed to get status for pod" podUID=778e041efc75c1983cbb59f2b3d46d09 pod="kube-system/kube-controller-manager-k8s-server-2" error="etcdserver: request timed out"
Jul 06 14:08:46 k8s-server-2 kubelet[6685]: E0706 14:08:46.986807    6685 controller.go:144] failed to ensure lease exists, will retry in 200ms, error: etcdserver: request timed out
Jul 06 14:08:46 k8s-server-2 kubelet[6685]: E0706 14:08:46.986800    6685 kubelet_node_status.go:470] "Error updating node status, will retry" err="error getting node \"k8s-server-2\": etcdserver: request timed out"

Server list:

| Name | Public IP | Private IP |
| --- | --- | --- |
| k8s-server-1 | 192.168.178.2 | 10.23.1.2 |
| k8s-server-2 | 192.168.178.3 | 10.23.1.3 |
| k8s-server-3 | 192.168.178.4 | 10.23.1.4 |
| k8s-worker-1 | 192.168.178.5 | 10.23.1.5 |
| k8s-worker-2 | 192.168.178.6 | 10.23.1.6 |
| k8s-worker-3 | 192.168.178.7 | 10.23.1.7 |

In addition, the following firewall rules are applied to k8s-server-* (they only apply to traffic routed via the public IPs, not inside the private network):

| Direction | Port | Source/Destination |
| --- | --- | --- |
| Ingress | 80 | Any |
| Ingress | 443 | Any |
| Ingress | 22 | Static company IP |
| Ingress | 6443 | Static company IP |
| Egress | Any | Any |

There is a load balancer in the same network that routes traffic to k8s-server-1. Its public IP is 192.168.178.8 and its private IP is 10.23.1.8.

What I ran on both nodes:

apt-get update
apt-get install     apt-transport-https     ca-certificates     curl     gnupg     lsb-release
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo   "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
apt-get update
apt-get install docker-ce docker-ce-cli containerd.io
systemctl enable docker.service
systemctl enable containerd.service
cat <<EOF | sudo tee /etc/docker/daemon.json
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "storage-driver": "overlay2"
}
EOF

systemctl enable docker
systemctl daemon-reload
systemctl restart docker

cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
br_netfilter
EOF

cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF

sysctl --system

apt-get update
apt-get install -y apt-transport-https ca-certificates curl
curl -fsSLo /usr/share/keyrings/kubernetes-archive-keyring.gpg https://packages.cloud.google.com/apt/doc/apt-key.gpg
echo "deb [signed-by=/usr/share/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
apt-get update
apt-get install -y kubelet kubeadm kubectl
apt-mark hold kubelet kubeadm kubectl

...on server 1:

kubeadm config images pull
kubeadm init --apiserver-advertise-address=10.23.1.2 --control-plane-endpoint "my.kubernetes.test:6443" --upload-certs

mkdir ~/.kube
cp /etc/kubernetes/admin.conf ~/.kube/config

kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
watch kubectl get pod -n kube-system
watch kubectl get nodes

...on server 2:

kubeadm config images pull
kubeadm join my.kubernetes.test:6443 --token XXXXX.XXXXX --discovery-token-ca-cert-hash sha256:XXXXXXXXXX --control-plane --certificate-key XXXXXXXXXX
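
For reference, the log lines above are timeouts on the path kubelet → my.kubernetes.test:6443 → kube-apiserver → etcd, so checking reachability of the load-balanced endpoint and local etcd health is a reasonable first step. A generic diagnostic sketch, not from the original post; it assumes etcdctl and netcat are available on the host, and the certificate paths are the kubeadm stacked-etcd defaults and may differ:

# Can this node reach the API endpoint behind the load balancer?
nc -vz my.kubernetes.test 6443
# Is the local etcd member healthy?
ETCDCTL_API=3 etcdctl endpoint health \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key
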
vpn networking kubernetes kubeadm etcd
  • 1 Answer
  • 2754 Views
Tùng Nguyễn
Asked: 2021-05-24 21:37:05 +0800 CST

CentOS 7, HA PostgreSQL 12, Patroni with etcd v3.4

  • 0

I followed this document, but I don't know how to enable v2 so that Patroni can use it. Can anyone help? https://computingforgeeks.com/setup-etcd-cluster-on-centos-debian-ubuntu/
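
For context, etcd 3.4 ships with the v2 API disabled by default; it can be re-enabled via the --enable-v2 flag, and newer Patroni releases can instead talk to the v3 API directly. A sketch under those assumptions, with placeholder values only:

# Option A: re-enable the v2 API for older Patroni versions, e.g. in the etcd
# environment/config file (equivalent to passing --enable-v2=true):
ETCD_ENABLE_V2="true"

# Option B: Patroni 2.x can use the v3 API; patroni.yml then uses an etcd3
# section instead of etcd (hosts below are placeholders):
# etcd3:
#   hosts: 10.0.0.1:2379,10.0.0.2:2379,10.0.0.3:2379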

high-availability centos7 etcd patroni
  • 1 Answer
  • 287 Views
sfgroups
Asked: 2021-02-19 05:37:18 +0800 CST

ETCD database cluster certificate renewal for a Kubernetes external etcd setup

  • 0

I have deployed a 3 node external ETCD database cluster (etcdctl version: 3.4.7) for my Kubernetes v1.18.6 cluster using the etcdadm tool. My certificates will expire in a few months.

I believe the kubeadm alpha certs renew all command renews the Kubernetes certificates. Does anyone know the correct steps to renew the external ETCD database cluster certificates?

My cluster certificate details:

# kubeadm alpha certs check-expiration 

CERTIFICATE                EXPIRES                  RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
admin.conf                 Jul 20, 2021 14:13 UTC   152d                                    no
apiserver                  Jul 20, 2021 14:13 UTC   152d            ca                      no
apiserver-kubelet-client   Jul 20, 2021 14:13 UTC   152d            ca                      no
controller-manager.conf    Jul 20, 2021 14:13 UTC   152d                                    no
front-proxy-client         Jul 20, 2021 14:13 UTC   152d            front-proxy-ca          no
scheduler.conf             Jul 20, 2021 14:13 UTC   152d                                    no

CERTIFICATE AUTHORITY   EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
ca                      Apr 17, 2030 01:19 UTC   9y              no
front-proxy-ca          Apr 17, 2030 01:19 UTC   9y              no

Master node certificate details:

/etc/kubernetes/pki/ca.crt,             Apr 17 01:19:52 2030 GMT
/etc/kubernetes/pki/apiserver.crt,             Jul 20 14:13:09 2021 GMT
/etc/kubernetes/pki/apiserver-kubelet-client.crt,             Jul 20 14:13:10 2021 GMT
/etc/kubernetes/pki/front-proxy-ca.crt,             Apr 17 01:19:52 2030 GMT
/etc/kubernetes/pki/front-proxy-client.crt,             Jul 20 14:13:10 2021 GMT


/etc/etcd/pki/ca.crt,             Apr 17 01:19:35 2030 GMT
/etc/etcd/pki/server.crt,             Apr 19 01:19:36 2021 GMT
/etc/etcd/pki/peer.crt,             Apr 19 01:19:36 2021 GMT
/etc/etcd/pki/etcdctl-etcd-client.crt,             Apr 19 01:19:36 2021 GMT
/etc/etcd/pki/apiserver-etcd-client.crt,             Apr 19 01:19:36 2021 GMT
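
For reference, the expiry dates listed above can be read straight from the certificate files with openssl; a generic sketch using the paths from the listing:

# Print the notAfter date of each etcd certificate.
for crt in /etc/etcd/pki/*.crt; do
  printf '%s, ' "$crt"
  openssl x509 -noout -enddate -in "$crt"
done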

Thanks

kubernetes etcd
  • 1 Answer
  • 1466 Views
xpepermint
Asked: 2021-01-20 01:38:43 +0800 CST

Kubernetes API: compare-and-update ConfigMap keys

  • 0

Etcd has a concept of Atomic Compare-and-Update, where a key's value is compared before an update is performed. I would like to use this feature to update a ConfigMap in my Kubernetes cluster. I only want to update the ConfigMap if the existing ConfigMap data, or a specific data key, matches a certain value.

Example ConfigMap:

curl -X POST -H 'Content-Type: application/json' \
    -d '{"apiVersion": "v1", "kind": "ConfigMap", "metadata": {"name": "test"}, "data": {"foo": "1"}}' \
    http://localhost:8001/api/v1/namespaces/default/configmaps

If possible, I need to do this through the K8S API, or directly against K8S's etcd (should I?). I don't want to rely on resourceVersion; I want to rely on my own version, which is actually a data key of the ConfigMap. How can I achieve such an atomic update (or delete) operation?
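
For context, the Kubernetes API accepts RFC 6902 JSON Patch requests, and a "test" operation makes the whole patch fail if the current value does not match, which gives a compare-and-update on a data key without touching resourceVersion. A sketch following the curl style of the example above (names and values are taken from that example):

curl -X PATCH -H 'Content-Type: application/json-patch+json' \
    -d '[{"op": "test", "path": "/data/foo", "value": "1"}, {"op": "replace", "path": "/data/foo", "value": "2"}]' \
    http://localhost:8001/api/v1/namespaces/default/configmaps/test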

kubernetes etcd
  • 1 Answer
  • 369 Views
jagatjyoti
Asked: 2021-01-18 23:04:49 +0800 CST

Upgrading a multi-node etcd cluster running inside docker containers

  • 0

My k8s cluster is currently on v1.16.x and I want to upgrade it to v1.17.x, so ETCD has to be upgraded to 3.4 (currently 3.3). My setup is a bit complicated, because I run ETCD outside the master nodes: it is a 3 node etcd cluster, running as containers on 3 separate EC2 instances.

I know there is concise documentation on upgrading ETCD from 3.3 to 3.4, but it doesn't describe how this is done when it runs inside containers. I've spent quite some time googling it, with no luck. Kubeadm isn't much help here, since kubeadm plan doesn't show a major version upgrade for ETCD.

I think taking a backup and then changing the image version in the manifest would help, but I'm not quite sure.

Can anyone please guide me?
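
For context, a per-member backup before touching the image tag could look roughly like this. A generic sketch, not from the original post; endpoints and certificate paths are placeholders and assume etcdctl is available on each EC2 host:

# Take a snapshot on each host before upgrading, then upgrade one member at a
# time: stop the container, change its image tag from 3.3 to 3.4, start it,
# verify health, and only then move on to the next member.
ETCDCTL_API=3 etcdctl snapshot save /backup/etcd-$(date +%F).db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/etcd/pki/ca.crt \
  --cert=/etc/etcd/pki/server.crt \
  --key=/etc/etcd/pki/server.key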

amazon-web-services docker kubernetes etcd
  • 2 Answers
  • 88 Views
Andrew Striletskyi
Asked: 2020-08-08 04:35:00 +0800 CST

TLS handshake issue with etcd

  • 1

We are using an external etcd cluster for our k8s cluster. We connected the master to this etcd server, but receive

"tls: first record does not look like a TLS handshake"

How can this be resolved? (On the eksctl side, everything works fine against the etcd servers with the same certificates.)

ETCDCTL_API=3 /usr/local/bin/etcdctl member list   --endpoints=https://127.0.0.1:2379   --cacert=/etc/etcd/ca.crt   --cert=/etc/etcd/etcd-server.crt   --key=/etc/etcd/etcd-server.key
    b1fa8ebad0f4fa6, started, etcd-kube-cluster-1, https://10.105.113.*:2380, https://10.105.113.*:2379, false
    984a08591dda4911, started, etcd-kube-cluster-3, https://10.105.114.*:2380, https://10.105.114.*:2379, false
    b55b37a2544c7daa, started, etcd-kube-cluster-2, https://10.105.113.*:2380, https://10.105.113.*:2379, false

The kube-api server manifest was updated with the same certificates.
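
For context, this error usually means one side is speaking plaintext to a TLS port (or TLS to a plaintext one). On the kube-apiserver side, the etcd-related flags to double-check look like this; a sketch only, with placeholder paths and IPs rather than the poster's actual values:

# In the kube-apiserver static pod manifest: the endpoints must use https://,
# and the cert/key must be a client pair signed by the etcd CA.
- --etcd-servers=https://10.105.113.1:2379,https://10.105.114.1:2379
- --etcd-cafile=/etc/etcd/ca.crt
- --etcd-certfile=/etc/etcd/apiserver-etcd-client.crt
- --etcd-keyfile=/etc/etcd/apiserver-etcd-client.key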

kubernetes etcd
  • 1 Answer
  • 1669 Views
srinu259
Asked: 2020-04-26 04:33:21 +0800 CST

Why does the kubernetes kube-api server need etcd-keyfile and kubelet-client-key?

  • 2

As I understand it, the kube-api server acts as a client when talking to ETCD and the Kubelet; ETCD and the Kubelet both act as servers for kube-api. In a secured environment (mutual SSL authentication), the kube-api server needs the ETCD and Kubelet certificates as well as the CA certificate. What I don't understand is why we need to provide private keys for ETCD (etcd-keyfile) and the Kubelet (kubelet-client-key) when configuring kube-apiserver.yaml?
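
For context, in mutual TLS the client must prove possession of the private key matching its own client certificate, so each *-keyfile here is the api server's client key, not etcd's or the kubelet's server key. An illustrative sketch using typical kubeadm default paths, which may differ in other setups:

# kube-apiserver's client credentials towards etcd and the kubelet:
- --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt
- --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key
- --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt
- --kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key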

kubernetes etcd
  • 1 Answer
  • 160 Views
uav
Asked: 2020-02-05 11:02:32 +0800 CST

Own Kubernetes etcd cluster

  • 1

I want to build my own Kubernetes cluster across two locations (300 km apart) and integrate it into GitLab.

Let me list my ideas. My question is whether there are mistakes in them, and I'm asking how to fix those.

  1. Since I can only set up virtual machines and have no access directly on the hosts, I want to install an etcd cluster on 5 VMs (3+2). I would install etcd with apt on Ubuntu 18.04. For this I don't need Kubernetes at first. (A rough sketch of such a configuration follows this list.)

  2. Does the odd number of instances only apply to etcd, and not to the control plane?

  3. Does it make sense to set up separate VMs for the control plane, or can I reuse the 3+2 VMs of the etcd cluster? Otherwise I would already be at 10 VMs.
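
A rough sketch of the static cluster configuration mentioned in point 1, with placeholder IPs and names; on Ubuntu 18.04 the apt-installed etcd package typically reads its settings from /etc/default/etcd:

# Sketch for one member; each of the 5 VMs gets its own ETCD_NAME and URLs,
# while ETCD_INITIAL_CLUSTER is identical on all of them.
ETCD_NAME="etcd1"
ETCD_LISTEN_PEER_URLS="http://10.0.0.1:2380"
ETCD_LISTEN_CLIENT_URLS="http://10.0.0.1:2379,http://127.0.0.1:2379"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://10.0.0.1:2380"
ETCD_ADVERTISE_CLIENT_URLS="http://10.0.0.1:2379"
ETCD_INITIAL_CLUSTER="etcd1=http://10.0.0.1:2380,etcd2=http://10.0.0.2:2380,etcd3=http://10.0.0.3:2380,etcd4=http://10.0.0.4:2380,etcd5=http://10.0.0.5:2380"
ETCD_INITIAL_CLUSTER_STATE="new"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster-1"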

kubernetes etcd
  • 2 Answers
  • 293 Views
