How can I access a container's directory from the node that runs the container?
I have root access to the node. I want to look at the /etc directory of a pod/container (etcd).
kubectl exec doesn't help, because it is a distroless image that lacks the usual shell tools such as ls and tar.
I use containerd.
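One way to do this from the node, as a rough sketch assuming root access and that crictl is installed and pointed at the containerd socket (the container name and ID below are placeholders):
crictl ps | grep <container-name>
PID=$(crictl inspect -o go-template --template '{{.info.pid}}' <container-id>)
# the container's root filesystem is visible on the node under /proc/<pid>/root
sudo ls /proc/$PID/root/etc
# entering the mount namespace with nsenter also works
sudo nsenter -t "$PID" -m ls /etc
Everything here runs with the node's own binaries, so the distroless image does not need ls or tar inside it.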
With docker I would run docker login, but how do you do something similar with ctr / containerd?
We need to log in because we hit the rate limit:
ctr: failed to copy: httpReaderSeeker: failed open: unexpected status code https://registry-1.docker.io/v2/library/[...]: 429 Too Many Requests - Server message: toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit
Following the containerd documentation, I put this in /etc/containerd/config.toml:
version = 2
[plugins."io.containerd.grpc.v1.cri".registry.configs."docker.io".auth]
username = "myusername"
password = "mypassword"
It doesn't seem to work.
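A note with a sketch, as far as I can tell: the registry auth under [plugins."io.containerd.grpc.v1.cri".registry.configs] only applies to pulls made through the CRI plugin (i.e. by the kubelet or crictl), not to ctr itself. ctr takes credentials per pull via --user; the image name below is only illustrative:
ctr image pull --user myusername:mypassword docker.io/library/nginx:latest
There does not appear to be a persistent ctr equivalent of docker login, so the credentials have to be passed on each pull.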
I followed this official tutorial to give a bare-metal k8s cluster GPU access, but I ran into an error while doing so.
Kubernetes 1.21, containerd 1.4.11 and Ubuntu 20.04.3 LTS (GNU/Linux 5.4.0-91-generic x86_64).
The Nvidia driver is preinstalled on the OS, version 495 Headless.
After pasting the following configuration into /etc/containerd/config.toml and restarting the service, containerd fails to start with exit 1.
containerd config.toml:
The syslog is here.
# persistent data location
root = "/var/lib/containerd"
# runtime state information
state = "/run/containerd"
# Kubernetes doesn't use containerd restart manager.
disabled_plugins = ["restart"]
# NVIDIA CONFIG START HERE
version = 2
[plugins]
[plugins."io.containerd.grpc.v1.cri"]
[plugins."io.containerd.grpc.v1.cri".containerd]
default_runtime_name = "nvidia"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
privileged_without_host_devices = false
runtime_engine = ""
runtime_root = ""
runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
BinaryName = "/usr/bin/nvidia-container-runtime"
# NVIDIA CONFIG ENDS HERE
[debug]
level = ""
[grpc]
max_recv_message_size = 16777216
max_send_message_size = 16777216
[plugins.linux]
shim = "/usr/bin/containerd-shim"
runtime = "/usr/bin/runc"
I can confirm that the Nvidia driver does detect the GPU (an Nvidia GTX 750 Ti) by running nvidia-smi, which gives the following output:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.44 Driver Version: 495.44 CUDA Version: 11.5 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:02:00.0 Off | N/A |
| 34% 34C P8 1W / 38W | 0MiB / 2000MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
I modified config.toml to make it work.
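A possible way to rebuild the file, assuming the exit 1 comes from mixing v1-era sections (disabled_plugins = ["restart"], [plugins.linux]) into a version = 2 config: regenerate the default v2 config and re-apply only the NVIDIA runtime block shown above.
# regenerate a clean version 2 default config
containerd config default | sudo tee /etc/containerd/config.toml
# re-add the nvidia runtime: set default_runtime_name = "nvidia" and copy in the
# [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia] block from above
sudo systemctl restart containerd
sudo systemctl status containerd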
For certain reasons I have to build a bare-metal Kubernetes cluster without an Internet connection.
Since dockershim is deprecated, I decided to use containerd as the CRI, but the offline install with kubeadm fails with a timeout when running kubeadm init.
Unfortunately, an error has occurred:
timed out waiting for the condition
This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
I can see a lot of error logs with journalctl -u kubelet -f:
11 24 16:25:25 rhel8 kubelet[9299]: E1124 16:25:25.473188 9299 controller.go:144] failed to ensure lease exists, will retry in 7s, error: Get "https://133.117.20.57:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/rhel8?timeout=10s": dial tcp 133.117.20.57:6443: connect: connection refused
11 24 16:25:25 rhel8 kubelet[9299]: E1124 16:25:25.533555 9299 kubelet.go:2407] "Error getting node" err="node \"rhel8\" not found"
11 24 16:25:25 rhel8 kubelet[9299]: I1124 16:25:25.588986 9299 kubelet_node_status.go:71] "Attempting to register node" node="rhel8"
11 24 16:25:25 rhel8 kubelet[9299]: E1124 16:25:25.589379 9299 kubelet_node_status.go:93] "Unable to register node with API server" err="Post \"https://133.117.20.57:6443/api/v1/nodes\": dial tcp 133.117.20.57:6443: connect: connection refused" node="rhel8"
11 24 16:25:25 rhel8 kubelet[9299]: E1124 16:25:25.634625 9299 kubelet.go:2407] "Error getting node" err="node \"rhel8\" not found"
11 24 16:25:25 rhel8 kubelet[9299]: E1124 16:25:25.735613 9299 kubelet.go:2407] "Error getting node" err="node \"rhel8\" not found"
11 24 16:25:25 rhel8 kubelet[9299]: E1124 16:25:25.835815 9299 kubelet.go:2407] "Error getting node" err="node \"rhel8\" not found"
11 24 16:25:25 rhel8 kubelet[9299]: E1124 16:25:25.936552 9299 kubelet.go:2407] "Error getting node" err="node \"rhel8\" not found"
11 24 16:25:26 rhel8 kubelet[9299]: E1124 16:25:26.036989 9299 kubelet.go:2407] "Error getting node" err="node \"rhel8\" not found"
11 24 16:25:26 rhel8 kubelet[9299]: E1124 16:25:26.137464 9299 kubelet.go:2407] "Error getting node" err="node \"rhel8\" not found"
11 24 16:25:26 rhel8 kubelet[9299]: E1124 16:25:26.238594 9299 kubelet.go:2407] "Error getting node" err="node \"rhel8\" not found"
11 24 16:25:26 rhel8 kubelet[9299]: E1124 16:25:26.338704 9299 kubelet.go:2407] "Error getting node" err="node \"rhel8\" not found"
11 24 16:25:26 rhel8 kubelet[9299]: E1124 16:25:26.394465 9299 event.go:273] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"rhel8.16ba6aab63e58bd8", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"rhel8", UID:"rhel8", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"Starting", Message:"Starting kubelet.", Source:v1.EventSource{Component:"kubelet", Host:"rhel8"}, FirstTimestamp:v1.Time{Time:time.Time{wall:0xc05f9812b2b227d8, ext:5706873656, loc:(*time.Location)(0x55a228f25680)}}, LastTimestamp:v1.Time{Time:time.Time{wall:0xc05f9812b2b227d8, ext:5706873656, loc:(*time.Location)(0x55a228f25680)}}, Count:1, Type:"Normal", EventTime:v1.MicroTime{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'Post "https://133.117.20.57:6443/api/v1/namespaces/default/events": dial tcp 133.117.20.57:6443: connect: connection refused'(may retry after sleeping)
11 24 16:25:27 rhel8 kubelet[9299]: E1124 16:25:27.143503 9299 kubelet.go:2407] "Error getting node" err="node \"rhel8\" not found"
11 24 16:25:27 rhel8 kubelet[9299]: E1124 16:25:27.244526 9299 kubelet.go:2407] "Error getting node" err="node \"rhel8\" not found"
11 24 16:25:27 rhel8 kubelet[9299]: E1124 16:25:27.302890 9299 remote_runtime.go:116] "RunPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to get sandbox image \"k8s.gcr.io/pause:3.2\": failed to pull image \"k8s.gcr.io/pause:3.2\": failed to pull and unpack image \"k8s.gcr.io/pause:3.2\": failed to resolve reference \"k8s.gcr.io/pause:3.2\": failed to do request: Head \"https://k8s.gcr.io/v2/pause/manifests/3.2\": dial tcp: lookup k8s.gcr.io on [::1]:53: read udp [::1]:39732->[::1]:53: read: connection refused"
11 24 16:25:27 rhel8 kubelet[9299]: E1124 16:25:27.302949 9299 kuberuntime_sandbox.go:70] "Failed to create sandbox for pod" err="rpc error: code = Unknown desc = failed to get sandbox image \"k8s.gcr.io/pause:3.2\": failed to pull image \"k8s.gcr.io/pause:3.2\": failed to pull and unpack image \"k8s.gcr.io/pause:3.2\": failed to resolve reference \"k8s.gcr.io/pause:3.2\": failed to do request: Head \"https://k8s.gcr.io/v2/pause/manifests/3.2\": dial tcp: lookup k8s.gcr.io on [::1]:53: read udp [::1]:39732->[::1]:53: read: connection refused" pod="kube-system/kube-scheduler-rhel8"
11 24 16:25:27 rhel8 kubelet[9299]: E1124 16:25:27.302989 9299 kuberuntime_manager.go:815] "CreatePodSandbox for pod failed" err="rpc error: code = Unknown desc = failed to get sandbox image \"k8s.gcr.io/pause:3.2\": failed to pull image \"k8s.gcr.io/pause:3.2\": failed to pull and unpack image \"k8s.gcr.io/pause:3.2\": failed to resolve reference \"k8s.gcr.io/pause:3.2\": failed to do request: Head \"https://k8s.gcr.io/v2/pause/manifests/3.2\": dial tcp: lookup k8s.gcr.io on [::1]:53: read udp [::1]:39732->[::1]:53: read: connection refused" pod="kube-system/kube-scheduler-rhel8"
11 24 16:25:27 rhel8 kubelet[9299]: E1124 16:25:27.303080 9299 pod_workers.go:765] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"kube-scheduler-rhel8_kube-system(e5616b23d0312e4995fcb768f04aabbb)\" with CreatePodSandboxError: \"Failed to create sandbox for pod \\\"kube-scheduler-rhel8_kube-system(e5616b23d0312e4995fcb768f04aabbb)\\\": rpc error: code = Unknown desc = failed to get sandbox image \\\"k8s.gcr.io/pause:3.2\\\": failed to pull image \\\"k8s.gcr.io/pause:3.2\\\": failed to pull and unpack image \\\"k8s.gcr.io/pause:3.2\\\": failed to resolve reference \\\"k8s.gcr.io/pause:3.2\\\": failed to do request: Head \\\"https://k8s.gcr.io/v2/pause/manifests/3.2\\\": dial tcp: lookup k8s.gcr.io on [::1]:53: read udp [::1]:39732->[::1]:53: read: connection refused\"" pod="kube-system/kube-scheduler-rhel8" podUID=e5616b23d0312e4995fcb768f04aabbb
When I do the same thing with an Internet connection, the installation succeeds. And when I use docker instead of containerd, the installation completes successfully even without an Internet connection.
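A tentative reading of the log above, with a sketch: the CRI plugin pulls its own sandbox image (k8s.gcr.io/pause:3.2 here), so an offline node needs that image preloaded into containerd's k8s.io namespace, or the plugin pointed at an image that is already present. The tarball name is illustrative:
# import the pause image into the namespace the CRI plugin actually uses
sudo ctr -n k8s.io images import pause-3.2.tar
# and/or pin the sandbox image in /etc/containerd/config.toml:
#   [plugins."io.containerd.grpc.v1.cri"]
#     sandbox_image = "k8s.gcr.io/pause:3.2"
sudo systemctl restart containerd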
I have a Kubernetes cluster that used Docker and recently migrated it to containerd, but because of some compatibility issues I still want to use Docker to manage the images and containers Kubernetes uses.
When Docker was the runtime, Docker could load an image so that Kubernetes could use it, and I could list the containers running as Kubernetes pods with the docker ps command.
Even after switching to containerd I can still run and use Docker. However, since Docker is now isolated from the Kubernetes world, I can no longer manage Kubernetes resources with docker commands.
It seems Kubernetes runs in containerd's "k8s.io" namespace, so I was hoping I could configure Docker to manage the resources in that namespace. Is that possible?
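A sketch of an alternative, in case Docker cannot be pointed at that namespace: inspect k8s.io directly with the containerd-native tools, which see exactly what the kubelet sees:
ctr -n k8s.io containers ls
ctr -n k8s.io images ls
crictl ps    # crictl talks to the same CRI socket, so it lists the pod containers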
I am trying to build Kubernetes with containerd on a bare-metal server (RHEL8).
There is no Internet connection, so I downloaded the required images manually (e.g. k8s.gcr.io/kube-scheduler:v1.22.1) and loaded them with "ctr image import".
The images appear to have been loaded successfully.
#ctr images ls -q
k8s.gcr.io/coredns/coredns:v1.8.4
k8s.gcr.io/etcd:3.5.0-0
k8s.gcr.io/kube-apiserver:v1.22.1
k8s.gcr.io/kube-controller-manager:v1.22.1
k8s.gcr.io/kube-proxy:v1.22.1
k8s.gcr.io/kube-scheduler:v1.22.1
k8s.gcr.io/pause:3.5
Then I ran "kubeadm init", but it failed with an ImagePull error.
#kubeadm init --kubernetes-version=1.22.1 --cri-socket=/run/containerd/containerd.sock
[init] Using Kubernetes version: v1.22.1
[preflight] Running pre-flight checks
[WARNING FileExisting-tc]: tc not found in system path
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
error execution phase preflight: [preflight] Some fatal errors occurred:
How can I make kubeadm use the local images? Or can these preflight errors be ignored?
Edit: this procedure (loading the images manually instead of running kubeadm config images pull) worked fine with docker on CentOS7.
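A sketch of the likely cause: ctr image import without -n loads images into containerd's "default" namespace, while the CRI plugin that kubeadm and the kubelet talk to only looks at the "k8s.io" namespace. The tarball name below is illustrative:
# import into the namespace the CRI plugin uses, then verify from both sides
sudo ctr -n k8s.io images import kube-scheduler-v1.22.1.tar
sudo ctr -n k8s.io images ls -q
crictl images    # this is what the kubelet actually sees
# if the images are in place, the pull preflight check can also be skipped
kubeadm init --kubernetes-version=1.22.1 --cri-socket=/run/containerd/containerd.sock --ignore-preflight-errors=ImagePull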
I installed containerd on Amazon Linux 2 using the suggested commands:
sudo amazon-linux-extras enable docker
sudo yum install -y containerd
I added this to an EC2 user data script so it runs when the instance starts.
But how am I supposed to start containerd (the container runtime, similar to docker) as a service? The install via yum does not seem to include a systemd service file; the binary just sits at /usr/bin/containerd. Should I echo a systemd service file from the boot script, or is there a better practice?
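A sketch of writing the unit from the boot script, modeled on the upstream containerd.service file; the exact options are assumptions to trim or extend:
sudo tee /etc/systemd/system/containerd.service <<'EOF'
[Unit]
Description=containerd container runtime
Documentation=https://containerd.io
After=network.target local-fs.target

[Service]
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/bin/containerd
Restart=always
RestartSec=5
Delegate=yes
KillMode=process
LimitNOFILE=1048576

[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable --now containerd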
I run a Kubernetes cluster installed with kubeadm. I recently upgraded from 1.19 to 1.20 and migrated the container runtime from docker to containerd, since docker is now deprecated.
I configured containerd and kubelet to use it, and uninstalled docker from all nodes. Everything seemed to run fine.
Today I tried to upgrade from 1.20 to 1.21, but running kubeadm upgrade plan gives two warnings that make me think the containerd transition is not complete:
It tries to use docker:
cannot automatically set CgroupDriver when starting the Kubelet: cannot execute 'docker info -f {{.CgroupDriver}}': executable file not found in $PATH
kubeadm does not seem to know that we no longer use docker, but I have not found the right option for this in the docs or in the local conf, apart from --cri-socket, which does not apply to kubeadm upgrade.
It does not detect the cgroup driver setting:
The 'cgroupDriver' value in the KubeletConfiguration is empty. Starting from 1.22, 'kubeadm upgrade' will default an empty value to the 'systemd' cgroup driver. The cgroup driver between the container runtime and the kubelet must match!
This is really surprising, because I do have cgroupDriver: systemd in kubectl -n kube-system get cm kubelet-config-1.20 -o yaml, in /var/lib/kubelet/config.yaml, in /etc/default/kubelet, and as a flag in /var/lib/kubelet/kubeadm-flags.env, and it is even printed by kubeadm --v=10!
How can I determine whether there is an underlying configuration problem, or whether I can safely ignore these warnings?
I am not sure which files, configmaps, or logs would be useful for troubleshooting this, but I am happy to provide them if needed.
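One check worth doing, as a sketch rather than a confirmed fix: kubeadm records the CRI socket it should use as a node annotation, so if that annotation still points at dockershim, kubeadm upgrade will keep probing docker. The socket path below assumes a default containerd install:
kubectl get node <node-name> -o yaml | grep cri-socket
kubectl annotate node <node-name> --overwrite kubeadm.alpha.kubernetes.io/cri-socket=unix:///run/containerd/containerd.sock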
I have Kubernetes nodes with both docker and containerd installed. I need docker on the nodes to run CI pipelines and builds. How can I make Kubernetes use the recommended containerd instead of docker? The existing documentation suggests removing docker from the system, which is not desirable in my case.
Is there a way to force Kubernetes to use containerd as the container runtime instead of docker when both are installed?
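A sketch for a kubeadm-managed node on a dockershim-era release (pre-1.24): point the kubelet at containerd's CRI socket explicitly and leave docker installed for CI. The file path and flag values are the usual defaults, but treat them as assumptions to verify:
# /var/lib/kubelet/kubeadm-flags.env (keep existing args, add the runtime flags)
KUBELET_KUBEADM_ARGS="--container-runtime=remote --container-runtime-endpoint=unix:///run/containerd/containerd.sock ..."
sudo systemctl restart kubelet
# nodes joined later can pass the socket directly
kubeadm join ... --cri-socket unix:///run/containerd/containerd.sock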
I ran into this during updates/upgrades on Red Hat (Linux 4.18.0-240.1.1.el8_3.x86_64) this morning and I am not sure what to do. It sounds like yum is completely stuck because of it.
$ sudo yum update
Updating Subscription Management repositories.
Last metadata expiration check: 2:40:27 ago on Wed 16 Dec 2020 07:53:10 AM CST.
Error:
Problem: package docker-ce-3:20.10.1-3.el7.x86_64 requires containerd.io >= 1.4.1, but none of the providers can be installed
- cannot install the best update candidate for package docker-ce-3:19.03.14-3.el7.x86_64
- package containerd.io-1.4.3-3.1.el7.x86_64 is filtered out by modular filtering
(try to add '--skip-broken' to skip uninstallable packages or '--nobest' to use not only best candidate packages)
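A few workarounds to try, not a verified fix: the "filtered out by modular filtering" message usually means RHEL 8's container-tools module is shadowing the containerd.io package from the Docker repository, so either disable that module or fall back to the options yum itself suggests:
sudo dnf module disable container-tools
sudo dnf update docker-ce containerd.io
# or, as a blunter fallback
sudo dnf update --nobest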