Questions tagged [kubernetes] - Page 1

E. Jaep

Asked: 2025-04-09 18:45:05 +0800 CST

Installing a package on a node from a pod

6

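Disclaimer

I know this is not how we should work with a Kubernetes cluster, and that what we want to do may pose a security risk.

Background

A colleague went on vacation and forgot to document the root password for our Kubernetes cluster nodes. I now have to install a simple package (nfs-common) on the nodes so that NFS volumes can be mounted.

I am trying to chroot into the mounted host filesystem and install the package from there.

To do this, I created the following pod: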

apiVersion: v1
kind: Pod
metadata:
  name: ubuntu-pod
spec:
  containers:
  - name: ubuntu-container
    image: ubuntu:24.04
    command: ["/bin/bash", "-c", "while true; do sleep 30; done;"]
    volumeMounts:
    - name: host-root
      mountPath: /hostfs
    securityContext:
      privileged: true
      capabilities:
        add:
          - SYS_ADMIN
          - SYS_RESOURCE
          - SYS_NICE
          - SYS_PTRACE
          - SYS_BOOT
          - SYS_MODULE
          - SYS_RAWIO
          - SYS_PACCT
          - SYS_NICE
          - SYS_TIME
          - SYS_TTY_CONFIG
          - SYSLOG
          - NET_ADMIN
  hostPID: true
  volumes:
  - name: host-root
    hostPath:
      path: /
  restartPolicy: Never

Once the pod is up, I can enter the host filesystem with chroot /hostfs /bin/bash.

However, apt update && apt install -y nfs-common fails with the following error:

Ign:1 http://archive.ubuntu.com/ubuntu jammy InRelease
Ign:2 http://archive.ubuntu.com/ubuntu jammy-updates InRelease
Ign:3 http://archive.ubuntu.com/ubuntu jammy-backports InRelease
Ign:4 http://archive.ubuntu.com/ubuntu jammy-security InRelease
Ign:1 http://archive.ubuntu.com/ubuntu jammy InRelease
Ign:2 http://archive.ubuntu.com/ubuntu jammy-updates InRelease
Ign:3 http://archive.ubuntu.com/ubuntu jammy-backports InRelease
Ign:4 http://archive.ubuntu.com/ubuntu jammy-security InRelease
Ign:1 http://archive.ubuntu.com/ubuntu jammy InRelease
Ign:2 http://archive.ubuntu.com/ubuntu jammy-updates InRelease
Ign:3 http://archive.ubuntu.com/ubuntu jammy-backports InRelease
Ign:4 http://archive.ubuntu.com/ubuntu jammy-security InRelease
Err:1 http://archive.ubuntu.com/ubuntu jammy InRelease
  Temporary failure resolving 'archive.ubuntu.com'
Err:2 http://archive.ubuntu.com/ubuntu jammy-updates InRelease
  Temporary failure resolving 'archive.ubuntu.com'
Err:3 http://archive.ubuntu.com/ubuntu jammy-backports InRelease
  Temporary failure resolving 'archive.ubuntu.com'
Err:4 http://archive.ubuntu.com/ubuntu jammy-security InRelease
  Temporary failure resolving 'archive.ubuntu.com'
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
29 packages can be upgraded. Run 'apt list --upgradable' to see them.

I believe I am missing a capability, but I cannot figure out which one.
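
The apt output above reports a name-resolution failure ("Temporary failure resolving"), which points at DNS rather than at a missing capability. One plausible cause, stated here as an assumption the post does not confirm: after chroot, apt reads the node's /etc/resolv.conf, and on many Ubuntu nodes that file names the systemd-resolved stub at 127.0.0.53. That address is not reachable from the pod's own network namespace, because the pod does not set hostNetwork: true. The sketch below checks a resolv.conf-style file for that situation; the function name uses_loopback_resolver is hypothetical, and the demo runs against a temporary sample file, not the real one.

```shell
#!/bin/bash
# Hypothetical helper: succeed (exit 0) when every nameserver listed in a
# resolv.conf-style file is a loopback address (127.x.x.x or ::1).
uses_loopback_resolver() {
  local file="$1" found=0 keyword addr rest
  while read -r keyword addr rest || [ -n "$keyword" ]; do
    [ "$keyword" = "nameserver" ] || continue
    found=1
    case "$addr" in
      127.*|::1) ;;      # loopback: only answers inside the host's network namespace
      *) return 1 ;;     # at least one resolver is reachable over the network
    esac
  done < "$file"
  [ "$found" -eq 1 ]     # fail if the file lists no nameserver at all
}

# Demo on a sample file that mimics a systemd-resolved stub configuration.
sample="$(mktemp)"
printf 'nameserver 127.0.0.53\noptions edns0 trust-ad\n' > "$sample"
if uses_loopback_resolver "$sample"; then
  echo "loopback-only resolver"
fi
rm -f "$sample"
```

Inside the chroot, uses_loopback_resolver /etc/resolv.conf would run the same check against the node's real file. If it succeeds, two options to verify are adding hostNetwork: true to the pod spec, or, since the pod already sets hostPID: true and is privileged, entering the host's namespaces with nsenter -t 1 -m -u -i -n instead of using chroot.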

Amit Thakur

Asked: 2025-04-05 14:22:20 +0800 CST

Digital Ocean kubeconfig save suddenly stopped working with a 403 error

5

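The Digital Ocean API has always been rather flaky. They keep adding new access controls to the API at random, with no backward compatibility at all.

This time, what I am trying to do is: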

doctl auth remove --context default
doctl auth init
doctl kubernetes cluster kubeconfig save mycluster

But it simply fails with this error:

GET https://api.digitalocean.com/v2/kubernetes/clusters/***** 403 (request "*****" ) You are not authorized to perform this operation.

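Our automation, which had been running fine, suddenly stopped working. I now need to figure out which permissions to grant the new token and then propagate that token to all of our automation.

If you know which settings need to be enabled, please let me know. Thanks for your help!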

khteh

Asked: 2025-04-04 11:19:47 +0800 CST

Running Ollama as a k8s STS with an external script as the entrypoint to load models

5

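I managed to run Ollama as a k8s STS. I use it for a Python Langchain LLM/RAG application. However, the following Dockerfile ENTRYPOINT script runs into a problem when it tries to pull the list of images exported as the MODELS ENV from the k8s STS manifest. The Dockerfile has the following ENTRYPOINT and CMD: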

ENTRYPOINT ["/usr/local/bin/run.sh"]
CMD ["bash"]

run.sh:

#!/bin/bash
set -x
ollama serve&
sleep 10
models="${MODELS//,/ }"
for i in "${models[@]}"; do \
      echo model: $i  \
      ollama pull $i \
    done

k8s logs:

+ models=llama3.2
/usr/local/bin/run.sh: line 10: syntax error: unexpected end of file
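
The syntax error in this log is consistent with the trailing backslashes in run.sh. Each backslash joins the next line onto the current one, so do, both commands, and done collapse into a single line, and the shell never sees a done that closes the loop. Separately, models is a plain string, so "${models[@]}" expands to a single word. A minimal sketch of a corrected loop follows, assuming MODELS is a comma-separated list. The function name pull_models, the PULL_CMD override, and the second model name are hypothetical, added only so the loop can be exercised without an Ollama server.

```shell
#!/bin/bash
# Sketch: split a comma-separated MODELS value into an array and pull each entry.
# PULL_CMD is a hypothetical override for dry runs; it defaults to "ollama pull".
pull_models() {
  local -a models
  local model
  IFS=',' read -r -a models <<< "${MODELS:-}"
  for model in "${models[@]}"; do
    echo "model: $model"
    ${PULL_CMD:-ollama pull} "$model"
  done
}

# Dry run: replace the real pull with echo so nothing is downloaded.
MODELS="llama3.2,mistral" PULL_CMD="echo would pull" pull_models
```

In run.sh the loop would sit after ollama serve & and the readiness wait. A final wait would probably be needed as well, so that the script, which is the container's main process, does not exit once the pulls finish.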

David Maze's solution:

          lifecycle:
            postStart:
              exec:
                command:
                  - bash
                  - -c
                  - |
                    for i in $(seq 10); do
                      ollama ps && break
                      sleep 1
                    done
                    for model in ${MODELS//,/ }; do
                      ollama pull "$model"
                    done

ollama-0          1/2     CrashLoopBackOff     4 (3s ago)        115s
ollama-1          1/2     CrashLoopBackOff     4 (1s ago)        115s

  Warning  FailedPostStartHook  106s (x3 over 2m14s)  kubelet            PostStartHook failed

$ k logs -fp ollama-0
Defaulted container "ollama" out of: ollama, fluentd
Error: unknown command "ollama" for "ollama"

Updated Dockerfile:

ENTRYPOINT ["/bin/ollama"]
#CMD ["bash"]
CMD ["ollama", "serve"]

I need a custom Dockerfile so that I can install the Nvidia Container Toolkit.

khteh

Asked: 2025-04-03 20:24:32 +0800 CST

Dockerfile and container getting ENV from a k8s StatefulSet

4

I need to process an ENV from the k8s StatefulSet before the container starts. Dockerfile:

RUN echo CREDENTIALS: $CREDENTIALS
ARG user="${CREDENTIALS%/*}"
ARG password="${CREDENTIALS#*/}"
ENV USER $user
ENV PASSWORD $password

StatefulSet env:

          env:
            - name: CREDENTIALS
              valueFrom:
                secretKeyRef:
                  name: app-secret
                  key: CREDENTIALS

When I run it with d run -dt -e CREDENTIALS="user/P@$$w0rd" myimage:latest, PASSWORD is lost.
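
Two effects are worth separating here. RUN and ARG instructions are evaluated when the image is built, so they cannot see a value that the StatefulSet or the -e flag supplies at run time. And inside double quotes the calling shell expands $$ to its own process ID before the value ever reaches the container. A minimal sketch of doing the split at container start instead, for example from an entrypoint script, is shown below. The function name split_credentials and the variable names APP_USER and APP_PASSWORD are hypothetical.

```shell
#!/bin/bash
# Sketch: split CREDENTIALS ("user/password") at the first slash at run time.
split_credentials() {
  local credentials="$1"
  APP_USER="${credentials%%/*}"     # everything before the first "/"
  APP_PASSWORD="${credentials#*/}"  # everything after the first "/"
}

# Single quotes keep $$ literal; in double quotes the shell would replace it
# with its process ID before the value reaches the container.
split_credentials 'user/P@$$w0rd'
printf 'user=%s password=%s\n' "$APP_USER" "$APP_PASSWORD"
# prints: user=user password=P@$$w0rd
```

An entrypoint script would call split_credentials "$CREDENTIALS", export the two variables, and then exec the real command. A value that comes from a Kubernetes secret is not parsed by a shell, so the $$ issue only affects the manual test on the command line.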

voipp

Asked: 2025-02-26 21:41:08 +0800 CST

How do I correctly escape symbols in an ansible command?

6

I want to run this command on every node:

docker ps --format '{{.Names}}' -a | egrep -v '^k8s.*$'

I have tried countless variations of running the command in ansible, including:

- hosts: kubernetes
  tasks:
    - name: check docker
      command:
        cmd: docker ps --format '{{.Names}}' -a | egrep -v '^k8s.*$'
      register: doc

    - debug: var=doc.stdout_lines

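I tried escaping the characters, but nothing worked. So how do I get ansible to run my docker command on each host?

PS: I want to list the containers that are not managed by k8s.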

Tucan

Asked: 2025-02-25 02:41:21 +0800 CST

Passing environment variables from the k8s secret store into a docker image

5

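How can I expand environment variables that come from the secret store and pass them into a docker container? The container has no shell, so it cannot run a script. Here is a sample yaml file: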

        envFrom:
        - secretRef:
            name: secret
        command: ["my-command"]
        args:
          - "--env=ENV1=${MY_ENV_VAR1}"
          - "--env=env2=${MY_ENV_VAR2}"

ninjab3s

Asked: 2025-02-12 20:15:33 +0800 CST

Kubernetes PostStartHook fails when using curl

5

I am trying to get a postStart hook working in a container, but it keeps failing. The error I get is:

kubelet[1057]: E0212 11:07:20.205922    1057 handlers.go:78] "Exec lifecycle hook for Container in Pod failed" err=<
kubelet[1057]:         command 'curl -H 'Content-Type: application/json' -d '{ \"restarted\": True}' -X POST http://localhost:5000/restarted' exited with 2: curl: (2) no URL specified
kubelet[1057]:         curl: try 'curl --help' or 'curl --manual' for more information
kubelet[1057]:  > execCommand=[curl -H 'Content-Type: application/json' -d '{ \"restarted\": True}' -X POST http://localhost:5000/restarted] containerName="srsran-cu-du" pod="srsran/srsran-project-cudu-chart-78f658b865-pjvt2" message=<
kubelet[1057]:         curl: (2) no URL specified
kubelet[1057]:         curl: try 'curl --help' or 'curl --manual' for more information
kubelet[1057]:  >

The hook in my manifest looks like this:

lifecycle:
  postStart:
    exec:
      command: [ "curl", "-H",  "'Content-Type: application/json'", "-d", "'{ \"restarted\": True}'", "-X", "POST http://localhost:5000/restarted" ]

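It renders as curl -H 'Content-Type: application/json' -d '{ \"restarted\": True}' -X POST http://localhost:5000/restarted.

If I run the curl command directly in the container, it works fine. But when it runs via the postStart hook, it does not. What am I doing wrong?

I also tried replacing ' with \\\", but that did not work either.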
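
An exec hook hands the command list to the container runtime as an argument vector, and no shell parses it. That would explain the failure: the single quotes in the manifest reach curl as literal characters, and "POST http://localhost:5000/restarted" arrives as one argument, so -X consumes it and no URL is left. The sketch below uses a hypothetical helper, show_args, to print what a program receives in each case.

```shell
#!/bin/bash
# Hypothetical helper: print each received argument in brackets, one per line.
show_args() {
  local arg
  for arg in "$@"; do
    printf '[%s]\n' "$arg"
  done
}

# As written in the manifest: method and URL fused into one list element.
show_args -X "POST http://localhost:5000/restarted"
# [-X]
# [POST http://localhost:5000/restarted]

# One list element per argument: the URL arrives on its own.
show_args -X POST http://localhost:5000/restarted
# [-X]
# [POST]
# [http://localhost:5000/restarted]
```

Under that reading, each element of command would be its own string with the inner single quotes removed, for example "-X", "POST", "http://localhost:5000/restarted", and the header and JSON body would likewise be passed as plain strings.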

MischievousChild

Asked: 2025-01-22 21:40:27 +0800 CST

Difference between PodDisruptionBudget and the minimum replica count in HPA

5

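Kubernetes provides the PodDisruptionBudget, which protects our application's functionality from being disrupted by manual and automatic vertical (node) scaling.

Suppose we are already using a HorizontalPodAutoscaler. What is the value of using a PDB as well? What is the difference between PodDisruptionBudget minAvailable and HorizontalPodAutoscaler minReplicas?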

Baiqing

Asked: 2025-01-18 06:01:54 +0800 CST

Flink Kubernetes S3 state support

5

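I have been going through the documentation for Flink Kubernetes Operator v1.10. Is there a way to preconfigure the cluster so that all submitted jobs use rocksdb state with a predefined s3 path? What is required to make this work? I have been trying to set up a job with an S3 backend, but it says the s3 backend is not supported and that I need to enable the s3 plugin, and I am not sure how to do that.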

Milenko Markovic

Asked: 2024-12-27 20:06:29 +0800 CST

Why is one pod in CrashLoopBackOff while another pod runs fine?

5

I have a deployment, but over time an error appeared.

NAME                                  READY   STATUS             RESTARTS        AGE
pod/picanagm-solution-5cb8887968-qk4pr   0/1     CrashLoopBackOff   140 (78s ago)   11h
pod/picanagm-solution-77f5fcfdc-kwd9w    1/1     Running            0               2d20h

Logs for the failing pod:

Events:
  Type     Reason   Age                     From     Message
  ----     ------   ----                    ----     -------
  Warning  BackOff  2m42s (x3258 over 11h)  kubelet  Back-off restarting failed container

and

Picked up JAVA_TOOL_OPTIONS: -Dlogging.config=/app/run/logback.xml -DcontentServer.factory-reset=folder -DcontentServer.factory-reset.folder-name=file:/app/bookmarks -DSameSite=none -Dconfiguration.sign-off.enabled=true -Ddata.extraction.templates.base.dir.path=${java.io.tmpdir} --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED

I> No access restrictor found, access to any MBean is allowed
Jolokia: Agent started with URL http://10.244.4.81:8778/jolokia/
Exception in thread "main" java.lang.UnsupportedClassVersionError: com/activeviam/MINDZ/starter/main/MINDZApplication has been compiled by a more recent version of the Java Runtime (class file version 65.0), this version of the Java Runtime only recognizes class file versions up to 61.0
    at java.base/java.lang.ClassLoader.defineClass1(Native Method)
    at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1012)

One replica set is fine, and the other is not ready:

replicaset.apps/picanagm-solution-5cb8887968   1         1         0       11h

Why? How do I debug this behavior?
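
The stack trace above names the immediate cause: the application class was compiled to class file version 65.0, while the JVM in the crashing pod accepts at most 61.0. Class file major versions map to Java releases by subtracting 44, so this looks like a Java 21 build running on a Java 17 runtime. The two pods carry different ReplicaSet hashes (5cb8887968 and 77f5fcfdc), which suggests, as an assumption, that a newer rollout changed the application build or the base image while the older ReplicaSet kept serving. A small sketch of the version arithmetic, with a hypothetical function name:

```shell
#!/bin/bash
# Hypothetical helper: convert a class file major version (e.g. 65 or 65.0)
# to the Java release that produces it, using release = major - 44.
class_version_to_java() {
  local major="${1%%.*}"   # drop the minor part: "65.0" -> "65"
  echo $(( major - 44 ))
}

echo "compiled for Java $(class_version_to_java 65.0)"   # compiled for Java 21
echo "runtime is Java $(class_version_to_java 61.0)"     # runtime is Java 17
```

Comparing the container image of the two ReplicaSets, for instance with kubectl describe replicaset, would show whether the crashing pod uses a different image tag than the running one.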
