Questions tagged [google-kubernetes-engine] (server)

danidemi
Asked: 2024-02-06 16:31:56 +0800 CST

GCP Workload Identity works for some workloads but not others, even though the K8s service account is the same

  • 5

We deploy our microservices in two different GKE clusters, one for test and one for production.

Our workloads use Workload Identity. In the "test environment" everything works fine: all workloads share the same Kubernetes service account, which is bound to a GCP service account.

In the "production environment" the cluster is backed by three node pools (I mention this for completeness, but I'm not sure it matters), and we have problems with Workload Identity.

In production, in some containers, if we query the metadata from a shell or use gcloud, we unexpectedly find that the current identity is the one associated with the node rather than the one coming from Workload Identity. For other pods, Workload Identity works as expected.

Another possibly interesting detail: only pods added recently through new deployments seem to be affected by this "misconfiguration".

I don't know how to investigate this. Do you have any ideas?

Thanks in advance.
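
A minimal troubleshooting sketch (the pod, namespace, service-account, node-pool and cluster names below are placeholders): compare what the metadata server returns inside an affected pod with the KSA-to-GSA binding and the node pool's Workload Identity setting.

# Inside an affected pod: which identity does the metadata server hand out?
curl -sS -H "Metadata-Flavor: Google" \
  "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/email"

# On the cluster side: is the KSA annotated with the expected GCP service account?
kubectl get serviceaccount MY_KSA -n MY_NAMESPACE \
  -o jsonpath='{.metadata.annotations.iam\.gke\.io/gcp-service-account}'

# Is Workload Identity (GKE_METADATA) enabled on the node pool the affected pod landed on?
gcloud container node-pools describe MY_POOL --cluster MY_CLUSTER --zone MY_ZONE \
  --format="value(config.workloadMetadataConfig.mode)"

If the node pool reports GCE_METADATA (or nothing), pods scheduled there fall back to the node's identity, which would match the symptom of only newer pods being affected.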

google-kubernetes-engine
  • 1 answer
  • 63 Views
BPDev
Asked: 2023-03-17 11:15:18 +0800 CST

Cannot rebuild a Deployment that uses a PersistentVolumeClaim

  • 6

I want to create a MongoDB Deployment with a PersistentVolumeClaim.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: auth-mongo-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Mi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: auth-mongo-depl
spec:
  selector:
    matchLabels:
      app: auth-mongo-pod-label
  template:
    metadata:
      labels:
        app: auth-mongo-pod-label
    spec:
      containers:
        - name: auth-mongo-pod
          image: mongo
          ports:
            - containerPort: 27017
          volumeMounts:
            - name: auth-mongo-volume
              mountPath: /data/db
      volumes:
        - name: auth-mongo-volume
          persistentVolumeClaim:
            claimName: auth-mongo-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: auth-mongo-srv
spec:
  selector:
    app: auth-mongo-pod-label
  ports:
    - protocol: TCP
      port: 27017
      targetPort: 27017

Full code

First build after creating the Autopilot cluster:

Starting deploy...
 - Warning: Autopilot set default resource requests for Deployment default/auth-depl, as resource requests were not specified. See http://g.co/gke/autopilot-defaults
 - deployment.apps/auth-depl created
 - service/auth-srv created
 - persistentvolumeclaim/auth-mongo-pvc created
 - Warning: Autopilot set default resource requests for Deployment default/auth-mongo-depl, as resource requests were not specified. See http://g.co/gke/autopilot-defaults
 - deployment.apps/auth-mongo-depl created
 - service/auth-mongo-srv created
 - Warning: Autopilot set default resource requests for Deployment default/react-client-depl, as resource requests were not specified. See http://g.co/gke/autopilot-defaults
 - deployment.apps/react-client-depl created
 - service/react-client-srv created
 - ingress.networking.k8s.io/ingress-service created
Waiting for deployments to stabilize...
 - deployment/auth-depl: 0/2 nodes are available: 2 Insufficient cpu, 2 Insufficient memory. preemption: 0/2 nodes are available: 2 No preemption victims found for incoming pod.
    - pod/auth-depl-77fd8b57f5-vk8cf: 0/2 nodes are available: 2 Insufficient cpu, 2 Insufficient memory. preemption: 0/2 nodes are available: 2 No preemption victims found for incoming pod.
 - deployment/auth-mongo-depl: 0/2 nodes are available: 2 Insufficient cpu, 2 Insufficient memory. preemption: 0/2 nodes are available: 2 No preemption victims found for incoming pod.
    - pod/auth-mongo-depl-7d967468f6-rkh79: 0/2 nodes are available: 2 Insufficient cpu, 2 Insufficient memory. preemption: 0/2 nodes are available: 2 No preemption victims found for incoming pod.
 - deployment/react-client-depl: 0/2 nodes are available: 2 Insufficient cpu, 2 Insufficient memory. preemption: 0/2 nodes are available: 2 No preemption victims found for incoming pod.
    - pod/react-client-depl-68dcb844f6-b8fm9: 0/2 nodes are available: 2 Insufficient cpu, 2 Insufficient memory. preemption: 0/2 nodes are available: 2 No preemption victims found for incoming pod.
 - deployment/auth-depl: Unschedulable: 0/2 nodes are available: 2 Insufficient cpu, 2 Insufficient memory. preemption: 0/2 nodes are available: 2 No preemption victims found for incoming pod.
    - pod/auth-depl-77fd8b57f5-vk8cf: Unschedulable: 0/2 nodes are available: 2 Insufficient cpu, 2 Insufficient memory. preemption: 0/2 nodes are available: 2 No preemption victims found for incoming pod.
 - deployment/react-client-depl: Unschedulable: 0/2 nodes are available: 2 Insufficient cpu, 2 Insufficient memory. preemption: 0/2 nodes are available: 2 No preemption victims found for incoming pod.
    - pod/react-client-depl-68dcb844f6-b8fm9: Unschedulable: 0/2 nodes are available: 2 Insufficient cpu, 2 Insufficient memory. preemption: 0/2 nodes are available: 2 No preemption victims found for incoming pod.
 - deployment/react-client-depl is ready. [2/3 deployment(s) still pending]
 - deployment/auth-depl is ready. [1/3 deployment(s) still pending]
 - deployment/auth-mongo-depl is ready.
Deployments stabilized in 2 minutes 31.285 seconds

(screenshot: mongo pod)

If I click "Rebuild" in Google Cloud Build, I get:

Starting deploy...
 - deployment.apps/auth-depl configured
 - service/auth-srv configured
 - persistentvolumeclaim/auth-mongo-pvc unchanged
 - deployment.apps/auth-mongo-depl configured
 - service/auth-mongo-srv configured
 - deployment.apps/react-client-depl configured
 - service/react-client-srv configured
 - ingress.networking.k8s.io/ingress-service unchanged
Waiting for deployments to stabilize...
 - deployment/auth-depl: 0/5 nodes are available: 5 Insufficient cpu, 5 Insufficient memory. preemption: 0/5 nodes are available: 5 No preemption victims found for incoming pod.
    - pod/auth-depl-666fdb5c64-cqnf6: 0/5 nodes are available: 5 Insufficient cpu, 5 Insufficient memory. preemption: 0/5 nodes are available: 5 No preemption victims found for incoming pod.
 - deployment/auth-mongo-depl: 0/5 nodes are available: 5 Insufficient cpu, 5 Insufficient memory. preemption: 0/5 nodes are available: 5 No preemption victims found for incoming pod.
    - pod/auth-mongo-depl-958db4cd5-db5pr: 0/5 nodes are available: 5 Insufficient cpu, 5 Insufficient memory. preemption: 0/5 nodes are available: 5 No preemption victims found for incoming pod.
 - deployment/react-client-depl: 0/5 nodes are available: 5 Insufficient cpu, 5 Insufficient memory. preemption: 0/5 nodes are available: 5 No preemption victims found for incoming pod.
    - pod/react-client-depl-54998f6c5b-wswz7: 0/5 nodes are available: 5 Insufficient cpu, 5 Insufficient memory. preemption: 0/5 nodes are available: 5 No preemption victims found for incoming pod.
 - deployment/auth-mongo-depl: Unschedulable: 0/1 nodes available: 1 node is not ready
    - pod/auth-mongo-depl-958db4cd5-db5pr: Unschedulable: 0/1 nodes available: 1 node is not ready
 - deployment/auth-depl is ready. [2/3 deployment(s) still pending]
 - deployment/react-client-depl is ready. [1/3 deployment(s) still pending]
1/3 deployment(s) failed
ERROR
ERROR: build step 0 "gcr.io/k8s-skaffold/skaffold:v2.2.0" failed: step exited with non-zero status: 1

(screenshots: second pod stuck, pod events)

I'm not sure why it failed to scale up ("Failed scaling up nodes in zone us-central1-f associated with this pod: GCE quota exceeded. Pod is at risk of not being scheduled."); how can such a simple deployment exceed a quota? It does say "Scheduled" after the "FailedAttachVolume", though.

Should I change my skaffold.yml so it doesn't try to rebuild the database deployment? I'm not familiar with taints; should I change the setup so that the same node is reused?

I did try ReadWriteMany on a previous cluster, but it didn't help. (screenshot: pod events)
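
One possible direction (a sketch, not a confirmed fix for this cluster): a ReadWriteOnce PVC can only be attached to one node at a time, and a Deployment's default RollingUpdate strategy starts the replacement pod while the old one still holds the volume, often on a different node, which lines up with the FailedAttachVolume event. Switching the MongoDB Deployment to the Recreate strategy avoids the overlap:

# Added to the existing auth-mongo-depl spec
spec:
  strategy:
    type: Recreate   # delete the old pod (releasing the RWO volume) before creating the new one

For a database it is also common to use a StatefulSet instead of a Deployment, but Recreate is the smaller change.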

google-kubernetes-engine
  • 1 answer
  • 21 Views
Dzmitry Lazerka
Asked: 2022-01-30 01:56:40 +0800 CST

How to use a ManagedCertificate in a namespaced Ingress

  • 1

I'm trying to use a Google-managed certificate (not created through k8s) with an Ingress.

If the Ingress is in the default namespace, everything works with the ingress.gcp.kubernetes.io/pre-shared-cert: my-cert-name annotation.

But if the Ingress is in a namespace, it looks for a certificate named my-namespace/my-cert-name. It isn't possible to create a certificate with / in its name, though.

With a GKE k8s ManagedCertificate everything works. How can I make it work with a certificate that is not a k8s ManagedCertificate?

Update: we use Terraform to manage the SSL certificate, via a google_compute_managed_ssl_certificate resource. We use GKE with an Ingress and try to use that certificate. If the Ingress is in the default namespace, everything works. If the Ingress is in another namespace, it's impossible to use the certificate, because the Ingress looks for a certificate named namespacename/certname instead of certname.
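
For reference, a minimal sketch of how the annotation is normally attached (the Ingress and backend names are hypothetical; the annotation value is the Compute Engine SSL certificate name, as in the question):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-ingress
  namespace: my-namespace
  annotations:
    ingress.gcp.kubernetes.io/pre-shared-cert: "my-cert-name"
spec:
  defaultBackend:
    service:
      name: my-service        # hypothetical backend Service
      port:
        number: 80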

google-cloud-platform google-kubernetes-engine
  • 1 answer
  • 467 Views
Amit
Asked: 2022-01-21 03:13:10 +0800 CST

Problem starting kube-scheduler [Kubernetes The Hard Way]

  • 1

I'm trying to follow Kelsey Hightower's Kubernetes The Hard Way guide to set up a Kubernetes cluster on my own hardware.

After setting up kube-scheduler, when I start the scheduler I see the following error:

Jan 20 10:20:01 xyz.com kube-scheduler[12566]: F0120 10:20:01.025675 12566 helpers.go:119] **error: no kind "KubeSchedulerConfiguration" is registered for version** "kubescheduler.config.k8s.io/v1beta1"
Jan 20 10:20:01 xyz.com kube-scheduler systemd1: kube-scheduler.service: Main process exited, code=exited, status=255/n/a
Jan 20 10:20:01 xyz.com kube-scheduler systemd1: kube-scheduler.service: Unit entered failed state.
Jan 20 10:20:01 xyz.com kube-scheduler systemd1: kube-scheduler.service: Failed with result 'exit-code'.
Jan 20 10:20:06 xyz.com kube-scheduler systemd1: kube-scheduler.service: Service hold-off time over, scheduling restart.

Can someone give me some pointers on what's happening or what I've missed? My kube-apiserver and kube-controller-manager are active.

My kube-scheduler.yaml inside /etc/kubernetes/config looks like this:

apiVersion: kubescheduler.config.k8s.io/v1beta1
kind: KubeSchedulerConfiguration
clientConnection:
  kubeconfig: "/var/lib/kubernetes/kube-scheduler.kubeconfig"
leaderElection:
  leaderElect: true
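
This error usually means the kube-scheduler binary no longer registers the v1beta1 configuration API (it was removed in newer Kubernetes releases), so the config file and the binary disagree on versions. A sketch, assuming a recent binary: run kube-scheduler --version to see what you installed, then use an apiVersion that release still supports (roughly v1beta2/v1beta3 for 1.22–1.24, v1 from 1.25 on), keeping the rest of the file unchanged:

apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
clientConnection:
  kubeconfig: "/var/lib/kubernetes/kube-scheduler.kubeconfig"
leaderElection:
  leaderElect: true
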
linux kubernetes google-kubernetes-engine
  • 1 answer
  • 604 Views
NothingCtrl
Asked: 2022-01-14 04:47:07 +0800 CST

Deploying Odoo on Google Kubernetes: log severity recorded as ERROR

  • 0

I'm deploying Odoo 13 CE on Google Kubernetes and using Cloud Logging for logs; the application uses the default container log configuration.

Everything runs fine, except that all log output is marked with severity ERROR in Cloud Logging.


  • Can I change the default Odoo log output format to JSON?
  • (or) Override the init_logger() function so it works in the deployed environment?
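
Either way the target shape is the same: the GKE logging agent parses single-line JSON payloads and maps a severity field to the Cloud Logging severity instead of guessing from the stream. A sketch of what a log line would need to look like (field values are illustrative):

{"severity": "INFO", "message": "odoo.modules.loading: Modules loaded.", "time": "2022-01-14T04:47:07Z"}

Plain-text lines written to stderr, which is where Odoo's default logger sends everything, are what typically get flagged as ERROR.
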
google-cloud-platform google-kubernetes-engine odoo
  • 1 answer
  • 129 Views
mangusbrother
Asked: 2021-11-26 13:31:40 +0800 CST

terraform apply error alreadyExists on untouched resources

  • 0

I'm following the official guide to start a new Terraform project:

https://learn.hashicorp.com/tutorials/terraform/gke?in=terraform/kubernetes&utm_source=WEBSITE&utm_medium=WEB_IO&utm_offer=ARTICLE_PAGE&utm_content=DOCS&_ga=2.91746777.2118895439.1637849824-960084622.1637849824

I managed to get it running. (I run it as part of a Google Cloud Build task triggered on commit.)

However, if I change something in a resource (for example, I change the "gke_num_nodes" default from 2 to 1), when I run terraform apply again, this is what I get:


Plan: 4 to add, 0 to change, 0 to destroy.

Changes to Outputs:
  + kubernetes_cluster_host = (known after apply)
  + kubernetes_cluster_name = "workspace-auto-gke"
  + project_id              = "workspace-auto"
  + region                  = "europe-west4"
google_compute_network.vpc: Creating...
╷
│ Error: Error creating Network: googleapi: Error 409: The resource 'projects/workspace-auto/global/networks/workspace-auto-vpc' already exists, alreadyExists
│ 
│   with google_compute_network.vpc,
│   on vpc.tf line 15, in resource "google_compute_network" "vpc":
│   15: resource "google_compute_network" "vpc" {
│ 
╵

Is there a way to stop it from trying to recreate existing, untouched resources?

My cloudbuild.json is as follows:

{
  "steps": [
    {
      "name": "hashicorp/terraform",
      "entrypoint": "/bin/sh",
      "args": [
        "./cloudbuild/prepare-terraform.sh"
      ]
    }
  ],
  "logsBucket": "gs://my-bucket/logdir",
  "serviceAccount": "projects/my-proj/serviceAccounts/[email protected]"
}

prepare-terraform.sh is simply:

terraform init
terraform plan
terraform apply -auto-approve
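
The 409 here is usually not about the changed variable: each Cloud Build run starts in a fresh container, so with the default local backend Terraform has no state file and plans to create everything from scratch ("Plan: 4 to add"). A sketch of persisting state in a GCS backend so later runs know what already exists (the bucket name is a placeholder; the build service account needs read/write access to it):

terraform {
  backend "gcs" {
    bucket = "my-terraform-state-bucket"
    prefix = "gke"
  }
}

With that block in place, terraform init in prepare-terraform.sh picks up the remote state on every run.
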
google-cloud-platform google-compute-engine terraform google-kubernetes-engine
  • 2 answers
  • 4299 Views
dzierzak
Asked: 2021-11-08 05:11:45 +0800 CST

How to trigger a k8s Job from a different application?

  • 0

I'll briefly describe my application's workflow: I have an application (a cronjob) that reads my database, and based on the database's output I want to run some Jobs in Kubernetes. Sometimes 1 job, sometimes 10 jobs, it depends. I also want to pass some environment variables to these jobs.

Also, I run my Kubernetes cluster on GCP (Autopilot), so I don't want any pod running all the time. For that reason an EventListener in Tekton isn't a good option, because that Kubernetes Service runs the receiver logic in a dedicated pod.

What is the most appropriate way to do this? Probably I should use the K8s API, but are there other options?
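
A minimal sketch of the API route (the namespace is a placeholder, and it assumes the cronjob pod runs with a ServiceAccount whose RBAC allows creating Jobs): from inside the cluster the application can POST a batch/v1 Job manifest straight to the API server, so nothing has to run between cronjob executions.

TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
curl -sS --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "Content-Type: application/json" \
  -X POST "https://kubernetes.default.svc/apis/batch/v1/namespaces/default/jobs" \
  -d @job.json   # job.json: a batch/v1 Job whose container env carries the values read from the database

Any Kubernetes client library (or kubectl from inside the pod) wraps this same call.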

google-cloud-platform kubernetes google-kubernetes-engine job-scheduler
  • 1 answer
  • 356 Views
Melchy
Asked: 2021-10-17 01:52:59 +0800 CST

GKE metrics agent logs many errors

  • 2

We created a GKE cluster and we are getting errors from gke-metrics-agent. The errors appear roughly every 30 minutes. Always the same 62 errors.

All the errors carry the label k8s-pod/k8s-app: "gke-metrics-agent".

The first error is:

error   exporterhelper/queued_retry.go:245  Exporting failed. Try enabling retry_on_failure config option.  {"kind": "exporter", "name": "googlecloud", "error": "rpc error: code = DeadlineExceeded desc = Deadline expired before operation could complete."  

This error is followed by a sequence of errors like these:

  • “go.opentelemetry.io/collector/exporter/exporterhelper.(*retrySender).send”
  • “/go/src/gke-logmon/gke-metrics-agent/vendor/go.opentelemetry.io/collector/exporter/exporterhelper/queued_retry.go:245”
  • go.opentelemetry.io/collector/exporter/exporterhelper.(*metricsSenderWithObservability).send
  • /go/src/gke-logmon/gke-metrics-agent/vendor/go.opentelemetry.io/collector/exporter/exporterhelper/metrics.go:120

There are roughly 40 errors like that. The two that stand out are:

- error exporterhelper/queued_retry.go:175  Exporting failed. Dropping data. Try enabling sending_queue to survive temporary failures.  {"kind": "exporter", "name": "googlecloud", "dropped_items": 19}"

- warn  batchprocessor/batch_processor.go:184   Sender failed   {"kind": "processor", "name": "batch", "error": "rpc error: code = DeadlineExceeded desc = Deadline expired before operation could complete."}"

I tried to google these errors but couldn't find anything. I can't even find any documentation for gke-metrics-agent.

Things I have tried:

  • Checking quotas
  • Updating GKE to a newer version (the current version is 1.21.3-gke.2001)
  • Updating the nodes
  • Disabling all firewall rules
  • Granting all permissions to the k8s nodes

I can provide more information about our Kubernetes cluster, but I don't know which information might be relevant for solving this.
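
To get more detail than the exported error labels, the agent's own pods can be inspected directly; it runs in kube-system, and the label below is the one from the error entries:

kubectl -n kube-system get pods -l k8s-app=gke-metrics-agent -o wide
kubectl -n kube-system logs -l k8s-app=gke-metrics-agent --tail=200

DeadlineExceeded from the googlecloud exporter points at the agent's writes to the Monitoring API timing out rather than at anything in your workloads, so the nodes' outbound path to monitoring.googleapis.com is one thing worth checking.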

firewall google-cloud-platform metrics cloud google-kubernetes-engine
  • 2 answers
  • 1529 Views
montss
Asked: 2021-10-15 03:53:33 +0800 CST

Large Java heap in a container environment

  • 0

I'm trying to run a Jetty web server on Kubernetes that needs a large heap: ~250 GB in our production environment and ~50 GB in our test environment.

I'm using jetty:9.4-jdk11 and trying to avoid setting the Xms or Xmx flags explicitly, since the values differ between environments and I thought it would be better to rely on -XX:MaxRAMPercentage -XX:InitialRAMPercentage, but no matter what I try I can't get MaxHeapSize past 32178700288 (~30 GB).

The node runs only the Java application and a few small sidecars, and has 64 GB of memory.

Dockerfile

FROM jetty:9.4-jdk11

ENV APP_WAR root.war
ENV APP_EXPLODED_WAR root/
ENV APP_DESTINATION_PATH $JETTY_BASE/webapps/
ENV APP_DESTINATION_WAR $APP_DESTINATION_PATH$APP_WAR
ENV APP_DESTINATION_EXPLODED_WAR $APP_DESTINATION_PATH$APP_EXPLODED_WAR

ADD . $APP_DESTINATION_EXPLODED_WAR

ENV JAVA_OPTIONS -XX:+PrintFlagsFinal -XX:MaxRAMPercentage=90 -XX:InitialRAMPercentage=90 -XX:-OmitStackTraceInFastThrow -XX:+UseStringDeduplication -Xlog:gc*,stringdedup*=debug:file=/tmp/gc.log:time

Container resource settings

resources:
  limits:
    cpu: "8"
    memory: 60G
  requests:
    cpu: "6"
    memory: 60G

Based on these values I should get 90% of 60 GB, i.e. a MaxHeapSize of ~54 GB, not 30 GB. Any idea what I'm missing?
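
One way to narrow it down is to print the flags the JVM actually resolves inside the running container rather than inferring them from the Deployment spec (the pod name is a placeholder):

kubectl exec -it POD_NAME -- java -XX:MaxRAMPercentage=90 -XX:+PrintFlagsFinal -version \
  | grep -E 'MaxHeapSize|MaxRAM |UseCompressedOops'

If MaxHeapSize stops just short of 32 GB while UseCompressedOops is true, one candidate explanation (an assumption, not confirmed here) is that heap ergonomics keeps an unset heap below the compressed-oops limit; explicitly setting -Xmx, or adding -XX:-UseCompressedOops, is the usual way to test that theory.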

java jetty kubernetes google-kubernetes-engine containers
  • 1 answer
  • 143 Views
user16768564
Asked: 2021-09-09 02:55:20 +0800 CST

Container logs missing from the Cloud Logging UI

  • 0

I'm setting up our Kubernetes infrastructure. Our GKE cluster is up and running. I've successfully deployed a test service, which is reachable and behaves correctly.

The test service logs a message on startup and every time it receives a request, but those messages don't show up in the Cloud Logging UI.

I know it's not the container image, because it works perfectly when I run it locally. Something must be preventing the container logs from reaching Cloud Logging.

The cluster's Cloud Logging option is set to System, Workloads, and the node pool's service account has the logging.logWriter role.

At this point I have no idea what the problem could be. I found a few older posts describing similar issues, but they mostly concerned the image itself or migration from legacy Stackdriver, neither of which applies here.
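
A first-pass checklist sketch (cluster name and zone are placeholders, and the DaemonSet label is an assumption for recent GKE versions): confirm that workload logging really is enabled on the cluster and that the logging agent pods are healthy on every node.

gcloud container clusters describe CLUSTER_NAME --zone ZONE \
  --format="value(loggingConfig.componentConfig.enableComponents)"

kubectl -n kube-system get pods -l k8s-app=fluentbit-gke   # GKE's logging agent DaemonSet (assumed label)

If WORKLOADS is missing from the first command's output, or the agent pods are not Running, container stdout/stderr never reaches Cloud Logging even though the service account role is correct.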

logging google-cloud-platform google-compute-engine google-kubernetes-engine
  • 1 answer
  • 492 Views
