Questions tagged [google-kubernetes-engine] (server)

danidemi
Asked: 2024-02-06 16:31:56 +0800 CST

GCP Workload Identity works for some workloads but not others, even though the K8s service account is the same

  • 5

We deploy our microservices in two different GKE clusters, one for test and one for production.

Our workloads use Workload Identity. In the "test environment" everything works fine: all workloads share the same Kubernetes service account, which is bound to a GCP service account.

In the "production environment" the cluster is backed by three node pools (I mention this for completeness, but I'm not sure it matters), and we have problems with Workload Identity.

In production, in some containers, if we query the metadata from a shell or use gcloud, we unexpectedly find that the current identity is the one associated with the node rather than the one coming from Workload Identity. For other pods, Workload Identity works as expected.

Another possibly interesting detail: only pods added recently through new deployments seem to be affected by this "misconfiguration".

I don't know how to investigate this. Do you have any ideas?

Thanks in advance.
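
A minimal troubleshooting sketch (the pod, namespace, service-account, node-pool and cluster names below are placeholders): compare what the metadata server returns inside an affected pod with the KSA-to-GSA binding and the node pool's Workload Identity setting.

# Inside an affected pod: which identity does the metadata server hand out?
curl -sS -H "Metadata-Flavor: Google" \
  "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/email"

# On the cluster side: is the KSA annotated with the expected GCP service account?
kubectl get serviceaccount MY_KSA -n MY_NAMESPACE \
  -o jsonpath='{.metadata.annotations.iam\.gke\.io/gcp-service-account}'

# Is Workload Identity (GKE_METADATA) enabled on the node pool the affected pod landed on?
gcloud container node-pools describe MY_POOL --cluster MY_CLUSTER --zone MY_ZONE \
  --format="value(config.workloadMetadataConfig.mode)"

If the node pool reports GCE_METADATA (or nothing), pods scheduled there fall back to the node's identity, which would match the symptom of only newer pods being affected.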

google-kubernetes-engine
  • 1 answer
  • 63 Views
BPDev
Asked: 2023-03-17 11:15:18 +0800 CST

Cannot rebuild a Deployment that uses a PersistentVolumeClaim

  • 6

I want to create a MongoDB Deployment with a PersistentVolumeClaim.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: auth-mongo-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Mi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: auth-mongo-depl
spec:
  selector:
    matchLabels:
      app: auth-mongo-pod-label
  template:
    metadata:
      labels:
        app: auth-mongo-pod-label
    spec:
      containers:
        - name: auth-mongo-pod
          image: mongo
          ports:
            - containerPort: 27017
          volumeMounts:
            - name: auth-mongo-volume
              mountPath: /data/db
      volumes:
        - name: auth-mongo-volume
          persistentVolumeClaim:
            claimName: auth-mongo-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: auth-mongo-srv
spec:
  selector:
    app: auth-mongo-pod-label
  ports:
    - protocol: TCP
      port: 27017
      targetPort: 27017

Full code

First build after creating the Autopilot cluster:

Starting deploy...
 - Warning: Autopilot set default resource requests for Deployment default/auth-depl, as resource requests were not specified. See http://g.co/gke/autopilot-defaults
 - deployment.apps/auth-depl created
 - service/auth-srv created
 - persistentvolumeclaim/auth-mongo-pvc created
 - Warning: Autopilot set default resource requests for Deployment default/auth-mongo-depl, as resource requests were not specified. See http://g.co/gke/autopilot-defaults
 - deployment.apps/auth-mongo-depl created
 - service/auth-mongo-srv created
 - Warning: Autopilot set default resource requests for Deployment default/react-client-depl, as resource requests were not specified. See http://g.co/gke/autopilot-defaults
 - deployment.apps/react-client-depl created
 - service/react-client-srv created
 - ingress.networking.k8s.io/ingress-service created
Waiting for deployments to stabilize...
 - deployment/auth-depl: 0/2 nodes are available: 2 Insufficient cpu, 2 Insufficient memory. preemption: 0/2 nodes are available: 2 No preemption victims found for incoming pod.
    - pod/auth-depl-77fd8b57f5-vk8cf: 0/2 nodes are available: 2 Insufficient cpu, 2 Insufficient memory. preemption: 0/2 nodes are available: 2 No preemption victims found for incoming pod.
 - deployment/auth-mongo-depl: 0/2 nodes are available: 2 Insufficient cpu, 2 Insufficient memory. preemption: 0/2 nodes are available: 2 No preemption victims found for incoming pod.
    - pod/auth-mongo-depl-7d967468f6-rkh79: 0/2 nodes are available: 2 Insufficient cpu, 2 Insufficient memory. preemption: 0/2 nodes are available: 2 No preemption victims found for incoming pod.
 - deployment/react-client-depl: 0/2 nodes are available: 2 Insufficient cpu, 2 Insufficient memory. preemption: 0/2 nodes are available: 2 No preemption victims found for incoming pod.
    - pod/react-client-depl-68dcb844f6-b8fm9: 0/2 nodes are available: 2 Insufficient cpu, 2 Insufficient memory. preemption: 0/2 nodes are available: 2 No preemption victims found for incoming pod.
 - deployment/auth-depl: Unschedulable: 0/2 nodes are available: 2 Insufficient cpu, 2 Insufficient memory. preemption: 0/2 nodes are available: 2 No preemption victims found for incoming pod.
    - pod/auth-depl-77fd8b57f5-vk8cf: Unschedulable: 0/2 nodes are available: 2 Insufficient cpu, 2 Insufficient memory. preemption: 0/2 nodes are available: 2 No preemption victims found for incoming pod.
 - deployment/react-client-depl: Unschedulable: 0/2 nodes are available: 2 Insufficient cpu, 2 Insufficient memory. preemption: 0/2 nodes are available: 2 No preemption victims found for incoming pod.
    - pod/react-client-depl-68dcb844f6-b8fm9: Unschedulable: 0/2 nodes are available: 2 Insufficient cpu, 2 Insufficient memory. preemption: 0/2 nodes are available: 2 No preemption victims found for incoming pod.
 - deployment/react-client-depl is ready. [2/3 deployment(s) still pending]
 - deployment/auth-depl is ready. [1/3 deployment(s) still pending]
 - deployment/auth-mongo-depl is ready.
Deployments stabilized in 2 minutes 31.285 seconds

(screenshot: mongo pod)

If I click "Rebuild" in Google Cloud Build, I get:

Starting deploy...
 - deployment.apps/auth-depl configured
 - service/auth-srv configured
 - persistentvolumeclaim/auth-mongo-pvc unchanged
 - deployment.apps/auth-mongo-depl configured
 - service/auth-mongo-srv configured
 - deployment.apps/react-client-depl configured
 - service/react-client-srv configured
 - ingress.networking.k8s.io/ingress-service unchanged
Waiting for deployments to stabilize...
 - deployment/auth-depl: 0/5 nodes are available: 5 Insufficient cpu, 5 Insufficient memory. preemption: 0/5 nodes are available: 5 No preemption victims found for incoming pod.
    - pod/auth-depl-666fdb5c64-cqnf6: 0/5 nodes are available: 5 Insufficient cpu, 5 Insufficient memory. preemption: 0/5 nodes are available: 5 No preemption victims found for incoming pod.
 - deployment/auth-mongo-depl: 0/5 nodes are available: 5 Insufficient cpu, 5 Insufficient memory. preemption: 0/5 nodes are available: 5 No preemption victims found for incoming pod.
    - pod/auth-mongo-depl-958db4cd5-db5pr: 0/5 nodes are available: 5 Insufficient cpu, 5 Insufficient memory. preemption: 0/5 nodes are available: 5 No preemption victims found for incoming pod.
 - deployment/react-client-depl: 0/5 nodes are available: 5 Insufficient cpu, 5 Insufficient memory. preemption: 0/5 nodes are available: 5 No preemption victims found for incoming pod.
    - pod/react-client-depl-54998f6c5b-wswz7: 0/5 nodes are available: 5 Insufficient cpu, 5 Insufficient memory. preemption: 0/5 nodes are available: 5 No preemption victims found for incoming pod.
 - deployment/auth-mongo-depl: Unschedulable: 0/1 nodes available: 1 node is not ready
    - pod/auth-mongo-depl-958db4cd5-db5pr: Unschedulable: 0/1 nodes available: 1 node is not ready
 - deployment/auth-depl is ready. [2/3 deployment(s) still pending]
 - deployment/react-client-depl is ready. [1/3 deployment(s) still pending]
1/3 deployment(s) failed
ERROR
ERROR: build step 0 "gcr.io/k8s-skaffold/skaffold:v2.2.0" failed: step exited with non-zero status: 1

(screenshots: second pod stuck, pod events)

I'm not sure why it failed to scale up ("Failed scaling up nodes in zone us-central1-f associated with this pod: GCE quota exceeded. Pod is at risk of not being scheduled."); how can such a simple deployment exceed a quota? It does say "Scheduled" after the "FailedAttachVolume", though.

Should I change my skaffold.yml so it doesn't try to rebuild the database deployment? I'm not familiar with taints; should I change the setup so that the same node is reused?

I did try ReadWriteMany on a previous cluster, but it didn't help. (screenshot: pod events)
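
One possible direction (a sketch, not a confirmed fix for this cluster): a ReadWriteOnce PVC can only be attached to one node at a time, and a Deployment's default RollingUpdate strategy starts the replacement pod while the old one still holds the volume, often on a different node, which lines up with the FailedAttachVolume event. Switching the MongoDB Deployment to the Recreate strategy avoids the overlap:

# Added to the existing auth-mongo-depl spec
spec:
  strategy:
    type: Recreate   # delete the old pod (releasing the RWO volume) before creating the new one

For a database it is also common to use a StatefulSet instead of a Deployment, but Recreate is the smaller change.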

google-kubernetes-engine
  • 1 answer
  • 21 Views
Dzmitry Lazerka
Asked: 2022-01-30 01:56:40 +0800 CST

How to use a ManagedCertificate in a namespaced Ingress

  • 1

I'm trying to use a Google-managed certificate (not created through k8s) with an Ingress.

If the Ingress is in the default namespace, everything works with the ingress.gcp.kubernetes.io/pre-shared-cert: my-cert-name annotation.

But if the Ingress is in a namespace, it looks for a certificate named my-namespace/my-cert-name. It isn't possible to create a certificate with / in its name, though.

With a GKE k8s ManagedCertificate everything works. How can I make it work with a certificate that is not a k8s ManagedCertificate?

Update: we use Terraform to manage the SSL certificate, via a google_compute_managed_ssl_certificate resource. We use GKE with an Ingress and try to use that certificate. If the Ingress is in the default namespace, everything works. If the Ingress is in another namespace, it's impossible to use the certificate, because the Ingress looks for a certificate named namespacename/certname instead of certname.
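
For reference, a minimal sketch of how the annotation is normally attached (the Ingress and backend names are hypothetical; the annotation value is the Compute Engine SSL certificate name, as in the question):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-ingress
  namespace: my-namespace
  annotations:
    ingress.gcp.kubernetes.io/pre-shared-cert: "my-cert-name"
spec:
  defaultBackend:
    service:
      name: my-service        # hypothetical backend Service
      port:
        number: 80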

google-cloud-platform google-kubernetes-engine
  • 1 answer
  • 467 Views
Amit
Asked: 2022-01-21 03:13:10 +0800 CST

Problem starting kube-scheduler [Kubernetes The Hard Way]

  • 1

I'm trying to follow Kelsey Hightower's Kubernetes The Hard Way guide to set up a Kubernetes cluster on my own hardware.

After setting up kube-scheduler, when I start the scheduler I see the following error:

Jan 20 10:20:01 xyz.com kube-scheduler[12566]: F0120 10:20:01.025675 12566 helpers.go:119] **error: no kind "KubeSchedulerConfiguration" is registered for version** "kubescheduler.config.k8s.io/v1beta1"
Jan 20 10:20:01 xyz.com kube-scheduler systemd1: kube-scheduler.service: Main process exited, code=exited, status=255/n/a
Jan 20 10:20:01 xyz.com kube-scheduler systemd1: kube-scheduler.service: Unit entered failed state.
Jan 20 10:20:01 xyz.com kube-scheduler systemd1: kube-scheduler.service: Failed with result 'exit-code'.
Jan 20 10:20:06 xyz.com kube-scheduler systemd1: kube-scheduler.service: Service hold-off time over, scheduling restart.

Can someone give me some pointers on what's happening or what I've missed? My kube-apiserver and kube-controller-manager are active.

My kube-scheduler.yaml inside /etc/kubernetes/config looks like this:

apiVersion: kubescheduler.config.k8s.io/v1beta1
kind: KubeSchedulerConfiguration
clientConnection:
  kubeconfig: "/var/lib/kubernetes/kube-scheduler.kubeconfig"
leaderElection:
  leaderElect: true
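
This error usually means the kube-scheduler binary no longer registers the v1beta1 configuration API (it was removed in newer Kubernetes releases), so the config file and the binary disagree on versions. A sketch, assuming a recent binary: run kube-scheduler --version to see what you installed, then use an apiVersion that release still supports (roughly v1beta2/v1beta3 for 1.22–1.24, v1 from 1.25 on), keeping the rest of the file unchanged:

apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
clientConnection:
  kubeconfig: "/var/lib/kubernetes/kube-scheduler.kubeconfig"
leaderElection:
  leaderElect: true
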
linux kubernetes google-kubernetes-engine
  • 1 answer
  • 604 Views
NothingCtrl
Asked: 2022-01-14 04:47:07 +0800 CST

Deploying Odoo on Google Kubernetes: log severity recorded as ERROR

  • 0

I'm deploying Odoo 13 CE on Google Kubernetes and using Cloud Logging for logs; the application uses the default container log configuration.

Everything runs fine, except that all log output is marked with severity ERROR in Cloud Logging.


  • Can I change the default Odoo log output format to JSON?
  • (or) Override the init_logger() function so it works in the deployed environment?
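
Either way the target shape is the same: the GKE logging agent parses single-line JSON payloads and maps a severity field to the Cloud Logging severity instead of guessing from the stream. A sketch of what a log line would need to look like (field values are illustrative):

{"severity": "INFO", "message": "odoo.modules.loading: Modules loaded.", "time": "2022-01-14T04:47:07Z"}

Plain-text lines written to stderr, which is where Odoo's default logger sends everything, are what typically get flagged as ERROR.
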
google-cloud-platform google-kubernetes-engine odoo
  • 1 answer
  • 129 Views
mangusbrother
Asked: 2021-11-26 13:31:40 +0800 CST

terraform apply error alreadyExists on untouched resources

  • 0

I'm following the official guide to start a new Terraform project:

https://learn.hashicorp.com/tutorials/terraform/gke?in=terraform/kubernetes&utm_source=WEBSITE&utm_medium=WEB_IO&utm_offer=ARTICLE_PAGE&utm_content=DOCS&_ga=2.91746777.2118895439.1637849824-960084622.1637849824

I managed to get it running. (I run it as part of a Google Cloud Build task triggered on commit.)

However, if I change something in a resource (for example, I change the "gke_num_nodes" default from 2 to 1), when I run terraform apply again, this is what I get:


Plan: 4 to add, 0 to change, 0 to destroy.

Changes to Outputs:
  + kubernetes_cluster_host = (known after apply)
  + kubernetes_cluster_name = "workspace-auto-gke"
  + project_id              = "workspace-auto"
  + region                  = "europe-west4"
google_compute_network.vpc: Creating...
╷
│ Error: Error creating Network: googleapi: Error 409: The resource 'projects/workspace-auto/global/networks/workspace-auto-vpc' already exists, alreadyExists
│ 
│   with google_compute_network.vpc,
│   on vpc.tf line 15, in resource "google_compute_network" "vpc":
│   15: resource "google_compute_network" "vpc" {
│ 
╵

Is there a way to stop it from trying to recreate existing, untouched resources?

My cloudbuild.json is as follows:

{
  "steps": [
    {
      "name": "hashicorp/terraform",
      "entrypoint": "/bin/sh",
      "args": [
        "./cloudbuild/prepare-terraform.sh"
      ]
    }
  ],
  "logsBucket": "gs://my-bucket/logdir",
  "serviceAccount": "projects/my-proj/serviceAccounts/[email protected]"
}

prepare-terraform.sh is simply:

terraform init
terraform plan
terraform apply -auto-approve
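
The 409 here is usually not about the changed variable: each Cloud Build run starts in a fresh container, so with the default local backend Terraform has no state file and plans to create everything from scratch ("Plan: 4 to add"). A sketch of persisting state in a GCS backend so later runs know what already exists (the bucket name is a placeholder; the build service account needs read/write access to it):

terraform {
  backend "gcs" {
    bucket = "my-terraform-state-bucket"
    prefix = "gke"
  }
}

With that block in place, terraform init in prepare-terraform.sh picks up the remote state on every run.
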
google-cloud-platform google-compute-engine terraform google-kubernetes-engine
  • 2 answers
  • 4299 Views
dzierzak
Asked: 2021-11-08 05:11:45 +0800 CST

How to trigger a k8s Job from a different application?

  • 0

I'll briefly describe my application's workflow: I have an application (a cronjob) that reads my database, and based on the database's output I want to run some Jobs in Kubernetes. Sometimes 1 job, sometimes 10 jobs, it depends. I also want to pass some environment variables to these jobs.

Also, I run my Kubernetes cluster on GCP (Autopilot), so I don't want any pod running all the time. For that reason an EventListener in Tekton isn't a good option, because that Kubernetes Service runs the receiver logic in a dedicated pod.

What is the most appropriate way to do this? Probably I should use the K8s API, but are there other options?
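
A minimal sketch of the API route (the namespace is a placeholder, and it assumes the cronjob pod runs with a ServiceAccount whose RBAC allows creating Jobs): from inside the cluster the application can POST a batch/v1 Job manifest straight to the API server, so nothing has to run between cronjob executions.

TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
curl -sS --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "Content-Type: application/json" \
  -X POST "https://kubernetes.default.svc/apis/batch/v1/namespaces/default/jobs" \
  -d @job.json   # job.json: a batch/v1 Job whose container env carries the values read from the database

Any Kubernetes client library (or kubectl from inside the pod) wraps this same call.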

google-cloud-platform kubernetes google-kubernetes-engine job-scheduler
  • 1 answer
  • 356 Views
Melchy
Asked: 2021-10-17 01:52:59 +0800 CST

GKE metrics agent logs many errors

  • 2

We created a GKE cluster and we are getting errors from gke-metrics-agent. The errors appear roughly every 30 minutes. Always the same 62 errors.

All the errors carry the label k8s-pod/k8s-app: "gke-metrics-agent".

The first error is:

error   exporterhelper/queued_retry.go:245  Exporting failed. Try enabling retry_on_failure config option.  {"kind": "exporter", "name": "googlecloud", "error": "rpc error: code = DeadlineExceeded desc = Deadline expired before operation could complete."  

This error is followed by a sequence of errors like these:

  • “go.opentelemetry.io/collector/exporter/exporterhelper.(*retrySender).send”
  • “/go/src/gke-logmon/gke-metrics-agent/vendor/go.opentelemetry.io/collector/exporter/exporterhelper/queued_retry.go:245”
  • go.opentelemetry.io/collector/exporter/exporterhelper.(*metricsSenderWithObservability).send
  • /go/src/gke-logmon/gke-metrics-agent/vendor/go.opentelemetry.io/collector/exporter/exporterhelper/metrics.go:120

There are roughly 40 errors like that. The two that stand out are:

- error exporterhelper/queued_retry.go:175  Exporting failed. Dropping data. Try enabling sending_queue to survive temporary failures.  {"kind": "exporter", "name": "googlecloud", "dropped_items": 19}"

- warn  batchprocessor/batch_processor.go:184   Sender failed   {"kind": "processor", "name": "batch", "error": "rpc error: code = DeadlineExceeded desc = Deadline expired before operation could complete."}"

I tried to google these errors but couldn't find anything. I can't even find any documentation for gke-metrics-agent.

Things I have tried:

  • Checking quotas
  • Updating GKE to a newer version (the current version is 1.21.3-gke.2001)
  • Updating the nodes
  • Disabling all firewall rules
  • Granting all permissions to the k8s nodes

I can provide more information about our Kubernetes cluster, but I don't know which information might be relevant for solving this.
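
To get more detail than the exported error labels, the agent's own pods can be inspected directly; it runs in kube-system, and the label below is the one from the error entries:

kubectl -n kube-system get pods -l k8s-app=gke-metrics-agent -o wide
kubectl -n kube-system logs -l k8s-app=gke-metrics-agent --tail=200

DeadlineExceeded from the googlecloud exporter points at the agent's writes to the Monitoring API timing out rather than at anything in your workloads, so the nodes' outbound path to monitoring.googleapis.com is one thing worth checking.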

firewall google-cloud-platform metrics cloud google-kubernetes-engine
  • 2 answers
  • 1529 Views
montss
Asked: 2021-10-15 03:53:33 +0800 CST

Large Java heap in a container environment

  • 0

I'm trying to run a Jetty web server on Kubernetes that needs a large heap: ~250 GB in our production environment and ~50 GB in our test environment.

I'm using jetty:9.4-jdk11 and trying to avoid setting the Xms or Xmx flags explicitly, since the values differ between environments and I thought it would be better to rely on -XX:MaxRAMPercentage -XX:InitialRAMPercentage, but no matter what I try I can't get MaxHeapSize past 32178700288 (~30 GB).

The node runs only the Java application and a few small sidecars, and has 64 GB of memory.

Dockerfile

FROM jetty:9.4-jdk11

ENV APP_WAR root.war
ENV APP_EXPLODED_WAR root/
ENV APP_DESTINATION_PATH $JETTY_BASE/webapps/
ENV APP_DESTINATION_WAR $APP_DESTINATION_PATH$APP_WAR
ENV APP_DESTINATION_EXPLODED_WAR $APP_DESTINATION_PATH$APP_EXPLODED_WAR

ADD . $APP_DESTINATION_EXPLODED_WAR

ENV JAVA_OPTIONS -XX:+PrintFlagsFinal -XX:MaxRAMPercentage=90 -XX:InitialRAMPercentage=90 -XX:-OmitStackTraceInFastThrow -XX:+UseStringDeduplication -Xlog:gc*,stringdedup*=debug:file=/tmp/gc.log:time

Container resource settings

resources:
  limits:
    cpu: "8"
    memory: 60G
  requests:
    cpu: "6"
    memory: 60G

Based on these values I should get 90% of 60 GB, i.e. a MaxHeapSize of ~54 GB, not 30 GB. Any idea what I'm missing?
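
One way to narrow it down is to print the flags the JVM actually resolves inside the running container rather than inferring them from the Deployment spec (the pod name is a placeholder):

kubectl exec -it POD_NAME -- java -XX:MaxRAMPercentage=90 -XX:+PrintFlagsFinal -version \
  | grep -E 'MaxHeapSize|MaxRAM |UseCompressedOops'

If MaxHeapSize stops just short of 32 GB while UseCompressedOops is true, one candidate explanation (an assumption, not confirmed here) is that heap ergonomics keeps an unset heap below the compressed-oops limit; explicitly setting -Xmx, or adding -XX:-UseCompressedOops, is the usual way to test that theory.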

java jetty kubernetes google-kubernetes-engine containers
  • 1 answer
  • 143 Views
user16768564
Asked: 2021-09-09 02:55:20 +0800 CST

Container logs missing from the Cloud Logging UI

  • 0

I'm setting up our Kubernetes infrastructure. Our GKE cluster is up and running. I've successfully deployed a test service, which is reachable and behaves correctly.

The test service logs a message on startup and every time it receives a request, but those messages don't show up in the Cloud Logging UI.

I know it's not the container image, because it works perfectly when I run it locally. Something must be preventing the container logs from reaching Cloud Logging.

The cluster's Cloud Logging option is set to System, Workloads, and the node pool's service account has the logging.logWriter role.

At this point I have no idea what the problem could be. I found a few older posts describing similar issues, but they mostly concerned the image itself or migration from legacy Stackdriver, neither of which applies here.
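
A first-pass checklist sketch (cluster name and zone are placeholders, and the DaemonSet label is an assumption for recent GKE versions): confirm that workload logging really is enabled on the cluster and that the logging agent pods are healthy on every node.

gcloud container clusters describe CLUSTER_NAME --zone ZONE \
  --format="value(loggingConfig.componentConfig.enableComponents)"

kubectl -n kube-system get pods -l k8s-app=fluentbit-gke   # GKE's logging agent DaemonSet (assumed label)

If WORKLOADS is missing from the first command's output, or the agent pods are not Running, container stdout/stderr never reaches Cloud Logging even though the service account role is correct.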

logging google-cloud-platform google-compute-engine google-kubernetes-engine
  • 1 answer
  • 492 Views
