我有两台coreos stable v1122.2.0机器,每台都配置了tls的etcd2。
我使用https://github.com/coreos/etcd/tree/master/hack/tls-setup创建了证书。
现在我正在尝试配置 calico-node 以使用 rkt 在我的 coreos 主节点上运行。
我在 cloud-config 配置中有以下内容:
write_files:
- path: "/etc/kubernetes/cni/net.d/10-calico.conf"
content: |
{
"name": "calico",
"type": "flannel",
"delegate": {
"type": "calico",
"etcd_endpoints": "https://10.79.218.2:2379,https://10.79.218.3:2379",
"log_level": "none",
"log_level_stderr": "info",
"hostname": "10.79.218.2",
"policy": {
"type": "k8s",
"k8s_api_root": "http://127.0.0.1:8080/api/v1/"
}
}
}
- path: "/etc/kubernetes/manifests/policy-controller.yaml"
content: |
apiVersion: v1
kind: Pod
metadata:
name: calico-policy-controller
namespace: calico-system
spec:
hostNetwork: true
containers:
# The Calico policy controller.
- name: k8s-policy-controller
image: calico/kube-policy-controller:v0.2.0
env:
- name: ETCD_ENDPOINTS
value: "https://10.79.218.2:2379,https://10.79.218.3:2379"
- name: K8S_API
value: "http://127.0.0.1:8080"
- name: LEADER_ELECTION
value: "true"
# Leader election container used by the policy controller.
- name: leader-elector
image: quay.io/calico/leader-elector:v0.1.0
imagePullPolicy: IfNotPresent
args:
- "--election=calico-policy-election"
- "--election-namespace=calico-system"
- "--http=127.0.0.1:4040"
...
units:
- name: calico-node.service
enable: true
command: start
content: |
[Unit]
Description=Calico per-host agent
Requires=network-online.target
After=network-online.target
[Service]
Slice=machine.slice
Environment=CALICO_DISABLE_FILE_LOGGING=true
Environment=HOSTNAME=10.79.218.2
Environment=IP=10.79.218.2
Environment=FELIX_FELIXHOSTNAME=10.79.218.2
Environment=CALICO_NETWORKING=false
Environment=NO_DEFAULT_POOLS=true
Environment=ETCD_ENDPOINTS=https://10.79.218.2:2379,https://10.79.218.3:2379
ExecStart=/usr/bin/rkt run --inherit-env --stage1-from-dir=stage1-fly.aci \
--volume=modules,kind=host,source=/lib/modules,readOnly=false \
--mount=volume=modules,target=/lib/modules \
--trust-keys-from-https quay.io/calico/node:v0.19.0
KillMode=mixed
Restart=always
TimeoutStartSec=0
[Install]
WantedBy=multi-user.target
请忽略空格缩进..我认为我没有正确复制/粘贴它:)
当我尝试启动 calico-node 服务时,出现以下错误:
Sep 14 05:45:17 localhost systemd[1]: Started Calico per-host agent.
Sep 14 05:45:17 localhost rkt[1644]: image: using image from file /usr/lib64/rkt/stage1-images/stage1-fly.aci
Sep 14 05:45:18 localhost rkt[1644]: image: using image from local store for image name quay.io/calico/node:v0.19.0
Sep 14 05:45:25 localhost rkt[1644]: Traceback (most recent call last):
Sep 14 05:45:25 localhost rkt[1644]: File "startup.py", line 292, in <module>
Sep 14 05:45:25 localhost rkt[1644]: client = IPAMClient()
Sep 14 05:45:25 localhost rkt[1644]: File "/usr/lib/python2.7/site-packages/pycalico/datastore.py", line 228, in __init__
Sep 14 05:45:25 localhost rkt[1644]: "%s" % (ETCD_CA_CERT_FILE_ENV, etcd_ca))
Sep 14 05:45:25 localhost rkt[1644]: pycalico.datastore_errors.DataStoreError: Invalid ETCD_CA_CERT_FILE. Certificate Authority cert is required and m
Sep 14 05:45:25 localhost rkt[1644]: Calico node failed to start
Sep 14 05:45:25 localhost systemd[1]: calico-node.service: Main process exited, code=exited, status=1/FAILURE
Sep 14 05:45:25 localhost systemd[1]: calico-node.service: Unit entered failed state.
Sep 14 05:45:25 localhost systemd[1]: calico-node.service: Failed with result 'exit-code'.
Sep 14 05:45:25 localhost systemd[1]: calico-node.service: Service hold-off time over, scheduling restart.
Sep 14 05:45:25 localhost systemd[1]: Stopped Calico per-host agent.
Sep 14 05:45:25 localhost systemd[1]: Started Calico per-host agent.
Sep 14 05:45:25 localhost rkt[1714]: image: using image from file /usr/lib64/rkt/stage1-images/stage1-fly.aci
Sep 14 05:45:26 localhost rkt[1714]: image: using image from local store for image name quay.io/calico/node:v0.19.0
Sep 14 05:45:28 localhost rkt[1714]: Traceback (most recent call last):
Sep 14 05:45:28 localhost rkt[1714]: File "startup.py", line 292, in <module>
Sep 14 05:45:28 localhost rkt[1714]: client = IPAMClient()
Sep 14 05:45:28 localhost rkt[1714]: File "/usr/lib/python2.7/site-packages/pycalico/datastore.py", line 228, in __init__
Sep 14 05:45:28 localhost rkt[1714]: "%s" % (ETCD_CA_CERT_FILE_ENV, etcd_ca))
Sep 14 05:45:28 localhost rkt[1714]: pycalico.datastore_errors.DataStoreError: Invalid ETCD_CA_CERT_FILE. Certificate Authority cert is required and m
第 2-25 行
所以我明白了Invalid ETCD_CA_CERT_FILE.
。我并没有真正向 calico 指定要使用的键..所以我想我缺少一些配置。
我在 /etc/ssl/etcd 有以下等相关的键
8 -rw-------. 1 etcd etcd 1050 Sep 14 05:45 ca.pem
8 -rw-------. 1 etcd etcd 289 Sep 14 05:45 etcd1-key.pem
8 -rw-------. 1 etcd etcd 1058 Sep 14 05:45 etcd1.pem
8 -rw-------. 1 etcd etcd 227 Sep 12 03:49 server1-key.pem
8 -rw-------. 1 etcd etcd 822 Sep 12 03:49 server1.pem
我尝试添加Environment=ETCD_CA_CERT_FILE=/etc/ssl/etcd/ca.pem
到 calico-node systemd 文件,但得到完全相同的结果。
有任何想法吗 ?
更新
所以我尝试手动运行 calico,而不是使用 systemd。我还添加了 calico 所需的所有环境变量
export CALICO_DISABLE_FILE_LOGGING=true
export HOSTNAME=10.79.218.2
export IP=10.79.218.2
export FELIX_FELIXHOSTNAME=10.79.218.2
export CALICO_NETWORKING=false
export NO_DEFAULT_POOLS=true
export ETCD_ENDPOINTS=https://10.79.218.2:2379,https://10.79.218.3:2379
export ETCD_AUTHORITY=10.79.218.2:2379
export ETCD_SCHEME=https
export ETCD_CA_CERT_FILE=/etc/ssl/etcd/ca.pem
export ETCD_CERT_FILE=/etc/ssl/etcd/etcd1.pem
export ETCD_KEY_FILE=/etc/ssl/etcd/etcd1-key.pem
当我尝试使用以下命令执行印花布容器时:
/usr/bin/rkt run --inherit-env --stage1-from-dir=stage1-fly.aci \
--volume=modules,kind=host,source=/lib/modules,readOnly=false \
--mount=volume=modules,target=/lib/modules \
--trust-keys-from-https quay.io/calico/node:v0.19.0
我明白了
image: using image from file /usr/lib64/rkt/stage1-images/stage1-fly.aci
image: using image from local store for image name quay.io/calico/node:v0.19.0
Traceback (most recent call last):
File "startup.py", line 292, in <module>
client = IPAMClient()
File "/usr/lib/python2.7/site-packages/pycalico/datastore.py", line 221, in __init__
ETCD_CERT_FILE_ENV, etcd_cert))
pycalico.datastore_errors.DataStoreError: Cannot read ETCD_KEY_FILE and/or ETCD_CERT_FILE. Both must be readable file paths. Values provided: ETCD_KEY_FILE=/etc/ssl/etcd/etcd1-key.pem, ETCD_CERT_FILE=/etc/ssl/etcd/etcd1.pem
我将证书文件的文件权限更改为 666,但这并不能解决问题。而且我知道这些证书是有效的,因为 etcd tls 可以正常工作。所以我错过了什么?
更新 2
看来我缺少将证书目录安装在印花布容器上。
所以现在我正在运行印花布容器
/usr/bin/rkt run --volume etcd-ssl,kind=host,source=/etc/ssl/etcd/,readOnly=true --inherit-env --stage1-from-dir=stage1-fly.aci --volume=modules,kind=host,source=/lib/modules,readOnly=false --mount=volume=modules,target=/lib/modules --trust-keys-from-https quay.io/calico/node:v0.19.0 --mount volume=etcd-ssl,target=/etc/ssl/etcd
我得到以下输出:
image: using image from file /usr/lib64/rkt/stage1-images/stage1-fly.aci
image: using image from local store for image name quay.io/calico/node:v0.19.0
Traceback (most recent call last):
File "startup.py", line 292, in <module>
client = IPAMClient()
File "/usr/lib/python2.7/site-packages/pycalico/datastore.py", line 246, in __init__
allow_reconnect=True)
File "/usr/lib/python2.7/site-packages/etcd/client.py", line 204, in __init__
set(self.machines))
File "/usr/lib/python2.7/site-packages/etcd/client.py", line 299, in machines
return self.machines
File "/usr/lib/python2.7/site-packages/etcd/client.py", line 301, in machines
raise etcd.EtcdException("Could not get the list of servers, "
etcd.EtcdException: Could not get the list of servers, maybe you provided the wrong host(s) to connect to?
Calico node failed to start
我有点接近..但仍然没有解决方案。
更新 3
我尝试通过运行将 ETCD_ENDPOINTS 设置为 coreos 机器上的 etcd 服务器export ETCD_ENDPOINTS=https://10.79.218.2:2379
,现在当我尝试运行 calico rkt 映像时,我得到:
image: using image from file /usr/lib64/rkt/stage1-images/stage1-fly.aci
image: using image from local store for image name quay.io/calico/node:v0.19.0
Traceback (most recent call last):
File "startup.py", line 295, in <module>
main()
File "startup.py", line 251, in main
warn_if_hostname_conflict(ip)
File "startup.py", line 192, in warn_if_hostname_conflict
current_ipv4, _ = client.get_host_bgp_ips(hostname)
File "/usr/lib/python2.7/site-packages/pycalico/datastore.py", line 132, in wrapped
"running?" % (fn.__name__, e.message))
pycalico.datastore_errors.DataStoreError: get_host_bgp_ips: Error accessing etcd (Connection to etcd failed due to SSLError(CertificateError("hostname '10.79.218.2' doesn't match u'etcd'",),)). Is etcd running?
Calico node failed to start
我也遇到了这个问题,最终通过查看 etcd 连接逻辑和使用的库的代码以及 Calico 团队在他们的 Slack 频道中的一些指针找到了问题的根源。
问题是因为 Calico 的当前版本(至少高达 0.22.0)使用 Python etcd 客户端,该客户端不支持 TLS 证书中的 IP SAN(Subject Alt Name)。这意味着您正在使用的证书无法正确地与配置它们的 etcd 服务器相关联。
这在此GitHub 问题中有所描述。
要解决此问题,您必须等到 urllib 库的新版本发布,它被 etcd 客户端获取,并发布新版本,然后 Calico 更新以使用新的 etcd 客户端。或者,您可以使用 FQDN 而不是 SAN 字段中的 IP 地址重新生成证书。这意味着您需要确保可以通过这些名称访问您的服务器,无论是使用 DNS 还是
/etc/hosts
正确设置。用于生成证书的 OpenSSL 配置应包含如下内容:描述您如何生成证书的链接使用CFSSL,因此我建议阅读其有关如何更改为使用主机名而不是 IP 地址的文档。我相信它可能就像修改JSON配置一样简单,如下所示:
我发现使用这个易碎的库我可以成功,如果: 客户端打开到 IP 地址的连接;服务器的证书在主题中声明该 IP 地址;并且服务器的证书在主题备用名称列表中没有任何 DNS 类型条目。
openssl x509 -text ...
以下是示例服务器证书的选择输出,当客户端使用 IP 地址打开连接10.10.10.1
以识别服务器时,该证书起作用:此外,还有较新版本的 Calico 图像。我只听说过两件关于
calico/node:v0.23.0
. 一个来自其他人 --- https://calicours.slack.com/archives/kubernetes/p1478206011002345。我自己对该图像进行了一些测试,只处理了一个问题,https://github.com/projectcalico/calico-containers/issues/1107。现在有 v1.0.0 测试版和 rc1,我还没有听说过关于它们的坏消息。