This article is based on Kubernetes 1.21.9 and the Linux distribution CentOS 7.4.
Server OS version | Docker version | Kubernetes (k8s) cluster version | CPU architecture |
---|---|---|---|
CentOS Linux release 7.4.1708 (Core) | Docker version 20.10.12 | v1.21.9 | x86_64 |
Kubernetes cluster architecture: k8scloude1 is the master node; k8scloude2 and k8scloude3 are worker nodes.
Server | OS version | CPU architecture | Processes | Role |
---|---|---|---|---|
k8scloude1/192.168.110.130 | CentOS Linux release 7.4.1708 (Core) | x86_64 | docker,kube-apiserver,etcd,kube-scheduler,kube-controller-manager,kubelet,kube-proxy,coredns,calico | k8s master node |
k8scloude2/192.168.110.129 | CentOS Linux release 7.4.1708 (Core) | x86_64 | docker,kubelet,kube-proxy,calico | k8s worker node |
k8scloude3/192.168.110.128 | CentOS Linux release 7.4.1708 (Core) | x86_64 | docker,kubelet,kube-proxy,calico | k8s worker node |
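In Kubernetes, keeping applications highly available and stable is essential. To that end, Kubernetes provides mechanisms that monitor container state and react automatically, either by restarting unhealthy containers or by no longer sending traffic to them. Two of these mechanisms are the liveness probe and the readiness probe.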
This article introduces liveness probes and readiness probes in Kubernetes and walks through examples that show how to use them.
Using liveness and readiness probes requires a working Kubernetes cluster. For installing and deploying a Kubernetes (k8s) cluster, see the blog post "Installing and Deploying a Kubernetes (k8s) Cluster on CentOS 7": https://www.cnblogs.com/renshengdezheli/p/16686769.html.
Kubernetes supports three kinds of health checks: livenessProbe, readinessProbe and startupProbe. These probes periodically check whether the service inside a container is healthy.
This article focuses on the liveness probe and the readiness probe.
Create a directory for the YAML files and a namespace, then switch to that namespace:
```
[root@k8scloude1 ~]# mkdir probe
[root@k8scloude1 ~]# kubectl create ns probe
namespace/probe created
[root@k8scloude1 ~]# kubens probe
Context "kubernetes-admin@kubernetes" modified.
Active namespace is "probe".
```
There are no pods yet:
```
[root@k8scloude1 ~]# cd probe/
[root@k8scloude1 probe]# pwd
/root/probe
[root@k8scloude1 probe]# kubectl get pod
No resources found in probe namespace.
```
First, create an ordinary pod named liveness-exec from the busybox image. Its container runs the command given in args: `touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 6000`.
```
[root@k8scloude1 probe]# vim pod.yaml
[root@k8scloude1 probe]# cat pod.yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-exec
spec:
  # terminationGracePeriodSeconds: 0 means the container is shut down immediately when it
  # is terminated, without waiting for a grace period to finish outstanding work
  terminationGracePeriodSeconds: 0
  containers:
  - name: liveness
    image: busybox
    imagePullPolicy: IfNotPresent
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 6000
[root@k8scloude1 probe]# kubectl apply -f pod.yaml    # create a plain pod first (no probe)
pod/liveness-exec created
```
Check the pod:
```
[root@k8scloude1 probe]# kubectl get pod -o wide
NAME            READY   STATUS    RESTARTS   AGE   IP               NODE         NOMINATED NODE   READINESS GATES
liveness-exec   1/1     Running   0          6s    10.244.112.176   k8scloude2   <none>           <none>
```
List the files in the pod's /tmp directory:
```
[root@k8scloude1 probe]# kubectl exec -it liveness-exec -- ls /tmp
```
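After the pod has been running for 30 seconds, /tmp/healthy is deleted and the container keeps running for another 6000 seconds. We treat the pod as healthy while /tmp/healthy exists and unhealthy once it is gone, but because no probe is configured yet, Kubernetes does not notice anything and the pod stays Running.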
```
[root@k8scloude1 probe]# kubectl exec -it liveness-exec -- ls /tmp
[root@k8scloude1 probe]# kubectl get pod -o wide
NAME            READY   STATUS    RESTARTS   AGE     IP               NODE         NOMINATED NODE   READINESS GATES
liveness-exec   1/1     Running   0          3m29s   10.244.112.176   k8scloude2   <none>           <none>
```
Delete the pod, then add a probe:
```
[root@k8scloude1 probe]# kubectl delete -f pod.yaml
pod "liveness-exec" deleted
[root@k8scloude1 probe]# kubectl get pod -o wide
No resources found in probe namespace.
```
Create a pod with a liveness probe
This creates a Pod named liveness-exec from the busybox image. Its container runs the command given in args: `touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600`.
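The Pod also defines a livenessProbe field. This probe uses exec to run `cat /tmp/healthy`. While the file exists, cat exits with status 0 and Kubernetes considers the container healthy. Once the file is gone, cat fails. After failureThreshold consecutive failures (3 by default, since the manifest does not set it), the kubelet restarts the container.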
The liveness probe starts 5 seconds after the container starts and then runs every 5 seconds.
```
[root@k8scloude1 probe]# vim podprobe.yaml    # now add a health check: the command (exec) method
[root@k8scloude1 probe]# cat podprobe.yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-exec
spec:
  terminationGracePeriodSeconds: 0
  containers:
  - name: liveness
    image: busybox
    imagePullPolicy: IfNotPresent
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      # do not probe during the first 5 seconds after the container starts
      initialDelaySeconds: 5
      # probe every 5 seconds
      periodSeconds: 5
[root@k8scloude1 probe]# kubectl apply -f podprobe.yaml
pod/liveness-exec created
```
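An exec probe is judged by the command's exit status alone. Status 0 means success and anything else means failure, and the output does not affect the result. `cat /tmp/healthy` therefore passes while the file exists and fails after the `rm`.

The minimal sh sketch below imitates that rule on the local machine. It is not kubelet code. `exec_probe` is a hypothetical helper name, and a temporary file stands in for the container's /tmp/healthy.

```sh
# Hypothetical helper: run a command the way an exec probe does and return
# its exit status (0 = success, non-zero = failure), discarding any output.
exec_probe() {
  "$@" >/dev/null 2>&1
}

# Local demo: a temp file plays the role of /tmp/healthy.
health_file=$(mktemp)
if exec_probe cat "$health_file"; then echo "probe: success"; fi
rm -f "$health_file"          # same effect as the container's rm -rf /tmp/healthy
if ! exec_probe cat "$health_file"; then echo "probe: failure"; fi
```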
Watch the pod's /tmp directory and the pod status:
```
[root@k8scloude1 probe]# kubectl exec -it liveness-exec -- ls /tmp
healthy
[root@k8scloude1 probe]# kubectl get pod -o wide
NAME            READY   STATUS    RESTARTS   AGE   IP               NODE         NOMINATED NODE   READINESS GATES
liveness-exec   1/1     Running   0          18s   10.244.112.177   k8scloude2   <none>           <none>
[root@k8scloude1 probe]# kubectl exec -it liveness-exec -- ls /tmp
healthy
[root@k8scloude1 probe]# kubectl exec -it liveness-exec -- ls /tmp
[root@k8scloude1 probe]# kubectl get pod -o wide
NAME            READY   STATUS    RESTARTS   AGE   IP               NODE         NOMINATED NODE   READINESS GATES
liveness-exec   1/1     Running   0          36s   10.244.112.177   k8scloude2   <none>           <none>
[root@k8scloude1 probe]# kubectl get pod -o wide
NAME            READY   STATUS    RESTARTS   AGE   IP               NODE         NOMINATED NODE   READINESS GATES
liveness-exec   1/1     Running   0          43s   10.244.112.177   k8scloude2   <none>           <none>
[root@k8scloude1 probe]# kubectl get pod -o wide
NAME            READY   STATUS    RESTARTS   AGE   IP               NODE         NOMINATED NODE   READINESS GATES
liveness-exec   1/1     Running   1          50s   10.244.112.177   k8scloude2   <none>           <none>
```
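With the probe in place, the liveness probe fails once /tmp/healthy no longer exists, and the kubelet restarts the container. The Pod object itself is kept; the RESTARTS column counts container restarts. Without `terminationGracePeriodSeconds: 0`, the first restart usually happens at around 75 seconds. This is because the default grace period is 30 seconds, and the kubelet may wait that long for the container to stop before killing it.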
```
[root@k8scloude1 probe]# kubectl get pod -o wide
NAME            READY   STATUS    RESTARTS   AGE     IP               NODE         NOMINATED NODE   READINESS GATES
liveness-exec   1/1     Running   3          2m58s   10.244.112.177   k8scloude2   <none>           <none>
```
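The restart time can be estimated with a little arithmetic, assuming probes fire exactly at initialDelaySeconds + k × periodSeconds:

- The first failing probe is the first one scheduled after the file disappears (35 s here).
- The restart is triggered failureThreshold - 1 periods later (45 s). This fits RESTARTS changing from 0 to 1 between the 43 s and 50 s checks above.
- Without `terminationGracePeriodSeconds: 0`, the kubelet may also wait up to the 30 s default grace period, which fits the roughly 75 s mentioned above.

`estimate_restart_time` below is a hypothetical helper that encodes this estimate. Real kubelet timing has some jitter, so treat its result as approximate.

```sh
# Hypothetical helper: estimated seconds after container start at which a liveness
# restart happens. Assumes probes fire exactly at delay + k*period, the check starts
# failing at fail_start and never recovers, and the container uses the full grace period.
estimate_restart_time() {
  local delay=$1 period=$2 threshold=$3 fail_start=$4 grace=${5:-0}
  local k=0 first_fail
  # index of the first probe scheduled strictly after the failure begins
  if [ "$fail_start" -ge "$delay" ]; then
    k=$(( (fail_start - delay) / period + 1 ))
  fi
  first_fail=$(( delay + k * period ))
  # failureThreshold consecutive failures, then up to `grace` seconds to stop the container
  echo $(( first_fail + (threshold - 1) * period + grace ))
}

estimate_restart_time 5 5 3 30       # prints 45 (the exec example, grace period 0)
estimate_restart_time 5 5 3 30 30    # prints 75 (same, with the default 30 s grace period)
```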
Delete the pod:
```
[root@k8scloude1 probe]# kubectl delete -f podprobe.yaml
pod "liveness-exec" deleted
[root@k8scloude1 probe]# kubectl get pod -o wide
No resources found in probe namespace.
```
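Next, create a Pod named liveness-httpget from the nginx image. Its container has an HTTP GET liveness probe that requests nginx's default page /index.html on port 80. A response status from 200 up to 399 counts as success. Any other status, or no response within the timeout, counts as a failure. After enough consecutive failures, Kubernetes considers the container unhealthy and restarts it.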
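The liveness probe starts 10 seconds after the container starts and then runs every 10 seconds. The remaining fields work as follows:

- `failureThreshold: 3` restarts the container after 3 consecutive failures.
- `successThreshold: 1` means a single success marks the container healthy again. For liveness probes this value must be 1.
- `timeoutSeconds: 10` sets the timeout of each probe request to 10 seconds.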
```
[root@k8scloude1 probe]# vim podprobehttpget.yaml    # the httpGet method
[root@k8scloude1 probe]# cat podprobehttpget.yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-httpget
spec:
  terminationGracePeriodSeconds: 0
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /index.html
        port: 80
        scheme: HTTP
      # do not probe during the first 10 seconds after the container starts
      initialDelaySeconds: 10
      # probe every 10 seconds
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 10
[root@k8scloude1 probe]# kubectl apply -f podprobehttpget.yaml
pod/liveness-httpget created
[root@k8scloude1 probe]# kubectl get pod -o wide
NAME               READY   STATUS    RESTARTS   AGE   IP               NODE         NOMINATED NODE   READINESS GATES
liveness-httpget   1/1     Running   0          6s    10.244.112.178   k8scloude2   <none>           <none>
```
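An httpGet probe counts any HTTP status from 200 up to, but not including, 400 as success. Once index.html is deleted, nginx answers the request for /index.html with 404, so the probe fails.

The sh sketch below encodes that rule. `http_status_ok` and `http_probe` are hypothetical helpers. `http_probe` simply fetches a URL with curl and a 10 s timeout, roughly like the probe configured above. It does not reproduce every kubelet behavior, such as redirect handling or the headers the kubelet sends.

```sh
# Hypothetical helper: succeed for HTTP status codes 200-399, as an httpGet probe does.
http_status_ok() {
  case $1 in
    ''|*[!0-9]*) return 1 ;;   # not a number (e.g. empty output): failure
  esac
  [ "$1" -ge 200 ] && [ "$1" -lt 400 ]
}

# Hypothetical manual check: fetch the URL with a 10 s timeout and classify the status.
# curl prints 000 when it cannot connect, which also counts as a failure.
http_probe() {
  local code
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1")
  http_status_ok "$code"
}
```

For example, from a node that can reach the Pod IP, `http_probe http://<pod-ip>/index.html && echo healthy || echo unhealthy` should report unhealthy while index.html is missing.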
Check /usr/share/nginx/html/index.html:
```
[root@k8scloude1 probe]# kubectl exec -it liveness-httpget -- ls /usr/share/nginx/html/index.html
/usr/share/nginx/html/index.html
[root@k8scloude1 probe]# kubectl get pod -o wide
NAME               READY   STATUS    RESTARTS   AGE    IP               NODE         NOMINATED NODE   READINESS GATES
liveness-httpget   1/1     Running   0          2m3s   10.244.112.178   k8scloude2   <none>           <none>
```
Delete /usr/share/nginx/html/index.html:
```
[root@k8scloude1 probe]# kubectl exec -it liveness-httpget -- rm /usr/share/nginx/html/index.html
[root@k8scloude1 probe]# kubectl exec -it liveness-httpget -- ls /usr/share/nginx/html/index.html
ls: cannot access '/usr/share/nginx/html/index.html': No such file or directory
command terminated with exit code 2
```
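Watch the pod status and the index.html file. The probe requests /index.html on port 80. With the file gone, the request fails, so after 3 consecutive failures the kubelet restarts the container. The restarted container is created again from the image, so index.html exists again.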
```
[root@k8scloude1 probe]# kubectl get pod -o wide
NAME               READY   STATUS    RESTARTS   AGE     IP               NODE         NOMINATED NODE   READINESS GATES
liveness-httpget   1/1     Running   1          2m43s   10.244.112.178   k8scloude2   <none>           <none>
[root@k8scloude1 probe]# kubectl get pod -o wide
NAME               READY   STATUS    RESTARTS   AGE     IP               NODE         NOMINATED NODE   READINESS GATES
liveness-httpget   1/1     Running   1          2m46s   10.244.112.178   k8scloude2   <none>           <none>
[root@k8scloude1 probe]# kubectl exec -it liveness-httpget -- ls /usr/share/nginx/html/index.html
/usr/share/nginx/html/index.html
[root@k8scloude1 probe]# kubectl exec -it liveness-httpget -- ls /usr/share/nginx/html/index.html
/usr/share/nginx/html/index.html
```
Delete the pod:
```
[root@k8scloude1 probe]# kubectl delete -f podprobehttpget.yaml
pod "liveness-httpget" deleted
[root@k8scloude1 probe]# kubectl get pod -o wide
No resources found in probe namespace.
```
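Now create a Pod named liveness-tcpsocket from the nginx image. Its container has a TCP socket liveness probe that checks whether a connection to port 8080 can be opened. If the connection cannot be opened, Kubernetes considers the container unhealthy and restarts it.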
As in the previous example, the probe starts 10 seconds after the container starts and runs every 10 seconds, with failureThreshold 3, successThreshold 1 and a 10-second timeoutSeconds.
```
[root@k8scloude1 probe]# vim podprobetcpsocket.yaml    # the tcpSocket method
[root@k8scloude1 probe]# cat podprobetcpsocket.yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-tcpsocket
spec:
  terminationGracePeriodSeconds: 0
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 3
      tcpSocket:
        port: 8080
      # do not probe during the first 10 seconds after the container starts
      initialDelaySeconds: 10
      # probe every 10 seconds
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 10
[root@k8scloude1 probe]# kubectl apply -f podprobetcpsocket.yaml
pod/liveness-tcpsocket created
```
Watch the pod status. nginx listens on port 80, but the probe checks port 8080, so every probe fails and the liveness probe keeps restarting the container.
```
[root@k8scloude1 probe]# kubectl get pod -o wide
NAME                 READY   STATUS    RESTARTS   AGE   IP               NODE         NOMINATED NODE   READINESS GATES
liveness-tcpsocket   1/1     Running   0          10s   10.244.112.179   k8scloude2   <none>           <none>
[root@k8scloude1 probe]# kubectl get pod -o wide
NAME                 READY   STATUS    RESTARTS   AGE   IP               NODE         NOMINATED NODE   READINESS GATES
liveness-tcpsocket   1/1     Running   1          55s   10.244.112.179   k8scloude2   <none>           <none>
```
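A tcpSocket probe only checks whether a TCP connection to the port can be opened. Nothing is sent over the connection.

The sh sketch below does the same check with bash's /dev/tcp redirection and the coreutils `timeout` command. `tcp_probe` is a hypothetical helper, and it assumes bash is installed on the machine running it.

```sh
# Hypothetical helper: succeed if a TCP connection to host:port opens within `secs`
# seconds (default 10, like timeoutSeconds above). bash opens the socket via /dev/tcp.
tcp_probe() {
  local host=$1 port=$2 secs=${3:-10}
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' tcp_probe "$host" "$port" >/dev/null 2>&1
}
```

Against this pod, `tcp_probe <pod-ip> 80` should succeed while `tcp_probe <pod-ip> 8080` should fail, since nginx listens only on port 80.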
Delete the pod:
```
[root@k8scloude1 probe]# kubectl delete -f podprobetcpsocket.yaml
pod "liveness-tcpsocket" deleted
```
Next, add a readiness probe.
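A failing readiness probe does not restart the container. Instead, the pod is marked not ready and user requests are no longer forwarded to it. To demonstrate this, create three pods and a Service (svc) that forwards user requests to all three.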
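Conceptually, a Service keeps only the pods whose readiness probe currently passes in its endpoint list. The sh function below is a toy model of that filtering, not kube-proxy or endpoints-controller code. `ready_endpoints` and its `name=true|false` input format are made up for illustration.

```sh
# Toy model (hypothetical helper): print the names of the pods that are ready,
# i.e. the pods a Service would still route traffic to.
ready_endpoints() {
  local out="" entry name ready
  for entry in "$@"; do
    name=${entry%%=*}     # text before the first '='
    ready=${entry#*=}     # text after the first '='
    if [ "$ready" = "true" ]; then
      out="${out:+$out }$name"
    fi
  done
  printf '%s\n' "$out"
}

ready_endpoints readiness-exec=true readiness-exec2=false readiness-exec3=true
# prints: readiness-exec readiness-exec3
```

On the cluster itself, the real list can be inspected with `kubectl get endpoints svc1` once the Service exists.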
Tip: to check whether text (for example YAML indentation) is aligned in vim, use `:set cuc` (cursorcolumn); turn it off with `:set nocuc`.
Create the pod. The readinessProbe checks /tmp/healthy: the pod is ready while the file exists and not ready once it is gone. The lifecycle postStart hook creates /tmp/healthy right after the container starts.
```
[root@k8scloude1 probe]# vim podreadinessprobecommand.yaml
[root@k8scloude1 probe]# cat podreadinessprobecommand.yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: readiness
  name: readiness-exec
spec:
  terminationGracePeriodSeconds: 0
  containers:
  - name: readiness
    image: nginx
    imagePullPolicy: IfNotPresent
    readinessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      # do not probe during the first 5 seconds after the container starts
      initialDelaySeconds: 5
      # probe every 5 seconds
      periodSeconds: 5
    lifecycle:
      postStart:
        exec:
          command: ["/bin/sh","-c","touch /tmp/healthy"]
```
Create three pods with different names:
```
[root@k8scloude1 probe]# kubectl apply -f podreadinessprobecommand.yaml
pod/readiness-exec created
[root@k8scloude1 probe]# sed 's/readiness-exec/readiness-exec2/' podreadinessprobecommand.yaml | kubectl apply -f -
pod/readiness-exec2 created
[root@k8scloude1 probe]# sed 's/readiness-exec/readiness-exec3/' podreadinessprobecommand.yaml | kubectl apply -f -
pod/readiness-exec3 created
```

Check the pods' labels. At this point readiness-exec3 still shows 0/1 because its readiness probe has not succeeded yet:

```
[root@k8scloude1 probe]# kubectl get pod -o wide --show-labels
NAME              READY   STATUS    RESTARTS   AGE   IP               NODE         NOMINATED NODE   READINESS GATES   LABELS
readiness-exec    1/1     Running   0          23s   10.244.112.182   k8scloude2   <none>           <none>            test=readiness
readiness-exec2   1/1     Running   0          15s   10.244.251.236   k8scloude3   <none>           <none>            test=readiness
readiness-exec3   0/1     Running   0          9s    10.244.112.183   k8scloude2   <none>           <none>            test=readiness
```
All three pods carry the same label:
```
[root@k8scloude1 probe]# kubectl get pod -o wide --show-labels
NAME              READY   STATUS    RESTARTS   AGE   IP               NODE         NOMINATED NODE   READINESS GATES   LABELS
readiness-exec    1/1     Running   0          26s   10.244.112.182   k8scloude2   <none>           <none>            test=readiness
readiness-exec2   1/1     Running   0          18s   10.244.251.236   k8scloude3   <none>           <none>            test=readiness
readiness-exec3   1/1     Running   0          12s   10.244.112.183   k8scloude2   <none>           <none>            test=readiness
```
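The two `sed ... | kubectl apply -f -` commands used to clone the manifest can also be written as a loop. The sh sketch below wraps the substitution in a hypothetical helper, `render_pod_copy`.

Note that `s/old/new/` replaces only the first match on each line. That works here because the old name appears only in `metadata.name`.

```sh
# Hypothetical helper: print a copy of a Pod manifest with old_name replaced by new_name
# (first match per line only, as in the sed commands above).
render_pod_copy() {
  local manifest=$1 old_name=$2 new_name=$3
  sed "s/${old_name}/${new_name}/" "$manifest"
}
```

For example: `for n in readiness-exec2 readiness-exec3; do render_pod_copy podreadinessprobecommand.yaml readiness-exec "$n" | kubectl apply -f -; done`.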
To tell the three pods apart, change each pod's nginx index file:
```
[root@k8scloude1 probe]# kubectl exec -it readiness-exec -- sh -c "echo 111 > /usr/share/nginx/html/index.html"
[root@k8scloude1 probe]# kubectl exec -it readiness-exec2 -- sh -c "echo 222 > /usr/share/nginx/html/index.html"
[root@k8scloude1 probe]# kubectl exec -it readiness-exec3 -- sh -c "echo 333 > /usr/share/nginx/html/index.html"
```
Create a Service that forwards user requests to these three pods:
```
[root@k8scloude1 probe]# kubectl expose --name=svc1 pod readiness-exec --port=80
service/svc1 exposed
```
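`kubectl expose` is an imperative shortcut. A roughly equivalent declarative Service manifest can be generated with the hypothetical sh helper below, assuming the same selector (test=readiness) and port 80 as above.

```sh
# Hypothetical helper: print a ClusterIP Service manifest that selects pods by one label.
service_manifest() {
  local name=$1 key=$2 value=$3 port=$4
  cat <<EOF
apiVersion: v1
kind: Service
metadata:
  name: ${name}
spec:
  type: ClusterIP
  selector:
    ${key}: ${value}
  ports:
  - protocol: TCP
    port: ${port}
    targetPort: ${port}
EOF
}

service_manifest svc1 test readiness 80
```

It could be applied with `service_manifest svc1 test readiness 80 | kubectl apply -f -`. Keeping such a manifest in a file makes the selector explicit and reviewable, instead of implicitly copying whatever labels the exposed pod happens to carry.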
`kubectl expose` copied the pod's label into the Service selector (test=readiness), and three pods carry that label:
```
[root@k8scloude1 probe]# kubectl get svc -o wide
NAME   TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE   SELECTOR
svc1   ClusterIP   10.101.38.121   <none>        80/TCP    23s   test=readiness
[root@k8scloude1 probe]# kubectl get pod --show-labels
NAME              READY   STATUS    RESTARTS   AGE     LABELS
readiness-exec    1/1     Running   0          7m14s   test=readiness
readiness-exec2   1/1     Running   0          7m6s    test=readiness
readiness-exec3   1/1     Running   0          7m      test=readiness
```
Access the Service. User requests are distributed across all three pods:
```
[root@k8scloude1 probe]# while true ; do curl -s 10.101.38.121 ; sleep 1 ; done
333
111
333
222
111
......
```
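Rather than reading an endless loop by eye, the responses can be counted. The sh helper below, `tallyresponses` renamed here as `tally_responses` (a hypothetical name), counts identical lines on stdin. The demo feeds it the first five responses from the transcript above.

```sh
# Hypothetical helper: count identical response lines and print "response: count".
tally_responses() {
  sort | uniq -c | awk '{ print $2 ": " $1 }'
}

# Against the cluster you could pipe in, for example:
#   for i in $(seq 1 30); do curl -s 10.101.38.121; done | tally_responses
printf '333\n111\n333\n222\n111\n' | tally_responses
# prints 111: 2, 222: 1 and 333: 2, one per line
```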
Delete the probe file in pod readiness-exec2:
```
[root@k8scloude1 probe]# kubectl exec -it readiness-exec2 -- rm /tmp/healthy
```
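Because /tmp/healthy can no longer be found, the readiness probe of readiness-exec2 fails. Its READY column changes to 0/1, but STATUS stays Running and you can still open a shell in the pod. A readiness probe only stops user requests from being forwarded to the unhealthy pod, so the pod is neither restarted nor deleted.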
```
[root@k8scloude1 probe]# kubectl get pod --show-labels
NAME              READY   STATUS    RESTARTS   AGE   LABELS
readiness-exec    1/1     Running   0          10m   test=readiness
readiness-exec2   0/1     Running   0          10m   test=readiness
readiness-exec3   1/1     Running   0          10m   test=readiness
[root@k8scloude1 probe]# kubectl exec -it readiness-exec2 -- bash
root@readiness-exec2:/# exit
exit
```
Running `kubectl get ev` (list events) shows the warning "88s Warning Unhealthy pod/readiness-exec2 Readiness probe failed: cat: /tmp/healthy: No such file or directory":
```
[root@k8scloude1 probe]# kubectl get ev
LAST SEEN   TYPE      REASON      OBJECT                MESSAGE
......
32m         Normal    Pulled      pod/readiness-exec2   Container image "nginx" already present on machine
32m         Normal    Created     pod/readiness-exec2   Created container readiness
32m         Normal    Started     pod/readiness-exec2   Started container readiness
15m         Normal    Killing     pod/readiness-exec2   Stopping container readiness
13m         Normal    Scheduled   pod/readiness-exec2   Successfully assigned probe/readiness-exec2 to k8scloude3
13m         Normal    Pulled      pod/readiness-exec2   Container image "nginx" already present on machine
13m         Normal    Created     pod/readiness-exec2   Created container readiness
13m         Normal    Started     pod/readiness-exec2   Started container readiness
88s         Warning   Unhealthy   pod/readiness-exec2   Readiness probe failed: cat: /tmp/healthy: No such file or directory
32m         Normal    Scheduled   pod/readiness-exec3   Successfully assigned probe/readiness-exec3 to k8scloude3
32m         Normal    Pulled      pod/readiness-exec3   Container image "nginx" already present on machine
32m         Normal    Created     pod/readiness-exec3   Created container readiness
32m         Normal    Started     pod/readiness-exec3   Started container readiness
15m         Normal    Killing     pod/readiness-exec3   Stopping container readiness
13m         Normal    Scheduled   pod/readiness-exec3   Successfully assigned probe/readiness-exec3 to k8scloude2
13m         Normal    Pulled      pod/readiness-exec3   Container image "nginx" already present on machine
13m         Normal    Created     pod/readiness-exec3   Created container readiness
13m         Normal    Started     pod/readiness-exec3   Started container readiness
```
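When there are many events, probe failures can be picked out of `kubectl get ev` output by the REASON column. In the data rows, that column is the third whitespace-separated field. `unhealthy_events` below is a hypothetical awk filter.

kubectl can also filter on the server side with `kubectl get events --field-selector reason=Unhealthy`.

```sh
# Hypothetical helper: keep only event rows whose REASON (3rd field) is Unhealthy.
unhealthy_events() {
  awk '$3 == "Unhealthy"'
}

# Demo with two rows from the transcript above; only the Unhealthy row is printed.
printf '%s\n' \
  '13m Normal Started pod/readiness-exec2 Started container readiness' \
  '88s Warning Unhealthy pod/readiness-exec2 Readiness probe failed: cat: /tmp/healthy: No such file or directory' |
  unhealthy_events
```

On the cluster: `kubectl get ev | unhealthy_events`.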
Access the Service again. Requests now go only to the pods that return 111 and 333, which shows the readiness probe is in effect.
```
[root@k8scloude1 probe]# while true ; do curl -s 10.101.38.121 ; sleep 1 ; done
111
333
333
333
111
......
```
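You should now understand how to use liveness probes and readiness probes to monitor the health of containers in Kubernetes. Probes periodically check command exit codes (exec), HTTP responses (httpGet) or TCP connectivity (tcpSocket). A liveness probe automatically restarts unhealthy containers, while a readiness probe keeps traffic away from pods that are not ready. Together they improve application availability and stability.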