By default, which node a pod lands on is decided by the scheduler component using its own algorithms, and the process is not under manual control. In practice this does not cover every requirement: we often want certain pods to land on certain nodes, so Kubernetes provides four kinds of pod scheduling methods to solve this.
Directed scheduling is achieved by setting fields such as nodeName or nodeSelector in the pod definition, which pins the pod to a specified node.
nodeName forces the pod onto the named node. This approach skips the scheduler's logic entirely and binds the pod directly to the specified node; even if that node does not exist, the pod is still bound to it, it simply fails to run.
nodeName scheduling is demonstrated below:
[root@master ~]# vim pod-busybox.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-command
  labels:
    env: dev
  namespace: default
spec:
  nodeName: node2        # schedule the pod onto node2
  containers:
  - image: busybox
    name: busybox-container
    command: ["/bin/sh","-c","touch /tmp/hello.txt;while true;do /bin/echo $(date +%T) >> /tmp/hello.txt;sleep 3;done;"]
    resources:
      limits:
        cpu: 2
        memory: 2G
      requests:
        cpu: 1
        memory: 500M
[root@master ~]# kubectl get pod pod-command -o wide    # the pod has been scheduled onto node2
NAME          READY   STATUS    RESTARTS   AGE   IP            NODE    NOMINATED NODE   READINESS GATES
pod-command   1/1     Running   0          38s   10.244.2.76   node2   <none>           <none>
[root@master ~]#
nodeSelector schedules the pod onto nodes that carry the specified labels. It is built on the Kubernetes label selector mechanism: the scheduler uses the MatchNodeSelector strategy to match labels, finds the target nodes, and schedules the pod onto them. This rule is also a hard constraint: if no node carries a matching label, the pod cannot be scheduled and does not run.
nodeSelector scheduling is demonstrated below:
[root@master ~]# kubectl label node node2 env=test      # first label node2 with env=test
[root@master ~]# cat deplyment_nginx1.yaml              # create a deployment with 3 pod replicas
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment-nginx1
  labels:
    env: dev
    tiar: front
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      nodeSelector:          # node selector: schedule the pods onto nodes labeled env=test
        env: test
      containers:
      - image: nginx:1.7.9
        name: nginx-container
        ports:
        - name: http
          containerPort: 80
[root@master ~]#
[root@master ~]# kubectl get pod -l app=nginx -o wide   # all pods have been scheduled onto node2
NAME                                 READY   STATUS    RESTARTS   AGE     IP            NODE    NOMINATED NODE   READINESS GATES
deployment-nginx1-774c75c9bb-m4nfh   1/1     Running   0          7m25s   10.244.2.84   node2   <none>           <none>
deployment-nginx1-774c75c9bb-mgntc   1/1     Running   0          7m25s   10.244.2.85   node2   <none>           <none>
deployment-nginx1-774c75c9bb-x8q8s   1/1     Running   0          7m25s   10.244.2.83   node2   <none>           <none>
[root@master ~]#
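For reference, the label on node2 can be verified or removed with the standard kubectl label commands; these are not part of the original demo and their output is omitted:

[root@master ~]# kubectl get node node2 --show-labels   # confirm that the env=test label is present on node2
[root@master ~]# kubectl label node node2 env-          # remove the env label from node2 once the demo is done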
Both nodeName and nodeSelector described above are forms of directed scheduling, and both are mandatory: if no node matches, the pod simply does not run. That is clearly too rigid, so Kubernetes also provides affinity scheduling.
Affinity scheduling extends nodeSelector: through configuration, it prefers nodes that satisfy the given conditions, and if none do, the pod can still be scheduled onto a node that does not satisfy them, which makes scheduling more flexible.
Affinity comes in three main types: nodeAffinity (node affinity), podAffinity (pod affinity), and podAntiAffinity (pod anti-affinity).
nodeAffinity: node affinity scheduling
nodeAffinity can be either hard (required) or soft (preferred). First, look at the configurable fields of nodeAffinity:
kubectl explain pod.spec.affinity.nodeAffinity
  requiredDuringSchedulingIgnoredDuringExecution   # hard affinity: if no node satisfies the rules, scheduling of the pod fails
    nodeSelectorTerms        # list of node selector terms
      matchExpressions       # node selector requirements by node label (recommended)
        key                  # label key
        operator             # operator; supports In, NotIn, Exists, DoesNotExist, Gt, Lt
        values               # label values
      matchFields            # node selector requirements by node field
        key                  # field key
        operator             # operator; supports In, NotIn, Exists, DoesNotExist, Gt, Lt
        values               # field values
  preferredDuringSchedulingIgnoredDuringExecution  # soft affinity: prefer nodes that satisfy the rules; if none do, the pod is scheduled onto any available node
    weight                   # preference weight, 1-100; the weight given to this matching rule
    preference               # a node selector term, associated with the corresponding weight
      matchExpressions       # node selector requirements by node label (recommended)
        key                  # label key
        operator             # operator; supports In, NotIn, Exists, DoesNotExist, Gt, Lt
        values               # label values
      matchFields            # node selector requirements by node field
        key                  # field key
        operator             # operator; supports In, NotIn, Exists, DoesNotExist, Gt, Lt
        values               # field values
# More on the operators:
1. In            # the label's value must be in the given list; matching any one item counts as a match
2. NotIn         # the opposite of In: the label's value must not be in the given list
3. Exists        # the node only needs to carry the given label key; no values list is written, because Exists only tests the key
4. Gt            # short for "greater than": matches when the label value is greater than the given value
5. Lt            # short for "less than": matches when the label value is less than the given value
6. DoesNotExist  # matches nodes that do not carry the given label key
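As a quick illustration of the less common operators, the fragment below combines Exists and Gt in a hard node affinity rule. This is only a sketch: the label keys disktype and cpu-count are hypothetical and do not appear in the demos below.

# Hypothetical pod.spec fragment illustrating the Exists and Gt operators;
# the label keys "disktype" and "cpu-count" are made up for this illustration.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: disktype          # Exists: the node only needs to carry this label key
          operator: Exists       # no values list is written for Exists / DoesNotExist
        - key: cpu-count         # Gt / Lt compare the label value as an integer
          operator: Gt
          values: ["4"]          # matches nodes whose cpu-count label value is greater than 4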
nodeAffinity node affinity scheduling: demo
Write a pod manifest defining two pods, one using hard affinity and one using soft affinity, both matching against node labels:
[root@master pod]# cat pod-nodeaffinity
apiVersion: v1
kind: Pod
metadata:
  name: pod-nodeaffinity-required
  labels:
    env: dev
  namespace: default
spec:
  containers:
  - image: nginx:latest
    name: nginx-container-nodeaffinity-required
    ports:
    - name: http
      containerPort: 80
  affinity:                                            # affinity
    nodeAffinity:                                      # node affinity
      requiredDuringSchedulingIgnoredDuringExecution:  # hard affinity: if no node matches the rule, the pod fails to schedule
        nodeSelectorTerms:
        - matchExpressions:                            # match expressions
          - key: env                                   # any node labeled env=xx or env=yy can host this pod
            operator: In
            values: ["xx","yy"]
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-nodeaffinity-preferred
  labels:
    env: dev
  namespace: default
spec:
  containers:
  - image: nginx:latest
    name: nginx-container-nodeaffinity-preferred
    ports:
    - name: http
      containerPort: 80
  affinity:                                            # affinity
    nodeAffinity:                                      # node affinity
      preferredDuringSchedulingIgnoredDuringExecution: # soft affinity: if no node matches, the pod is still scheduled somewhere (soft affinity never blocks scheduling)
      - weight: 1                                      # weight of this preference term
        preference:                                    # node selector term
          matchExpressions:                            # match expressions
          - key: env                                   # prefer nodes labeled env=xx or env=yy
            operator: In
            values: ["xx","yy"]
[root@master pod]#
# Only node2 carries an env label (env=test), so we expect the first pod, pod-nodeaffinity-required, to fail scheduling because it uses hard affinity, while the second pod, pod-nodeaffinity-preferred, uses soft affinity and is still scheduled onto some node even though nothing matches its expression.
# Check whether the first (hard affinity) pod was scheduled
[root@master pod]# kubectl get pod pod-nodeaffinity-required            # the hard-affinity pod failed to schedule
NAME                        READY   STATUS    RESTARTS   AGE
pod-nodeaffinity-required   0/1     Pending   0          18m
[root@master pod]# kubectl describe pod pod-nodeaffinity-required | tail -5   # reason for the failure
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  18m   default-scheduler  0/3 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) didn't match Pod's node affinity/selector.
  Warning  FailedScheduling  17m   default-scheduler  0/3 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) didn't match Pod's node affinity/selector.
[root@master pod]#
# Check whether the second (soft affinity) pod was scheduled
[root@master pod]# kubectl get pod pod-nodeaffinity-preferred -o wide   # scheduled successfully, landed on node1
NAME                         READY   STATUS    RESTARTS   AGE   IP             NODE    NOMINATED NODE   READINESS GATES
pod-nodeaffinity-preferred   1/1     Running   0          15m   10.244.1.138   node1   <none>           <none>
[root@master pod]#
# Further node affinity scenarios are not demonstrated here
podAffinity: pod affinity scheduling
podAffinity can also be hard (required) or soft (preferred). First, look at the configurable fields of podAffinity:
kubectl explain Pod.spec.affinity.podAffinity
  requiredDuringSchedulingIgnoredDuringExecution   # hard pod affinity
    namespaces              # namespaces of the reference (target) pods
    topologyKey             # scheduling scope (topology domain)
    labelSelector           # label selector
      matchExpressions      # match expressions
        key                 # label key
        operator            # operator; supports In, NotIn, Exists, DoesNotExist
        values              # label values
      matchLabels           # map form equivalent to multiple matchExpressions
  preferredDuringSchedulingIgnoredDuringExecution  # soft pod affinity
    weight                  # preference weight, 1-100
    podAffinityTerm         # the affinity term associated with the weight
      namespaces            # namespaces of the reference (target) pods
      topologyKey           # scheduling scope (topology domain)
      labelSelector         # label selector
        matchExpressions    # match expressions
          key               # label key
          operator          # operator; supports In, NotIn, Exists, DoesNotExist
          values            # label values
        matchLabels         # map form equivalent to multiple matchExpressions
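To make topologyKey ("scheduling scope") a bit more concrete, the sketch below shows the same affinity term at two different scopes. This is an illustration only; the zone label uses the well-known topology.kubernetes.io/zone key, which is assumed here and not part of the demo environment.

# Sketch of a pod.spec.affinity.podAffinity term (illustration only)
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
    matchLabels:
      app: mysql                            # target pods carrying the label app=mysql
  topologyKey: kubernetes.io/hostname       # same node as a target pod
# Changing topologyKey to topology.kubernetes.io/zone would only require the new pod
# to land in the same zone as a target pod, not necessarily on the same node.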
podAffinity pod affinity scheduling: demo
The following simulates a scenario: a mysql pod is running on node2, and to keep an application pod close to the mysql pod, pod affinity is used so that the application pod is also scheduled onto node2, where the mysql pod lives.
# Assume a mysql pod is already running on node2. We now create an application pod with podAffinity so that it is scheduled onto the same node as the mysql pod.
[root@master pod]# kubectl get pod pod-mysql-server -o wide
NAME               READY   STATUS    RESTARTS   AGE   IP             NODE    NOMINATED NODE   READINESS GATES
pod-mysql-server   1/1     Running   0          16s   10.244.2.105   node2   <none>           <none>
[root@master pod]#
# Only hard pod affinity is demonstrated here; soft pod affinity works much the same way and is left as an exercise
[root@master pod]# cat pod-podaffinity.yaml             # write a pod that uses hard podAffinity
apiVersion: v1
kind: Pod
metadata:
  name: pod-podaffinity-required
  labels:
    env: dev
  namespace: default
spec:
  containers:
  - image: nginx:latest
    name: nginx-container-podaffinity-required
    ports:
    - name: http
      containerPort: 80
  affinity:                                             # affinity
    podAffinity:                                        # pod affinity
      requiredDuringSchedulingIgnoredDuringExecution:   # hard affinity: the rule must be satisfied, otherwise the pod fails to schedule and cannot run
      - labelSelector:                                  # label selector
          matchExpressions:                             # match expressions
          - key: app                                    # the key refers to a label key on the target pods
            operator: In                                # operator
            values: ["aa","bb"]
        topologyKey: kubernetes.io/hostname
# Because the mysql pod is labeled app=mysql, this hard affinity rule does not match and the pod fails to schedule, as shown below:
[root@master pod]# kubectl get -f pod-podaffinity.yaml                  # the pod status shows that scheduling failed
NAME                       READY   STATUS    RESTARTS   AGE
pod-podaffinity-required   0/1     Pending   0          8s
[root@master pod]# kubectl describe pod pod-podaffinity-required | tail -5   # the details show the pod affinity rules were not satisfied
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  30s   default-scheduler  0/3 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) didn't match pod affinity rules.
[root@master pod]#
# After deleting the pod, change the match rule to values: ["mysql","bb"], recreate the pod, and check it again:
[root@master pod]# kubectl get -f pod-podaffinity.yaml -o wide          # the pod has now been scheduled onto node2, so pod affinity works as intended
NAME                       READY   STATUS    RESTARTS   AGE   IP             NODE    NOMINATED NODE   READINESS GATES
pod-podaffinity-required   1/1     Running   0          47s   10.244.2.107   node2   <none>           <none>
[root@master pod]#
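For completeness, a minimal sketch of the soft (preferred) pod affinity variant mentioned above might look like the following. It reuses the app=mysql label from the demo, but the pod name is hypothetical and this manifest was not run in the demo environment.

apiVersion: v1
kind: Pod
metadata:
  name: pod-podaffinity-preferred            # hypothetical name, not created in the demo above
  namespace: default
spec:
  containers:
  - image: nginx:latest
    name: nginx-container
    ports:
    - name: http
      containerPort: 80
  affinity:
    podAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:   # soft affinity: best effort, never blocks scheduling
      - weight: 50                                       # preference weight, 1-100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values: ["mysql"]
          topologyKey: kubernetes.io/hostname            # prefer the node that already runs the app=mysql pod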
podAntiAffinity: pod anti-affinity scheduling
podAntiAffinity also has hard and soft forms, and its syntax is essentially the same as that of podAffinity.
[root@master pod]# vim pod-podantiaffinity.yaml         # write a pod anti-affinity yaml file
apiVersion: v1
kind: Pod
metadata:
  name: pod-podantiaffinity-required
  labels:
    env: dev
  namespace: default
spec:
  containers:
  - image: nginx:latest
    name: nginx-container-podantiaffinity-required
    imagePullPolicy: IfNotPresent
    ports:
    - name: http
      containerPort: 80
  affinity:                                             # affinity
    podAntiAffinity:                                    # pod anti-affinity
      requiredDuringSchedulingIgnoredDuringExecution:   # hard anti-affinity
      - labelSelector:                                  # label selector
          matchExpressions:                             # combined with anti-affinity, this means: do not schedule this pod onto any node that runs a pod labeled app=mysql
          - key: app
            operator: In
            values: ["mysql"]
        topologyKey: kubernetes.io/hostname
[root@master pod]# kubectl apply -f pod-podantiaffinity.yaml
pod/pod-podantiaffinity-required created
[root@master pod]# kubectl get -f pod-podantiaffinity.yaml -o wide      # the pod was scheduled onto node1, so pod anti-affinity works as intended
NAME                           READY   STATUS    RESTARTS   AGE   IP             NODE    NOMINATED NODE   READINESS GATES
pod-podantiaffinity-required   1/1     Running   0          12s   10.244.1.139   node1   <none>           <none>
[root@master pod]#
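A common use of the soft form of anti-affinity is to spread replicas of the same application across nodes. The sketch below shows that idea under stated assumptions: the Deployment name and the app=nginx-spread label are hypothetical and not part of the demos above.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment-nginx-spread              # hypothetical name, for illustration only
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx-spread
  template:
    metadata:
      labels:
        app: nginx-spread
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:   # soft anti-affinity: spread when possible, never block scheduling
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values: ["nginx-spread"]                   # avoid nodes that already run a replica of this app
              topologyKey: kubernetes.io/hostname
      containers:
      - image: nginx:latest
        name: nginx-container
        ports:
        - containerPort: 80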