Pods on our test-environment Kubernetes nodes keep getting evicted, as shown below:
[root@master bin]# kubectl get pod -A | grep Evicted
middleware   gitlab-7fd64c6d-2mv8v            0/1   Evicted   0   6d11h
middleware   gitlab-7fd64c6d-4g9vg            0/1   Evicted   0   6d11h
middleware   gitlab-7fd64c6d-4tssj            0/1   Evicted   0   6d11h
middleware   gitlab-7fd64c6d-6jtts            0/1   Evicted   0   6d11h
middleware   gitlab-7fd64c6d-76cs5            0/1   Evicted   0   6d11h
middleware   gitlab-7fd64c6d-8hb7p            0/1   Evicted   0   6d11h
middleware   gitlab-7fd64c6d-8tgb6            0/1   Evicted   0   6d11h
middleware   gitlab-7fd64c6d-9f2kh            0/1   Evicted   0   6d11h
middleware   gitlab-7fd64c6d-bt6ph            0/1   Evicted   0   7d1h
middleware   gitlab-7fd64c6d-fxdq4            0/1   Evicted   0   6d11h
middleware   gitlab-7fd64c6d-hdkgf            0/1   Evicted   0   6d11h
middleware   gitlab-7fd64c6d-lg9s8            0/1   Evicted   0   6d11h
middleware   gitlab-7fd64c6d-n8k68            0/1   Evicted   0   6d11h
middleware   gitlab-7fd64c6d-p6tpp            0/1   Evicted   0   6d11h
middleware   gitlab-7fd64c6d-pxx49            0/1   Evicted   0   6d11h
middleware   gitlab-7fd64c6d-qcxxb            0/1   Evicted   0   6d11h
middleware   gitlab-7fd64c6d-sm775            0/1   Evicted   0   6d11h
wehgc        back-etc-c5c8c9b65-ddd75         0/1   Evicted   0   5d13h
wehgc        back-etc-c5c8c9b65-dm7cp         0/1   Evicted   0   7d12h
wehgc        back-etc-c5c8c9b65-wx2sf         0/1   Evicted   0   36h
wehgc        hgc-gateway-8d7fbd9fd-7zggp      0/1   Evicted   0   7d12h
wehgc        hgc-gateway-8d7fbd9fd-lxgj5      0/1   Evicted   0   24h
wehgc        lucky-creditor-6d5f8b9b5-htlcj   0/1   Evicted   0   2d19h
wehgc        lucky-creditor-6d5f8b9b5-hwms2   0/1   Evicted   0   3d19h
wehgc        lucky-creditor-6d5f8b9b5-tdmq7   0/1   Evicted   0   6d
wehgc        lucky-creditor-6d5f8b9b5-x5zmq   0/1   Evicted   0   7d12h
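Evicted pods linger as Failed objects until someone deletes them, so a bulk cleanup is handy. A minimal sketch, assuming the default kubectl column layout (namespace, name, ready, status, restarts, age); the heredoc below just stands in for live cluster output:

```shell
# The STATUS column is field 4 in `kubectl get pod -A` output, so an
# awk filter can turn every Evicted row into a delete command.
evicted_filter='$4 == "Evicted" {print "kubectl -n " $1 " delete pod " $2}'

# Sample rows standing in for real `kubectl get pod -A` output:
awk "$evicted_filter" <<'EOF'
middleware   gitlab-7fd64c6d-2mv8v         0/1   Evicted   0   6d11h
wehgc        hgc-gateway-8d7fbd9fd-lxgj5   0/1   Evicted   0   24h
EOF

# Against a live cluster, review the generated commands first, then:
#   kubectl get pod -A --no-headers | awk "$evicted_filter" | sh
```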
Describing one of these pods shows the reason: "Pod The node had condition: [DiskPressure]".
[root@master bin]# kubectl -n middleware describe pod gitlab-7fd64c6d-4g9vg
Name:           gitlab-7fd64c6d-4g9vg
Namespace:      middleware
Priority:       0
Node:           10.1.6.100/
Start Time:     Fri, 30 Apr 2021 23:37:48 +0800
Labels:         name=gitlab
                pod-template-hash=7fd64c6d
Annotations:    <none>
Status:         Failed
Reason:         Evicted
Message:        Pod The node had condition: [DiskPressure].
IP:
On that node, the root filesystem showed only 80% usage, which by itself shouldn't be enough to trigger evictions. Searching /var/log/messages for the keyword "Disk usage" shows usage had in fact hit 85%; the 80% observed now is simply what was left after the kubelet automatically freed space back down to its low threshold:
[root@node01 ~]# grep "Disk usage" /var/log/messages
May  2 14:11:06 node01 kubelet: I0502 14:11:06.180116    2607 image_gc_manager.go:305] [imageGCManager]: Disk usage on image filesystem is at 85% which is over the high threshold (85%). Trying to free 3874109030 bytes down to the low threshold (80%).
May  2 14:16:06 node01 kubelet: I0502 14:16:06.537178    2607 image_gc_manager.go:305] [imageGCManager]: Disk usage on image filesystem is at 85% which is over the high threshold (85%). Trying to free 3501626982 bytes down to the low threshold (80%).
May  2 23:01:08 node01 kubelet: I0502 23:01:08.385688    2607 image_gc_manager.go:305] [imageGCManager]: Disk usage on image filesystem is at 85% which is over the high threshold (85%). Trying to free 3741947494 bytes down to the low threshold (80%).
May  2 23:06:09 node01 kubelet: I0502 23:06:09.009674    2607 image_gc_manager.go:305] [imageGCManager]: Disk usage on image filesystem is at 85% which is over the high threshold (85%). Trying to free 3289896550 bytes down to the low threshold (80%).
May  2 23:11:09 node01 kubelet: I0502 23:11:09.049373    2607 image_gc_manager.go:305] [imageGCManager]: Disk usage on image filesystem is at 85% which is over the high threshold (85%). Trying to free 3674576486 bytes down to the low threshold (80%).
May  2 23:16:09 node01 kubelet: I0502 23:16:09.206514    2607 image_gc_manager.go:305] [imageGCManager]: Disk usage on image filesystem is at 85% which is over the high threshold (85%). Trying to free 3915044454 bytes down to the low threshold (80%).
May  3 23:01:15 node01 kubelet: I0503 23:01:15.789941    2607 image_gc_manager.go:305] [imageGCManager]: Disk usage on image filesystem is at 85% which is over the high threshold (85%). Trying to free 3434174054 bytes down to the low threshold (80%).
May  3 23:06:16 node01 kubelet: I0503 23:06:16.730126    2607 image_gc_manager.go:305] [imageGCManager]: Disk usage on image filesystem is at 85% which is over the high threshold (85%). Trying to free 3821581926 bytes down to the low threshold (80%).
May  3 23:16:16 node01 kubelet: I0503 23:16:16.784770    2607 image_gc_manager.go:305] [imageGCManager]: Disk usage on image filesystem is at 86% which is over the high threshold (85%). Trying to free 4173846118 bytes down to the low threshold (80%).
May  4 14:56:22 node01 kubelet: I0504 14:56:22.554667    2607 image_gc_manager.go:305] [imageGCManager]: Disk usage on image filesystem is at 85% which is over the high threshold (85%). Trying to free 3345548902 bytes down to the low threshold (80%).
May  4 15:11:30 node01 kubelet: I0504 15:11:30.987801    2607 image_gc_manager.go:305] [imageGCManager]: Disk usage on image filesystem is at 85% which is over the high threshold (85%). Trying to free 3642582630 bytes down to the low threshold (80%).
May  4 15:16:31 node01 kubelet: I0504 15:16:31.083252    2607 image_gc_manager.go:305] [imageGCManager]: Disk usage on image filesystem is at 86% which is over the high threshold (85%). Trying to free 4058433126 bytes down to the low threshold (80%).
May  4 15:21:31 node01 kubelet: I0504 15:21:31.120914    2607 image_gc_manager.go:305] [imageGCManager]: Disk usage on image filesystem is at 85% which is over the high threshold (85%). Trying to free 3259909734 bytes down to the low threshold (80%).
May  4 15:26:31 node01 kubelet: I0504 15:26:31.160187    2607 image_gc_manager.go:305] [imageGCManager]: Disk usage on image filesystem is at 85% which is over the high threshold (85%). Trying to free 3577026150 bytes down to the low threshold (80%).
May  4 23:56:43 node01 kubelet: I0504 23:56:43.191514    2607 image_gc_manager.go:305] [imageGCManager]: Disk usage on image filesystem is at 85% which is over the high threshold (85%). Trying to free 3919537766 bytes down to the low threshold (80%).
May  5 17:31:57 node01 kubelet: I0505 17:31:57.351577    2607 image_gc_manager.go:305] [imageGCManager]: Disk usage on image filesystem is at 85% which is over the high threshold (85%). Trying to free 3441280614 bytes down to the low threshold (80%).
May  5 17:51:58 node01 kubelet: I0505 17:51:58.022054    2607 image_gc_manager.go:305] [imageGCManager]: Disk usage on image filesystem is at 85% which is over the high threshold (85%). Trying to free 3728758374 bytes down to the low threshold (80%).
May  6 10:22:13 node01 kubelet: I0506 10:22:13.941872    2607 image_gc_manager.go:305] [imageGCManager]: Disk usage on image filesystem is at 85% which is over the high threshold (85%). Trying to free 3867670118 bytes down to the low threshold (80%).
May  6 10:32:14 node01 kubelet: I0506 10:32:14.222872    2607 image_gc_manager.go:305] [imageGCManager]: Disk usage on image filesystem is at 85% which is over the high threshold (85%). Trying to free 3846407782 bytes down to the low threshold (80%).
May  6 21:37:18 node01 kubelet: I0506 21:37:18.331213    2607 image_gc_manager.go:305] [imageGCManager]: Disk usage on image filesystem is at 85% which is over the high threshold (85%). Trying to free 3667904102 bytes down to the low threshold (80%).
May  6 21:42:19 node01 kubelet: I0506 21:42:19.580873    2607 image_gc_manager.go:305] [imageGCManager]: Disk usage on image filesystem is at 85% which is over the high threshold (85%). Trying to free 3346036326 bytes down to the low threshold (80%).
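The 85% and 80% in these log lines come from the kubelet's image garbage-collection thresholds (defaults: imageGCHighThresholdPercent=85, imageGCLowThresholdPercent=80), and the DiskPressure condition itself comes from the eviction thresholds. If growing the disk isn't an option, these can be tuned in the kubelet configuration file; a sketch, with illustrative values rather than recommendations:

```yaml
# KubeletConfiguration, typically /var/lib/kubelet/config.yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Image GC: start deleting unused images at 90% disk usage, stop at 80%.
imageGCHighThresholdPercent: 90
imageGCLowThresholdPercent: 80
# Hard eviction: report DiskPressure and evict pods when free space on
# the node/image filesystems falls below these values.
evictionHard:
  nodefs.available: "10%"
  imagefs.available: "15%"
```

The kubelet must be restarted for the new thresholds to take effect.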
The root cause is that neither the Docker data directory nor the root filesystem was sized properly up front. Short of re-partitioning, the only options are to clear the Docker cache manually or to set up a scheduled job that cleans it periodically. For now, clean it manually:
[root@node01 ~]# docker system prune -a
WARNING! This will remove:
  - all stopped containers
  - all networks not used by at least one container
  - all images without at least one container associated to them
  - all build cache
Are you sure you want to continue? [y/N] y
deleted: sha256:457c85f6c3c5a46d1b3b333d1a3d405483d68e0b717e47ff87e0352a9efafdc9
deleted: sha256:82aff00bd5a27b1779fa0afae516fff2a83d87b6512e654f12e4a668ab991d0c
deleted: sha256:14ee9b47ea172fd6143461ffb8f4193debe56bd36f989328c5376f051f598235
deleted: sha256:0c8d5ef81127e8957e09c299cd9b4cf4efae01ccfff268c3221f4ce8c01759a8
Total reclaimed space: 3.36GB
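To avoid doing this by hand every time, the prune can run from cron. A sketch; the file path, schedule, and 72-hour retention filter are arbitrary choices for illustration, not part of the original setup:

```shell
# /etc/cron.d/docker-prune (hypothetical file): every day at 03:00,
# remove stopped containers, unused networks, unused images and build
# cache older than 72 hours. -f skips the interactive confirmation.
0 3 * * * root /usr/bin/docker system prune -af --filter "until=72h" >> /var/log/docker-prune.log 2>&1
```

Note that `-a` also deletes images that are merely unused right now, so the next deployment of those images will pull them again.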
As a follow-up, the root filesystem itself should also be cleaned up, and it's worth analyzing what keeps driving its usage up.
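For that analysis, walking the root filesystem level by level with du is usually enough. A minimal sketch; the paths at the end are just common suspects on a Kubernetes node, not confirmed culprits:

```shell
# Top-level usage on the root filesystem, largest first.
# -x stays on one filesystem so other mounts don't distort the numbers;
# -h prints human-readable sizes and sort -rh orders them descending.
du -xh --max-depth=1 / 2>/dev/null | sort -rh | head -n 10

# Usual suspects: container images/layers and system logs.
du -sh /var/lib/docker /var/log 2>/dev/null
```

Repeating the first command on whichever directory dominates (e.g. `du -xh --max-depth=1 /var`) quickly narrows down the growth source.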