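RDB takes time-based snapshots and by default keeps only the most recent one. Snapshots are fast to take, but any data written between the last snapshot and the current point in time may be lost.
How RDB takes a snapshot with bgsave (asynchronous)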
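Advantages
An RDB snapshot captures the data at one point in time. A script can run the Redis command bgsave (non-blocking, runs in the background) or save (blocks all client requests while it runs, not recommended) to take backups at chosen times. Keeping several backups makes it possible to restore to different points in time, so RDB is well suited to backups, and many third-party tools can read the file format for later data analysis.
For example, you can back up the RDB file every hour for the last 24 hours and also keep one RDB backup for every day of the month. Then, if something goes wrong, the dataset can be restored to any of those versions.
RDB maximizes Redis performance: to save an RDB file, the parent process only has to fork a child process, which then does all the saving work. The parent process performs no disk I/O.
With large datasets, for example several GB, restoring from RDB is faster than restoring from AOF.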
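The hourly and daily backup schedule described above could be sketched roughly as follows. This is only an illustration: `backup_rdb`, `RDB_FILE` and `BACKUP_DIR` are hypothetical names, `/backup/redis` is an assumed location, and the RDB path assumes the default file name `dump.rdb` under the data directory used by the install script in this document.

```shell
#!/bin/sh
# Sketch: copy the current RDB file into a per-schedule backup folder
# and prune old copies. Paths are assumptions, adjust them to your setup.
RDB_FILE=${RDB_FILE:-/apps/redis/data/dump.rdb}
BACKUP_DIR=${BACKUP_DIR:-/backup/redis}

backup_rdb() {
    kind=$1         # folder name, e.g. "hourly" or "daily"
    keep_days=$2    # prune copies whose age in whole days exceeds this value
    stamp=$(date +%Y%m%d-%H%M%S)
    mkdir -p "$BACKUP_DIR/$kind" || return 1
    # Plain cp, so the copy's mtime is the backup time used for pruning.
    cp "$RDB_FILE" "$BACKUP_DIR/$kind/dump-$stamp.rdb" || return 1
    find "$BACKUP_DIR/$kind" -name 'dump-*.rdb' -mtime +"$keep_days" -delete
}
```

From cron, `backup_rdb hourly 0` once an hour keeps roughly the last day of hourly copies, and `backup_rdb daily 30` once a day keeps about a month. Running `redis-cli bgsave` and waiting for it to finish before the copy gives a fresher snapshot.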
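Disadvantages
Data is not saved in real time, so whatever was written to memory since the last RDB backup may be lost.
RDB is not a good fit if you need to lose as little data as possible when the server fails. Redis lets you configure save points to control how often the RDB file is written, but because every RDB file holds the state of the whole dataset, writing it is not a cheap operation. In practice an RDB file is usually saved no more often than every 5 minutes, so an outage can cost several minutes of data.
When the dataset is very large, forking the child process and writing the RDB file both take time, from milliseconds to seconds. The fork cost grows with the amount of memory in use, and the write depends on disk I/O performance.
With a large dataset, fork() can be very time-consuming, and the server stops serving clients for a while. If the dataset is huge and CPU time is scarce, the pause can last a full second or longer. AOF rewrites also need fork(), but however long the interval between AOF rewrites is, data durability is not affected.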
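The save points mentioned above are written in redis.conf as `save <seconds> <changes>`; Redis 5 ships with `save 900 1`, `save 300 10` and `save 60 10000`. The following is a minimal sketch of that rule, not Redis's own code; `snapshot_due` is a hypothetical helper name.

```shell
# Sketch of the save-point rule "save <seconds> <changes>":
# a snapshot is due when at least <changes> writes happened and more than
# <seconds> have passed since the last successful save.
snapshot_due() {
    elapsed=$1      # seconds since the last successful save
    changes=$2      # writes since the last successful save
    shift 2
    # Remaining arguments are save points as pairs: seconds changes ...
    while [ "$#" -ge 2 ]; do
        if [ "$elapsed" -gt "$1" ] && [ "$changes" -ge "$2" ]; then
            return 0
        fi
        shift 2
    done
    return 1
}
```

For example, `snapshot_due 400 12 900 1 300 10 60 10000` succeeds because the `300 10` save point is satisfied, which also shows why up to several minutes of writes can be lost between snapshots.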
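AOF appends every operation, in execution order, to the end of a designated log file.
Notes:
When both RDB and AOF are enabled, the AOF file takes priority over the RDB file during recovery, so the AOF file is used to restore the data.
AOF is disabled by default. If you enable it for the first time in the configuration file and restart the service, the AOF takes priority over the RDB file, but no AOF file exists yet, so Redis starts with an empty dataset and all data is lost.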
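One common way to avoid the data loss described above is to turn AOF on while the instance is running, so that Redis builds the first AOF file from the data already in memory, and only then persist the setting. A minimal sketch, assuming `redis-cli` can reach the instance; `enable_aof_online` and the `REDIS_CLI` variable are hypothetical names introduced here.

```shell
# Sketch: enable AOF on a running instance, then persist the setting.
# REDIS_CLI is an assumed variable; add -a/-h/-p options as needed.
REDIS_CLI=${REDIS_CLI:-redis-cli}

enable_aof_online() {
    # CONFIG SET starts a background rewrite that creates the first AOF file.
    $REDIS_CLI config set appendonly yes || return 1
    # CONFIG REWRITE writes the running configuration back to redis.conf.
    $REDIS_CLI config rewrite
}
```

Before restarting the service it is worth checking that `aof_rewrite_in_progress` in `info persistence` has returned to 0, so the new AOF file is complete.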
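AOF rewrite
A rewrite writes a new AOF file in which repeated, mergeable and expired data has been compacted. This saves the disk space used by the AOF and also speeds up recovery. Run bgrewriteaof to trigger an AOF rewrite manually, or define an automatic rewrite policy.
The AOF rewrite process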
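The automatic rewrite policy is controlled by two redis.conf settings, `auto-aof-rewrite-percentage` (default 100) and `auto-aof-rewrite-min-size` (default 64mb). The sketch below illustrates the rule they express; `aof_rewrite_due` is a hypothetical helper, not part of Redis.

```shell
# Sketch of the automatic rewrite rule:
#   auto-aof-rewrite-percentage 100
#   auto-aof-rewrite-min-size 64mb
# A rewrite is due when the AOF is larger than the minimum size and has
# grown by at least the given percentage since the last rewrite.
aof_rewrite_due() {
    current=$1                  # current AOF size in bytes
    base=$2                     # AOF size in bytes after the last rewrite
    percentage=${3:-100}
    min_size=${4:-67108864}     # 64 MB
    [ "$percentage" -gt 0 ] || return 1     # 0 disables automatic rewrites
    [ "$current" -gt "$min_size" ] || return 1
    [ "$base" -gt 0 ] || base=1
    growth=$(( current * 100 / base - 100 ))
    [ "$growth" -ge "$percentage" ]
}
```

With the defaults, a 70 MB AOF is rewritten again once it reaches about 140 MB, while a file that never exceeds 64 MB is never rewritten automatically.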
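Advantages
Data safety is relatively high and depends on the fsync policy in use (fsync flushes the file data Redis has written from memory to the storage device). The default is appendfsync everysec, which runs fsync once per second. With this setting Redis still performs well, and an outage loses at most about one second of data (fsync runs in a background thread, so the main thread keeps serving command requests).
Because the log is written in append mode, no seek is needed while writing, and a crash does not damage what is already in the log file. If the system crashes when an operation is only half written, the redis-check-aof tool can repair the file before Redis is started again.
Redis can rewrite the AOF automatically in the background when the file grows too large. The rewritten AOF contains the minimal set of commands needed to rebuild the current dataset. The rewrite is safe: while the new AOF file is being created, Redis keeps appending changes to the existing AOF file, so the existing file is not lost even if the server stops during the rewrite. Once the new AOF file is complete, Redis switches from the old file to the new one and starts appending to it.
The AOF is a clearly formatted, easy-to-understand log of every modification, and the data can be rebuilt from this file.
The AOF file stores every write operation performed on the database, in order and in the Redis protocol format, so it is easy to read and easy to parse. Exporting an AOF file is also simple. For example, if you run FLUSHALL by mistake and the AOF file has not been rewritten since, you can stop the server, remove the FLUSHALL command at the end of the AOF file, and restart Redis to bring the dataset back to its state before FLUSHALL.
Disadvantages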
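The FLUSHALL recovery described above can be sketched as a small helper. It assumes the AOF ends with a bare FLUSHALL, which the Redis protocol stores as the three CRLF-terminated lines `*1`, `$8` and `FLUSHALL`; `strip_trailing_flushall` is a hypothetical name, and the helper writes a new file rather than editing the original.

```shell
# Sketch: write a copy of an AOF file without a trailing FLUSHALL command.
strip_trailing_flushall() {
    src=$1      # original AOF file (left untouched)
    dst=$2      # repaired copy to write
    # Normalize the last three lines: drop CR, upper-case, join with spaces.
    tail3=$(tail -n 3 "$src" | tr -d '\r' | tr 'a-z' 'A-Z' | tr '\n' ' ')
    if [ "$tail3" != '*1 $8 FLUSHALL ' ]; then
        echo "no trailing FLUSHALL found in $src" >&2
        return 1
    fi
    total=$(wc -l < "$src")
    head -n "$((total - 3))" "$src" > "$dst"
}
```

A cautious sequence would be: stop the server, keep a backup of the original AOF, run the helper, put the repaired copy in place of the AOF file, then start Redis again.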
One-step Redis build-and-install script
#!/bin/bash
# Build and install Redis from source
source /etc/init.d/functions
# Redis version
Redis_version=redis-5.0.9
suffix=tar.gz
Redis=${Redis_version}.${suffix}
Password=123456
# Redis source download URL
redis_url=http://download.redis.io/releases/${Redis}
# Redis install path
redis_install_DIR=/apps/redis
# Number of CPUs
CPUS=`lscpu|grep "^CPU(s)"|awk '{print $2}'`
# OS type
os_type=`grep "^NAME" /etc/os-release |awk -F'"| ' '{print $2}'`
# OS version
os_version=`awk -F'"' '/^VERSION_ID/{print $2}' /etc/os-release`

# Print a message followed by a green [ OK ] ($2 is 0) or a red [ FAILED ]
color () {
    if [[ $2 -eq 0 ]];then
        echo -e "\e[1;32m$1\t\t\t\t\t\t[ OK ]\e[0;m"
    else
        echo -e "\e[1;31m$1\t\t\t\t\t\t[ FAILED ]\e[0;m"
    fi
}

download_redis (){
    # Install build dependencies
    yum -y install gcc jemalloc-devel || { color "Failed to install dependencies, check the network" 1 ;exit 1;}
    cd /opt
    if [ -e ${Redis} ];then
        color "Redis source package already exists" 0
    else
        color "Downloading the Redis source package" 0
        wget ${redis_url}
        if [ $? -ne 0 ];then
            color "Failed to download the Redis source package, exiting!" 1
            exit 1
        fi
    fi
}

install_redis (){
    # Unpack the source package
    tar xvf /opt/${Redis} -C /usr/local/src
    ln -s /usr/local/src/${Redis_version} /usr/local/src/redis
    # Build and install
    cd /usr/local/src/redis
    make -j ${CPUS} install PREFIX=${redis_install_DIR}
    if [ $? -ne 0 ];then
        color "redis build and install failed!" 1
        exit 1
    else
        color "redis build and install succeeded" 0
    fi
    ln -s ${redis_install_DIR}/bin/redis-* /usr/sbin/
    # Add the redis user (an existing user is not an error)
    if id redis &> /dev/null;then
        color "redis user already exists" 0
    else
        useradd -r -s /sbin/nologin redis
        color "redis user created" 0
    fi
    mkdir -p ${redis_install_DIR}/{etc,log,data,run}
    # Prepare the redis configuration file (copied from the source directory)
    cp redis.conf ${redis_install_DIR}/etc/
    sed -i "s/bind 127.0.0.1/bind 0.0.0.0/" ${redis_install_DIR}/etc/redis.conf
    sed -i "/# requirepass/a requirepass ${Password}" ${redis_install_DIR}/etc/redis.conf
    sed -i "s@^dir .*\$@dir ${redis_install_DIR}\/data@" ${redis_install_DIR}/etc/redis.conf
    sed -i "s@^logfile .*\$@logfile ${redis_install_DIR}\/log\/redis-6379.log@" ${redis_install_DIR}/etc/redis.conf
    sed -i "s@^pidfile .*\$@pidfile ${redis_install_DIR}\/run\/redis-6379.pid@" ${redis_install_DIR}/etc/redis.conf
    chown -R redis:redis ${redis_install_DIR}
    # Kernel parameters recommended by the Redis startup warnings
    cat >> /etc/sysctl.conf <<EOF
net.core.somaxconn = 1024
vm.overcommit_memory = 1
EOF
    sysctl -p
    echo 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' >> /etc/rc.d/rc.local
    chmod +x /etc/rc.d/rc.local
    source /etc/rc.d/rc.local
    # Prepare the systemd service
    cat > /usr/lib/systemd/system/redis.service <<EOF
[Unit]
Description=redis persistent key-value database
After=network.target

[Service]
ExecStart=${redis_install_DIR}/bin/redis-server ${redis_install_DIR}/etc/redis.conf --supervised systemd
ExecStop=/bin/kill -s QUIT \$MAINPID
Type=notify
User=redis
Group=redis
RuntimeDirectory=redis
RuntimeDirectoryMode=0755

[Install]
WantedBy=multi-user.target
EOF
    chown -R redis:redis ${redis_install_DIR}
    systemctl daemon-reload
    systemctl enable --now redis
    systemctl is-active redis
    if [ $? -ne 0 ];then
        color "redis service failed to start!" 1
        exit 1
    else
        color "redis service started" 0
        color "redis installation finished" 0
    fi
}

download_redis
install_redis
exit 0
master node configuration
# Edit redis.conf
vim /apps/redis/etc/redis.conf
bind 0.0.0.0
masterauth "123456"
requirepass "123456"
# Restart redis
systemctl restart redis
slave node configuration
# Edit redis.conf
vim /apps/redis/etc/redis.conf
bind 0.0.0.0
masterauth "123456"
requirepass "123456"
replicaof 10.0.0.7 6379
# Restart redis
systemctl restart redis
Check status
master
[root@master ~]# redis-cli -a 123456 Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe. 127.0.0.1:6379> info replication # Replication role:master connected_slaves:2 slave0:ip=10.0.0.27,port=6379,state=online,offset=28,lag=1 slave1:ip=10.0.0.17,port=6379,state=online,offset=28,lag=1 master_replid:14883e4254918d97c50ec0f05c6b7b741e09cc59 master_replid2:0000000000000000000000000000000000000000 master_repl_offset:28 second_repl_offset:-1 repl_backlog_active:1 repl_backlog_size:1048576 repl_backlog_first_byte_offset:1 repl_backlog_histlen:28 127.0.0.1:6379>
slave1
[root@slave1 ~]# redis-cli -a 123456 Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe. 127.0.0.1:6379> info replication # Replication role:slave master_host:10.0.0.7 master_port:6379 master_link_status:up master_last_io_seconds_ago:9 master_sync_in_progress:0 slave_repl_offset:154 slave_priority:100 slave_read_only:1 connected_slaves:0 master_replid:14883e4254918d97c50ec0f05c6b7b741e09cc59 master_replid2:0000000000000000000000000000000000000000 master_repl_offset:154 second_repl_offset:-1 repl_backlog_active:1 repl_backlog_size:1048576 repl_backlog_first_byte_offset:1 repl_backlog_histlen:154 127.0.0.1:6379>
slave2
[root@slave2 ~]# redis-cli -a 123456 Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe. 127.0.0.1:6379> info replication # Replication role:slave master_host:10.0.0.7 master_port:6379 master_link_status:up master_last_io_seconds_ago:5 master_sync_in_progress:0 slave_repl_offset:210 slave_priority:100 slave_read_only:1 connected_slaves:0 master_replid:14883e4254918d97c50ec0f05c6b7b741e09cc59 master_replid2:0000000000000000000000000000000000000000 master_repl_offset:210 second_repl_offset:-1 repl_backlog_active:1 repl_backlog_size:1048576 repl_backlog_first_byte_offset:1 repl_backlog_histlen:210 127.0.0.1:6379>
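The `info replication` output shown above can also be checked from a script. The sketch below reads that output on stdin and succeeds only when the node reports `role:slave` and `master_link_status:up`; `replica_link_up` is a hypothetical helper name.

```shell
# Sketch: read `info replication` output on stdin and report whether this
# replica is connected to its master (exit status 0 means connected).
replica_link_up() {
    tr -d '\r' | awk -F: '
        $1 == "role"               { role = $2 }
        $1 == "master_link_status" { link = $2 }
        END { exit !(role == "slave" && link == "up") }
    '
}
```

On a slave this could be used as `redis-cli -a 123456 info replication 2>/dev/null | replica_link_up && echo "replication OK"`.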
Sentinel is in fact a special Redis server: it supports some Redis commands but not many others. By default it listens on port 26379/tcp.
Sentinels do not have to be deployed on the same hosts as the Redis servers, but they usually are.
cp /usr/local/src/redis/sentinel.conf /apps/redis/etc/redis-sentinel.conf
cd /apps/redis/etc/
# Configure sentinel
[root@master etc]# grep "^[a-Z]" redis-sentinel.conf
bind 0.0.0.0
port 26379
daemonize yes
pidfile /apps/redis/run/redis-sentinel.pid
logfile /apps/redis/log/sentinel_26379.log
dir /apps/redis/data
sentinel monitor mymaster 10.0.0.7 6379 2
sentinel auth-pass mymaster 123456
sentinel down-after-milliseconds mymaster 3000
sentinel parallel-syncs mymaster 1
sentinel failover-timeout mymaster 180000
sentinel deny-scripts-reconfig yes
# Start sentinel
[root@master etc]# redis-sentinel /apps/redis/etc/redis-sentinel.conf
# View the sentinel configuration after startup
[root@master etc]# grep "^[a-Z]" redis-sentinel.conf
bind 0.0.0.0
port 26379
daemonize yes
pidfile /apps/redis/run/redis-sentinel.pid
logfile /apps/redis/log/sentinel_26379.log
dir /apps/redis/data
sentinel deny-scripts-reconfig yes
sentinel monitor mymaster 10.0.0.7 6379 2
sentinel parallel-syncs mymaster 1
sentinel down-after-milliseconds mymaster 3000
sentinel auth-pass mymaster 123456
sentinel config-epoch mymaster 0
# The lines below are generated automatically
sentinel myid c663d4b9db845d721cd6dccf608c7904d896b745    # myid must be unique
protected-mode no
sentinel leader-epoch mymaster 0
sentinel known-replica mymaster 10.0.0.27 6379
sentinel known-replica mymaster 10.0.0.17 6379
sentinel known-sentinel mymaster 10.0.0.27 26379 66f276f274802c6f0243007a2be4b04001b9867e
sentinel known-sentinel mymaster 10.0.0.17 26379 5d3a6880bd134e211c77bef6bc408ab63a1fd3ac
sentinel current-epoch 0
Configure the sentinel service
[root@shichu ~]# cat /lib/systemd/system/redis-sentinel.service
[Unit]
Description=Redis Sentinel
After=network.target
After=network-online.target
Wants=network-online.target

[Service]
ExecStart=/apps/redis/bin/redis-sentinel /apps/redis/etc/redis-sentinel.conf --supervised systemd
ExecStop=/bin/kill -s QUIT $MAINPID
Type=notify
User=redis
Group=redis
RuntimeDirectory=redis
RuntimeDirectoryMode=0755

[Install]
WantedBy=multi-user.target
Start the sentinel service
chown -R redis:redis /apps/redis
systemctl daemon-reload
systemctl enable --now redis-sentinel
sentinel configuration parameters
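sentinel monitor mymaster 10.0.0.7 6379 2 # address and port of the master server in the mymaster group
2 is the quorum: the number of Sentinels that must agree the master is down before a failover is started. It is usually set to a majority of all Sentinel nodes (normally an odd total of at least 3, such as 3, 5 or 7). For example, with 3 Sentinels, 3/2 = 1.5, rounded up to 2. The quorum is the basis for marking the master as objectively down (ODOWN).
sentinel auth-pass mymaster 123456 # password of the master in the mymaster group; this line must come after the sentinel monitor line above
sentinel down-after-milliseconds mymaster 30000 # time in milliseconds after which an unresponsive node in the mymaster group is considered subjectively down (SDOWN); 30000 is the default, 3000 is recommended
sentinel parallel-syncs mymaster 1 # number of slaves that resynchronize with the new master at the same time after a failover; a smaller number makes the total synchronization time longer but reduces the load on the new master
sentinel failover-timeout mymaster 180000 # timeout, in milliseconds, for pointing all slaves at the new master
sentinel deny-scripts-reconfig yes # forbids changing notification-script and client-reconfig-script at runtime with SENTINEL SET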
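The "more than half" rule used for the quorum above is simple integer arithmetic. A minimal sketch, where `sentinel_majority` is a hypothetical helper name:

```shell
# Sketch: smallest majority of N Sentinels, i.e. floor(N / 2) + 1.
sentinel_majority() {
    echo $(( $1 / 2 + 1 ))
}
```

`sentinel_majority 3` prints 2 and `sentinel_majority 5` prints 3. The quorum in `sentinel monitor` only decides when the master is marked ODOWN; the failover itself still has to be authorized by a majority of the Sentinels.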
[root@master etc]# ss -ntl State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 100 127.0.0.1:25 *:* LISTEN 0 511 *:26379 *:* LISTEN 0 511 *:6379 *:* LISTEN 0 128 *:111 *:* LISTEN 0 128 *:22 *:* LISTEN 0 100 [::1]:25 [::]:* LISTEN 0 128 [::]:111 [::]:* LISTEN 0 128 [::]:22
View the sentinel logs
master log
[root@master redis]# tail /apps/redis/log/sentinel_26379.log 1491:X 11 Jul 2022 16:38:43.636 * supervised by systemd, will signal readiness 1491:X 11 Jul 2022 16:38:43.637 * Increased maximum number of open files to 10032 (it was originally set to 1024). 1491:X 11 Jul 2022 16:38:43.637 * Running mode=sentinel, port=26379. 1491:X 11 Jul 2022 16:38:43.638 # Sentinel ID is c663d4b9db845d721cd6dccf608c7904d896b745 1491:X 11 Jul 2022 16:38:43.638 # +monitor master mymaster 10.0.0.7 6379 quorum 2 1491:X 11 Jul 2022 16:38:46.640 # +sdown sentinel 5d3a6880bd134e211c77bef6bc408ab63a1fd3ac 10.0.0.17 26379 @ mymaster 10.0.0.7 6379 1491:X 11 Jul 2022 16:38:46.640 # +sdown sentinel 66f276f274802c6f0243007a2be4b04001b9867e 10.0.0.27 26379 @ mymaster 10.0.0.7 6379 1491:X 11 Jul 2022 16:39:20.763 # -sdown sentinel 5d3a6880bd134e211c77bef6bc408ab63a1fd3ac 10.0.0.17 26379 @ mymaster 10.0.0.7 6379 1491:X 11 Jul 2022 16:39:48.855 # -sdown sentinel 66f276f274802c6f0243007a2be4b04001b9867e 10.0.0.27 26379 @ mymaster 10.0.0.7 6379
slave1 log
[root@slave1 ~]# tail /apps/redis/log/sentinel_26379.log 1293:X 11 Jul 2022 16:39:19.722 # Redis version=5.0.9, bits=64, commit=00000000, modified=0, pid=1293, just started 1293:X 11 Jul 2022 16:39:19.722 # Configuration loaded 1293:X 11 Jul 2022 16:39:19.722 * supervised by systemd, will signal readiness 1293:X 11 Jul 2022 16:39:19.723 * Increased maximum number of open files to 4096 (it was originally set to 1024). 1293:X 11 Jul 2022 16:39:19.724 * Running mode=sentinel, port=26379. 1293:X 11 Jul 2022 16:39:19.724 # Sentinel ID is 5d3a6880bd134e211c77bef6bc408ab63a1fd3ac 1293:X 11 Jul 2022 16:39:19.724 # +monitor master mymaster 10.0.0.7 6379 quorum 2 1293:X 11 Jul 2022 16:39:22.777 # +sdown sentinel 66f276f274802c6f0243007a2be4b04001b9867e 10.0.0.27 26379 @ mymaster 10.0.0.7 6379 1293:X 11 Jul 2022 16:39:48.988 # -sdown sentinel 66f276f274802c6f0243007a2be4b04001b9867e 10.0.0.27 26379 @ mymaster 10.0.0.7 6379
slave2 log
[root@slave2 ~]# tail /apps/redis/log/sentinel_26379.log 900:X 11 Jul 2022 16:32:23.322 # +sdown sentinel 605f713c7e6554ae0bfed0b98304e29d6a69e678 10.0.0.37 26379 @ mymaster 10.0.0.7 6379 1256:X 11 Jul 2022 16:39:48.523 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo 1256:X 11 Jul 2022 16:39:48.523 # Redis version=5.0.9, bits=64, commit=00000000, modified=0, pid=1256, just started 1256:X 11 Jul 2022 16:39:48.523 # Configuration loaded 1256:X 11 Jul 2022 16:39:48.523 * supervised by systemd, will signal readiness 1256:X 11 Jul 2022 16:39:48.524 * Increased maximum number of open files to 4096 (it was originally set to 1024). 1256:X 11 Jul 2022 16:39:48.525 * Running mode=sentinel, port=26379. 1256:X 11 Jul 2022 16:39:48.525 # Sentinel ID is 66f276f274802c6f0243007a2be4b04001b9867e 1256:X 11 Jul 2022 16:39:48.525 # +monitor master mymaster 10.0.0.7 6379 quorum 2
Check the sentinel status
[root@master redis]# redis-cli -a 123456 -p 26379
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:26379> info sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=10.0.0.7:6379,slaves=2,sentinels=3
# Two slaves and three sentinel servers. If the sentinels value is wrong, check whether the myid values conflict.
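The `master0:` line of `info sentinel` is the part worth monitoring. The sketch below pulls its fields out of that output so a script can compare the slave and sentinel counts with the expected values; `sentinel_masters` is a hypothetical helper name.

```shell
# Sketch: read `info sentinel` output on stdin and print
# "<name> <status> <address> <slaves> <sentinels>" for each monitored master.
sentinel_masters() {
    tr -d '\r' | sed -n 's/^master[0-9]*:name=\([^,]*\),status=\([^,]*\),address=\([^,]*\),slaves=\([0-9]*\),sentinels=\([0-9]*\).*$/\1 \2 \3 \4 \5/p'
}
```

For the output above, `redis-cli -a 123456 -p 26379 info sentinel 2>/dev/null | sentinel_masters` would print `mymaster ok 10.0.0.7:6379 2 3`.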
[root@master etc]# systemctl stop redis [root@master etc]# ss -ntl State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 100 127.0.0.1:25 *:* LISTEN 0 511 *:26379 *:* LISTEN 0 128 *:111 *:* LISTEN 0 128 *:22 *:* LISTEN 0 100 [::1]:25 [::]:* LISTEN 0 128 [::]:111 [::]:* LISTEN 0 128 [::]:22
[root@master redis]# tail -f /apps/redis/log/sentinel_26379.log
1491:X 11 Jul 2022 17:07:16.959 # +sdown master mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:17.044 # +odown master mymaster 10.0.0.7 6379 #quorum 2/2
1491:X 11 Jul 2022 17:07:17.044 # +new-epoch 4
1491:X 11 Jul 2022 17:07:17.044 # +try-failover master mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:17.045 # +vote-for-leader c663d4b9db845d721cd6dccf608c7904d896b745 4
1491:X 11 Jul 2022 17:07:17.048 # 5d3a6880bd134e211c77bef6bc408ab63a1fd3ac voted for c663d4b9db845d721cd6dccf608c7904d896b745 4
1491:X 11 Jul 2022 17:07:17.050 # 66f276f274802c6f0243007a2be4b04001b9867e voted for c663d4b9db845d721cd6dccf608c7904d896b745 4
1491:X 11 Jul 2022 17:07:17.102 # +elected-leader master mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:17.102 # +failover-state-select-slave master mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:17.205 # +selected-slave slave 10.0.0.27:6379 10.0.0.27 6379 @ mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:17.205 * +failover-state-send-slaveof-noone slave 10.0.0.27:6379 10.0.0.27 6379 @ mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:17.269 * +failover-state-wait-promotion slave 10.0.0.27:6379 10.0.0.27 6379 @ mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:18.078 # +promoted-slave slave 10.0.0.27:6379 10.0.0.27 6379 @ mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:18.078 # +failover-state-reconf-slaves master mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:18.145 * +slave-reconf-sent slave 10.0.0.17:6379 10.0.0.17 6379 @ mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:19.144 * +slave-reconf-inprog slave 10.0.0.17:6379 10.0.0.17 6379 @ mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:19.144 * +slave-reconf-done slave 10.0.0.17:6379 10.0.0.17 6379 @ mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:19.228 # -odown master mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:19.228 # +failover-end master mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:19.228 # +switch-master mymaster 10.0.0.7 6379 10.0.0.27 6379    # the master has moved to 10.0.0.27
1491:X 11 Jul 2022 17:07:19.229 * +slave slave 10.0.0.17:6379 10.0.0.17 6379 @ mymaster 10.0.0.27 6379
1491:X 11 Jul 2022 17:07:19.229 * +slave slave 10.0.0.7:6379 10.0.0.7 6379 @ mymaster 10.0.0.27 6379
1491:X 11 Jul 2022 17:07:22.276 # +sdown slave 10.0.0.7:6379 10.0.0.7 6379 @ mymaster 10.0.0.27 6379
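Log event reference
+reset-master: The master has been reset.
+slave: A new slave has been detected and attached by Sentinel.
+failover-state-reconf-slaves: The failover state changed to reconf-slaves.
+failover-detected: A failover started by another Sentinel, or a slave being turned into a master, was detected.
+slave-reconf-sent: The leader Sentinel sent the SLAVEOF command to the instance to set its new master.
+slave-reconf-inprog: The instance is configuring itself as a slave of the specified master, but the synchronization has not finished yet.
+slave-reconf-done: The slave has finished synchronizing with the new master.
-dup-sentinel: One or more Sentinels monitoring the given master were removed as duplicates. This happens, for example, when a Sentinel instance is restarted.
+sentinel: A new Sentinel monitoring the given master has been detected and added.
+sdown: The given instance is now subjectively down.
-sdown: The given instance is no longer subjectively down.
+odown: The given instance is now objectively down.
-odown: The given instance is no longer objectively down.
+new-epoch: The current epoch has been updated.
+try-failover: A new failover is in progress, waiting to be elected by the majority of Sentinels.
+elected-leader: Won the election for the specified epoch and can now carry out the failover.
+failover-state-select-slave: The failover is now in the select-slave state: Sentinel is looking for a slave that can be promoted to master.
no-good-slave: Sentinel could not find a slave suitable for promotion. It will try again after a while, or give up on the failover.
+selected-slave: Sentinel found a slave suitable for promotion.
+failover-state-send-slaveof-noone: Sentinel is promoting the selected slave to master and is waiting for the promotion to complete.
failover-end-for-timeout: The failover was terminated because of a timeout, but the slaves will eventually be configured to replicate with the new master anyway.
+failover-end: The failover finished successfully. All slaves now replicate from the new master.
+switch-master: The configuration changed: the master's IP address and port are different now. This is the message most external users care about.
+tilt: Entered tilt mode.
-tilt: Left tilt mode.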
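Because +switch-master is the event most external tools care about, a script can follow the Sentinel log and pick up the new master address from it. A minimal sketch, assuming the log line layout shown in the failover log above (`+switch-master <name> <old-ip> <old-port> <new-ip> <new-port>`); `latest_master_from_log` is a hypothetical helper name.

```shell
# Sketch: read a Sentinel log on stdin and print the address of the newest
# master announced by a +switch-master event, as "<ip>:<port>".
latest_master_from_log() {
    awk '
        {
            for (i = 1; i <= NF; i++) {
                # Fields after the event name: name, old ip, old port, new ip, new port
                if ($i == "+switch-master" && NF >= i + 5) {
                    latest = $(i + 4) ":" $(i + 5)
                }
            }
        }
        END { if (latest != "") print latest; else exit 1 }
    '
}
```

For example, `latest_master_from_log < /apps/redis/log/sentinel_26379.log` prints the most recent new master, and returns a non-zero status if the log contains no failover.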
After the failover
The master IP in the replicaof line of the redis configuration file is updated automatically
[root@slave1 ~]# grep "^replicaof" /apps/redis/etc/redis.conf
replicaof 10.0.0.27 6379
The IP in the sentinel monitor line of the sentinel configuration file is updated automatically
[root@slave1 ~]# grep "^sentinel monitor" /apps/redis/etc/redis-sentinel.conf
sentinel monitor mymaster 10.0.0.27 6379 2
redis status
New master status
[root@slave2 ~]# redis-cli -a 123456 Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe. 127.0.0.1:6379> info replication # Replication role:master connected_slaves:1 slave0:ip=10.0.0.17,port=6379,state=online,offset=4290787,lag=1 master_replid:590248f1058be0774dab136e8fb18a8e5b5e4078 master_replid2:90a5507845cbc6319a7f704ec666b28aa7e9b5ff master_repl_offset:4290787 second_repl_offset:3910006 repl_backlog_active:1 repl_backlog_size:1048576 repl_backlog_first_byte_offset:3242212 repl_backlog_histlen:1048576 127.0.0.1:6379>
The other slave now points to the new master
[root@slave1 ~]# redis-cli -a 123456 Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe. 127.0.0.1:6379> info replication # Replication role:slave master_host:10.0.0.27 master_port:6379 master_link_status:up master_last_io_seconds_ago:0 master_sync_in_progress:0 slave_repl_offset:4296387 slave_priority:100 slave_read_only:1 connected_slaves:0 master_replid:590248f1058be0774dab136e8fb18a8e5b5e4078 master_replid2:90a5507845cbc6319a7f704ec666b28aa7e9b5ff master_repl_offset:4296387 second_repl_offset:3910006 repl_backlog_active:1 repl_backlog_size:1048576 repl_backlog_first_byte_offset:3247812 repl_backlog_histlen:1048576 127.0.0.1:6379>
Bring the failed original master back into the redis replication group
[root@master redis]# systemctl start redis
Original master status
# The redis configuration now points to the new master node
[root@master redis]# grep "^replicaof" /apps/redis/etc/redis.conf
replicaof 10.0.0.27 6379
# Check the redis status
[root@master redis]# redis-cli -a 123456
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:6379> info replication
# Replication
role:slave
master_host:10.0.0.27
master_port:6379
master_link_status:up
master_last_io_seconds_ago:0
master_sync_in_progress:0
slave_repl_offset:4366815
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:590248f1058be0774dab136e8fb18a8e5b5e4078
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:4366815
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:4343555
repl_backlog_histlen:23261
# Check the sentinel status
[root@master redis]# redis-cli -a 123456 -p 26379
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:26379> info sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=10.0.0.27:6379,slaves=2,sentinels=3
New master status
# redis status
[root@slave2 ~]# redis-cli -a 123456
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:6379> info replication
# Replication
role:master
connected_slaves:2
slave0:ip=10.0.0.17,port=6379,state=online,offset=4407027,lag=0
slave1:ip=10.0.0.7,port=6379,state=online,offset=4407160,lag=0
master_replid:590248f1058be0774dab136e8fb18a8e5b5e4078
master_replid2:90a5507845cbc6319a7f704ec666b28aa7e9b5ff
master_repl_offset:4407293
second_repl_offset:3910006
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:3358718
repl_backlog_histlen:1048576
# sentinel log
[root@slave2 ~]# tail -f /apps/redis/log/sentinel_26379.log
1256:X 11 Jul 2022 17:07:17.049 # +new-epoch 4
1256:X 11 Jul 2022 17:07:17.052 # +vote-for-leader c663d4b9db845d721cd6dccf608c7904d896b745 4
1256:X 11 Jul 2022 17:07:17.068 # +odown master mymaster 10.0.0.7 6379 #quorum 3/2
1256:X 11 Jul 2022 17:07:17.068 # Next failover delay: I will not start a failover before Mon Jul 11 17:13:17 2022
1256:X 11 Jul 2022 17:07:18.149 # +config-update-from sentinel c663d4b9db845d721cd6dccf608c7904d896b745 10.0.0.7 26379 @ mymaster 10.0.0.7 6379
1256:X 11 Jul 2022 17:07:18.149 # +switch-master mymaster 10.0.0.7 6379 10.0.0.27 6379
1256:X 11 Jul 2022 17:07:18.149 * +slave slave 10.0.0.17:6379 10.0.0.17 6379 @ mymaster 10.0.0.27 6379
1256:X 11 Jul 2022 17:07:18.149 * +slave slave 10.0.0.7:6379 10.0.0.7 6379 @ mymaster 10.0.0.27 6379
1256:X 11 Jul 2022 17:07:21.189 # +sdown slave 10.0.0.7:6379 10.0.0.7 6379 @ mymaster 10.0.0.27 6379
1256:X 11 Jul 2022 17:43:54.361 # -sdown slave 10.0.0.7:6379 10.0.0.7 6379 @ mymaster 10.0.0.27 6379
sentinel operations
Manually take the master offline (force a failover)
sentinel failover <masterName>
Example
# A priority can be set; Sentinel prefers the replica with the lowest value as the new master. The default is 100, and 0 means the replica is never promoted.
[root@slave1 ~]# grep 'replica-priority' /apps/redis/etc/redis.conf
replica-priority 30
[root@slave1 ~]# redis-cli -a 123456 -p 26379
127.0.0.1:26379> sentinel failover mymaster
OK
127.0.0.1:26379> info sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=10.0.0.17:6379,slaves=2,sentinels=3
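Official documentation: https://redis.io/topics/cluster-tutorial
Every redis node uses the same hardware configuration, the same password and the same redis version
No redis server may contain any data
Prepare 6 machines in a three-master, three-slave layout
# Cluster nodes
Redis-node1: 10.0.0.7
Redis-node2: 10.0.0.17
Redis-node3: 10.0.0.27
Redis-node4: 10.0.0.37
Redis-node5: 10.0.0.47
Redis-node6: 10.0.0.57
# Reserved nodes
10.0.0.67
10.0.0.77
Modify the redis configuration
[root@node1 etc]# cat redis.conf
...
bind 0.0.0.0
masterauth 123456    # recommended; without it, master/slave replication fails later and has to be configured afterwards
requirepass 123456
cluster-enabled yes    # uncomment this line; cluster mode must be enabled, after which the redis process is shown with [cluster]
cluster-config-file nodes-6379.conf    # uncomment this line; the cluster state file records master/slave relationships and slot ranges, and is created and maintained by redis cluster itself
cluster-require-full-coverage no    # the default is yes; no keeps one unavailable node from making the whole cluster unavailable
...
[root@node1 etc]#systemctl enable --now redis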
# Check the listening ports
[root@node1 ~]# ss -ntl
State      Recv-Q Send-Q Local Address:Port    Peer Address:Port
LISTEN     0      511    *:6379                *:*
LISTEN     0      128    *:111                 *:*
LISTEN     0      128    *:22                  *:*
LISTEN     0      100    127.0.0.1:25          *:*
LISTEN     0      511    *:16379               *:*
LISTEN     0      128    [::]:111              [::]:*
LISTEN     0      128    [::]:22               [::]:*
LISTEN     0      100    [::1]:25              [::]:*
# The process is shown with the [cluster] flag
[root@node1 ~]# ps aux|grep redis
redis 24754 0.2 0.3 153996 3172 ? Ssl 21:28 0:02 /apps/redis/bin/redis-server 0.0.0.0:6379 [cluster]
root 24822 0.0 0.0 112812 980 pts/0 R+ 21:44 0:00 grep --color=auto redis
[root@node1 ~]# redis-cli -a 123456 --cluster create 10.0.0.7:6379 10.0.0.17:6379 10.0.0.27:6379 10.0.0.37:6379 \
10.0.0.47:6379 10.0.0.57:6379 --cluster-replicas 1
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
>>> Performing hash slots allocation on 6 nodes...
Master[0] -> Slots 0 - 5460
Master[1] -> Slots 5461 - 10922
Master[2] -> Slots 10923 - 16383
Adding replica 10.0.0.47:6379 to 10.0.0.7:6379
Adding replica 10.0.0.57:6379 to 10.0.0.17:6379
Adding replica 10.0.0.37:6379 to 10.0.0.27:6379
M: 4ccee0bb38763061cf567995bcdd9289cea9cfec 10.0.0.7:6379    # M marks a master
   slots:[0-5460] (5461 slots) master    # first and last slot of this master
M: 12fdc235442ed40a838e77b246025799b4b3357b 10.0.0.17:6379
   slots:[5461-10922] (5462 slots) master
M: 16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7 10.0.0.27:6379
   slots:[10923-16383] (5461 slots) master
S: 59eac16e6e2992cdfffe97934d7409afe21d2a9a 10.0.0.37:6379    # S marks a slave
   replicates 16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7
S: 15e2e2eccefd453f1a154fc42c6a9b030acacfb2 10.0.0.47:6379
   replicates 4ccee0bb38763061cf567995bcdd9289cea9cfec
S: 8c3b8146ce75ab277958937d4e79e893a15c50e2 10.0.0.57:6379
   replicates 12fdc235442ed40a838e77b246025799b4b3357b
Can I set the above configuration? (type 'yes' to accept): yes    # type yes to create the cluster automatically
>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join
...
>>> Performing Cluster Check (using node 10.0.0.7:6379)
M: 4ccee0bb38763061cf567995bcdd9289cea9cfec 10.0.0.7:6379
   slots:[0-5460] (5461 slots) master    # slots already assigned
   1 additional replica(s)    # one slave assigned
S: 59eac16e6e2992cdfffe97934d7409afe21d2a9a 10.0.0.37:6379
   slots: (0 slots) slave    # slaves are not assigned slots
   replicates 16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7    # ID of its master, 10.0.0.27
M: 12fdc235442ed40a838e77b246025799b4b3357b 10.0.0.17:6379
   slots:[5461-10922] (5462 slots) master
   1 additional replica(s)
S: 8c3b8146ce75ab277958937d4e79e893a15c50e2 10.0.0.57:6379
   slots: (0 slots) slave
   replicates 12fdc235442ed40a838e77b246025799b4b3357b    # ID of its master, 10.0.0.17
M: 16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7 10.0.0.27:6379
   slots:[10923-16383] (5461 slots) master
   1 additional replica(s)
S: 15e2e2eccefd453f1a154fc42c6a9b030acacfb2 10.0.0.47:6379
   slots: (0 slots) slave
   replicates 4ccee0bb38763061cf567995bcdd9289cea9cfec    # ID of its master, 10.0.0.7
[OK] All nodes agree about slots configuration.    # slot assignment finished on all nodes
>>> Check for open slots...    # check for open slots
>>> Check slots coverage...    # check slot coverage
[OK] All 16384 slots covered.    # all 16384 slots are assigned
[root@node1 ~]#
From the output above you can see 3 master/slave pairs
master:10.0.0.7-->slave:10.0.0.47
master:10.0.0.17-->slave:10.0.0.57
master:10.0.0.27-->slave:10.0.0.37
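The slot ranges in the output above (0-5460, 5461-10922, 10923-16383) come from splitting the 16384 hash slots as evenly as possible across the masters. The sketch below reproduces that split for illustration; it is an assumption about the arithmetic, not a copy of the redis-cli code, and `slot_ranges` is a hypothetical helper name.

```shell
# Sketch: split the 16384 hash slots evenly across N masters and print one
# "first-last" range per master.
slot_ranges() {
    awk -v n="$1" 'BEGIN {
        total = 16384
        per = total / n                 # may be fractional, e.g. 5461.33
        first = 0
        cursor = 0
        for (i = 1; i <= n; i++) {
            last = int(cursor + per - 1 + 0.5)      # round to the nearest slot
            if (i == n) last = total - 1            # last master takes the rest
            printf "%d-%d\n", first, last
            first = last + 1
            cursor += per
        }
    }'
}
```

`slot_ranges 3` prints the three ranges seen above, which also explains why the second master holds 5462 slots while the other two hold 5461.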
node1(10.0.0.7)
[root@node1 ~]# redis-cli -a 123456 -c info replication Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe. # Replication role:master connected_slaves:1 slave0:ip=10.0.0.47,port=6379,state=online,offset=1008,lag=1 master_replid:3493f56b2f698cea41c90cb0a41e1562b5821636 master_replid2:0000000000000000000000000000000000000000 master_repl_offset:1008 second_repl_offset:-1 repl_backlog_active:1 repl_backlog_size:1048576 repl_backlog_first_byte_offset:1 repl_backlog_histlen:1008
node2(10.0.0.17)
[root@node2 etc]# redis-cli -a 123456 -c info replication Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe. # Replication role:master connected_slaves:1 slave0:ip=10.0.0.57,port=6379,state=online,offset=1008,lag=0 master_replid:269568d06cb92748f583d6ea900e7563b1739f54 master_replid2:0000000000000000000000000000000000000000 master_repl_offset:1008 second_repl_offset:-1 repl_backlog_active:1 repl_backlog_size:1048576 repl_backlog_first_byte_offset:1 repl_backlog_histlen:1008
node3(10.0.0.27)
[root@node3 ~]# redis-cli -a 123456 -c info replication Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe. # Replication role:master connected_slaves:1 slave0:ip=10.0.0.37,port=6379,state=online,offset=1008,lag=0 master_replid:826e716b92aa4e287013a33f9786e529be2fff71 master_replid2:0000000000000000000000000000000000000000 master_repl_offset:1008 second_repl_offset:-1 repl_backlog_active:1 repl_backlog_size:1048576 repl_backlog_first_byte_offset:1 repl_backlog_histlen:1008
node4(10.0.0.37)
[root@node4 ~]# redis-cli -a 123456 -c info replication Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe. # Replication role:slave master_host:10.0.0.27 master_port:6379 master_link_status:up master_last_io_seconds_ago:6 master_sync_in_progress:0 slave_repl_offset:1008 slave_priority:100 slave_read_only:1 connected_slaves:0 master_replid:826e716b92aa4e287013a33f9786e529be2fff71 master_replid2:0000000000000000000000000000000000000000 master_repl_offset:1008 second_repl_offset:-1 repl_backlog_active:1 repl_backlog_size:1048576 repl_backlog_first_byte_offset:1 repl_backlog_histlen:1008
node5(10.0.0.47)
[root@node5 ~]# redis-cli -a 123456 -c info replication Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe. # Replication role:slave master_host:10.0.0.7 master_port:6379 master_link_status:up master_last_io_seconds_ago:4 master_sync_in_progress:0 slave_repl_offset:1008 slave_priority:100 slave_read_only:1 connected_slaves:0 master_replid:3493f56b2f698cea41c90cb0a41e1562b5821636 master_replid2:0000000000000000000000000000000000000000 master_repl_offset:1008 second_repl_offset:-1 repl_backlog_active:1 repl_backlog_size:1048576 repl_backlog_first_byte_offset:1 repl_backlog_histlen:1008
node6(10.0.0.57)
[root@node6 ~]# redis-cli -a 123456 -c info replication Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe. # Replication role:slave master_host:10.0.0.17 master_port:6379 master_link_status:up master_last_io_seconds_ago:10 master_sync_in_progress:0 slave_repl_offset:1008 slave_priority:100 slave_read_only:1 connected_slaves:0 master_replid:269568d06cb92748f583d6ea900e7563b1739f54 master_replid2:0000000000000000000000000000000000000000 master_repl_offset:1008 second_repl_offset:-1 repl_backlog_active:1 repl_backlog_size:1048576 repl_backlog_first_byte_offset:1 repl_backlog_histlen:1008
View the slave nodes of a given master
# List all nodes
[root@node1 ~]# redis-cli -a 123456 cluster nodes 2>/dev/null
59eac16e6e2992cdfffe97934d7409afe21d2a9a 10.0.0.37:6379@16379 slave 16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7 0 1657554345797 4 connected
4ccee0bb38763061cf567995bcdd9289cea9cfec 10.0.0.7:6379@16379 myself,master - 0 1657554345000 1 connected 0-5460
12fdc235442ed40a838e77b246025799b4b3357b 10.0.0.17:6379@16379 master - 0 1657554343746 2 connected 5461-10922
8c3b8146ce75ab277958937d4e79e893a15c50e2 10.0.0.57:6379@16379 slave 12fdc235442ed40a838e77b246025799b4b3357b 0 1657554344770 6 connected
16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7 10.0.0.27:6379@16379 master - 0 1657554344000 3 connected 10923-16383
15e2e2eccefd453f1a154fc42c6a9b030acacfb2 10.0.0.47:6379@16379 slave 4ccee0bb38763061cf567995bcdd9289cea9cfec 0 1657554344000 5 connected
# List the slaves of a master by its node ID; 16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7 is the ID of the master 10.0.0.27
[root@node1 ~]# redis-cli -a 123456 cluster slaves 16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7 2>/dev/null
1) "59eac16e6e2992cdfffe97934d7409afe21d2a9a 10.0.0.37:6379@16379 slave 16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7 0 1657554778157 4 connected"
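The `cluster nodes` output above is one line per node, with the address in the second field, the flags in the third and, for masters, the slot ranges from the ninth field on. A minimal sketch that lists only the masters and their slots; `cluster_masters` is a hypothetical helper name.

```shell
# Sketch: read `cluster nodes` output on stdin and print, for every master,
# "<ip:port> <slot ranges...>".
cluster_masters() {
    awk '
        $3 ~ /master/ {
            split($2, addr, "@")        # drop the cluster bus port
            line = addr[1]
            for (i = 9; i <= NF; i++) line = line " " $i
            print line
        }
    '
}
```

For example, `redis-cli -a 123456 cluster nodes 2>/dev/null | cluster_masters` would list `10.0.0.7:6379 0-5460` and the other two masters for the cluster above.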
[root@node1 ~]# redis-cli -a 123456 cluster info
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6    # 6 nodes
cluster_size:3    # 3 master groups
cluster_current_epoch:6
cluster_my_epoch:1
cluster_stats_messages_ping_sent:3639
cluster_stats_messages_pong_sent:3625
cluster_stats_messages_sent:7264
cluster_stats_messages_ping_received:3620
cluster_stats_messages_pong_received:3639
cluster_stats_messages_meet_received:5
cluster_stats_messages_received:7264
# Check the cluster state through any node
[root@node1 ~]# redis-cli -a 123456 --cluster info 10.0.0.27:6379
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
10.0.0.27:6379 (16bb6630...) -> 0 keys | 5461 slots | 1 slaves.
10.0.0.17:6379 (12fdc235...) -> 0 keys | 5462 slots | 1 slaves.
10.0.0.7:6379 (4ccee0bb...) -> 0 keys | 5461 slots | 1 slaves.
[OK] 0 keys in 3 masters.
0.00 keys per slot on average.
# List all nodes in the cluster
[root@node1 ~]# redis-cli -a 123456 cluster nodes
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
59eac16e6e2992cdfffe97934d7409afe21d2a9a 10.0.0.37:6379@16379 slave 16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7 0 1657556036000 4 connected
4ccee0bb38763061cf567995bcdd9289cea9cfec 10.0.0.7:6379@16379 myself,master - 0 1657556036000 1 connected 0-5460
12fdc235442ed40a838e77b246025799b4b3357b 10.0.0.17:6379@16379 master - 0 1657556036033 2 connected 5461-10922
8c3b8146ce75ab277958937d4e79e893a15c50e2 10.0.0.57:6379@16379 slave 12fdc235442ed40a838e77b246025799b4b3357b 0 1657556038079 6 connected
16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7 10.0.0.27:6379@16379 master - 0 1657556037057 3 connected 10923-16383
15e2e2eccefd453f1a154fc42c6a9b030acacfb2 10.0.0.47:6379@16379 slave 4ccee0bb38763061cf567995bcdd9289cea9cfec 0 1657556036000 5 connected
[root@node1 ~]# redis-cli -a 123456 --cluster check 10.0.0.27:6379
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
10.0.0.27:6379 (16bb6630...) -> 0 keys | 5461 slots | 1 slaves.
10.0.0.17:6379 (12fdc235...) -> 0 keys | 5462 slots | 1 slaves.
10.0.0.7:6379 (4ccee0bb...) -> 0 keys | 5461 slots | 1 slaves.
[OK] 0 keys in 3 masters.
0.00 keys per slot on average.
>>> Performing Cluster Check (using node 10.0.0.27:6379)
M: 16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7 10.0.0.27:6379
   slots:[10923-16383] (5461 slots) master
   1 additional replica(s)
S: 59eac16e6e2992cdfffe97934d7409afe21d2a9a 10.0.0.37:6379
   slots: (0 slots) slave
   replicates 16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7
S: 8c3b8146ce75ab277958937d4e79e893a15c50e2 10.0.0.57:6379
   slots: (0 slots) slave
   replicates 12fdc235442ed40a838e77b246025799b4b3357b
M: 12fdc235442ed40a838e77b246025799b4b3357b 10.0.0.17:6379
   slots:[5461-10922] (5462 slots) master
   1 additional replica(s)
M: 4ccee0bb38763061cf567995bcdd9289cea9cfec 10.0.0.7:6379
   slots:[0-5460] (5461 slots) master
   1 additional replica(s)
S: 15e2e2eccefd453f1a154fc42c6a9b030acacfb2 10.0.0.47:6379
   slots: (0 slots) slave
   replicates 4ccee0bb38763061cf567995bcdd9289cea9cfec
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
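The slot coverage that --cluster check reports can be approximated from plain cluster nodes output by summing the ranges owned by masters. A minimal sketch, assuming the nodes listing is piped in on stdin and that slot ranges start at the ninth column, as in the listings above; the function name assigned_slots is hypothetical, and slots in a migrating or importing state (printed in square brackets) are ignored.

```shell
# Hypothetical helper: sums the slot ranges owned by masters in
# `redis-cli cluster nodes` output read from stdin.
assigned_slots() {
    awk '
        $3 ~ /master/ {
            for (i = 9; i <= NF; i++) {
                if ($i ~ /^[0-9]+-[0-9]+$/) {      # a range such as 0-5460
                    split($i, r, "-")
                    total += r[2] - r[1] + 1
                } else if ($i ~ /^[0-9]+$/) {      # a single slot
                    total += 1
                }
            }
        }
        END { print total + 0 }
    '
}

# Example against a live node (not executed here):
#   redis-cli -a 123456 cluster nodes 2>/dev/null | assigned_slots
```

For the three masters shown above the ranges add up to 5461 + 5462 + 5461 = 16384, which matches the coverage line printed by the check.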
# When connected to a single node, a write fails if the key's slot is served by a different node
[root@shichu ~]# redis-cli -a 123456 -h 10.0.0.7
10.0.0.7:6379> set key1 v1
(error) MOVED 9189 10.0.0.17:6379
# Connect to the node that owns the slot in order to write
[root@shichu ~]# redis-cli -a 123456 -h 10.0.0.17
10.0.0.17:6379> set key1 values1
OK
10.0.0.17:6379> get key1
"values1"
# With -c (cluster mode) redis-cli follows redirections, so any node of the cluster can be used
[root@shichu ~]# redis-cli -a 123456 -h 10.0.0.7 -c
10.0.0.7:6379> set key1 v1
-> Redirected to slot [9189] located at 10.0.0.17:6379
OK
10.0.0.17:6379> get key1
"v1"
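The MOVED 9189 reply comes from the key-to-slot mapping of Redis Cluster: the CRC16 of the key modulo 16384. The following is a minimal sketch of that calculation in plain shell, assuming the CRC-16/XMODEM variant (polynomial 0x1021, initial value 0) and ignoring hash tags ({...}); the function names crc16 and key_slot are made up for this illustration.

```shell
# Hypothetical helpers, for illustration only.
# crc16 KEY: CRC-16/XMODEM of the bytes of KEY, printed in decimal.
crc16() {
    crc=0
    # od prints each byte of the key as an unsigned decimal number
    for byte in $(printf '%s' "$1" | od -An -v -tu1); do
        crc=$(( crc ^ (byte << 8) ))
        bit=0
        while [ "$bit" -lt 8 ]; do
            if [ $(( crc & 0x8000 )) -ne 0 ]; then
                crc=$(( ((crc << 1) ^ 0x1021) & 0xFFFF ))
            else
                crc=$(( (crc << 1) & 0xFFFF ))
            fi
            bit=$(( bit + 1 ))
        done
    done
    echo "$crc"
}

# key_slot KEY: slot number in the range 0..16383 (hash tags are not handled).
key_slot() {
    echo $(( $(crc16 "$1") % 16384 ))
}

key_slot key1
```

For key1 this prints 9189, the slot named in the MOVED reply; it falls inside 5461-10922, the range owned by 10.0.0.17. On a live cluster, redis-cli cluster keyslot key1 gives the authoritative answer.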
Official download page: https://www.zabbix.com/cn/download
Official documentation: https://www.zabbix.com/manuals
Source tarball: https://cdn.zabbix.com/zabbix/sources/stable/5.0/zabbix-5.0.25.tar.gz
Compiling and installing Zabbix 5 on LNMP
L: Linux (CentOS 7) https://mirrors.aliyun.com/centos/7/isos/x86_64/
N: Nginx (1.18.0) https://nginx.org/en/download.html
M: MySQL (8.0.19) https://dev.mysql.com/downloads/mysql/
P: PHP (7.4.11) http://php.net/downloads.php
Zabbix (5.0.25) https://cdn.zabbix.com/zabbix/sources/
graph LR
A[Client]
B[Linux<br/>Nginx<br/>PHP<br/>Zabbix<br/>10.0.0.100]
C[Linux<br/>MySQL<br/>10.0.0.200]
A--->B--->C
Reference: Binary installation of MySQL 8.0 on CentOS 7
After the installation, create the zabbix database and user
mysql -uroot -p123456 -e "create database zabbix character set utf8 collate utf8_bin;"
mysql -uroot -p123456 -e "create user zabbix@'10.0.0.%' identified by '123456'"
mysql -uroot -p123456 -e "grant all privileges on zabbix.* to zabbix@'10.0.0.%'"
mysql -uroot -p123456 -e "use mysql;\
alter user zabbix@'10.0.0.%' identified with mysql_native_password by '123456';\
flush privileges;"
Reference: Compiling and installing Nginx 1.18 on CentOS 7[^1]
Reference: Compiling and installing PHP 7.4 on CentOS 7[^2]
#!/bin/bash
# Compile and install Zabbix
source /etc/init.d/functions

# Zabbix version
Zabbix_Version=zabbix-5.0.25
Suffix=tar.gz
Zabbix=${Zabbix_Version}.${Suffix}
Password=123456
# Zabbix source download URL
Zabbix_url=https://cdn.zabbix.com/zabbix/sources/stable/5.0/zabbix-5.0.25.tar.gz
# Zabbix installation path
Zabbix_install_DIR=/apps/zabbix
# Number of CPUs
CPUS=`lscpu|grep "^CPU(s)"|awk '{print $2}'`
# OS type
os_type=`grep "^NAME" /etc/os-release |awk -F'"| ' '{print $2}'`
# OS version
os_version=`awk -F'"' '/^VERSION_ID/{print $2}' /etc/os-release`

# color MESSAGE STATUS: print MESSAGE with a green [ OK ] (STATUS 0) or a red [ FAILED ]
color () {
    if [[ $2 -eq 0 ]];then
        echo -e "\e[1;32m$1\t\t\t\t\t\t[ OK ]\e[0;m"
    else
        echo $2
        echo -e "\e[1;31m$1\t\t\t\t\t\t[ FAILED ]\e[0;m"
    fi
}

install_Zabbix (){
    #---------------------------- Download the source tarball ----------------------------
    cd /opt
    if [ -e ${Zabbix} ];then
        color "Zabbix source tarball already exists" 0
    else
        color "Downloading the Zabbix source tarball" 0
        wget ${Zabbix_url}
        if [ $? -ne 0 ];then
            color "Failed to download the Zabbix source tarball, exiting!" 1
            exit 1
        fi
    fi
    #---------------------------- Extract the source tarball ----------------------------
    color "Extracting the source tarball" 0
    tar -zxvf /opt/${Zabbix} -C /usr/local/src
    # -f and -n keep the script re-runnable when the link already exists
    ln -sfn /usr/local/src/${Zabbix_Version} /usr/local/src/zabbix
    #---------------------------- Install dependencies ----------------------------
    color "Installing dependencies" 0
    #wget https://dev.mysql.com/get/mysql80-community-release-el7-6.noarch.rpm
    yum install -y gcc libxml2-devel net-snmp net-snmp-devel curl curl-devel php-gd php-bcmath php-xml \
        php-mbstring mariadb mariadb-devel OpenIPMI-devel libevent-devel java-1.8.0-openjdk-devel \
        || { color "Failed to install dependencies, check the network" 1 ;exit 1;}
    #---------------------------- Create the zabbix user ----------------------------
    if id zabbix &> /dev/null ;then
        color "zabbix user already exists" 0
    else
        groupadd --system zabbix
        useradd --system -g zabbix -d /usr/lib/zabbix -s /sbin/nologin -c "Zabbix Monitoring System" zabbix
        color "zabbix user created" 0
    fi
    #---------------------------- Compile ----------------------------
    color "Compiling Zabbix" 0
    cd /usr/local/src/zabbix || exit 1
    # Only run make when configure succeeded, so the check below covers both steps
    ./configure --prefix=${Zabbix_install_DIR} \
        --enable-server \
        --enable-agent \
        --with-mysql \
        --with-net-snmp \
        --with-libcurl \
        --with-libxml2 \
        --with-openipmi \
        --enable-proxy \
        --enable-java \
        && make -j ${CPUS} install
    if [ $? -ne 0 ];then
        color "Zabbix compilation and installation failed!" 1
        exit 1
    else
        color "Zabbix compiled and installed successfully" 0
    fi
    # Copy the web frontend files
    mkdir -pv /home/nginx/zabbix
    cp -rf /usr/local/src/zabbix/ui/* /home/nginx/zabbix/
    chown nginx:nginx -R /home/nginx/zabbix
    # Smoke test: start zabbix_server once, then stop it again
    /apps/zabbix/sbin/zabbix_server -c /apps/zabbix/etc/zabbix_server.conf
    if [ $? -eq 0 ];then
        color "zabbix_server starts correctly" 0
        pkill zabbix
    fi
    color "Zabbix installation finished" 0
}

install_Zabbix
exit 0
Edit the /apps/nginx/conf/nginx.conf configuration file
worker_processes 1;
pid logs/nginx.pid;
events {
    worker_connections 1024;
}
http {
    include mime.types;
    default_type application/octet-stream;
    sendfile on;
    keepalive_timeout 65;
    server {
        listen 80;
        server_name 10.0.0.100;                  # host name
        server_tokens off;                       # hide the nginx version
        location / {
            root /home/nginx/zabbix;             # document root
            index index.php index.html index.htm;    # default index pages
        }
        error_page 500 502 503 504 /50x.html;
        location = /50x.html {
            root html;
        }
        location ~ \.php$ {                      # hand PHP requests to php-fpm
            root /home/nginx/zabbix;
            fastcgi_pass 127.0.0.1:9000;
            fastcgi_index index.php;
            fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
            include fastcgi_params;
            fastcgi_hide_header X-Powered-By;    # hide the PHP version
        }
        location ~ ^/(ping|pm_status)$ {         # php-fpm ping and status pages
            include fastcgi_params;
            fastcgi_pass 127.0.0.1:9000;
            fastcgi_param PATH_TRANSLATED $document_root$fastcgi_script_name;
        }
    }
}
Edit the PHP configuration files
# Edit /etc/php.ini
sed -i -e "/memory_limit/c memory_limit = 256M" \
       -e "/post_max_size/c post_max_size = 30M" \
       -e "/upload_max_filesize/c upload_max_filesize = 20M" \
       -e "/max_execution_time/c max_execution_time = 300" \
       -e "/max_input_time/c max_input_time = 300" \
       -e "/;date.timezone/c date.timezone = Asia/Shanghai" \
       /etc/php.ini
# Edit /apps/php/etc/php-fpm.d/www.conf
sed -i -e "/user = www/c user = nginx" \
       -e "/group = www/c group = nginx" /apps/php/etc/php-fpm.d/www.conf
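The sed c command used for php.ini replaces every line that matches the address with the given text. The following sketch shows the effect on a throwaway file rather than the real /etc/php.ini; it assumes GNU sed (as shipped with CentOS 7), and the function name demo_php_ini and the sample lines are invented for the illustration.

```shell
# Hypothetical demo: applies two of the edits to a temporary file and prints the result.
demo_php_ini() {
    ini=$(mktemp)
    printf '%s\n' 'memory_limit = 128M' 'post_max_size = 8M' ';date.timezone =' > "$ini"
    # "c" swaps the whole matching line for the text that follows it
    sed -i -e "/memory_limit/c memory_limit = 256M" \
           -e "/;date.timezone/c date.timezone = Asia/Shanghai" "$ini"
    cat "$ini"
    rm -f "$ini"
}

demo_php_ini
```

Because the address is a plain substring match, any line containing the pattern is replaced, including a comment that happens to mention it, so a quick grep of the target file before editing is a reasonable precaution.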
Restart the services
systemctl restart nginx php-fpm
Import the MySQL data
mysql -uzabbix -p123456 -h10.0.0.200 zabbix < /usr/local/src/zabbix/database/mysql/schema.sql
mysql -uzabbix -p123456 -h10.0.0.200 zabbix < /usr/local/src/zabbix/database/mysql/images.sql
mysql -uzabbix -p123456 -h10.0.0.200 zabbix < /usr/local/src/zabbix/database/mysql/data.sql
Edit the Zabbix configuration file
sed -i "/# DBHost=localhost/aDBHost=10.0.0.200" /apps/zabbix/etc/zabbix_server.conf
sed -i "/# DBPassword=/aDBPassword=123456" /apps/zabbix/etc/zabbix_server.conf
sed -i "/# DBPort=/aDBPort=3306" /apps/zabbix/etc/zabbix_server.conf
sed -i "/StatsAllowedIP=127.0.0.1/c #StatsAllowedIP=127.0.0.1" /apps/zabbix/etc/zabbix_server.conf
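The sed a command used here appends a new line after each line that matches the address, which leaves the commented default in place and puts the active setting right below it. A minimal sketch on a temporary file, assuming GNU sed; the function name demo_server_conf and the three sample lines are invented for the illustration.

```shell
# Hypothetical demo: appends settings below their commented defaults,
# then prints only the effective (non-comment, non-empty) lines.
demo_server_conf() {
    conf=$(mktemp)
    printf '%s\n' '# DBHost=localhost' 'DBName=zabbix' '# DBPassword=' > "$conf"
    sed -i "/# DBHost=localhost/aDBHost=10.0.0.200" "$conf"
    sed -i "/# DBPassword=/aDBPassword=123456" "$conf"
    grep -Ev '^[[:space:]]*(#|$)' "$conf"
    rm -f "$conf"
}

demo_server_conf
```

Running an append a second time adds the same line again, so the commands are best applied once to a fresh configuration file; the grep at the end is a quick way to review what is actually in effect.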
Create the systemd unit for zabbix_server
cat /lib/systemd/system/zabbix-server.service
[Unit]
Description=Zabbix Server
After=syslog.target
After=network.target

[Service]
Environment="CONFFILE=/apps/zabbix/etc/zabbix_server.conf"
EnvironmentFile=-/etc/default/zabbix-server
Type=forking
Restart=on-failure
PIDFile=/tmp/zabbix_server.pid
KillMode=control-group
ExecStart=/apps/zabbix/sbin/zabbix_server -c $CONFFILE
ExecStop=/bin/kill -SIGTERM $MAINPID
RestartSec=10s
TimeoutStopSec=5

[Install]
WantedBy=multi-user.target
Start the service
systemctl daemon-reload
systemctl enable --now zabbix-server
Create the systemd unit for zabbix_agent
cat /lib/systemd/system/zabbix-agent.service
[Unit]
Description=Zabbix Agent
After=syslog.target
After=network.target

[Service]
Environment="CONFFILE=/apps/zabbix/etc/zabbix_agentd.conf"
EnvironmentFile=-/etc/default/zabbix-agent
Type=forking
Restart=on-failure
PIDFile=/tmp/zabbix_agentd.pid
KillMode=control-group
ExecStart=/apps/zabbix/sbin/zabbix_agentd -c $CONFFILE
ExecStop=/bin/kill -SIGTERM $MAINPID
RestartSec=10s
User=zabbix
Group=zabbix

[Install]
WantedBy=multi-user.target
Start the service
systemctl daemon-reload
systemctl enable --now zabbix-agent
Check the status
# Ports 10050 (agent) and 10051 (server) should be listening
[root@shichu apps]# ss -ntl
State      Recv-Q Send-Q Local Address:Port    Peer Address:Port
LISTEN     0      128    *:22                  *:*
LISTEN     0      100    127.0.0.1:25          *:*
LISTEN     0      128    *:10050               *:*
LISTEN     0      128    *:10051               *:*
LISTEN     0      128    127.0.0.1:9000        *:*
LISTEN     0      128    *:111                 *:*
LISTEN     0      128    *:80                  *:*
LISTEN     0      128    [::]:22               [::]:*
LISTEN     0      100    [::1]:25              [::]:*
LISTEN     0      128    [::]:111              [::]:*
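Reading the listener table by eye works for a one-off check; for repeated checks the same test can be scripted. A minimal sketch, assuming ss -ntl output is piped in on stdin and that the local address is the fourth column, as in the listing above; the function name port_listening is hypothetical.

```shell
# Hypothetical helper: succeeds if the `ss -ntl` output on stdin
# contains a LISTEN socket whose local port equals $1.
port_listening() {
    awk -v port="$1" '
        $1 == "LISTEN" {
            n = split($4, a, ":")      # the port is the last ":"-separated piece
            if (a[n] == port) found = 1
        }
        END { exit !found }
    '
}

# Example on the Zabbix host (not executed here):
#   ss -ntl | port_listening 10051 && echo "zabbix_server is listening"
```

Taking the last colon-separated piece handles both IPv4 entries such as *:10050 and IPv6 entries such as [::]:22.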
[root@shichu apps]# systemctl status zabbix-server
● zabbix-server.service - Zabbix Server
   Loaded: loaded (/usr/lib/systemd/system/zabbix-server.service; disabled; vendor preset: disabled)
   Active: active (running) since Thu 2022-07-14 00:47:09 CST; 52s ago
  Process: 8346 ExecStop=/bin/kill -SIGTERM $MAINPID (code=exited, status=0/SUCCESS)
  Process: 8352 ExecStart=/apps/zabbix/sbin/zabbix_server -c $CONFFILE (code=exited, status=0/SUCCESS)
 Main PID: 8360 (zabbix_server)
   CGroup: /system.slice/zabbix-server.service
           ├─8360 /apps/zabbix/sbin/zabbix_server -c /apps/zabbix/etc/zabbix_server.conf
           ├─8362 /apps/zabbix/sbin/zabbix_server: configuration syncer [synced configuration in 0.059399 sec, idle 6...
           ├─8363 /apps/zabbix/sbin/zabbix_server: alert manager #1 [sent 0, failed 0 alerts, idle 5.027609 sec durin...
           ├─8364 /apps/zabbix/sbin/zabbix_server: alerter #1 started
           ├─8365 /apps/zabbix/sbin/zabbix_server: alerter #2 started
           ├─8366 /apps/zabbix/sbin/zabbix_server: alerter #3 started
           ├─8367 /apps/zabbix/sbin/zabbix_server: preprocessing manager #1 [queued 0, processed 11 values, idle 5.00...
           ├─8368 /apps/zabbix/sbin/zabbix_server: preprocessing worker #1 started
           ├─8369 /apps/zabbix/sbin/zabbix_server: preprocessing worker #2 started
           ├─8370 /apps/zabbix/sbin/zabbix_server: preprocessing worker #3 started
           ├─8371 /apps/zabbix/sbin/zabbix_server: lld manager #1 [processed 0 LLD rules, idle 5.008702sec during 5.0...
           ├─8372 /apps/zabbix/sbin/zabbix_server: lld worker #1 started
           ├─8373 /apps/zabbix/sbin/zabbix_server: lld worker #2 started
           ├─8374 /apps/zabbix/sbin/zabbix_server: housekeeper [startup idle for 30 minutes]
           ├─8375 /apps/zabbix/sbin/zabbix_server: timer #1 [updated 0 hosts, suppressed 0 events in 0.001868 sec, id...
           ├─8376 /apps/zabbix/sbin/zabbix_server: http poller #1 [got 0 values in 0.001502 sec, idle 5 sec]
           ├─8377 /apps/zabbix/sbin/zabbix_server: discoverer #1 [processed 0 rules in 0.004759 sec, idle 60 sec]
           ├─8378 /apps/zabbix/sbin/zabbix_server: history syncer #1 [processed 0 values, 0 triggers in 0.000050 sec,...
           ├─8379 /apps/zabbix/sbin/zabbix_server: history syncer #2 [processed 0 values, 0 triggers in 0.000175 sec,...
           ├─8380 /apps/zabbix/sbin/zabbix_server: history syncer #3 [processed 0 values, 0 triggers in 0.000029 sec,...
           ├─8381 /apps/zabbix/sbin/zabbix_server: history syncer #4 [processed 0 values, 0 triggers in 0.000019 sec,...
           ├─8382 /apps/zabbix/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.004440 sec, idle 3 sec]...
           ├─8383 /apps/zabbix/sbin/zabbix_server: proxy poller #1 [exchanged data with 0 proxies in 0.000028 sec, id...
           ├─8384 /apps/zabbix/sbin/zabbix_server: self-monitoring [processed data in 0.000016 sec, idle 1 sec]
           ├─8385 /apps/zabbix/sbin/zabbix_server: task manager [processed 0 task(s) in 0.000836 sec, idle 5 sec]
           ├─8386 /apps/zabbix/sbin/zabbix_server: poller #1 [got 0 values in 0.000050 sec, idle 1 sec]
           ├─8387 /apps/zabbix/sbin/zabbix_server: poller #2 [got 0 values in 0.000048 sec, idle 1 sec]
           ├─8388 /apps/zabbix/sbin/zabbix_server: poller #3 [got 1 values in 0.001602 sec, idle 1 sec]
           ├─8389 /apps/zabbix/sbin/zabbix_server: poller #4 [got 0 values in 0.000019 sec, idle 1 sec]
           ├─8390 /apps/zabbix/sbin/zabbix_server: poller #5 [got 0 values in 0.001402 sec, idle 1 sec]
           ├─8391 /apps/zabbix/sbin/zabbix_server: unreachable poller #1 [got 0 values in 0.000039 sec, idle 5 sec]
           ├─8392 /apps/zabbix/sbin/zabbix_server: trapper #1 [processed data in 0.000000 sec, waiting for connection...
           ├─8393 /apps/zabbix/sbin/zabbix_server: trapper #2 [processed data in 0.000000 sec, waiting for connection...
           ├─8394 /apps/zabbix/sbin/zabbix_server: trapper #3 [processed data in 0.000000 sec, waiting for connection...
           ├─8395 /apps/zabbix/sbin/zabbix_server: trapper #4 [processed data in 0.000000 sec, waiting for connection...
           ├─8396 /apps/zabbix/sbin/zabbix_server: trapper #5 [processed data in 0.000000 sec, waiting for connection...
           ├─8397 /apps/zabbix/sbin/zabbix_server: icmp pinger #1 [got 0 values in 0.000020 sec, idle 5 sec]
           └─8398 /apps/zabbix/sbin/zabbix_server: alert syncer [queued 0 alerts(s), flushed 0 result(s) in 0.001557 ...
Jul 14 00:47:08 shichu systemd[1]: Starting Zabbix Server...
Jul 14 00:47:09 shichu systemd[1]: Started Zabbix Server.
zabbix-agent service status
[root@shichu apps]# systemctl status zabbix-agent
● zabbix-agent.service - Zabbix Agent
   Loaded: loaded (/usr/lib/systemd/system/zabbix-agent.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2022-07-14 00:47:09 CST; 58s ago
  Process: 8349 ExecStart=/apps/zabbix/sbin/zabbix_agentd -c $CONFFILE (code=exited, status=0/SUCCESS)
 Main PID: 8353 (zabbix_agentd)
   CGroup: /system.slice/zabbix-agent.service
           ├─8353 /apps/zabbix/sbin/zabbix_agentd -c /apps/zabbix/etc/zabbix_agentd.conf
           ├─8354 /apps/zabbix/sbin/zabbix_agentd: collector [idle 1 sec]
           ├─8355 /apps/zabbix/sbin/zabbix_agentd: listener #1 [waiting for connection]
           ├─8356 /apps/zabbix/sbin/zabbix_agentd: listener #2 [waiting for connection]
           ├─8357 /apps/zabbix/sbin/zabbix_agentd: listener #3 [waiting for connection]
           └─8358 /apps/zabbix/sbin/zabbix_agentd: active checks #1 [idle 1 sec]
Jul 14 00:47:08 shichu systemd[1]: Starting Zabbix Agent...
Jul 14 00:47:09 shichu systemd[1]: Started Zabbix Agent.
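Web setup
Open the server IP (10.0.0.100) in a browser
The generated configuration file (zabbix.conf.php) has to be downloaded manually and uploaded to /home/nginx/zabbix/conf/ on the Zabbix server
Default username: Admin (note the capital A); password: zabbix
Displaying Chinese
Upload a font file with Chinese glyphs (referred to below as simkai) to the font directory: /home/nginx/zabbix/assets/fonts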
vim /home/nginx/zabbix/include/defines.inc.php
# Change the following two definitions
//define('ZBX_GRAPH_FONT_NAME', 'DejaVuSans'); // font file name
define('ZBX_GRAPH_FONT_NAME', 'simkai'); // font file name
#define('ZBX_FONT_NAME', 'DejaVuSans');
define('ZBX_FONT_NAME', 'simkai');
The font takes effect immediately; neither Zabbix nor Nginx needs to be restarted
Install the agent with yum
yum install zabbix50-agent
Edit the agent configuration file
[root@nginx ~]# grep '^[a-zA-Z]' /etc/zabbix_agentd.conf
PidFile=/run/zabbix/zabbix_agentd.pid
LogFile=/var/log/zabbix/zabbix_agentd.log
LogFileSize=0
Server=10.0.0.100          # IP of the Zabbix server or proxy
ListenPort=10050           # listening port (default value)
StartAgents=3              # number of processes started for passive checks; 0 means no port is listened on
ServerActive=10.0.0.100    # IP of the Zabbix server or proxy for active checks
Hostname=10.0.0.7          # case-sensitive and unique on the Zabbix server; the local IP is used here
Include=/etc/zabbix_agentd.conf.d/*.conf    # added at the end of the file: path of the drop-in configuration files
Start the service
mkdir -p /etc/zabbix_agentd.conf.d
systemctl start zabbix-agent
Check the status
[root@nginx ~]# systemctl status zabbix-agent
● zabbix-agent.service - Zabbix Monitor Agent
   Loaded: loaded (/usr/lib/systemd/system/zabbix-agent.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2022-07-14 16:07:35 CST; 1s ago
 Main PID: 1511 (zabbix_agentd)
   CGroup: /system.slice/zabbix-agent.service
           ├─1511 /usr/sbin/zabbix_agentd -f
           ├─1512 /usr/sbin/zabbix_agentd: collector [idle 1 sec]
           ├─1513 /usr/sbin/zabbix_agentd: listener #1 [waiting for connection]
           ├─1514 /usr/sbin/zabbix_agentd: listener #2 [waiting for connection]
           └─1515 /usr/sbin/zabbix_agentd: listener #3 [waiting for connection]
Jul 14 16:07:35 nginx systemd[1]: Stopped Zabbix Monitor Agent.
Jul 14 16:07:35 nginx systemd[1]: Started Zabbix Monitor Agent.
Jul 14 16:07:35 nginx zabbix_agentd[1511]: Starting Zabbix Agent [10.0.0.7]. Zabbix 5.0.21 (revision 47104dd574).
Jul 14 16:07:35 nginx zabbix_agentd[1511]: Press Ctrl+C to exit.
Add the monitored host in the web interface
Configuration → Hosts → Create host
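# Add the nginx status configuration
[root@nginx ~]# cat /etc/nginx/nginx.conf
# Add the status page inside the server block
...
location /nginx_status {
    stub_status;
    allow 10.0.0.0/24;
    allow 127.0.0.1;
    deny all;    # without a deny rule the allow lines restrict nothing
}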
[root@nginx etc]# cat /etc/zabbix_agentd.d/nginx_status.sh
#!/bin/bash
nginx_status_fun(){ # collects one metric from the stub_status page
    NGINX_PORT=$1     # port: first function argument, i.e. the script's second argument
    NGINX_COMMAND=$2  # metric name: second function argument, i.e. the script's third argument
    nginx_active(){ # active connections; the status page is queried on the local host only (same for the functions below)
        /usr/bin/curl "http://127.0.0.1:"$NGINX_PORT"/nginx_status" 2>/dev/null| grep 'Active' | awk '{print $NF}'
    }
    nginx_reading(){ # connections in the Reading state
        /usr/bin/curl "http://127.0.0.1:"$NGINX_PORT"/nginx_status" 2>/dev/null| grep 'Reading' | awk '{print $2}'
    }
    nginx_writing(){
        /usr/bin/curl "http://127.0.0.1:"$NGINX_PORT"/nginx_status" 2>/dev/null| grep 'Writing' | awk '{print $4}'
    }
    nginx_waiting(){
        /usr/bin/curl "http://127.0.0.1:"$NGINX_PORT"/nginx_status" 2>/dev/null| grep 'Waiting' | awk '{print $6}'
    }
    nginx_accepts(){
        /usr/bin/curl "http://127.0.0.1:"$NGINX_PORT"/nginx_status" 2>/dev/null| awk NR==3 | awk '{print $1}'
    }
    nginx_handled(){
        /usr/bin/curl "http://127.0.0.1:"$NGINX_PORT"/nginx_status" 2>/dev/null| awk NR==3 | awk '{print $2}'
    }
    nginx_requests(){
        /usr/bin/curl "http://127.0.0.1:"$NGINX_PORT"/nginx_status" 2>/dev/null| awk NR==3 | awk '{print $3}'
    }
    case $NGINX_COMMAND in
        active)
            nginx_active;
            ;;
        reading)
            nginx_reading;
            ;;
        writing)
            nginx_writing;
            ;;
        waiting)
            nginx_waiting;
            ;;
        accepts)
            nginx_accepts;
            ;;
        handled)
            nginx_handled;
            ;;
        requests)
            nginx_requests;
            ;;
    esac
}
main(){ # entry point
    case $1 in
        nginx_status) # dispatch on the first argument
            nginx_status_fun $2 $3; # pass the second and third script arguments on to nginx_status_fun
            ;;
        status) # HTTP status code of the status page
            # -I fetches only the response headers, -s suppresses progress output
            curl -I -s http://127.0.0.1/nginx_status 2>/dev/null | awk 'NR==1{print $2}';
            ;;
        *) # anything else prints the usage message
            echo $"Usage: $0 {nginx_status key}"
    esac
}
main $1 $2 $3
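The script runs one curl, grep and awk pipeline per metric. As an alternative, the status page text can be parsed in a single awk pass. This is a sketch of that idea rather than a replacement for the script: it assumes the four-line stub_status layout shown in these notes, reads the page from stdin, and uses the hypothetical function name stub_metric.

```shell
# Hypothetical helper: reads stub_status text on stdin and prints the
# metric named in $1 (active, accepts, handled, requests, reading,
# writing, waiting). Fails if the metric is unknown or missing.
stub_metric() {
    awk -v want="$1" '
        /^Active connections:/ { m["active"] = $3 }
        NR == 3 { m["accepts"] = $1; m["handled"] = $2; m["requests"] = $3 }
        /^Reading:/ { m["reading"] = $2; m["writing"] = $4; m["waiting"] = $6 }
        END { if (want in m) print m[want]; else exit 1 }
    '
}

# Example on the nginx host (not executed here):
#   curl -s http://127.0.0.1:80/nginx_status | stub_metric active
```

Keeping the fetch and the parsing separate also makes the parsing testable against a saved copy of the status page.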
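Add the custom Zabbix agent item through a drop-in configuration file; the script must be executable (chmod +x /etc/zabbix_agentd.d/nginx_status.sh)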
[root@nginx etc]# cat /etc/zabbix_agentd.conf.d/nginx_monitor.conf
UserParameter=nginx_status[*],/etc/zabbix_agentd.d/nginx_status.sh "$1" "$2" "$3"
Verification
# Restart the services
systemctl restart nginx zabbix-agent
# Fetch the full nginx status locally
[root@nginx zabbix_agentd.d]# curl 127.0.0.1/nginx_status
Active connections: 1
server accepts handled requests
 21 21 21
Reading: 0 Writing: 1 Waiting: 0
# Query the active connection count on the agent host
[root@nginx zabbix_agentd.d]# /etc/zabbix_agentd.d/nginx_status.sh nginx_status 80 active
1
# Query the active connection count from the server
[root@zabbix ~]# /apps/zabbix/bin/zabbix_get -s 10.0.0.7 -p 10050 -k "nginx_status["nginx_status",80,"active"]"
1
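The nested double quotes in the zabbix_get key deserve a note: each inner quote closes the outer one, so the shell joins the pieces and the inner quotes never reach zabbix_get. The sketch below only prints what the shell builds; the variable names passed_key and quoted_key are invented for the illustration.

```shell
# Double quotes inside double quotes: the string is split into adjacent
# quoted and unquoted pieces that the shell concatenates.
passed_key="nginx_status["nginx_status",80,"active"]"
printf '%s\n' "$passed_key"

# Single quotes keep the inner double quotes as literal characters.
quoted_key='nginx_status["nginx_status",80,"active"]'
printf '%s\n' "$quoted_key"
```

The first line printed is nginx_status[nginx_status,80,active] and the second keeps the quotes. Both forms name the same item here because none of the parameters contains a comma or a bracket; single quotes are the safer habit once a parameter does.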
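Import the monitoring template
Template reference: nginx-template.xml
Link the template
View the items of the imported nginx template
Verify the monitoring
master (10.0.0.17)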
# Edit the configuration
vim /etc/my.cnf.d/server.cnf
[mysqld]
bind=0.0.0.0
server-id=17
log-bin
# Restart the database
systemctl restart mariadb
# Create the replication user
MariaDB [(none)]> create user 'repluser'@'10.0.0.%';
Query OK, 0 rows affected (0.00 sec)
# Grant replication privileges to the user
MariaDB [(none)]> grant replication slave on *.* to 'repluser'@'10.0.0.%';
Query OK, 0 rows affected (0.00 sec)
# Back up the data
[root@mysql-master ~]# mysqldump --all-databases --single_transaction --flush-logs --master-data=2 \
--lock-tables > /opt/backup.sql
# Copy the backup to the slave node
[root@mysql-master ~]# scp /opt/backup.sql 10.0.0.27:/opt/
# Show the binary log files and their sizes
[root@mysql-master ~]# mysql
MariaDB [(none)]> show master logs;
+--------------------+-----------+
| Log_name           | File_size |
+--------------------+-----------+
| mariadb-bin.000001 |     29733 |
| mariadb-bin.000002 |       245 |
+--------------------+-----------+
2 rows in set (0.00 sec)
slave (10.0.0.27)
# Edit the configuration
vim /etc/my.cnf.d/server.cnf
[mysqld]
bind=0.0.0.0
server-id=27
read-only
# Restart the database
systemctl restart mariadb
# Import the backup taken on the master
[root@slave ~]# mysql < /opt/backup.sql
# Configure replication from the master's binary log coordinates
# MASTER_LOG_FILE and MASTER_LOG_POS correspond to Log_name and File_size on the master (see show master logs);
# the coordinates at dump time are also recorded in the commented CHANGE MASTER TO line that --master-data=2 writes into /opt/backup.sql
[root@mysql-slave ~]# mysql
MariaDB [(none)]> CHANGE MASTER TO
    MASTER_HOST='10.0.0.17',
    MASTER_USER='repluser',
    MASTER_PASSWORD='',
    MASTER_PORT=3306,
    MASTER_LOG_FILE='mariadb-bin.000001',
    MASTER_LOG_POS=29733,
    MASTER_CONNECT_RETRY=10;
# Start the slave threads
MariaDB [(none)]> start slave;
# Show the replication status
MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 10.0.0.17
                  Master_User: repluser
                  Master_Port: 3306
                Connect_Retry: 10
              Master_Log_File: mariadb-bin.000002
          Read_Master_Log_Pos: 245
               Relay_Log_File: mariadb-relay-bin.000003
                Relay_Log_Pos: 531
        Relay_Master_Log_File: mariadb-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
......
             Master_Server_Id: 17
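The two lines that matter most in the slave status are Slave_IO_Running and Slave_SQL_Running: replication is working only when both read Yes. A minimal sketch of a scripted check, assuming the \G-formatted status is piped in on stdin; the function name replication_ok is hypothetical.

```shell
# Hypothetical helper: reads `show slave status\G` output on stdin and
# succeeds only if both replication threads report Yes.
replication_ok() {
    awk -F': *' '
        { gsub(/^[ \t]+/, "", $1) }                 # keys are right-aligned
        $1 == "Slave_IO_Running"  { io = $2 }
        $1 == "Slave_SQL_Running" { sql = $2 }
        END { exit !(io == "Yes" && sql == "Yes") }
    '
}

# Example on the slave (not executed here):
#   mysql -e 'show slave status\G' | replication_ok && echo "replication running"
```

Matching the key names exactly avoids picking up similarly named fields such as Slave_SQL_Running_State.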
Official download page: https://www.percona.com/downloads/
Package: https://www.percona.com/downloads/percona-monitoring-plugins/LATEST/
# Download
wget https://downloads.percona.com/downloads/percona-monitoring-plugins/percona-monitoring-plugins-1.1.8/binary/redhat/7/x86_64/percona-zabbix-templates-1.1.8-1.noarch.rpm
# Install
yum install -y percona-zabbix-templates-1.1.8-1.noarch.rpm
# Install PHP
yum install -y php php-mysql
# Copy the UserParameter file shipped with the templates
cp /var/lib/zabbix/percona/templates/userparameter_percona_mysql.conf /etc/zabbix_agentd.conf.d/
# Create the PHP file holding the MySQL credentials
vim /var/lib/zabbix/percona/scripts/ss_get_mysql_stats.php.cnf
<?php
$mysql_user = 'root';
$mysql_pass = '';
# Restart the agent
systemctl restart zabbix-agent
[root@zabbix ~]# /apps/zabbix/bin/zabbix_get -s 10.0.0.17 -p 10050 -k MySQL.Key-reads
19
[root@zabbix ~]# /apps/zabbix/bin/zabbix_get -s 10.0.0.27 -p 10050 -k MySQL.Key-reads
0
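Link the template to the hosts
Note: the bundled template /var/lib/zabbix/percona/templates/zabbix_agent_template_percona_mysql_server_ht_2.0.9-sver1.1.8.xml cannot be used as shipped and has to be modified first.
Template reference: siyuan://blocks/20220715151809-f0mrj0m
Verify the monitoring
1. In active mode the monitoring data arrives normally, but the ZBX icon stays grey instead of turning green
Fix: in the template Template OS Linux by Zabbix agent active, unlink and clear the linked template Template Module Zabbix agent active, then link Template Module Zabbix agent instead.
The ZBX icon turns green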
[root@nginx tmp]# grep '^[a-zA-Z]' /etc/zabbix_agentd.conf
PidFile=/run/zabbix/zabbix_agentd.pid
LogFile=/var/log/zabbix/zabbix_agentd.log
LogFileSize=0
EnableRemoteCommands=1     # enable remote command execution
Server=10.0.0.100
ListenPort=10050
StartAgents=3
ServerActive=10.0.0.100
Hostname=10.0.0.7
User=zabbix
UnsafeUserParameters=1     # allow special characters in the arguments passed to user parameters
Include=/etc/zabbix_agentd.conf.d/*.conf
[root@nginx ~]# vim /etc/sudoers
......
root    ALL=(ALL)       ALL
zabbix  ALL=NOPASSWD:ALL    # lets the zabbix user run privileged commands through sudo without a password
Restart the service
systemctl restart zabbix-agent
Add the commands to run in the action's operations
Remote commands must be written with absolute paths
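Log in to the personal mailbox and enable SMTP
Obtain the authorization code by sending an SMS
Mailbox setup reference: https://service.mail.qq.com/cgi-bin/help?subtype=1&&id=28&&no=371
The password is the authorization code obtained above
Select the Admin user
Open Media and click Add
For Type, choose the media type created earlier; for Send to, enter the recipient
Update the media
Email notification sent when the problem occurs
Email notification sent after recovery
Check port 80
nginx recovers automatically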