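MHA (Master High Availability) monitors the master node and can automatically fail over to one of the slave nodes by promoting that slave to be the new master. It is built on master-slave replication and also needs cooperation from the client side. MHA currently mainly supports a one-master, multiple-slave architecture. To build MHA, a replication cluster must contain at least three database servers, one master and two slaves: one acts as the master, one as the standby master, and the other as a slave.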
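How MHA works:
MHA uses the statement SELECT 1 As Value to check the health of the master server. Once the master goes down, MHA performs the following steps:
Save the binary log events (binlog events) from the crashed master
Identify the slave that has the latest updates
Apply the differential relay log to the other slaves
Apply the binary log events (binlog events) saved from the master
Promote one slave to be the new master
Make the other slaves connect to the new master and replicate from it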
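The recovery sequence above can be illustrated with a small simulation. This is only a minimal sketch in Python, not MHA's implementation: events are modelled as list items, the names `Slave` and `recover_and_promote` are hypothetical, and for simplicity the slave with the most events is assumed to be the one that gets promoted.

```python
from dataclasses import dataclass, field


@dataclass
class Slave:
    """Hypothetical model: a slave and the master events it has received."""
    name: str
    events: list = field(default_factory=list)


def recover_and_promote(slaves, saved_binlog_events):
    # Identify the slave that holds the latest updates.
    latest = max(slaves, key=lambda s: len(s.events))
    latest_events = list(latest.events)  # snapshot before anything is modified

    for slave in slaves:
        # Apply the differential relay log events this slave is missing.
        slave.events.extend(latest_events[len(slave.events):])
        # Apply the binlog events that were saved from the crashed master.
        slave.events.extend(saved_binlog_events)

    # Promote one slave (assumption: the most up-to-date one) and
    # let the remaining slaves replicate from it.
    new_master = latest
    replicas = [s for s in slaves if s is not new_master]
    return new_master, replicas


if __name__ == "__main__":
    s1 = Slave("10.0.0.18", ["e1", "e2", "e3"])
    s2 = Slave("10.0.0.28", ["e1", "e2"])
    master, replicas = recover_and_promote([s1, s2], ["e4"])
    print(master.name, [r.name for r in replicas], s2.events)
```

After the call every slave holds the same event list, which is the property the real procedure is meant to guarantee before replication resumes.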
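MHA software
MHA consists of two parts: the Manager toolkit and the Node toolkit.
The Manager toolkit mainly includes the following tools:
masterha_check_ssh              #check the MHA SSH configuration
masterha_check_repl             #check the MySQL replication status
masterha_manager                #start MHA
masterha_check_status           #check the current MHA running status
masterha_master_monitor         #detect whether the master is down
masterha_master_switch          #failover (automatic or manual)
masterha_conf_host              #add or remove configured server information
masterha_stop --conf=app1.cnf   #stop MHA
masterha_secondary_check        #check the availability of the MySQL master over two or more network routes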
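Node toolkit (these tools are usually triggered by the MHA Manager scripts and need no manual operation); it mainly includes the following tools:
save_binary_logs        #save and copy the master's binary logs
apply_diff_relay_logs   #identify the differential relay log events and apply them to the other slaves
filter_mysqlbinlog      #remove unnecessary ROLLBACK events (MHA no longer uses this tool)
purge_relay_logs        #purge relay logs (without blocking the SQL thread)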
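MHA custom extensions:
secondary_check_script          #check the availability of the master over multiple network routes
master_ip_failover_script       #update the master IP used by the application
shutdown_script                 #forcibly shut down the master node
report_script                   #send a report
init_conf_load_script           #load initial configuration parameters
master_ip_online_change_script  #update the master node IP address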
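MHA configuration files:
global configuration: provides default settings for all applications; the default file path is /etc/masterha_default.cnf
application configuration: one per master-slave replication cluster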
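The layering of global and application configuration can be sketched as follows. This is an illustrative Python example using the standard `configparser` module, not MHA's own parser; the function name `effective_settings`, the sample global file content and the value `ping_interval=3` are assumptions made for the example. It assumes that an application file overrides the global defaults and that a `[serverN]` section overrides both.

```python
import configparser

# Hypothetical content of /etc/masterha_default.cnf (global defaults).
GLOBAL_CNF = """
[server default]
user=mhauser
ssh_user=root
ping_interval=3
"""

# Hypothetical application file for one replication cluster.
APP_CNF = """
[server default]
ping_interval=1
manager_workdir=/data/mastermha/app1/

[server1]
hostname=10.0.0.8
candidate_master=1

[server3]
hostname=10.0.0.28
"""


def _parse(text):
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser


def effective_settings(global_text, app_text):
    """Return {server section: merged settings}, lowest priority first."""
    global_cfg, app_cfg = _parse(global_text), _parse(app_text)
    defaults = {}
    for cfg in (global_cfg, app_cfg):
        if cfg.has_section("server default"):
            defaults.update(cfg["server default"])

    servers = {}
    for section in app_cfg.sections():
        if section == "server default":
            continue
        merged = dict(defaults)
        merged.update(app_cfg[section])  # per-server values win
        servers[section] = merged
    return servers


if __name__ == "__main__":
    for name, settings in effective_settings(GLOBAL_CNF, APP_CNF).items():
        print(name, settings)
```

Note that `configparser` returns every value as a string, so numeric settings would need explicit conversion before use.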
Hands-on example: implementing MHA
Environment: four hosts
10.0.0.7 CentOS7 MHA manager
10.0.0.8 CentOS8 MySQL8.0 Master
10.0.0.18 CentOS8 MySQL8.0 Slave1
10.0.0.28 CentOS8 MySQL8.0 Slave2
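1. Install the two packages mha4mysql-manager and mha4mysql-node on the manager node
Notes:
mha4mysql-manager-0.56-0.el6.noarch.rpm does not support CentOS 8; it only supports CentOS 7 and earlier
mha4mysql-manager-0.58-0.el7.centos.noarch.rpm supports MySQL 5.7 and MySQL 8.0, but is not compatible with MariaDB 10.3.17 on CentOS 8
The two packages:
mha4mysql-manager
mha4mysql-node
Install both RPM packages on the manager node: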
[root@MHA-Manager ~]# ll
total 144
-rw-------. 1 root root  1764 Apr 24 12:19 anaconda-ks.cfg
-rw-r--r--  1 root root     0 Apr 28 11:20 a.txt
-rw-r--r--  1 root root   587 May 12 21:13 CentOS-8.repo
-rw-r--r--  1 root root   920 Apr 29 14:50 f1.txt
-rw-r--r--  1 root root  1601 Apr 29 14:50 f2.txt
-rw-r--r--  1 root root  1076 May  2 09:38 ks-centos8.cfg
-rw-r--r--  1 root root   949 Apr 28 14:35 ks.cfg
-rw-r--r--  1 root root 81024 Jul 30  2020 mha4mysql-manager-0.58-0.el7.centos.noarch.rpm
-rw-r--r--  1 root root 36328 Jul 30  2020 mha4mysql-node-0.58-0.el7.centos.noarch.rpm
-rw-r--r--  1 root root    41 Apr 27 08:50 test.sh
drwxr-xr-x. 2 root root   220 Apr 24 14:46 yum
#mha4mysql-node must be installed first, and mha4mysql-manager after it
[root@MHA-Manager ~]# yum install -y mha4mysql-node-0.58-0.el7.centos.noarch.rpm
[root@MHA-Manager ~]# yum install -y mha4mysql-manager-0.58-0.el7.centos.noarch.rpm
2. Install the mha4mysql-node package on all MySQL servers
[root@Master ~]# yum install -y mha4mysql-node-0.58-0.el7.centos.noarch.rpm
[root@Slave1 ~]# yum install -y mha4mysql-node-0.58-0.el7.centos.noarch.rpm
[root@Slave2 ~]# yum install -y mha4mysql-node-0.58-0.el7.centos.noarch.rpm
3. Set up SSH key authentication between all nodes
[root@MHA-Manager ~]# ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:C9ovIIe3BWcMBFTZgs0pi9ECVq7166py11oDCLgWfNM root@MHA-Manager.magedu.com
The key's randomart image is:
+---[RSA 2048]----+
|o++Oo+ |
|* = O . |
|o* B E |
|.o*.+ + |
|.o...=. S |
|. o ++o. . |
| +.*+ . |
|. . +.oo |
|.o.ooo .. |
+----[SHA256]-----+
[root@MHA-Manager ~]# ssh-copy-id 127.0.0.1
[root@MHA-Manager ~]# rsync -av .ssh 10.0.0.8:/root/
[root@MHA-Manager ~]# rsync -av .ssh 10.0.0.18:/root/
[root@MHA-Manager ~]# rsync -av .ssh 10.0.0.28:/root/
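Because every node must be able to reach every other node over SSH, it can help to enumerate the host pairs that need to work. The following Python sketch only builds the list of pairs and an example command line for each; it does not run anything. The function names `ssh_check_pairs` and `ssh_check_command` are hypothetical, and the command shown (a nested `ssh ... true` with `BatchMode=yes`) is an assumption chosen for illustration, not what `masterha_check_ssh` executes internally.

```python
from itertools import permutations


def ssh_check_pairs(hosts):
    """All ordered (source, destination) pairs between distinct hosts."""
    return list(permutations(hosts, 2))


def ssh_check_command(src, dst, user="root"):
    """Example argv: log in to src, then from there try to reach dst.

    BatchMode=yes makes ssh fail instead of prompting for a password,
    so a missing key shows up as a non-zero exit status.
    """
    return [
        "ssh", "-o", "BatchMode=yes", f"{user}@{src}",
        "ssh", "-o", "BatchMode=yes", f"{user}@{dst}", "true",
    ]


if __name__ == "__main__":
    mysql_hosts = ["10.0.0.8", "10.0.0.18", "10.0.0.28"]
    for src, dst in ssh_check_pairs(mysql_hosts):
        print(" ".join(ssh_check_command(src, dst)))
```

For the three MySQL hosts in this environment this yields six directed pairs.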
4. Create the configuration file on the manager node
[root@MHA-Manager ~]# mkdir /etc/mastermha/
[root@MHA-Manager ~]# vim /etc/mastermha/app1.conf
[server default]
user=mhauser #user for connecting remotely to all MySQL nodes; needs administrator privileges
password=magedu
manager_workdir=/data/mastermha/app1/ #the directory is created automatically, no need to create it by hand
manager_log=/data/mastermha/app1/manager.log
remote_workdir=/data/mastermha/app1/
ssh_user=root #used for key-based remote SSH connections, to access the binary logs
repl_user=repluser #user for master-slave replication
repl_password=magedu
ping_interval=1 #interval between health checks
master_ip_failover_script=/usr/local/bin/master_ip_failover #perl script that switches the VIP
report_script=/usr/local/bin/sendmail.sh #alert script to execute
check_repl_delay=0 #default is 1, meaning that if a slave's relay log is more than 100M behind the master, MHA will not choose that slave as the new master, because recovering it would take a long time. With check_repl_delay=0, MHA ignores replication delay when it triggers a switch; this is very useful for a slave with candidate_master=1, because it ensures that slave becomes the new master
master_binlog_dir=/data/mysql/ #directory that holds the binary logs; must be set for mha4mysql-manager-0.58, earlier versions did not need it
[server1]
hostname=10.0.0.8
candidate_master=1
[server2]
hostname=10.0.0.18
candidate_master=1 #preferred candidate master: chosen first even if it is not the slave with the latest events in the cluster
[server3]
hostname=10.0.0.28
#Final file content
[root@MHA-Manager ~]# cat /etc/mastermha/app1.conf
[server default]
user=mhauser
password=magedu
manager_workdir=/data/mastermha/app1/
manager_log=/data/mastermha/app1/manager.log
remote_workdir=/data/mastermha/app1/
ssh_user=root
repl_user=repluser
repl_password=magedu
ping_interval=1
master_ip_failover_script=/usr/local/bin/master_ip_failover
report_script=/usr/local/bin/sendmail.sh
check_repl_delay=0
master_binlog_dir=/data/mysql/
[server1]
hostname=10.0.0.8
candidate_master=1
[server2]
hostname=10.0.0.18
candidate_master=1
[server3]
hostname=10.0.0.28
[root@MHA-Manager ~]#
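Note: which node takes over as the new master when the master goes down
1. If the logs of all slaves are consistent, a new master is chosen in the order the servers appear in the configuration file by default
2. If the slave logs are not consistent, the slave closest to the master is automatically chosen as the new master
3. If a weight is set on a node (candidate_master=1), that node is preferred. However, if its logs are more than 100M behind the master, it will not be chosen either. This can be combined with check_repl_delay=0, which turns off the log-lag check and forces the candidate node to be chosen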
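The three rules above can be expressed as a short selection function. This is a simplified Python sketch under stated assumptions, not MHA's actual algorithm: the names `Slave` and `choose_new_master` are hypothetical, each slave's progress is reduced to a single byte count, and the "100M" threshold is taken to mean 100 MiB.

```python
from dataclasses import dataclass

# Assumption: the "100M" limit mentioned above, interpreted as bytes.
DELAY_LIMIT = 100 * 1024 * 1024


@dataclass
class Slave:
    host: str
    received_bytes: int          # how much of the master's log it has
    candidate_master: bool = False


def choose_new_master(slaves, master_bytes, check_repl_delay=True):
    """`slaves` must be given in configuration-file order."""

    def within_limit(slave):
        if not check_repl_delay:   # check_repl_delay=0 disables the check
            return True
        return master_bytes - slave.received_bytes <= DELAY_LIMIT

    # Rule 3: a candidate_master node is preferred unless it lags too far.
    for slave in slaves:
        if slave.candidate_master and within_limit(slave):
            return slave

    # Rule 2: otherwise take the slave closest to the master.
    # Rule 1: max() keeps the first of equal items, i.e. config order on ties.
    return max(slaves, key=lambda s: s.received_bytes)


if __name__ == "__main__":
    nodes = [Slave("10.0.0.18", 500, candidate_master=True),
             Slave("10.0.0.28", 900)]
    print(choose_new_master(nodes, master_bytes=1000).host)
```

In this example 10.0.0.18 is selected even though 10.0.0.28 has received more data, because it is marked as a candidate and its lag is below the limit.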
5. Related scripts
[root@MHA-Manager ~]# cat /usr/local/bin/sendmail.sh
#!/bin/bash
echo "MySQL is down" | mail -s "MHA Warning" 15762354477@139.com
[root@MHA-Manager ~]# chmod +x /usr/local/bin/sendmail.sh
[root@MHA-Manager ~]# vim /usr/local/bin/master_ip_failover
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use Getopt::Long;

my (
    $command, $ssh_user, $orig_master_host, $orig_master_ip,
    $orig_master_port, $new_master_host, $new_master_ip, $new_master_port
);
#The comments on the following three lines must be removed before the script is run
my $vip = '10.0.0.100/24';    #Virtual IP
my $gateway = '10.0.0.254';   #Gateway IP
my $interface = 'eth0';       #network interface that carries the VIP
my $key = "1";
my $ssh_start_vip = "/sbin/ifconfig $interface:$key $vip;/sbin/arping -I $interface -c 3 -s $vip $gateway >/dev/null 2>&1";
my $ssh_stop_vip = "/sbin/ifconfig $interface:$key down";

GetOptions(
    'command=s'          => \$command,
    'ssh_user=s'         => \$ssh_user,
    'orig_master_host=s' => \$orig_master_host,
    'orig_master_ip=s'   => \$orig_master_ip,
    'orig_master_port=i' => \$orig_master_port,
    'new_master_host=s'  => \$new_master_host,
    'new_master_ip=s'    => \$new_master_ip,
    'new_master_port=i'  => \$new_master_port,
);

exit &main();

sub main {
    print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
    if ( $command eq "stop" || $command eq "stopssh" ) {
        # $orig_master_host, $orig_master_ip, $orig_master_port are passed.
        # If you manage master ip address at global catalog database,
        # invalidate orig_master_ip here.
        my $exit_code = 1;
        eval {
            print "Disabling the VIP on old master: $orig_master_host \n";
            &stop_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn "Got Error: $@\n";
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "start" ) {
        # all arguments are passed.
        # If you manage master ip address at global catalog database,
        # activate new_master_ip here.
        # You can also grant write access (create user, set read_only=0, etc) here.
        my $exit_code = 10;
        eval {
            print "Enabling the VIP - $vip on the new master - $new_master_host \n";
            &start_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";
        `ssh $ssh_user\@$orig_master_host \" $ssh_start_vip \"`;
        exit 0;
    }
    else {
        &usage();
        exit 1;
    }
}

# A simple system call that enable the VIP on the new master
sub start_vip() {
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}

# A simple system call that disable the VIP on the old_master
sub stop_vip() {
    `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
    print "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
#The final file is identical to the listing above, with the three explanatory comments removed
[root@MHA-Manager ~]# chmod +x /usr/local/bin/master_ip_failover
[root@MHA-Manager ~]#
6. Set up the Master
[root@Master ~]# yum install -y mysql-server
[root@Master ~]# mkdir /data/mysql/
[root@Master ~]# chown mysql:mysql /data/mysql/
[root@Master ~]# vim /etc/my.cnf.d/mysql-server.cnf
[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
log-error=/var/log/mysql/mysqld.log
pid-file=/run/mysqld/mysqld.pid
server-id=8
log-bin=/data/mysql/mysql-bin
skip_name_resolve=1
general_log #for observing the results; optional, no need to enable it in production
[root@Master ~]# systemctl enable --now mysqld.service
[root@Master ~]# mysql
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 8
Server version: 8.0.21 Source distribution
Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> show master logs;
+------------------+-----------+-----------+
| Log_name         | File_size | Encrypted |
+------------------+-----------+-----------+
| mysql-bin.000001 |       179 | No        |
| mysql-bin.000002 |      1201 | No        |
+------------------+-----------+-----------+
2 rows in set (0.00 sec)
mysql> create user repluser@'10.0.0.%' identified by 'magedu';
Query OK, 0 rows affected (0.01 sec)
mysql> grant replication slave on *.* to repluser@'10.0.0.%' ;
Query OK, 0 rows affected (0.00 sec)
mysql> create user mhauser@'10.0.0.%' identified by 'magedu';
Query OK, 0 rows affected (0.01 sec)
mysql> grant all on *.* to mhauser@'10.0.0.%' ;
Query OK, 0 rows affected (0.01 sec)
mysql> select user,host from mysql.user;
+------------------+-----------+
| user             | host      |
+------------------+-----------+
| mhauser          | 10.0.0.%  |
| repluser         | 10.0.0.%  |
| mysql.infoschema | localhost |
| mysql.session    | localhost |
| mysql.sys        | localhost |
| root             | localhost |
+------------------+-----------+
6 rows in set (0.00 sec)
mysql> quit
Bye
#Configure the VIP
[root@Master ~]# ifconfig eth0:1 10.0.0.100/24
[root@Master ~]# ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 10.0.0.8 netmask 255.255.255.0 broadcast 10.0.0.255
        inet6 fe80::809c:7c3f:dc61:53bb prefixlen 64 scopeid 0x20<link>
        ether 00:0c:29:0a:08:a3 txqueuelen 1000 (Ethernet)
        RX packets 67605 bytes 91484149 (87.2 MiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 31053 bytes 2925668 (2.7 MiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
eth0:1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 10.0.0.100 netmask 255.255.255.0 broadcast 10.0.0.255
        ether 00:0c:29:0a:08:a3 txqueuelen 1000 (Ethernet)
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
        inet 127.0.0.1 netmask 255.0.0.0
        inet6 ::1 prefixlen 128 scopeid 0x10<host>
        loop txqueuelen 1000 (Local Loopback)
        RX packets 0 bytes 0 (0.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 0 bytes 0 (0.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
[root@Master ~]#
7. Set up the Slaves
[root@Slave1 ~]# yum install -y mysql-server
[root@Slave1 ~]# mkdir /data/mysql/
[root@Slave1 ~]# chown mysql:mysql /data/mysql/
[root@Slave1 ~]# vim /etc/my.cnf.d/mysql-server.cnf
[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
log-error=/var/log/mysql/mysqld.log
pid-file=/run/mysqld/mysqld.pid
server_id=18 #this value must be different on every node
log-bin=/data/mysql/mysql-bin
read_only
relay_log_purge=0
skip_name_resolve=1 #disable reverse name resolution
general_log #enabled to make observation easier; no need to enable it in production
[root@Slave1 ~]# systemctl enable --now mysqld.service
[root@Slave1 ~]# mysql
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 8
Server version: 8.0.21 Source distribution
Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> CHANGE MASTER TO
    -> MASTER_HOST='10.0.0.100',
    -> MASTER_USER='repluser',
    -> MASTER_PASSWORD='magedu',
    -> MASTER_LOG_FILE='mysql-bin.000002',
    -> MASTER_LOG_POS=1201;
Query OK, 0 rows affected, 2 warnings (0.05 sec)
mysql> start slave;
Query OK, 0 rows affected (0.01 sec)
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.0.0.100
Master_User: repluser
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000002
Read_Master_Log_Pos: 1201
Relay_Log_File: Slave1-relay-bin.000002
Relay_Log_Pos: 324
Relay_Master_Log_File: mysql-bin.000002
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 1201
Relay_Log_Space: 534
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 8
Master_UUID: bb4f4671-baa2-11eb-8cf4-000c290a08a3
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set:
Auto_Position: 0
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
Master_public_key_path:
Get_master_public_key: 0
Network_Namespace:
1 row in set (0.00 sec)
mysql> quit
Bye
[root@Slave1 ~]#
[root@Slave2 ~]# yum install -y mysql-server
[root@Slave2 ~]# mkdir -p /data/mysql/
[root@Slave2 ~]# chown mysql:mysql /data/mysql/
[root@Slave2 ~]# vim /etc/my.cnf.d/mysql-server.cnf
[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
log-error=/var/log/mysql/mysqld.log
pid-file=/run/mysqld/mysqld.pid
server_id=28 #this value must be different on every node
log-bin=/data/mysql/mysql-bin
read_only
relay_log_purge=0
skip_name_resolve=1 #disable reverse name resolution
general_log #enabled to make observation easier; no need to enable it in production
[root@Slave2 ~]# systemctl enable --now mysqld.service
[root@Slave2 ~]# mysql
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 8
Server version: 8.0.21 Source distribution
Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> CHANGE MASTER TO
    -> MASTER_HOST='10.0.0.100',
    -> MASTER_USER='repluser',
    -> MASTER_PASSWORD='magedu',
    -> MASTER_LOG_FILE='mysql-bin.000002',
    -> MASTER_LOG_POS=1201;
Query OK, 0 rows affected, 2 warnings (0.03 sec)
mysql> start slave;
Query OK, 0 rows affected (0.00 sec)
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.0.0.100
Master_User: repluser
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000002
Read_Master_Log_Pos: 1201
Relay_Log_File: Slave2-relay-bin.000002
Relay_Log_Pos: 324
Relay_Master_Log_File: mysql-bin.000002
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 1201
Relay_Log_Space: 534
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 8
Master_UUID: bb4f4671-baa2-11eb-8cf4-000c290a08a3
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set:
Auto_Position: 0
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
Master_public_key_path:
Get_master_public_key: 0
Network_Namespace:
1 row in set (0.00 sec)
mysql> quit
Bye
[root@Slave2 ~]#
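The fields that matter most in the `show slave status\G` output are Slave_IO_Running, Slave_SQL_Running and Seconds_Behind_Master. As a rough illustration, the Python sketch below parses the vertical output into a dictionary and checks those three fields. The function names `parse_vertical` and `replication_healthy` are hypothetical, and the health rule used here is an assumption chosen for the example rather than the check MHA itself performs.

```python
SAMPLE = """\
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.0.0.100
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Seconds_Behind_Master: 0
1 row in set (0.00 sec)
"""


def parse_vertical(output):
    """Turn `Key: value` lines of \\G-style output into a dict."""
    fields = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines, the row separator and the trailing summary.
        if not line or line.startswith("*") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def replication_healthy(fields):
    """Assumed rule: both threads run and the lag is a known value."""
    return (
        fields.get("Slave_IO_Running") == "Yes"
        and fields.get("Slave_SQL_Running") == "Yes"
        and fields.get("Seconds_Behind_Master") not in (None, "", "NULL")
    )


if __name__ == "__main__":
    status = parse_vertical(SAMPLE)
    print(status["Master_Host"], replication_healthy(status))
```

Only the first colon on each line is used as the separator, so values that themselves contain colons or semicolons are kept intact.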
8. Check the MHA environment
[root@MHA-Manager ~]# masterha_check_ssh --conf=/etc/mastermha/app1.conf
Sat May 22 10:40:36 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat May 22 10:40:36 2021 - [info] Reading application default configuration from /etc/mastermha/app1.conf..
Sat May 22 10:40:36 2021 - [info] Reading server configuration from /etc/mastermha/app1.conf..
Sat May 22 10:40:36 2021 - [info] Starting SSH connection tests..
Sat May 22 10:40:37 2021 - [debug]
Sat May 22 10:40:36 2021 - [debug] Connecting via SSH from root@10.0.0.8(10.0.0.8:22) to root@10.0.0.18(10.0.0.18:22)..
Sat May 22 10:40:36 2021 - [debug] ok.
Sat May 22 10:40:36 2021 - [debug] Connecting via SSH from root@10.0.0.8(10.0.0.8:22) to root@10.0.0.28(10.0.0.28:22)..
Warning: Permanently added '10.0.0.28' (ECDSA) to the list of known hosts.
Sat May 22 10:40:37 2021 - [debug] ok.
Sat May 22 10:40:37 2021 - [debug]
Sat May 22 10:40:36 2021 - [debug] Connecting via SSH from root@10.0.0.18(10.0.0.18:22) to root@10.0.0.8(10.0.0.8:22)..
Sat May 22 10:40:37 2021 - [debug] ok.
Sat May 22 10:40:37 2021 - [debug] Connecting via SSH from root@10.0.0.18(10.0.0.18:22) to root@10.0.0.28(10.0.0.28:22)..
Sat May 22 10:40:37 2021 - [debug] ok.
Sat May 22 10:40:38 2021 - [debug]
Sat May 22 10:40:37 2021 - [debug] Connecting via SSH from root@10.0.0.28(10.0.0.28:22) to root@10.0.0.8(10.0.0.8:22)..
Sat May 22 10:40:37 2021 - [debug] ok.
Sat May 22 10:40:37 2021 - [debug] Connecting via SSH from root@10.0.0.28(10.0.0.28:22) to root@10.0.0.18(10.0.0.18:22)..
Sat May 22 10:40:38 2021 - [debug] ok.
Sat May 22 10:40:38 2021 - [info] All SSH connection tests passed successfully.
[root@MHA-Manager ~]# masterha_check_repl --conf=/etc/mastermha/app1.conf
Sat May 22 10:43:06 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat May 22 10:43:06 2021 - [info] Reading application default configuration from /etc/mastermha/app1.conf..
Sat May 22 10:43:06 2021 - [info] Reading server configuration from /etc/mastermha/app1.conf..
Sat May 22 10:43:06 2021 - [info] MHA::MasterMonitor version 0.58.
Creating directory /data/mastermha/app1/.. done.
Sat May 22 10:43:06 2021 - [error][/usr/share/perl5/vendor_perl/MHA/Server.pm, ln180] Got MySQL error when connecting 10.0.0.18(10.0.0.18:3306) :1130:Host '10.0.0.7' is not allowed to connect to this MySQL server, but this is not a MySQL crash. Check MySQL server settings.
Sat May 22 10:43:06 2021 - [error][/usr/share/perl5/vendor_perl/MHA/Server.pm, ln180] Got MySQL error when connecting 10.0.0.28(10.0.0.28:3306) :1130:Host '10.0.0.7' is not allowed to connect to this MySQL server, but this is not a MySQL crash. Check MySQL server settings.
Sat May 22 10:43:06 2021 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln301] at /usr/share/perl5/vendor_perl/MHA/ServerManager.pm line 297.
Sat May 22 10:43:06 2021 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln301] at /usr/share/perl5/vendor_perl/MHA/ServerManager.pm line 297.
Sat May 22 10:43:07 2021 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln309] Got fatal error, stopping operations
Sat May 22 10:43:07 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln427] Error happened on checking configurations. at /usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm line 329.
Sat May 22 10:43:07 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln525] Error happened on monitoring servers.
Sat May 22 10:43:07 2021 - [info] Got exit code 1 (Not master dead).
MySQL Replication Health is NOT OK!
#The health check of the slave nodes fails here because the repluser and mhauser users do not exist in the slaves' databases, so the manager node cannot connect to the two slave servers. I therefore create these two accounts on both slaves and grant them privileges
[root@Slave1 ~]# mysql
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 12
Server version: 8.0.21 Source distribution
Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> create user repluser@'10.0.0.%' identified by 'magedu';
Query OK, 0 rows affected (0.01 sec)
mysql> grant replication slave on *.* to repluser@'10.0.0.%' ;
Query OK, 0 rows affected (0.00 sec)
mysql> create user mhauser@'10.0.0.%' identified by 'magedu';
Query OK, 0 rows affected (0.01 sec)
mysql> grant all on *.* to mhauser@'10.0.0.%' ;
Query OK, 0 rows affected (0.00 sec)
mysql> select user,host from mysql.user;
+------------------+-----------+
| user             | host      |
+------------------+-----------+
| mhauser          | 10.0.0.%  |
| repluser         | 10.0.0.%  |
| mysql.infoschema | localhost |
| mysql.session    | localhost |
| mysql.sys        | localhost |
| root             | localhost |
+------------------+-----------+
6 rows in set (0.00 sec)
mysql> quit
Bye
[root@Slave2 ~]# mysql
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 12
Server version: 8.0.21 Source distribution
Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> create user repluser@'10.0.0.%' identified by 'magedu';
Query OK, 0 rows affected (0.01 sec)
mysql> grant replication slave on *.* to repluser@'10.0.0.%' ;
Query OK, 0 rows affected (0.00 sec)
mysql> create user mhauser@'10.0.0.%' identified by 'magedu';
Query OK, 0 rows affected (0.01 sec)
mysql> grant all on *.* to mhauser@'10.0.0.%' ;
Query OK, 0 rows affected (0.00 sec)
mysql> select user,host from mysql.user;
+------------------+-----------+
| user             | host      |
+------------------+-----------+
| mhauser          | 10.0.0.%  |
| repluser         | 10.0.0.%  |
| mysql.infoschema | localhost |
| mysql.session    | localhost |
| mysql.sys        | localhost |
| root             | localhost |
+------------------+-----------+
6 rows in set (0.00 sec)
mysql> quit
Bye
#Check the environment again; this time the check passes
[root@MHA-Manager ~]# masterha_check_repl --conf=/etc/mastermha/app1.conf
Sat May 22 15:42:57 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat May 22 15:42:57 2021 - [info] Reading application default configuration from /etc/mastermha/app1.conf..
Sat May 22 15:42:57 2021 - [info] Reading server configuration from /etc/mastermha/app1.conf..
Sat May 22 15:42:57 2021 - [info] MHA::MasterMonitor version 0.58.
Sat May 22 15:42:58 2021 - [info] GTID failover mode = 0
Sat May 22 15:42:58 2021 - [info] Dead Servers:
Sat May 22 15:42:58 2021 - [info] Alive Servers:
Sat May 22 15:42:58 2021 - [info] 10.0.0.8(10.0.0.8:3306)
Sat May 22 15:42:58 2021 - [info] 10.0.0.18(10.0.0.18:3306)
Sat May 22 15:42:58 2021 - [info] 10.0.0.28(10.0.0.28:3306)
Sat May 22 15:42:58 2021 - [info] Alive Slaves:
Sat May 22 15:42:58 2021 - [info] 10.0.0.18(10.0.0.18:3306) Version=8.0.21 (oldest major version between slaves) log-bin:enabled
Sat May 22 15:42:58 2021 - [info] Replicating from 10.0.0.8(10.0.0.8:3306)
Sat May 22 15:42:58 2021 - [info] Primary candidate for the new Master (candidate_master is set)
Sat May 22 15:42:58 2021 - [info] 10.0.0.28(10.0.0.28:3306) Version=8.0.21 (oldest major version between slaves) log-bin:enabled
Sat May 22 15:42:58 2021 - [info] Replicating from 10.0.0.8(10.0.0.8:3306)
Sat May 22 15:42:58 2021 - [info] Current Alive Master: 10.0.0.8(10.0.0.8:3306)
Sat May 22 15:42:58 2021 - [info] Checking slave configurations..
Sat May 22 15:42:58 2021 - [info] Checking replication filtering settings..
Sat May 22 15:42:58 2021 - [info] binlog_do_db= , binlog_ignore_db=
Sat May 22 15:42:58 2021 - [info] Replication filtering check ok.
Sat May 22 15:42:58 2021 - [info] GTID (with auto-pos) is not supported
Sat May 22 15:42:58 2021 - [info] Starting SSH connection tests..
Sat May 22 15:43:00 2021 - [info] All SSH connection tests passed successfully.
Sat May 22 15:43:00 2021 - [info] Checking MHA Node version..
Sat May 22 15:43:01 2021 - [info] Version check ok.
Sat May 22 15:43:01 2021 - [info] Checking SSH publickey authentication settings on the current master..
Sat May 22 15:43:01 2021 - [info] HealthCheck: SSH to 10.0.0.8 is reachable.
Sat May 22 15:43:01 2021 - [info] Master MHA Node version is 0.58.
Sat May 22 15:43:01 2021 - [info] Checking recovery script configurations on 10.0.0.8(10.0.0.8:3306)..
Sat May 22 15:43:01 2021 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql/ --output_file=/data/mastermha/app1//save_binary_logs_test --manager_version=0.58 --start_file=mysql-bin.000007
Sat May 22 15:43:01 2021 - [info] Connecting to root@10.0.0.8(10.0.0.8:22)..
  Creating /data/mastermha/app1 if not exists.. ok.
  Checking output directory is accessible or not.. ok.
  Binlog found at /data/mysql/, up to mysql-bin.000007
Sat May 22 15:43:01 2021 - [info] Binlog setting check done.
Sat May 22 15:43:01 2021 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Sat May 22 15:43:01 2021 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mhauser' --slave_host=10.0.0.18 --slave_ip=10.0.0.18 --slave_port=3306 --workdir=/data/mastermha/app1/ --target_version=8.0.21 --manager_version=0.58 --relay_dir=/var/lib/mysql --current_relay_log=Slave1-relay-bin.000008 --slave_pass=xxx
Sat May 22 15:43:01 2021 - [info] Connecting to root@10.0.0.18(10.0.0.18:22)..
  Checking slave recovery environment settings..
    Relay log found at /var/lib/mysql, up to Slave1-relay-bin.000008
    Temporary relay log file is /var/lib/mysql/Slave1-relay-bin.000008
    Checking if super_read_only is defined and turned on.. not present or turned off, ignoring.
    Testing mysql connection and privileges..
mysql: [Warning] Using a password on the command line interface can be insecure.
 done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Sat May 22 15:43:02 2021 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mhauser' --slave_host=10.0.0.28 --slave_ip=10.0.0.28 --slave_port=3306 --workdir=/data/mastermha/app1/ --target_version=8.0.21 --manager_version=0.58 --relay_dir=/var/lib/mysql --current_relay_log=Slave2-relay-bin.000010 --slave_pass=xxx
Sat May 22 15:43:02 2021 - [info] Connecting to root@10.0.0.28(10.0.0.28:22)..
  Checking slave recovery environment settings..
    Relay log found at /var/lib/mysql, up to Slave2-relay-bin.000010
    Temporary relay log file is /var/lib/mysql/Slave2-relay-bin.000010
    Checking if super_read_only is defined and turned on.. not present or turned off, ignoring.
    Testing mysql connection and privileges..
mysql: [Warning] Using a password on the command line interface can be insecure.
 done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Sat May 22 15:43:02 2021 - [info] Slaves settings check done.
Sat May 22 15:43:02 2021 - [info]
10.0.0.8(10.0.0.8:3306) (current master)
 +--10.0.0.18(10.0.0.18:3306)
 +--10.0.0.28(10.0.0.28:3306)
Sat May 22 15:43:02 2021 - [info] Checking replication health on 10.0.0.18..
Sat May 22 15:43:02 2021 - [info] ok.
Sat May 22 15:43:02 2021 - [info] Checking replication health on 10.0.0.28..
Sat May 22 15:43:02 2021 - [info] ok.
Sat May 22 15:43:02 2021 - [info] Checking master_ip_failover_script status:
Sat May 22 15:43:02 2021 - [info] /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=10.0.0.8 --orig_master_ip=10.0.0.8 --orig_master_port=3306
IN SCRIPT TEST====/sbin/ifconfig eth0:1 down==/sbin/ifconfig eth0:1 10.0.0.100/24;/sbin/arping -I eth0 -c 3 -s 10.0.0.100/24 10.0.0.254 >/dev/null 2>&1===
Checking the Status of the script.. OK
Sat May 22 15:43:03 2021 - [info] OK.
Sat May 22 15:43:03 2021 - [warning] shutdown_script is not defined.
Sat May 22 15:43:03 2021 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
[root@MHA-Manager ~]#
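9. Start MHA
Start MHA. By default it runs in the foreground; in production it is usually run in the background (for example by appending & to the command)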
[root@MHA-Manager ~]# nohup masterha_manager --conf=/etc/mastermha/app1.conf &> /dev/null
Check the status
[root@MHA-Manager ~]# masterha_check_status --conf=/etc/mastermha/app1.conf
app1 (pid:1321) is running(0:PING_OK), master:10.0.0.8
Observe the health checks in the general log on the Master server
[root@Master ~]# tail -f /var/lib/mysql/Master.log
2021-05-22T08:08:15.494093Z 24 Query SELECT 1 As Value
2021-05-22T08:08:16.494128Z 24 Query SELECT 1 As Value
2021-05-22T08:08:17.496217Z 24 Query SELECT 1 As Value
2021-05-22T08:08:18.498052Z 24 Query SELECT 1 As Value
2021-05-22T08:08:19.500948Z 24 Query SELECT 1 As Value
2021-05-22T08:08:20.504335Z 24 Query SELECT 1 As Value
2021-05-22T08:08:21.507257Z 24 Query SELECT 1 As Value
2021-05-22T08:08:22.507209Z 24 Query SELECT 1 As Value
2021-05-22T08:08:23.507075Z 24 Query SELECT 1 As Value
2021-05-22T08:08:24.509524Z 24 Query SELECT 1 As Value
2021-05-22T08:08:25.510962Z 24 Query SELECT 1 As Value
2021-05-22T08:08:26.511363Z 24 Query SELECT 1 As Value
2021-05-22T08:08:27.513529Z 24 Query SELECT 1 As Value
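The log shows one SELECT 1 As Value probe per second, matching ping_interval=1. The loop behind such a check can be sketched in Python as below. This is a generic illustration, not MHA's code: the function name `master_is_dead` is hypothetical, `probe` stands in for whatever actually runs the query, and the threshold of three consecutive failures is an assumption made for the example.

```python
import time


def master_is_dead(probe, ping_interval=1.0, max_failures=3,
                   max_checks=None, sleep=time.sleep):
    """Call probe() every ping_interval seconds.

    Returns True once max_failures consecutive probes have failed,
    or False if max_checks probes complete without that happening.
    """
    failures = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        try:
            ok = bool(probe())   # e.g. run "SELECT 1 As Value"
        except Exception:
            ok = False           # a connection error counts as a failure
        failures = 0 if ok else failures + 1
        if failures >= max_failures:
            return True
        sleep(ping_interval)
    return False


if __name__ == "__main__":
    # Simulated master: answers once, then stops responding.
    answers = iter([True, False, False, False])
    print(master_is_dead(lambda: next(answers), ping_interval=0.01))
```

Requiring several consecutive failures, rather than reacting to a single missed probe, is a common way to avoid a failover caused by a short network glitch.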
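10. Simulate a failure
Stop the MySQL service on the Master server
[root@Master ~]# systemctl stop mysqld.service
Check the status (the MHA manager process exits after it has performed a failover, so it is reported as stopped)
[root@MHA-Manager ~]# masterha_check_status --conf=/etc/mastermha/app1.conf
app1 is stopped(2:NOT_RUNNING).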
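For scripting around these checks, the two status lines shown in this article can be parsed with a pair of regular expressions. The Python sketch below is based only on the two output samples above (the running and the stopped form); the function name `parse_status` is hypothetical, and other states may be formatted differently, in which case the function returns None.

```python
import re

RUNNING = re.compile(
    r"^(?P<app>\S+) \(pid:(?P<pid>\d+)\) is running"
    r"\((?P<code>\d+):(?P<state>\w+)\), master:(?P<master>\S+)$"
)
STOPPED = re.compile(
    r"^(?P<app>\S+) is stopped\((?P<code>\d+):(?P<state>\w+)\)\.$"
)


def parse_status(line):
    """Parse one masterha_check_status line into a dict, or None."""
    line = line.strip()
    match = RUNNING.match(line)
    if match:
        info = match.groupdict()
        info["pid"] = int(info["pid"])
    else:
        match = STOPPED.match(line)
        if not match:
            return None
        info = match.groupdict()
        info["pid"] = None       # no process when stopped
        info["master"] = None    # no master is reported when stopped
    info["code"] = int(info["code"])
    return info


if __name__ == "__main__":
    print(parse_status("app1 (pid:1321) is running(0:PING_OK), master:10.0.0.8"))
    print(parse_status("app1 is stopped(2:NOT_RUNNING)."))
```

A monitoring script could, for instance, raise an alert whenever the parsed state is anything other than PING_OK.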
Verify that the VIP has moved to the new Master
[root@Slave1 ~]# ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 10.0.0.18 netmask 255.255.255.0 broadcast 10.0.0.255
        inet6 fe80::5a87:1464:beb7:38 prefixlen 64 scopeid 0x20<link>
        ether 00:0c:29:9d:3e:68 txqueuelen 1000 (Ethernet)
        RX packets 70084 bytes 88687133 (84.5 MiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 35950 bytes 3857649 (3.6 MiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
eth0:1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 10.0.0.100 netmask 255.255.255.0 broadcast 10.0.0.255
        ether 00:0c:29:9d:3e:68 txqueuelen 1000 (Ethernet)
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
        inet 127.0.0.1 netmask 255.0.0.0
        inet6 ::1 prefixlen 128 scopeid 0x10<host>
        loop txqueuelen 1000 (Local Loopback)
        RX packets 920 bytes 136130 (132.9 KiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 920 bytes 136130 (132.9 KiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
[root@Slave1 ~]#
Please forgive any inaccuracies in this article.