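MHA (Master High Availability) is an open-source high-availability solution for MySQL, written in Perl and developed at DeNA in Japan. It adds automated master failover to a MySQL master-slave replication setup. When the master fails, MHA usually completes the failover automatically within roughly 30 seconds, and during the failover it preserves data consistency as far as possible, which is what makes it a practical high-availability solution.
An MHA deployment consists of two parts:
MHA Manager (management node): the Manager can run on a dedicated machine and manage several master-slave clusters, or it can run on one of the slaves. When the master fails, it automatically promotes the slave with the most recent data to master and repoints all other slaves to the new master. The whole failover is meant to be transparent to the application.
The Manager package mainly includes the following tools:
masterha_manager: starts the manager
masterha_stop: stops the manager
masterha_check_ssh: checks the SSH configuration used by MHA
masterha_check_repl: checks MySQL replication status
masterha_check_status: reports the current MHA running state
masterha_master_monitor: detects whether the master is down
masterha_master_switch: controls failover (automatic or manual)
masterha_conf_host: adds or removes server entries in the configuration
MHA Node (data node): must be installed on every node.
The Node package mainly includes the following tools (they are normally triggered by the MHA Manager scripts and need no manual operation):
save_binary_logs: saves and copies the master's binary logs
apply_diff_relay_logs: identifies differing relay log events and applies them to the other slaves
purge_relay_logs: purges relay logs (without blocking the SQL thread)
Use cases: MHA currently targets one-master, multiple-slave topologies. An MHA setup requires at least three database servers in the replication cluster (one master and two slaves: one acts as master, one as standby master, and one as slave). To reduce cost, Taobao modified MHA, and its TMHA variant supports one master with a single slave.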
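MHA monitors the nodes (all node information is read from the configuration file)
It checks the system, the network, SSH connectivity, and replication status (with the focus on the master).
If the master goes down, MHA proceeds as follows:
2.1 Master election:
(1) If the slaves differ in data (compared by position or GTID), the slave closest to the master becomes the candidate master.
(2) If the slaves are identical (compared by position or GTID), the candidate is chosen in configuration-file order.
(3) If a weight is set (candidate_master=1), that slave is forced to be the candidate master.
a. By default, if a slave is more than 100 MB of relay logs behind the master, its weight is ignored.
b. With check_repl_delay=0, the slave is still forced to be the candidate master even if it is far behind.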
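The election rules above can be sketched as a small shell function. This is only an illustration of the rules as described here, not MHA's actual implementation. The name pick_new_master and the input format (one slave per line in configuration-file order: host, replication position, candidate_master flag, relay-log lag in bytes, check_repl_delay flag) are assumptions made for the example.

```shell
# Illustrative only: hypothetical input format, one slave per line in config order:
#   host  position  candidate_master  relay_lag_bytes  check_repl_delay
pick_new_master() {
  awk '
    {
      host = $1; pos = $2 + 0; cand = $3 + 0; lag = $4 + 0; chk = $5 + 0
      # A candidate_master=1 slave is honoured unless it lags by more than
      # 100 MB of relay logs while check_repl_delay is still enabled.
      eligible = (chk == 0 || lag <= 100 * 1024 * 1024)
      if (cand == 1 && eligible && preferred == "") preferred = host
      # Otherwise the most advanced slave wins; on a tie the earlier line
      # (configuration-file order) is kept because the comparison is strict.
      if (best == "" || pos > bestpos) { best = host; bestpos = pos }
    }
    END { print (preferred != "" ? preferred : best) }
  '
}
```

Feeding it two slaves at the same position returns the first one listed; marking the second as candidate_master=1 returns the second unless it lags too far with check_repl_delay left enabled.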
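2.2 Data compensation
(1) If SSH to the failed master still works, the slaves compare their GTID or position with the master's, and the missing binary log events are immediately saved to each slave and applied (save_binary_logs script).
(2) If SSH is not available, MHA cannot save the master's binary logs. It can still fail over, but the most recent data is lost; all it can do is compare the relay logs of the slaves and apply the differences (apply_diff_relay_logs script).
2.3 Failover
The failed node is removed from the cluster, the candidate master switches roles and starts serving traffic, and the remaining slaves establish replication from the new master.
Other features: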
Master: 192.168.1.5
db01 [(none)]>show processlist;
+----+------+-------------------+------+------------------+------+---------------------------------------------------------------+------------------+
| Id | User | Host              | db   | Command          | Time | State                                                         | Info             |
+----+------+-------------------+------+------------------+------+---------------------------------------------------------------+------------------+
|  4 | root | localhost         | NULL | Query            |    0 | starting                                                      | show processlist |
|  5 | repl | 192.168.1.6:41668 | NULL | Binlog Dump GTID |   31 | Master has sent all binlog to slave; waiting for more updates | NULL             |
|  6 | repl | 192.168.1.7:38244 | NULL | Binlog Dump GTID |   27 | Master has sent all binlog to slave; waiting for more updates | NULL             |
+----+------+-------------------+------+------------------+------+---------------------------------------------------------------+------------------+
3 rows in set (0.00 sec)
Slave: 192.168.1.6
db02 [(none)]>show slave status\G
*************************** 1. row ***************************
         Slave_IO_State: Waiting for master to send event
            Master_Host: 192.168.1.5
            Master_User: repl
            Master_Port: 3306
          Connect_Retry: 60
        Master_Log_File: mysql-bin.000003
    Read_Master_Log_Pos: 194
         Relay_Log_File: slave01-relay-bin.000002
          Relay_Log_Pos: 367
  Relay_Master_Log_File: mysql-bin.000003
       Slave_IO_Running: Yes
      Slave_SQL_Running: Yes
Slave: 192.168.1.7
db03 [(none)]>show slave status\G
*************************** 1. row ***************************
         Slave_IO_State: Waiting for master to send event
            Master_Host: 192.168.1.5
            Master_User: repl
            Master_Port: 3306
          Connect_Retry: 60
        Master_Log_File: mysql-bin.000003
    Read_Master_Log_Pos: 194
         Relay_Log_File: slave02-relay-bin.000002
          Relay_Log_Pos: 367
  Relay_Master_Log_File: mysql-bin.000003
       Slave_IO_Running: Yes
      Slave_SQL_Running: Yes
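The two values that matter in the output above are Slave_IO_Running and Slave_SQL_Running. As a minimal sketch, the check can be scripted by reading the \G output on standard input; the function name replication_ok is made up for this example.

```shell
# Reads "SHOW SLAVE STATUS\G" output on stdin; succeeds only if both
# replication threads report Yes.
replication_ok() {
  awk '
    $1 == "Slave_IO_Running:"  && $2 == "Yes" { io = 1 }
    $1 == "Slave_SQL_Running:" && $2 == "Yes" { sql = 1 }
    END { exit !(io && sql) }
  '
}
```

It could be used, for instance, by piping the output of mysql -e 'show slave status\G' from each slave into replication_ok and testing the exit status.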
The MHA scripts call mysqlbinlog and mysql by absolute path (/usr/bin/mysqlbinlog and /usr/bin/mysql), so symbolic links to the two executables have to be created before MHA can be used.
[root@localhost ~]# ln -s /data/mysql/bin/mysqlbinlog /usr/bin/mysqlbinlog
[root@localhost ~]# ln -s /data/mysql/bin/mysql /usr/bin/mysql
Generate a key pair on every node
[root@master ~]# ssh-keygen -t rsa
[root@slave01 ~]# ssh-keygen -t rsa
[root@slave02 ~]# ssh-keygen -t rsa
Collect the public keys of all nodes into a single authorized_keys file (this assumes each node has already written its own public key, ~/.ssh/id_rsa.pub, into a local authorized_keys file in root's home directory)
[root@slave01 ~]# scp authorized_keys 192.168.1.5:/root/a1_keys
[root@slave02 ~]# scp authorized_keys 192.168.1.5:/root/a2_keys
[root@master ~]# cat /root/a1_keys >> authorized_keys
[root@master ~]# cat /root/a2_keys >> authorized_keys
Distribute the combined public key file
[root@master ~]# cp authorized_keys .ssh/
[root@master ~]# scp authorized_keys 192.168.1.6:/root/.ssh/
[root@master ~]# scp authorized_keys 192.168.1.7:/root/.ssh/
Test
[root@master ~]# ssh 192.168.1.5 date
Mon Oct 4 20:09:48 CST 2021
[root@master ~]# ssh 192.168.1.6 date
Mon Oct 4 20:09:51 CST 2021
[root@master ~]# ssh 192.168.1.7 date
Mon Oct 4 20:09:53 CST 2021
[root@slave01 ~]# ssh 192.168.1.5 date
Mon Oct 4 20:10:13 CST 2021
[root@slave01 ~]# ssh 192.168.1.6 date
Mon Oct 4 20:10:16 CST 2021
[root@slave01 ~]# ssh 192.168.1.7 date
Mon Oct 4 20:10:18 CST 2021
[root@slave02 ~]# ssh 192.168.1.5 date
Mon Oct 4 20:10:43 CST 2021
[root@slave02 ~]# ssh 192.168.1.6 date
Mon Oct 4 20:10:44 CST 2021
[root@slave02 ~]# ssh 192.168.1.7 date
Mon Oct 4 20:10:47 CST 2021
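The nine manual checks above cover every source and target pair. A rough sketch of generating them with a loop follows; the function name ssh_mesh_commands is made up, and it only prints the commands rather than running them. BatchMode=yes is a standard OpenSSH option that makes ssh fail instead of prompting for a password, which is the behaviour wanted when verifying key-based trust.

```shell
# Prints one ssh command per (source, target) pair, including a node to itself,
# mirroring the manual checks above. Nothing is executed.
ssh_mesh_commands() {
  for src in "$@"; do
    for dst in "$@"; do
      printf 'ssh root@%s ssh -o BatchMode=yes root@%s date\n' "$src" "$dst"
    done
  done
}
```

Calling ssh_mesh_commands 192.168.1.5 192.168.1.6 192.168.1.7 prints nine lines, which can be reviewed and then piped to sh if they look right.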
Download the MHA packages
[root@slave02 ~]# wget https://github.com/yoshinorim/mha4mysql-manager/releases/download/v0.58/mha4mysql-manager-0.58-0.el7.centos.noarch.rpm
[root@slave02 ~]# wget https://github.com/yoshinorim/mha4mysql-node/releases/download/v0.58/mha4mysql-node-0.58-0.el7.centos.noarch.rpm
Install the node dependency on every node (shown here for slave01)
[root@slave01 ~]# yum install -y perl-DBD-MySQL
Install the node package on every node
[root@master ~]# rpm -ivh mha4mysql-node-0.58-0.el7.centos.noarch.rpm
[root@slave01 ~]# rpm -ivh mha4mysql-node-0.58-0.el7.centos.noarch.rpm
[root@slave02 ~]# rpm -ivh mha4mysql-node-0.58-0.el7.centos.noarch.rpm
Install the manager dependencies on db03 (epel-release is installed first so that the EPEL repository is available to the second command)
[root@slave02 ~]# yum install -y epel-release
[root@slave02 ~]# yum install -y perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes
Install the manager package on db03
[root@slave02 ~]# rpm -ivh mha4mysql-manager-0.58-0.el7.centos.noarch.rpm
Create a dedicated MHA monitoring and management user (on the master)
db01 [(none)]>grant all privileges on *.* to mha@'192.168.1.%' identified by 'mha';
Prepare the manager configuration file
# Create the configuration directory
[root@slave02 ~]# mkdir -p /etc/mha
# Create the log directory
[root@slave02 ~]# mkdir -p /var/log/mha/mysql
# Edit the MHA configuration file
[root@slave02 ~]# vim /etc/mha/mysql.cnf
# Global settings
[server default]
# Manager log file
manager_log=/var/log/mha/mysql/manager
# Working directory of the manager
manager_workdir=/var/log/mha/mysql
# Location of the binary logs on the master
master_binlog_dir=/data/binlog
# Dedicated MHA monitoring user
user=mha
# Password of the MHA user
password=mha
# Interval between node health probes
ping_interval=2
# Replication user (CHANGE MASTER TO needs the replication user and password when the new topology is built)
repl_user=repl
# Replication password
repl_password=123
# SSH user
ssh_user=root
# Node 1
[server1]
hostname=192.168.1.5
port=3306
# Node 2
[server2]
hostname=192.168.1.6
port=3306
# Node 3
[server3]
hostname=192.168.1.7
port=3306
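Since every later step depends on which hosts the manager knows about, it can be handy to list them straight from the file. The following is a small sketch, not part of MHA; the function name mha_hosts is made up, and it assumes one key=value pair per line as in the file above.

```shell
# Prints the hostname of every [serverN] section in an MHA configuration file.
# Sections such as [server default] or [binlog1] are skipped.
mha_hosts() {
  awk -F= '
    /^\[/ { in_server = ($0 ~ /^\[server[0-9]+\]/) }
    in_server && $1 ~ /^[ \t]*hostname[ \t]*$/ {
      gsub(/[ \t]/, "", $2); print $2
    }
  ' "$1"
}
```

For example, mha_hosts /etc/mha/mysql.cnf would print the three addresses configured above, one per line.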
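Who takes over when the master goes down?
Additional parameters:
ping_interval=1: the interval, in seconds, at which the manager pings the master; after three attempts without a response, failover starts automatically
candidate_master=1: marks the server as a candidate master; when a failover occurs, this server is promoted to master even if it is not the slave with the latest events in the cluster
check_repl_delay=0: by default, MHA does not choose a slave as the new master if it is more than 100 MB of relay logs behind the master (even with candidate_master=1); with check_repl_delay=0, MHA ignores replication delay when it selects the new master during a switchover
candidate_master=1 together with check_repl_delay=0 ensures that the server marked candidate_master=1 becomes the new master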
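To make the placement of these two parameters concrete, here is a sketch that prints a server section with or without them. The helper name server_section and its "candidate" argument are invented for this example, and placing both parameters inside the server's own section is an assumption based on common usage.

```shell
# Prints a [serverN] section; passing "candidate" as the third argument adds
# the two parameters that force this host to be chosen as the new master.
server_section() {
  printf '[server%s]\nhostname=%s\nport=3306\n' "$1" "$2"
  if [ "${3:-}" = "candidate" ]; then
    printf 'candidate_master=1\ncheck_repl_delay=0\n'
  fi
}
```

server_section 2 192.168.1.6 candidate prints a [server2] block that ends with candidate_master=1 and check_repl_delay=0; the output can be reviewed before it is pasted into /etc/mha/mysql.cnf.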
Use cases: two-site, three-data-center deployments; a VIP managed by Keepalived
Check SSH trust
[root@slave02 ~]# masterha_check_ssh --conf=/etc/mha/mysql.cnf
Mon Oct 4 21:02:33 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Oct 4 21:02:33 2021 - [info] Reading application default configuration from /etc/mha/mysql.cnf..
Mon Oct 4 21:02:33 2021 - [info] Reading server configuration from /etc/mha/mysql.cnf..
Mon Oct 4 21:02:33 2021 - [info] Starting SSH connection tests..
Mon Oct 4 21:02:40 2021 - [debug]
Mon Oct 4 21:02:33 2021 - [debug] Connecting via SSH from root@192.168.1.5(192.168.1.5:22) to root@192.168.1.6(192.168.1.6:22)..
Mon Oct 4 21:02:39 2021 - [debug] ok.
Mon Oct 4 21:02:39 2021 - [debug] Connecting via SSH from root@192.168.1.5(192.168.1.5:22) to root@192.168.1.7(192.168.1.7:22)..
Mon Oct 4 21:02:40 2021 - [debug] ok.
Mon Oct 4 21:02:40 2021 - [debug]
Mon Oct 4 21:02:33 2021 - [debug] Connecting via SSH from root@192.168.1.6(192.168.1.6:22) to root@192.168.1.5(192.168.1.5:22)..
Mon Oct 4 21:02:39 2021 - [debug] ok.
Mon Oct 4 21:02:39 2021 - [debug] Connecting via SSH from root@192.168.1.6(192.168.1.6:22) to root@192.168.1.7(192.168.1.7:22)..
Mon Oct 4 21:02:40 2021 - [debug] ok.
Mon Oct 4 21:02:40 2021 - [debug]
Mon Oct 4 21:02:34 2021 - [debug] Connecting via SSH from root@192.168.1.7(192.168.1.7:22) to root@192.168.1.5(192.168.1.5:22)..
Mon Oct 4 21:02:39 2021 - [debug] ok.
Mon Oct 4 21:02:39 2021 - [debug] Connecting via SSH from root@192.168.1.7(192.168.1.7:22) to root@192.168.1.6(192.168.1.6:22)..
Mon Oct 4 21:02:40 2021 - [debug] ok.
Mon Oct 4 21:02:40 2021 - [info] All SSH connection tests passed successfully.
Check replication status
[root@slave02 ~]# masterha_check_repl --conf=/etc/mha/mysql.cnf
Mon Oct 4 21:04:31 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Oct 4 21:04:31 2021 - [info] Reading application default configuration from /etc/mha/mysql.cnf..
Mon Oct 4 21:04:31 2021 - [info] Reading server configuration from /etc/mha/mysql.cnf..
Mon Oct 4 21:04:31 2021 - [info] MHA::MasterMonitor version 0.58.
Mon Oct 4 21:04:34 2021 - [info] GTID failover mode = 1
Mon Oct 4 21:04:34 2021 - [info] Dead Servers:
Mon Oct 4 21:04:34 2021 - [info] Alive Servers:
Mon Oct 4 21:04:34 2021 - [info] 192.168.1.5(192.168.1.5:3306)
Mon Oct 4 21:04:34 2021 - [info] 192.168.1.6(192.168.1.6:3306)
Mon Oct 4 21:04:34 2021 - [info] 192.168.1.7(192.168.1.7:3306)
Mon Oct 4 21:04:34 2021 - [info] Alive Slaves:
Mon Oct 4 21:04:34 2021 - [info] 192.168.1.6(192.168.1.6:3306) Version=5.7.20-log (oldest major version between slaves) log-bin:enabled
Mon Oct 4 21:04:34 2021 - [info] GTID ON
Mon Oct 4 21:04:34 2021 - [info] Replicating from 192.168.1.5(192.168.1.5:3306)
Mon Oct 4 21:04:34 2021 - [info] 192.168.1.7(192.168.1.7:3306) Version=5.7.20-log (oldest major version between slaves) log-bin:enabled
Mon Oct 4 21:04:34 2021 - [info] GTID ON
Mon Oct 4 21:04:34 2021 - [info] Replicating from 192.168.1.5(192.168.1.5:3306)
Mon Oct 4 21:04:34 2021 - [info] Current Alive Master: 192.168.1.5(192.168.1.5:3306)
Mon Oct 4 21:04:34 2021 - [info] Checking slave configurations..
Mon Oct 4 21:04:34 2021 - [info] read_only=1 is not set on slave 192.168.1.6(192.168.1.6:3306).
Mon Oct 4 21:04:34 2021 - [info] read_only=1 is not set on slave 192.168.1.7(192.168.1.7:3306).
Mon Oct 4 21:04:34 2021 - [info] Checking replication filtering settings..
Mon Oct 4 21:04:34 2021 - [info] binlog_do_db= , binlog_ignore_db=
Mon Oct 4 21:04:34 2021 - [info] Replication filtering check ok.
Mon Oct 4 21:04:34 2021 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Mon Oct 4 21:04:34 2021 - [info] Checking SSH publickey authentication settings on the current master..
Mon Oct 4 21:04:34 2021 - [info] HealthCheck: SSH to 192.168.1.5 is reachable.
Mon Oct 4 21:04:34 2021 - [info]
192.168.1.5(192.168.1.5:3306) (current master)
 +--192.168.1.6(192.168.1.6:3306)
 +--192.168.1.7(192.168.1.7:3306)
Mon Oct 4 21:04:34 2021 - [info] Checking replication health on 192.168.1.6..
Mon Oct 4 21:04:34 2021 - [info] ok.
Mon Oct 4 21:04:34 2021 - [info] Checking replication health on 192.168.1.7..
Mon Oct 4 21:04:34 2021 - [info] ok.
Mon Oct 4 21:04:34 2021 - [warning] master_ip_failover_script is not defined.
Mon Oct 4 21:04:34 2021 - [warning] shutdown_script is not defined.
Mon Oct 4 21:04:34 2021 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
Start MHA on db03
[root@slave02 ~]# nohup masterha_manager --conf=/etc/mha/mysql.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/mysql/manager.log 2>&1 &
[1] 3035
[root@slave02 ~]# masterha_check_status --conf=/etc/mha/mysql.cnf
mysql (pid:3035) is running(0:PING_OK), master:192.168.1.5
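For monitoring, the status line above can be parsed instead of read by eye. The sketch below assumes the exact output format shown here; the function name mha_current_master is made up for the example.

```shell
# Reads masterha_check_status output on stdin. Prints the current master and
# succeeds if the manager reports PING_OK; prints nothing and fails otherwise.
mha_current_master() {
  sed -n 's/.*is running(0:PING_OK), master:\(.*\)$/\1/p' | grep .
}
```

One possible use is masterha_check_status --conf=/etc/mha/mysql.cnf | mha_current_master || echo "manager is not running", run from cron or a monitoring agent.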
[root@slave02 ~]# mysql -umha -pmha -h 192.168.1.5 -e "show variables like 'server_id'"
mysql: [Warning] Using a password on the command line interface can be insecure.
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| server_id     | 5     |
+---------------+-------+
[root@slave02 ~]# mysql -umha -pmha -h 192.168.1.6 -e "show variables like 'server_id'"
mysql: [Warning] Using a password on the command line interface can be insecure.
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| server_id     | 6     |
+---------------+-------+
[root@slave02 ~]# mysql -umha -pmha -h 192.168.1.7 -e "show variables like 'server_id'"
mysql: [Warning] Using a password on the command line interface can be insecure.
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| server_id     | 7     |
+---------------+-------+
Simulate a master failure
[root@master ~]# systemctl stop mysqld
Check replication on db02
db02 [(none)]>show slave status \G
Empty set (0.00 sec)
db02 [(none)]>show processlist;
+----+------+-------------------+------+------------------+------+---------------------------------------------------------------+------------------+
| Id | User | Host              | db   | Command          | Time | State                                                         | Info             |
+----+------+-------------------+------+------------------+------+---------------------------------------------------------------+------------------+
| 35 | root | localhost         | NULL | Query            |    1 | starting                                                      | show processlist |
| 44 | repl | 192.168.1.7:53166 | NULL | Binlog Dump GTID |  150 | Master has sent all binlog to slave; waiting for more updates | NULL             |
+----+------+-------------------+------+------------------+------+---------------------------------------------------------------+------------------+
2 rows in set (0.12 sec)
Check the MHA manager host: the manager has stopped
[root@slave02 ~]# masterha_check_status --conf=/etc/mha/mysql.cnf
mysql is stopped(2:NOT_RUNNING).
Check the MHA manager log
[root@slave02 ~]# tail -f /var/log/mha/mysql/manager
...
----- Failover Report -----
mysql: MySQL Master failover 192.168.1.5(192.168.1.5:3306) to 192.168.1.6(192.168.1.6:3306) succeeded
Master 192.168.1.5(192.168.1.5:3306) is down!
Check MHA Manager logs at slave02:/var/log/mha/mysql/manager for details.
Started automated(non-interactive) failover.
Selected 192.168.1.6(192.168.1.6:3306) as a new master.
192.168.1.6(192.168.1.6:3306): OK: Applying all logs succeeded.
192.168.1.7(192.168.1.7:3306): OK: Slave started, replicating from 192.168.1.6(192.168.1.6:3306)
192.168.1.6(192.168.1.6:3306): Resetting slave info succeeded.
Master failover to 192.168.1.6(192.168.1.6:3306) completed successfully.
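The failover report is regular enough to be parsed. As a rough sketch that relies on the "Selected ... as a new master." line shown above, the new master can be pulled out of the log as follows; the function name new_master_from_log is made up.

```shell
# Prints the host from the most recent "Selected ... as a new master." line.
# Reads the given log file, or stdin when no file is given.
new_master_from_log() {
  sed -n 's/.*Selected \([^ (]*\)(.*) as a new master\..*/\1/p' "$@" | tail -n 1
}
```

new_master_from_log /var/log/mha/mysql/manager would print 192.168.1.6 for the report above.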
Check the MHA manager configuration file: server1 has been removed
[root@slave02 ~]# vim /etc/mha/mysql.cnf
[server default]
manager_log=/var/log/mha/mysql/manager
manager_workdir=/var/log/mha/mysql
master_binlog_dir=/data/binlog
password=mha
ping_interval=2
repl_password=123
repl_user=repl
ssh_user=root
user=mha
[server2]
hostname=192.168.1.6
port=3306
[server3]
hostname=192.168.1.7
port=3306
As shown, MHA handled the failure automatically, but only once: the manager exits after the failover.
Repair db01
[root@master ~]# systemctl start mysqld
Repair replication
Find the CHANGE MASTER TO statement in the manager log
[root@slave02 ~]# vim /var/log/mha/mysql/manager
CHANGE MASTER TO MASTER_HOST='192.168.1.6', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='123';
Run the statement on db01 and start the slave
db01 [(none)]>CHANGE MASTER TO MASTER_HOST='192.168.1.6', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='123';
Query OK, 0 rows affected, 2 warnings (0.16 sec)
db01 [(none)]>start slave;
Query OK, 0 rows affected (0.14 sec)
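Instead of searching the log in an editor, the statement can be extracted with grep. This is a sketch with a made-up function name, last_change_master; it assumes the statement sits on a single log line ending in a semicolon, as shown above. The password in the logged statement may be masked, in which case it has to be replaced with the real replication password before the statement is run.

```shell
# Prints the last CHANGE MASTER TO statement found in the given manager log.
last_change_master() {
  grep -o 'CHANGE MASTER TO .*;' "$1" | tail -n 1
}
```

For example, last_change_master /var/log/mha/mysql/manager prints the statement to paste into the mysql client on the repaired node.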
Edit the manager configuration file on db03 and add db01 back
[root@slave02 ~]# vim /etc/mha/mysql.cnf
[server1]
hostname=192.168.1.5
port=3306
Start MHA
[root@slave02 ~]# nohup masterha_manager --conf=/etc/mha/mysql.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/mysql/manager.log 2>&1 &
[1] 7635
[root@slave02 ~]# masterha_check_status --conf=/etc/mha/mysql.cnf
mysql (pid:7635) is running(0:PING_OK), master:192.168.1.6
Place the script in /usr/local/bin/ and make it executable (master_ip_failover has to be prepared in advance)
[root@slave02 ~]# mv master_ip_failover /usr/local/bin/
[root@slave02 ~]# chmod +x /usr/local/bin/master_ip_failover
The master_ip_failover script is as follows:
#!/usr/bin/env perl

#  Copyright (C) 2011 DeNA Co.,Ltd.
#
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#
#  You should have received a copy of the GNU General Public License
#   along with this program; if not, write to the Free Software
#  Foundation, Inc.,
#  51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA

## Note: This is a sample script and is not complete. Modify the script based on your environment.

use strict;
use warnings FATAL => 'all';

use Getopt::Long;
use MHA::DBHelper;

my (
  $command,          $ssh_user,        $orig_master_host, $orig_master_ip,
  $orig_master_port, $new_master_host, $new_master_ip,    $new_master_port,
  $new_master_user,  $new_master_password
);
my $vip = '192.168.1.100/24';
my $key = "1";
my $ssh_start_vip = "/sbin/ifconfig ens33:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig ens33:$key down";

GetOptions(
  'command=s'             => \$command,
  'ssh_user=s'            => \$ssh_user,
  'orig_master_host=s'    => \$orig_master_host,
  'orig_master_ip=s'      => \$orig_master_ip,
  'orig_master_port=i'    => \$orig_master_port,
  'new_master_host=s'     => \$new_master_host,
  'new_master_ip=s'       => \$new_master_ip,
  'new_master_port=i'     => \$new_master_port,
  'new_master_user=s'     => \$new_master_user,
  'new_master_password=s' => \$new_master_password,
);

exit &main();

sub main {
  if ( $command eq "stop" || $command eq "stopssh" ) {

    # $orig_master_host, $orig_master_ip, $orig_master_port are passed.
    # If you manage master ip address at global catalog database,
    # invalidate orig_master_ip here.
    my $exit_code = 1;
    eval {

      # updating global catalog, etc
      $exit_code = 0;
    };
    if ($@) {
      warn "Got Error: $@\n";
      exit $exit_code;
    }
    exit $exit_code;
  }
  elsif ( $command eq "start" ) {

    # all arguments are passed.
    # If you manage master ip address at global catalog database,
    # activate new_master_ip here.
    # You can also grant write access (create user, set read_only=0, etc) here.
    my $exit_code = 10;
    eval {
      print "Enabling the VIP - $vip on the new master - $new_master_host \n";
      &start_vip();
      &stop_vip();
      $exit_code = 0;
    };
    if ($@) {
      warn $@;
      exit $exit_code;
    }
    exit $exit_code;
  }
  elsif ( $command eq "status" ) {
    print "Checking the Status of the script.. OK \n";
    `ssh $ssh_user\@$orig_master_host \" $ssh_start_vip \"`;
    exit 0;
  }
  else {
    &usage();
    exit 1;
  }
}

sub start_vip() {
  `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}

# A simple system call that disable the VIP on the old_master
sub stop_vip() {
  `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
  print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
Adjust the settings in master_ip_failover
[root@slave02 ~]# vim /usr/local/bin/master_ip_failover
my $vip = '192.168.1.10/24';
my $key = "1";
my $ssh_start_vip = "/sbin/ifconfig ens33:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig ens33:$key down";
Edit the manager configuration file
[root@slave02 ~]# vim /etc/mha/mysql.cnf
master_ip_failover_script=/usr/local/bin/master_ip_failover
Create the first VIP manually on the master (db02 at this point)
[root@slave01 ~]# ifconfig ens33:1 192.168.1.10/24
Note: when the VIP is configured for the first time, it has to be created manually on the master, because the script only creates it automatically during a failover.
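Before and after a failover it is useful to confirm which host actually holds the VIP. The sketch below scans the output of ip -o -4 addr show or ifconfig for the address as a whole token, so 192.168.1.10 does not match 192.168.1.100. The function name has_vip is made up, and older ifconfig formats that print "inet addr:..." would need a different pattern.

```shell
# Reads "ip -o -4 addr show" or ifconfig output on stdin; succeeds if the
# given address appears as a whole token (with or without a /prefix).
has_vip() {
  awk -v vip="$1" '
    { for (i = 1; i <= NF; i++) { split($i, a, "/"); if (a[1] == vip) found = 1 } }
    END { exit !found }
  '
}
```

For instance, ssh root@192.168.1.6 ip -o -4 addr show | has_vip 192.168.1.10 succeeds only while db02 holds the VIP.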
Restart MHA
[root@slave02 ~]# masterha_stop --conf=/etc/mha/mysql.cnf
Stopped mysql successfully.
[1]+ Exit 1 nohup masterha_manager --conf=/etc/mha/mysql.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/mysql/manager.log 2>&1
[root@slave02 ~]# nohup masterha_manager --conf=/etc/mha/mysql.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/mysql/manager.log 2>&1 &
[1] 8498
[root@slave02 ~]# masterha_check_status --conf=/etc/mha/mysql.cnf
mysql (pid:8498) is running(0:PING_OK), master:192.168.1.6
Test the VIP: stop mysqld on db02, the current master, to simulate a failure. The expected result is that db01 takes over as master and the VIP moves to db01.
# Simulate the failure: stop mysqld on db02
[root@slave01 ~]# systemctl stop mysqld
# Check whether db01 now holds the VIP
[root@master ~]# ifconfig
ens33:1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.1.10  netmask 255.255.255.0  broadcast 192.168.1.255
        ether 00:0c:29:df:f3:fd  txqueuelen 1000  (Ethernet)
# Check the slave status on db03
db03 [(none)]>show slave status \G
*************************** 1. row ***************************
         Slave_IO_State: Waiting for master to send event
            Master_Host: 192.168.1.5
            Master_User: repl
            Master_Port: 3306
          Connect_Retry: 60
        Master_Log_File: mysql-bin.000006
    Read_Master_Log_Pos: 194
         Relay_Log_File: slave02-relay-bin.000002
          Relay_Log_Pos: 367
  Relay_Master_Log_File: mysql-bin.000006
       Slave_IO_Running: Yes
      Slave_SQL_Running: Yes
MHA is a one-shot high-availability solution, so a master failure must be reported promptly. MHA provides the report_script parameter, which can be used to send an email alert.
[root@slave02 ~]# mv send_report /usr/local/bin/
[root@slave02 ~]# chmod +x /usr/local/bin/send_report
[root@slave02 ~]# vim /etc/mha/mysql.cnf
report_script=/usr/local/bin/send_report
[root@slave02 ~]# masterha_stop --conf=/etc/mha/mysql.cnf
[root@slave02 ~]# nohup masterha_manager --conf=/etc/mha/mysql.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/mysql/manager.log 2>&1 &
[root@master ~]# systemctl stop mysqld
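Set up an extra machine that records all of the master's binary logs, to cover the case where SSH to the master is unavailable.
Note: the extra machine must run the same MySQL version as the master; this test uses db03.
[root@slave02 ~]# vim /etc/mha/mysql.cnf
[binlog1]
no_master=1
hostname=192.168.1.7
master_binlog_dir=/binlog
[root@slave02 ~]# mkdir /binlog
[root@slave02 ~]# chown -R mysql.mysql /binlog/
# The command must be run from inside the directory just created
[root@slave02 ~]# cd /binlog/
# Note: start pulling from the binary log file that the slave has currently received
[root@slave02 binlog]# mysqlbinlog -R --host=192.168.1.5 --user=mha --password=mha --raw --stop-never mysql-bin.000006 &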
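The starting file for mysqlbinlog can be read from the slave's own replication status rather than typed by hand. A minimal sketch, with a made-up function name binlog_start_file, takes the SHOW SLAVE STATUS\G output on standard input and prints the Master_Log_File value. The field name is matched exactly so that Relay_Master_Log_File is not picked up by mistake.

```shell
# Reads "SHOW SLAVE STATUS\G" output on stdin and prints Master_Log_File,
# the master binary log the slave's I/O thread is currently reading.
binlog_start_file() {
  awk '$1 == "Master_Log_File:" { print $2 }'
}
```

The printed name (mysql-bin.000006 in this walkthrough) is then passed as the last argument to the mysqlbinlog command above.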
[root@slave02 ~]# masterha_stop --conf=/etc/mha/mysql.cnf
[root@slave02 ~]# nohup masterha_manager --conf=/etc/mha/mysql.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/mysql/manager.log 2>&1 &
Failure handling: when the master goes down, the binlog server stops automatically and the manager stops as well.
Approach: