前言
对于刚接触MySQL数据库不久的小伙伴来说,或多或少会遇到一些棘手的小问题;比如在数据库磁盘无损坏的情况下,我们经常会遇到数据库服务器的磁盘io压力很大,但由于服务器上只有MySQL程序,我们第一想法肯定是某个连接用户在连到数据库后,背着我们在瞎搞着什么;然而通过show processlist命令,却并没有看到不良的操作,可恶。下面模拟一个简单场景,让我们揪出这狗人的邪恶操作。
场景模拟,制造IO压力
[root@tidb-master ~]# /usr/share/sysbench/oltp_read_write.lua --mysql-host=10.186.61.36 --mysql-user=sun --mysql-password='xx' --mysql-port=3306 --mysql-db=testdb --tables=1 --table_size=10000000 --auto_inc=off --report-interval=2 --db-ps-mode=disable --time=100000 --threads=1 prepare sysbench 1.0.17 (using system LuaJIT 2.0.4) Creating table 'sbtest1'... Inserting 10000000 records into 'sbtest1' Creating a secondary index on 'sbtest1'... [root@tidb-master ~]# /usr/share/sysbench/oltp_read_write.lua --mysql-host=10.186.61.36 --mysql-user=sun --mysql-password='xx' --mysql-port=3306 --mysql-db=testdb --tables=1 --table_size=10000000 --auto_inc=off --report-interval=2 --db-ps-mode=disable --time=100000 --threads=1 run sysbench 1.0.17 (using system LuaJIT 2.0.4) Running the test with following options: Number of threads: 1 Report intermediate results every 2 second(s) Initializing random number generator from current time Initializing worker threads... Threads started! [ 2s ] thds: 1 tps: 19.48 qps: 399.05 (r/w/o: 279.69/79.91/39.46) lat (ms,95%): 80.03 err/s: 0.00 reconn/s: 0.00 [ 4s ] thds: 1 tps: 30.01 qps: 599.76 (r/w/o: 420.18/119.55/60.03) lat (ms,95%): 69.29 err/s: 0.00 reconn/s: 0.00 [ 6s ] thds: 1 tps: 38.00 qps: 751.49 (r/w/o: 524.99/150.50/76.00) lat (ms,95%): 53.85 err/s: 0.00 reconn/s: 0.00 [ 8s ] thds: 1 tps: 41.00 qps: 827.05 (r/w/o: 581.03/164.01/82.00) lat (ms,95%): 61.08 err/s: 0.00 reconn/s: 0.00 [ 10s ] thds: 1 tps: 35.50 qps: 709.43 (r/w/o: 496.45/141.99/70.99) lat (ms,95%): 64.47 err/s: 0.00 reconn/s: 0.00
问题分析,捉出狗人
1、通过top命令发现io wait比较高
top - 16:46:49 up 176 days, 6:46, 1 user, load average: 3.72, 2.39, 3.04 Tasks: 182 total, 1 running, 181 sleeping, 0 stopped, 0 zombie %Cpu0 : 4.8 us, 2.0 sy, 0.0 ni, 91.1 id, 1.7 wa, 0.0 hi, 0.0 si, 0.3 st %Cpu1 : 53.2 us, 10.8 sy, 0.0 ni, 14.6 id, 13.9 wa, 0.0 hi, 7.1 si, 0.3 st %Cpu2 : 11.7 us, 4.8 sy, 0.0 ni, 73.8 id, 9.3 wa, 0.0 hi, 0.0 si, 0.3 st %Cpu3 : 11.4 us, 3.1 sy, 0.0 ni, 78.9 id, 6.6 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu4 : 3.8 us, 1.4 sy, 0.0 ni, 93.4 id, 1.0 wa, 0.0 hi, 0.0 si, 0.3 st %Cpu5 : 5.2 us, 2.8 sy, 0.0 ni, 89.6 id, 2.4 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu6 : 6.2 us, 2.4 sy, 0.0 ni, 83.3 id, 7.6 wa, 0.0 hi, 0.0 si, 0.3 st %Cpu7 : 6.8 us, 2.1 sy, 0.0 ni, 86.0 id, 4.8 wa, 0.0 hi, 0.3 si, 0.0 st %Cpu8 : 1.7 us, 1.4 sy, 0.0 ni, 94.3 id, 2.4 wa, 0.0 hi, 0.0 si, 0.3 st %Cpu9 : 16.1 us, 5.1 sy, 0.0 ni, 70.9 id, 7.9 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu10 : 10.3 us, 2.7 sy, 0.0 ni, 82.5 id, 4.1 wa, 0.0 hi, 0.0 si, 0.3 st %Cpu11 : 10.1 us, 3.1 sy, 0.0 ni, 75.0 id, 11.8 wa, 0.0 hi, 0.0 si, 0.0 st KiB Mem : 12137024 total, 5308908 free, 3601408 used, 3226708 buff/cache KiB Swap: 0 total, 0 free, 0 used. 7555800 avail Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 8858 actiont+ 20 0 4867792 697536 15840 S 172.4 5.7 6:12.06 mysqld 26447 tidb 20 0 4000364 1.0g 5384 S 6.0 9.0 9785:41 TiFlashMain 22761 actiont+ 20 0 2526348 37548 10980 S 4.3 0.3 1031:05 ustats 25894 tidb 20 0 11.2g 159668 13560 S 2.3 1.3 3682:45 pd-server 25975 tidb 20 0 2637164 1.0g 7392 S 2.0 8.8 4121:29 tikv-server 25967 actiont+ 20 0 1879304 16292 6112 S 1.3 0.1 166:08.91 uguard-agent
2、通过iostat命令进一步分析,我们看到磁盘比较繁忙,忙于读写中(%uitl值比较大,一般来说,值大于75是表明磁盘比较繁忙了)
[root@tidb-tikv01 ~]# iostat -x 1 Linux 3.10.0-862.14.4.el7.x86_64 (tidb-tikv01) 04/19/2022 _x86_64_ (12 CPU) avg-cpu: %user %nice %system %iowait %steal %idle 1.20 0.00 0.44 0.17 0.10 98.09 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util vda 0.01 0.18 3.15 21.14 80.79 232.37 25.79 0.19 7.69 6.15 7.92 0.75 1.82 scd0 0.00 0.00 0.00 0.00 0.00 0.00 6.00 0.00 0.50 0.50 0.00 0.50 0.00 avg-cpu: %user %nice %system %iowait %steal %idle 9.60 0.00 3.43 9.43 0.09 77.46 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util vda 0.00 0.00 1328.00 3237.00 21248.00 39091.50 26.44 2.71 0.59 1.09 0.39 0.17 79.50 scd0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 avg-cpu: %user %nice %system %iowait %steal %idle 11.27 0.00 3.76 9.31 0.00 75.66 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util vda 0.00 0.00 1304.00 3561.00 20864.00 45302.00 27.20 2.85 0.58 1.16 0.37 0.16 79.60 scd0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 avg-cpu: %user %nice %system %iowait %steal %idle 8.06 0.00 2.66 19.04 0.17 70.07 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util vda 0.00 0.00 917.00 3091.00 14672.00 43048.50 28.80 8.35 2.09 3.19 1.76 0.21 82.50 scd0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
3、通过iop命令可以定位是哪个线程消耗IO比较厉害,根据以下显示,发现线程值为22056的MySQL线程ID占用IO高达50.62%
Total DISK READ : 5.78 M/s | Total DISK WRITE : 16.91 M/s Actual DISK READ: 6.01 M/s | Actual DISK WRITE: 23.69 M/s TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND 22056 be/4 actionte 4.83 M/s 233.93 K/s 0.00 % 50.62 % mysqld --defaults-file=/opt/mysql/etc/3306/my.cnf --daemonize ~ch-mysql --socket=/opt/mysql/data/3306/mysqld.sock --port=3306 8887 be/4 actionte 0.00 B/s 0.00 B/s 0.00 % 6.31 % mysqld --defaults-file=/opt/mysql/etc/3306/my.cnf --daemonize ~ch-mysql --socket=/opt/mysql/data/3306/mysqld.sock --port=3306 8911 be/4 actionte 211.29 K/s 0.00 B/s 0.00 % 2.54 % mysqld --defaults-file=/opt/mysql/etc/3306/my.cnf --daemonize ~ch-mysql --socket=/opt/mysql/data/3306/mysqld.sock --port=3306 22771 be/4 actionte 0.00 B/s 3.77 K/s 0.00 % 1.72 % ustats
4、将上面获取的线程ID带入到MySQL中,便定位到是数据库中哪个连接搞的鬼了,抽死你丫的(performance_schama.threads中THREAD_OS_ID为系统线程ID)
[root@tidb-tikv01 ~]# /opt/mysql/base/8.0.21/bin/mysql -S /opt/mysql/data/3306/mysqld.sock -pxx mysql: [Warning] Using a password on the command line interface can be insecure. Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 13832 Server version: 8.0.21 MySQL Community Server - GPL Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql> show processlist; +-------+-----------------+--------------------+--------+------------------+--------+---------------------------------------------------------------+------------------+ | Id | User | Host | db | Command | Time | State | Info | +-------+-----------------+--------------------+--------+------------------+--------+---------------------------------------------------------------+------------------+ | 5 | event_scheduler | localhost | NULL | Daemon | 232299 | Waiting on empty queue | NULL | | 38 | universe_op | 127.0.0.1:58456 | NULL | Sleep | 1 | | NULL | | 39 | universe_op | 127.0.0.1:58454 | NULL | Sleep | 1 | | NULL | | 40 | universe_op | 127.0.0.1:58458 | NULL | Sleep | 1 | | NULL | | 41 | universe_op | 127.0.0.1:58460 | NULL | Sleep | 1 | | NULL | | 42 | universe_op | 127.0.0.1:58466 | NULL | Sleep | 1 | | NULL | | 43 | universe_op | 127.0.0.1:58462 | NULL | Sleep | 1 | | NULL | | 44 | universe_op | 127.0.0.1:58464 | NULL | Sleep | 1 | | NULL | | 47 | universe_op | 127.0.0.1:58656 | NULL | Sleep | 7 | | NULL | | 48 | universe_op | 127.0.0.1:58658 | NULL | Sleep | 7 | | NULL | | 49 | universe_op | 127.0.0.1:58660 | NULL | Sleep | 7 | | NULL | | 50 | universe_op | 127.0.0.1:58662 | NULL | Sleep | 7 | | NULL | | 51 | universe_op | 127.0.0.1:58666 | NULL | Sleep | 7 | | NULL | | 52 | universe_op | 127.0.0.1:58664 | NULL | Sleep | 7 | | NULL | | 53 | universe_op | 127.0.0.1:58668 | NULL | Sleep | 7 | | NULL | | 54 | universe_op | 127.0.0.1:58670 | NULL | Sleep | 7 | | NULL | | 56 | universe_op | 10.186.61.45:55982 | NULL | Binlog Dump GTID | 232282 | Master has sent all binlog to slave; waiting for more updates | NULL | | 231 | universe_op | 127.0.0.1:41766 | NULL | Sleep | 1 | | NULL | | 11155 | sun | 10.186.61.16:53874 | testdb | Sleep | 0 | | NULL | | 13832 | root | localhost | NULL | Query | 0 | starting | show processlist | +-------+-----------------+--------------------+--------+------------------+--------+---------------------------------------------------------------+------------------+ 20 rows in set (0.00 sec) mysql> select * from performance_schema.threads where THREAD_OS_ID=22056\G *************************** 1. row *************************** THREAD_ID: 11200 NAME: thread/sql/one_connection TYPE: FOREGROUND PROCESSLIST_ID: 11155 PROCESSLIST_USER: sun PROCESSLIST_HOST: 10.186.61.16 PROCESSLIST_DB: testdb PROCESSLIST_COMMAND: Query PROCESSLIST_TIME: 0 PROCESSLIST_STATE: statistics PROCESSLIST_INFO: SELECT c FROM sbtest1 WHERE id=6433600 PARENT_THREAD_ID: NULL ROLE: NULL INSTRUMENTED: YES HISTORY: YES CONNECTION_TYPE: TCP/IP THREAD_OS_ID: 22056 RESOURCE_GROUP: USR_default 1 row in set (0.01 sec) mysql>
总结
对于一些新入门的小可爱来说,以上方法可以循序渐进的帮助你定位到是哪些连接线程把你的服务器IO折腾高的,可以让你更加了解Linux系统与MySQL之间的联动关系。老鸟嘛,emmm... 咦,飞走了。