之前写的《mysql B+Tree索引的一点理解》一文中,介绍了MySQL在使用辅助索引的原理,通过辅助索引进行回表不难理解就相当于Oracle的index skip scan.但是mysql5.6版本中推出了mrr功能,其实就是将随机访问的数据,通过内部机制缓存到线程内存read_rnd_buffer_size中,然后进行排序,排序后的数据再访问主键索引,将随机访问改变为了顺序访问。
如果没有这个功能,那么每获取一个辅助索引的叶子块就会遍历一下主键,找到对应的数据--该过程我们又称为回表。
mrr功能,将这些辅助索引扫描后的数据同一进行缓存,然后一次性访问主键索引,然后找到对应的数据,这样就大大减少了访问数据块的数量
mysql数据库有一个预读功能,也就是访问一个页的数据时,将临近页也会加载到内存中,刚好需要下一页的数据时就不再需要进行物理IO
说明:本测试在mysql 5.7.35中进行测试。
Create Table: CREATE TABLE `salaries` ( `emp_no` int(11) NOT NULL, `salary` int(11) NOT NULL, `from_date` date NOT NULL, `to_date` date NOT NULL, PRIMARY KEY (`emp_no`,`from_date`), KEY `idx_salaries_salary` (`salary`), CONSTRAINT `salaries_ibfk_1` FOREIGN KEY (`emp_no`) REFERENCES `employees` (`emp_no`) ON DELETE CASCADE ) ENGINE=InnoDB DEFAULT CHARSET=utf8
root@localhost [employees]>show variables like '%optimizer_switch%'\G; *************************** 1. row *************************** Variable_name: optimizer_switch Value: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,engine_condition_pushdown=on,index_condition_pushdown=on,mrr=on,mrr_cost_based=on,block_nested_loop=on,batched_key_access=off,materialization=on,semijoin=on,loosescan=on,firstmatch=on,duplicateweedout=on,subquery_materialization_cost_based=on,use_index_extensions=on,condition_fanout_filter=on,derived_merge=on,prefer_ordering_index=on 1 row in set (0.00 sec)
从这里并没有发现该执行步骤使用了mrr功能,还是每行检索之后访问主键索引,然后进行回表
root@localhost [employees]>explain select * from salaries where salary>10000 and salary<40000; +----+-------------+----------+------------+-------+---------------------+---------------------+---------+------+-------+----------+-----------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+----------+------------+-------+---------------------+---------------------+---------+------+-------+----------+-----------------------+ | 1 | SIMPLE | salaries | NULL | range | idx_salaries_salary | idx_salaries_salary | 4 | NULL | 21450 | 100.00 | Using index condition | +----+-------------+----------+------------+-------+---------------------+---------------------+---------+------+-------+----------+-----------------------+
root@localhost [employees]>set optimizer_switch='mrr_cost_based=off'; Query OK, 0 rows affected (0.00 sec) root@localhost [employees]>show variables like '%optimizer_switch%'\G; *************************** 1. row *************************** Variable_name: optimizer_switch Value: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,engine_condition_pushdown=on,index_condition_pushdown=on,mrr=on,mrr_cost_based=off,block_nested_loop=on,batched_key_access=off,materialization=on,semijoin=on,loosescan=on,firstmatch=on,duplicateweedout=on,subquery_materialization_cost_based=on,use_index_extensions=on,condition_fanout_filter=on,derived_merge=on,prefer_ordering_index=on 1 row in set (0.00 sec)
这时我们发现,执行计划已经使用了mrr功能,对辅助索引数据进行缓存之后,一次回表,
root@localhost [employees]>explain select * from salaries where salary>10000 and salary<40000; +----+-------------+----------+------------+-------+---------------------+---------------------+---------+------+-------+----------+----------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+----------+------------+-------+---------------------+---------------------+---------+------+-------+----------+----------------------------------+ | 1 | SIMPLE | salaries | NULL | range | idx_salaries_salary | idx_salaries_salary | 4 | NULL | 21450 | 100.00 | Using index condition; Using MRR | +----+-------------+----------+------------+-------+---------------------+---------------------+---------+------+-------+----------+----------------------------------+
但是上面基于成本MySQL为什么没有使用这种方式呢?
显然上面回表效率是高效的,但是MySQL优化器对于MRR功能又是相当的悲观。还是尽可能的选择索引扫描回表。这是我们需要注意的地方