MySQL 优化器之Index merge Multi-Range Read MRR与Batched Key Access介绍
覆盖索引是在索引里即有查询时需要的字段,而在一个表上的查询往往条件会有多个组合且较为复杂,所以很难全部由覆盖索引包含到。那么此时我们就会用到索引合并(index merge)算法。
该算法主要包含三类情况:
算法 | 应用场景 | 执行计划 |
Intersection | and | Using intersect(...) |
Union | or | Using or(...) |
Sort-Union | 带范围查询的or | sort_union(...) |
EXPLAIN ANALYZE SELECT * FROM sakila.payment WHERE staff_id = 1 AND customer_id = 75; --结果 -> Filter: ((payment.customer_id = 75) and (payment.staff_id = 1)) (cost=8.12 rows=20) (actual time=2.329..2.637 rows=22 loops=1) -> Index range scan on payment using intersect(idx_fk_customer_id,idx_fk_staff_id) (cost=8.12 rows=20) (actual time=2.325..2.617 rows=22 loops=1)
从执行计划里我们不难看出这里用到了索引交(intersect)
而通过explain命令则可以看出是
mysql> EXPLAIN SELECT * FROM sakila.payment WHERE staff_id = 1 AND customer_id = 75\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: payment partitions: NULL type: index_merge possible_keys: idx_fk_staff_id,idx_fk_customer_id key: idx_fk_customer_id,idx_fk_staff_id key_len: 2,1 ref: NULL rows: 20 filtered: 100.00 Extra: Using intersect(idx_fk_customer_id,idx_fk_staff_id); Using where
通过观察该查询和表的结构可以看出
show create table sakila.payment --结果 CREATE TABLE `payment` ( `payment_id` smallint unsigned NOT NULL AUTO_INCREMENT, `customer_id` smallint unsigned NOT NULL, `staff_id` tinyint unsigned NOT NULL, `rental_id` int DEFAULT NULL, `amount` decimal(5,2) NOT NULL, `payment_date` datetime NOT NULL, `last_update` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, PRIMARY KEY (`payment_id`) USING BTREE, KEY `idx_fk_staff_id` (`staff_id`) USING BTREE, KEY `idx_fk_customer_id` (`customer_id`) USING BTREE, KEY `fk_payment_rental` (`rental_id`) USING BTREE, CONSTRAINT `fk_payment_customer` FOREIGN KEY (`customer_id`) REFERENCES `customer` (`customer_id`) ON DELETE RESTRICT ON UPDATE CASCADE, CONSTRAINT `fk_payment_rental` FOREIGN KEY (`rental_id`) REFERENCES `rental` (`rental_id`) ON DELETE SET NULL ON UPDATE CASCADE, CONSTRAINT `fk_payment_staff` FOREIGN KEY (`staff_id`) REFERENCES `staff` (`staff_id`) ON DELETE RESTRICT ON UPDATE CASCADE ) ENGINE=InnoDB AUTO_INCREMENT=16050 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci
字段staff_id 和customer_id 有各自的索引idx_fk_staff_id、idx_fk_customer_id,而我们的查询是等值条件的交集。
同理如下查询也是应用到了index merge的intersection算法。
EXPLAIN SELECT * FROM sakila.payment WHERE payment_id > 10 AND customer_id = 318;
当一个表查询中用到一系列的含OR的等值条件时一般会用到并算法。示例查询见下:
SELECT * FROM sakila.payment WHERE staff_id = 1 OR customer_id = 318; SELECT * FROM sakila.payment WHERE payment_id > 15000 OR customer_id = 318; -- 通过explain或者explain、EXPLAIN ANALYZE() mysql> EXPLAIN SELECT * FROM sakila.payment WHERE staff_id = 1 OR customer_id = 318\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: payment partitions: NULL type: index_merge possible_keys: idx_fk_staff_id,idx_fk_customer_id key: idx_fk_staff_id,idx_fk_customer_id key_len: 1,2 ref: NULL rows: 8069 filtered: 100.00 Extra: Using union(idx_fk_staff_id,idx_fk_customer_id); Using where mysql> EXPLAIN ANALYZE SELECT * FROM sakila.payment WHERE staff_id = 1 OR customer_id = 318\G *************************** 1. row *************************** EXPLAIN: -> Filter: ((sakila.payment.staff_id = 1) or (sakila.payment.customer_id = 318)) (cost=2306.30 rows=8069) (actual time=0.224..103.595 rows=8062 loops=1) -> Index range scan on payment using union(idx_fk_staff_id,idx_fk_customer_id) (cost=2306.30 rows=8069) (actual time=0.221..98.988 rows=8062 loops=1) mysql> EXPLAIN FORMAT=tree SELECT * FROM sakila.payment WHERE staff_id = 1 OR customer_id = 318\G *************************** 1. row *************************** EXPLAIN: -> Filter: ((sakila.payment.staff_id = 1) or (sakila.payment.customer_id = 318)) (cost=2116.85 rows=8069) -> Index range scan on payment using union(idx_fk_staff_id,idx_fk_customer_id) (cost=2116.85 rows=8069)
跟并算法类似,不过这里用的不是等值条件而是范围,详见如下例子:
-- 查询payment表里customer_id小于30或者rental_id小于10的信息。 mysql> EXPLAIN SELECT * FROM sakila.payment WHERE customer_id < 30 OR rental_id < 10\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: payment partitions: NULL type: index_merge possible_keys: idx_fk_customer_id,fk_payment_rental key: idx_fk_customer_id,fk_payment_rental key_len: 2,5 ref: NULL rows: 826 filtered: 100.00 Extra: Using sort_union(idx_fk_customer_id,fk_payment_rental); Using where 查询payment表里customer_id小于20或者rental_id大于18000的信息。 mysql> EXPLAIN SELECT * FROM sakila.payment WHERE customer_id < 20 OR rental_id >18000\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: payment partitions: NULL type: index_merge possible_keys: idx_fk_customer_id,fk_payment_rental key: idx_fk_customer_id,fk_payment_rental key_len: 2,5 ref: NULL rows: 514 filtered: 100.00 Extra: Using sort_union(idx_fk_customer_id,fk_payment_rental); Using where EXPLAIN ANALYZE SELECT /*+ NO_INDEX_MERGE(payment) */ * FROM sakila.payment WHERE staff_id = 1 AND customer_id = 75; -> Filter: (payment.staff_id = 1) (cost=12.30 rows=21) (actual time=0.151..0.179 rows=22 loops=1) -> Index lookup on payment using idx_fk_customer_id (customer_id=75) (cost=12.30 rows=41) (actual time=0.149..0.167 rows=41 loops=1) 指定不用index merge的hint。
多范围读即Multi-Range Read (MRR)的主要作用是减少扫描辅助索引时产生的IO随机读。优化器一般先读取索引覆盖的字段,其它字段通过rowid关联到聚集索引而找出。多范围读(Multi-Range Read)优化可用于使用索引的范围扫描和等价连接,它不支持虚拟列上的二级索引。
具体的说当读取到辅助索引里字段后再关联到主键,对主键在内存里进行排序后再顺序的读取其它字段的信息。这样就会减少IO随机读。
InnoDB多范围读优化(MRR)的主要应用于没有覆盖索引的磁盘受限查询(当写比读快很多时,读就即被认为是磁盘受限)。
用optimizer_switch的参数来控制是否使用MRR.设置mrr=on时,表示启用MRR优化。
mrr_cost_based表示是否通过cost base的方式来启用MRR。
当mrr=on,mrr_cost_based=on,则表示cost base的方式还选择启用MRR优化,当发现优化后的代价过高时就会不使用该项优化。
当mrr=on,mrr_cost_based=off,则表示总是开启MRR优化。
mysql> EXPLAIN SELECT /*+ MRR(city) */ * FROM world.city WHERE CountryCode BETWEEN 'AUS' AND 'CHN'\G **************************** 1. row ***************************** id: 1 select_type: SIMPLE table: city partitions: NULL type: range possible_keys: CountryCode key: CountryCode key_len: 3 ref: NULL rows: 812 filtered: 100 Extra: Using index condition; Using MRR
批量键值访问(Batched Key Access)简称BKA,该优化操作是块内嵌循环和多范围读的整合。这样非索引联接就可以使用类似连接缓冲区索引联接的方式进行优化,通过使用多范围读取优化来减少随机I/O的数量。
由于批处理键值访问优化主要受益的查询范围相对较窄,而且其他查询的性能可能会下降,因此默认情况下禁用优化。
SET SESSION optimizer_switch = 'mrr=on,mrr_cost_based=off,batched_key_access=on'; mysql> EXPLAIN -> SELECT /*+ BKA(ci) */ -> co.Code, co.Name AS Country, -> ci.Name AS City -> FROM world.country co -> INNER JOIN world.city ci -> ON ci.CountryCode = co.Code\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: co partitions: NULL type: ALL possible_keys: PRIMARY key: NULL key_len: NULL ref: NULL rows: 239 filtered: 100.00 Extra: NULL *************************** 2. row *************************** id: 1 select_type: SIMPLE table: ci partitions: NULL type: ref possible_keys: CountryCode key: CountryCode key_len: 12 ref: world.co.Code rows: 17 filtered: 100.00 Extra: Using join buffer (Batched Key Access) In this example, the Batched Key Access is enabled using an optimizer hint for the join on the city (ci) table using the CountryCode index
上例中通过优化器提示(hint)启用了批量键访问选项,即在关联city表时用到CountryCode索引。
批量键值范围优化主要应用在数据量大的关联中,关联缓存则要相应的设置的多些,一般它对应的参数join_buffer_size要设置成4M或者更大。