Storage engines collect statistics about tables for use by the optimizer.
Table statistics are based on value groups, where a value group is a set of rows with the same key prefix value.
For optimizer purposes, an important statistic is the average value group size.
MySQL uses the average value group size in the following ways:
To estimate how many rows must be read for each ref access
To estimate how many rows a partial join produces; that is, the number of rows that an operation of this form produces: (...) JOIN tbl_name ON tbl_name.key = expr
As the average value group size for an index increases, the index is less useful for those two purposes because the average number of rows per lookup increases. For the index to be good for optimization purposes, it is best that each index value target a small number of rows in the table.
When a given index value yields a large number of rows, the index is less useful and MySQL is less likely to use it.
The average value group size is related to table cardinality, which is the number of value groups.
The SHOW INDEX statement displays a cardinality value based on N/S, where N is the number of rows in the table and S is the average value group size.
That ratio yields an approximate number of value groups in the table.
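As a rough illustration of the N/S relationship, the following Python sketch groups a list of key values, computes the average value group size S, and derives the cardinality from it. The function name index_statistics and the sample data are hypothetical; the sketch models the arithmetic only, not how a storage engine actually samples an index.

```python
from collections import Counter


def index_statistics(key_values):
    """Return (row_count, value_group_count, average_group_size).

    A value group is the set of rows sharing one key value.
    Illustrative model only; real engines estimate these figures.
    """
    groups = Counter(key_values)      # one entry per distinct key value
    n_rows = len(key_values)          # N: number of rows in the table
    n_groups = len(groups)            # number of value groups
    if n_groups == 0:
        return 0, 0, 0.0              # empty table: nothing to average
    avg_size = n_rows / n_groups      # S: average value group size
    return n_rows, n_groups, avg_size


keys = ["a", "a", "a", "b", "b", "c"]
n_rows, n_groups, avg_size = index_statistics(keys)
cardinality = n_rows / avg_size       # N/S recovers the number of value groups
print(n_rows, n_groups, avg_size, cardinality)
```

A smaller average group size means fewer rows per lookup, which is why an index with high cardinality relative to the row count is more attractive to the optimizer.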
For a join based on the <=> comparison operator, NULL is not treated differently from any other value: NULL <=> NULL, just as N <=> N for any other N.
However, for a join based on the = operator, NULL is different from non-NULL values: expr1 = expr2 is not true when expr1 or expr2 (or both) are NULL. This affects ref accesses for comparisons of the form tbl_name.key = expr: MySQL does not access the table if the current value of expr is NULL, because the comparison cannot be true.
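The difference between the two operators can be modeled in a few lines of Python, using None to stand for NULL. The helper names sql_eq, sql_null_safe_eq, and ref_lookup are hypothetical, and the dictionary-based "index" is only a stand-in for illustration; it is not how MySQL implements ref access.

```python
def sql_eq(a, b):
    """Model of SQL '=': the result is NULL (None) if either operand is NULL."""
    if a is None or b is None:
        return None
    return a == b


def sql_null_safe_eq(a, b):
    """Model of SQL '<=>': NULL <=> NULL is true, NULL <=> non-NULL is false."""
    if a is None or b is None:
        return a is None and b is None
    return a == b


def ref_lookup(index, expr, null_safe=False):
    """Look up rows for key = expr (or key <=> expr when null_safe is True).

    'index' is a hypothetical dict mapping key values to lists of rows.
    With '=', a NULL expr can never match, so the table is not accessed.
    """
    if expr is None and not null_safe:
        return []
    return index.get(expr, [])


index = {None: ["row1", "row2"], 5: ["row3"]}
print(sql_eq(None, None), sql_null_safe_eq(None, None))
print(ref_lookup(index, None), ref_lookup(index, None, null_safe=True))
```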
For = comparisons, it does not matter how many NULL values are in the table.
For optimization purposes, the relevant value is the average size of the non-NULL value groups. However, MySQL does not currently enable that average size to be collected or used.
For InnoDB and MyISAM tables, you have some control over collection of table statistics by means of the innodb_stats_method and myisam_stats_method system variables, respectively. These variables have three possible values, which differ as follows:
When the variable is set to nulls_equal, all NULL values are treated as identical (that is, they all form a single value group).
If the NULL value group size is much higher than the average non-NULL value group size, this method skews the average value group size upward.
This makes the index appear to the optimizer to be less useful than it really is for joins that look for non-NULL values.
Consequently, the nulls_equal method may cause the optimizer not to use the index for ref accesses when it should.
When the variable is set to nulls_unequal, NULL values are not considered the same. Instead, each NULL value forms a separate value group of size 1.
If you have many NULL values, this method skews the average value group size downward.
If the average non-NULL value group size is large, counting NULL values each as a group of size 1 causes the optimizer to overestimate the value of the index for joins that look for non-NULL values.
Consequently, the nulls_unequal method may cause the optimizer to use this index for ref lookups when other methods may be better.
When the variable is set to nulls_ignored, NULL values are ignored.
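The skew each setting introduces can be seen on a small example. The Python sketch below is a simplified model, assuming a single-column key with None standing for NULL; the function average_group_size and the sample data are hypothetical, and the actual engine computation (which works per key prefix and may sample) differs in detail.

```python
def average_group_size(key_values, method):
    """Average value group size under a simplified model of each stats method."""
    non_null = [v for v in key_values if v is not None]
    null_count = len(key_values) - len(non_null)
    rows = len(non_null)
    groups = len(set(non_null))       # one group per distinct non-NULL value

    if method == "nulls_equal":
        # All NULLs form one single value group.
        rows += null_count
        groups += 1 if null_count else 0
    elif method == "nulls_unequal":
        # Each NULL forms its own value group of size 1.
        rows += null_count
        groups += null_count
    elif method == "nulls_ignored":
        # NULL rows are left out of the computation entirely.
        pass
    else:
        raise ValueError(f"unknown method: {method}")

    return rows / groups if groups else 0.0


# Two non-NULL groups of size 2, plus six NULLs.
keys = [1, 1, 2, 2] + [None] * 6
for method in ("nulls_equal", "nulls_unequal", "nulls_ignored"):
    print(method, round(average_group_size(keys, method), 2))
```

In this data the true average non-NULL group size is 2. Under this model, nulls_equal reports a larger average (the six NULLs form one big group) and nulls_unequal reports a smaller one (six extra groups of size 1), matching the upward and downward skew described above.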
If you tend to use many joins that use <=> rather than =, NULL values are not special in comparisons and one NULL is equal to another. In this case, nulls_equal is the appropriate statistics method.
The innodb_stats_method system variable has a global value; the myisam_stats_method system variable has both global and session values.
Setting the global value affects statistics collection for tables from the corresponding storage engine.
Setting the session value affects statistics collection only for the current client connection.
This means that you can force a table's statistics to be regenerated with a given method without affecting other clients by setting the session value of myisam_stats_method.
To regenerate MyISAM table statistics, you can use any of the following methods:
Execute myisamchk --stats_method=method_name --analyze
Change the table to cause its statistics to go out of date (for example, insert a row and then delete it), and then set myisam_stats_method and issue an ANALYZE TABLE statement
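The second method can be sketched as a small Python helper that issues the two statements through a DB-API style cursor. This is a minimal sketch under stated assumptions: the function regenerate_myisam_stats and the RecordingCursor stand-in are hypothetical, the cursor is assumed to come from whichever MySQL driver you use, and the step that makes the statistics go out of date (for example, inserting and deleting a row) is left to the caller because it depends on the table.

```python
VALID_METHODS = ("nulls_equal", "nulls_unequal", "nulls_ignored")


def regenerate_myisam_stats(cursor, table_name, method):
    """Set the session stats method, then run ANALYZE TABLE on one table.

    Assumes the table's statistics are already out of date.
    Returns the statements that were executed, in order.
    """
    if method not in VALID_METHODS:
        raise ValueError(f"unknown stats method: {method}")
    # Quote the identifier with backticks, doubling any embedded backtick.
    quoted = "`" + table_name.replace("`", "``") + "`"
    set_stmt = f"SET SESSION myisam_stats_method = '{method}'"
    analyze_stmt = f"ANALYZE TABLE {quoted}"
    cursor.execute(set_stmt)
    cursor.execute(analyze_stmt)
    cursor.fetchall()                 # ANALYZE TABLE returns a result set
    return [set_stmt, analyze_stmt]


class RecordingCursor:
    """Stand-in cursor that records statements instead of contacting a server."""

    def __init__(self):
        self.executed = []

    def execute(self, statement):
        self.executed.append(statement)

    def fetchall(self):
        return []


cursor = RecordingCursor()
for statement in regenerate_myisam_stats(cursor, "t1", "nulls_unequal"):
    print(statement)
```

Because only the session value is changed, other client connections keep using their own setting of myisam_stats_method.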
Some caveats regarding the use of innodb_stats_method and myisam_stats_method:
You can force table statistics to be collected explicitly, as just described. However, MySQL may also collect statistics automatically.
For example, if during the course of executing statements for a table, some of those statements modify the table, MySQL may collect statistics.
(This may occur for bulk inserts or deletes, or some ALTER TABLE statements, for example.) If this happens, the statistics are collected using whatever value innodb_stats_method or myisam_stats_method has at the time.
Thus, if you collect statistics using one method, but the system variable is set to the other method when a table's statistics are collected automatically later, the other method is used.
There is no way to tell which method was used to generate statistics for a given table.
These variables apply only to InnoDB and MyISAM tables. Other storage engines have only one method for collecting table statistics. Usually it is closer to the nulls_equal method.