在Oracle中,什么是基数(Cardinality)和可选择率(Selectivity)?
基数(Cardinality)是Oracle预估的返回行数,即对目标SQL的某个具体执行步骤的执行结果所包含记录数的估算值。如果是针对整个目标SQL,那么此时的Cardinality就表示该SQL最终执行结果所包含记录数的估算值。例如,一张表T有1000行数据,列COL1上没有直方图,没有空值,并且不重复的值(Distinct Value)有500个。那么,在使用条件“WHERE COL1=<VALUE>”去访问表的时候,优化器会假设数据均匀分布,它估计出会有1000/500=2行被选出来,2就是这步操作的Cardinality。通常情况下,Cardinality越准确,生成的执行计划就会越高效。
可选择率(Selectivity)是指施加指定谓词条件后返回结果集的记录数占未施加任何谓词条件的原始结果集的记录数的比率。可选择率的取值范围显然是0~1,它的值越小,就表明可选择性越好。当可选择率为1时的可选择性是最差的。CBO就是用可选择率来估算对应结果集的Cardinality的,可选择率和Cardinality之间的关系如下所示:
1cardinality=NUM_ROWS*selectivity
其中,NUM_ROWS表示表的总行数。
在Oracle数据库中,Oracle会默认认为SQL语句的WHERE条件中出现的各列彼此之间是独立的,是没有关联关系的。所以,如果目标SQL语句各列之间是以AND来组合的话,那么该SQL语句整个WHERE条件的组合可选择率就等于各个列各自施加查询条件后可选择率的乘积。在得到了SQL语句整个WHERE条件的组合可选择率后,Oracle会用它来估算整个SQL语句返回结果集的Cardinality,估算的方法就是用目标表的总记录数(NUM_ROWS)乘组合可选择率。但Oracle默认认为的各列之间是独立的、没有关联关系的前提条件并不总是正确的,在实际的应用中各列之间有关联关系的情况实际上并不罕见。在这种情况下如果还用上述计算方法来计算目标SQL语句整个WHERE条件的组合可选择率并用它来估算返回结果集的Cardinality的话,那么估算结果可能就会与实际结果有较大的偏差,进而可能导致CBO选错执行计划,所以Oracle又引入了动态采样和多列统计信息。
下表给出了一些常见的可选择率计算公式:
下面给出一个示例:
1DROP TABLE T_ROWS_20170605_LHR; 2CREATE TABLE T_ROWS_20170605_LHR AS SELECT ROWNUM ID,'NAME1' SAL FROM DUAL CONNECT BY LEVEL<=10000; 3UPDATE T_ROWS_20170605_LHR T SET T.ID='' WHERE T.ID<=100; 4EXEC DBMS_STATS.GATHER_TABLE_STATS(USER,'T_ROWS_20170605_LHR',CASCADE=>TRUE,METHOD_OPT=>'FOR ALL COLUMNS SIZE 1',estimate_percent => 100); 5LHR@orclasm > COL LOW_VALUE FORMAT A20 6LHR@orclasm > COL HIGH_VALUE FORMAT A20 7LHR@orclasm > SELECT D.LOW_VALUE,D.HIGH_VALUE,UTL_RAW.CAST_TO_NUMBER(D.LOW_VALUE) LOW_VALUE2,UTL_RAW.CAST_TO_NUMBER(D.HIGH_VALUE) HIGH_VALUE2, D.NUM_DISTINCT,D.NUM_NULLS FROM DBA_TAB_COL_STATISTICS D WHERE D.TABLE_NAME = 'T_ROWS_20170605_LHR' AND D.OWNER = 'LHR' AND D.COLUMN_NAME='ID'; 9LOW_VALUE HIGH_VALUE LOW_VALUE2 HIGH_VALUE2 NUM_DISTINCT NUM_NULLS 10-------------------- -------------------- ---------- ----------- ------------ ---------- 11C20202 C302 101 10000 9900 100 13LHR@orclasm > SELECT MIN(T.ID),DUMP(MIN(T.ID),16) LOW_VALUE,MAX(T.ID),DUMP(MAX(T.ID),16) HIGH_VALUE FROM T_ROWS_20170605_LHR T; MIN(T.ID) LOW_VALUE MAX(T.ID) HIGH_VALUE 15---------- -------------------- ---------- -------------------- Typ=2 Len=3: c2,2,2 10000 Typ=2 Len=2: c3,2
下面分别执行如下4条SQL语句并获取执行计划:
1SELECT COUNT(1) FROM T_ROWS_20170605_LHR T WHERE T.ID=1000;--1 2SELECT COUNT(1) FROM T_ROWS_20170605_LHR T WHERE T.ID>1000; --9000 3SELECT COUNT(1) FROM T_ROWS_20170605_LHR T WHERE T.ID>=1000; --9001 4SELECT COUNT(1) FROM T_ROWS_20170605_LHR T WHERE T.ID BETWEEN 1000 AND 1100; --101 7LHR@orclasm > set autot on exp 8LHR@orclasm > SELECT COUNT(1) FROM T_ROWS_20170605_LHR T WHERE T.ID=1000; COUNT(1) 10---------- 12Execution Plan 13---------------------------------------------------------- 14Plan hash value: 612708570 15------------------------------------------------------------------------------------------ 16| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | 17------------------------------------------------------------------------------------------ 18| 0 | SELECT STATEMENT | | 1 | 4 | 9 (0)| 00:00:01 | 19| 1 | SORT AGGREGATE | | 1 | 4 | | | 20|* 2 | TABLE ACCESS FULL| T_ROWS_20170605_LHR | 1 | 4 | 9 (0)| 00:00:01 | 21------------------------------------------------------------------------------------------ 22Predicate Information (identified by operation id): 23--------------------------------------------------- - filter("T"."ID"=1000) 26-- ROUND(NUM_ROWS*(1/NUM_DISTINCT)*((NUM_ROWS-NUM_NULLS)/NUM_ROWS)) 27LHR@orclasm > SELECT ROUND(10000*1/9900*((10000-100)/10000)) VALUE FROM DUAL; VALUE 30---------- 33LHR@orclasm > SELECT COUNT(1) FROM T_ROWS_20170605_LHR T WHERE T.ID>1000; COUNT(1) 35---------- 37Execution Plan 38---------------------------------------------------------- 39Plan hash value: 612708570 40------------------------------------------------------------------------------------------ 41| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | 42------------------------------------------------------------------------------------------ 43| 0 | SELECT STATEMENT | | 1 | 4 | 9 (0)| 00:00:01 | 44| 1 | SORT AGGREGATE | | 1 | 4 | | | 45|* 2 | TABLE ACCESS FULL| T_ROWS_20170605_LHR | 9001 | 36004 | 9 (0)| 00:00:01 | 46------------------------------------------------------------------------------------------ 47Predicate Information (identified by operation id): 48--------------------------------------------------- - filter("T"."ID">1000) 51--ROUND(NUM_ROWS*((HIGH_VALUE-VAL)/(HIGH_VALUE-LOW_VALUE))*((NUM_ROWS-NUM_NULLS)/NUM_ROWS)) 52LHR@orclasm > SELECT ROUND(10000*((10000-1000)/(10000-101))*((10000-100)/10000)) VALUE FROM DUAL; VALUE 55---------- 58 59LHR@orclasm > SELECT COUNT(1) FROM T_ROWS_20170605_LHR T WHERE T.ID>=1000; COUNT(1) 61---------- 63Execution Plan 64---------------------------------------------------------- 65Plan hash value: 612708570 66------------------------------------------------------------------------------------------ 67| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | 68------------------------------------------------------------------------------------------ 69| 0 | SELECT STATEMENT | | 1 | 4 | 9 (0)| 00:00:01 | 70| 1 | SORT AGGREGATE | | 1 | 4 | | | 71|* 2 | TABLE ACCESS FULL| T_ROWS_20170605_LHR | 9002 | 36008 | 9 (0)| 00:00:01 | 72------------------------------------------------------------------------------------------ 73Predicate Information (identified by operation id): 74--------------------------------------------------- - filter("T"."ID">=1000) 77--ROUND(NUM_ROWS*((HIGH_VALUE-VAL)/(HIGH_VALUE-LOW_VALUE)+1/NUM_DISTINCT)*((NUM_ROWS-NUM_NULLS)/NUM_ROWS)) 78LHR@orclasm > SELECT ROUND(10000*((10000-1000)/(10000-101)+1/9900)*((10000-100)/10000)) VALUE FROM DUAL; VALUE 81---------- 84 85LHR@orclasm > SELECT COUNT(1) FROM T_ROWS_20170605_LHR T WHERE T.ID BETWEEN 1000 AND 1100; COUNT(1) 87---------- 89Execution Plan 90---------------------------------------------------------- 91Plan hash value: 612708570 92------------------------------------------------------------------------------------------ 93| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | 94------------------------------------------------------------------------------------------ 95| 0 | SELECT STATEMENT | | 1 | 4 | 9 (0)| 00:00:01 | 96| 1 | SORT AGGREGATE | | 1 | 4 | | | 97|* 2 | TABLE ACCESS FULL| T_ROWS_20170605_LHR | 102 | 408 | 9 (0)| 00:00:01 | 98------------------------------------------------------------------------------------------ 99Predicate Information (identified by operation id): 100--------------------------------------------------- - filter("T"."ID"<=1100 AND "T"."ID">=1000) 102LHR@orclasm > 104--ROUND(NUM_ROWS*(((VAL2-VAL1)/(HIGH_VALUE-LOW_VALUE)+2/NUM_DISTINCT)*((NUM_ROWS-NUM_NULLS)/NUM_ROWS))) 105LHR@orclasm > SELECT ROUND(10000*(((1100-1000)/(10000-101)+2/9900)*((10000-100)/10000))) VALUE FROM DUAL; VALUE 108----------
可见预估行数和用公式计算出来的结果相吻合。
下面再查看有频率直方图的时候基数的计算。
1DROP TABLE T_ROWS_20170605_LHR; 2CREATE TABLE T_ROWS_20170605_LHR AS SELECT ROWNUM ID,'NAME1' SAL FROM DUAL CONNECT BY LEVEL<=10000; 3UPDATE T_ROWS_20170605_LHR T SET T.ID='' WHERE T.ID<=100; 4UPDATE T_ROWS_20170605_LHR SET ID=2 WHERE ID BETWEEN 101 AND 200; 5UPDATE T_ROWS_20170605_LHR SET ID=3 WHERE ID BETWEEN 200 AND 3000; 6UPDATE T_ROWS_20170605_LHR SET ID=9 WHERE ID BETWEEN 3000 AND 9999; 7SELECT T.ID,COUNT(*) FROM T_ROWS_20170605_LHR T GROUP BY T.ID;
查看数据分布:
1LHR@orclasm > SELECT T.ID,COUNT(*) FROM T_ROWS_20170605_LHR T GROUP BY T.ID; ID COUNT(*) 4---------- ---------- 10000 1 100 2800 6999
收集频率直方图:
1LHR@orclasm > EXEC DBMS_STATS.GATHER_TABLE_STATS(USER,'T_ROWS_20170605_LHR',CASCADE=>TRUE,METHOD_OPT=>'FOR COLUMNS ID SIZE 6',estimate_percent => 100); 3PL/SQL procedure successfully completed. 5LHR@orclasm > SELECT D.COLUMN_NAME,D.NUM_DISTINCT,D.NUM_NULLS,D.NUM_BUCKETS,D.HISTOGRAM,D.DENSITY FROM DBA_TAB_COLUMNS D WHERE D.TABLE_NAME = 'T_ROWS_20170605_LHR' AND D.COLUMN_NAME='ID'; 7COLUMN_NAME NUM_DISTINCT NUM_NULLS NUM_BUCKETS HISTOGRAM DENSITY 8------------------------------ ------------ ---------- ----------- --------------- ---------- 9ID 4 100 4 FREQUENCY .000050505 11LHR@orclasm > COL COLUMN_NAME FORMAT A6 12LHR@orclasm > SELECT TABLE_NAME,COLUMN_NAME,ENDPOINT_NUMBER,ENDPOINT_VALUE,NVL((ENDPOINT_NUMBER-(LAG(ENDPOINT_NUMBER) OVER (ORDER BY ENDPOINT_VALUE))),ENDPOINT_NUMBER) COUNTS FROM DBA_TAB_HISTOGRAMS WHERE TABLE_NAME='T_ROWS_20170605_LHR' AND COLUMN_NAME='ID'; 14TABLE_NAME COLUMN ENDPOINT_NUMBER ENDPOINT_VALUE COUNTS 15------------------------------ ------ --------------- -------------- ---------- 16T_ROWS_20170605_LHR ID 100 2 100 17T_ROWS_20170605_LHR ID 2900 3 2800 18T_ROWS_20170605_LHR ID 9899 9 6999 19T_ROWS_20170605_LHR ID 9900 10000 1
当目标列有频率直方图并且对目标列施加等值查询条件时,如果查询条件的输入值等于目标列的某个Bucket的ENDPOINT_VALUE,那么cardinality=Current_ENDPOINT_NUMBER-Previous_ENDPOINT_NUMBER:
1LHR@orclasm > SET AUTOT ON 2LHR@orclasm > SELECT COUNT(1) FROM T_ROWS_20170605_LHR T WHERE T.ID=3; COUNT(1) 5---------- 8Execution Plan 9---------------------------------------------------------- 10Plan hash value: 612708570 12------------------------------------------------------------------------------------------ 13| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | 14------------------------------------------------------------------------------------------ 15| 0 | SELECT STATEMENT | | 1 | 3 | 9 (0)| 00:00:01 | 16| 1 | SORT AGGREGATE | | 1 | 3 | | | 17|* 2 | TABLE ACCESS FULL| T_ROWS_20170605_LHR | 2800 | 8400 | 9 (0)| 00:00:01 | 18------------------------------------------------------------------------------------------ 20Predicate Information (identified by operation id): 21--------------------------------------------------- 2 - filter("T"."ID"=3) 26Statistics 27---------------------------------------------------------- recursive calls db block gets consistent gets physical reads redo size bytes sent via SQL*Net to client bytes received via SQL*Net from client SQL*Net roundtrips to/from client sorts (memory) sorts (disk) rows processed
可见,预估行数为2800,和直方图中存储的值吻合(2900-100)。
当目标列有频率直方图并且对目标列施加等值查询条件时,如果查询条件的输入值不等于目标列的任意一个Bucket的ENDPOINT_VALUE,那么cardinality=MIN(Current_ENDPOINT_NUMBER-Previous_ENDPOINT_NUMBER)/2:
1LHR@orclasm > SELECT COUNT(1) FROM T_ROWS_20170605_LHR T WHERE T.ID=4; COUNT(1) 4---------- 7Execution Plan 8---------------------------------------------------------- 9Plan hash value: 612708570 11------------------------------------------------------------------------------------------ 12| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | 13------------------------------------------------------------------------------------------ 14| 0 | SELECT STATEMENT | | 1 | 3 | 9 (0)| 00:00:01 | 15| 1 | SORT AGGREGATE | | 1 | 3 | | | 16|* 2 | TABLE ACCESS FULL| T_ROWS_20170605_LHR | 1 | 3 | 9 (0)| 00:00:01 | 17------------------------------------------------------------------------------------------ 19Predicate Information (identified by operation id): 20--------------------------------------------------- 2 - filter("T"."ID"=4) 25Statistics 26---------------------------------------------------------- recursive calls db block gets consistent gets physical reads redo size bytes sent via SQL*Net to client bytes received via SQL*Net from client SQL*Net roundtrips to/from client sorts (memory) sorts (disk) rows processed 38LHR@orclasm > select round(1/2) from dual; 40ROUND(1/2) 41----------
在直方图中,由于MIN(Current_ENDPOINT_NUMBER-Previous_ENDPOINT_NUMBER)=1,所以,ROUND(1/2)=1,和执行计划中的预估行数相吻合。
& 说明:
有关CBO的更多计算公式可以参考作者BLOG:http://blog.itpub.net/26736162/viewspace-2140358/
本文选自《Oracle程序员面试笔试宝典》,作者:小麦苗
详细内容可以添加麦老师微信或QQ私聊。
● 本文作者:小麦苗,只专注于数据库的技术,更注重技术的运用
● 作者博客地址:http://blog.itpub.net/26736162/abstract/1/
● 本系列题目来源于作者的学习笔记,部分整理自网络,若有侵权或不当之处还请谅解
● 版权所有,欢迎分享本文,转载请保留出处
● QQ:646634621 QQ群:618766405
● 提供OCP、OCM和高可用部分最实用的技能培训
● 题目解答若有不当之处,还望各位朋友批评指正,共同进步
DBA宝典
长按下图识别二维码或微信扫描下图二维码来关注小麦苗的微信公众号:xiaomaimiaolhr,学习最实用的数据库技术。
本文分享自微信公众号 - DB宝(lhrdba)。
如有侵权,请联系 support@oschina.cn 删除。
本文参与“OSC源创计划”,欢迎正在阅读的你也加入,一起分享。