在Oracle中,文本型字段直方图示例2个。
首先准备基础表:
1CREATE TABLE T_ST_20170605_LHR(ID NUMBER,STR VARCHAR2(30)); 2INSERT INTO T_ST_20170605_LHR SELECT ROWNUM ID,1 STR FROM DUAL CONNECT BY LEVEL<=10001; 3UPDATE T_ST_20170605_LHR T SET T.STR=6 WHERE T.ID=10001; 4EXEC DBMS_STATS.GATHER_TABLE_STATS(USER,'T_ST_20170605_LHR',CASCADE=>TRUE,METHOD_OPT=>'FOR COLUMNS STR SIZE 2');
查看直方图信息:
1LHR@orclasm > COL COLUMN_NAME FORMAT A15 2LHR@orclasm > SELECT D.COLUMN_NAME,D.NUM_DISTINCT,D.NUM_NULLS,D.NUM_BUCKETS,D.HISTOGRAM FROM DBA_TAB_COL_STATISTICS D WHERE D.TABLE_NAME = 'T_ST_20170605_LHR'; 4COLUMN_NAME NUM_DISTINCT NUM_NULLS NUM_BUCKETS HISTOGRAM 5--------------- ------------ ---------- ----------- --------------- 6STR 2 0 2 FREQUENCY 7LHR@orclasm > SELECT TABLE_NAME,COLUMN_NAME,ENDPOINT_VALUE,ENDPOINT_NUMBER FROM DBA_TAB_HISTOGRAMS WHERE TABLE_NAME='T_ST_20170605_LHR'; 9TABLE_NAME COLUMN_NAME ENDPOINT_VALUE ENDPOINT_NUMBER 10------------------------------ --------------- -------------- --------------- 11T_ST_20170605_LHR STR 2.5442E+35 10000 12T_ST_20170605_LHR STR 2.8038E+35 10001
这里的ENDPOINT_VALUE值需要去转换,字符‘1’的16进制的dump值为0x31,字符‘6’的16进制的dump值为0x36,
1LHR@orclasm > SELECT DUMP('1',16),DUMP('6',16) FROM DUAL; 3DUMP('1',16) DUMP('6',16) 4---------------- ---------------- 5Typ=96 Len=1: 31 Typ=96 Len=1: 36
将0x31右边补0一直补到15个字节(共30位),再将其转换为10进制数,0x36类似,如下所示:
1LHR@orclasm > SELECT TO_NUMBER('310000000000000000000000000000','XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX') C1,TO_NUMBER('360000000000000000000000000000','XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX') C2 FROM DUAL; C1 C2 4---------- ---------- 52.5442E+35 2.8038E+35
可以看到转换后的结果和之前查询出来的结果一致。为了方便转换给出如下函数:
1CREATE OR REPLACE FUNCTION HEXSTR(P_NUMBER IN NUMBER) RETURN VARCHAR2 AS L_STR LONG := TO_CHAR(P_NUMBER, 'fm' || RPAD('x', 50, 'x')); L_RETURN VARCHAR2(4000); 4BEGIN WHILE (L_STR IS NOT NULL) LOOP L_RETURN := L_RETURN || CHR(TO_NUMBER(SUBSTR(L_STR, 1, 2), 'xx')); L_STR := SUBSTR(L_STR, 3); END LOOP; RETURN(SUBSTR(L_RETURN, 1, 6)); 11END;
再次查询:
1LHR@orclasm > COL ENDPOINT_VALUE2 FORMAT A15 2LHR@orclasm > SELECT TABLE_NAME,COLUMN_NAME,ENDPOINT_VALUE,ENDPOINT_NUMBER,HEXSTR(ENDPOINT_VALUE) ENDPOINT_VALUE2 FROM DBA_TAB_HISTOGRAMS WHERE TABLE_NAME='T_ST_20170605_LHR'; 4TABLE_NAME COLUMN_NAME ENDPOINT_VALUE ENDPOINT_NUMBER ENDPOINT_VALUE2 5------------------------------ --------------- -------------- --------------- --------------- 6T_ST_20170605_LHR STR 2.5442E+35 10000 1 7T_ST_20170605_LHR STR 2.8038E+35 10001 6
示例2:
准备如下的表:
1DROP TABLE T_HG_20170601_LHR; 2CREATE TABLE T_HG_20170601_LHR AS SELECT LEVEL RN,'1' NAMES FROM DUAL D CONNECT BY LEVEL<=10001; 3SELECT COUNT(1) FROM T_HG_20170601_LHR; 4UPDATE T_HG_20170601_LHR T SET T.NAMES=2 WHERE T.RN=10001; 5SELECT T.NAMES,COUNT(1) FROM T_HG_20170601_LHR T GROUP BY T.NAMES; 6CREATE INDEX IDX_NAME ON T_HG_20170601_LHR(NAMES); 7EXEC DBMS_STATS.GATHER_TABLE_STATS(USER,'T_HG_20170601_LHR',NO_INVALIDATE => FALSE,METHOD_OPT=>'FOR ALL COLUMNS SIZE 1');--不收集直方图
数据分布情况如下所示:
1LHR@orclasm > SELECT T.NAMES,COUNT(1) FROM T_HG_20170601_LHR T GROUP BY T.NAMES; 2N COUNT(1) 3- ---------- 1
NAMES为2的SQL执行计划:
1LHR@orclasm > SELECT * FROM T_HG_20170601_LHR T WHERE T.NAMES='2'; RN N 3---------- - 2 5Execution Plan 6---------------------------------------------------------- 7Plan hash value: 2479558392 8--------------------------------------------------------------------------------------- 9| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | 10--------------------------------------------------------------------------------------- 11| 0 | SELECT STATEMENT | | 5001 | 30006 | 7 (0)| 00:00:01 | 12|* 1 | TABLE ACCESS FULL| T_HG_20170601_LHR | 5001 | 30006 | 7 (0)| 00:00:01 | 13---------------------------------------------------------------------------------------
上述SQL应该走列NAMES上的索引IDX_NAME,但实际上CBO这里却选择了全表扫描。这是因为CBO默认认为列NAMES的数据是均匀分布的,而其实该列上的DISTINCT值只有1和2这两个值,所以CBO评估出来的对列B施加等值查询条件的可选择率就是1/2,进而评估出来的对列B施加等值查询条件的结果集的Cardinality就是5001:
1LHR@orclasm > SELECT ROUND(10001*(1/2)) FROM DUAL; 2ROUND(10001*(1/2)) 3------------------
正是因为CBO评估出上述等值查询要返回结果集的Cardinality是5001,己经占了表T_HG_20170601_LHR总记录数的一半,所以CBO认为此时再走列B上的索引IDX_NAME就己经不合适了,进而就选择了全表扫描。但实际上,CBO对上述等值查询要返回结果集的Cardinality的评估己经与事实严重不符,评估出来的值是5001,其实却只有1,差了好几个数量级。
对表T_HG_20170601_LHR的列NAMES收集了直方图统计信息后,从如下结果可以看到,此时CBO正确地评估出了返回结果集的Cardinality不是5001而是1,进而就正确地选择了走索引IDX_NAME的执行计划:
1LHR@orclasm > EXEC DBMS_STATS.GATHER_TABLE_STATS(USER,'T_HG_20170601_LHR',NO_INVALIDATE => FALSE,METHOD_OPT=>'FOR COLUMNS NAMES SIZE AUTO'); 2PL/SQL procedure successfully completed. 3LHR@orclasm > SELECT * FROM T_HG_20170601_LHR T WHERE T.NAMES='2'; RN N 5---------- - 2 7Execution Plan 8---------------------------------------------------------- 9Plan hash value: 2033494884 10------------------------------------------------------------------------------------------------- 11| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | 12------------------------------------------------------------------------------------------------- 13| 0 | SELECT STATEMENT | | 1 | 6 | 2 (0)| 00:00:01 | 14| 1 | TABLE ACCESS BY INDEX ROWID| T_HG_20170601_LHR | 1 | 6 | 2 (0)| 00:00:01 | 15|* 2 | INDEX RANGE SCAN | IDX_NAME | 1 | | 1 (0)| 00:00:01 | 16-------------------------------------------------------------------------------------------------
& 说明:
有关直方图的更多内容可以参考我的BLOG:http://blog.itpub.net/26736162/viewspace-2139293/
本文选自《Oracle程序员面试笔试宝典》,作者:小麦苗
详细内容可以添加麦老师微信或QQ私聊。
● 本文作者:小麦苗,只专注于数据库的技术,更注重技术的运用
● 作者博客地址:http://blog.itpub.net/26736162/abstract/1/
● 本系列题目来源于作者的学习笔记,部分整理自网络,若有侵权或不当之处还请谅解
● 版权所有,欢迎分享本文,转载请保留出处
● QQ:646634621 QQ群:618766405
● 提供OCP、OCM和高可用部分最实用的技能培训
● 题目解答若有不当之处,还望各位朋友批评指正,共同进步
DBA宝典
长按下图识别二维码或微信扫描下图二维码来关注小麦苗的微信公众号:xiaomaimiaolhr,学习最实用的数据库技术。
本文分享自微信公众号 - DB宝(lhrdba)。
如有侵权,请联系 support@oschina.cn 删除。
本文参与“OSC源创计划”,欢迎正在阅读的你也加入,一起分享。