What Are Pending Statistics in Oracle?
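In day-to-day database operations, DBAs usually want SQL execution plans to stay stable. Much of the reliance that DBAs and developers place on hints comes from the fact that, under the CBO, execution plans depend heavily on statistics and can therefore change unexpectedly. Plan stability thus becomes a question of statistics stability, and more specifically of whether newly gathered statistics, collected manually or automatically, actually lead to more efficient plans. One approach is to keep newly gathered statistics out of plan generation at first, and to put them into production only after they have been verified.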
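Oracle 11g introduced a statistics-management feature for exactly this purpose: pending statistics. In short, a DBA can set the PUBLISH preference to FALSE for a set of tables. From then on, the published statistics of those tables in the data dictionary are effectively frozen. Newly gathered statistics do not replace the existing ones; they are stored in a separate pending area of the dictionary. By default, pending statistics take no part in generating execution plans. Only after SQL testing has passed and the user confirms manually are they published, replacing the old statistics. This gives the DBA a way to keep execution plans stable: the published statistics stay fixed while new statistics wait in the pending area until they have been validated.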
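The following SQL statement shows whether statistics are published automatically at the global, schema, and table level (by default they are published automatically at every level):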
SELECT DBMS_STATS.GET_PREFS('PUBLISH') GLOBAL,
       DBMS_STATS.GET_PREFS('PUBLISH','LHR') SCHEMA,
       DBMS_STATS.GET_PREFS('PUBLISH','LHR','T') TB_LEVEL
  FROM DUAL;
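Each column returns TRUE or FALSE. TRUE means that statistics are published as soon as they are gathered; FALSE means that they are kept as pending. The default PUBLISH setting can be changed at each level with the following DBMS_STATS procedures (note that SET_SCHEMA_PREFS applies the setting to the tables that currently exist in the schema):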
- Global: EXEC DBMS_STATS.SET_GLOBAL_PREFS(PNAME=>'PUBLISH',PVALUE=>'FALSE');
- Schema: EXEC DBMS_STATS.SET_SCHEMA_PREFS(OWNNAME=>USER,PNAME=>'PUBLISH',PVALUE=>'TRUE');
- Table: EXEC DBMS_STATS.SET_TABLE_PREFS(USER,'T_LHR','PUBLISH','FALSE');
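Once testing is over, a table-level setting can be removed again so that the table falls back to the global default. The following is a minimal sketch, assuming the table T_LHR from the list above exists in the current schema; it relies on DBMS_STATS.DELETE_TABLE_PREFS, the counterpart of SET_TABLE_PREFS:

```sql
-- Remove the table-level PUBLISH preference for T_LHR (table name taken from the list above)
EXEC DBMS_STATS.DELETE_TABLE_PREFS(USER, 'T_LHR', 'PUBLISH');

-- Check which value now applies to the table; it should match the global setting
SELECT DBMS_STATS.GET_PREFS('PUBLISH', USER, 'T_LHR') AS PUBLISH_PREF FROM DUAL;
```

Deleting the preference, rather than setting it to TRUE, keeps the table in step with any later change to the global setting.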
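By default, the optimizer uses the published statistics in the data dictionary views. To make the optimizer use newly gathered pending statistics instead, set the initialization parameter OPTIMIZER_USE_PENDING_STATISTICS to TRUE (the default is FALSE). The following statement publishes the pending statistics of a specific object: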
EXEC DBMS_STATS.PUBLISH_PENDING_STATS('SH','CUSTOMERS');
If the pending statistics should not be published, the following statement deletes them:
EXEC DBMS_STATS.DELETE_PENDING_STATS('SH','CUSTOMERS');
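Publishing does not have to be done table by table. As a hedged sketch, based on the forms given in the Oracle documentation for PUBLISH_PENDING_STATS, a NULL table name widens the scope to a whole schema, and NULL for both arguments widens it to everything that is pending:

```sql
-- Publish the pending statistics of every table in the current schema
EXEC DBMS_STATS.PUBLISH_PENDING_STATS(USER, NULL);

-- Publish all pending statistics in the database
EXEC DBMS_STATS.PUBLISH_PENDING_STATS(NULL, NULL);
```

The broad forms skip the per-table validation that pending statistics are meant to allow, so they are probably best reserved for cases where every affected table has already been tested.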
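Published statistics can be queried through the views DBA_TAB_STATISTICS and DBA_IND_STATISTICS, and pending statistics through DBA_TAB_PENDING_STATS and DBA_IND_PENDING_STATS. Pending statistics can be exported with the procedure DBMS_STATS.EXPORT_PENDING_STATS. If statistics have already been published and the previous ones need to be restored, use STATS_UPDATE_TIME in DBA_TAB_STATS_HISTORY to determine the timestamp and run the call below. The last parameter, AS_OF_TIMESTAMP, means "restore the statistics that were in effect at this point in time", so it is safe to add one second to the recorded time: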
LHR@orclasm > SELECT H.TABLE_NAME, TO_CHAR(H.STATS_UPDATE_TIME, 'YYYY-MM-DD HH24:MI:SS') STATS_UPDATE_TIME FROM USER_TAB_STATS_HISTORY H WHERE H.TABLE_NAME = 'T_PS_20170605_LHR';
TABLE_NAME                     STATS_UPDATE_TIME
------------------------------ -------------------
T_PS_20170605_LHR              2017-06-05 15:54:16
T_PS_20170605_LHR              2017-06-05 16:17:29

-- Restore the earlier statistics
LHR@orclasm > EXEC DBMS_STATS.RESTORE_TABLE_STATS(OWNNAME => USER,TABNAME =>'T_PS_20170605_LHR',AS_OF_TIMESTAMP => TO_DATE('2017-06-05 15:54:17','YYYY-MM-DD HH24:MI:SS'));
PL/SQL procedure successfully completed.
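The export mentioned above can be sketched as follows. This is only an illustration under stated assumptions: PENDING_STATS_BAK is a hypothetical name for the user statistics table, and pending statistics are assumed to exist for T_PS_20170605_LHR at the time of the call:

```sql
-- Create a user statistics table to hold the export (PENDING_STATS_BAK is a made-up name)
EXEC DBMS_STATS.CREATE_STAT_TABLE(ownname => USER, stattab => 'PENDING_STATS_BAK');

-- Copy the pending statistics of one table into that statistics table
EXEC DBMS_STATS.EXPORT_PENDING_STATS(ownname => USER, tabname => 'T_PS_20170605_LHR', stattab => 'PENDING_STATS_BAK');
```

The statistics table can then be moved to a test system and loaded there with DBMS_STATS.IMPORT_TABLE_STATS, so that the new statistics are evaluated away from production.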
The following is a complete example of using pending statistics:
CREATE TABLE T_PS_20170605_LHR AS SELECT LEVEL ID, 'name' || LEVEL NAME FROM DUAL CONNECT BY LEVEL<= 10000 ;
CREATE INDEX IDX_T_PS_20170605_LHR_ID ON T_PS_20170605_LHR(ID) ;
EXEC DBMS_STATS.GATHER_TABLE_STATS(USER,'T_PS_20170605_LHR') ;
Query the statistics history:
LHR@orclasm > SELECT H.TABLE_NAME, TO_CHAR(H.STATS_UPDATE_TIME, 'YYYY-MM-DD HH24:MI:SS') STATS_UPDATE_TIME FROM USER_TAB_STATS_HISTORY H WHERE H.TABLE_NAME = 'T_PS_20170605_LHR';

TABLE_NAME                     STATS_UPDATE_TIME
------------------------------ -------------------
T_PS_20170605_LHR              2017-06-05 15:54:16
Run a simple query:
LHR@orclasm > SET AUTOT ON
LHR@orclasm > SELECT P.ID,P.NAME FROM T_PS_20170605_LHR P WHERE ID=1 ;

        ID NAME
---------- --------------------------------------------
         1 name1


Execution Plan
----------------------------------------------------------
Plan hash value: 2892875560

--------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name                     | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                          |     1 |    13 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T_PS_20170605_LHR        |     1 |    13 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IDX_T_PS_20170605_LHR_ID |     1 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID"=1)


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          4  consistent gets
          0  physical reads
          0  redo size
        596  bytes sent via SQL*Net to client
        519  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

LHR@orclasm >
Set the table's PUBLISH preference to FALSE:
LHR@orclasm > EXEC DBMS_STATS.SET_TABLE_PREFS(USER,'T_PS_20170605_LHR', 'PUBLISH', 'FALSE');

PL/SQL procedure successfully completed.

LHR@orclasm > SELECT DBMS_STATS.GET_PREFS('PUBLISH',USER,'T_PS_20170605_LHR') FROM DUAL ;

DBMS_STATS.GET_PREFS('PUBLISH',USER,'T_PS_20170605_LHR')
---------------------------------------------------------------
FALSE
Now insert another 20,000 rows into the table, all with ID = 1:
INSERT INTO T_PS_20170605_LHR(ID,NAME) SELECT 1, 'name' || LEVEL FROM DUAL CONNECT BY LEVEL<= 20000 ;
COMMIT ;
Gather statistics again. This time the newly gathered statistics are not used by the optimizer right away:
LHR@orclasm > EXEC DBMS_STATS.GATHER_TABLE_STATS(USER,'T_PS_20170605_LHR') ;

PL/SQL procedure successfully completed.

LHR@orclasm > SET AUTOT TRACEONLY

LHR@orclasm > SELECT P.ID,P.NAME FROM T_PS_20170605_LHR P WHERE ID=1 ;

20001 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 2892875560

--------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name                     | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                          |     1 |    13 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T_PS_20170605_LHR        |     1 |    13 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IDX_T_PS_20170605_LHR_ID |     1 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID"=1)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       2778  consistent gets
          0  physical reads
          0  redo size
     597478  bytes sent via SQL*Net to client
      15182  bytes received via SQL*Net from client
       1335  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      20001  rows processed
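As expected, the old statistics are still in use: the optimizer still estimates one row and still picks the INDEX RANGE SCAN, which is now expensive because the query actually returns 20,001 rows. Looking at the statistics themselves, the published statistics are still the old ones, while the pending view holds the newly gathered statistics: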
LHR@orclasm > SELECT 'publish' AS STAT,T.NUM_ROWS,T.BLOCKS,TO_CHAR(T.LAST_ANALYZED,'YYYY-MM-DD HH24:MI:SS') FROM USER_TAB_STATISTICS T WHERE TABLE_NAME='T_PS_20170605_LHR'
  2  UNION ALL
  3  SELECT 'pending' AS STAT,S.NUM_ROWS,S.BLOCKS,TO_CHAR(S.LAST_ANALYZED,'YYYY-MM-DD HH24:MI:SS') FROM USER_TAB_PENDING_STATS S WHERE TABLE_NAME='T_PS_20170605_LHR';

STAT      NUM_ROWS     BLOCKS TO_CHAR(T.LAST_ANAL
------- ---------- ---------- -------------------
publish      10000         29 2017-06-05 15:54:16
pending      30000         84 2017-06-05 16:07:39
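Table-level row counts are only part of the picture; the optimizer's choice for a predicate such as ID = 1 also depends on column and index statistics. As a sketch that is not part of the original session, the pending versions of those statistics can be inspected through USER_COL_PENDING_STATS and USER_IND_PENDING_STATS:

```sql
-- Pending column statistics: NUM_DISTINCT and DENSITY drive the selectivity estimate
SELECT COLUMN_NAME, NUM_DISTINCT, DENSITY, NUM_NULLS, SAMPLE_SIZE
  FROM USER_COL_PENDING_STATS
 WHERE TABLE_NAME = 'T_PS_20170605_LHR';

-- Pending index statistics: these feed the cost of the index range scan
SELECT INDEX_NAME, NUM_ROWS, DISTINCT_KEYS, LEAF_BLOCKS, CLUSTERING_FACTOR
  FROM USER_IND_PENDING_STATS
 WHERE TABLE_NAME = 'T_PS_20170605_LHR';
```

Comparing these rows with USER_TAB_COL_STATISTICS and USER_IND_STATISTICS shows what would change on publication, before any plan is tested.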
Next, verify whether the new statistics actually improve the execution of the SQL statement:
LHR@orclasm > ALTER SESSION SET OPTIMIZER_USE_PENDING_STATISTICS = TRUE;

Session altered.

LHR@orclasm > SET AUTOT TRACEONLY
LHR@orclasm > SELECT P.ID,P.NAME FROM T_PS_20170605_LHR P WHERE ID=1 ;

20001 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 4079616360

---------------------------------------------------------------------------------------
| Id  | Operation         | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                   | 19488 |   228K|    25   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T_PS_20170605_LHR | 19488 |   228K|    25   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("ID"=1)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       1414  consistent gets
          0  physical reads
          0  redo size
     533474  bytes sent via SQL*Net to client
      15182  bytes received via SQL*Net from client
       1335  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      20001  rows processed
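With the pending statistics, the optimizer chooses a full table scan, which is more efficient here (1414 consistent gets compared with 2778 for the index range scan). The verification is complete and the result is correct, so the new statistics can be published: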
LHR@orclasm > EXEC DBMS_STATS.PUBLISH_PENDING_STATS(USER,'T_PS_20170605_LHR');

PL/SQL procedure successfully completed.

LHR@orclasm > ALTER SESSION SET OPTIMIZER_USE_PENDING_STATISTICS = FALSE;

Session altered.

LHR@orclasm >
LHR@orclasm > set autot off
LHR@orclasm > SELECT 'publish' AS STAT,T.NUM_ROWS,T.BLOCKS,TO_CHAR(T.LAST_ANALYZED,'YYYY-MM-DD HH24:MI:SS') FROM USER_TAB_STATISTICS T WHERE TABLE_NAME='T_PS_20170605_LHR'
  2  UNION ALL
  3  SELECT 'pending' AS STAT,S.NUM_ROWS,S.BLOCKS,TO_CHAR(S.LAST_ANALYZED,'YYYY-MM-DD HH24:MI:SS') FROM USER_TAB_PENDING_STATS S WHERE TABLE_NAME='T_PS_20170605_LHR';

STAT      NUM_ROWS     BLOCKS TO_CHAR(T.LAST_ANAL
------- ---------- ---------- -------------------
publish      30000         84 2017-06-05 16:07:39

LHR@orclasm >
LHR@orclasm > SELECT H.TABLE_NAME, TO_CHAR(H.STATS_UPDATE_TIME, 'YYYY-MM-DD HH24:MI:SS') STATS_UPDATE_TIME FROM USER_TAB_STATS_HISTORY H WHERE H.TABLE_NAME = 'T_PS_20170605_LHR';

TABLE_NAME                     STATS_UPDATE_TIME
------------------------------ -------------------
T_PS_20170605_LHR              2017-06-05 15:54:16
T_PS_20170605_LHR              2017-06-05 16:17:29
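The pending statistics have been published and removed from USER_TAB_PENDING_STATS. Note that LAST_ANALYZED in the USER_TAB_STATISTICS view shows the time the statistics were gathered (16:07:39), not the time they were published, whereas the history view records the publication time (16:17:29). If statistics have already been published and the previous ones need to be restored, use STATS_UPDATE_TIME in USER_TAB_STATS_HISTORY to determine the timestamp and run the call below. The last parameter, AS_OF_TIMESTAMP, means "restore the statistics that were in effect at this point in time", so it is safe to add one second to the recorded time: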
LHR@orclasm > EXEC DBMS_STATS.RESTORE_TABLE_STATS(OWNNAME => USER,TABNAME =>'T_PS_20170605_LHR',AS_OF_TIMESTAMP => TO_DATE('2017-06-05 15:54:17','YYYY-MM-DD HH24:MI:SS'));

PL/SQL procedure successfully completed.

LHR@orclasm > SELECT H.TABLE_NAME, TO_CHAR(H.STATS_UPDATE_TIME, 'YYYY-MM-DD HH24:MI:SS') STATS_UPDATE_TIME FROM USER_TAB_STATS_HISTORY H WHERE H.TABLE_NAME = 'T_PS_20170605_LHR';

TABLE_NAME                     STATS_UPDATE_TIME
------------------------------ -------------------
T_PS_20170605_LHR              2017-06-05 15:54:16
T_PS_20170605_LHR              2017-06-05 16:17:29
T_PS_20170605_LHR              2017-06-05 16:22:20

LHR@orclasm > SELECT 'publish' AS STAT,T.NUM_ROWS,T.BLOCKS,TO_CHAR(T.LAST_ANALYZED,'YYYY-MM-DD HH24:MI:SS') FROM USER_TAB_STATISTICS T WHERE TABLE_NAME='T_PS_20170605_LHR'
  2  UNION ALL
  3  SELECT 'pending' AS STAT,S.NUM_ROWS,S.BLOCKS,TO_CHAR(S.LAST_ANALYZED,'YYYY-MM-DD HH24:MI:SS') FROM USER_TAB_PENDING_STATS S WHERE TABLE_NAME='T_PS_20170605_LHR';

STAT      NUM_ROWS     BLOCKS TO_CHAR(T.LAST_ANAL
------- ---------- ---------- -------------------
publish      10000         29 2017-06-05 15:54:16
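A restore like the one above only works while the old statistics are still held in the statistics history. As a small sketch, which was not run in the original session, two DBMS_STATS functions report how far back a restore can reach:

```sql
-- Number of days of statistics history that the database retains
SELECT DBMS_STATS.GET_STATS_HISTORY_RETENTION AS RETENTION_DAYS FROM DUAL;

-- Oldest timestamp for which statistics can still be restored
SELECT DBMS_STATS.GET_STATS_HISTORY_AVAILABILITY AS OLDEST_RESTORABLE FROM DUAL;
```

An AS_OF_TIMESTAMP earlier than the second value cannot be restored, so it is worth checking before relying on RESTORE_TABLE_STATS as a fallback.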