在Oracle中,直方图使用示例。
下面给出直方图的一个示例,先准备数据分布不均衡的一张表:
1CREATE TABLE T_ST_20170604_LHR AS SELECT ROWNUM ID,ROWNUM SAL FROM DUAL CONNECT BY LEVEL<=10000; 2UPDATE T_ST_20170604_LHR SET SAL=5000 WHERE SAL BETWEEN 6 AND 9995; --9990 3CREATE INDEX T_ST_20170604_LHR_SAL ON T_ST_20170604_LHR(SAL); 4EXEC DBMS_STATS.GATHER_TABLE_STATS(USER,'T_ST_20170604_LHR',CASCADE=>TRUE,METHOD_OPT=>'FOR COLUMNS SAL SIZE 1');
查询数据信息:
1SYS@orclasm > SET LINESIZE 9999 2SYS@orclasm > COL COLUMN_NAME FORMAT A15 3SYS@orclasm > SELECT TABLE_NAME,COLUMN_NAME,ENDPOINT_NUMBER,ENDPOINT_VALUE FROM DBA_TAB_HISTOGRAMS WHERE TABLE_NAME='T_ST_20170604_LHR'; 4TABLE_NAME COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE 5------------------------------ --------------- --------------- -------------- 6T_ST_20170604_LHR SAL 0 1 7T_ST_20170604_LHR SAL 1 10000 8 9SYS@orclasm > SELECT D.COLUMN_NAME, 10 2 D.NUM_DISTINCT, 11 3 D.NUM_NULLS, 12 4 D.NUM_BUCKETS, 13 5 D.HISTOGRAM 14 6 FROM DBA_TAB_COL_STATISTICS D 15 7 WHERE D.TABLE_NAME = 'T_ST_20170604_LHR'; 16 17COLUMN_NAME NUM_DISTINCT NUM_NULLS NUM_BUCKETS HISTOGRAM 18--------------- ------------ ---------- ----------- --------------- 19SAL 11 0 1 NONE
列SAL上只有最大值,最小值两条记录分别对应端点号(ENDPOINT_NUMBER)0和1,这种显示说明列SAL没有直方图信息,从DBA_TAB_COL_STATISTICS的HISTOGRAM列值为NONE也说明列SAL没有直方图。
1SYS@orclasm > SET AUTOT TRACE 2SYS@orclasm > SELECT * FROM T_ST_20170604_LHR WHERE SAL=1; 3 4 5Execution Plan 6---------------------------------------------------------- 7Plan hash value: 738598333 8 9----------------------------------------------------------------------------------------------------- 10| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | 11----------------------------------------------------------------------------------------------------- 12| 0 | SELECT STATEMENT | | 909 | 6363 | 4 (0)| 00:00:01 | 13| 1 | TABLE ACCESS BY INDEX ROWID| T_ST_20170604_LHR | 909 | 6363 | 4 (0)| 00:00:01 | 14|* 2 | INDEX RANGE SCAN | T_ST_20170604_LHR_SAL | 909 | | 2 (0)| 00:00:01 | 15----------------------------------------------------------------------------------------------------- 16 17Predicate Information (identified by operation id): 18--------------------------------------------------- 19 20 2 - access("SAL"=1) 21 22 23Statistics 24---------------------------------------------------------- 25 0 recursive calls 26 0 db block gets 27 4 consistent gets 28 0 physical reads 29 0 redo size 30 592 bytes sent via SQL*Net to client 31 520 bytes received via SQL*Net from client 32 2 SQL*Net roundtrips to/from client 33 0 sorts (memory) 34 0 sorts (disk) 35 1 rows processed 36 37SYS@orclasm > SELECT * FROM T_ST_20170604_LHR WHERE SAL=5000; 38 399990 rows selected. 40 41Execution Plan 42---------------------------------------------------------- 43Plan hash value: 738598333 44 45----------------------------------------------------------------------------------------------------- 46| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | 47----------------------------------------------------------------------------------------------------- 48| 0 | SELECT STATEMENT | | 909 | 6363 | 4 (0)| 00:00:01 | 49| 1 | TABLE ACCESS BY INDEX ROWID| T_ST_20170604_LHR | 909 | 6363 | 4 (0)| 00:00:01 | 50|* 2 | INDEX RANGE SCAN | T_ST_20170604_LHR_SAL | 909 | | 2 (0)| 00:00:01 | 51----------------------------------------------------------------------------------------------------- 52 53Predicate Information (identified by operation id): 54--------------------------------------------------- 55 56 2 - access("SAL"=5000) 57 58Statistics 59---------------------------------------------------------- 60 0 recursive calls 61 0 db block gets 62 1368 consistent gets 63 0 physical reads 64 0 redo size 65 244008 bytes sent via SQL*Net to client 66 7835 bytes received via SQL*Net from client 67 667 SQL*Net roundtrips to/from client 68 0 sorts (memory) 69 0 sorts (disk) 70 9990 rows processed 71 72SYS@orclasm > SELECT ROUND(10000*(1/11)) FROM DUAL; 73 74ROUND(10000*(1/11)) 75------------------- 76 909
SAL列为等值查询,全部选择索引范围扫描,且预估行数(Rows)均为909。Rows的值来源于:ROUND(NUM_ROWS*(1/NUM_DISTINCT_VAL))=ROUND(10000*(1/11))=909,和执行计划里的909相吻合,因为没有收集列的直方图信息,所以优化器估算返回行数和实际返回行数还是有不少差距。
下面针对SAL列收集直方图:
1SYS@orclasm > SET AUTOT OFF 2SYS@orclasm > EXEC DBMS_STATS.GATHER_TABLE_STATS(USER,'T_ST_20170604_LHR',CASCADE=>TRUE,METHOD_OPT=>'FOR COLUMNS SAL SIZE AUTO'); 3 4PL/SQL procedure successfully completed. 5 6SYS@orclasm > SELECT D.COLUMN_NAME,D.NUM_DISTINCT,D.NUM_NULLS,D.NUM_BUCKETS,D.HISTOGRAM FROM DBA_TAB_COL_STATISTICS D WHERE D.TABLE_NAME = 'T_ST_20170604_LHR'; 7 8COLUMN_NAME NUM_DISTINCT NUM_NULLS NUM_BUCKETS HISTOGRAM 9--------------- ------------ ---------- ----------- --------------- 10SAL 11 0 11 FREQUENCY 11 12 13SYS@orclasm > SELECT TABLE_NAME,COLUMN_NAME,ENDPOINT_VALUE,ENDPOINT_NUMBER,NVL((ENDPOINT_NUMBER-(LAG(ENDPOINT_NUMBER) OVER (ORDER BY ENDPOINT_VALUE))),ENDPOINT_NUMBER) COUNTS FROM DBA_TAB_HISTOGRAMS WHERE TABLE_NAME='T_ST_20170604_LHR'; 14 15TABLE_NAME COLUMN_NAME ENDPOINT_VALUE ENDPOINT_NUMBER COUNTS 16------------------------------ --------------- -------------- --------------- ---------- 17T_ST_20170604_LHR SAL 1 1 1 18T_ST_20170604_LHR SAL 2 2 1 19T_ST_20170604_LHR SAL 3 3 1 20T_ST_20170604_LHR SAL 4 4 1 21T_ST_20170604_LHR SAL 5 5 1 22T_ST_20170604_LHR SAL 5000 9995 9990 23T_ST_20170604_LHR SAL 9996 9996 1 24T_ST_20170604_LHR SAL 9997 9997 1 25T_ST_20170604_LHR SAL 9998 9998 1 26T_ST_20170604_LHR SAL 9999 9999 1 27T_ST_20170604_LHR SAL 10000 10000 1 28 2911 rows selected. 30 31SYS@orclasm > SELECT SAL ENDPOINT_VALUE,SUM(COUNT(*)) OVER(ORDER BY SAL RANGE UNBOUNDED PRECEDING) ENDPOINT_NUMBER,COUNT(*) COUNTS FROM T_ST_20170604_LHR T GROUP BY T.SAL; 32 33ENDPOINT_VALUE ENDPOINT_NUMBER COUNTS 34-------------- --------------- ---------- 35 1 1 1 36 2 2 1 37 3 3 1 38 4 4 1 39 5 5 1 40 5000 9995 9990 41 9996 9996 1 42 9997 9997 1 43 9998 9998 1 44 9999 9999 1 45 10000 10000 1 4611 rows selected. 47 48SYS@orclasm >
由于列SAL唯一值的个数没有超过254,所以Oracle自动收集频率直方图。NUM_BUCKETS表示桶数,一共有11个Buckets,所以在DBA_TAB_HISTOGRAMS中,ENDPOINT_VALUE列记录的就是这11个不同的DISTINCT值。ENDPOINT_NUMBER则记录了到此DISTINCT值为止累加的行数。
1SYS@orclasm > SET AUTOT TRACE 2SYS@orclasm > SELECT * FROM T_ST_20170604_LHR WHERE SAL=1; 3 4 5Execution Plan 6---------------------------------------------------------- 7Plan hash value: 738598333 8 9----------------------------------------------------------------------------------------------------- 10| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | 11----------------------------------------------------------------------------------------------------- 12| 0 | SELECT STATEMENT | | 1 | 7 | 2 (0)| 00:00:01 | 13| 1 | TABLE ACCESS BY INDEX ROWID| T_ST_20170604_LHR | 1 | 7 | 2 (0)| 00:00:01 | 14|* 2 | INDEX RANGE SCAN | T_ST_20170604_LHR_SAL | 1 | | 1 (0)| 00:00:01 | 15----------------------------------------------------------------------------------------------------- 16 17Predicate Information (identified by operation id): 18--------------------------------------------------- 19 20 2 - access("SAL"=1) 21 22 23Statistics 24---------------------------------------------------------- 25 0 recursive calls 26 0 db block gets 27 4 consistent gets 28 0 physical reads 29 0 redo size 30 592 bytes sent via SQL*Net to client 31 520 bytes received via SQL*Net from client 32 2 SQL*Net roundtrips to/from client 33 0 sorts (memory) 34 0 sorts (disk) 35 1 rows processed 36SYS@orclasm > SELECT * FROM T_ST_20170604_LHR WHERE SAL=5000; 37 389990 rows selected. 39 40 41Execution Plan 42---------------------------------------------------------- 43Plan hash value: 513072079 44 45--------------------------------------------------------------------------------------- 46| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | 47--------------------------------------------------------------------------------------- 48| 0 | SELECT STATEMENT | | 9990 | 69930 | 7 (0)| 00:00:01 | 49|* 1 | TABLE ACCESS FULL| T_ST_20170604_LHR | 9990 | 69930 | 7 (0)| 00:00:01 | 50--------------------------------------------------------------------------------------- 51 52Predicate Information (identified by operation id): 53--------------------------------------------------- 54 55 1 - filter("SAL"=5000) 56 57 58Statistics 59---------------------------------------------------------- 60 0 recursive calls 61 0 db block gets 62 1368 consistent gets 63 0 physical reads 64 0 redo size 65 244008 bytes sent via SQL*Net to client 66 7835 bytes received via SQL*Net from client 67 667 SQL*Net roundtrips to/from client 68 0 sorts (memory) 69 0 sorts (disk) 70 9990 rows processed
在SAL=1时候使用了索引扫描,而SAL=5000时候,已经使用全表扫描了,说明直方图起了作用。从执行计划的Rows部分也可以看出Oracle计算出来的Cardinality是9990,和实际的情况完全吻合(这里的Rows为CURRENT_ENDPOINT_NUMBER-PREVIOUS_ENDPOINT_NUMBER)。可以看出这种频率直方图统计的列的信息是非常精确的。
下面重新对列SAL收集直方图,让Bucket的数量为9,小于SAL列的DISTINCT的值,那么Oracle会收集高度直方图。
1SYS@orclasm > SET AUTOT OFF 2SYS@orclasm > EXEC DBMS_STATS.GATHER_TABLE_STATS(USER,'T_ST_20170604_LHR',CASCADE=>TRUE,METHOD_OPT=>'FOR COLUMNS SAL SIZE 9'); 3 4PL/SQL procedure successfully completed. 5 6SYS@orclasm > SELECT D.COLUMN_NAME,D.NUM_DISTINCT,D.NUM_NULLS,D.NUM_BUCKETS,D.HISTOGRAM FROM DBA_TAB_COL_STATISTICS D WHERE D.TABLE_NAME = 'T_ST_20170604_LHR'; 7 8COLUMN_NAME NUM_DISTINCT NUM_NULLS NUM_BUCKETS HISTOGRAM 9--------------- ------------ ---------- ----------- --------------- 10SAL 11 0 9 HEIGHT BALANCED 11 12SYS@orclasm > SELECT TABLE_NAME,COLUMN_NAME,ENDPOINT_VALUE,ENDPOINT_NUMBER FROM DBA_TAB_HISTOGRAMS WHERE TABLE_NAME='T_ST_20170604_LHR'; 13 14TABLE_NAME COLUMN_NAME ENDPOINT_VALUE ENDPOINT_NUMBER 15------------------------------ --------------- -------------- --------------- 16T_ST_20170604_LHR SAL 1 0 17T_ST_20170604_LHR SAL 5000 8 18T_ST_20170604_LHR SAL 10000 9
在高度平衡直方图中,在DBA_TAB_HISTOGRAMS视图中,EDNPOINT_NUMBER代表桶号,且自动省去EDNPOINT_VALUE值相同且ENDPOINT_NUMBER相邻的桶的值(节省空间,合并存储)。ENDPOINT_VALUE表示每一个桶中的最大值,而第一个桶记录的是最小值(Bucket为0的行,即EDNPOINT_NUMBER为0的行)。重复出现为ENDPOINT_VALUE的值称为Popular Value,这里的0即Popular Value。显然,Popular Value所在记录的ENDPOINT_NUMBER值和它上一条记录的ENDPOINT_NUMBER值之间的差值越大,则意味着该Popular Value在目标表中所占的比例也就越大,它所对应的Cardinality也就越大。
一共有9个桶(不包含0号Bucket),在该直方图图中,1到7号桶被省略存储,说明和8号桶的存储是一样的。
1SYS@orclasm > SET AUTOT TRACE 2SYS@orclasm > SELECT * FROM T_ST_20170604_LHR WHERE SAL=1; 3 4 5Execution Plan 6---------------------------------------------------------- 7Plan hash value: 738598333 8 9----------------------------------------------------------------------------------------------------- 10| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | 11----------------------------------------------------------------------------------------------------- 12| 0 | SELECT STATEMENT | | 111 | 777 | 2 (0)| 00:00:01 | 13| 1 | TABLE ACCESS BY INDEX ROWID| T_ST_20170604_LHR | 111 | 777 | 2 (0)| 00:00:01 | 14|* 2 | INDEX RANGE SCAN | T_ST_20170604_LHR_SAL | 111 | | 1 (0)| 00:00:01 | 15----------------------------------------------------------------------------------------------------- 16 17Predicate Information (identified by operation id): 18--------------------------------------------------- 19 20 2 - access("SAL"=1) 21 22 23Statistics 24---------------------------------------------------------- 25 0 recursive calls 26 0 db block gets 27 4 consistent gets 28 0 physical reads 29 0 redo size 30 592 bytes sent via SQL*Net to client 31 520 bytes received via SQL*Net from client 32 2 SQL*Net roundtrips to/from client 33 0 sorts (memory) 34 0 sorts (disk) 35 1 rows processed 36 37SYS@orclasm > SELECT * FROM T_ST_20170604_LHR WHERE SAL=5000; 38 399990 rows selected. 40 41 42Execution Plan 43---------------------------------------------------------- 44Plan hash value: 513072079 45 46--------------------------------------------------------------------------------------- 47| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | 48--------------------------------------------------------------------------------------- 49| 0 | SELECT STATEMENT | | 8889 | 62223 | 7 (0)| 00:00:01 | 50|* 1 | TABLE ACCESS FULL| T_ST_20170604_LHR | 8889 | 62223 | 7 (0)| 00:00:01 | 51--------------------------------------------------------------------------------------- 52 53Predicate Information (identified by operation id): 54--------------------------------------------------- 55 56 1 - filter("SAL"=5000) 57 58 59Statistics 60---------------------------------------------------------- 61 0 recursive calls 62 0 db block gets 63 1368 consistent gets 64 0 physical reads 65 0 redo size 66 244008 bytes sent via SQL*Net to client 67 7835 bytes received via SQL*Net from client 68 667 SQL*Net roundtrips to/from client 69 0 sorts (memory) 70 0 sorts (disk) 71 9990 rows processed 72 73SYS@orclasm >
在高度平衡直方图中执行计划的列的选择性就不是那么的精确了,而在现实很多时候,列的唯一值是超过254的,那么只能使用高度平衡直方图了。
如果需要删除直方图信息,在Oracle 10g中可以通过设置“METHOD_OPT=>'FOR COLUMNS SAL SIZE 1'”,但这却得再次收集表的统计信息,十分不合理,所以,在Oracle 11g中,有如下方法可以直接删除直方图信息:
1EXEC DBMS_STATS.DELETE_COLUMN_STATS(USER,'T_ST_20170604_LHR','SAL',COL_STAT_TYPE => 'HISTOGRAM');
其中,COL_STAT_TYPE默认为ALL,表示删除列的基本统计信息和直方图信息。
本文选自《Oracle程序员面试笔试宝典》,作者:小麦苗

详细内容可以添加麦老师微信或QQ私聊。
● 本文作者:小麦苗,只专注于数据库的技术,更注重技术的运用
● 作者博客地址:http://blog.itpub.net/26736162/abstract/1/
● 本系列题目来源于作者的学习笔记,部分整理自网络,若有侵权或不当之处还请谅解
● 版权所有,欢迎分享本文,转载请保留出处
● QQ:646634621 QQ群:618766405
● 提供OCP、OCM和高可用部分最实用的技能培训
● 题目解答若有不当之处,还望各位朋友批评指正,共同进步
DBA宝典
长按下图识别二维码或微信扫描下图二维码来关注小麦苗的微信公众号:xiaomaimiaolhr,学习最实用的数据库技术。
