程序员人生 网站导航

关于动态抽样(Dynamic Sampling)

栏目:数据库应用时间:2015-01-14 09:04:58
关于动态抽样(Dynamic Sampling)

原文:http://www.oracle.com/technetwork/issue-archive/2009/09-jan/o19asktom-086775.html


本文将回答:甚么是动态抽样?动态抽样有啥作用?和不同级别的动态抽样的意思?


1、甚么是动态采样?
动态抽样从 oracle 9i第2版引入。它使得优化器(CBO)在硬解析期间有能力抽样1个未分析的表
(any table that has been created and loaded but not yet analyzed)的统计(决定表默许统计),并且可以验证优化器的”料想“。
因其只在查询硬解析期间为优化器动态生成更好的统计,得名动态采样。


动态采样提供11个设置级别。注意:9i中其默许值为1 到了10g默许值为2


2、动态采样如何工作?
有两种使用方式:
△ 设置OPTIMIZER_DYNAMIC_SAMPLING参数,可以再实例和会话级别设置动态采样。
△ 使用DYNAMIC_SAMPLING hint


来看1下不使用动态采样的日子怎样过的
create table t
as
select owner, object_type
from all_objects
/




select count(*) from t;


  COUNT(*)
------------------------
      68076


code1: 禁用动态采样视察默许基数


set autotrace traceonly explain
SQL> select /*+ dynamic_sampling(t 0) */ * from t;


Execution Plan
------------------------------
Plan hash value: 1601196873


--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 16010 |   437K|    55   (0)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| T    | 16010 |   437K|    55   (0)| 00:00:01 |
--------------------------------------------------------------------------


--注意0级别即为禁用动态采样,环境默许是开启动态采样的


履行计划显示基数:16010远低于上面查询的68076,明显不靠谱。


code2: 更加接近显示的基数
select * from t;


Execution Plan
------------------------------
Plan hash value: 1601196873


--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 77871 |  2129K|    56   (2)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| T    | 77871 |  2129K|    56   (2)| 00:00:01 |
--------------------------------------------------------------------------


Note
------------------------------------------
- dynamic sampling used for this statement




code3: 被高估的基数
SQL> delete from t; 
68076 rows deleted. 


SQL> commit; 
Commit complete. 


SQL> set autotrace traceonly explain
SQL> select /*+ dynamic_sampling(t 0) */ * from t;


Execution Plan 
------------------------------
Plan hash value: 1601196873


--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 16010 |   437K|    55   (0)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| T    | 16010 |   437K|    55   (0)| 00:00:01 |
--------------------------------------------------------------------------


SQL> select * from t;


Execution Plan
-----------------------------
Plan hash value: 1601196873


--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |    28 |    55   (0)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| T    |     1 |    28 |    5    (0)| 00:00:01 |
--------------------------------------------------------------------------


Note
---------------------------------------
- dynamic sampling used for this statement




3、动态采样什么时候帮助优化器验证其猜想?
我们知道当使用DBMS_STATS搜集了表信息后,优化器会得到以下统计:
1)表,行数,平均行宽等;
2)单独列,高低值,唯1值数量,直方图(可能)等;
3)单独索引,聚集因素,叶子块数量,索引高度等。


但注意这里面缺少了某些关键统计信息,例如表中不同列数据之间的关联!
假定你你有1个全球人口普查表!
1个属性是:诞生月份MONTH_BORN_IN,另外一个属性是:所属星座ZODIAC_SIGN。搜集信息后,你问优化器诞生在11月份的人数?
假定12个月人数正常散布,那末优化器很快给出答案是全量数据的1/12!再问1个:星座是双鱼座的人数呢?答案也是1/12!
迄今为止优化器对答如流!!!nice work!
但是第3个问题来了:诞生在11月份并且星座是双鱼座的人数是多少呢?
明眼人转下头脑就知道答案是0(双鱼座2月19日-3月20日)!但是我们看优化器的答案:1/12/12!!! 多么想入非非的答案,思惟定式!这样就会诞生差的履行计划,
也正是在此时我们的动态采样开始干预:


code4: 创建摹拟数据
SQL > create table t
  as select decode( mod(rownum,2), 0, 'N', 'Y' ) flag1,
               decode( mod(rownum,2), 0, 'Y', 'N' ) flag2, a.*
    from all_objects a
/
Table created.


SQL > create index t_idx on t(flag1,flag2);
Index created.


SQL > begin
         dbms_stats.gather_table_stats
         ( user, 'T',
         method_opt=>'for all indexed columns size 254' );
end;
/
PL/SQL procedure successfully completed.


SQL> select num_rows, num_rows/2, 
num_rows/2/2 from user_tables 
where table_name = 'T';


NUM_ROWS  NUM_ROWS/2  NUM_ROWS/2/2
--------  ----------  ------------  
   68076       34038         17019


code5:验证1下上面的说法:
SQL> set autotrace traceonly explain
SQL> select * from t where flag1='N';


Execution Plan
------------------------------
Plan hash value: 1601196873


--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 33479 |  3432K|   292   (1)| 00:00:04 |
|*  1 |  TABLE ACCESS FULL| T    | 33479 |  3432K|   292   (1)| 00:00:04 |
--------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("FLAG1"='N')


SQL> select * from t where flag2='N';


Execution Plan
----------------------------
Plan hash value: 1601196873


---------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes  | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 34597 |   3547K|   292   (1)| 00:00:04 |
|*  1 |  TABLE ACCESS FULL| T    | 34597 |   3547K|   292   (1)| 00:00:04 |
---------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------


   1 - filter("FLAG2"='N')


--至此1切正常!so far, so good!


code5: here comes the problem   
SQL> select * from t where flag1 = 'N' and flag2 = 'N';


Execution Plan
----------------------------
Plan hash value: 1601196873


--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 17014 |  1744K|   292   (1)| 00:00:04 |
|*  1 |  TABLE ACCESS FULL| T    | 17014 |  1744K|   292   (1)| 00:00:04 |
--------------------------------------------------------------------------


Predicate Information (identified by operation id):
----------------------------------------------------


   1 - filter("FLAG1" = 'N' AND "FLAG2" = 'N')


--验证了我们前面说的优化器此时想入非非了


code7: 动态采样听令,开始参与
SQL> select /*+ dynamic_sampling(t 3) */ * from t where flag1 = 'N' and flag2 = 'N';


Execution Plan
-----------------------------
Plan hash value: 470836197


------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |    6 |   630 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T     |    6 |   630 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T_IDX |    6 |       |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
----------------------------------------------------


   2 - access("FLAG1"='N' AND "FLAG2"='N')


code8: 我们打开SQL_TRACE会看到以下语句:
SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS IGNORE_WHERE_CLAUSE
   NO_PARALLEL(SAMPLESUB) opt_param('parallel_execution_enabled', 'false')
   NO_PARALLEL_INDEX(SAMPLESUB) NO_SQL_TUNE */ NVL(SUM(C1),:"SYS_B_00"),
   NVL(SUM(C2),:"SYS_B_01"), NVL(SUM(C3),:"SYS_B_02")
FROM
  (SELECT /*+ IGNORE_WHERE_CLAUSE NO_PARALLEL("T") FULL("T")
   NO_PARALLEL_INDEX("T") */ :"SYS_B_03" AS C1, CASE WHEN "T"."FLAG1"=
   :"SYS_B_04" AND "T"."FLAG2"=:"SYS_B_05" THEN :"SYS_B_06" ELSE :"SYS_B_07"
   END AS C2, CASE WHEN "T"."FLAG2"=:"SYS_B_08" AND "T"."FLAG1"=:"SYS_B_09"
   THEN :"SYS_B_10" ELSE :"SYS_B_11" END AS C3 FROM "T" SAMPLE BLOCK
   (:"SYS_B_12" , :"SYS_B_13") SEED (:"SYS_B_14") "T") SAMPLESUB   
   
可以看出来优化器在验证其料想。。。   
   
   
4、动态采样级别:
现在列出11个级别,详细请参考:http://docs.oracle.com/cd/B19306_01/server.102/b14211/stats.htm#i43032
1)Level 0: Do not use dynamic sampling. 
0级:不使用动态采样。


2)Level 1: Sample all tables that have not been analyzed if the following criteria are met: (1) there is at least 1 unanalyzed table in the query; (2) this unanalyzed table is joined to another table or appears in a subquery or non-mergeable view; (3) this unanalyzed table has no indexes; (4) this unanalyzed table has more blocks than the number of blocks that would be used for dynamic sampling of this table. The number of blocks sampled is the default number of dynamic sampling blocks (32).
1级:满足以下条件则采样所有没被分析的表:
(1)查询中最少有1个未分析表;
(2)这个未分析表被关联另外1个表或出现在子查询或非merge视图中;
(3)这个未分析表有索引;
(4)这个未分析表有过剩动态采样默许的数据块数(默许是32块)。


3)Level 2: Apply dynamic sampling to all unanalyzed tables. The number of blocks sampled is two times the default number of dynamic sampling blocks.
2级:对所有未分析表进行动态采样。采样数据块数量是默许数量的2倍。


4)Level 3: Apply dynamic sampling to all tables that meet Level 2 criteria, plus all tables for which standard selectivity estimation used a guess for some predicate that is a potential dynamic sampling predicate. The number of blocks sampled is the default number of dynamic sampling blocks. For unanalyzed tables, the number of blocks sampled is two times the default number of dynamic sampling blocks.
3级:在2级基础上加上那些使用了料想选择消除表,采样数据块数量等于默许数量。对未分析表,采样数量2倍于默许数量。




5)Level 4: Apply dynamic sampling to all tables that meet Level 3 criteria, plus all tables that have single-table predicates that reference 2 or more columns. The number of blocks sampled is the default number of dynamic sampling blocks. For unanalyzed tables, the number of blocks sampled is two times the default number of dynamic sampling blocks.
4级:在3级基础上加上那些有单表谓词关联2个或多个列,采样数据块数量等于默许数量。对未分析表,采样数量2倍于默许数量。




6)Levels 5, 6, 7, 8, and 9: Apply dynamic sampling to all tables that meet the previous level criteria using 2, 4, 8, 32, or 128 times the default number of dynamic sampling blocks respectively.
5,6,7,8,9级在4级基础上分别使用2,4,8,32,128倍于默许动态采样数据块数量。


7)Level 10: Apply dynamic sampling to all tables that meet the Level 9 criteria using all blocks in the table.
10级:在9级基础上对表中所有数据块进行采样。


5、甚么时候合适采取动态采样?


这是1个狡猾的问题,没有1定使用经验,还真不好意思说。
通常:
1)我们使用3和4级进行动态采样。
2)如果我们SQL的解析时间很快但是履行时间巨慢,可以斟酌使用动态采样。典型的就是数据仓库系统。
3)OLTP系统中都是1个SQL重复履行,解析和履行都在瞬息之间,所以不建议使用高级别的动态采样。这会给SQL带来硬解析消耗。
这个时候可以斟酌SQL Profile,你可以理解为“静态采样”。


关于SQL Profiles参考:http://docs.oracle.com/cd/B28359_01/server.111/b28274/sql_tune.htm#PFGRF02605


-------------------------------------

Dylan    Presents.












------分隔线----------------------------
------分隔线----------------------------

最新技术推荐