Oracle大数据量分区表统计信息的管理(DBMS_STATS.COPY_TABLE_STATS)
Oracle大数据量分区表统计信息的管理(DBMS_STATS.COPY_TABLE_STATS)
对于大数据量分区表,当数据加载后,如果不能及时更新统计信息,将导致sql产生不正确的执行计划,引起查询性能的下降,而对于一个数据加载比较频繁的分区表,往往无法在每次加载数据后及时收集统计信息,而且对于数据量比较大的表来说,收集统计信息本身就是比较耗费资源的操作。 这种情况下,可以采用DBMS_STATS.COPY_TABLE_STATS这个存储过程来将一个分区的
统计信息复制到另外一个分区,从而避免由于无法及时更新统计信息而导致的sql执行计划出现问题。
关于DBMS_STATS.COPY_TABLE_STATS的介绍Statistics can be copied from one partition of a table to another partition using DBMS_STATS.COPY_TABLE_STATS.
Statistics of all dependent object such as columns and local indexes will also be copied as will histogram information for all columns apart from the partition key column.
The histogram information for the partition key cannot be copied as we do not know the exact values and mapping of the partition key.
Statistics can optionally be scaled based on the given scale_factor.
The definition of the procedure is as follows:DBMS_STATS.COPY_TABLE_STATS ( ownname VARCHAR2, tabname VARCHAR2, srcpartnameVARCHAR2, dstpartnameVARCHAR2, scale_factor VARCHAR2 DEFAULT 1, flagsNUMBER DEFAULT, forceBOOLEAN DEFAULT FALSE);
Where:· Ownname: Name of the schema· Tabname: Table name of source and destination [sub] partition· Srcpartname: Source [sub] partition· Dtspartname: Destination [sub] partition· Scale factor: To scale nblks, nrows etc. in destination partition· Flags: For internal Oracle use (should be left as NULL)· Force: When TRUE, copy statistics will be copied even when statisitcs are locked by DBMS_STATS.LOCK_*_STATISTICS
Note: If there are no statistics present for the source partition then nothing is copied.
Note:
In 10g, this procedure is FOR INTERNAL USE ONLY. Use of the procedure in 10g may work but is not supported due to this.
In 11g the procedure is externalised and is supported.
我们下面来看一个关于DBMS_STATS.COPY_TABLE_STATS的测试:
SQL> create or replace function display_raw(rawval raw, type varchar2) return varchar2 is cn number; cv varchar2(32); cd date; cnv nvarchar2(32); cr rowid; cc char(32);begin if (type = 'NUMBER') then dbms_stats.convert_raw_value(rawval, cn); return to_char(cn); elsif (type = 'VARCHAR2') then dbms_stats.convert_raw_value(rawval, cv); return to_char(cv); elsif (type = 'DATE') then dbms_stats.convert_raw_value(rawval, cd); return to_char(cd); elsif (type = 'NVARCHAR2') then dbms_stats.convert_raw_value(rawval, cnv); return to_char(cnv); elsif (type = 'ROWID') then dbms_stats.convert_raw_value(rawval, cr); return to_char(cnv); elsif (type = 'CHAR') then dbms_stats.convert_raw_value(rawval, cc); return to_char(cc); else return 'UNKNOWN DATATYPE'; end if;end;/
Function created.
SQL> set lines 132 pages 100
SQL> col COLUMN_NAME for a20
SQL> col PARTITION_NAME for a10
SQL> col low_val for a30
SQL> col high_val for a30
SQL> col data_type for a20
SQL> alter session set nls_date_format='yyyymmdd hh24:mi:ss';
Session altered.
SQL> SELECT TABLE_NAME,PARTITION_NAME,NUM_ROWS,TO_CHAR(LAST_ANALYZED,'YYYYMMDD HH24:MI:SS') FROM USER_TAB_PARTITIONS WHERE TABLE_NAME='SALES';
TABLE_NAME PARTITION_NAME NUM_ROWS TO_CHAR(LAST_ANAL
------------------------------ ------------------------------ ---------- -----------------
SALES P201301 99920 20140207 20:50:59
SALES P201302 96640 20140207 20:50:59
SALES P201303100680 20140207 20:50:59
SALES P201304100500 20140207 20:50:59
SALES P201305 99780 20140207 20:50:59
SALES P201306 0 20140207 20:50:59
SALES P201307 0 20140207 20:50:59
SALES P201308 0 20140207 20:50:59
SALES P201309 0 20140207 20:50:59
SALES P201310 0 20140207 20:50:59
SALES P201311 0 20140207 20:50:59
SALES P201312 0 20140207 20:50:59
12 rows selected.
SQL> select a.column_name,a.partition_name,a.num_distinct,display_raw(a.low_value, b.data_type) as low_val,display_raw(a.high_value, b.data_type) as high_val,b.data_type from dba_part_col_statistics a, dba_tab_cols bwhere a.owner = 'SYS' and b.owner = 'SYS' and a.table_name = 'SALES' and a.table_name = b.table_name and a.column_name = b.column_name and a.column_name = 'TIME_ID' and a.low_value is not nullorder by 1, 2/
COLUMN_NAME PARTITION_ NUM_DISTINCT LOW_VAL HIGH_VAL DATA_TYPE
-------------------- ---------- ------------ ------------------------------ ------------------------------
TIME_ID P201301 1 20130110 00:00:00 20130110 00:00:00 DATE
TIME_ID P201302 1 20130210 00:00:00 20130210 00:00:00 DATE
TIME_ID P201303 1 20130310 00:00:00 20130310 00:00:00 DATE
TIME_ID P201304 1 20130410 00:00:00 20130410 00:00:00 DATE
TIME_ID P201305 1 20130510 00:00:00 20130510 00:00:00 DATE
12 rows selected.
SQL> EXEC DBMS_STATS.COPY_TABLE_STATS (user, 'SALES', 'p201305', 'p201306', FORCE=>TRUE);
PL/SQL procedure successfully completed.
SQL> select a.column_name, a.partition_name, a.num_distinct, display_raw(a.low_value, b.data_type) as low_val, display_raw(a.high_value, b.data_type) as high_val, b.data_type from dba_part_col_statistics a, dba_tab_cols bwhere a.owner = 'SYS' and b.owner = 'SYS' and a.table_name = 'SALES' and a.table_name = b.table_name and a.column_name = b.column_name and a.column_name = 'TIME_ID' and a.low_value is not nullorder by 1, 2/
COLUMN_NAME PARTITION_ NUM_DISTINCT LOW_VAL HIGH_VAL DATA_TYPE
-------------------- ---------- ------------ ------------------------------ ------------------------------
TIME_ID P201301 1 20130110 00:00:00 20130110 00:00:00 DATE
TIME_ID P201302 1 20130210 00:00:00 20130210 00:00:00 DATE
TIME_ID P201303 1 20130310 00:00:00 20130310 00:00:00 DATE
TIME_ID P201304 1 20130410 00:00:00 20130410 00:00:00 DATE
TIME_ID P201305 1 20130510 00:00:00 20130510 00:00:00 DATE
TIME_ID P201306 1 20130601 00:00:00 20130701 00:00:00 DATE
6 rows selected.
SQL> SELECT TABLE_NAME,PARTITION_NAME,NUM_ROWS,TO_CHAR(LAST_ANALYZED,'YYYYMMDD HH24:MI:SS') FROM USER_TAB_PARTITIONS WHERE TABLE_NAME='SALES';
TABLE_NAME PARTITION_ NUM_ROWS TO_CHAR(LAST_ANAL
------------------------------ ---------- ---------- -----------------
SALES P201301 100020 20140207 08:42:09
SALES P201302 98820 20140207 08:42:09
SALES P201303 98960 20140207 08:42:09
SALES P201304 100520 20140207 08:42:09
SALES P201305 100280 20140207 08:42:09
SALES P201306 100280 20140207 08:42:09
SALES P201307 0 20140207 08:42:09
SALES P201308 0 20140207 08:42:09
SALES P201309 0 20140207 08:42:09
SALES P201310 0 20140207 08:42:09
SALES P201311 0 20140207 08:42:09
SALES P201312 0 20140207 08:42:09
12 rows selected.
SQL> EXEC DBMS_STATS.COPY_TABLE_STATS (user, 'SALES', 'p201305', 'p201307', FORCE=>TRUE);
PL/SQL procedure successfully completed.
SQL> select a.column_name, a.partition_name, a.num_distinct, display_raw(a.low_value, b.data_type) as low_val, display_raw(a.high_value, b.data_type) as high_val, b.data_type from dba_part_col_statistics a, dba_tab_cols bwhere a.owner = 'SYS' and b.owner = 'SYS' and a.table_name = 'SALES' and a.table_name = b.table_name and a.column_name = b.column_name and a.column_name = 'TIME_ID' and a.low_value is not nullorder by 1, 2/
COLUMN_NAME PARTITION_ NUM_DISTINCT LOW_VAL HIGH_VAL DATA_TYPE
-------------------- ---------- ------------ ------------------------------ ------------------------------
TIME_ID P201301 1 20130110 00:00:00 20130110 00:00:00 DATE
TIME_ID P201302 1 20130210 00:00:00 20130210 00:00:00 DATE
TIME_ID P201303 1 20130310 00:00:00 20130310 00:00:00 DATE
TIME_ID P201304 1 20130410 00:00:00 20130410 00:00:00 DATE
TIME_ID P201305 1 20130510 00:00:00 20130510 00:00:00 DATE
TIME_ID P201306 1 20130601 00:00:00 20130701 00:00:00 DATE
TIME_ID P201307 1 20130701 00:00:00 20130801 00:00:00 DATE
7 rows selected.
SQL> SELECT TABLE_NAME,PARTITION_NAME,NUM_ROWS,TO_CHAR(LAST_ANALYZED,'YYYYMMDD HH24:MI:SS') FROM USER_TAB_PARTITIONS WHERE TABLE_NAME='SALES';
TABLE_NAME PARTITION_ NUM_ROWS TO_CHAR(LAST_ANAL
------------------------------ ---------- ---------- -----------------
SALES P201301 100020 20140207 08:42:09
SALES P201302 98820 20140207 08:42:09
SALES P201303 98960 20140207 08:42:09
SALES P201304 100520 20140207 08:42:09
SALES P201305 100280 20140207 08:42:09
SALES P201306 100280 20140207 08:42:09
SALES P201307 100280 20140207 08:42:09
SALES P201308 0 20140207 08:42:09
SALES P201309 0 20140207 08:42:09
SALES P201310 0 20140207 08:42:09
SALES P201311 0 20140207 08:42:09
SALES P201312 0 20140207 08:42:09
12 rows selected.
可以看出,对于分区键所在的列,oracle自动将列的最大值和最小值置为分区边界值,对于其它列的统计信息则是直接复制,虽然这不能完全反映数据分布的真实情况,但能最大程度的接近真实的统计信息。
以下我们看一个官方的例子:Example
SetupREM CREATE TABLEREM ============CREATE TABLE sales_range(salesman_id NUMBER(5),salesman_name VARCHAR2(30),sales_amount NUMBER(10),sales_date DATE)PARTITION BY RANGE(sales_date)(PARTITION sales_jan2000 VALUES LESS THAN(TO_DATE('02/01/2000','DD/MM/YYYY')),PARTITION sales_feb2000 VALUES LESS THAN(TO_DATE('03/01/2000','DD/MM/YYYY')),PARTITION sales_mar2000 VALUES LESS THAN(TO_DATE('04/01/2000','DD/MM/YYYY')),PARTITION sales_apr2000 VALUES LESS THAN(TO_DATE('05/01/2000','DD/MM/YYYY')));REM CREATE LOCAL INDEXREM ==================create index salesman_id_ix on sales_range (salesman_id) local;REM POPULATE TABLEREM ===============insert into sales_range values(10,'SCOTT',1000,'01-JAN-2000');insert into sales_range values(20,'SMITH',1200,'01-JAN-2000');insert into sales_range values(30,'ALLEN',1300,'01-JAN-2000');commit;REM GATHER STATISTICSREM ==================exec dbms_stats.gather_table_stats(ownname=>'', tabname=> 'sales_range', granularity=>'ALL');· Table SALES_RANGE has been created with 3 partitions containing a local index.· Data has been inserted into partition SALES_JAN2000 and statistics gathered on all partitions
SQL> select count(*) from sales_range partition (SALES_JAN2000); COUNT(*)---------- 3
SQL> select count(*) from sales_range partition (SALES_FEB2000); COUNT(*)---------- 0
SQL> select num_rows,AVG_ROW_LEN,BLOCKS,PARTITION_NAME from user_tab_partitions where table_name='SALES_RANGE'; NUM_ROWS AVG_ROW_LEN BLOCKS PARTITION_NAME---------- ----------- ---------- ------------------------------ 3 20 46 SALES_JAN2000 0 0 0 SALES_FEB2000 0 0 0 SALES_MAR2000 0 0 0 SALES_APR2000
SQL> select partition_name, num_rows,leaf_blocks,avg_leaf_blocks_per_key from user_ind_partitions where index_name='SALESMAN_ID_IX';PARTITION_NAME NUM_ROWS LEAF_BLOCKS AVG_LEAF_BLOCKS_PER_KEY------------------------------ ---------- ----------- -----------------------SALES_JAN2000 3 1 1SALES_FEB2000 0 0 0SALES_MAR2000 0 0 0SALES_APR2000 0 0 0· SALES_JAN2000 shows a count of 3 rows whereas SALES_FEB2000 shows 0 rows· USER_TAB_PARTITIONS and USER_TAB_INDEXES show non-zero information for partition SALES_JAN2000 that has been populated with data· The other partitions that do not contain data show all statisitc information as 0.Copying Statistics
SQL> EXEC DBMS_STATS.COPY_TABLE_STATS ('','SALES_RANGE','SALES_JAN2000','SALES_FEB2000',2);PL/SQL procedure successfully completed.Statistics are copied from SALES_JAN2000 partition to SALES_FEB2000 partition using a scale-factor of 2.
SQL> select num_rows,AVG_ROW_LEN,BLOCKS,PARTITION_NAME from user_tab_partitions where table_name='SALES_RANGE'; NUM_ROWS AVG_ROW_LEN BLOCKS PARTITION_NAME---------- ----------- ---------- ------------------------------ 3 20 46 SALES_JAN2000 6 20 92 SALES_FEB2000 0 0 0 SALES_MAR2000 0 0 0 SALES_APR2000
SQL> select partition_name, num_rows,leaf_blocks,avg_leaf_blocks_per_key from user_ind_partitions where index_name='SALESMAN_ID_IX';PARTITION_NAME NUM_ROWS LEAF_BLOCKS AVG_LEAF_BLOCKS_PER_KEY------------------------------ ---------- ----------- -----------------------SALES_JAN2000 3 1 1SALES_FEB2000 6 2 2SALES_MAR2000 0 0 0SALES_APR2000 0 0 0USER_TAB_PARTITIONS and USER_TAB_INDEXES now show non-zero information for partition SALES_FEB2000.
As we have used a scale-factor of 2, the copied statistics have been mutliplied by a factor of 2.