数据库Oracle10g与Oracle11g分区表统计信息收集的方法
Oracle数据库10g与11g分区表统计信息收集的方法 How To Collect Statistics On Partitioned Table in 10g and 11g (文档ID1417133.1)
General guidelines[list]
[*]Gather optimizer statistics, dynamic sampling is not an adequate solution for large volume of data that are typical in tables in a Datawarehousing environment.
[*]If feasible run queries against empty table ( before data load ) to populate the columns usage. This provides valuable information for statistics gathering later.
[*]Statistics collection should be taking place after the data loaded to the table, but before indexes are created (Oracle automatically gathers statistics on indexes as they are created).
[*]For estimate percent: in 11g, use auto sample size, it has a new hash algorithm, very accurate for NDV (Number of Distinct Values), yet it takes minimal time.
[*]For sample size in 10g use 100% if possible, use histogram if you know data is skewed, otherwise be cautious. (Auto sample size tends to use small samples, gives poor estimates of NDVs, often obtains histograms due to column usage statistics, as columns are used in predicated over time.)
[*]Use Incremental Statistics Collection in 10g and 11g
Partitioned Tables Benefit from Global Statistics. Without Global Stats, CBO estimates stats from individual partitions, which can be inaccurate for NDVs. Global stats take longer to obtain and reduce manageability. Without incremental stats collection full table scans used to update global stats, with incremental stats collection when new partitions added, CBO gathers stats on new partition, global stats updated by scanning only updated partitions.
Incremental statistics collection in 10g
In 10.2.0.4 + Patch 6526370 and in 10.2.0.5 new value, 'APPROX_GLOBAL AND PARTITION' for the GRANULARITY parameter of the GATHER_TABLE_STATS procedures simulates the incremental maintenance of global statistics.
We update the NDV of the table. For partitioned column we update the NDV as the sum of NDV at the partition levels. Also we set the NDV of columns of local unique indexes as the number of rows of the table. For non-partitioning columns we do not update the NDV and number of distinct keys of the index at the global level.
In the following example we consider table SALES_INC which is partitioned by time_id, and has statistics gathered. The last partition is empty SALES_Q4_2001. After the partition SALES_Q4_2001 is loaded, we gather statistics by specifying 'APPROX_GLOBAL AND PARTITION' for the GRANULARITY.
Before loading partition SALES_Q4_2001
USER_TABLES show the following values for columns NUM_ROWS and Last_Analyzed:
TABLE_NAME NUM_ROWS Last Analyzed GLOBAL_ST
--------------- ---------- ------------------------------------------------ ---------
SALES_INC 849094 02/20/2012 14:24 YES
USER_TAB_PARTITIONS shows partition SALES_Q4_2001 empty:
TABLE_NAME PARTITION_NAME Last Analyzed GLOBAL_ST NUM_ROWS AVG_ROW_LEN HIGH_VALUE
--------------- --------------- ------------------------------------------------ --------- ---------- ----------- -------------------------------------------------- ...
SALES_INC SALES_Q4_2000 02/20/2012 14:24 YES 55984 28 TO_DATE(' 2001-01-01 00:00:00', 'SYYYY-MM-DD HH24:
SALES_INC SALES_Q4_2001 02/20/2012 14:24 YES 0 0 TO_DATE(' 2002-01-01 00:00:00', 'SYYYY-MM-DD HH24:
USER_TAB_COLUMNS
shows the following NDVs for the columns of the table:
TABLE_NAME COLUMN_NAME NUM_DISTINCT
--------------- -------------------- ------------
SALES_INC AMOUNT_SOLD 3296
SALES_INC CHANNEL_ID 4
SALES_INC CUST_ID 7056
SALES_INC PROD_ID 72
SALES_INC PROMO_ID 4
SALES_INC QUANTITY_SOLD 1
SALES_INC TIME_ID 1368
Then data into partition SALES_Q4_2001 has been loaded.
Statistics has been collected at 14:50 with
EXEC DBMS_STATS.GATHER_TABLE_STATS (ownname => 'SH', tabname =>'SALES_INC', partname=>'SALES_Q4_2001', GRANULARITY => 'APPROX_GLOBAL AND PARTITION');
After data load and statistics collection for partition SALES_Q4_2001
USER_TAB_PARTITIONS shows partition level statistics were collected for SALES_Q4_2001 (compare columns NUM_ROWS and LAST_ANALYZED with the result obtained above)
TABLE_NAME PARTITION_NAME Last Analyzed GLOBAL_ST NUM_ROWS AVG_ROW_LEN HIGH_VALUE
--------------- --------------- ------------------------------------------------ --------- ---------- ----------- -------------------------------------------------- ...
SALES_INC SALES_Q4_2000 02/20/2012 14:24 YES 55984 28 TO_DATE(' 2001-01-01 00:00:00', 'SYY
SALES_INC SALES_Q4_2001 02/20/2012 14:50 YES 67872 28 TO_DATE(' 2002-01-01 00:00:00', 'SYY
Note last analyzed for the partition on which statistics needed to be collected shows 14:50. No stats collection was done on other partitions as those did not change since previous collection.
USER_TABLES shows NUM_ROWS increased by the rows of inserted during the data load to the latest partition. The value for the LAST_ANALYZED for the table changed to 14:50,
showing the last update time stamp of the global stats of the table.
TABLE_NAME NUM_ROWS Last Analyzed GLOBAL_ST
--------------- ---------- ------------------------------------------------ ---------
SALES_INC 916966 02/20/2012 14:50 YES
Note NDV of the partition key TIME_ID in USER_TAB_COLUMNS has been made up-to-date
TABLE_NAME COLUMN_NAME NUM_DISTINCT
--------------- -------------------- ------------
SALES_INC AMOUNT_SOLD 3296
SALES_INC CHANNEL_ID 4
SALES_INC CUST_ID 7056
SALES_INC PROD_ID 72
SALES_INC PROMO_ID 4
SALES_INC QUANTITY_SOLD 1
SALES_INC TIME_ID 1460
Incremental Statistics Collection in 11g
Oracle will update the global table statistics by scanning only the partitions or subpartitions that have been changed instead of the entire table if all of the following conditions hold:
[list]
[*]value of the INCREMENTAL property for the partitioned table is set to TRUE (default is false)
DBMS_STATS.SET_TABLE_PREFS(
[*]value of the PUBLISH property for the partitioned table is set to TRUE (default)
DBMS_STATS.SET_TABLE_PREFS(
[*]statistics collection is running on the table with auto sample size and granularity auto
estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZEgranularity => 'AUTO'
Note about further restrictions:
[list]
[*]Granularity set to AUTO does not collect statistics on hash sub-partition tables, hence incremental statistics is not happening when the sub-partiton type is hash, see Note 1390718.1 for more information.
See Note 1592404.1 about how and when statistics are collected on partitioned table by the Automatic Optimizer Statistics Collection job.