关于Oracle出现WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK!的分析过程

教程发布:风哥 教程分类:ITPUX技术网 更新日期:2022-02-12 浏览学习:988

关于Oracle出现WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK!的分析过程

关于这个报错,主要根据trc文件来判断Enqueue Type,根据不同的类似去分析处理,一般有以下几种类型:
DC_TABLESPACES
DC_SEQUENCES
DC_USERS
DC_OBJECT_IDS
DC_SEGMENTS
DC_ROLLBACK_SEGMENTS
DC_TABLE_SCNS
DC_AWR_CONTROL

如果是RAC/OPS环境,前台进程发出锁请求,LCK0进程发出锁请求。如果是单实例模式,由前台进程直接发出锁请求。
在RAC/OPS环境下,前台进程会循环等待锁的获取,最多会等待60秒钟。在单实例环境,前台进程会循环1000次,等待3秒钟。PMON进程无论在哪种模式,都会等待5秒钟。
要注意的是单实例模式下和多实例模式下申请该锁调用的模块是不同的(kqrget()- 单实例,kqgigt()- 多实例)。
如果发现这个等待十分高,一般来说可能由于2种原因,一是共享池太小了,需要增加共享池,另外一种情况是SQL分析过于频繁,对于共享池的并发访问量过大。对于任何一种情况,绝大多数情况下加大共享池会有助于降低该等待,不过加大共享池的时候也要注意,并不一定所有的情况下增加共享池都会有明显的效果,特别是对于第二种情况,精确的分析十分重要。另外进一步分析,弄清楚哪些ROW CACHE的等待最为严重,有助于解决问题。

以下部分内容来自Oracle官方ID 278316.1

Troubleshooting StepsWhat is a Row Cache Enqueue Lock?

The Row Cache or Data Dictionary Cache is a memory area in the shared pool that holds data dictionary information. Row cache holds data as rows instead of buffers.A Row cache enqueue lock is a lock on the data dictionary rows. The enqueue will be on a specific data dictionary object. This is called the enqueue type and can be found in the v$rowcache view

For a list of row cache types per version see:
Document 468334.1 How to Match a Row Cache Object Child Latch to its Row Cache

What is the meaning of the warning WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK?

This wait event is used when we are trying to acquire a lock on a rowcache entry.
When Row cache contention occurs, if the enqueue cannot be obtained within a certain predetermined time period, a trace file will be generated in the user_dump_dest or background_dump_dest depending on whether a user or background process created the trace file. The alert log is usually updated accordingly with the warning and the location of the trace file.
The database detects that a key resource is being held for too long and flags this up to the administrator so that this situation can be resolved.This may also be accompanied by database hang or slowdown.
The message in the alert.log and the trace file generated will tend to contain the message:

> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! <<< If the rowcache entry lock cannot be granted immediately then we enter a loop where we free the row cache objects latch, wait (using the above event), re-acquire the latch and then again try to acquire the rowcache lock. In single-instance mode, this is repeated for up to 1000 times before the process reports WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK message. Under RAC it repeats until we fail to acquire the instance lock or we are interrupted. The systemstate dumped can provide some useful information for diagnosing the cause of the contention. Note: The "WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK!" message is raised when a threshold is reached and so if that threshold is NOT reached it will not be raised. This means that less severe issues, with identical causes, may not output the message. Potential reasons for "WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK!"SGA Shrink/Resize Operations When the SGA is dynamically resized, varous latches need to be held to prevent changes from being made while the operation completes. If the resize takes a while or is happening frequently you can see "WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK!" occurring. The key identifiers for this is high waits for 'SGA: allocation forcing component growth' or similar waits at the top of waits in AWR and the blocking session for "WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK!" waiting for waiting for 'SGA: allocation forcing component growth' (or similar). There are a couple of fixes available, see: Document 7189722.8 Bug 7189722 - Frequent grow/shrink SGA resize operations Document 9267837.8 Bug 9267837 - Auto-SGA policy may see larger resizes than needed Issues by Row Cache Enqueue Type For each enqueue type, there are a limited number of operations that require each enqueue. The enqueue type therefore may give an indication as the type of operation that may be causing the issue. As such some common reasons are outlined below: DC_TABLESPACES Probably the most likely cause is the allocation of new extents. If extent sizes are set low then the application may constantly be requesting new extents and causing contention. Do you have objects with small extent sizes that are rapidly growing? (You may be able to spot these by looking for objects with large numbers of extents). Check the trace for insert/update activity, check the objects inserted into for number of extents . DC_SEQUENCES Check for appropriate caching of sequences for the application requirements.(这里可以检查dba_sequences视图,判断sequences cache_size是否为0,如果当前比较值比较大,建议调整cache_size至少1000,同时也需要优化一些nocache的sequences) Document 853652.1 RAC and Sequences Document 395314.1 RAC Hangs due to small cache size on SYS.AUDSES$ - fixed in 10.2.0.3 Document 6027068.8 Bug 6027068 - Contention on ORA_TQ_BASE sequence -fixed in 10.2.0.5 and 11.2.0.1 DC_USERS Deadlock and resulting "WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK!" can occur if a session issues a GRANT to a user, and that user is in the process of logging on to the database. Document 4604972.8 Bug 4604972 - Deadlock on dc_users by Concurrent Grant/Revoke - fixed in 11.1.0.6 Document 6143420.8 Bug 6143420 - Deadlock involving "ROW CACHE LOCK" on dc_users AND "CURSOR: PIN S WAIT ON X"- fixed in 10.2.0.5 and 11.1.0.6 DC_OBJECTS Document 12772404.8 Bug 12772404 - Significant "row cache objects" latch contention when using VPD DC_OBJECT_IDS Document 11693365.8 Bug 11693365 - Concurrent Drop table and Select on Reference constraint table hangs(deadlock) - fixed in 12.1 DC_SEGMENTS This is likely to be down to segment allocation. Identify what the session holding the enqueue is doing and use errorstacks to diagnose. DC_ROLLBACK_SEGMENTS This is due to rollback segment allocation. Just like dc_segments,identify what is holding the enqueue and also generate errorstacks. Remember that on a multi-node system (RAC) the holder may be on another node and so multiple systemstates from each node will be required. DC_TABLE_SCNS Document 5756769.8 Bug 5756769 - Deadlock between Create MVIEW and DML - fixed in 10.2.0.5 ,11.1.07 and 11.2.0.1 DC_AWR_CONTROL This enqueue is related to control of the Automatic Workload Repository. As such any operation manipulating the repository may hold this so look for processes blocking these. RAC Specific Bugs Document 6004916.8 Bug 6004916 - Hang involving row cache enqueues in RAC (ORA-4021) - fixed in 102.0.5 and 11.1.0.6 Document 8666117.8 Bug 8666117 - High row cache latch contention in RAC - fixed in 11.2.0.2 and 12.1 Document 9866045.8 Bug 9866045 - Long wait on 'wait for master scn' in LCK causing long row cache lock waits - fixed in 12.1 What information can be gathered to help identify the cause?Systemstate dump A systemstate dump is automatically generated by Oracle when the issue occurs and listed in the alert.log Wed Sep 21 13:39:19 2011 > WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=37
System State dumped to trace file /oracle/diag/rdbms/..../.trc

AWR, ADDM and ASH Reports
Run AWR and ASH reports for the time when the problem is reported as well as a report leading up to the problem as these can sometimes help build a picture of when a problem actually started. The AWR ,ADDM and ASH reports can compliment each other in getting a complete picture.

Depending on the interval used for generating AWR snapshot, get a report for the smallest time frame available. The default for snapshots is an hourly interval.
SQL>@$ORACLE_HOME/rdbms/admin/awrrpt.sql
SQL>@$ORACLE_HOME/rdbms/admin/addmrpt.sql
SQL>@$ORACLE_HOME/rdbms/admin/ashrpt.sql

As systemstate is complicated to analyze, it is recommended to create a Service Request and upload the alert.log, systemstate dump and AWR reports preceding and during the problem to support.

How to interpret the information obtainedSystemstate dump
Often the wait for a Row cache enqueue is the culmination of a chain of events and the lock being held is a symptom of another issue where a process holding the requested row cache enqueue is being blocked by other processes. ie it is often a symptom, not the cause.

下边是几种关于Enqueue的案例分析:

Systemstate dumps can help to find which row cache is being requested and may help find the blocking process.

Example 1:
Oracle process number: 77
Unix process pid: 10846, image: oracle@cpdb4532

*** 2011-05-13 08:08:58.775
*** SERVICE NAME:(ALFCMR_SERVICE) 2011-05-13 08:08:58.775
*** SESSION ID:(1076.796) 2011-05-13 08:08:58.775
> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! <<< row cache enqueue: session: 0x1df57ade8, mode: N, request: S The header of the trace shows: [list] [*]the Oracle process number (pid) of the process waiting for the row cache enqueue lock (in this case Process 77) [*]the mode row cache enqueue is being requested in (request: S) So, in the example above Process 77 is waiting for a row cache needed in shared mode (request: S) . Systemstate contains process state information for every process in the database so looking for this process in the systemstate: PROCESS 77 ---------------------------------------- . . ---------------------------------------- SO: 0x1cdf11958, type: 50, owner: 0x17d198288, flag: INIT/-/-/0x00 row cache enqueue: count=1 session=0x1df57ade8 object=0x1dc9a5d30, request=S savepoint=0x87b70d row cache parent object: address=0x1dc9a5d30 cid=7(dc_users) . . Above we see that PROCESS 77 is requesting row cache dc_users in shared mode. If process 77 is waiting that implies it is being blocked by another process so we now need to search the systemstate to determine what is holding the resource and blocking the process. It is best to search for the object reference which in this case is object=0x1dc9a5d30. If this is done we find that Process 218 is requesting this object in eXclusive: PROCESS 218: ---------------------------------------- . . SO: 0x1cdf118f8, type: 50, owner: 0x1ceb0f178, flag: INIT/-/-/0x00 row cache enqueue: count=1 session=0x1da54cf68 object=0x1dc9a5d30, request=X savepoint=0x11e row cache parent object: address=0x1dc9a5d30 cid=7(dc_users) A request in Exclusive mode will result in any requests in shared mode to wait behind it until the process is granted the request in exclusive mode and has released it. Thus this will block. Note this is a request for an exclusive not an exclusive hold, so something must be blocking this process. Looking at the other processes, we see that Process 164 has the object held in mode=S PROCESS 164: ---------------------------------------- . . O/S info: user: u1m, term: , ospid: 1234, machine: cpc44711 program: last wait for 'SQL*Net message from client' blocking sess=0x(nil) seq=36289 wait_time=6943 seconds since wait started=2539 driver id=54435000, #bytes=1, =0 . . SO: 0x1cdf11418, type: 50, owner: 0x1ccc26120, flag: INIT/-/-/0x00 row cache enqueue: count=2 session=0x1df578318 object=0x1dc9a5d30, mode=S savepoint=0xb1bd8e row cache parent object: address=0x1dc9a5d30 cid=7(dc_users) hash=fc968070 typ=11 transaction=(nil) flags=00000002 own=0x1dc9a5e00[0x1cdf11448,0x1cdf11448] wat=0x1dc9a5e10[0x1cdf11928,0x17d5192e0] mode=S So, Process 164 is holding the row cache enqueue in Shared mode (mode=S) and thus preventing process 218 from obtaining the row cache enqueue in eXclusive. Furthermore we see that the process 164 is on CPU (the systemstate shows last wait for 'SQL*Net message from client' and not waiting for 'SQL*Net message from client'). To further diagnose support would need to check the stack calls included in the dump to determine why this session was on CPU and holding this enqueue for so long (based on 'seconds since wait started=2539'). Example 2: In this example PROCESS 18 (MMON) was waiting for row cache type dc_awr_control in SHARED mode. Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bit Production With the Partitioning, Oracle Label Security, OLAP, Data Mining and Real Application Testing options ORACLE_HOME = /opt/oracle10/product/10.2.0 System name: SunOS Node name: saecopt51 Release: 5.10 Version: Generic_144488-04 Machine: sun4v Instance name: PORT_V16 Redo thread mounted by this instance: 1 Oracle process number: 18 Unix process pid: 6196, image: oracle@sae (MMON) . . PROCESS 18: ---------------------------------------- . . last wait for 'ksdxexeotherwait' wait_time=0.000013 sec, seconds since wait started=6 . . SO: 39bf1f0e8, type: 50, owner: 3980783a0, flag: INIT/-/-/0x00 row cache enqueue: count=1 session=3be37ea80 object=39a79f090, request=S savepoint=0x41f0ae row cache parent object: address=39a79f090 cid=22(dc_awr_control) hash=6f60197e typ=9 transaction=3bc39f560 flags=0000002a own=39a79f160[39bf1f178,39bf1f178] wat=39a79f170[39bf1f118,39bf1f118] mode=X . . The row cache lock for this object (object=39a79f090) is being held by PROCESS 269 in exclusive mode (mode=X). The process is waiting for 'SGA: allocation forcing component growth' . PROCESS 269: ---------------------------------------- . . waiting for 'SGA: allocation forcing component growth' wait_time=0, seconds since wait started=3 . . SO: 39bf1f148, type: 50, owner: 3bc39f560, flag: INIT/-/-/0x00 row cache enqueue: count=1 session=3be1b7c98 object=39a79f090, mode=X savepoint=0x41efe8 row cache parent object: address=39a79f090 cid=22(dc_awr_control) hash=6f60197e typ=9 transaction=3bc39f560 flags=0000002a own=39a79f160[39bf1f178,39bf1f178] wat=39a79f170[39bf1f118,39bf1f118] mode=X . . Thus the root cause in this case is the resizing of SGA and the wait on row cache is secondary result of this. If we use the AWR report for the period to correlate this information: AWR Report Top 5 Timed Events Avg %Total ~~~~~~~~~~~~~~~~~~ wait Call Event Waits Time (s) (ms) Time Wait Class ------------------------------ ------------ ----------- ------ ------ ---------- SGA: allocation forcing compon 42,067,317 38,469 1 7.6 Other CPU time 2,796 0.6 db file sequential read 132,906 929 7 0.2 User I/O latch free 4,282,858 704 0 0.1 Other log file switch (checkpoint in 904 560 620 0.1 Configurat ------------------------------------------------------------- We can see clearly in the Top 5 Timed Events that there is significant waiting for this event across the system and 'SGA: allocation forcing component growth' is a major issue at this time. The root cause of the "WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK!" message is the resizing activity. The top 5 events do not even show waits for the 'row cache' symptom. NOTE: If the re-sizing activity is less severe, you may not get a "WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK!" message - primarily because is a threshold before this is raised. You may however see waits for other events. One common example is outlined in: Document 742599.1 High 'cursor: pin S wait on X' and/or 'library cache lock' Waits Generated by Frequent Shared Pool/Buffer Cache Resize Activity For Frequent resizing, there are a couple of potential fixes available, see: Document 7189722.8 Bug 7189722 - Frequent grow/shrink SGA resize operations Document 9267837.8 Bug 9267837 - Auto-SGA policy may see larger resizes than needed Possible Issue in Pre-10g versions Prior to 10g there was a limitation to detect deadlock at the row cache level. Possible workarounds to minimize the possibility of a deadlock occurring are: [list] [*]set TIMED_STATISTICS=FALSE [*]set _row_cache_cursors=20 or more (10 default) [*]don't do any kind of tracing Document 30802.1 Init.ora Parameter "ROW_CACHE_CURSORS" Reference Note Troubleshooting Other Issues For guidance troubleshooting other performance issues see: Document 1377446.1 Troubleshooting Performance Issues

本文标签:
网站声明:本文由风哥整理发布,转载请保留此段声明,本站所有内容将不对其使用后果做任何承诺,请读者谨慎使用!
【上一篇】
【下一篇】