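Handling an ASM instance process leak caused by a full ASM disk group
Author:
1 Problem description
1) The user added two disks that already belonged to a volume group (VG) to the ASM disk group dgdata.
2) The disk group dgrecover was originally used to store archived log files. Some time ago this disk group filled up. After the archived logs were removed with the RMAN delete archivelog command, the space was released, but the disk group still could not be used for archived logs: whenever the database archive destination was pointed at it, the online redo logs could not switch, that is, they could not be archived.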
2 Troubleshooting
1. Check space usage of the ASM disk groups. Both dgdata and dgrecover still have enough free space.
export ORACLE_SID=+ASM1
asmcmd
ASMCMD> lsdg
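2. Check the alert logs of the ASM instance and of the database. Both contain errors showing that the ASM instance has reached its maximum number of processes. The ASM parameter file sets processes to 40, the ASM instance currently has 39 processes in total, and the database parameter log_archive_max_processes is 2. The ASM instance therefore cannot provide enough processes for the database archiver processes, so archived log files cannot be written to the ASM disk group.

Under normal conditions an ASM instance does not run anywhere near 39 processes. A search on Metalink shows that this is Bug 6139547, "Shadow process leak on ASM instance when diskspace exhausted (ORA-20)". The bug is triggered once an ASM disk group has been filled up, and it leaks processes in the ASM instance. The workaround is to restart the database or to run "alter system set log_archive_max_processes=1 scope=memory sid='*';". The bug is fixed in Oracle 10.2.0.4.

Following the Metalink note, "alter system set log_archive_max_processes=1 scope=memory sid='*';" was run, and the number of ASM instance processes dropped to 31. After log_archive_max_processes was set back to 2, the ASM instance had 32 processes and was back to normal.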
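The process arithmetic in this step can be sketched as a small shell check. This is only an illustration: the function name asm_can_serve is hypothetical, and it assumes that each archiver process needs one free process slot in the ASM instance, which simplifies how a database actually connects to ASM.

```shell
#!/bin/sh
# Hypothetical helper: report whether an ASM instance has enough free
# process slots. Assumption: one ASM process slot per archiver process.
# Usage: asm_can_serve <processes_limit> <current_processes> <needed>
asm_can_serve() {
    limit=$1
    current=$2
    needed=$3
    free=$((limit - current))
    if [ "$free" -ge "$needed" ]; then
        echo "ok: $free free slot(s), $needed needed"
    else
        echo "short: $free free slot(s), $needed needed"
        return 1
    fi
}

# Figures from this incident: processes=40, 39 in use, 2 archivers.
asm_can_serve 40 39 2 || true
# After the workaround the ASM instance was back to 32 processes.
asm_can_serve 40 32 2
```

With the figures above, only one slot is free while two archivers are configured, which matches the ORA-00020 errors. At 32 processes there is headroom again.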
alert_+ASM1.log
Errors in file /oracle/app/oracle/admin/+ASM/bdump/+asm1_rbal_249918.trc:
ORA-00450: background process 'ARB0' did not start
ORA-00444: background process "ARB0" failed while starting
ORA-00020: maximum number of processes () exceeded
alert_app11.log
Errors in file /oracle/app/oracle/admin/app1/bdump/app11_arc0_385660.trc:
ORA-19504: failed to create file "+DGRECOVER"
ORA-17502: ksfdcre:4 Failed to create file +DGRECOVER
ORA-15055: unable to connect to ASM instance
ORA-00020: maximum number of processes () exceeded
ORA-15055: unable to connect to ASM instance
ORA-00020: maximum number of processes () exceeded
Wed Dec 16 14:37:17 2009
ARC0: Error 19504 Creating archive log file to '+DGRECOVER'
ARCH: Archival stopped, error occurred. Will continue retrying
Wed Dec 16 14:37:17 2009
ORACLE Instance app11 - Archival Error
Wed Dec 16 14:37:17 2009
ORA-16038: log 2 sequence# 126197 cannot be archived
ORA-19504: failed to create file ""
ORA-00312: online log 2 thread 1: '+DGRECOVER/app1/onlinelog/group_2.258.623517813'
ORA-00312: online log 2 thread 1: '+DGSYSTEM/app1/onlinelog/group_2.264.623523601'
Wed Dec 16 14:37:17 2009
Errors in file /oracle/app/oracle/admin/app1/bdump/app11_arc0_385660.trc:
ORA-16038: log 2 sequence# 126197 cannot be archived
ORA-19504: failed to create file ""
ORA-00312: online log 2 thread 1: '+DGRECOVER/app1/onlinelog/group_2.258.623517813'
ORA-00312: online log 2 thread 1: '+DGSYSTEM/app1/onlinelog/group_2.264.623523601'
Wed Dec 16 14:37:19 2009
ARCH: Archival stopped, error occurred. Will continue retrying
Wed Dec 16 14:37:19 2009
ORACLE Instance app11 - Archival Error
Wed Dec 16 14:37:19 2009
ORA-16014: log 2 sequence# 126197 not archived, no available destinations
ORA-00312: online log 2 thread 1: '+DGRECOVER/app1/onlinelog/group_2.258.623517813'
ORA-00312: online log 2 thread 1: '+DGSYSTEM/app1/onlinelog/group_2.264.623523601'
Wed Dec 16 14:37:19 2009
Errors in file /oracle/app/oracle/admin/app1/bdump/app11_arc1_1765726.trc:
ORA-16014: log 2 sequence# 126197 not archived, no available destinations
ORA-00312: online log 2 thread 1: '+DGRECOVER/app1/onlinelog/group_2.258.623517813'
ORA-00312: online log 2 thread 1: '+DGSYSTEM/app1/onlinelog/group_2.264.623523601'
Wed Dec 16 14:39:00 2009
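To spot this condition quickly in an alert log, the ORA-00020 lines can be counted. The sketch below is a minimal example, not part of the original procedure. The function name count_ora20 is hypothetical, and it reads the log text from standard input so it works with any alert log location.

```shell
#!/bin/sh
# Hypothetical helper: count lines containing ORA-00020 in alert-log
# text read from stdin. grep -c exits non-zero when nothing matches,
# so "|| true" keeps the function usable under "set -e".
count_ora20() {
    grep -c 'ORA-00020' || true
}

# Example usage (file name taken from the excerpt above):
#   count_ora20 < alert_+ASM1.log
```

A count above zero in both the ASM and the database alert logs points to the process limit rather than to disk group free space.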
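3. Check the v$asm_disk view. The two disks that previously belonged to a VG (/dev/rhdisk59 and /dev/rhdisk60) have been added to the ASM disk group dgdata. However, these disks already carry a PVID, and the user had also run exportvg on them. The user is advised to remove these two disks from the ASM disk group.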
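Before a disk is handed to ASM, it can help to list which disks still carry a PVID. The following is a sketch under assumptions: it expects AIX lspv-style output on standard input (disk name in the first column, PVID in the second, shown as "none" when absent), and the function name disks_with_pvid is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read lspv-style output on stdin and print the
# names of disks whose PVID column (second field) is not "none".
# Assumed column layout: <disk> <pvid> <volume group> ...
disks_with_pvid() {
    awk 'NF >= 2 && $2 != "none" { print $1 }'
}

# Example usage on AIX:
#   lspv | disks_with_pvid
```

Any disk reported here that is also listed in v$asm_disk deserves a closer look, as with hdisk59 and hdisk60 in this case.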
4. Rebalance the disk groups
SQL> alter diskgroup dgrecover rebalance power 11;
SQL> alter diskgroup dgdata rebalance power 11;
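3 Summary
1). On Oracle 10.2.0.3, bug 6139547 is triggered once an ASM disk group has been filled up, and it causes a process leak in the ASM instance.
2). Disks that have already been assigned to a VG (and therefore carry a PVID) can still be added to an ASM disk group, but this is not recommended. It is not yet clear whether doing so leads to other problems.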