oracle 9i 裸设备raw数据文件损坏的恢复过程
oracle 9i 裸设备数据文件损坏的恢复过程
环境 oracle 9.2.0.4+AIX 5300-02
1.问题描述
今天客户数据库前台应用报追加记录失败,以前一直都是正常的,我就赶紧查看数据库日志,发现在9:49分-10:02之间有很多以下报错,
Errors in file /ora/app/oracle/admin/ora9i/bdump/ora9i_ckpt_1011798.trc:
ORA-01171: datafile 76 going offline due to error advancing checkpoint
ORA-01110: data file 76: '/dev/rdata2_4_01_rw'
ORA-01115: IO error reading block from file 76 (block # 1)
ORA-27063: skgfospo: number of bytes read/written is incorrect
IBM AIX RISC System/6000 Error: 16: Device busy
Additional information: -1
Additional information: 8192
Mon Feb 23 10:02:20 2009
Errors in file /ora/app/oracle/admin/ora9i/bdump/ora9i_ckpt_1011798.trc:
ORA-01171: datafile 77 going offline due to error advancing checkpoint
ORA-01110: data file 77: '/dev/rdata2_4_02_rw'
ORA-01115: IO error reading block from file 77 (block # 1)
ORA-27063: skgfospo: number of bytes read/written is incorrect
IBM AIX RISC System/6000 Error: 16: Device busy
通过查询vg所在的lv裸设备,有问题的数据文件全是closed状态
oracle9i@p550b> lsvg -l ora04vg
ora04vg:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
data4_4_01_rw jfs 128 128 2 closed/syncd N/A
data4_4_02_rw jfs 128 128 2 closed/syncd N/A
data4_4_03_rw jfs 128 128 2 closed/syncd N/A
data4_4_04_rw jfs 128 128 2 closed/syncd N/A
data4_4_05_rw jfs 128 128 2 closed/syncd N/A
data4_4_06_rw jfs 128 128 2 closed/syncd N/A
data4_4_07_rw jfs 128 128 2 closed/syncd N/A
data4_4_08_rw jfs 128 128 2 closed/syncd N/A
data4_4_09_rw jfs 128 128 2 closed/syncd N/A
data4_4_10_rw jfs 128 128 2 closed/syncd N/A
data4_4_11_rw jfs 128 128 2 closed/syncd N/A
data4_4_12_rw jfs 128 128 2 closed/syncd N/A
data4_4_13_rw jfs 128 128 2 closed/syncd N/A
data4_4_14_rw jfs 128 128 2 closed/syncd N/A
data4_4_15_rw jfs 128 128 2 closed/syncd N/A
data4_4_16_rw jfs 128 128 2 closed/syncd N/A
data4_4_17_rw jfs 128 128 2 closed/syncd N/A
data4_4_18_rw jfs 128 128 2 closed/syncd N/A
data4_4_19_rw jfs 128 128 2 closed/syncd N/A
data4_4_20_rw jfs 128 128 2 closed/syncd N/A
data1_4_01_rw jfs 32 32 2 closed/syncd N/A
data1_4_02_rw jfs 32 32 2 closed/syncd N/A
data1_4_03_rw jfs 32 32 2 closed/syncd N/A
data2_4_01_rw jfs 64 64 2 closed/syncd N/A
data2_4_02_rw jfs 64 64 2 closed/syncd N/A
data2_4_03_rw jfs 64 64 2 closed/syncd N/A
data2_4_04_rw jfs 64 64 2 closed/syncd N/A
data2_4_05_rw jfs 64 64 2 closed/syncd N/A
data2_4_06_rw jfs 64 64 2 closed/syncd N/A
data2_4_07_rw jfs 64 64 2 closed/syncd N/A
data2_4_08_rw jfs 64 64 2 closed/syncd N/A
oracle9i@p550b>
2.问题跟踪
我就和客户到了机房,我自己也就用前台那个应用程序做了下测试,插入一条记录提交后,报追加记录失败的错误,
但没有任何ora-相关错误代码体现出来,用pl/sql developer工具
连到数据库做测试
insert into stck (编号,户号) values(100000000,'test');
出错信息如下:
ORA-00376: file 106 cannot be read at this time
ORA-01110: data file 106: '/dev/rdata4_4_13_rw'
看到这个错误提示后,就联想到上午看的那些错误日志有关了,重启数据库后根本没解决,脑海中闪出的就是这个
裸设备出问题了
马上查这个数据文件的状态
select status,name from v$datafile where name='/dev/rdata2_4_01_rw';
发现状态是recover,想到上午很多裸设备报错,通过以下sql查询,很多数据文件都是recover状态
select status,name from v$datafile where status='RECOVER';
RECOVER /dev/rdata4_4_08_rw
RECOVER /dev/rdata4_4_09_rw
RECOVER /dev/rdata4_4_10_rw
RECOVER /dev/rdata4_4_11_rw
RECOVER /dev/rdata4_4_12_rw
..........
然后查询数据文件所属表空间都是online状态,就只能用抱着用介质恢复recover datafile file_name 应用
归档来看看能不能解决。
3.问题解决
3.1 先恢复今天的归档日志,调用备份软件的参数
p550b# su - oracle9i
oracle9i@p550b> rman target / catalog=rman/rman@rman
Recovery Manager: Release 9.2.0.4.0 - 64bit Production
Copyright (c) 1995, 2002, Oracle Corporation. All rights reserved.
connected to target database: ORA9I (DBID=2362852444)
connected to recovery catalog database
RMAN> run{
2> allocate channel t1 type 'SBT_TAPE'
3> parms 'ENV=(NB_ORA_SERV=backup_server,
4> NB_ORA_CLIENT=p550b,
5> NB_ORA_POLICY=p550b_arch)';
6> restore archivelog from logseq 26861;
7> release channel t1;
8> }
allocated channel: t1
channel t1: sid=70 devtype=SBT_TAPE
channel t1: VERITAS NetBackup for Oracle - Release 5.0GA (2004111820)
Starting restore at 2009-02-23:16:04:58
archive log thread 1 sequence 26871 is already on disk as file /arch/1_26871.dbf
archive log thread 1 sequence 26872 is already on disk as file /arch/1_26872.dbf
archive log thread 1 sequence 26873 is already on disk as file /arch/1_26873.dbf
archive log thread 1 sequence 26874 is already on disk as file /arch/1_26874.dbf
archive log thread 1 sequence 26875 is already on disk as file /arch/1_26875.dbf
archive log thread 1 sequence 26876 is already on disk as file /arch/1_26876.dbf
channel t1: starting archive log restore to default destination
channel t1: restoring archive log
archive log thread=1 sequence=26861
channel t1: restored backup piece 1
piece handle=oracle_arch_2810_1_679584083 tag=TAG20090223T132122 params=NULL
channel t1: restore complete
channel t1: starting archive log restore to default destination
channel t1: restoring archive log
archive log thread=1 sequence=26862
channel t1: restoring archive log
archive log thread=1 sequence=26863
channel t1: restoring archive log
archive log thread=1 sequence=26864
channel t1: restoring archive log
archive log thread=1 sequence=26865
channel t1: restoring archive log
archive log thread=1 sequence=26866
channel t1: restoring archive log
archive log thread=1 sequence=26867
channel t1: restoring archive log
archive log thread=1 sequence=26868
channel t1: restored backup piece 1
piece handle=oracle_arch_2809_1_679584083 tag=TAG20090223T132122 params=NULL
channel t1: restore complete
channel t1: starting archive log restore to default destination
channel t1: restoring archive log
archive log thread=1 sequence=26869
channel t1: restoring archive log
archive log thread=1 sequence=26870
channel t1: restored backup piece 1
piece handle=oracle_arch_2811_1_679584143 tag=TAG20090223T132122 params=NULL
channel t1: restore complete
Finished restore at 2009-02-23:16:07:21
released channel: t1
RMAN>
3.2 介质恢复
p550b# su - oracle9i
oracle9i@p550b> sqlplus "/as sysdba"
SQL*Plus: Release 9.2.0.4.0 - Production on Mon Feb 23 15:54:22 2009
Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.
Connected to:
Oracle9i Enterprise Edition Release 9.2.0.4.0 - 64bit Production
With the Partitioning option
JServer Release 9.2.0.4.0 - Production
SQL> recover datafile 106;
ORA-00279: change 9463530758543 generated at 02/23/2009 08:37:55 needed for
thread 1
ORA-00289: suggestion : /arch/1_26862.dbf
ORA-00280: change 9463530758543 for thread 1 is in sequence #26862
Specify log: {
auto
ORA-00308: cannot open archived log '/arch/1_26862.dbf'
ORA-27037: unable to obtain file status
IBM AIX RISC System/6000 Error: 2: No such file or directory
Additional information: 3
ORA-00308: cannot open archived log '/arch/1_26862.dbf'
ORA-27037: unable to obtain file status
IBM AIX RISC System/6000 Error: 2: No such file or directory
Additional information: 3
SQL> recover datafile 106;
ORA-00279: change 9463530758543 generated at 02/23/2009 08:37:55 needed for
thread 1
ORA-00289: suggestion : /arch/1_26862.dbf
ORA-00280: change 9463530758543 for thread 1 is in sequence #26862
Specify log: {
auto
ORA-00279: change 9463818933167 generated at 02/23/2009 09:38:47 needed for
thread 1
ORA-00289: suggestion : /arch/1_26863.dbf
ORA-00280: change 9463818933167 for thread 1 is in sequence #26863
ORA-00278: log file '/arch/1_26862.dbf' no longer needed for this recovery
Log applied.
Media recovery complete.
SQL> alter database datafile '/dev/rdata4_4_13_rw' online;
Database altered.
SQL>
其他数据文件也是用同样方法,也可以用
recover datafile '/dev/rdata4_4_13_rw'
3.3问题确认解决
insert into stck (编号,户号) values(100000000,'test');
能正常插入
这时查那那些有问题的lv状态,属于open状态
ora04vg:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
data4_4_07_rw jfs 128 128 2 open/syncd N/A
data4_4_08_rw jfs 128 128 2 open/syncd N/A
data4_4_09_rw jfs 128 128 2 open/syncd N/A
data4_4_13_rw jfs 128 128 2 open/syncd N/A
data4_4_20_rw jfs 128 128 2 open/syncd N/A
data1_4_01_rw jfs 32 32 2 open/syncd N/A
data1_4_02_rw jfs 32 32 2 open/syncd N/A
data1_4_03_rw jfs 32 32 2 open/syncd N/A
data2_4_01_rw jfs 64 64 2 open/syncd N/A
data2_4_02_rw jfs 64 64 2 open/syncd N/A
oracle9i@p550b>
客户经过测试也已经正常了。