Recovery cases: perte d’un membre de redo log

Recovery en cas de perte d’un membre de redo log. Le système continue car les redo logs sont multiplexés. Il suffit de les recréer (après un switch logfile si c’est le courant qui manque).

Ici un exemple en utilisant Recovery Advisor, sur une base de demo créée ici.

j’ai les logfiles suivants (3 groupes de 2 membres):

RMAN> select 'group#='||group#||' member='||member from v$logfile order by 1;
using target database control file instead of recovery catalog
'GROUP#='||GROUP#||'MEMBER='||MEMBER                                            
--------------------------------------------------------------------------------
group#=1 member=/u/DEMO/DATA/DEMO100/redo01.log         
group#=1 member=/u/DEMO/DATA/DEMO100/redo01bis.log      
group#=2 member=/u/DEMO/DATA/DEMO100/redo02.log         
group#=2 member=/u/DEMO/DATA/DEMO100/redo02bis.log      
group#=3 member=/u/DEMO/DATA/DEMO100/redo03.log         
group#=3 member=/u/DEMO/DATA/DEMO100/redo03bis.log      

Je supprime un membre de chaque groupe:

+ rm /u/DEMO/DATA/DEMO100/redo01bis.log /u/DEMO/DATA/DEMO100/redo02bis.log /u/DEMO/DATA/DEMO100/redo03bis.log
+ ls -l /u/DEMO/DATA/DEMO100/redo*.log 
-rw-r-----. 1 oracle oinstall 52429312 Nov 15 19:42 /u/DEMO/DATA/DEMO100/redo01.log
-rw-r-----. 1 oracle oinstall 52429312 Nov 15 19:42 /u/DEMO/DATA/DEMO100/redo02.log
-rw-r-----. 1 oracle oinstall 52429312 Nov 15 19:38 /u/DEMO/DATA/DEMO100/redo03.log

La base continue de fonctionner:

RMAN> commit;
Statement processed
RMAN> alter system switch logfile;
Statement processed
RMAN> alter system switch logfile;
Statement processed
RMAN> alter system archive log current;
Statement processed

Mais avec des erreurs dans l’alert.log:

Fri Nov 15 19:42:33 2013
Errors in file /u/DEMO/diag/rdbms/demo100/DEMO100/trace/DEMO100_lgwr_13104.trc:
ORA-00313: open failed for members of log group 3 of thread 1
ORA-00312: online log 3 thread 1: '/u/DEMO/DATA/DEMO100/redo03bis.log'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
Fri Nov 15 19:42:33 2013
Errors in file /u/DEMO/diag/rdbms/demo100/DEMO100/trace/DEMO100_lgwr_13104.trc:
ORA-00321: log 3 of thread 1, cannot update log file header
ORA-00312: online log 3 thread 1: '/u/DEMO/DATA/DEMO100/redo03bis.log'
Fri Nov 15 19:42:33 2013
Errors in file /u/DEMO/diag/rdbms/demo100/DEMO100/trace/DEMO100_lgwr_13104.trc:
ORA-00313: open failed for members of log group 3 of thread 1
Fri Nov 15 19:42:33 2013
Thread 1 advanced to log sequence 24 (LGWR switch)
  Current log# 3 seq# 24 mem# 0: /u/DEMO/DATA/DEMO100/redo03.log

Voici la résolution automatique avec Recovery Advisor:

RMAN> list failure;

RMAN> list failure;
using target database control file instead of recovery catalog
Database Role: PRIMARY

List of Database Failures
=========================

Failure ID Priority Status    Time Detected Summary
---------- -------- --------- ------------- -------
6289       HIGH     OPEN      15-NOV-13     Online log member /u/DEMO/DATA/DEMO100/redo03bis.log is missing
6286       HIGH     OPEN      15-NOV-13     Online log member /u/DEMO/DATA/DEMO100/redo02bis.log is missing
6283       HIGH     OPEN      15-NOV-13     Online log member /u/DEMO/DATA/DEMO100/redo01bis.log is missing


RMAN> advise failure;

RMAN> advise failure;
Database Role: PRIMARY

List of Database Failures
=========================

Failure ID Priority Status    Time Detected Summary
---------- -------- --------- ------------- -------
6289       HIGH     OPEN      15-NOV-13     Online log member /u/DEMO/DATA/DEMO100/redo03bis.log is missing
6286       HIGH     OPEN      15-NOV-13     Online log member /u/DEMO/DATA/DEMO100/redo02bis.log is missing
6283       HIGH     OPEN      15-NOV-13     Online log member /u/DEMO/DATA/DEMO100/redo01bis.log is missing

analyzing automatic repair options; this may take some time
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=130 device type=DISK
analyzing automatic repair options complete

Mandatory Manual Actions
========================
1. If file /u/DEMO/DATA/DEMO100/redo03bis.log was unintentionally renamed or moved, restore it
2. Contact Oracle Support Services if the preceding recommendations cannot be used, or if they do not fix the failures selected for repair

Optional Manual Actions
=======================
1. If file /u/DEMO/DATA/DEMO100/redo02bis.log was unintentionally renamed or moved, restore it
2. If file /u/DEMO/DATA/DEMO100/redo01bis.log was unintentionally renamed or moved, restore it

Automated Repair Options
========================
Option Repair Description
------ ------------------
1      Drop and re-create redo log group member /u/DEMO/DATA/DEMO100/redo02bis.log; Drop and re-create redo log group member /u/DEMO/DATA/DEMO100/redo01bis.log 
  Strategy: The repair includes complete media recovery with no data loss
  Repair script: /u/DEMO/diag/rdbms/demo100/DEMO100/hm/reco_1059050515.hm

On remarque qu’il ne s’agit que des groupes 1 et 2.
La proposition étant de les supprimer et les recréer, on ne peut pas faire ça pour le redo log courant.

RMAN> repair failure;

RMAN> repair failure;
Strategy: The repair includes complete media recovery with no data loss
Repair script: /u/DEMO/diag/rdbms/demo100/DEMO100/hm/reco_1059050515.hm

contents of repair script:
   # drop/recreate redo log group member
   sql "begin sys.dbms_ir.execsqlscript(filename => ''/u/DEMO/diag/rdbms/demo100/DEMO100/hm/reco_1999793821.hm'' ); end;";
   # drop/recreate redo log group member
   sql "begin sys.dbms_ir.execsqlscript(filename => ''/u/DEMO/diag/rdbms/demo100/DEMO100/hm/reco_93228091.hm'' ); end;";

Do you really want to execute the above repair (enter YES or NO)? executing repair script

sql statement: begin sys.dbms_ir.execsqlscript(filename => ''/u/DEMO/diag/rdbms/demo100/DEMO100/hm/reco_1999793821.hm'' ); end;

sql statement: begin sys.dbms_ir.execsqlscript(filename => ''/u/DEMO/diag/rdbms/demo100/DEMO100/hm/reco_93228091.hm'' ); end;
repair failure complete

Le contenu des scripts reco_1999793821.hm et reco_93228091.hm’ sont:

begin
/*Drop the Log Member*/
execute immediate 'ALTER DATABASE DROP LOGFILE MEMBER ''/u/DEMO/DATA/DEMO100/redo02bis.log''';
/*Add the Log Member (This will create a new file)*/
execute immediate 'ALTER DATABASE ADD LOGFILE MEMBER ''/u/DEMO/DATA/DEMO100/redo02bis.log'' REUSE TO GROUP 2';
end;
begin
/*Drop the Log Member*/
execute immediate 'ALTER DATABASE DROP LOGFILE MEMBER ''/u/DEMO/DATA/DEMO100/redo01bis.log''';
/*Add the Log Member (This will create a new file)*/
execute immediate 'ALTER DATABASE ADD LOGFILE MEMBER ''/u/DEMO/DATA/DEMO100/redo01bis.log'' REUSE TO GROUP 1';
end;

On a donc recréé les groupes 1 et 2.

Mais il reste le group 3 auquel il manque un fichier:

RMAN> list failure;

RMAN> list failure;
Database Role: PRIMARY

List of Database Failures
=========================

Failure ID Priority Status    Time Detected Summary
---------- -------- --------- ------------- -------
6289       HIGH     OPEN      15-NOV-13     Online log member /u/DEMO/DATA/DEMO100/redo03bis.log is missing


RMAN> advise failure;

RMAN> advise failure;
Database Role: PRIMARY

List of Database Failures
=========================

Failure ID Priority Status    Time Detected Summary
---------- -------- --------- ------------- -------
6289       HIGH     OPEN      15-NOV-13     Online log member /u/DEMO/DATA/DEMO100/redo03bis.log is missing

analyzing automatic repair options; this may take some time
using channel ORA_DISK_1
analyzing automatic repair options complete

Mandatory Manual Actions
========================
1. If file /u/DEMO/DATA/DEMO100/redo03bis.log was unintentionally renamed or moved, restore it
2. Contact Oracle Support Services if the preceding recommendations cannot be used, or if they do not fix the failures selected for repair

Optional Manual Actions
=======================
no manual actions available

Automated Repair Options
========================
no automatic repair options available

Recovery advisor ne nous propose rien pour celui-ci ca c’est le redo log courant.
C’est pourtant simple: il suffit de faire un switch log file:


RMAN> alter system switch logfile;

RMAN> alter system switch logfile;
Statement processed

Et Recovery Advisor a une proposition comme plus haut: une fois qu’il n’est plsu actif, on peut le supprimer et le recréer.

RMAN> advise failure;

RMAN> advise failure;
Database Role: PRIMARY

List of Database Failures
=========================

Failure ID Priority Status    Time Detected Summary
---------- -------- --------- ------------- -------
6289       HIGH     OPEN      15-NOV-13     Online log member /u/DEMO/DATA/DEMO100/redo03bis.log is missing

analyzing automatic repair options; this may take some time
using channel ORA_DISK_1
analyzing automatic repair options complete

Mandatory Manual Actions
========================
no manual actions available

Optional Manual Actions
=======================
1. If file /u/DEMO/DATA/DEMO100/redo03bis.log was unintentionally renamed or moved, restore it

Automated Repair Options
========================
Option Repair Description
------ ------------------
1      Drop and re-create redo log group member /u/DEMO/DATA/DEMO100/redo03bis.log  
  Strategy: The repair includes complete media recovery with no data loss
  Repair script: /u/DEMO/diag/rdbms/demo100/DEMO100/hm/reco_743548653.hm


RMAN> repair failure;

RMAN> repair failure;
Strategy: The repair includes complete media recovery with no data loss
Repair script: /u/DEMO/diag/rdbms/demo100/DEMO100/hm/reco_743548653.hm

contents of repair script:
   # drop/recreate redo log group member
   sql "begin sys.dbms_ir.execsqlscript(filename => ''/u/DEMO/diag/rdbms/demo100/DEMO100/hm/reco_3147502739.hm'' ); end;";

Do you really want to execute the above repair (enter YES or NO)? executing repair script

sql statement: begin sys.dbms_ir.execsqlscript(filename => ''/u/DEMO/diag/rdbms/demo100/DEMO100/hm/reco_3147502739.hm'' ); end;
repair failure complete
Recovery Manager complete.

Le script reco_3147502739.hm étant:

begin
/*Drop the Log Member*/
execute immediate 'ALTER DATABASE DROP LOGFILE MEMBER ''/u/DEMO/DATA/DEMO100/redo03bis.log''';
/*Add the Log Member (This will create a new file)*/
execute immediate 'ALTER DATABASE ADD LOGFILE MEMBER ''/u/DEMO/DATA/DEMO100/redo03bis.log'' REUSE TO GROUP 3';
end;

La base est restée ouverte et fonctionnelle grâce au fait d’avoir multiplexé les redo logs. Aucune perte de données.
Dommage que Recovery Advisor n’ait pas eu l’idée de faire un switch logfile pour réparer le log courant aussi.

Laisser un commentaire