Recovery cases: perte de tous les redo log

Recovery en cas de perte de tous les redo logs. Le système ne peut pas continuer. Les transactions couvertes par les redo perdus, mais pas encore appliquées par le checkpoint, sont perdues.
Il faut restaurer toute la base et appliquer tout le redo disponible (archived redo logs) puis ouvrir la base en resetlogs, et faire un nouveau backup.

Ici un exemple en utilisant Recovery Advisor, sur une base de demo créée ici.

J’ai les redo logs suivants:

RMAN> select ‘group#=’||group#||’ member=’||member from v$logfile order by 1;

RMAN> select 'group#='||group#||' member='||member from v$logfile order by 1;
using target database control file instead of recovery catalog
'GROUP#='||GROUP#||'MEMBER='||MEMBER                                            
--------------------------------------------------------------------------------
group#=1 member=/u/DEMO/DATA/DEMO100/redo01.log         
group#=1 member=/u/DEMO/DATA/DEMO100/redo01bis.log      
group#=2 member=/u/DEMO/DATA/DEMO100/redo02.log         
group#=2 member=/u/DEMO/DATA/DEMO100/redo02bis.log      
group#=3 member=/u/DEMO/DATA/DEMO100/redo03.log         
group#=3 member=/u/DEMO/DATA/DEMO100/redo03bis.log      

Je supprime tous les redo logs:

Failure simulation

+ rm /u/DEMO/DATA/DEMO100/redo01bis.log /u/DEMO/DATA/DEMO100/redo01.log /u/DEMO/DATA/DEMO100/redo02bis.log /u/DEMO/DATA/DEMO100/redo02.log /u/DEMO/DATA/DEMO100/redo03bis.log /u/DEMO/DATA/DEMO100/redo03.log
+ ls -l '/u/DEMO/DATA/DEMO100/redo0*.log'
ls: cannot access /u/DEMO/DATA/DEMO100/redo0*.log: No such file or directory

Si on pouvait toujours écrire dans les fichiers, il faudrait immédiatement faire un checkpoint et un shutdown normal. Mais ici c’est trop tard:

RMAN> alter system archive log current;

RMAN> alter system archive log current;
using target database control file instead of recovery catalog
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of sql statement command at 11/16/2013 21:00:40
ORA-00313: open failed for members of log group 3 of thread 1
ORA-00312: online log 3 thread 1: '/u/DEMO/DATA/DEMO100/redo03bis.log'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
ORA-00312: online log 3 thread 1: '/u/DEMO/DATA/DEMO100/redo03.log'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3

Et je dois faire un shutdown abort et redémarrer en mount:

RMAN> shutdown abort
Oracle instance shut down

RMAN> startup
Oracle instance started
database mounted
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of startup command at 11/16/2013 21:01:00
ORA-00313: open failed for members of log group 2 of thread 1
ORA-00312: online log 2 thread 1: '/u/DEMO/DATA/DEMO100/redo02bis.log'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
ORA-00312: online log 2 thread 1: '/u/DEMO/DATA/DEMO100/redo02.log'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3

Voici la résolution automatique avec Recovery Advisor:

RMAN> list failure;

RMAN> list failure;
using target database control file instead of recovery catalog
Database Role: PRIMARY

List of Database Failures
=========================

Failure ID Priority Status    Time Detected Summary
---------- -------- --------- ------------- -------
6679       CRITICAL OPEN      16-NOV-13     Online log group 3 is unavailable
6670       CRITICAL OPEN      16-NOV-13     Online log group 2 is unavailable
6661       CRITICAL OPEN      16-NOV-13     Online log group 1 is unavailable
6685       HIGH     OPEN      16-NOV-13     Online log member /u/DEMO/DATA/DEMO100/redo03bis.log is missing
6682       HIGH     OPEN      16-NOV-13     Online log member /u/DEMO/DATA/DEMO100/redo03.log is missing
6676       HIGH     OPEN      16-NOV-13     Online log member /u/DEMO/DATA/DEMO100/redo02bis.log is missing
6673       HIGH     OPEN      16-NOV-13     Online log member /u/DEMO/DATA/DEMO100/redo02.log is missing
6667       HIGH     OPEN      16-NOV-13     Online log member /u/DEMO/DATA/DEMO100/redo01bis.log is missing
6664       HIGH     OPEN      16-NOV-13     Online log member /u/DEMO/DATA/DEMO100/redo01.log is missing


RMAN> advise failure;

RMAN> advise failure;
Database Role: PRIMARY

List of Database Failures
=========================

Failure ID Priority Status    Time Detected Summary
---------- -------- --------- ------------- -------
6679       CRITICAL OPEN      16-NOV-13     Online log group 3 is unavailable
6670       CRITICAL OPEN      16-NOV-13     Online log group 2 is unavailable
6661       CRITICAL OPEN      16-NOV-13     Online log group 1 is unavailable
6685       HIGH     OPEN      16-NOV-13     Online log member /u/DEMO/DATA/DEMO100/redo03bis.log is missing
6682       HIGH     OPEN      16-NOV-13     Online log member /u/DEMO/DATA/DEMO100/redo03.log is missing
6676       HIGH     OPEN      16-NOV-13     Online log member /u/DEMO/DATA/DEMO100/redo02bis.log is missing
6673       HIGH     OPEN      16-NOV-13     Online log member /u/DEMO/DATA/DEMO100/redo02.log is missing
6667       HIGH     OPEN      16-NOV-13     Online log member /u/DEMO/DATA/DEMO100/redo01bis.log is missing
6664       HIGH     OPEN      16-NOV-13     Online log member /u/DEMO/DATA/DEMO100/redo01.log is missing

analyzing automatic repair options; this may take some time
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=365 device type=DISK
analyzing automatic repair options complete

Mandatory Manual Actions
========================
no manual actions available

Optional Manual Actions
=======================
1. If file /u/DEMO/DATA/DEMO100/redo03bis.log was unintentionally renamed or moved, restore it
2. If file /u/DEMO/DATA/DEMO100/redo03.log was unintentionally renamed or moved, restore it
3. If file /u/DEMO/DATA/DEMO100/redo02bis.log was unintentionally renamed or moved, restore it
4. If file /u/DEMO/DATA/DEMO100/redo02.log was unintentionally renamed or moved, restore it
5. If file /u/DEMO/DATA/DEMO100/redo01bis.log was unintentionally renamed or moved, restore it
6. If file /u/DEMO/DATA/DEMO100/redo01.log was unintentionally renamed or moved, restore it
7. Try flush redo using ALTER SYSTEM FLUSH REDO TO 'standby name' command.  Then perform a Data Guard role change (failover).  Available standbys: DEMO110.

Automated Repair Options
========================
Option Repair Description
------ ------------------
1      Perform incomplete database recovery to SCN 2263300  
  Strategy: The repair includes point-in-time recovery with some data loss
  Repair script: /u/DEMO/diag/rdbms/demo100/DEMO100/hm/reco_3275529026.hm


RMAN> repair failure;

RMAN> repair failure;
Strategy: The repair includes point-in-time recovery with some data loss
Repair script: /u/DEMO/diag/rdbms/demo100/DEMO100/hm/reco_3275529026.hm

contents of repair script:
   # database point-in-time recovery
   restore database until scn 2263300;
   recover database until scn 2263300;
   alter database open resetlogs;

Do you really want to execute the above repair (enter YES or NO)? executing repair script

Starting restore at 16-NOV-13
using channel ORA_DISK_1

channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00001 to /u/DEMO/DATA/DEMO100/system01.dbf
channel ORA_DISK_1: restoring datafile 00002 to /u/DEMO/DATA/DEMO100/example01.dbf
channel ORA_DISK_1: restoring datafile 00003 to /u/DEMO/DATA/DEMO100/sysaux01.dbf
channel ORA_DISK_1: restoring datafile 00004 to /u/DEMO/DATA/DEMO100/undotbs01.dbf
channel ORA_DISK_1: restoring datafile 00006 to /u/DEMO/DATA/DEMO100/users01.dbf
channel ORA_DISK_1: reading from backup piece /u/DEMO/backup/12op4mt1_1_1
channel ORA_DISK_1: piece handle=/u/DEMO/backup/12op4mt1_1_1 tag=TAG20131116T205457
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:01:35
Finished restore at 16-NOV-13

Starting recover at 16-NOV-13
using channel ORA_DISK_1

starting media recovery

archived log for thread 1 with sequence 2 is already on disk as file /u/DEMO/FRA/DEMO100/archivelog/2013_11_16/o1_mf_1_2_98hmq0s6_.arc
archived log file name=/u/DEMO/FRA/DEMO100/archivelog/2013_11_16/o1_mf_1_2_98hmq0s6_.arc thread=1 sequence=2
media recovery complete, elapsed time: 00:00:08
Finished recover at 16-NOV-13

Statement processed
repair failure complete

Recovery Manager complete.

La base a du être arrêtée et, parce qu’on a perdu du redo courant et actif, on a perdu des transactions. Et la base a été ouverte en resetlogs. C’est un scnéario à éviter et c’est pourquoi il faut multiplexer les redo logs pour éviter de perdre tous les membres en cas de panne.

DataGuarg peut aussi éviter la perte de transactions lorsqu’on est en Maximum Availability (toutes les transactions commitées sont appliquées sur la standby) , ou lordqu

Laisser un commentaire