En este artículo se documenta un problema crítico relacionado con el modo de protección MaxAvailability en Oracle Data Guard, donde la base de datos primaria se queda en estado de espera (hang) durante los tiempos de espera configurados al utilizar transporte síncrrono hacia la base de datos de respaldo.
Configuración Inicial - Modo de Máximo Rendimiento
DGMGRL> SHOW CONFIGURATION lag verbose
Configuration - dg_config
Protection Mode: MaxPerformance
Members:
bd_principal - Primary database
bd_secundaria - Physical standby database
Transport Lag: 0 seconds (computed 0 seconds ago)
Apply Lag: 0 seconds (computed 0 seconds ago)
bd_remota - Physical standby database
Transport Lag: 0 seconds (computed 0 seconds ago)
Apply Lag: 0 seconds (computed 0 seconds ago)
Properties:
FastStartFailoverThreshold = '30'
OperationTimeout = '30'
TraceLevel = 'USER'
FastStartFailoverLagLimit = '30'
CommunicationTimeout = '180'
ObserverReconnect = '0'
FastStartFailoverAutoReinstate = 'TRUE'
FastStartFailoverPmyShutdown = 'TRUE'
BystandersFollowRoleChange = 'ALL'
ObserverOverride = 'FALSE'
ExternalDestination1 = ''
ExternalDestination2 = ''
PrimaryLostWriteAction = 'CONTINUE'
ConfigurationWideServiceName = 'bd_principal_CFG'
Fast-Start Failover: Disabled
Configuration Status:
SUCCESS
Cambio a Modo de Máxima Disponibilidad
-- Configuración para modo síncrono
edit database bd_secundaria set property LogXptMode='sync';
edit database bd_secundaria set property NetTimeout=30;
DGMGRL> SHOW DATABASE VERBOSE bd_secundaria
Database - bd_secundaria
Role: PHYSICAL STANDBY
Intended State: APPLY-ON
Transport Lag: 0 seconds (computed 1 second ago)
Apply Lag: 0 seconds (computed 1 second ago)
Average Apply Rate: 12.00 KByte/s
Active Apply Rate: 0 Byte/s
Maximum Apply Rate: 0 Byte/s
Real Time Query: ON
Instance(s):
bd_secundaria
Properties:
DGConnectIdentifier = 'prdg19/bd_secundaria'
ObserverConnectIdentifier = ''
FastStartFailoverTarget = ''
PreferredObserverHosts = ''
LogShipping = 'ON'
RedoRoutes = ''
LogXptMode = 'sync'
DelayMins = '0'
Binding = 'optional'
DGMGRL> EDIT CONFIGURATION SET PROTECTION MODE AS MaxAvailability;
Succeeded.
Configuración de Destinos de Archivo
-- Destinos de archivo configurados
log_archive_dest_2 string service="prdg19/bd_secundaria", SYNC ***** Transporte síncrono
AFFIRM delay=0 optional compr
ession=disable max_failure=0 r
eopen=5 db_unique_name="bd_secund
aria" net_timeout=30, valid_for=(
online_logfile,all_roles)
log_archive_dest_3 string service="prdg19/bd_remota", ASYN *** Transporte asíncrono
C NOAFFIRM delay=0 optional co
mpression=disable max_failure=
0 reopen=5 db_unique_name="bd_rem
ota" net_timeout=8, valid_for=(
online_logfile,all_roles)
Problema Detectado - Procesos RFS en Espera
SQL> SET LINESIZE 300
SQL> SET PAGESIZE 3000
SQL> SELECT PID, NAME, TYPE, CLIENT_PID, THREAD#, BLOCK#, ROLE, ACTION, CLIENT_ROLE
FROM V$DATAGUARD_PROCESS;
PID NAME TYP CLIENT_PID THREAD# BLOCK# ROLE ACTION CLIENT_ROLE
------------------------ ----- --- ---------- ---------- ---------- ------------------------ ------------ ----------------
5846 LGWR KSB 0 0 0 log writer IDLE none
5882 TMON KSB 0 0 0 redo transport monitor IDLE none
5950 TT00 KSV 0 0 0 gap manager IDLE none
5956 TT01 KSV 0 0 0 redo transport timer IDLE none
5948 ARC0 KSB 0 0 0 archive local IDLE none
5958 ARC1 KSB 0 0 0 archive redo IDLE none
5960 ARC2 KSB 0 0 0 archive redo IDLE none
5962 ARC3 KSB 0 0 0 archive redo IDLE none
6013 INSV KSB 0 0 0 broker instance slave IDLE none
5867 DMON KSB 0 0 0 broker monitor IDLE none
6075 NSV0 KSB 0 0 0 broker net slave IDLE none
6438 rfs NET 22518 0 0 RFS archive IDLE archive gap
6460 RSM0 KSB 0 0 0 broker worker IDLE none
6837 MRP0 KSB 0 1 69 managed recovery APPLYING_LOG none
16586 rfs NET 22521 1 0 RFS archive IDLE archive gap
19983 rfs NET 22104 1 69 RFS sync IDLE log writer
20193 rfs NET 22509 1 0 RFS ping IDLE gap manager
17 rows selected.
Prueba del Comportamiento de Timeout
ALTER SESSION SET NLS_DATE_FORMAT='YYYY-MM-DD HH24:MI:SS';
SELECT SYSDATE FROM DUAL;
CREATE TABLE prueba AS SELECT 1 a FROM DUAL;
SELECT SYSDATE FROM DUAL;
Resultado observado:
SQL> SELECT SYSDATE FROM DUAL;
SYSDATE
-------------------
2024-10-11 13:23:18
SQL> CREATE TABLE prueba AS SELECT 1 a FROM DUAL;
SELECT SYSDATE FROM DUAL;
Table created.
SQL>
SYSDATE
-------------------
2024-10-11 13:23:37
-- La ejecución se bloquea durante el período de timeout
Reducción del Modo de Protección
DGMGRL> SHOW CONFIGURATION lag verbose
Configuration - dg_config
Protection Mode: MaxAvailability
Members:
bd_principal - Primary database
Warning: ORA-16629: database reports a different protection level from the protection mode
bd_secundaria - Physical standby database
Warning: ORA-16809: multiple warnings detected for the member
Transport Lag: 2 minutes 7 seconds (computed 0 seconds ago)
Apply Lag: 2 minutes 7 seconds (computed 0 seconds ago)
bd_remota - Physical standby database
Transport Lag: 0 seconds (computed 0 seconds ago)
Apply Lag: 0 seconds (computed 0 seconds ago)
DGMGRL> SHOW CONFIGURATION lag verbose
Configuration - dg_config
Protection Mode: MaxAvailability
Members:
bd_principal - Primary database
bd_secundaria - Physical standby database
Transport Lag: 0 seconds (computed 0 seconds ago)
Apply Lag: 0 seconds (computed 0 seconds ago)
bd_remota - Physical standby database
Transport Lag: 0 seconds (computed 0 seconds ago)
Apply Lag: 0 seconds (computed 0 seconds ago)
Properties:
FastStartFailoverThreshold = '30'
OperationTimeout = '30'
TraceLevel = 'USER'
FastStartFailoverLagLimit = '0'
CommunicationTimeout = '180'
ObserverReconnect = '0'
FastStartFailoverAutoReinstate = 'TRUE'
FastStartFailoverPmyShutdown = 'TRUE'
BystandersFollowRoleChange = 'ALL'
ObserverOverride = 'FALSE'
ExternalDestination1 = ''
ExternalDestination2 = ''
PrimaryLostWriteAction = 'CONTINUE'
ConfigurationWideServiceName = 'bd_principal_CFG'
Fast-Start Failover: Disabled
Configuration Status:
SUCCESS
Tras la ejecución del comando ORADEBUG RESUME, el sistema se recupera del estado de espera, confirmando que el problema está relacionado con el proceso RFS y el timeout configurado para el transporte síncrono.