Problema de Timeout en Transmisión Síncrona de Oracle Data Guard

En este artículo se documenta un problema crítico relacionado con el modo de protección MaxAvailability en Oracle Data Guard, donde la base de datos primaria se queda en estado de espera (hang) durante los tiempos de espera configurados al utilizar transporte síncrrono hacia la base de datos de respaldo.

Configuración Inicial - Modo de Máximo Rendimiento


DGMGRL> SHOW CONFIGURATION lag verbose
Configuration - dg_config
  Protection Mode: MaxPerformance
  Members:
  bd_principal    - Primary database
    bd_secundaria - Physical standby database 
                    Transport Lag:      0 seconds (computed 0 seconds ago)
                    Apply Lag:          0 seconds (computed 0 seconds ago)
    bd_remota     - Physical standby database 
                    Transport Lag:      0 seconds (computed 0 seconds ago)
                    Apply Lag:          0 seconds (computed 0 seconds ago)

  Properties:
    FastStartFailoverThreshold      = '30'
    OperationTimeout                = '30'
    TraceLevel                      = 'USER'
    FastStartFailoverLagLimit       = '30'
    CommunicationTimeout            = '180'
    ObserverReconnect               = '0'
    FastStartFailoverAutoReinstate  = 'TRUE'
    FastStartFailoverPmyShutdown    = 'TRUE'
    BystandersFollowRoleChange      = 'ALL'
    ObserverOverride                = 'FALSE'
    ExternalDestination1            = ''
    ExternalDestination2            = ''
    PrimaryLostWriteAction          = 'CONTINUE'
    ConfigurationWideServiceName    = 'bd_principal_CFG'

Fast-Start Failover:  Disabled

Configuration Status:
SUCCESS

Cambio a Modo de Máxima Disponibilidad


-- Configuración para modo síncrono
edit database bd_secundaria set property LogXptMode='sync';
edit database bd_secundaria set property NetTimeout=30;

DGMGRL> SHOW DATABASE VERBOSE bd_secundaria
Database - bd_secundaria
  Role:               PHYSICAL STANDBY
  Intended State:     APPLY-ON
  Transport Lag:      0 seconds (computed 1 second ago)
  Apply Lag:          0 seconds (computed 1 second ago)
  Average Apply Rate: 12.00 KByte/s
  Active Apply Rate:  0 Byte/s
  Maximum Apply Rate: 0 Byte/s
  Real Time Query:    ON
  Instance(s):
    bd_secundaria
  Properties:
    DGConnectIdentifier             = 'prdg19/bd_secundaria'
    ObserverConnectIdentifier       = ''
    FastStartFailoverTarget         = ''
    PreferredObserverHosts          = ''
    LogShipping                     = 'ON'
    RedoRoutes                      = ''
    LogXptMode                      = 'sync'
    DelayMins                       = '0'
    Binding                         = 'optional'

DGMGRL> EDIT CONFIGURATION SET PROTECTION MODE AS MaxAvailability;
Succeeded.

Configuración de Destinos de Archivo


-- Destinos de archivo configurados
log_archive_dest_2                   string      service="prdg19/bd_secundaria", SYNC  ***** Transporte síncrono
                                                  AFFIRM delay=0 optional compr
                                                 ession=disable max_failure=0 r
                                                 eopen=5 db_unique_name="bd_secund
                                                 aria" net_timeout=30, valid_for=(
                                                 online_logfile,all_roles)
log_archive_dest_3                   string      service="prdg19/bd_remota", ASYN   *** Transporte asíncrono
                                                 C NOAFFIRM delay=0 optional co
                                                 mpression=disable max_failure=
                                                 0 reopen=5 db_unique_name="bd_rem
                                                 ota" net_timeout=8, valid_for=(
                                                 online_logfile,all_roles)

Problema Detectado - Procesos RFS en Espera


SQL> SET LINESIZE 300
SQL> SET PAGESIZE 3000
SQL> SELECT PID, NAME, TYPE, CLIENT_PID, THREAD#, BLOCK#, ROLE, ACTION, CLIENT_ROLE 
     FROM V$DATAGUARD_PROCESS;

PID                      NAME  TYP CLIENT_PID    THREAD#     BLOCK# ROLE                     ACTION       CLIENT_ROLE
------------------------ ----- --- ---------- ---------- ---------- ------------------------ ------------ ----------------
5846                     LGWR  KSB          0          0          0 log writer               IDLE         none
5882                     TMON  KSB          0          0          0 redo transport monitor   IDLE         none
5950                     TT00  KSV          0          0          0 gap manager              IDLE         none
5956                     TT01  KSV          0          0          0 redo transport timer     IDLE         none
5948                     ARC0  KSB          0          0          0 archive local            IDLE         none
5958                     ARC1  KSB          0          0          0 archive redo             IDLE         none
5960                     ARC2  KSB          0          0          0 archive redo             IDLE         none
5962                     ARC3  KSB          0          0          0 archive redo             IDLE         none
6013                     INSV  KSB          0          0          0 broker instance slave    IDLE         none
5867                     DMON  KSB          0          0          0 broker monitor           IDLE         none
6075                     NSV0  KSB          0          0          0 broker net slave         IDLE         none
6438                     rfs   NET      22518          0          0 RFS archive              IDLE         archive gap
6460                     RSM0  KSB          0          0          0 broker worker            IDLE         none
6837                     MRP0  KSB          0          1         69 managed recovery         APPLYING_LOG none
16586                    rfs   NET      22521          1          0 RFS archive              IDLE         archive gap
19983                    rfs   NET      22104          1         69 RFS sync                 IDLE         log writer
20193                    rfs   NET      22509          1          0 RFS ping                 IDLE         gap manager
17 rows selected.

Prueba del Comportamiento de Timeout


ALTER SESSION SET NLS_DATE_FORMAT='YYYY-MM-DD HH24:MI:SS';  
SELECT SYSDATE FROM DUAL;  
CREATE TABLE prueba AS SELECT 1 a FROM DUAL;  
SELECT SYSDATE FROM DUAL;

Resultado observado:

SQL> SELECT SYSDATE FROM DUAL;
SYSDATE
-------------------
2024-10-11 13:23:18
SQL> CREATE TABLE prueba AS SELECT 1 a FROM DUAL;
SELECT SYSDATE FROM DUAL;
Table created.
SQL> 
SYSDATE
-------------------
2024-10-11 13:23:37

-- La ejecución se bloquea durante el período de timeout

Reducción del Modo de Protección


DGMGRL> SHOW CONFIGURATION lag verbose
Configuration - dg_config
  Protection Mode: MaxAvailability
  Members:
  bd_principal    - Primary database
    Warning: ORA-16629: database reports a different protection level from the protection mode
    bd_secundaria - Physical standby database 
      Warning: ORA-16809: multiple warnings detected for the member
              Transport Lag:      2 minutes 7 seconds (computed 0 seconds ago)
              Apply Lag:          2 minutes 7 seconds (computed 0 seconds ago)

    bd_remota     - Physical standby database 
              Transport Lag:      0 seconds (computed 0 seconds ago)
              Apply Lag:          0 seconds (computed 0 seconds ago)

DGMGRL> SHOW CONFIGURATION lag verbose

Configuration - dg_config

  Protection Mode: MaxAvailability
  Members:
  bd_principal    - Primary database
    bd_secundaria - Physical standby database 
              Transport Lag:      0 seconds (computed 0 seconds ago)
              Apply Lag:          0 seconds (computed 0 seconds ago)
    bd_remota     - Physical standby database 
              Transport Lag:      0 seconds (computed 0 seconds ago)
              Apply Lag:          0 seconds (computed 0 seconds ago)

  Properties:
    FastStartFailoverThreshold      = '30'
    OperationTimeout                = '30'
    TraceLevel                      = 'USER'
    FastStartFailoverLagLimit       = '0'
    CommunicationTimeout            = '180'
    ObserverReconnect               = '0'
    FastStartFailoverAutoReinstate  = 'TRUE'
    FastStartFailoverPmyShutdown    = 'TRUE'
    BystandersFollowRoleChange      = 'ALL'
    ObserverOverride                = 'FALSE'
    ExternalDestination1            = ''
    ExternalDestination2            = ''
    PrimaryLostWriteAction          = 'CONTINUE'
    ConfigurationWideServiceName    = 'bd_principal_CFG'

Fast-Start Failover:  Disabled

Configuration Status:
SUCCESS

Tras la ejecución del comando ORADEBUG RESUME, el sistema se recupera del estado de espera, confirmando que el problema está relacionado con el proceso RFS y el timeout configurado para el transporte síncrono.

Etiquetas: Oracle Data Guard Synchronous Transport Timeout MaxAvailability RFS Process

Publicado el 6-4 01:27