August 2024 ~ Oracle DBA Secrets

Friday, August 30, 2024

RAC Cluster Failure Detection and Recovery Process

RAC relies on Cluster Services for failure detection. Cluster Services is a distributed kernel component that monitors whether cluster members can communicate with each other and, through this process, enforces the rules of cluster membership. This is handled by Cluster Synchronization Services (CSS) through the CSSD process. CSS performs the following functions:

1. Forms a cluster and adds or removes members.
2. Tracks which members of the cluster are active.
3. Maintains a cluster membership list that is consistent across all member nodes.
4. Provides timely notification of membership changes.

When a node polls another node (the target) in the cluster and the target does not respond after repeated attempts, a timeout occurs after approximately 60 seconds. Among the responding nodes, the one that was started first and is still alive declares that the other node is not responding and has failed. This node becomes the new master and starts evicting the non-responding node from the cluster. Once the eviction is complete, cluster reformation begins: the reorganization process regroups the accessible nodes and removes the failed ones.

LMON is a background process that monitors the entire cluster to manage global resources. By constantly probing the other instances, it detects instance death and manages the associated recovery for the Global Cache Service (GCS). When a node joins or leaves the cluster, LMON handles the reconfiguration of locks and associated resources, that is, the part of recovery associated with global resources. Service failover is also triggered by the EVMD process, which fires a down event.
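
As a rough illustration, the cluster state can be checked from any surviving instance with queries along these lines. This is only a sketch: it assumes a session with access to the GV$ views, and the rows returned depend entirely on your cluster.

```sql
-- Which instances does the cluster currently report, and in what state?
select inst_id, instance_name, host_name, status
from   gv$instance
order  by inst_id;

-- Is LMON running on each instance?
-- (paddr <> '00' restricts the list to background processes that are started)
select inst_id, name, description
from   gv$bgprocess
where  name = 'LMON'
and    paddr <> '00'
order  by inst_id;
```

After an eviction, the failed instance should no longer appear in the first result set.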

Once the reconfiguration of the nodes is complete, Oracle, in coordination with EVMD and CRSD, performs several tasks.

1. Database/Instance recovery.

2. Failover of VIP system service.

3. Failover of the user/database services to another instance.
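
To confirm where services ended up after a failover, a query like the one below may help. It is a minimal sketch that assumes access to GV$ACTIVE_SERVICES; the service names returned are whatever is configured in your environment.

```sql
-- Services currently running, and the instance each one is running on
select inst_id, name, network_name
from   gv$active_services
order  by name, inst_id;
```

A service that was running on the failed instance should now be listed under one of the surviving instances.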

Database/Instance Recovery

After a node in the cluster fails, it goes through several steps of recovery to complete changes at both the instance (cache) level and database level:

1. During the first phase of recovery, Global Enqueue Services (GES) remasters the enqueues, and Global Cache Services (GCS) remasters its resources from the failed instance among the surviving instances.

2. The first step in the GCS remastering process is for Oracle to assign a new incarnation number.

3. Oracle determines how many nodes remain in the cluster. (Nodes are identified by a numeric ID that starts at zero and is incremented by one for every additional node in the cluster.)

4. In an attempt to recreate the resource master of the failed instance, all GCS resource requests and write requests are temporarily suspended (the GRD is frozen).

5. All the dead shadow processes related to the GCS are cleaned from the failed instance.

6. After enqueues are reconfigured, one of the surviving instances can grab the instance recovery enqueue.

7. At the same time as GCS resources are remastered, SMON determines the set of blocks that need recovery. This set is called the Recovery set. With Cache Fusion an instance ships the contents of its block to the requesting instance without writing that dirty block to the disk (i.e. the on-disk version of the blocks may not contain the changes that are made by either instance). Because of this behavior, SMON needs to merge the content of all the online redo logs of each failed instance to determine the recovery set and the order of recovery.

8. At this stage, buffer space for recovery is allocated, and the resources that were identified in the previous reading of the redo logs are claimed as recovery resources. This is done to prevent other instances from accessing those resources.

9. A new master node for the cluster is created (a new master node is only assigned if the failed node was the previous master node in the cluster). All GCS shadow processes are now traversed and GCS is released from its frozen state, which completes the reconfiguration process.

10. During the remastering of GCS from the failed instance (during cache recovery), most work on the instance performing recovery is paused, and while transaction recovery takes place, work proceeds at a slower pace. Subsequently, Oracle starts the database recovery process and begins cache recovery (i.e., rolling forward the changes recorded in the redo logs). This is made possible by reading the redo log files of the failed instance. Because of the shared storage subsystem, the redo log files of all instances participating in the cluster are visible to the other instances. This allows whichever instance detected the failure to read the redo log files of the failed instance and start the recovery process.

11. After cache recovery completes, Oracle starts transaction recovery, i.e. rolling back the uncommitted transactions.
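
The recovery steps above can be observed, at least in part, from a surviving instance. The queries below are a hedged sketch rather than a complete monitoring script; they assume access to V$LOG and V$FAST_START_TRANSACTIONS.

```sql
-- Each instance writes its own redo thread. On shared storage every
-- thread is visible from every instance, which is what lets a survivor
-- read the failed instance's redo.
select thread#, group#, sequence#, status
from   v$log
order  by thread#, group#;

-- Progress of transaction recovery (rollback of uncommitted work)
select usn, state, undoblocksdone, undoblockstotal
from   v$fast_start_transactions;
```

The second query returns no rows when there are no transactions being recovered or recently recovered.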

If you have any questions, please feel free to ask. Thank you 🙂
Toufique Khan

Monday, August 26, 2024

Step by Step Install Oracle 19c Release 3 on Oracle Linux 7 (OL7)

Oracle Database is a robust and trusted relational database management system that has gained immense popularity among enterprises across the globe. This article offers a detailed and user-friendly guide on installing Oracle Database 19c on Oracle Linux.

Hardware Requirements

Requirements for Installing Oracle Database 19c on OL7 or RHEL7 64-bit (x86-64) (Doc ID 2551169.1)

Invoke ./runInstaller

Start the Oracle Universal Installer (OUI) by issuing the ./runInstaller command.

Note: If you encounter any issues after launching the installer, make sure that all the necessary directories, such as oraInventory, have been created. If they have not, create them and assign the appropriate access permissions.

If you have any questions, please feel free to ask. Thank you 🙂

Toufique Khan

Thursday, August 08, 2024

Oracle 19c-Convert Physical Standby To Snapshot Standby

Snapshot standby is a feature introduced in Oracle 11g that allows read-write operations on a standby database. To configure a snapshot standby, we first create a physical standby and then convert it to a snapshot standby.

Developers often want to test against fresh live data, but the DBA cannot allow them to test on the primary. To meet this requirement, the DBA can convert a physical standby to a snapshot standby, which can be opened in read-write mode.

Developers can then make their changes in the snapshot standby database. Any changes made on the snapshot standby are discarded once it is converted back to a physical standby.

Changes from the primary database are not applied to the snapshot standby because no MRP process runs on it. Redo is still received from the primary, but it is applied only after the database is converted back to a physical standby.

Flashback Database does not need to be enabled beforehand.

The physical standby only needs db_recovery_file_dest and db_recovery_file_dest_size to be set.

1. Check Database Role and Verify Archive Log Gap

SQL> select name,database_role,open_mode,log_mode from v$database;
NAME                 DATABASE_ROLE                 OPEN_MODE                           LOG_MODE
-------------------- ----------------------------- ----------------------------------- -----------
ITMSPRD              PHYSICAL STANDBY              READ ONLY WITH APPLY                ARCHIVELOG

SQL> select INST_ID,PROCESS,STATUS,THREAD#,sequence#,BLOCK# from gv$managed_standby;
   INST_ID PROCESS   STATUS          THREAD#  SEQUENCE#     BLOCK#
---------- --------- ------------ ---------- ---------- ----------
         1 ARCH      CLOSING               1     665547     761856
         1 DGRD      ALLOCATED             0          0          0
         1 DGRD      ALLOCATED             0          0          0
         1 ARCH      CLOSING               1     665544     737280
         1 ARCH      CLOSING               1     665548     735232
         1 ARCH      CLOSING               1     665549     780288
         1 ARCH      CLOSING               1     665542     745472
         1 ARCH      CLOSING               1     665546     808960
         1 ARCH      CLOSING               1     665543     747520
         1 RFS       IDLE                  1          0          0
         1 RFS       RECEIVING             1     665897      77825
         1 RFS       RECEIVING             1     665895     346113
         1 RFS       RECEIVING             1     665896     126977
         1 RFS       RECEIVING             1     665892     778241
         1 RFS       RECEIVING             1     665893     473089
         1 RFS       RECEIVING             1     665894     393217
         1 MRP0      WAIT_FOR_LOG          1     665892          0

17 rows selected.

SQL> select distinct(THREAD#),max(sequence#) from v$archived_log group by THREAD# order by 1;
   THREAD# MAX(SEQUENCE#)
---------- --------------
         1         665891

SQL> select distinct(THREAD#),max(sequence#) from v$archived_log where APPLIED='YES' group by THREAD# order by 1;
   THREAD# MAX(SEQUENCE#)
---------- --------------
         1         665889

SQL> select 'Last Log applied : ' Logs, to_char(next_time,'DD-MON-YY:HH24:MI:SS') Time
from v$archived_log where sequence# = (select max(sequence#) from v$archived_log where applied='YES')
union select 'Last Log received : ' Logs, to_char(next_time,'DD-MON-YY:HH24:MI:SS') Time
from v$archived_log where sequence# = (select max(sequence#) from v$archived_log);
LOGS                 TIME
-------------------- ------------------
Last Log applied :   11-JAN-24:11:40:54
Last Log received :  11-JAN-24:11:41:21
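
As a complementary check, the standby's own gap and lag views can be queried. This is a sketch, not part of the original session; it assumes it is run on the standby with access to V$ARCHIVE_GAP and V$DATAGUARD_STATS.

```sql
-- Any gap in the archived logs the standby still needs?
-- No rows returned means no gap is currently detected.
select thread#, low_sequence#, high_sequence#
from   v$archive_gap;

-- Transport and apply lag as computed on the standby
select name, value, time_computed
from   v$dataguard_stats
where  name in ('transport lag', 'apply lag');
```

Converting while a large gap or apply lag exists simply means the snapshot reflects older data, so it is worth knowing the lag before proceeding.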

2. Verify Flash Recovery Area and Flashback Database

SQL> select flashback_on from v$database;
FLASHBACK_ON
------------------
NO

SQL> show parameter db_recovery_file_dest ;
NAME                                 TYPE         VALUE
------------------------------------ ------------ -------------------------------------------
db_recovery_file_dest                string       /oracle/ora19c/19cbase/fast_recovery_area/
db_recovery_file_dest_size           big integer  15G

SQL> archive log list
Database log mode              Archive Mode
Automatic archival             Enabled
Archive destination            /fodb_arch1/ITMSPRD/arch
Oldest online log sequence     0
Next log sequence to archive   0
Current log sequence           0

SQL> Show parameter db_flashback_retention_target
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_flashback_retention_target        integer     1440

SQL> select name,database_role,open_mode,log_mode from v$database;
NAME                 DATABASE_ROLE                 OPEN_MODE                           LOG_MODE
-------------------- ----------------------------- ----------------------------------- -----------
ITMSPRD              PHYSICAL STANDBY              READ ONLY WITH APPLY                ARCHIVELOG
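
While the database runs as a snapshot standby, flashback logs accumulate in the recovery area, so it can be useful to check the available space first. The queries below are a sketch that assumes access to V$RECOVERY_FILE_DEST and V$RECOVERY_AREA_USAGE.

```sql
-- Overall limit and current usage of the recovery area (values in bytes)
select name, space_limit, space_used, space_reclaimable, number_of_files
from   v$recovery_file_dest;

-- Usage broken down by file type (flashback logs, archived logs, and so on)
select file_type, percent_space_used, percent_space_reclaimable, number_of_files
from   v$recovery_area_usage;
```

If the recovery area is close to full, consider raising db_recovery_file_dest_size before converting.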

3. Cancel MRP on Standby

SQL> alter database recover managed standby database cancel;

Database altered.

4. Convert to Snapshot Standby from Physical Standby

SQL> alter database convert to snapshot standby;
Database altered.

SQL> select name,database_role,open_mode,log_mode from v$database;
NAME                 DATABASE_ROLE                 OPEN_MODE               LOG_MODE
-------------------- ----------------------------- ----------------------- ---------------------
ITMSPRD              SNAPSHOT STANDBY              MOUNTED                 ARCHIVELOG

SQL> alter database open;
Database altered.

5. Testing
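
A minimal sketch of what a test on the open snapshot standby might look like follows. The table name snap_test_demo is hypothetical and used only for illustration; run it in a schema reserved for testing.

```sql
-- The conversion normally creates a guaranteed restore point, which is
-- what allows the database to be flashed back later.
select name, guarantee_flashback_database, scn, time
from   v$restore_point;

-- Hypothetical test change: the table name is illustrative only
create table snap_test_demo (id number, note varchar2(50));
insert into snap_test_demo values (1, 'created on snapshot standby');
commit;

select * from snap_test_demo;
```

Once the database is converted back to a physical standby, this table and its data are discarded along with every other change made during testing.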

6. Convert Back to Physical Standby from Snapshot Standby

SQL> shut immediate;
ORA-01109: database not open
Database dismounted.
ORACLE instance shut down.

SQL> startup mount
ORA-32004: obsolete or deprecated parameter(s) specified for RDBMS instance
ORACLE instance started.

Total System Global Area 6.8719E+10 bytes
Fixed Size                 37257296 bytes
Variable Size            1.4764E+10 bytes
Database Buffers         5.3687E+10 bytes
Redo Buffers              231174144 bytes
Database mounted.

SQL> select name,database_role,open_mode,log_mode from v$database;
NAME                 DATABASE_ROLE                 OPEN_MODE               LOG_MODE
-------------------- ----------------------------- ----------------------- ---------------------
ITMSPRD              SNAPSHOT STANDBY              MOUNTED                 ARCHIVELOG

SQL> alter database convert to physical standby;
Database altered.

SQL> shut immediate
ORA-01109: database not open
Database dismounted.
ORACLE instance shut down.

SQL> startup
ORA-32004: obsolete or deprecated parameter(s) specified for RDBMS instance
ORACLE instance started.

Total System Global Area 6.8719E+10 bytes
Fixed Size                 37257296 bytes
Variable Size            1.4764E+10 bytes
Database Buffers         5.3687E+10 bytes
Redo Buffers              231174144 bytes
Database mounted.
Database opened.

7. Start MRP on Standby

SQL> alter database recover managed standby database disconnect from session;

Database altered.
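
To confirm that the database is a physical standby again and that redo apply has resumed, a check along these lines can be run on the standby. It is a sketch that reuses the views already queried in step 1.

```sql
-- Role should be back to PHYSICAL STANDBY
select name, database_role, open_mode
from   v$database;

-- MRP0 should be listed, typically applying or waiting for a log
select process, status, thread#, sequence#
from   v$managed_standby
where  process like 'MRP%';
```

If no MRP row is returned, managed recovery is not running and the command in this step should be checked for errors in the alert log.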
