Search this Blog

Friday, January 12, 2024

Dataguard Setup

1. General information

With the ever increasing level of security, the need to keep up to date with the latest patches of the Oracle database also increases.
Patching of an Oracle database becomes easier and easier with the release of Oracle 19c and the AutoUpgrade tool, but that subject is covered in another document. 

In this document we describe the setup of a Dataguard environment and the patching of it. At Kramp we use Dataguard for the RMAN catalog database, to ensure that this database is always available. The RMAN catalog database is a very important database, as without it backups of a database can not be created.

Next to the setup of the Dataguard environment we will describe the patching of it. Understandably it differs from patching a Single Instance database, because of the redo apply process.

Note that in normal situations the use of dataguard is a licensed option. Since we however use it here on the database used for the RMAN catalog, which is a free-to-use database, also the use of dataguard is free of charge.
Note that we’re talking here about standard dataguard. Active dataguard where a standby database would be opened up read-only is a must-license feature always.

This document does not cover the contents of the database involved in this setup. 
2. Setup of Dataguard

In this section the setup of the dataguard environment is described. In the examples the setup of the test environment for the RMAN catalog database is used for the screenshots and statements.

The basic setup of the environment consists of two Linux servers:
vtl1g002: the primary server, running an Oracle 19c EE database
vtl1g003: the secondary server, running the exact same Oracle version

It is important for a Dataguard setup that both primary and secondary servers run the same Oracle version and patch level.

On the primary server an empty database is created in the CDB/PDB setup called TSMT01C.
The OSuser for this installation is called “tsmt01”, so all file system locations are based on the Kramp standard:  /opt/tsmt01/xxx

In the below sub sections the setup is explained in steps. 
2.1. Enable archive logging

Dataguard relies on the shipping of redo information from primary server to the secondary server. To support this, both the primary and secondary server databases must have archived log mode enabled. 

Start SQLplus on the primary server database and execute the following commands:

shutdown immediate;
startup mount;
alter database archivelog;
alter database open;

After enabling archivelog mode, make sure the database is backed up and archived logfiles are cleaned out. 

After that the database must be set into Force Logging mode, so that all actions done on the database will result in redo-generation, even if a statement uses the NOLOGGING option. 
To enable force logging, execute the following statement:

alter database force logging;
alter system switch logfile;


2.2. Create standby logfiles

Now create standby redo logs on the primary database in case of switchovers. The standby redo logs should be as big as the largest online redo logfile and there should be one extra group per thread. In our case we have 6 online redo file of 200Mb each. 
This means we create 7 standby redo logs of the same size. They will be located in a subdirectory of the online log location. 
First create this directory on both the primary and secondary server:

$ mkdir /opt/tsmt01/re01/standby

Now we can create standby log file from within SQLplus. 
This is only done on the primary database, as we will copy this database in a later step completely to the secondary server:

alter database add standby logfile ‘/opt/tsmt01/re01/standby/tsmt01c_stdby01.log’ size 200m;
alter database add standby logfile ‘/opt/tsmt01/re01/standby/tsmt01c_stdby02.log’ size 200m;
alter database add standby logfile ‘/opt/tsmt01/re01/standby/tsmt01c_stdby03.log’ size 200m;
alter database add standby logfile ‘/opt/tsmt01/re01/standby/tsmt01c_stdby04.log’ size 200m;
alter database add standby logfile ‘/opt/tsmt01/re01/standby/tsmt01c_stdby05.log’ size 200m;
alter database add standby logfile ‘/opt/tsmt01/re01/standby/tsmt01c_stdby06.log’ size 200m;
alter database add standby logfile ‘/opt/tsmt01/re01/standby/tsmt01c_stdby07.log’ size 200m;


2.3. Initialization Parameters
For a dataguard solution both the primary & standby databases should have the same name (db_name).
They should however distinguish themselves based on db_unique_name. 
In our case, the primary database is in the Amsterdam datacentre and will therefore be called TSMT01C_AMS. The standby is located in the Enschede datacentre and is called TSMT01C_ENS.
In this step we will already name the primary instance. After we copy the primary to the standby server, we will rename it then,


alter system set db_unique_name=’TSMT01C_AMS’ scope=spfile;

Bounce the database to activate the above change.


Next parameter to be changed is “standby_file_management”. When set to “AUTO”, it makes sure that a physical change on the primary database, like adding or extending an tablespace is also reflected to the standby database.
This parameter is dynamic, so it does not need a restart (again):

alter system set standby_file_management=AUTO scope=both;

Enable this both on the primary and on standby database as this feature should also work after a failover or switchover.

If this parameter is not set to AUTO is will result in redo-apply on the standby server being unable to apply changes when dataguard is up and running.
If e.g. a tablespace is added on the primary database, the datafile(s) belonging to that new tablespace will not be present on the standby server. If this happens, the below error is seen in the DGMGRL console when looking at the configuration.


DGMGRL>; show configuration;

Configuration - tsmt01c_config

  Protection Mode: MaxPerformance
  Members:
  tsmt01c_ams - Primary database
    tsmt01c_ens - Physical standby database
      Error: ORA-16766: Redo Apply is stopped

Fast-Start Failover:  Disabled

Configuration Status:
ERROR   (status updated 37 seconds ago)

2.4. Duplicate the database 

If the database is shutdown, then first copy the contents of the spfile into a pfile.

SQL> create pfile from spfile;

File created.

Change directory to the $ORACLE_HOME/dbs directory and copy the pfile to the standby server:

$ scp initTSMT01C.ora  vtl1g003:`pwd`

On the standby server copy the created initfile to ‘/tmp’ for use with RMAN as temporary initfile.

$ cp /opt/tsmt01/ora/19.0.0.0/dbs/initTSMT01C.ora  /tmp/inittemp.ora

Now start the instance on the standby server in NOMOUNT mode 

SQL> startup nomount pfile=’/tmp/inittemp.ora’;

Then use RMAN on the standby server connecting to the primary database as target and to the -to be- standby as auxiliary.

$ rman target sys/<password>@tsmt01c_ams  auxiliary sys/<password>@tsmt01c_ens

Issue the following RMAN commands to copy the database:

duplicate target database
  for standby
  from active database
  dorecover
  spfile
    set db_unique_name='TSMT01C_ENS' 
  nofilenamecheck;

Note the option “for standby”, telling the duplicate command that this database is used for a standby database. It will thus not change a DBID after the clone.
The duplicate command can ofcourse be extended with db_file_name_convert or log_file_name_convert options, if needed.

Wait for the command to finish before continuing.


Finally create a password file on the standby server, with the same password as used for SYS on the primary database server.
Preferably we copy the password file from the primary server to the standby server, to make sure that it contains the same password.

On primary server:
$ cd $ORACLE_HOME/dbs
$ scp orapwTSMT01C   vtl1g003:`pwd`


As soon as the standby database has been created, add the same standby logfiles as on the primary:

alter database add standby logfile ‘/opt/tsmt01/re01/standby/tsmt01c_stdby01.log’ size 200m;
alter database add standby logfile ‘/opt/tsmt01/re01/standby/tsmt01c_stdby02.log’ size 200m;
alter database add standby logfile ‘/opt/tsmt01/re01/standby/tsmt01c_stdby03.log’ size 200m;
alter database add standby logfile ‘/opt/tsmt01/re01/standby/tsmt01c_stdby04.log’ size 200m;
alter database add standby logfile ‘/opt/tsmt01/re01/standby/tsmt01c_stdby05.log’ size 200m;
alter database add standby logfile ‘/opt/tsmt01/re01/standby/tsmt01c_stdby06.log’ size 200m;
alter database add standby logfile ‘/opt/tsmt01/re01/standby/tsmt01c_stdby07.log’ size 200m;


2.5. Listener.ora

Edit the listener.ora on  both the primary and secondary server to contain a static entry for the databases. This is needed for the dataguard broker to be able to access the database, even if it is not open.
Remember that only open databases will automatically register with a listener. When a database is in mounted mode. 

Most documentation found on the Internet name a standby database to contain the “_stdby” extension, like “cdb1_stdby”
This is however not really smart, especially if a failover takes place and suddenly the standby database called “cdb1_stdby” suddenly becomes the primary databases. 
This is why we name the databases following the physical location.

On the primary database add the following entry to the listener.ora:

SID_LIST_TSMT01 =
  (SID_LIST =
    (SID_DESC =
      (GLOBAL_DBNAME = TSMT01C_AMS_DGMGRL)
      (ORACLE_HOME = /opt/tsmt01/ora/19.0.0.0)
      (SID_NAME = TSMT01C)
      (ENVS="TNS_ADMIN=/opt/tsmt01/ora/network/admin")
    )
  )

On the standby server add the following:

SID_LIST_TSMT01 =
  (SID_LIST =
    (SID_DESC =
      (GLOBAL_DBNAME = TSMT01C_ENS_DGMGRL)
      (ORACLE_HOME = /opt/tsmt01/ora/19.0.0.0)
      (SID_NAME = TSMT01C)
      (ENVS="TNS_ADMIN=/opt/tsmt01/ora/network/admin")
    )
   )
  
Note the use of CAPITALS in the database names. This is important, as it must match how it is used later in the Dataguard Broker setup.  
Restart both listener processes:

lsnrctl stop
lsnrctl start


2.6. Tnsnames.ora

Update the tnsnames.ora file on both servers to have entries for both db_unique_names and the static “gloval_name” as defined in the listener.ora. The following entries therefore need to be added to the tnsnames.ora file on both servers:

TSMT01C_AMS =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = vtl1g002.kramponline.com)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SID = TSMT01C)
    )
  )

TSMT01C_ENS =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = vtl1g003.kramponline.com)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SID = TSMT01C)
    )
  )


Note the use of SID i.o SERVICE_NAME as the Dataguard Broker will need to connect to the databases when they are down. In that case the services are not available. 


3. Enable the Broker
At this point we are ready to start the Dataguard synchronization process. The first step is to start the Dataguard Broker on both databases.
Since we want to prevent the standby database from becoming its own incarnation, we only start it in “mount” mode. 
The primary database can be opened in “open” mode.


3.1. Start the broker process
On the primary database:

$ sqlplus / as sysdba

Connected to an idle instance.

SQL> startup
ORACLE instance started.
….
….
Database mounted.
Database opened.

SQL> alter system set dg_broker_start=true scope=both;


On the standby database:

$ sqlplus / as sysdba

Connected to an idle instance.

SQL> startup mount
ORACLE instance started.
….
….
Database mounted.

SQL> alter system set dg_broker_start=true scope=both;
3.2. Create Broker configuration

On the primary server start the dataguard broker to register this server.

$ dgmgrl sys/<password>@tsmt01c_ams
DGMGRL for Linux: Release 19.0.0.0.0 - Production on Tue Dec 5 16:27:35 2023
Version 19.16.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

Welcome to DGMGRL, type "help" for information.
Connected to "TSMT01C_AMS"
Connected as SYSDBA.

DGMGRL> create configuration tsmt01c_config as primary database is TSMT01C_AMS connect identifier is TSMT01C_AMS;
Configuration "tsmt01c_config" created with primary database "tsmt01c_ams"
DGMGRL>

Now add the standby database:

DGMGRL> add database TSMT01C_ENS as connect identifier is TSMT01C_ENS;
Database "tsmt01c_ens" added
DGMGRL>

It is important to use capitals in these statements for the database name if there are also capitals in the listener configuration. 

If there is somewhere a discrepancy between these configurations, one can expect an ORA-01017 when trying to setup the dataguard configuration.

You can check the configuration using the “show configuration” command:

DGMGRL> show configuration

Configuration - tsmt01c_config

  Protection Mode: MaxPerformance
  Members:
  tsmt01c_ams - Primary database
    tsmt01c_ens - Physical standby database

Fast-Start Failover:  Disabled

Configuration Status:
DISABLED
Configuration status is “DISABLED” which means that dataguard is not active yet.
We need to enable the configuration to let the standby database become and stay in sync with the primary database.

DGMGRL> enable configuration;
Enabled.
DGMGRL>

This commando may take a while depending on the amount of data that has to be synced.


3.3. Tnsnames.ora
Before this installation can be successfully used and before we can do e.g. a switchover or a failover, clients connecting to this Dataguard database need to be made aware about the existence of two hosts, possibly serving the primary database.

To solve this an extra host address needs to be added to the tnsnames.ora entry for this database. There where a single instance usually only has one address line in tnsnames.ora, for a Dataguard setup you need to add one more.
The tnsnames entry for the above configuration therefor looks like this:

TSMT01 =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS=(PROTOCOL=TCP)(HOST=vtl1g002.kramponline.com)(PORT = 1521))
      (ADDRESS=(PROTOCOL=TCP)(HOST=vtl1g003.kramponline.com)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = TSMT01)
    )
  )


This way this entry enables the client to probe for which of the hosts is the active primary host and connections will always succeed


3.4. Monitoring
The configuration and activity can basically be monitored from the Dataguard Broker command line. More detailed monitoring and alerting on issues is better done from Oracle Enterprise Manager.

The fastest way to check the status of the Dataguard setup is to use the “show configuration” command from the Dataguard Broker command line.
This looks this:


DGMGR> show configuration

Configuration - tsmt01c_config

  Protection Mode: MaxPerformance
  Members:
  tsmt01c_ens - Primary database
    tsmt01c_ams - Physical standby database

Fast-Start Failover:  Disabled

Configuration Status:
SUCCESS   (status updated 44 seconds ago)

It is important that the status show “SUCCESS”

If one looks in detail to the above screen output, it is visible that it is different from that on the previous page. There the tsmt01c_ams was primary database. Now we have switched roles and tsmt01c_ens is the primary database. 
In the next chapter this is explained in detail 

3.5. Switchover
From a dataguard setup, the members can switch roles. This can be done manually using a switchover, or automatically in case of a failover.
A failover can occur when e.g. de primary database server physically fails. The broker can in that case decide to promote the standby to be the primary database. 

A failover or the setup of it, is not discussed in this document. 


A switchover can be convenient in case of maintenance. If e.g. one of the servers needs to be shutdown, patched or something else, the standby database can take over the role of primary database. This makes it possible to shutdown the “before” primary database server.

A switchover making the tsmt01c_ams the primary database server looks like this:

DGMGRL> switchover to tsmt01c_ams;
Performing switchover NOW, please wait...
New primary database "tsmt01c_ams" is opening...
Operation requires start up of instance "TSMT01C" on database "tsmt01c_ens"
Starting instance "TSMT01C"...
Connected to an idle instance.
ORACLE instance started.
Connected to "TSMT01C_ENS"
Database mounted.
Connected to "TSMT01C_ENS"
Switchover succeeded, new primary is "tsmt01c_ams"
DGMGRL>

Confirming the switchover by looking at the resulting configuration shows:

DGMGRL> show configuration

Configuration - tsmt01c_config

  Protection Mode: MaxPerformance
  Members:
  tsmt01c_ams - Primary database
    tsmt01c_ens - Physical standby database

Fast-Start Failover:  Disabled

Configuration Status:
SUCCESS   (status updated 51 seconds ago)
DGMGRL>


3.6. Failovers
With Dataguard it is also possible to let the configuration automatically failover to the standby node in case of a failure of the primary node. 
This is called Fast Start Fail Over, of FSFO

To configure this, we should setup a so-called observer-server, a standalone server that is responsible for monitoring the primary database.
In case the observer notices that the primary database is not responding for a couple of seconds, for what kind of reason, it instructs the dataguard broker to switch the configuration to the standby database.

For the current purpose of our Dataguard Setup, we do not need automatic failover, so we skip this configuration for now.

Patching Dataguard

1. Patching

When patching a database it is normal to bring down the database, patch the ORACLE_HOME and finally run ‘datapatch’ to apply any database changes introduced with the patch to the database.
With Out-Of-Place patching a new ORACLE_HOME is installed next to the existing and the database is started with that HOME.

But how do we do that with Dataguard?
Where we know that Dataguard applies changes done in the primary database onto the standby database using the redo-apply, we can’t patch the Dataguard the regular way.

If we would patch the primary database the usual way, it will apply database changes to the standby database of a patch that may be unknown to the ORACLE_HOME of this standby database. 

1.1. The basics
This is why Oracle has introduced the so-called “Standby First Patching method”.
With this method, the ORACLE_HOME of the standby database is patched first. But we can’t run datapatch on this ORACLE_HOME as the database is not open to receive these changes. 
So, after we have patched the ORACLE_HOME of the standby server, we switch roles, making the standby the primary. Then we can patch the ORACLE_HOME of the previous primary, now standby server.
Finally we run datapatch on the new primary and switch back.

In steps:
1) Disable redo apply on standby database
2) Shutdown standby database
3) Patch the ORACLE_HOME on the standby database
4) Start the standby database 
5) Start the redo apply

6) Perform a switchover. The patched standby database now becomes primary database, but without that datapatch has- run.

7) Now repeat step 1 to 5 for the “new” standby database
8) When finished, switch back roles
9) Finally run datapatch from the primary database.

That concludes the in-place patching of a dataguard environment.
The above steps are only useful when patching an existing ORACLE_HOME, so-called in-place patching. 
At Kramp we prefer- however out-of-place patching, where we install a new ORACLE_HOME with the patch already applied.

With out-of-place patching the steps needed to be done are:
1) Install a new patched home on both the primary and secondary server into a new directory.
2) Disable redo apply on standby database
3) Shutdown standby database
4) Copy password file, spfile and dataguard config files from the 
5) Change environment to the new ORACLE_HOME
6) OLD_ORACLE_HOME/dbs to the NEW_ORACLE_HOME/dbs
7) Start the database using the new ORACLE_HOME in mount mode
8) Start redo-apply
9) Switchover the primary database role to the new started standby database.
10) Repeat steps 1 to 7 for the former primary database server, now being the standby
11) When ready, run datapatch from the primary database
12) Switchover to let the initial primary database taking this role again


On the next pages both processes are seen with commands and screen output.


1.2. In place patching 
In this section the real patch process of in-place patching is discussed.

1) The first step is to disable the redo apply on the standby database. The command to use from SQLplus is:  


2) Now we can shutdown the Standby database following step 2:
 

Note that the “ORA-01109” confirms that we are connected to the standby database. 
Stopping the database includes stopping the listener:
 

3) At this point we assume that we have extracted the patch on a location with sufficient space, e.g. /opt/tsmt01/exp.
Also the OPatch version in the ORACLE_HOME has been replaced with the newest version.

Now we can apply the patches for database and ojvm:

When finished, optionally clean out old replace patch-sets and release updates.
The statements to use are :

opatch util listorderedinactivepatches 

This lists the inactive patches and release updates. 
You can actually delete them, using the command:

opatch util deleteinactivepatches

4) Then start the listener again.
Also start the standby database in MOUNT mode. 
Do NOT run datapatch !!!

5) Start the redo apply services again:


It may, like in this screenshot, give an error is redo apply has already been started.

6) At this point we can switch over to make the current Standby database the Primary database, after which we can repeat the above steps, which we will not show here again

7) So… step 7 is skipped, as it would display everything from step 1 to 5 again
Step 7 finishes with the switchover to the initial primary database again. 

8) In step 8 we’re going to run datapatch to apply the database changes that have come with the patch

Step into the ORACLE_HOME/OPatch directory and run ‘datapatch -verbose’:

This process may take a while. Because of the redo apply, both the Primary and Standby database are now supplied with the changes coming with the patch.



1.3. Out of Place patching

With out of place patching the downtime can be even more minimized. This method of patching uses a new patched installation as a new ORACLE_HOME.
The running database from the 1st ORACLE_HOME has to be brought down and can be immediately started from the 2nd ORACLE_HOME.

Because downtime is only very short, it is questionable if a switchover would not result in the same amount of downtime, as there also both databases are briefly down together while switching roles. 

1) In this section we assume that the new ORACLE_HOME is already in place. 
2) The first step is to disable the redo apply on the standby database. The command to use from SQLplus is:  



3) Now we can shutdown the database and the listener


4) Copy all parameter files, both for the database and the dataguard configuration, and the password file to the new ORACLE_HOME/dbs, e.g.

$ cd $OLD_ORACLE_HOME/dbs
$ cp *  $NEW_ORACLE_HOME/dbs


5) Set the environment tot the new ORACLE_HOME:

Also change the /etc/oratab and re-login to make sure that also the PATH variable is set correctly. Otherwise binaries from the OLD_ORACLE_HOME will be used to start the database again.


6) Edit the listener.ora file and update the ORACLE_HOME to point to the new location.
Then start the listener again with the new HOME.
Also start the standby database in MOUNT mode. 
Do NOT run datapatch !!!
7) Start the redo apply services again:



It may, like in this screenshot, give an error is redo apply has already been started.
 
8) At this point we can switch over to make the current Standby database the Primary database, after which we can repeat the above steps, which we will not show here again


9) Repeat steps 1 through 7, to relocate the now running standby database to the new ORACLE_HOME
10) Switchover to let the initial primary database take this role again
11) Run datapatch to apply the database changes that have come with the patch

Step into the ORACLE_HOME/OPatch directory and run ‘datapatch -verbose’:

 


This process may take a while. Because of the redo apply, both the Primary and Standby database are now supplied with the changes coming with the patch.


2. Troubleshooting
In this section, issues encountered during testing and writing of this blog are discussed. 
Actually only two issues occurred.

2.1. Listener configuration
The first was during the out-of-place patching where the primary was unable to find the standby database, showing:


If you encounter this error, when having done out of place patching of the standby and the database and listener have been started on the standby server, then you probably have not updated the listener.ora file, having the ORACLE_HOME pointing to the new location in the static listener entry:
If the above is true, this is confirmed in the dataguard broker logfile “drcTSMT01C.log”
There is may say:

Data Guard Broker Status Summary:
  Type                      Name           Severity  Status
  Configuration             tsmt01c_config Warning  ORA-16607: one or more members have failed
  Primary Database          tsmt01c_ams    Error  ORA-16778: redo transport error for one or more members
  Physical Standby Database tsmt01c_ens    Error  ORA-01034: ORACLE not available

The problem is caused by the dataguard broker connecting the the standby listener, which in turn can not find a running database from the old ORACLE_HOME.

To solve this problem, stop the Standby database and listener again, and update the ORACLE_HOME entry in the listener.ora as seen here. After updating the listener.ora restart the listener and database and all should work as expected.


2.2. Switchover failing
During the patch process the configuration is switched over from primary to standby two times. This ensures that the database will remain available for the clients

It may happen that a switchover fails, in restarting the two databases in the correct mode. An error in Dataguard Broker is then seen:



Somehow the switchover was partly done. 
Before acting upon this and blindly follow the action step, make sure that you check the other database if it is open in read-write mode and also if it is primary. 
This can be done using this query:

SQL> column open_mode format a20
SQL> column database_role format a20;
SQL> set linesize 100
SQL> select open_mode, database_role from v$database;

OPEN_MODE            DATABASE_ROLE
-------------------- --------------------
READ WRITE           PRIMARY

It should return the same as seen here. Otherwise manual intervention is needed to really dig in to logfiles to find out where the switchover has gone of track.

If it returns the same output as seen here, you can safely start the standby in mount mode as indicated which will solve the problem.