Search this Blog

Thursday, June 13, 2013

Oracle Manual Standby Revised ( no Dataguard)

Because of all the comments and remarks I received about my Blog discussing a Manual Standby solution, I decided to update this recovery process and make it more stable. My new findings are described below.
A downloadable PDF version of the documentation can be found here:

Note:
When implementing the below solution, keep in mind that both databases need to be correctly licensed!!


Goal:

The goal is to setup a temporary manual HA solution for Oracle databases. It will keep a copy of a certain database in sync using the archived redo-logs from that database. The gap a standby database will be behind its production master is depending on the amount of redo generated and the size of the redo-logfiles combined with the amount of time, the delay, between the several runs of the Recover Script.

It is very important that the master and standby database have exact the same logical en physical configuration, i.e. database names should be the same, file system layout and placing of the files should be identical, owner ship of the database should be the same user, etc…


Basic steps for each run:

In order to keep a standby copy of a database in sync with its master a couple of repeatedly executed steps have to be performed:
* Connect to the master database and get the sequence# and start-time (a.k.a first_time) of the last archived redo-log file. The use of the first_time of the last archived redo-log ensures that we have all the required archived log information available at the standy site. This step is known in the script as the DetermineRecoverTime. It needs a configuration entry in the tnsnames.ora referring to the MASTER database (see configuration steps).

* Create a recover-statement with the retrieved ‘start_time’. The retrieved start-time therefore is configured in the required syntax, being ‘YYYY-MM-DD HH24:MI:SS’. This start-time string is passed into a recover statement, making the recover script look like this:

connect / as sysdba
recover database until time ‘yyyy-mm-dd hh24:mi:ss’ using backup controlfile;
auto
quit;

The “auto” option used here ensures the recover process automatically starts with the suggested files and continues until the database has been recovered until the desired recovertime is reached.
Here the importance of identical physical layout of both database servers is clearly seen.
The recover to a point in time is very usefull. At any moment between Recover script sessions the database can be made available by simply ‘opening’ it for use.
Remember however that opening the database will probable result in an incarnation of the production database which can no longer be used for standby recovery!!

After the recover steps are done, and all necessary archived logs have been applied, the script takes care of cleaning out all spend log files. It determines the first archived log file existing in the archivelog directory ( seen according to the sequence number in the file name). It then determines the highest used archived logfile from the recovery process by reading its logfile and –to be sure- subtract the value with 1, to always retain the last one used. All archived logfiles between these two sequence numbers will be deleted.


The Setup:

The setup of the Recover Standby procedure consists of putting the script in /var/bhr/scripts/oracle, setting ownership to the database owner, and setting rights to 744. We don’t want this script executed by mistake by any other user. Note that if there are more databases on the same system that are standby databases, each database must have its own script. A copy/rename action is required in that case.

The script(s) need(s) to be edited, changing the following parameters in the top of the script:

export ORACLE_SID=ORCL             # The name of the database ( SID )
export ORACLE_USER=oracle          # The owner of the database ( AIX userid )
export ARCHPATH=/opt/oracle/ar01   # The path where the archived logfile are on
# the standby server
export SOURCE_SID=ORCL_SRC         # The name of the tnsnames entry
# referring to the master db
#
# The hosts
#
export MASTER=master_hostname      # The UNIX hostname of the master site
export SLAVE=standby_hostname      # the UNIX hostname of the standby site
#

Note that all values here are examples of an existing configuration and have to be changed accordingly.

Now determine how many archived logs in terms of storage are generated on average by the master database. Create or resize the archived log file system on the standby database accordingly.
Mount this file system using NFS on the master server as an extra archived log location, e.g. /opt/oracle/ar02.
Make sure it is writable by the master database owner.

Important:  Following Oracle License directives the above solution using an NFS mount is only allowed for Enterprise Edition database installations.
On Standard Editions a second archive log destination is only allowed on a local file system.

Add this extra archived log location to the configuration of the master database, using the following SQLplus statement:

SQL>alter system set log_archive_dest_2=’location=/opt/oracle/ar02’ scope=spfile;
SQL>alter system set log_archive_dest_state_2=’enable’ scope=spfile;

The first parameter here is not dynamical. It needs a database restart.
The second parameter however is dynamical, it can be changed at will during runtime of the database, e.g. when problems occur at the standby site and we don’t want an active archived log location to become full.

Finally the login credentials of the masters database SYSTEM account needs to added to the ‘envusers’ file of the standby system. Also add an entry to the /etc/oratab file of the standby system, in order for the script to set the correct environment.


Run the script:

Before the script can be run the standby database needs to be in mount mode.
Using SQLplus, connect as sysdba user and use the

SQL> startup mount;

command to do so.
Do not open the database, or it will start a life of its own, making it an incarnation of the master and risking it to be further unrecoverable.
The initial versions of the recover script also contained a ‘shutdown abort’ and ‘startup mount’ command in each recover run. This has been removed. It can however be a good thing to re-add it back to the script, if it turns out the standby database goes down too often, for no reason and Recovery can’t continue.

The script can now be run, i.e. if all archived logfiles that are needed for the first run are available in the correct location on the standby server.
As soon as the first run finishes, the script can be added to the root crontab, making it run for every x minutes. The value of x is highly depending on the amount of redo generated by the master database and the size of the archived log file system on the standby site.

A crontab entry for the root-user where the script runs every half hour would look like:

00,30 * * * * su - oracle -c /opt/oracle/recover/bin/RecoverStandby.sh >/dev/null

Note that, although it is in the root-crontab, the script is executed by starting a shell as user ‘oracle. This user should be the same user as configured in the script as being the database owner.


Logging:

The script generates two logfiles in /var/bhr/log.
On Logfile named after the script but with a .log extension. It containes a continues log of every session.
Another logfile is a last run only logfile, with only the output of the ‘recover database until….’ command.



Manual actions:

The RecoverStandby script is autonomous. I.e. the script can run without user interference. If a run might fail, it will pick up work at the next run.

Manual actions however are needed when physical configuration changes are made to the master database. An example of this is e.g. adding a new datafile to an existing tablespace, or simply creating a new tablespace.
Since the standby database is in mount mode, such actions can not be performed there.
And – as said before – we do NOT want to open this database, as it might deviate from the master database in terms of changes and log sequence numbering.

This is a problem of keeping a database in sync manually.
With Oracle Dataguard, all these action would be synchronized by the dataguard processes.
In our case however the following steps need to be performed after a datafile change on the master site:

* Shutdown the Standby database
* Retain a backup copy of the control files of the Standby database
* On the Master database put the tablespace of which a datafile has changed in backup mode, using:

SQL> alter tablespace begin backup;

* Copy the changed datafile to the same location at the Stanby database
* When the copy finishes, turn of backup mode at the Master database,using

SQL> alter tablespace end backup;

* Copy the controlfile of the Master database to the Standby database
* Restart the Standby database to mount mode.






The script
Below the script I created for this purpose.
The script shown here is generic for documentation purposes. Adapt it to your own needs


# Recover a manual standby database to the current moment
#
#
export ORACLE_SID=ORCL
export ORACLE_USER=oracle
export ARCHPATH=/opt/oracle/ar01
export SOURCE_SID=ORACLE_SRC            # Used to retrieve the restore-to timestamp
export PROGRAM=`basename $0 | cut -d'.' -f1`
export LOG_NAME=/opt/oracle/log/${PROGRAM}.log
export SQLLOGIN="system/manager"
#
# The hosts
#
export MASTER=primhost
export SLAVE=stdbhost
#
ORAENV_ASK=NO
. oraenv
ORAENV_ASK=YES

Log()
{
  echo $1 >> ${LOG_NAME}
}


DetermineRecoverTime()
{
  echo "set heading off;" > Archtime${ORACLE_SID}.sql
  echo "set linesize 200;"  >> Archtime${ORACLE_SID}.sql
  echo "set feedback off;" >> Archtime${ORACLE_SID}.sql
  echo "connect ${SQLLOGIN}@${SOURCE_SID}" >> Archtime${ORACLE_SID}.sql
  echo "select sequence#, min(to_char(first_time,'YYYY-MM-DD HH24:MI:SS')) from v\$archived_log where sequence#=(select sequence# - 2 from v\$log where status='CURRENT') group by sequence#;"  >> Archtime${ORACLE_SID}.sql
  echo "quit;"  >> Archtime${ORACLE_SID}.sql

  OUTPUT=`${ORACLE_HOME}/bin/sqlplus -S /nolog @Archtime${ORACLE_SID}.sql |tail -1`
  SEQUENCE=`echo ${OUTPUT} | awk '{print $1}'`
  ARCHTIME=`echo ${OUTPUT} | awk '{print $2" "$3}'`
  export ARCHTIME SEQUENCE
  Log SEQUENCE=${SEQUENCE}
  Log ARCHTIME=${ARCHTIME}
}



CreateSqlRecoverScript()
{
  RECOVERTIME=$1
  echo "connect / as sysdba" > Recover${ORACLE_SID}.sql
  echo "recover database until time '${RECOVERTIME}' using backup controlfile;"  >> Recover${ORACLE_SID}.sql
  Log "recover database until time '${RECOVERTIME}' using backup controlfile;"
  echo "auto" >> Recover${ORACLE_SID}.sql
  echo "quit;"  >> Recover${ORACLE_SID}.sql
}


CleanUpArchfiles()
{
  echo "set heading off;" > CleanUp${ORACLE_SID}.sql
  echo "set linesize 200;"  >> CleanUp${ORACLE_SID}.sql
  echo "set feedback off;" >> CleanUp${ORACLE_SID}.sql
  echo "connect / as sysdba" >> CleanUp${ORACLE_SID}.sql
  echo "select sequence# - 1 from v\$log where status = 'CURRENT';"  >> CleanUp${ORACLE_SID}.sql
  echo "quit;"  >> CleanUp${ORACLE_SID}.sql

  OUTPUT=`${ORACLE_HOME}/bin/sqlplus -S /nolog @CleanUp${ORACLE_SID}.sql |tail -1 | awk '{print $1}'`
  MINFILE=`ls ${ARCHPATH}/*arc | sort | head -1 | cut -d'_' -f2 | cut -d'.' -f1`
  MAXFILE=`cat /opt/oracle/log/Recover.log| grep sequence| tail -1 | cut -d'#' -f2 | awk '{print $1}' `
  # Just to save 1 extra file... don't delete the last one used
  MAXFILE=`expr ${MAXFILE} - 1`
  Log OUTPUT=$OUTPUT
  Log MINFILE=$MINFILE
  Log MAXFILE=$MAXFILE
  while [ ${MINFILE} -lt ${MAXFILE} ]
  do
    rm -f ${ARCHPATH}/${ORACLE_USER}_${MINFILE}.arc
    MINFILE=`expr ${MINFILE} + 1`
  done
}


#
# Main
#
ARCHTIME=FALSE

  DetermineRecoverTime

  CreateSqlRecoverScript "${ARCHTIME}"

if [ ! "$ARCHTIME" = "FALSE" ]
then
#
echo Retrieve of Recovertime succeeded, continue with recover database
#
  ${ORACLE_HOME}/bin/sqlplus /nolog <<EOF
spool /opt/oracle/log/Recover.log;
@Recover${ORACLE_SID}.sql
quit
EOF

# Remove processed archive files
CleanUpArchfiles

# Remove temporary files
  rm -f Archtime${ORACLE_SID}.sql  Recover${ORACLE_SID}.sql
fi