Search this Blog

Thursday, June 13, 2013

Oracle Manual Standby Revised ( no Dataguard)

Because of all the comments and remarks I received about my Blog discussing a Manual Standby solution, I decided to update this recovery process and make it more stable. My new findings are described below.
A downloadable PDF version of the documentation can be found here:

Note:
When implementing the below solution, keep in mind that both databases need to be correctly licensed!!


Goal:

The goal is to setup a temporary manual HA solution for Oracle databases. It will keep a copy of a certain database in sync using the archived redo-logs from that database. The gap a standby database will be behind its production master is depending on the amount of redo generated and the size of the redo-logfiles combined with the amount of time, the delay, between the several runs of the Recover Script.

It is very important that the master and standby database have exact the same logical en physical configuration, i.e. database names should be the same, file system layout and placing of the files should be identical, owner ship of the database should be the same user, etc…


Basic steps for each run:

In order to keep a standby copy of a database in sync with its master a couple of repeatedly executed steps have to be performed:
* Connect to the master database and get the sequence# and start-time (a.k.a first_time) of the last archived redo-log file. The use of the first_time of the last archived redo-log ensures that we have all the required archived log information available at the standy site. This step is known in the script as the DetermineRecoverTime. It needs a configuration entry in the tnsnames.ora referring to the MASTER database (see configuration steps).

* Create a recover-statement with the retrieved ‘start_time’. The retrieved start-time therefore is configured in the required syntax, being ‘YYYY-MM-DD HH24:MI:SS’. This start-time string is passed into a recover statement, making the recover script look like this:

connect / as sysdba
recover database until time ‘yyyy-mm-dd hh24:mi:ss’ using backup controlfile;
auto
quit;

The “auto” option used here ensures the recover process automatically starts with the suggested files and continues until the database has been recovered until the desired recovertime is reached.
Here the importance of identical physical layout of both database servers is clearly seen.
The recover to a point in time is very usefull. At any moment between Recover script sessions the database can be made available by simply ‘opening’ it for use.
Remember however that opening the database will probable result in an incarnation of the production database which can no longer be used for standby recovery!!

After the recover steps are done, and all necessary archived logs have been applied, the script takes care of cleaning out all spend log files. It determines the first archived log file existing in the archivelog directory ( seen according to the sequence number in the file name). It then determines the highest used archived logfile from the recovery process by reading its logfile and –to be sure- subtract the value with 1, to always retain the last one used. All archived logfiles between these two sequence numbers will be deleted.


The Setup:

The setup of the Recover Standby procedure consists of putting the script in /var/bhr/scripts/oracle, setting ownership to the database owner, and setting rights to 744. We don’t want this script executed by mistake by any other user. Note that if there are more databases on the same system that are standby databases, each database must have its own script. A copy/rename action is required in that case.

The script(s) need(s) to be edited, changing the following parameters in the top of the script:

export ORACLE_SID=ORCL             # The name of the database ( SID )
export ORACLE_USER=oracle          # The owner of the database ( AIX userid )
export ARCHPATH=/opt/oracle/ar01   # The path where the archived logfile are on
# the standby server
export SOURCE_SID=ORCL_SRC         # The name of the tnsnames entry
# referring to the master db
#
# The hosts
#
export MASTER=master_hostname      # The UNIX hostname of the master site
export SLAVE=standby_hostname      # the UNIX hostname of the standby site
#

Note that all values here are examples of an existing configuration and have to be changed accordingly.

Now determine how many archived logs in terms of storage are generated on average by the master database. Create or resize the archived log file system on the standby database accordingly.
Mount this file system using NFS on the master server as an extra archived log location, e.g. /opt/oracle/ar02.
Make sure it is writable by the master database owner.

Important:  Following Oracle License directives the above solution using an NFS mount is only allowed for Enterprise Edition database installations.
On Standard Editions a second archive log destination is only allowed on a local file system.

Add this extra archived log location to the configuration of the master database, using the following SQLplus statement:

SQL>alter system set log_archive_dest_2=’location=/opt/oracle/ar02’ scope=spfile;
SQL>alter system set log_archive_dest_state_2=’enable’ scope=spfile;

The first parameter here is not dynamical. It needs a database restart.
The second parameter however is dynamical, it can be changed at will during runtime of the database, e.g. when problems occur at the standby site and we don’t want an active archived log location to become full.

Finally the login credentials of the masters database SYSTEM account needs to added to the ‘envusers’ file of the standby system. Also add an entry to the /etc/oratab file of the standby system, in order for the script to set the correct environment.


Run the script:

Before the script can be run the standby database needs to be in mount mode.
Using SQLplus, connect as sysdba user and use the

SQL> startup mount;

command to do so.
Do not open the database, or it will start a life of its own, making it an incarnation of the master and risking it to be further unrecoverable.
The initial versions of the recover script also contained a ‘shutdown abort’ and ‘startup mount’ command in each recover run. This has been removed. It can however be a good thing to re-add it back to the script, if it turns out the standby database goes down too often, for no reason and Recovery can’t continue.

The script can now be run, i.e. if all archived logfiles that are needed for the first run are available in the correct location on the standby server.
As soon as the first run finishes, the script can be added to the root crontab, making it run for every x minutes. The value of x is highly depending on the amount of redo generated by the master database and the size of the archived log file system on the standby site.

A crontab entry for the root-user where the script runs every half hour would look like:

00,30 * * * * su - oracle -c /opt/oracle/recover/bin/RecoverStandby.sh >/dev/null

Note that, although it is in the root-crontab, the script is executed by starting a shell as user ‘oracle. This user should be the same user as configured in the script as being the database owner.


Logging:

The script generates two logfiles in /var/bhr/log.
On Logfile named after the script but with a .log extension. It containes a continues log of every session.
Another logfile is a last run only logfile, with only the output of the ‘recover database until….’ command.



Manual actions:

The RecoverStandby script is autonomous. I.e. the script can run without user interference. If a run might fail, it will pick up work at the next run.

Manual actions however are needed when physical configuration changes are made to the master database. An example of this is e.g. adding a new datafile to an existing tablespace, or simply creating a new tablespace.
Since the standby database is in mount mode, such actions can not be performed there.
And – as said before – we do NOT want to open this database, as it might deviate from the master database in terms of changes and log sequence numbering.

This is a problem of keeping a database in sync manually.
With Oracle Dataguard, all these action would be synchronized by the dataguard processes.
In our case however the following steps need to be performed after a datafile change on the master site:

* Shutdown the Standby database
* Retain a backup copy of the control files of the Standby database
* On the Master database put the tablespace of which a datafile has changed in backup mode, using:

SQL> alter tablespace begin backup;

* Copy the changed datafile to the same location at the Stanby database
* When the copy finishes, turn of backup mode at the Master database,using

SQL> alter tablespace end; backup;

* Copy the controlfile of the Master database to the Standby database
* Restart the Standby database to mount mode.






The script
Below the script I created for this purpose.
The script shown here is generic for documentation purposes. Adapt it to your own needs


# Recover a manual standby database to the current moment
#
#
export ORACLE_SID=ORCL
export ORACLE_USER=oracle
export ARCHPATH=/opt/oracle/ar01
export SOURCE_SID=ORACLE_SRC            # Used to retrieve the restore-to timestamp
export PROGRAM=`basename $0 | cut -d'.' -f1`
export LOG_NAME=/opt/oracle/log/${PROGRAM}.log
export SQLLOGIN="system/manager"
#
# The hosts
#
export MASTER=primhost
export SLAVE=stdbhost
#
ORAENV_ASK=NO
. oraenv
ORAENV_ASK=YES

Log()
{
  echo $1 >> ${LOG_NAME}
}


DetermineRecoverTime()
{
  echo "set heading off;" > Archtime${ORACLE_SID}.sql
  echo "set linesize 200;"  >> Archtime${ORACLE_SID}.sql
  echo "set feedback off;" >> Archtime${ORACLE_SID}.sql
  echo "connect ${SQLLOGIN}@${SOURCE_SID}" >> Archtime${ORACLE_SID}.sql
  echo "select sequence#, min(to_char(first_time,'YYYY-MM-DD HH24:MI:SS')) from v\$archived_log where sequence#=(select sequence# - 2 from v\$log where status='CURRENT') group by sequence#;"  >> Archtime${ORACLE_SID}.sql
  echo "quit;"  >> Archtime${ORACLE_SID}.sql

  OUTPUT=`${ORACLE_HOME}/bin/sqlplus -S /nolog @Archtime${ORACLE_SID}.sql |tail -1`
  SEQUENCE=`echo ${OUTPUT} | awk '{print $1}'`
  ARCHTIME=`echo ${OUTPUT} | awk '{print $2" "$3}'`
  export ARCHTIME SEQUENCE
  Log SEQUENCE=${SEQUENCE}
  Log ARCHTIME=${ARCHTIME}
}



CreateSqlRecoverScript()
{
  RECOVERTIME=$1
  echo "connect / as sysdba" > Recover${ORACLE_SID}.sql
  echo "recover database until time '${RECOVERTIME}' using backup controlfile;"  >> Recover${ORACLE_SID}.sql
  Log "recover database until time '${RECOVERTIME}' using backup controlfile;"
  echo "auto" >> Recover${ORACLE_SID}.sql
  echo "quit;"  >> Recover${ORACLE_SID}.sql
}


CleanUpArchfiles()
{
  echo "set heading off;" > CleanUp${ORACLE_SID}.sql
  echo "set linesize 200;"  >> CleanUp${ORACLE_SID}.sql
  echo "set feedback off;" >> CleanUp${ORACLE_SID}.sql
  echo "connect / as sysdba" >> CleanUp${ORACLE_SID}.sql
  echo "select sequence# - 1 from v\$log where status = 'CURRENT';"  >> CleanUp${ORACLE_SID}.sql
  echo "quit;"  >> CleanUp${ORACLE_SID}.sql

  OUTPUT=`${ORACLE_HOME}/bin/sqlplus -S /nolog @CleanUp${ORACLE_SID}.sql |tail -1 | awk '{print $1}'`
  MINFILE=`ls ${ARCHPATH}/*arc | sort | head -1 | cut -d'_' -f2 | cut -d'.' -f1`
  MAXFILE=`cat /opt/oracle/log/Recover.log| grep sequence| tail -1 | cut -d'#' -f2 | awk '{print $1}' `
  # Just to save 1 extra file... don't delete the last one used
  MAXFILE=`expr ${MAXFILE} - 1`
  Log OUTPUT=$OUTPUT
  Log MINFILE=$MINFILE
  Log MAXFILE=$MAXFILE
  while [ ${MINFILE} -lt ${MAXFILE} ]
  do
    rm -f ${ARCHPATH}/${ORACLE_USER}_${MINFILE}.arc
    MINFILE=`expr ${MINFILE} + 1`
  done
}


#
# Main
#
ARCHTIME=FALSE

  DetermineRecoverTime

  CreateSqlRecoverScript "${ARCHTIME}"

if [ ! "$ARCHTIME" = "FALSE" ]
then
#
echo Retrieve of Recovertime succeeded, continue with recover database
#
  ${ORACLE_HOME}/bin/sqlplus /nolog <
spool /opt/oracle/log/Recover.log;
@Recover${ORACLE_SID}.sql
quit
EOF

# Remove processed archive files
CleanUpArchfiles

# Remove temporary files
  rm -f Archtime${ORACLE_SID}.sql  Recover${ORACLE_SID}.sql
fi




10 comments:

  1. Hi,

    Since I got a lot of requests for the mentioned document, I made it now publically available.

    Thanks.

    ReplyDelete
  2. hi.

    am trying to run your script in the standby server , but get an error:

    SQL*Plus: Release 11.2.0.3.0 Production on Thu Dec 12 16:48:08 2013

    Copyright (c) 1982, 2011, Oracle. All rights reserved.

    SQL> SQL> Connected.
    Media recovery complete.
    SP2-0042: unknown command "auto" - rest of line ignored.
    Disconnected from Oracle Database 11g Release 11.2.0.3.0 - 64bit Production
    With the Automatic Storage Management option

    if i comment out the " echo 'auto' ........ " line, i get only the "Media recovery complete." as last line, never get a number of file processed, etc, hence later in the code this line fail:

    MAXFILE = 'cat ......'

    can you provide some advice ?

    ReplyDelete
    Replies
    1. Hi Lester,


      Thank you for supporting my solution for maintaining a Manual Standby database.

      Since these scripts were developed/created on IBM-AIX, I guess that is where your problem is.
      The syntax of e.g. the 'expr' should/might be different on Oracle Linux.

      In the bottom of the script, if I remember correctly, there are a couple of logfiles removed. Comment out the rm statement and see what they show.
      Otherwise, startup a SQL session to the mounted database and fire the SQL-commands from the recover.sql script manually.
      See what responses are expected by Oracle Linux and maybe fix the script accordingly.

      The expr-part of the script is reponsible for deleting the used archive files, so you may have to fix that part too.

      We have 2 Standby databases using this solution and it is quite flawless. It sometimes goes lost in the recovery process, but that is maybe once a year.

      Success!

      FJ Franken

      Delete
  3. I know this is a old post, but what if the master database is in ASM? How do I copy the datafiles to the standby database?

    ReplyDelete
    Replies
    1. Hi Rafael,


      Thanks for your interesting question.
      I developed this solution for one of our non-ASM databases, so I never considered implementing support for ASM.
      Giving it a short "think over", I would think you have to create a backup/restore solution. I assume you archives are backed up to some kind of offline storage.
      Since the primary and manual standby are identical clones, you would be able to create a restore procedure, retrieving these archived files onto the standby server.

      On the other hand, don't let enormous databases, and/or high important databases rely on this "self-made" standby principle.
      Oracle invented Dataguard for that purpose.

      Thanks again

      Delete
  4. Hi, very nice article. I would like to point out that in the article you state 'Mount this file system using NFS on the master server as an extra archived log location'. If you take a look at the licensing documentation from Oracle ( https://docs.oracle.com/cd/E11882_01/license.112/e47877.pdf) you will find the following phrase: "Customers who want to manually script maintenance of a standby database can create a second copy of their archive logs outside of Oracle ASM on a local file system only, designated by the LOG_ARCHIVE_DEST_n parameter. 2. RMAN backups of files can reside on either a local or network mounted."

    Mounting an NFS share as an archivelog location is not permitted on Oracle Standard Edition, only on Enterprise Edition. I think this is worth mentioning.

    ReplyDelete
    Replies
    1. Thanks "Unknow" for this very good remark.
      I will add is to the BLOG text immediately.
      Thanks again

      Delete
  5. Hi Franken, Not wanting but to be gravedigger of this excellent article....

    I'm trying to use your first version of this script ( because the second dont work correctly ), and i still have genting this message :

    SP2-0042: unknown command "auto" - rest of line ignored.

    But this message is not prompt all the time of executions, is more or less one yes and one no.

    I try to remove but sometimes this is necessary, for now this is safe to ignore, but reduce my recovery time ( i dont know if you understand me properly because im brazilian hehe ).

    Anyway thanks , this script help me alot!

    ReplyDelete
    Replies
    1. Hi,

      Since it has been some time I worked with these scripts myself,I can't bring you a direct answer to this.
      However I have seen similar behaviour when the OS was not AIX as it was when the script was written. So maybe the difference in OS causes your problem.

      Also, try and retry to use the standby database before you really rely on it. The above mentioned method seems OK, but in one case I had, it totally corrupted the standby db, making it useless.

      Thanks..

      Delete