
Friday, April 6, 2012

Oracle Manual Standby (no Dataguard)

This entry is outdated. A revised solution can be found here:
http://managingoracle.blogspot.nl/2013/06/oracle-manual-standby-revised-no_13.html

A new personal challenge: how to create a manual standby database using Oracle 11gR2 Standard Edition.

As most people will know, Oracle's official standby database solution is Data Guard, which is however only available with the Enterprise Edition of the database software. Since we have a lot of Standard Edition databases and didn't want to pay the license fees for Enterprise Edition, I went looking for a solution myself.

With Data Guard a primary and a secondary (standby) database exist. These database servers know each other as FAL_SERVER and FAL_CLIENT. The FAL_SERVER is the primary, serving archivelog information to the standby server, which in turn 'replays' this information onto its mounted database, thus keeping it in sync with the primary.
Because this is an automated Data Guard process, completely handled by the so-called Data Guard broker, no crontab entry whatsoever is needed to get this working.

In our case, without Data Guard, there is no broker process responsible for transferring the archivelog information and replaying it on the standby database, so I had to find a way to do this manually.
The actual method is much the same, though. Without Data Guard, the only way to achieve a similar result is a continuous process of database recovery.
Simply put: keep the standby database in mount mode, fetch the archivelog files from the primary, and recover the database up to the last archivelog available. This process can be repeated as often as necessary (a first sketch of one such iteration follows below).
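
In its simplest form one iteration of that cycle is nothing more than a short SQL*Plus script run on the standby. A minimal sketch, assuming the required archivelogs have already been copied over; the timestamp is only an example:

connect / as sysdba
startup mount;
recover database until time '2012-04-06 12:00:00' using backup controlfile;
auto
quit;

The scripts later in this post do exactly this, but generate the timestamp and the recovery script automatically.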

One remark has to be made upfront:
A manual standby database always lags behind the primary database. Where Data Guard standby databases are usually in sync up to the last transaction made, this manual standby is only in sync up to the beginning of the last archivelog received.

So there is always a gap to overcome in case the standby database needs to become active. Depending on the disaster scenario, we either have to accept losing the database changes held in the last archivelog and the active redo log, or we have to replay the last archivelogs and, if still possible, the redo logs from the primary to get the database completely in sync. As said, this depends entirely on the nature of the disaster.
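
For completeness: should the standby ever have to take over, the activation itself is just a final recovery followed by an open resetlogs. A minimal sketch, assuming whatever archivelogs could still be rescued from the primary have been copied into the archive directory first (the database is already in mount state):

connect / as sysdba
recover database using backup controlfile until cancel;
alter database open resetlogs;

During the recover step SQL*Plus prompts for each archivelog; apply what is available and answer CANCEL when done, after which the open resetlogs turns the former standby into the new primary.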


What did I use for this project:
  • Two Red Hat 6 Linux virtual machines, both set up with Oracle's VirtualBox. Both VMs have 1 GB of memory and 30 GB of disk space.
  • I named the VMs "dgmaster" for the primary and "dgslave" for the standby host. In this blog I will refer to them as such.
  • Both VMs are installed identically, with Oracle 11.2.0.3 SE and the exact same file system layout.
  • On dgmaster a database was configured, called ORCL.
  • On dgslave an offline copy of dgmaster's ORCL database was initially restored/copied and brought into mounted state. This of course also means transferring the init.ora / spfile and creating a password file (a rough sketch follows below).
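
That initial copy is nothing fancy. A rough sketch of what I mean, assuming the datafiles, controlfiles and redo logs all live under /opt/mandg/db01 on both machines, the Oracle environment is set, and <sys_password> is replaced with a real password:

# On dgmaster: take a consistent (cold) copy of ORCL and ship it to dgslave
echo "shutdown immediate;" | sqlplus -S / as sysdba
scp -r /opt/mandg/db01 dgslave:/opt/mandg/
scp $ORACLE_HOME/dbs/spfileORCL.ora dgslave:$ORACLE_HOME/dbs/
echo "startup;" | sqlplus -S / as sysdba

# On dgslave: create a password file and mount the copied database
orapwd file=$ORACLE_HOME/dbs/orapwORCL password=<sys_password>
echo "startup mount;" | sqlplus -S / as sysdba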

The goal was to set up a fully automated process that could run from the crontab of the standby server "dgslave". To get this working we need the following setup:
  • SSH equivalence between the two oracle users on both nodes. The oracle user must be able to transfer archive files from dgmaster to dgslave without entering a password. Information about how to set up SSH equivalence can be found here: http://managingoracle.blogspot.com/2010/06/ssh-equivalence-on-rac.html
  • A way to transfer the archivelogs automatically, and a way to determine the point in time to recover to, related to the last retrieved archivelog. I chose the 'first_time#' of the last archivelog received as the point to recover to. This ensures that the recovery will always succeed, as all information needed up to that point is available.
  • The master (primary) database is up and running in 'open' state; it is, for example, in use as a production database.
  • The standby (secondary) database is in mounted state, so it is closed.
  • The primary database has archive logging enabled. The log_archive_format of the primary is copied to the spfile of the standby database. This makes sure that dgslave can find the files it has retrieved when starting the recovery.
  • Make sure the primary database has force logging enabled. This is necessary to really log all changes in the redo files, as all changes need to be sent to the standby server (see the sketch after this list). This can be configured using the statement:
    • ALTER DATABASE FORCE LOGGING;
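
For reference, these settings on dgmaster can be checked and applied in one short SQL*Plus session. A sketch: enabling archivelog mode needs a short outage, and the log_archive_format value shown is only an example, what matters is that the standby's spfile gets exactly the same value (the format change itself takes effect after the next restart):

connect / as sysdba
select log_mode, force_logging from v$database;

shutdown immediate;
startup mount;
alter database archivelog;
alter database open;

alter database force logging;
alter system set log_archive_format='%t_%s_%r.arc' scope=spfile;
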
I'm sure every DBA can set up two virtual machines running an Oracle installation, and I'm also sure they will succeed in setting up SSH equivalence. So once everything is in place, it is time to discuss the steps that need to be repeated over and over again:

1) Dgmaster executes a forced log switch.
2) Dgmaster determines the 'first_time#' entry of the archivelog file written by this log switch and sends it to dgslave.
3) Dgslave fetches all missing archivelogs from the master.
4) With the retrieved 'first_time#' entry, the recovery of the mounted standby database is advanced another step.

To make this run automatically, dgslave takes the initiative. Dgslave runs the necessary scripts on dgmaster over an SSH connection, which also makes it possible to retrieve the information dgmaster sends back.
The script used on dgmaster is shown below. On my test system it lives in the home directory of the oracle user; you can put it anywhere you like on dgmaster, as long as you adjust that location in the script on dgslave:


#!/bin/bash
#
# Force a logswitch to get the last archivelog to the standby host
#
ORACLE_SID=ORCL
ORAENV_ASK=NO
. oraenv >/dev/null 2>&1


SwitchLogfile()
{
  #
  # Do logswitch  
  #
  RESULT=`echo "Alter system switch logfile;" | sqlplus -S / as sysdba | grep 'System altered'`
  if [ "$RESULT" = "System altered." ]
  then
    export RETURN=1
  else
    export RETURN=0
  fi
  # Do we need to do something with this return value?
  export RETURN
}


GetArchiveTime()
{
  CURYEAR=`date +%Y`
  echo "set heading off;" > temp.sql
  echo "set termout off;" >> temp.sql
  echo "select to_char(first_time,'YYYY-MM-DD HH24:MI:SS') from v\$archived_log where sequence#=(select sequence# - 1 from v\$log where status='CURRENT');" >> temp.sql
  # Spool the query result to a temporary file, then print only the
  # timestamp (date and time) it contains
  sqlplus -S / as sysdba <<EOF
spool tempres.txt
@temp.sql
quit
EOF

  cat tempres.txt | grep ${CURYEAR} | awk '{print $1" "$2}'

  #rm -f temp.sql tempres.txt
}


SwitchLogfile
GetArchiveTime

Two functions can be recognized in this script. The first is responsible for executing a log switch, making the database write the current redo log into an archived logfile.
The second function determines the 'first_time#' timestamp of this archived logfile, i.e. the timestamp of the first database change recorded in it. You can see that this is where the gap is created that causes the standby database to lag behind.
This script is never executed by itself on dgmaster; it is always executed through a remote call from dgslave.
The script used on dgslave looks like this:


#!/bin/bash
#
# Recover a manual standby database to the current moment
#
export ORACLE_SID=ORCL
export ORACLE_USER=oracle
export ARCHPATH=/opt/mandg/ar01
# Set the Oracle environment, needed when the script runs from cron
ORAENV_ASK=NO
. oraenv >/dev/null 2>&1
#
# The hosts
#
export MASTER=dgmaster
export SLAVE=dgslave
#
# The current date and time
#
CURDATE=`date +%Y-%m-%d`
CURTIME=`date +%H:%M:%S`

ForceRemoteLogSwitch()
{
  # This function calls a remote script that forces a log switch and returns
  # the 'first_time' of the archivelog file just written
  #
  ARCHTIME=`ssh ${MASTER} /opt/mandg/ora/home/RecoverDbSwitchLogfile.sh`
  export ARCHTIME
}

GetArchives()
{
  #
  # Copy archives from the remote host's archive directory to the local directory,
  # skipping files that were already copied on a previous run
  #
  for remfile in `ssh ${MASTER} "ls ${ARCHPATH}/*arc"`
  do
    locfile=${ARCHPATH}/`basename ${remfile}`
    if [ ! -f ${locfile} ]
    then
      scp ${MASTER}:${remfile} ${ARCHPATH}
    fi
  done
}

CreateSqlRecoverScript()
{
  RECOVERTIME=$1
  echo "connect / as sysdba" > Recover${ORACLE_SID}.sql
  echo "shutdown abort;" >> Recover${ORACLE_SID}.sql
  echo "startup mount;" >> Recover${ORACLE_SID}.sql
  echo "recover database until time '${RECOVERTIME}' using backup controlfile;"  >> Recover${ORACLE_SID}.sql
  echo "auto" >> Recover${ORACLE_SID}.sql
  echo "quit;"  >> Recover${ORACLE_SID}.sql 
}


#
# Main
#
ARCHTIME=FALSE

ForceRemoteLogSwitch

echo ${ARCHTIME}
# Only continue when the remote log switch actually returned a timestamp
if [ -n "$ARCHTIME" ] && [ "$ARCHTIME" != "FALSE" ]
then
#
# Logswitch Succeeded, get the Archives now and recover
#
  GetArchives
  CreateSqlRecoverScript "${ARCHTIME}"
  sqlplus /nolog <<EOF
@Recover${ORACLE_SID}.sql
quit
EOF
fi

The script on dgslave has three main functions. The function ForceRemoteLogSwitch takes care of the remote log switch. Because the output of the remote call is assigned to the variable ARCHTIME, it can be used later on for the recovery process.


The second function GetArchives is responsible for transferring the needed archives from dgmaster to dgslave. As both VMs are configured identically, ARCHPATH is the same on both servers. This function only copies archived logfiles from dgmaster to dgslave that have not been copied before.

The third function CreateSqlRecoverScript creates a local script containing the recovery statements. As it is in fact unknown how the last recovery ended, the recovery starts with a 'shutdown abort', making sure we recover from a cleanly mounted database. Note the use of the retrieved ARCHTIME, here called RECOVERTIME, as it is passed to this function as argument $1.
Also note the use of the 'auto' recover option. This makes the recovery non-interactive, but it requires a correctly configured log_archive_dest pointing to the location where the archive files have been put.
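
On dgslave that is a one-time setting. A small sketch, here using the log_archive_dest_1 parameter and the archive directory /opt/mandg/ar01 from the script above:

alter system set log_archive_dest_1='LOCATION=/opt/mandg/ar01' scope=spfile;

Because the recovery script bounces the instance (shutdown abort / startup mount) on every run, the new setting is picked up automatically the next time the script runs.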

Finally, the main part of the script calls all of the above functions and starts a SQL*Plus session that runs the generated recovery script.
That's it. Nothing really special or complex. It works like a charm!
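
Automating it is then only a matter of a crontab entry on dgslave for the script above. For example, an entry that refreshes the standby every 15 minutes; the script name and log file are just the ones I used, pick whatever you like:

*/15 * * * * /opt/mandg/ora/home/RecoverStandby.sh >> /opt/mandg/ora/home/RecoverStandby.log 2>&1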

Some additional remarks:
As we are using a database recovery method, the process cannot automatically handle the creation of new tablespaces or the addition of new datafiles to an existing tablespace.
When we, for example, execute the following statement on the primary dgmaster:

alter tablespace example add datafile '/opt/mandg/db01/example02.dbf' size 200m;

it will result on dgslave, during the next recovery step, in:

ORA-01244: unnamed datafile(s) added to control file by media recovery
ORA-01110: data file 7: '/opt/mandg/db01/example02.dbf'

The same happens when, for example, a new tablespace is created. It can be solved using the following steps:

Look up the name of the datafile that was generated by the recovery process. It can be recognized by the fact that it resides in $ORACLE_HOME/dbs and is called UNNAMEDnnnnn, where nnnnn is a number.
Once the name is found, recreate the correct datafile with the statement below:

alter database create datafile
  '/opt/mandg/ora/11.2.0.3/dbs/UNNAMED00007'
  as '/opt/mandg/db01/example02.dbf';

In this example, /opt/mandg/ora/11.2.0.3 is the ORACLE_HOME on the VMs we created.
After that, the automated recovery can simply continue.
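
Put together, the whole fix on dgslave looks roughly like this (a sketch; the file number 7 and the paths follow the example above):

connect / as sysdba
select file#, name from v$datafile where name like '%UNNAMED%';

alter database create datafile
  '/opt/mandg/ora/11.2.0.3/dbs/UNNAMED00007'
  as '/opt/mandg/db01/example02.dbf';

The query works while the database is only mounted, as v$datafile is read from the controlfile.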