Search this Blog

Wednesday, July 18, 2018

Using Flashback Data Archive for Changed Data Capture

What is Change Data Capture?
When a row in a table gets updated the content of the table changes. This is also true when data gets inserted or deleted. With Change Data Capture (CDC) you would like to capture all changes made to a certain amount of data.

Take for example the value of the Dollar on the Stock Exchange. In relation to other coin-values the value of the Dollar can change many times during one day.
If the value is stored in a single row of a table, you only get the current value when looking at it. You never know what happened to it in the period between two look ups.
If you do need to know what happened in the mean time, it can be very useful to track and record all changes that happened on the value of the Dollar and thus on the table containing the value.

With Oracle 9i Change Data Capture was introduced, and it made it possible to track changes to tables and store these changes in a separate table.
Other processes then could "consume" this captured data for further processing. CDC worked by creating triggers on the source tables, that transferred data synchronously to the staging tables. This caused processing overhead and, because it was synchronous, had effect on the applications working on the source tables.

In Oracle 10g Asynchronous CDC was introduced, that captured data from the online redo log files. The triggers that captured the transaction activity were no longer needed and therefore the performance impact was reduced. This redo log capturing of data was combined with Oracle Streams, requiring a certain configuration for each CDC change set.
Unfortunately the Oracle 10g CDC is only available with Oracle Enterprise Editions. Because the changes where only captured from the redo log files after a commit was received, a minor delay is seen between the commit on the source tables and arrival of the changed data.

In Oracle 11g the Oracle Streams functionality still exists but as of Oracle 12c this functionality along with CDC has been deprecated. It has been replaced by Oracle Golden Gate, which comes as a very expensive per core licensed solution.

Oracle Flashback
With Oracle 10g Oracle introduced the Flashback functionality. With Flashback the database recycle-bin was also introduced, enabling Database Administrators to retrieve dropped tables from within the recycle-bin. But the most important functionality of Flashback was the ability to look back in time. Default, depending on the size of the UNDO tablespace and its retention, a DBA was able to e.g  restore deleted data from a table shortly after it got deleted.

When flashback was completely enabled on the database, the database would start creating flashback log files, keeping a record of all before-change values of the database up to a certain retention time.
More information on Oracle Flashback Technology can be found here.

Flashback Data Archive
The Flashback functionality in Oracle Database has been extended with Flashback Data Archive in Oracle 11g. With Flashback Data Archive (FDA) it is possible to store all changed data on tables involved for a certain retention period or indefinitely.
There were the standard Flashback functionality writes a before-image of a row-change to a flashback logfile, using FDA this change is written to a table directly related to the table that has configured with FDA.

How does it work:
Flashback Data Archive is not only the name of the functionality. It is also a real archive object in the database that has to be created and that will hold all staging tables containing changed data. The archive is preferably created in its own tablespace to be able to manage the size of the archive.
All is explained by the following example.

First create a tablespace to hold the Flashback Archive:

create tablespace fb_storage 
datafile '/u01/oradata/fb_storage01.dbf' 
size 100m autoextend on next 100m maxsize 5G;


Within the tablespace a Flashback Archive is created. At this point the decision has to be made if a database default archive is to be created, or a user-personal archive will be the best option.
To create a database default archive, user the tag "default" in the statement and execute it as user SYS:

create flashback archive 
default fb_archive 
tablespace fb_storage 
quota 4 g retention 1 day;

This creates Flashback Arvchive named "fb_archive". The option "default" makes this archive the default archive of the database. It is possible to have more archives, but only one can be the default.
The qouta is set to 4G and the retention of the kept data is set to 1 day;
This quota is the maximum size of the archive in the given tablespace. If the archive becomes full then operations on the tracked tables will fail.
If the archive reaches 90% of its given quota, the database will start issuing "out of space" warnings, thus enabling the DBA to purge old data or increase the archive size.

If a user specific archive is to be created, then this user needs to receive the rights to do so first (as user SYS):

grant flashback archive administer to ehda;
alter user scott quota unlimited on fb_storage;

After this the user can create its own archive:

create flashback archive fb_archive_scott_1day 
tablespace fb_storage 
quota 4G retention 1 day;

It is recommended to put the name of the owner and the retention into the name of the archive. This makes working with archives easier, especially when more than one exists.

The below query shows the just created archive in its tablespace:

select   flashback_archive_name
       , flashback_archive#
       , tablespace_name
       , quota_in_mb
from     dba_flashback_archive_ts
order by flashback_archive_name;

In order to use the default FDA on a certain table, rights have to be granted to the user owning that table(s) to use the FDA and its tablespace:

grant flashback archive on fb_archive to scott;
alter user scott quota unlimited on fb_storage;

The above steps are not needed if a user specific archive has been created.

Now connect as user scott and enable flashback archive on a table that has to be tracked:

alter table scott.emp flashback archive;

or for his personal archive, name the archive to be used:

alter table scott.emp flashback archive fb_archive_scott_1day;

Enabling and Disabling a flashback archive functionality on a table can be done by the table owner.
Disabling is done by a privileged user when using a database default archive.

With the next change of data in the scott.emp table, a old-value ( before image ) is stored in the flashback archive.
Keep in mind that there is no "before image" for inserted data, so an inserted row is only seen in the flashback archive if it has undergone an update afterwards.

The general documentation about FDA now would show a query about looking at changed data using the following query:

select * from  scott.emp 
versions between timestamp sysdate-4/1440 and sysdate;

This will show the contents of the scott.emp table including all the updated rows over the last 4 minutes. If the table started empty and only a couple of rows were inserted and changed, then this overview would still be quite readable. If the table however was already populated with a lot of data, you can only find changed data easily if you know (the primary key values of ) the rows that have changed.
To overcome this, it is also possible to directly query the staging table in the archive which holds only the changed rows. To find this staging table use the following query:

select * from
 dba_flashback_archive_tables 
 where table_name ='EMP' 
 and owner='SCOTT';

It will return someting similar to this:

TABLE_NAME OWNER_NAME FLASHBACK_ARCHIVE_NAME    ARCHIVE_TABLE_NAME   STATUS    
---------- ---------- ------------------------- -------------------- ----------
EMP        SCOTT      FB_ARCHIVE                SYS_FBA_HIST_7785    ENABLED   

Here the archive table name for the SCOTT.EMP table is shown as SYS_FBA_HIST_7785;
Every table that is tracked with FDA has its own archive table.
Note that naming convention of archive tables start with "SYS_FBA". When looking up, knowing this, it shows us that there are in total 3 tables associated with the SCOTT.EMP table in the FDA:

TableDescription
SYS_FBA_DDL_COLMAP_7785Table that holds a mapping of the columns of the history table to the original table.Because of the administrational overhead, adding or removing a column takes much longer on a tracked table compared to a normal table. 
SYS_FBA_HIST_7785This table hold, as mentioned, the before images of the actual rows that have changed.
SYS_FBA_TCRV_7785This table shows the STARTSCN & ENDSCN of the changes, inluding the OPeration that was involved (U=update, I=insert, etc..)


The SYS_FBA_HIST_xxxx tables are structural identical to a join of the SYS_FBA_TCRV table and the original SCOTT.EMP table.


Retrieving changed data:
As the archive tables hold the changed data along with SCN numbers, it is not hard to retrieve this data ordered in time, e.g:

select empno,ename,job from  SYS_FBA_HIST_7785 order by endscn;

Select the maximum SCN number and retrieve all rows that have been archived before that SCN:

select max(endscn)
from   sys_fba_hist_7785;

ENDSCN
-----------------
   10640031020116

And then:

select empno,ename,job from sys_fba_hist_7785 where endscn < 10640031020116;


Purging the Flashback archive:
Besides the configured retention which would purge old data from the archive it is also possible to manually purge the data.
According to the Oracle Documentation, purging is possible based on SCN or TIMESTAMP, or by iussuing the command "purge all", deleting all data.

Warning: What the official documentation fails to mention is that there is a bug in Oracle 11.2 and 12.1 which prevents purging the FDA based on SCN or TIMESTAMP. Only in 12.2 this bug is solved (Doc ID 16898135.8).
There is however a one-off patch available for Oracle 11.2.0.3 and and 11.2.0.4.

So until this patch is installed, only "purge all" is a working solution, but that brings a timing problem as to when it is suitable to purge all data looking at the application that might be populating the table in the mean time. One can imagine that between reading the captured data and executing a "purge all", new data might have been archived, so an

alter flashback archive fb_archive purge all;

can possibly lead to data loss if the application that is responsible for populating the table is not shutdown beforehand.




DDL Limitations:
Before explaining on how to disable the tracking of data changes and dropping the archive, one DDL restrictions on tracked tables has to be mentioned.

  • Dropping the table

Simple, isn't it??
There is documentation found on the Internet, even within well-known Blogs where it is stated that a tracked table can also not be truncated or renamed or undergo column changes.
This is however not true. Only dropping the table is not allowed and will generate an error:

drop table scott.emp;

Error at line 1
ORA-55610: Invalid DDL statement on history-tracked table


Disable / Remove the Flashback Data Archive:

Before an FDA can be removed, all tables that are tracked need to be decoupled, using the statement:

alter table ehda.cohead no flashback archive;

In tests I ran, a common user ( owner of the table ) can enable the flashback archive function on a table, but disabling this generates an error:

ORA-55620: No privilege to use Flashback Archive

A privileged user has to execute this command.

As soon as all tables have be decoupled from the FDA, the FDA can be dropped:

drop flashback archive fb_archive;

After that also the tablespace can be dropped. Make sure before using the below command that the tablespace is really empty.

drop tablespace fb_storage including contents and datafiles;


Conclusion:
I was really excited to learn about this -for me- new functionality. It brings a very easy way to capture changed data. The solution is easy to implement, comes with no downtime, or impact to the application or database.
Unfortunately there are still some bugs that prevent flawless use, unless the database is upgraded to 12.2


2 comments:

  1. Thanks for above explanation. In CDC, when any update happens we can capture both ie before update rows and after update rows where as in flashback it doesn't capture both. Please correct me if I am wrong as I don't see any example on this.

    ReplyDelete
  2. Hi Madhan,

    Thank you for the reply.
    It has been some time since I wrote this but I think you're right.
    With normal flashback you can "flashback" a table to a certain point in time, but you can only then pick up some data value after the flashback has been done
    With Flashback Data Archive all "before images" of all updates are kept ordered in time, until the retention expires.
    So if you do hundred updates on this table within the retention time, all hundred "before images" will be available.

    I hope this answers your question

    ReplyDelete