Thursday, July 7, 2011

Database SAN interactions

I would like to describe a problem we are seeing on our systems that, from the viewpoint of our SAN engineers, is totally impossible.


The situation:

We have two Oracle databases.
The first one is an Oracle 10.2.0.4 database, running on a Logical Partition (LPAR) of an IBM pSeries-7 server. The storage of this database is located on an IBM DS6800 SAN box, connected to this server by Fibre Channel cards.

The second database is an Oracle 11.1.0.7.4 two-node RAC database, running on two physically separated IBM pSeries-6 servers. The storage of this database is in ASM, but it is located on the same IBM DS6800 SAN storage server.


So, we have two databases, physically completely separated from each other. The only part they share is the SAN box.


Now the problem:

If we put load on database 1 (this is a data warehouse database), database 2 starts suffering from it. Let's have a look at the graphs to see what I mean:

The load on database 1:
The effect on database 2:


(Please note that the graphs are not perfectly aligned in this blog.)

So the effect is that when database 1 is especially busy with LGWR (system I/O), database 2 starts to show I/O waits, seen as more and more active sessions with the dark-blue I/O color in Grid Control. The effect stops as soon as database 1 reduces (finishes) its LGWR activity.


Another strange effect we've seen comes from another database, "database 3". This database is on yet another server (IBM pSeries-5), but also has its storage on the same DS6800 storage server.
This database, an "old" 9.2.0.8 database, used to do a forced log switch every 15 minutes by means of a crontab script. These forced log switches showed up as spikes in Grid Control for this database (the orange "commit" color).
Again, these spikes also showed up as I/O waits on the RAC (database 2).
When we changed the execution time in the crontab of database 3, the spikes on database 2 moved accordingly. Because the application running on the RAC is our webshop, which suffers badly from these I/O waits, we turned these forced log switches off.
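
For anyone who wants to put a number on "the spikes moved accordingly" before taking it to the storage team, here is a minimal sketch in Python. It assumes you have exported two equally spaced series of the same length, for example log-switch activity on database 3 and I/O-wait sessions on database 2 per sampling interval. The function names and the sample numbers are made up for illustration; they are not taken from our Grid Control data.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def best_lag(source, target, max_lag):
    """Find the delay (in samples, 0..max_lag) at which target best follows source.

    Returns (lag, correlation). Both series must have the same length.
    """
    best = (0, -1.0)
    for lag in range(max_lag + 1):
        n = len(source) - lag
        if n < 2:
            break
        # Compare source[t] with target[t + lag] over the overlapping window.
        r = pearson(source[:n], target[lag:lag + n])
        if r > best[1]:
            best = (lag, r)
    return best


if __name__ == "__main__":
    # Hypothetical, hand-made samples: NOT measurements from our systems.
    log_switch_db3 = [0, 0, 5, 0, 0, 0, 5, 0, 0, 0]
    io_waits_db2 = [0, 0, 0, 5, 0, 0, 0, 5, 0, 0]
    lag, r = best_lag(log_switch_db3, io_waits_db2, max_lag=3)
    print(f"best lag: {lag} sample(s), correlation: {r:.2f}")
```

A high correlation at a small lag, which follows the source when its schedule is shifted, supports the idea that one database is driving the waits on the other. It does not prove where in the SAN the contention happens.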


Now, have any of you DBAs or storage guys out there seen this kind of behaviour? The storage guys here at the company can't believe that databases affect each other like this through the SAN. The SAN has a huge cache, it is tuned, etc., so theoretically this kind of behaviour is not possible.
In my opinion the LGWR "log-sync" actions cause the storage server to wait for a synchronous action from the database, thus freezing or delaying all other actions with it.

Any other ideas are welcome. Solutions are of course even more welcome.

Thanks



Add On: As of this week (Oct. 5th 2011), the storage of database 2 has been moved to a newly purchased SAN storage, totally isolating it from the other storage (and thus the other databases).
The inheritance effect is now gone!