We have a two node RAC cluster running on AIX5.3 (TL 10).
It has been the first server running on ASM.
During a performance analysis we saw that the RAC system on ASM shows higher I/O latency than its predecessor, which was a DG system on normal AIX file systems.
ASM is configured with two diskgroups, each consisting of multiple disks. The disks are virtual raw devices, presented through an IBM VIO server and residing on a SAN. The SAN itself stripes its data across multiple disks using RAID5.
As is well known, ASM also uses striping to spread data across the disks of a diskgroup.
In this topic we are going to investigate whether the "extra" striping that ASM does has a negative effect on I/O performance. Before we start, the image to the left roughly explains how striping works. Because the disks offered to and used by ASM are virtual, they are themselves striped across the real physical disks of the SAN. Any data within the ASM diskgroup is striped as well, so in theory it is double striped.
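To make the idea of double striping concrete, here is a minimal Python sketch. It is a simplified model, not how ASM or the SAN is actually implemented: the stripe sizes and the number of physical disks are assumptions chosen for illustration, RAID5 parity is ignored, and all function names are hypothetical.

```python
# Hypothetical sizes, for illustration only (not taken from the test system).
ASM_STRIPE = 1024 * 1024   # assumed ASM stripe unit: 1 MB
SAN_STRIPE = 256 * 1024    # assumed SAN stripe unit: 256 KB
ASM_DISKS = 5              # disks in the striped diskgroup
SAN_DISKS = 8              # assumed number of physical disks behind each virtual disk


def asm_location(offset):
    """Map a file offset to (ASM disk, offset on that disk), round-robin."""
    stripe_no, within = divmod(offset, ASM_STRIPE)
    disk = stripe_no % ASM_DISKS
    disk_offset = (stripe_no // ASM_DISKS) * ASM_STRIPE + within
    return disk, disk_offset


def san_location(lun_offset):
    """Map an offset on a virtual disk to a physical SAN disk (parity ignored)."""
    return (lun_offset // SAN_STRIPE) % SAN_DISKS


def double_striped(offset):
    """Return (ASM disk, physical SAN disk) for a file offset."""
    asm_disk, lun_offset = asm_location(offset)
    return asm_disk, san_location(lun_offset)


if __name__ == "__main__":
    # Walk through the first few megabytes of a file.
    for mb in range(8):
        print(mb, "MB ->", double_striped(mb * ASM_STRIPE))
```

In this model, consecutive stripes land on different virtual disks first, and each virtual disk then spreads its own blocks over the physical disks again.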
For the test we use a similar system, although not a RAC, configured with the same ASM and RDBMS versions ( 18.104.22.168 ).
On the SAN storage six (6) virtual devices are configured: five (5) disks of 16Gb each, forming a striped ASM diskgroup, and one (1) disk of 64Gb that forms a diskgroup on its own and therefore has no striping. All diskgroups use 'External' redundancy.
On the diskgroup using the 5 disks a database is created, called STRIPEDB.
On the single disk diskgroup a database is created, called NOSTRIPE.
Both databases are identical. Tablespace sizes and memory settings are equal.
First of all: what would we expect the result to be?
a) The striped diskgroup is faster
b) The non-striped diskgroup is faster
c) The speed of both diskgroups is the same
Before I started the tests my bet was on b) or c).
Either the extra overhead would cause extra wait time (b), or the SAN cache would mask everything and there would be no notable difference in I/O throughput (c).
The test itself consists of the following steps:
1) Create a 10Gb tablespace
2) Create a 20Gb tablespace
3) Create a 40Gb tablespace with two datafiles of 20Gb each
4) Import a medium database (18Gb dumpfile) dump into the empty databases
For each step the SQL timings and/or the load time will be measured.
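As a rough sketch of steps 1 to 3, the snippet below generates a SQL*Plus script per diskgroup. The exact statements used in the test are not shown in this post, so the tablespace names (test10, test20, test40) and the diskgroup names (STRIPED_DG, SINGLE_DG) are hypothetical, and the function itself is only an illustration.

```python
def tablespace_script(diskgroup):
    """Return a SQL*Plus script for test steps 1-3 against one ASM diskgroup.

    Tablespace names are hypothetical; SET TIMING ON makes SQL*Plus
    print the elapsed time of each statement.
    """
    dg = "+" + diskgroup
    lines = [
        "SET TIMING ON",
        f"CREATE TABLESPACE test10 DATAFILE '{dg}' SIZE 10G;",
        f"CREATE TABLESPACE test20 DATAFILE '{dg}' SIZE 20G;",
        # Step 3: one 40Gb tablespace built from two 20Gb datafiles.
        f"CREATE TABLESPACE test40 DATAFILE '{dg}' SIZE 20G, '{dg}' SIZE 20G;",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical diskgroup names for the two test databases.
    for name in ("STRIPED_DG", "SINGLE_DG"):
        print(tablespace_script(name))
        print()
```

Running the same generated script against both databases keeps the two runs comparable.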
At first the results were somewhat disappointing to me. The creation of both the 10Gb and 20Gb tablespaces was only seconds faster on the striped diskgroup. The difference was not large:
10Gb tablespace: 1:09 for the striped diskgroup and 1:11 for the non-striped diskgroup
20Gb tablespace: 2:12 for the striped diskgroup and 2:13 for the non-striped diskgroup
The creation of the 40Gb tablespace, however, gave better results. On the striped diskgroup this took 4:41 minutes. On the non-striped diskgroup it took only 4:17 minutes, so 24 seconds faster!!
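The differences between the tablespace timings can be worked out with a few lines of Python. This is just the arithmetic on the numbers reported above; the helper names are made up for the example.

```python
def to_seconds(mmss):
    """Convert a 'm:ss' timing string to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


# (striped, non-striped) timings as reported for each tablespace.
timings = {
    "10Gb": ("1:09", "1:11"),
    "20Gb": ("2:12", "2:13"),
    "40Gb": ("4:41", "4:17"),
}


def differences(table):
    """Return non-striped minus striped seconds per test.

    A positive value means the striped diskgroup was faster.
    """
    return {
        name: to_seconds(non_striped) - to_seconds(striped)
        for name, (striped, non_striped) in table.items()
    }


if __name__ == "__main__":
    for name, delta in differences(timings).items():
        winner = "striped" if delta > 0 else "non-striped"
        print(f"{name}: {abs(delta)} s faster on the {winner} diskgroup")
```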
Currently the large tests loading the dumpfiles are still running.
The first one, loading an 18Gb dump into the striped database, took almost 9 hours to complete. To be precise, it took 8:56 hrs.
The second import, into the non-striped database, totally surprised me. It took an astonishing 14 hours to complete.
I'm still wondering whether the cause lies somewhere else. My feeling was that the extra overhead of stripe on stripe should have made I/O performance worse.
But it doesn't!!
For the import it makes I/O roughly one and a half times as fast (8:56 versus 14 hours), which will of course depend on the number of disks in the striped diskgroup.
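For reference, the ratio between the two import times can be computed directly. The sketch assumes that "14 hours" means exactly 14:00, which the text does not state precisely.

```python
# Import durations in minutes, taken from the text above.
striped_minutes = 8 * 60 + 56   # 8:56 hrs on the striped diskgroup
non_striped_minutes = 14 * 60   # "14 hours", assumed to be exactly 14:00

# How many times longer the non-striped import took.
ratio = non_striped_minutes / striped_minutes

if __name__ == "__main__":
    print(f"The non-striped import took {ratio:.2f} times as long")
```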
Reason for this difference: the UNIX/SAN guys guess it is the SAN cache, and/or that the stripe size and layout happen to be a very good match. In any case, writing in parallel to 5 disks must be faster than writing to a single disk, although all I/O is virtual.