Search this Blog

Monday, November 28, 2011

Tablespace reorganization

In our data warehouse we often run into the "problem" that tablespaces contain a lot of free space. Most of the time this is caused by the warehouse programs first building up new objects and only afterwards destroying the old ones.
In the future we might need a more definitive solution for this, like converting the tablespaces to ones with a uniform extent size.

Until then, I perform a tablespace reorganization. This means I pick a block_id in the tablespace datafile, somewhere around the high point of actual usage.

For this procedure to work, you need a tablespace that can temporarily store the objects that lie beyond the free space in the tablespace you want to reclaim. This staging tablespace should NOT be an Oracle temporary tablespace, but a regular permanent tablespace that can hold these objects for the duration of the reorganization.
It is preferable to create a separate tablespace for this, as it can be removed again afterwards.
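Creating such a staging tablespace is straightforward; a minimal sketch (the tablespace name, datafile path and size are placeholders, adjust to your system):

```sql
-- Hypothetical staging tablespace; pick your own name, path and size.
CREATE TABLESPACE REORG_STAGE
  DATAFILE '/u01/oradata/DWH/reorg_stage01.dbf' SIZE 5G
  EXTENT MANAGEMENT LOCAL;
```

Once the reorganization is complete, it can be dropped again with DROP TABLESPACE ... INCLUDING CONTENTS AND DATAFILES.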

How the reorganizing works:
For example: if we have a tablespace of 10 GB in size and its datafile (let's assume we have a single file) has 4 GB of free space, I use the following query to determine the upper-bound block_id:

SQL> select /*+ rule */
                max(block_id)*0.7
                from dba_extents a
                where a.tablespace_name='USERS' ;

If a tablespace has more than one datafile, you need to target a specific datafile by adding the file_id column to the query. In that case, first look up the datafile number by joining v$tablespace and v$datafile.
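Looking up the file_id can be sketched like this (assuming the tablespace is called USERS):

```sql
-- file# in v$datafile is the file_id to use in the dba_extents query
SELECT d.file#, d.name
  FROM v$tablespace t, v$datafile d
 WHERE t.ts# = d.ts#
   AND t.name = 'USERS';
```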


After the block_id has been calculated, I use the following complex query to generate table move and index rebuild statements:



set pages 9999 lines 200
accept tabsource prompt 'Reorg tablespace : '
accept tabtemp prompt 'Temporary store tablespace (not TEMP) : '


select distinct cmd from 
(SELECT  /*+ rule */  
         'alter '
        || DECODE (segment_type
,                 'TABLE SUBPARTITION', 'TABLE'
,                 'TABLE PARTITION', 'TABLE'
,                 'INDEX', 'INDEX'
,                 'INDEX PARTITION', 'INDEX'
,                 'TABLE'                   -- TABLE is the default
                    )
       ||' '
       ||owner
       ||'.'
       || segment_name
       || ' '
       || DECODE (segment_type, 'INDEX', 'REBUILD','INDEX PARTITION','REBUILD','MOVE')
       || ' '
       || DECODE (segment_type
,                 'TABLE SUBPARTITION', 'SUBPARTITION'
,                 'TABLE PARTITION', 'PARTITION'
,                 'INDEX PARTITION','PARTITION')
       || ' '
       || partition_name
       || ' tablespace '
       || '&tabtemp'
       || DECODE (segment_type, 'INDEX', ' ONLINE','INDEX PARTITION',' ONLINE')
       || ';' cmd
  FROM dba_extents
 WHERE tablespace_name = '&tabsource'
 --AND FILE_ID=10
 AND block_id > [calculated block_id]) ;


In the above query the use of FILE_ID is commented out, because we assumed a single datafile. Otherwise, uncomment it and enter the file_id you found in v$datafile.


The result of this query is a long list of object-rebuild statements that move tables and table partitions to the tablespace defined by &tabtemp. The indexes and index partitions are rebuilt online in that same tablespace.

Remember to SAVE the script, as you need it to move everything back to the original tablespace. 

Run the script outside business hours, as moving table objects locks the tables and invalidates indexes and all dependent packages/procedures.


As soon as the script is finished, purge the tablespace of its 'recyclebin' contents, by issuing as SYS:

SQL> purge tablespace USERS;

Afterwards the tablespace datafile can be shrunk to the desired size, preferably by subtracting the amount of free space determined at the beginning (4 GB in our example) from the total datafile size.
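Shrinking the datafile is a simple resize; a sketch for our 10 GB example file (the path is a placeholder):

```sql
-- Hypothetical datafile path; the resize fails with ORA-03297 if used
-- blocks still lie beyond the new size.
ALTER DATABASE DATAFILE '/u01/oradata/DWH/users01.dbf' RESIZE 6G;
```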
As soon as that is finished, reopen the script you saved, edit it, and change all tablespace names back to the original source tablespace (USERS in this example).
Then rerun the script.

Finally, use UTLRP to recompile all invalidated packages/procedures. Also, if you created a separate staging tablespace for this procedure, drop it again.
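These last steps look like the following (REORG_STAGE is the placeholder name for the staging tablespace used earlier in this post):

```sql
SQL> @?/rdbms/admin/utlrp.sql

SQL> DROP TABLESPACE REORG_STAGE INCLUDING CONTENTS AND DATAFILES;
```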

In our databases this procedure usually frees a lot of space. Last week I was able to return 29 GB of space back to the OS.

Copyright: This script can be freely used. If you use it as an explanation on your own site/blog, a link to the source (my blog) is appreciated.


Tuesday, November 8, 2011

Which rights are Granted PUBLIC

During a schema copy from an old database to a new one, we ran into the problem that certain packages would not compile successfully.
Some dependencies failed because the schema user in the new database lacked execute privileges on certain SYS packages, like DBMS_LOCK.

The settings of both schemas were identical, so the rights must have been granted in the past via the PUBLIC role.
For those who don't know: PUBLIC is a role used for granting database-wide rights, and it is implicitly held by every user.
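Such rights look like ordinary object grants with PUBLIC as the grantee. For example, the DBMS_LOCK privilege mentioned above would have been granted (as SYS) like this:

```sql
-- Grants execute on DBMS_LOCK to every user in the database
GRANT EXECUTE ON SYS.DBMS_LOCK TO PUBLIC;
```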


What I needed was a list of GRANT statements for the PUBLIC role that existed in the "old" database but not (yet) in the "new" database.
I started browsing the usual views like DBA_ROLES, DBA_ROLE_PRIVS and ROLE_ROLE_PRIVS, but nowhere could I find the rights for the PUBLIC role.


After spending almost an hour on Google, I came across a set of views called KU$_ROGRANT_VIEW, KU$_SYSGRANT_VIEW and KU$_OBJGRANT_VIEW.

The last one in particular contained the information I was looking for. This view has columns like GRANTEE, OBJ_NUM and the privilege name PRIVNAME.
Joining this view with DBA_OBJECTS, using the query below, resulted in at least the list of all EXECUTE grants for the PUBLIC role.
I was looking for these EXECUTE grants because the failing packages complained about missing dependencies, not about missing tables.

The statement:


SQL> SELECT    'grant '
       || a.privname
       || ' on '
       || b.owner
       || '.'
       || b.object_name
       || ' to '
       || a.grantee
       || ';'
  FROM KU$_OBJGRANT_VIEW a, dba_objects b
 WHERE     grantee = 'PUBLIC'
       AND a.obj_num = b.object_id
       AND a.privname='EXECUTE'
       AND b.object_name not like '%/%'
       order by b.object_name;

Using this statement on both the "old" and "new" database I was able to create a list of missing GRANT statements (there were 10 of them) that had to be executed on the "new" database.

Afterwards all packages were compiled successfully.

Friday, October 7, 2011

Unregister RMAN database directly from catalog DB

What DBA never moved databases to new hardware ??
And what DBA never forgot to unregister the database first from the RMAN catalog ??

If you didn't unregister the database after a database move, then at those crucial moments, months or years later, you might end up facing a problem when trying to e.g. restore that database from backup.

Imagine you're in the middle of disaster, the production database server crashed, management is breathing down your neck, and you need to restore this database as soon as possible.
You prepare a new server, install Oracle on it and immediately start the restore, but then disaster strikes again:


RMAN> run {
2> allocate channel ch0 device type 'sbt';
3> restore controlfile;
4> }


allocated channel: ch0
channel ch0: sid=147 instance=hkdrt1 devtype=SBT_TAPE
channel ch0: Veritas NetBackup for Oracle - Release 6.5 (2009050106)

Starting restore at 2009-DEC-03 11:57:20

released channel: ch0
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of restore command at 12/03/2009 11:57:20
RMAN-06004: ORACLE error from recovery catalog database: RMAN-20005: target database name is ambiguous


The restore fails, because RMAN does not know which of the databases known in the recovery catalog with the given SID needs to be restored.
You should have unregistered the old copy of the database years back, when you moved the database to that "new" server which now is totally dead.
You should have -maybe- tried a restore back then to make sure it worked !!

But hey, even with management breathing down your neck, everything can still go smoothly.
You have this blog guide to follow:


First of all:
Following this guide, you are directly administering the RMAN catalog. Make sure you follow only the procedure below. Do not manually delete records from any of the tables or change any of the data, as doing so corrupts the catalog, rendering it totally useless.


How to continue?

We are going to use a database package that is installed in the catalog database. The package is called dbms_rcvcat and it has a procedure unregisterdatabase that cleans out all information about a database, given a database key (db_key) and a database id (db_id).
The statement to do this looks like:

SQL> exec dbms_rcvcat.unregisterdatabase(db_key,db_id);

Now we only need to find these db_key and db_id values.
First connect to the database that contains the RMAN catalog, using whatever SQL*Plus client you like.
Then find the list of databases known to the catalog under this SID:

SQL> select db_key, db_name, 
               reset_time, dbinc_status
               from dbinc where db_name = 'YOURSID';

The result is a list of registered databases with this SID name.


 DB_KEY  DB_NAME  RESET_TIME           DBINC_STATUS
-------  -------  -------------------  ------------
 411792  YOURSID  7-11-2002 14:29:22   PARENT
 411792  YOURSID  24-10-2005 13:40:18  CURRENT
7455112  YOURSID  10-10-2008 21:41:20  CURRENT


In this example several records are shown for the same SID. A CURRENT record represents a database that is, according to the RMAN catalog, an active representation of a running database. If correctly configured, a database has only one CURRENT record in the catalog.
Since a database can have only one actual CURRENT incarnation, the record with the most recent RESET_TIME is the real current database registration.
A PARENT incarnation is one from which the current incarnation branched following an OPEN RESETLOGS operation; a PARENT record therefore does not always exist.

In the above example the registration with DB_KEY=7455112 appears to be the actual current database, as it has the most recent RESET_TIME.
This means we are going to unregister the stale registration with db_key 411792.

Now we only need to find the DB_ID that belongs to this database:

SQL> select db_key, db_id 
               from db where db_key= 411792;

The result will look like:


 DB_KEY      DB_ID
-------  ---------
 411792  412102591


Now that we know the DB_KEY and the DB_ID, we can use the above mentioned procedure to unregister this database from the catalog:

SQL> exec dbms_rcvcat.unregisterdatabase(411792,412102591);

Don't worry, this statement can run for several minutes (up to 10-15 minutes on our system)!

Afterwards re-run the select from the DBINC table again, and you'll see that the database registration has been removed from the catalog.
If a PARENT record existed for this DB_KEY, that record has also been deleted.


Conclusion:

The above procedure enables you to successfully clean out (unregister) databases from the RMAN catalog that no longer exist.
It is, however, a very risky procedure, especially if one does not know what one is looking at, and especially with a management team breathing down your neck.
If you're in a disaster situation, where an immediate restore of a production environment is wanted, it is not a good time for these kinds of exercises.
If you enter by mistake the DB_KEY and DB_ID of the latest CURRENT incarnation (the database you need to restore) and accidentally press [Enter], then immediately start thinking about another job!
Any restore of that particular database by means of RMAN is then no longer possible!




Thursday, September 29, 2011

ORA-27123: Unable to attach a shared memory segment

This morning we had another brainteaser starting one of our databases.

The situation was as follows:

We have a TEST/DEV server with a single Oracle 9.2.0.8 installation on it, installed as user "oracle".
Next to that there are several application users, called appt01, appt02, appt03 and appt04, all using this same ORACLE_HOME for their databases.

All databases were running fine this morning.
We shut down the appt01 database to install a patch.

When trying to restart the database, suddenly this error appeared:

SQL> startup 
ORA-27123: Unable to attach a shared memory segment 
IBM AIX RISC System/6000 Error: 13: Permission denied


Using Google and MOS docs (115753.1 & 167250.1), all information pointed to incorrect file permissions on the "oracle" binary, where the setuid bit has to be set. Although this was plausible, because the patch installation rebuilds this binary, it did not feel like the correct cause. The file permissions looked correct:

oracle@devrsv1% cd $ORACLE_HOME
oracle@devrsv1% ls -l ./bin/oracle
-rwsr-s--x    1 oracle   dba     70518002 Sep 29 12:09 ./bin/oracle

Investigating further, we soon discovered that the other three databases could be stopped and restarted as expected. They showed no problems at all, even though they use the same ORACLE_HOME and settings.

After hours of searching and looking at the system, I finally turned to the OS.
Because the error message showed something about shared memory, I started investigating into that direction.
As user root, we executed:

# ipcs -mob

This command normally shows all active shared memory segments, their sizes and the owners of the segments. In our case, only the output header appeared:


IPC status from /dev/mem as of Thu Sep 29 14:17:01 DFT 2011
T        ID     KEY        MODE       OWNER    GROUP NATTCH     SEGSZ


This showed that something was really wrong with the OS shared memory administration.
The other three databases were running, yet no shared memory segments were shown for them either, nor even the standard OS shared memory pools.

We then decided to restart the complete server, and that solved the problem.
After the restart all databases started as expected, and the shared memory segments were shown correctly again.
So even though the error message usually points in the direction of the setuid bit on the oracle binary, the problem can also be caused by a system-wide shared memory problem, whatever its cause.

Wednesday, September 28, 2011

Database Index administration

I would like to share some PL/SQL code that enables us DBAs to do index maintenance. In our data warehouse there were a couple of procedures that saved very basic index information.
The indexes were then dropped, and after the warehouse load they were recreated.
The pitfall of these simple procedures was that they could not cope with partitioned indexes and other complex structures. These more complex indexes were afterwards rebuilt as plain non-partitioned indexes.

To be able to also rebuild partitioned and other index structures I rewrote the complete procedure, making effective use of the DBMS_METADATA package available in the SYS account.

This DBMS_METADATA package has a function called GET_DDL that is able to return the DDL of any object in the database, like tablespaces, tables, indexes and others.
If, for example, you want the DDL of the EMP table, simply use the following query:

SQL> select dbms_metadata.get_ddl ('TABLE','EMP') from dual;

Of course the above statement needs to be executed as the owner of the table. If you want to execute it as SYS, you must also supply the table owner, like:

SQL> select sys.dbms_metadata.get_ddl('TABLE','EMP','SCOTT') from dual;

The result is a CLOB variable containing the DDL to (re)create the EMP table of user SCOTT:


 CREATE TABLE "SCOTT"."EMP" 
   ( "EMPNO" NUMBER(4,0) NOT NULL ENABLE, 
"ENAME" VARCHAR2(10 CHAR), 
"JOB" VARCHAR2(9 CHAR), 
"MGR" NUMBER(4,0), 
"HIREDATE" DATE, 
"SAL" NUMBER(7,2), 
"COMM" NUMBER(7,2), 
"DEPTNO" NUMBER(2,0), 
CONSTRAINT "EMP_VALID_JOB" CHECK (job in ('CLERK','SALESMAN','MANAGER','ANALYST','PRESIDENT')) ENABLE, 
PRIMARY KEY ("EMPNO")
  USING INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS 
  STORAGE(INITIAL 65536 NEXT 16384 MINEXTENTS 1 MAXEXTENTS 1017
  PCTINCREASE 50 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
  TABLESPACE "SYSTEM"  ENABLE
   ) PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING
  STORAGE(INITIAL 65536 NEXT 16384 MINEXTENTS 1 MAXEXTENTS 1017
  PCTINCREASE 50 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
  TABLESPACE "SYSTEM"



The procedures I wrote use a simple table for administration purposes. This table consists of four columns:
INDEX_NAME of type VARCHAR(30)
TABLE_NAME of type VARCHAR(30)
QUERY of type CLOB
STMT_ORDER of type NUMBER(3)

The first three columns are self-explanatory: they contain the table name and index name of the index being dropped, and of course the query needed to recreate the index afterwards.
The last column, STMT_ORDER, determines the order of the queries. This matters e.g. when creating primary-key indexes with extra constraints: the primary-key index needs to exist before a constraint can be added on top of it.

The DDL structure of this table looks like:

CREATE TABLE WBX_DDL_INDEXES
(
   INDEX_NAME     VARCHAR2 (30 CHAR)
,  TABLE_NAME     VARCHAR2 (30 CHAR)
,  QUERY          CLOB
,  STMT_ORDER     NUMBER (3)
)
LOB (
   QUERY)
   STORE AS
   (
      TABLESPACE USERS
      ENABLE STORAGE IN ROW
      CHUNK 16384
      RETENTION
      NOCACHE LOGGING)
TABLESPACE USERS
LOGGING
NOCOMPRESS
NOCACHE
NOPARALLEL
MONITORING;


CREATE UNIQUE INDEX WBX_DDL_INDEXES_PK
   ON WBX_DDL_INDEXES (INDEX_NAME, TABLE_NAME, STMT_ORDER)
   NOLOGGING
   TABLESPACE TOOLS
   NOPARALLEL;


ALTER TABLE WBX_DDL_INDEXES ADD 
    ( CONSTRAINT WBX_DDL_INDEXES_PK PRIMARY KEY 
      (INDEX_NAME, TABLE_NAME, STMT_ORDER) 
      USING INDEX WBX_DDL_INDEXES_PK ENABLE VALIDATE);

Two procedures were written: the first is responsible for dropping a given index, the second for recreating it. Note that the procedures take a table_name as argument, where one would perhaps expect a table_name/index_name combination.
However, since we use this in a large data warehouse, all indexes of a given table are dropped, so there is no need to specify an index_name.
The "DROP_INDEX" procedure looks like:

  procedure drop_index
    (p_table      in     varchar2
    )
   is
    q clob;
    l_count number(3);


    -- Use a cursor with a complex statement to determine all indexes and their types
    cursor c1(p_table varchar2) 
    is
        select ui.index_name, ui.table_name, ui.index_type, ui.uniqueness, uc.constraint_type
        from user_indexes ui
        left outer join user_constraints uc
        on ui.index_name=uc.index_name 
        and ui.table_name=uc.table_name
        where ui.table_name=p_table
        and ui.index_type <> 'LOB'
        order by uc.constraint_type;
begin
    l_count := 1;
    for r1 in c1(p_table) 
    loop
        if r1.constraint_type = 'P'
        then
            q := sys.dbms_metadata.get_ddl('CONSTRAINT', r1.index_name);
            begin
                insert into wbx_ddl_indexes (index_name, table_name, query, stmt_order)
                values ( r1.index_name, p_table, q, l_count);
                exception
                    when dup_val_on_index then
                         null;
            end;
            -- if the index is a primary key, drop it using an "ALTER TABLE.."
            execute immediate 'alter table '||p_table||' drop primary key';   
        else 
            q := sys.dbms_metadata.get_ddl('INDEX', r1.index_name);
            begin
                insert into wbx_ddl_indexes (index_name, table_name, query, stmt_order)
                values ( r1.index_name, p_table, q, l_count);
                exception
                when dup_val_on_index then
                     null;
            end;
            -- if the index is a normal index, use the "DROP INDEX" statement
            execute immediate 'drop index '||r1.index_name;
        end if;


        l_count := l_count+1;        
        commit;               
     end loop;
end; 
     
The "CREATE_INDEX" procedure looks like:

  procedure create_index
    (p_table      in     varchar2
    )
   is
    q clob;
    exec_q varchar2(32767);
    l_index varchar2(30);
    l_length number(10);
    -- Use a cursor to go through all saved index statements
    cursor c1(l_table varchar2) 
    is 
        select index_name, query,stmt_order
        from wbx_ddl_indexes 
        where table_name=l_table
        order by stmt_order asc;
     
begin
    for r1 in c1(p_table) 
    loop
         begin
             select index_name
             into l_index
             from user_indexes where table_name=p_table and index_name=r1.index_name;
         
             exception 
                when no_data_found then
                -- Index does not exist, create it
                select query into exec_q 
                    from wbx_ddl_indexes
                    where index_name=r1.index_name
                    and table_name=p_table
                    and stmt_order=r1.stmt_order; 
                
                execute immediate exec_q;         
         end;
         
         -- As soon as the index is created, update the administration
         delete
           from   wbx_ddl_indexes  
           where  index_name = r1.index_name 
           and table_name=p_table
           and stmt_order=r1.stmt_order;


        commit;
   end loop;
end;


It would be too much to explain the procedures in detail here; anyone who can read PL/SQL should be able to follow them.
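Usage during a warehouse load can be sketched as follows (SALES_FACT is a hypothetical table name; the sketch assumes the procedures are created in the schema that owns the tables):

```sql
-- Save the DDL of all indexes on the table and drop them
SQL> exec drop_index('SALES_FACT');

-- ... run the warehouse load here ...

-- Recreate the indexes from the saved DDL, in STMT_ORDER order
SQL> exec create_index('SALES_FACT');
```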

Feel free however to post questions or send me an Email.

Thursday, July 7, 2011

Database SAN interactions

I would like to address a problem we are seeing on our systems that is, from the viewpoint of our SAN engineers, totally impossible.


The situation:

We have two Oracle databases.
The first one is an Oracle 10.2.0.4 database, running on a Logical Partition (LPAR) of an IBM pSeries-7 server. The storage of this database is located on an IBM DS6800 SAN box, connected to the server by fibre cards.

The second database is an Oracle 11.1.0.7.4 two-node RAC database, running on two physically separated IBM pSeries-6 servers. The storage of this database is in ASM, but located on the same IBM DS6800 SAN storage server.


So we have two databases, totally physically separated from each other. The only component they share is the SAN box.


Now the problem:

If we put load on database 1 (the data warehouse database), database 2 starts suffering from it. Let's have a look at the graphs to see what I mean:

The load on database 1:
The effect on database 2:


(Please Note that the graphs are not perfectly aligned in this blog)

The effect is thus that if database 1 is especially busy with LGWR ( system I/O ), database 2 starts to show I/O waits, seen as more and more active sessions with the dark-blue I/O color in Grid Control. The effect stops as soon as database 1 reduces (finishes) its LGWR activity.


Another strange effect came from a third database, "database 3". This one runs on yet another server (IBM pSeries-5), but also has its storage on the same DS6800 storage server.
This database, an "old" 9.2.0.8 one, used to do a forced log switch every 15 minutes by means of a crontab script. These forced log switches showed up as spikes in Grid Control for this database (the orange "commit" color).
Again, these spikes reflected themselves as I/O waits on the RAC (database 2).
Even when we changed the execution time in the crontab of database 3, the spikes on database 2 moved accordingly. Because the application running on the RAC, our webshop, suffers badly from these I/O waits, we turned these forced log switches off.


Now, have any of you DBAs or storage guys out there seen this kind of behaviour? The storage guys here at the company can't believe that databases affect each other like this through the SAN. The SAN has a huge cache, it is tuned, etc. etc., so theoretically this kind of behaviour should not be possible.
In my opinion the LGWR "log file sync" actions cause the storage server to wait for a synchronous operation from the database, freezing or delaying all other actions with it.

Any other ideas are welcome. Solutions, of course, are even more welcome!

Thanks



Add-on: as of this week (Oct 5th, 2011), the storage of database 2 has been moved to a newly purchased SAN box, totally isolating it from the other storage (and thus from the other databases).
The interference effect is now gone!


Tuesday, May 3, 2011

Grid Control 11g screens freeze

Last year we installed Oracle Enterprise Manager Grid Control 11g.
The main reason was that we were (still) on version 10.2.0.3 and experienced some troubles with that version. Oracle had also announced the problem with this version regarding the end-of-life of its SSL certificate on Dec 31st, 2010.

The installation went very smoothly. See also my other posts about this subject.

The only problem we encountered with this new version was that the 'Performance Graphs' froze after a restart of one of the monitored databases.
After a database shutdown it is expected that there is no data for the graph to build on, but you would still expect the graph to simply show nothing while continuing to advance in time.

This was not the case.

Even after the target database was restarted, the graph kept showing the status from before the shutdown. Nothing could be done to get past this, except a complete restart of Grid Control.
After such a restart, the graph refreshed and started showing real-time information again.


We opened a Service Request with Oracle Support, and thanks to DITO (Demo It To Oracle) we were able to show Oracle Support clearly what happened: with CamStudio I recorded a complete movie of the problem and sent it to them.
They immediately recognized the problem as bug 10307099 and supplied a patch for it.

This solved the problem.
I was very happy to see how much CamStudio helped me in clearly showing and explaining the problem to Oracle Support. This encouraged me to use it in other cases as well. And it is free!


Monday, March 21, 2011

After node restart RAC cluster does not start

Today we were very busy trying to figure out why, after a system restart, the RAC nodes no longer wanted to start the cluster stack.

The biggest problem was that there was absolutely no logging at all!
The system seemed totally unable to start.

All checks to get the cluster online were, in our opinion, successful.
We were able to create an ocrdump and to query the voting disks.

Finally we started to dig into the processes that were running.
We found a process called:

/etc/init.cssd startcheck

This process seemed to hang, waiting for something.
Looking into the script, we read that this function checks that all needed resources are available and, as long as they are not, sleeps for 60 seconds.
We found that it uses the AIX syslog facility, but our syslog.conf was not configured to catch its messages.

After a while we decided to start a "startcheck" of our own, using the debug option of the Korn shell:

# ksh -x /etc/init.cssd startcheck

This showed that there was some logging in files in /tmp called cssxxxx, where xxxx is a numeric value.

Looking into the most recent of these files showed that the voting disks were missing.
This contradicted the output of

# $CRS_HOME/bin/crsctl query votedisk

which still reported that everything was fine with the voting disks.

The real problem was that the 'crsctl' command merely checks that the device files are present and readable.
Whether disks are actually attached to these device files is not checked.

It turned out that a major network problem we'd had two days earlier had disturbed something on the SAN network, making it impossible to connect the logical drives to this server.


What I really do not like here is the total absence of any logging in the usual CRS logging location, $CRS_HOME/log/.
Perhaps this is handled better in future releases (we are using 11.1.0.7).


Thursday, January 20, 2011

Copying Archived Logfile from ASM to a file system

Following OTN thread http://forums.oracle.com/forums/thread.jspa?threadID=2154801&tstart=0 about copying archived logfiles from ASM to a UNIX filesystem, I am posting a script we used in the past.
Nowadays we use RMAN to get the files from ASM directly into a TSM disk pool, but back then we used another method to back up the archived logfiles, which resulted in the need for this script:


# Script to copy archived logfiles from ASM to a file system on UNIX
# In this case all archived logfiles found in v$archived_log are copied to a staging mount point /opt/asmprd/ar01
# The username/password used to access the database need to be set before use of this script.
# For sake of security I removed these lines from this script
#
# The script needs to be run as owner of the ASM instance
#
#
export ORACLE_SID=WCSPRD
export FSFULL=FALSE
export ORACLE_BASE=/opt/$LOGNAME/ora
export CRS_HOME=/opt/crsprd/ora/11.1.0
export ORA_CRS_HOME=$CRS_HOME
export ASM_HOME=/opt/asmprd/ora/11.1.0
export ORA_ASM_HOME=$ASM_HOME
export ORACLE_HOME=$ORACLE_BASE/11.1.0
export LD_LIBRARY_PATH=$ORACLE_HOME/lib:$CRS_HOME/lib:$ORACLE_HOME/lib32:$CRS_HOME/lib32
export LIBPATH=$LD_LIBRARY_PATH
export PATH=$ORACLE_HOME/bin:$CRS_HOME/bin:$PATH
export TNS_ADMIN=/opt/oraadmin/network/admin
export ORACLE_SID=+ASM
export HOST=`hostname`
export LOG_FILE=/var/adm/log/StageArchives_${HOST}.log
export A_FILE_COPIED=FALSE
#
# Optionally turn Debug on to see what the script does
export Debug=false
#
# Functions
#
Log() {
echo "`date +%Y/%m/%d` (`date +%H:%M:%S`) : $1" | tee -a $LOG_FILE
} #End Log
#
#
Continue()
{
echo "Continue....? [n]\c"; read ans
case $ans in
y|Y) echo "\n\nContinuing...."
break;;
*) exit 0
break;;
esac
unset ans
}
#
#
GetArchiveList()
{
echo "set termout off heading off pages 9999 lines 70 feedback off" > /tmp/getArchives.$$.sql
echo "column name format A70;" >> /tmp/getArchives.$$.sql
echo "spool /tmp/archivelist.lst;" >> /tmp/getArchives.$$.sql
echo "select trim(name) from v\$archived_log where deleted='NO' and archived='YES';" >> /tmp/getArchives.$$.sql
echo "quit;" >> /tmp/getArchives.$$.sql
$ORACLE_HOME/bin/sqlplus -S ${WCSPRD_SYS}@WCSPRD as sysdba @/tmp/getArchives.$$.sql

if [ "$Debug" = "true" ]
then
cat /tmp/getArchives.$$.sql
Continue
fi
rm -f /tmp/getArchives.$$.sql
}


StoreArchivesInFS()
{
ArchName=$1
ArchShortName=`basename $1`
ArchDest=/opt/asmprd/ar01/${ArchShortName}
AvailFS=`df -gv /opt/asmprd/ar01 | grep -v Free| awk '{print $5}' | cut -d'%' -f1`
Log "Archiving $ArchDest"

if [ "$Debug" = "true" ]
then
echo "AvailFS = $AvailFS"
echo "Commando = $ASM_HOME/bin/asmcmd cp $ArchName $ArchDest"
Continue
fi
if [ $AvailFS -lt 90 ]
then
echo "$ASM_HOME/bin/asmcmd cp $ArchName $ArchDest"
$ASM_HOME/bin/asmcmd cp $ArchName $ArchDest >/dev/null 2>&1

if [ -f "$ArchDest" ]
then
export A_FILE_COPIED=TRUE
$ASM_HOME/bin/asmcmd rm $ArchName >/dev/null 2>&1
Log "$ArchName has been deleted from ASM."
else
echo "ArchDest=$ArchDest"
ls -l $ArchDest
fi
else
export FSFULL=TRUE
fi
}


RmanCrosscheck()
{
# Crosscheck and clean repository
#Log "Crosscheck and clean repository"
$ORACLE_HOME/bin/rman target ${WCSPRD_SYS}@WCSPRD rcvcat ${TSMORA_TSMCAT}@tsmora cmdfile StageArchives.cmd
}


# Main Program
PROG_PID=$$
RUNNING=`ps -ef| grep StageArchives.sh | grep -v ${PROG_PID} | grep -v grep | awk '{print $2}'`
if [ -n "$RUNNING" ]
then
Log "Old Running session(s) found with PID(s): $RUNNING "
ps -ef | grep StageArchives.sh | grep -v ${PROG_PID} | grep -v grep >> $LOG_FILE
for tpid in $RUNNING
do
kill $tpid
done
fi
GetArchiveList
cat /tmp/archivelist.lst | while read ARCHFILE
do
if [ ! "$ARCHFILE" = "" ]
then
StoreArchivesInFS $ARCHFILE
if [ "$FSFULL" = "TRUE" ]
then
#
# Insert some action here to clean up the destination file system
:
fi
fi
done
#
#
#

#
# Use Rman to update the recovery catalog
RmanCrosscheck

unset FSFULL ARCHFILE
#rm -f /tmp/archivelist.lst
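The cleanup action left open in the main loop could, for example, remove the oldest staged archive copies from the destination file system. A minimal sketch, assuming a retention of three days and that anything older has already been secured by the RMAN backups (both assumptions; match them to your own backup schedule):

```shell
# Hypothetical cleanup for the "FSFULL" branch: remove staged archive
# copies older than RETENTION_DAYS from the destination file system.
# Both the path and the retention period are assumptions.
ARCH_FS=${ARCH_FS:-/opt/asmprd/ar01}
RETENTION_DAYS=${RETENTION_DAYS:-3}

CleanupArchiveFS()
{
# First list what is about to be removed, then remove it.
find "$ARCH_FS" -type f -mtime +"$RETENTION_DAYS" -print
find "$ARCH_FS" -type f -mtime +"$RETENTION_DAYS" -exec rm -f {} \;
}
```

Only do this when you are certain the staged copies are covered by a backup; otherwise the archived logs are lost.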


The script calls 'rman' with the command file 'StageArchives.cmd'.
This file looks like:

allocate channel for maintenance type disk;
crosscheck archivelog all;
delete noprompt expired archivelog all;
quit;
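A staging script like this is typically scheduled from cron. A sketch of a crontab entry (the schedule and script location are assumptions):

```shell
# Run the staging script every hour at minute 15 (example schedule;
# the path to the script is hypothetical).
# 15 * * * * /opt/asmprd/scripts/StageArchives.sh >/dev/null 2>&1
```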


Please note that I am not able to give any support to anyone who wants to use this script.
You are allowed to copy and use it, but for the remainder you're on your own.


Success!

Tuesday, January 11, 2011

Installing Oracle Agent11g

In this post the installation of Oracle's Management Agent 11g for Grid Control 11g is explained.

The agent installation is no longer an interactive install; Oracle has decided to provide silent installs only.
The available options are:

- Push install from the OMS server ( requires ssh to be configured on the client )
- Pull install from the client ( requires wget to be installed )
- Silent install from an Agent download.

Unfortunately the first two options are not suitable for AIX servers. With a push install from the OMS server, the push installer tries to start an ssh session on the server to be installed, using the ssh binary in a Linux-style path ( /usr/local/bin/ssh ).

The second installation type is also rather Linux-minded: wget, the download tool it relies on, is not available on AIX by default.

The last method is the easiest, because no extra software ( ssh, wget) needs to be installed or configured.
You can download the Agent11g installer from:


After downloading, FTP the agent 11g install file to the client to be installed.
Unzip it into a temporary directory.


Create an installation response file with contents like the example below. A sample configuration file can be found in /aix/response.
The values that need to be changed for your site are discussed in the notes below the file.

####################################################################
## copyright (c) 1999, 2010 Oracle. All rights reserved. ##
## ##
## Specify values for the variables listed below to customize ##
## your installation. ##
## ##
## Each variable is associated with a comment. The comment ##
## identifies the variable type. ##
## ##
## Please specify the values in the following format: ##
## ##
## Type Example ##
## String "Sample Value" ##
## Boolean True or False ##
## Number 1000 ##
## StringList {"String value 1","String Value 2"} ##
## ##
## The values that are given as <Value Required> need to be ##
## specified for a silent installation to be successful. ##
## ##
## ##
## This response file is generated by Oracle Software ##
## Packager. ##
###################################################################

RESPONSEFILE_VERSION=2.2.1.0.0

#################################################################################
#Inputs for Oracle Configuration Manager
#################################################################################
#-------------------------------------------------------------------------------
#SECURITY_UPDATES_VIA_MYORACLESUPPORT: Whether security updates are
# required via My Oracle Support
#DECLINE_SECURITY_UPDATES: Whether security updates should be declined.
#MYORACLESUPPORT_USERNAME & MYORACLESUPPORT_PASSWORD: User name and
# password for My Oracle Support access; these take effect only when
# SECURITY_UPDATES_VIA_MYORACLESUPPORT=true
#COLLECTOR_SUPPORTHUB_URL: If there is no direct connection and a Support
# Hub is deployed, specify its URL here.
#-------------------------------------------------------------------------------
SECURITY_UPDATES_VIA_MYORACLESUPPORT=False
DECLINE_SECURITY_UPDATES=True

################################################################################
#PARAMETERS FOR SOFTWARE UPDATES
################################################################################
#-------------------------------------------------------------------------------
#INSTALL_UPDATES_SELECTION:
# Option 1. If you want to skip the software updates, provide
# INSTALL_UPDATES_SELECTION="skip"
# Option 2. If you have already downloaded the updates then provide
# INSTALL_UPDATES_SELECTION="staged"
# If you choose the Option 2 then make sure you also provide STAGE_LOCATION.
#STAGE_LOCATION: Stage location for software updates. It will be effected
# only when INSTALL_UPDATES_SELECTION is set to "staged"
#-------------------------------------------------------------------------------
INSTALL_UPDATES_SELECTION="skip"
STAGE_LOCATION=

################################################################################
#PROXY DETAILS FOR SECURITY UPDATES/ SOFTWARE UPDATES
################################################################################
#-------------------------------------------------------------------------------
#PROXY_USER: User name for proxy access.
#PROXY_PWD: Password for proxy access.
#PROXY_HOST: Server providing proxy access.
#PROXY_PORT: Port for proxy access.
#------------------------------------------------------------------------------
PROXY_USER=
PROXY_PWD=
PROXY_HOST=
PROXY_PORT=

################################################################################
#Various inputs required for Installation and Configuration
################################################################################
#-------------------------------------------------------------------------------
#ORACLE_AGENT_HOME_LOCATION: The location of oracle agent home.
#Agent home "agent11g" would be created under the directory specified by
#ORACLE_AGENT_HOME_LOCATION.
#-------------------------------------------------------------------------------
ORACLE_AGENT_HOME_LOCATION=/opt/wcst01/ora/

#-------------------------------------------------------------------------------
#DEINSTALL_LIST: List of components to be deinstalled during a
# deinstall session.The following choices are available. The value should
# contain only one of these choices.The choices are of the form
# Internal Name, Version : External name. Please use the internal name and
# version while specifying the value.
# oracle.sysman.top.agent, 10.2.0.0.0 : Oracle Management Agent 10.2.0.0.0
# oracle.sysman.top.em_seed, 10.2.0.0.0 : Oracle Enterprise Manager Repository Database 10.2.0.0.0
# oracle.sysman.top.oms, 10.2.0.0.0 : Oracle Enterprise Manager Grid Console 10.2.0.0.0
#-------------------------------------------------------------------------------
DEINSTALL_LIST={"oracle.sysman.top.agent","11.1.0.1.0"}

#-------------------------------------------------------------------------------
#b_silentInstall: The user should not change this value
#CLUSTER_NODES: This variable represents the cluster node names
# selected by the user for installation.
#-------------------------------------------------------------------------------
b_silentInstall=true
CLUSTER_NODES=vtl1h121ps,vtl1h123ps

#-------------------------------------------------------------------------------
#TOPLEVEL_COMPONENT: The top level component to be installed in the
# current session.Value should contain only one of the following available
# choices.The choices are of the form Internal Name, Version : External name.
# Please use the internal name and version while specifying the value.
# oracle.sysman.top.agent, 10.3.0.0.0 : Oracle Management Agent 10.3.0.0.0
# Example: TOPLEVEL_COMPONENT = {"oracle.sysman.top.agent","10.3.0.0.0"}
#-------------------------------------------------------------------------------
TOPLEVEL_COMPONENT={"oracle.sysman.top.agent","11.1.0.1.0"}

#-------------------------------------------------------------------------------
#SELECTED_LANGUAGES: Languages in which the components will be installed.
# The following choices are available. The value should contain only one of
# these choices.The choices are of the form Internal Name : External name.
# Please use the internal name while specifying the value.
# en, : English
# de, : German
# es, : Latin American Spanish
# es_MX, : Mexican Spanish
# es_ES, : Spanish
# fr, : French
# fr_CA, : Canadian French
# it, : Italian
# iw, : Hebrew
# ja, : Japanese
# ko, : Korean
# pt_BR, : Brazilian Portuguese
# zh_CN, : Simplified Chinese
# zh_TW, : Traditional Chinese
#Example: SELECTED_LANGUAGES = {"en"}
#-------------------------------------------------------------------------------
SELECTED_LANGUAGES={"en"}

#-------------------------------------------------------------------------------
#COMPONENT_LANGUAGES: Languages in which the components will be installed.
# The following choices are available. The value should contain only one of
# these choices.The choices are of the form Internal Name : External name.
# Please use the internal name while specifying the value.
# en, : English
# de, : German
# es, : Latin American Spanish
# es_MX, : Mexican Spanish
# es_ES, : Spanish
# fr, : French
# fr_CA, : Canadian French
# it, : Italian
# iw, : Hebrew
# ja, : Japanese
# ko, : Korean
# pt_BR, : Brazilian Portuguese
# zh_CN, : Simplified Chinese
# zh_TW, : Traditional Chinese
#Example: COMPONENT_LANGUAGES = {"en"}
#-------------------------------------------------------------------------------
COMPONENT_LANGUAGES={"en"}

#-------------------------------------------------------------------------------
#OMS_HOST: OMS host info required to connect to OMS
#OMS_PORT: OMS port info required to connect to OMS
#AGENT_REGISTRATION_PASSWORD: Agent Registration Password needed to
# establish a secure connection to the OMS.
#-------------------------------------------------------------------------------
OMS_HOST=[Your FQN OMS server]
OMS_PORT=1159
#OMS_PORT=4889
AGENT_REGISTRATION_PASSWORD=[Agent registration password entered during OMS installation]

#-------------------------------------------------------------------------------
#s_agentSrvcName: Sets the agent service name; this variable can be
# used to override the agent service name calculated by the installer.
# This is only required on Windows.
# Example:
# s_agentSrvcName = "Oracleagent11gAgent" ; default value
# s_agentSrvcName = "GridAgent" ; User specified value
#-------------------------------------------------------------------------------
#s_agentSrvcName=""

####################################################################################
#Please Don't change the values of these variables
####################################################################################
#-------------------------------------------------------------------------------
#FROM_LOCATION: Complete path to the products.xml.
#b_upgrade: "whether it is Upgrade or not"
#EM_INSTALL_TYPE: install type
#-------------------------------------------------------------------------------
FROM_LOCATION="[Absolute path to products.xml]"
b_upgrade=false
EM_INSTALL_TYPE="AGENT"


Note that the response file above is from an installation where the server to be installed was a two-node RAC. When the cluster node names are provided, the installer installs all nodes in one pass.
When installing a RAC cluster, these options are mandatory; when not, comment out the "CLUSTER_NODES" option.

Also note that the example above contains two entries for OMS_PORT, one of which is commented out. Use port 1159 for SSL communication between the agent and the OMS server; the agent is then secured automatically. Use port 4889 for non-SSL communication.
The ports shown are the defaults. During the OMS installation the port list of the OMS server is displayed; make sure you use the ports that match your OMS server.

The last entry that needs some attention is the one at the bottom of the file: although the sample file contains a relative path here, the installer requires an absolute path to products.xml.
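Summing up, the entries that typically differ per site are the ones below (the OMS host, registration password, and stage path shown here are purely hypothetical examples):

```shell
ORACLE_AGENT_HOME_LOCATION=/opt/wcst01/ora/
CLUSTER_NODES=vtl1h121ps,vtl1h123ps
OMS_HOST=oms.example.com
OMS_PORT=1159
AGENT_REGISTRATION_PASSWORD=MySecret123
FROM_LOCATION="/tmp/agent11g/aix/stage/products.xml"
```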



When ready, start the installer using the command:

cp aix/response/additionalAgent.rsp aix/agent
cd aix/agent
./runInstaller [-CLUSTER_NAME=<clustername>] -silent -responseFile <path to response file> -noconfig


The <clustername> is the name of the RAC cluster.
This name can be obtained by looking in an OCR dump file for the entry:

[SYSTEM.css.clustername]
ORATEXT : <clustername>

How to generate an OCR dump file can be found here.
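Once the dump file exists, the cluster name can be pulled out with standard text tools. A sketch, assuming the dump was written to a plain-text file in the [SYSTEM.css.clustername] / ORATEXT layout shown above (the file path in the example is an assumption):

```shell
# Print the line that follows the [SYSTEM.css.clustername] key
# in an OCR dump file, given as the first argument.
GetClusterName()
{
awk '/\[SYSTEM.css.clustername\]/ { getline; print }' "$1"
}

# Example: GetClusterName /tmp/ocr_dump.txt
```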

When the installation finishes, run the given 'root.sh' script.


After that, the agent needs to be configured on each host ( in a cluster installation, do this host by host ).
Go to the AGENT_HOME/bin directory and execute:

./agentca -f -n <clustername> -c "<node1>,<node2>" -i /etc/oraInst.loc

When installing a non-RAC system, the -n and -c options can be left out.

This finishes the installation of the Oracle Agent 11g on AIX.
It might be necessary to 'upload' the first configuration manually before the agent appears in the OMS console. Use the following command to do so:

./emctl upload agent ( for a RAC do this on each node )