Job Failover In RAC

In RAC, if a node crashes, a jobs running on that node can automatically failover to another node provided the job has been mapped to a service by means of job class.

Let’s demonstrate this concept.

Current scenario:

Name of the cluster : cluster01.example.com
No. of nodes : 2 (host01, host02)
Name of the database : orcl

– Create  demo table hr.test 

 SQL> create table hr.test(dt date);

– Create a  database service job_serv with

. preferred instance : orcl1
. available instance : orcl2

[oracle@host01 ~]$ srvctl add service -s job_srv -d orcl -a orcl2 -r orcl1
[oracle@host01 ~]$ srvctl status service -d orcl
Service job_serv is running on instance(s) orcl1
Service job_srv is not running.

preferred node : host01, available node host02

– Create tnsnames.ora entry for service job_serv

job_srv = (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = cluster01-scan.cluster01.example.com)(PORT = 1521))(LOAD_BALANCE = YES)(CONNECT_DATA = (SERVER = DEDICATED)(SERVICE_NAME = job_srv))

–  Start the service and check that service is running on host01 (preferred instance)

[oracle@host01 ~]$ srvctl start service -d orcl
CRS-5702: Resource 'ora.orcl.job_serv.svc' is already running on 'host01'

[oracle@host01 ~]$ srvctl status service -d orcl
Service job_srv is running on instance(s) orcl1

– Create job class JC1 which maps to the service job_serv

SQL>BEGIN
sys.dbms_scheduler.create_job_class(
logging_level => DBMS_SCHEDULER.LOGGING_RUNS,
service => 'job_srv',
job_class_name => '"JC1"');
END;
/

– Create a job JOB1 belonging to job class JC1 which inserts a row in table hr.test every 3  seconds

SQL>BEGIN
sys.dbms_scheduler.create_job(
job_name => '"SYS"."JOB1"',
job_type => 'PLSQL_BLOCK',
job_action => 'begin
insert into hr.test values (sysdate);
end;',
repeat_interval => 'FREQ=SECONDLY;INTERVAL=3',
start_date => systimestamp at time zone 'Asia/Calcutta',
job_class => '"JC1"',
auto_drop => FALSE,
enabled => TRUE);
END;
/

– Check that  job is running i.e. rows are getting inserted  in table hr.test 

SQL> select to_char(dt, 'dd-mon-yyyy hh24:mi:ss') from hr.test;

TO_CHAR(DT,'DD-MON-YYYYHH24:M
-----------------------------
11-jan-2014 04:12:57
11-jan-2014 04:12:47
11-jan-2014 04:12:49

...

11-jan-2014 04:13:27
11-jan-2014 04:13:30
11-jan-2014 04:13:33

– Check that job is running  the on same node as the  service job_serv i.e. host01 

SQL> col job_name for a7
 col status for a15 
select JOB_NAME, status , INSTANCE_ID from ALL_SCHEDULER_JOB_RUN_DETAILS where job_name = 'JOB1';
 JOB_NAM STATUS          INSTANCE_ID 
------- --------------- ----------- 
JOB1    SUCCEEDED                 1 
JOB1    SUCCEEDED                 1 
JOB1    SUCCEEDED                 1
..

– To simulate the crash of host01, kill the database instance on host01

[root@host01 ~]# ps -ef |grep pmon_
grid      5493     1  0 03:43 ?        00:00:00 asm_pmon_+ASM1
oracle    6755     1  0 03:47 ?        00:00:00 ora_pmon_orcl1
root     25051  6336  0 04:17 pts/1    00:00:00 grep pmon_
[root@host01 ~]# kill -9 6755

– Check that job has failed over to host02

[oracle@host01 ~]$ srvctl status service -d orcl
Service job_srv is not running.
[oracle@host01 ~]$ srvctl status service -d orcl
Service job_srv is running on instance(s) orcl2

select JOB_NAME, status , INSTANCE_ID from ALL_SCHEDULER_JOB_RUN_DETAILS where job_name = 'JOB1' ;

JOB_NAM STATUS          INSTANCE_ID
------- --------------- -----------
JOB1    SUCCEEDED                 1
JOB1    SUCCEEDED                 1
JOB1    SUCCEEDED                 1
...

JOB1    SUCCEEDED                 2
JOB1    SUCCEEDED                 2
JOB1    SUCCEEDED                 2

– Check that the job is still running i.e. records are still getting inserted in the table hr.test

SQL> select to_char(dt, 'dd-mon-yyyy hh24:mi:ss') from hr.test;

TO_CHAR(DT,'DD-MON-YYYYHH24:M
-----------------------------
11-jan-2014 04:12:57
11-jan-2014 04:12:47
11-jan-2014 04:12:49

...

11-jan-2014 04:15:27
11-jan-2014 04:15:30
11-jan-2014 04:15:33
I hope this post was useful.
Your comments and sugegstions are always welcome.
————————————————————————————————————————
Related links:

Home
11G R2 RAC Index

11g R2 RAC : Services

 

—————————-

5 thoughts on “Job Failover In RAC

  1. Is a job class required for failover to occur? I have a job that hangs with DBMS_JOB so I implemented DBMS_SCHEDULER with max duration. My sniper job worked in test but in prod the sniper fails with ORA-16639: specified instance inactive or currently unavailable. The job remains “running”. If I log onto the physical node the job was submitted under, Oracle says the request has been sent but the job never stops.

    1. Hello Patrick

      As far as I know, job class is needed for a job to fail over. When you map the job class to the service and the node hosting the job/service crashes, service fails over to another node and in the process all the jobs belonging to the corresponding job class also fail over along with the service.

      Hope it helps

      Anju

Your comments and suggestions are welcome!