Job Failover In RAC

In RAC, if a node crashes, a jobs running on that node can automatically failover to another node provided the job has been mapped to a service by means of job class.

Let’s demonstrate this concept.

Current scenario:

Name of the cluster : cluster01.example.com
No. of nodes : 2 (host01, host02)
Name of the database : orcl

– Create  demo table hr.test 

 SQL> create table hr.test(dt date);

– Create a  database service job_serv with

. preferred instance : orcl1
. available instance : orcl2

[oracle@host01 ~]$ srvctl add service -s job_srv -d orcl -a orcl2 -r orcl1
[oracle@host01 ~]$ srvctl status service -d orcl
Service job_serv is running on instance(s) orcl1
Service job_srv is not running.

preferred node : host01, available node host02

– Create tnsnames.ora entry for service job_serv

job_srv = (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = cluster01-scan.cluster01.example.com)(PORT = 1521))(LOAD_BALANCE = YES)(CONNECT_DATA = (SERVER = DEDICATED)(SERVICE_NAME = job_srv))

–  Start the service and check that service is running on host01 (preferred instance)

[oracle@host01 ~]$ srvctl start service -d orcl
CRS-5702: Resource 'ora.orcl.job_serv.svc' is already running on 'host01'

[oracle@host01 ~]$ srvctl status service -d orcl
Service job_srv is running on instance(s) orcl1

– Create job class JC1 which maps to the service job_serv

SQL>BEGIN
sys.dbms_scheduler.create_job_class(
logging_level => DBMS_SCHEDULER.LOGGING_RUNS,
service => 'job_srv',
job_class_name => '"JC1"');
END;
/

– Create a job JOB1 belonging to job class JC1 which inserts a row in table hr.test every 3  seconds

SQL>BEGIN
sys.dbms_scheduler.create_job(
job_name => '"SYS"."JOB1"',
job_type => 'PLSQL_BLOCK',
job_action => 'begin
insert into hr.test values (sysdate);
end;',
repeat_interval => 'FREQ=SECONDLY;INTERVAL=3',
start_date => systimestamp at time zone 'Asia/Calcutta',
job_class => '"JC1"',
auto_drop => FALSE,
enabled => TRUE);
END;
/

– Check that  job is running i.e. rows are getting inserted  in table hr.test 

SQL> select to_char(dt, 'dd-mon-yyyy hh24:mi:ss') from hr.test;

TO_CHAR(DT,'DD-MON-YYYYHH24:M
-----------------------------
11-jan-2014 04:12:57
11-jan-2014 04:12:47
11-jan-2014 04:12:49

...

11-jan-2014 04:13:27
11-jan-2014 04:13:30
11-jan-2014 04:13:33

– Check that job is running  the on same node as the  service job_serv i.e. host01 

SQL> col job_name for a7
 col status for a15 
select JOB_NAME, status , INSTANCE_ID from ALL_SCHEDULER_JOB_RUN_DETAILS where job_name = 'JOB1';
 JOB_NAM STATUS          INSTANCE_ID 
------- --------------- ----------- 
JOB1    SUCCEEDED                 1 
JOB1    SUCCEEDED                 1 
JOB1    SUCCEEDED                 1
..

– To simulate the crash of host01, kill the database instance on host01

[root@host01 ~]# ps -ef |grep pmon_
grid      5493     1  0 03:43 ?        00:00:00 asm_pmon_+ASM1
oracle    6755     1  0 03:47 ?        00:00:00 ora_pmon_orcl1
root     25051  6336  0 04:17 pts/1    00:00:00 grep pmon_
[root@host01 ~]# kill -9 6755

– Check that job has failed over to host02

[oracle@host01 ~]$ srvctl status service -d orcl
Service job_srv is not running.
[oracle@host01 ~]$ srvctl status service -d orcl
Service job_srv is running on instance(s) orcl2

select JOB_NAME, status , INSTANCE_ID from ALL_SCHEDULER_JOB_RUN_DETAILS where job_name = 'JOB1' ;

JOB_NAM STATUS          INSTANCE_ID
------- --------------- -----------
JOB1    SUCCEEDED                 1
JOB1    SUCCEEDED                 1
JOB1    SUCCEEDED                 1
...

JOB1    SUCCEEDED                 2
JOB1    SUCCEEDED                 2
JOB1    SUCCEEDED                 2

– Check that the job is still running i.e. records are still getting inserted in the table hr.test

SQL> select to_char(dt, 'dd-mon-yyyy hh24:mi:ss') from hr.test;

TO_CHAR(DT,'DD-MON-YYYYHH24:M
-----------------------------
11-jan-2014 04:12:57
11-jan-2014 04:12:47
11-jan-2014 04:12:49

...

11-jan-2014 04:15:27
11-jan-2014 04:15:30
11-jan-2014 04:15:33
I hope this post was useful.
Your comments and sugegstions are always welcome.
————————————————————————————————————————
Related links:

Home
11G R2 RAC Index

11g R2 RAC : Services

 

—————————-

7 thoughts on “Job Failover In RAC

  1. Is a job class required for failover to occur? I have a job that hangs with DBMS_JOB so I implemented DBMS_SCHEDULER with max duration. My sniper job worked in test but in prod the sniper fails with ORA-16639: specified instance inactive or currently unavailable. The job remains “running”. If I log onto the physical node the job was submitted under, Oracle says the request has been sent but the job never stops.

    1. Hello Patrick

      As far as I know, job class is needed for a job to fail over. When you map the job class to the service and the node hosting the job/service crashes, service fails over to another node and in the process all the jobs belonging to the corresponding job class also fail over along with the service.

      Hope it helps

      Anju

  2. getting error ORA-12514: TNS:listener does not currently know of service requested in connect descriptor when connecting through client to service “job_srv”.
    Is my service entry in tnsnames.ora is correct?

    [oracle@racdev1 admin]$ vi tnsnames.ora

    job_srv.world =
    (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = rac-scan.dev.com)(PORT = 1521))
    (CONNECT_DATA = (SERVER = DEDICATED)(SERVICE_NAME = job_srv))
    )

    [oracle@racdev1 admin]$ srvctl config service -d dev -s job_srv

    Service name: job_srv
    Service is enabled
    Server pool: dev_job_srv
    Cardinality: 1
    Disconnect: false
    Service role: PRIMARY
    Management policy: AUTOMATIC
    DTP transaction: false
    AQ HA notifications: false
    Failover type: NONE
    Failover method: NONE
    TAF failover retries: 0
    TAF failover delay: 0
    Connection Load Balancing Goal: LONG
    Runtime Load Balancing Goal: NONE
    TAF policy specification: NONE
    Edition:
    Preferred instances: dev1
    Available instances: dev2

Your comments and suggestions are welcome!