Apache Oozie - Workflow Scheduler for Hadoop Jobs in the Cluster

Oozie is a workflow scheduler that manages the many jobs running simultaneously in a Hadoop cluster. Oozie workflows are Directed Acyclic Graphs (DAGs) of actions, and Oozie coordinator jobs trigger these workflows based on time (frequency) and data availability. Oozie is integrated with the rest of the Hadoop ecosystem, including MapReduce, Pig, Hive and Sqoop. It also supports system-specific jobs such as Java programs and shell scripts. Oozie monitors the jobs running in the Hadoop cluster and tracks their failures.
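To make the DAG idea concrete, here is a minimal Python sketch that models a workflow as a graph of actions and derives one valid run order. The action names (sqoop-import, pig-clean and so on) are hypothetical, and real Oozie workflows are defined in XML rather than Python. The sketch only illustrates why the graph must be acyclic: a run order in which every action follows its dependencies exists only when there are no cycles.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical workflow: each action maps to the set of actions it depends on.
# These names are illustrative assumptions, not taken from a real Oozie job.
workflow = {
    "sqoop-import": set(),
    "pig-clean": {"sqoop-import"},
    "hive-aggregate": {"pig-clean"},
    "mapreduce-stats": {"pig-clean"},
    "shell-report": {"hive-aggregate", "mapreduce-stats"},
}


def execution_order(dag):
    """Return one order in which the actions can run, dependencies first.

    Raises graphlib.CycleError if the graph contains a cycle, which is why
    a workflow has to be a directed *acyclic* graph.
    """
    return list(TopologicalSorter(dag).static_order())


print(execution_order(workflow))
```

In a real Oozie workflow, parallel branches such as hive-aggregate and mapreduce-stats would be expressed with fork and join control nodes.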

How does Oozie work?

  • Oozie is used to schedule one or more Hadoop jobs on a regular basis. It is mostly used in production environments to schedule recurring jobs. Jobs, or bundles of jobs, are submitted and managed through the Oozie command-line client. Jobs in a workflow are linked by control dependencies: a job cannot start until the job before it has completed. A workflow can also start jobs on a remote system, which notifies the Oozie server once the job is completed.
  • Oozie has control flow nodes and action nodes. Control flow nodes mark the start and end of a workflow and control the execution path within it (for example through decision, fork and join nodes), while action nodes trigger the jobs. Each action node specifies the type of action to perform, such as a MapReduce job or a script.
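  • Oozie was developed as an alternative to the manual, ad-hoc approaches, such as shell scripts and job control, that were previously used to schedule jobs in a workflow. Oozie detects job completion through two mechanisms: callback and polling. When Oozie starts a task, it gives the task a unique callback URL to invoke on completion; if the callback is never invoked (for example, because of a transient network failure), Oozie polls the task for completion instead.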
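The interplay between control flow nodes and action nodes can be sketched as a small state machine. This Python model is a simplification under stated assumptions: the node names and the dictionary format are hypothetical, and each action is a plain function standing in for a Hadoop job. It shows the control dependency (second-job runs only after first-job succeeds) and how an error transition routes the workflow to a kill node, which ends the run as KILLED.

```python
from typing import Callable, Dict, List, Tuple

# A job stand-in: returns True if the (simulated) Hadoop job succeeded.
Action = Callable[[], bool]

# Hypothetical workflow definition. "start", "end" and "kill" are control flow
# nodes; "action" nodes run a job and then follow their "ok" or "error" edge.
WORKFLOW: Dict[str, dict] = {
    "start": {"type": "start", "to": "first-job"},
    "first-job": {"type": "action", "ok": "second-job", "error": "fail"},
    "second-job": {"type": "action", "ok": "end", "error": "fail"},
    "fail": {"type": "kill"},
    "end": {"type": "end"},
}


def run_workflow(nodes: Dict[str, dict], actions: Dict[str, Action]) -> Tuple[str, List[str]]:
    """Walk the workflow from its start node; return (final status, visited nodes)."""
    visited: List[str] = []
    current = nodes["start"]["to"]
    while True:
        visited.append(current)
        node = nodes[current]
        if node["type"] == "end":
            return "SUCCEEDED", visited
        if node["type"] == "kill":
            return "KILLED", visited
        # Action node: the next node is chosen only after this job finishes,
        # so a later job cannot start before the earlier one has completed.
        succeeded = actions[current]()
        current = node["ok"] if succeeded else node["error"]


# Prints ('SUCCEEDED', ['first-job', 'second-job', 'end'])
print(run_workflow(WORKFLOW, {"first-job": lambda: True, "second-job": lambda: True}))
# Prints ('KILLED', ['first-job', 'fail']); second-job never runs
print(run_workflow(WORKFLOW, {"first-job": lambda: False, "second-job": lambda: True}))
```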

Advantages of Oozie

  • Oozie is a scalable and reliable system for monitoring jobs in the Hadoop cluster.
  • Oozie supports various jobs in the Hadoop ecosystem, such as MapReduce, Pig, Hive and streaming jobs, as well as Java-based applications.
  • Oozie has an extensible architecture which supports grid programming paradigms.

Oozie Blogs

Zookeeper and Oozie: Hadoop Workflow and Cluster Managers
Apache Oozie is a Java-based web application used for job scheduling. It combines a multistage Hadoop job into a single job, which can be termed an Oozie job.
Hadoop Components and Architecture: Big Data and Hadoop Training
Oozie is a workflow scheduler in which workflows are expressed as Directed Acyclic Graphs. Oozie runs in a Java servlet container (Tomcat) and uses a database to store the workflow definitions along with all running workflow instances, their states and their variables, which it uses to manage Hadoop jobs (MapReduce, Sqoop, Pig and Hive).

Oozie Tutorials

Fundamentals of Oozie
This introductory tutorial introduces the Oozie web application, a workflow engine developed for the Hadoop framework, and explains how Oozie works using a simple example consisting of two jobs.

Oozie Interview Questions

  1. On what concept does the Hadoop framework work?

    • The Hadoop framework is built on the following two core components:
      1. HDFS
        The Hadoop Distributed File System is the Java-based file system for scalable and reliable storage of large datasets. Data in HDFS is stored in the form of blocks, and HDFS operates on a master-slave architecture.
      2. Hadoop MapReduce
        This is the Java-based programming paradigm of the Hadoop framework that provides scalability across the many servers in a Hadoop cluster. MapReduce distributes the workload into tasks that can run in parallel. A Hadoop job performs two separate tasks: the map job and the reduce job. The map job breaks down the data sets into key-value pairs, or tuples. The reduce job then takes the output of the map job and combines those data tuples into a smaller set of tuples. The reduce job is always performed after the map job is executed.
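The map and reduce steps can be illustrated with a word count, the classic MapReduce example. This single-process Python sketch is only a model: Hadoop runs many map and reduce tasks in parallel across the cluster and performs a shuffle (grouping the map output by key) between the two phases, whereas here everything runs in memory. The function names are illustrative, not part of any Hadoop API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) key-value pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all values emitted for the same key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine the values for one key into a single, smaller result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence (reduce only after map finishes)."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))


# Prints {'hadoop': 2, 'jobs': 1, 'mapreduce': 1, 'oozie': 1, 'runs': 1, 'schedules': 1}
print(word_count(["Hadoop runs MapReduce", "Oozie schedules Hadoop jobs"]))
```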

Oozie Q&A

  1. Oozie service

    • Oozie in Cloudera is still not working. I followed the link shared by Rakesh, but downloading and unzipping ext-2.2.zip in /var/lib/oozie/libext/ still didn't work.
  2. Oozie is not installed in CDH4

    • Oozie is not installed in CDH4. Could you please provide the binaries and installation instructions?
  3. Unable to install Oozie UI

    • I am unable to unzip the zip file in the /var/lib/oozie folder. I downloaded the zip file to the /home/cloudera/Downloads folder, but unzip says it is unable to find the file. Please suggest a solution. Thanks.

Oozie Assignments

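# Install the ExtJS 2.2 library used by the Oozie web console
# (ext-2.2.zip must already be copied into /var/lib/oozie)
cd /var/lib/oozie/

sudo chown oozie:oozie ext-2.2.zip

sudo -u oozie unzip ext-2.2.zip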

# Copy the WordCountTest application to HDFS and submit the workflow
hadoop fs -copyFromLocal WordCountTest /user/cloudera/

oozie job -oozie http://localhost:11000/oozie -config WordCountTest/job.properties -run

# Copy the time-based application to HDFS and submit the coordinator
hadoop fs -copyFromLocal WordCountTest_TimeBased/ /user/cloudera/

oozie job -oozie http://localhost:11000/oozie -config WordCountTest_TimeBased/coordinator.properties -run
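The WordCountTest_TimeBased application is submitted through a coordinator.properties file, which suggests a time-based coordinator. Such a coordinator starts a workflow run at each nominal time between its start and end times. The Python sketch below computes those times under simplifying assumptions: a fixed frequency, an exclusive end time, and hypothetical start and end values. Real coordinators express the frequency with EL functions such as ${coord:hours(1)} and can also wait for input data to become available, which this sketch ignores.

```python
from datetime import datetime, timedelta
from typing import List


def nominal_times(start: datetime, end: datetime, frequency: timedelta) -> List[datetime]:
    """Return start, start + frequency, ... up to but excluding end.

    Assumption: each returned time corresponds to one workflow run that a
    time-based coordinator would materialize.
    """
    if frequency <= timedelta(0):
        raise ValueError("frequency must be positive")
    times = []
    current = start
    while current < end:
        times.append(current)
        current += frequency
    return times


# Hypothetical hourly coordinator running from midnight to 03:00.
for t in nominal_times(datetime(2024, 1, 1, 0, 0), datetime(2024, 1, 1, 3, 0), timedelta(hours=1)):
    print(t.isoformat())  # 2024-01-01T00:00:00, then 01:00:00, then 02:00:00
```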
# Check the status of, kill, or view the log of a workflow job.
# Replace 0000023-140510051402767-oozie-oozi-W with your own workflow job ID.
oozie job -oozie http://localhost:11000/oozie -info 0000023-140510051402767-oozie-oozi-W
oozie job -oozie http://localhost:11000/oozie -kill 0000023-140510051402767-oozie-oozi-W
oozie job -oozie http://localhost:11000/oozie -log 0000023-140510051402767-oozie-oozi-W
oozie job -oozie http://localhost:11000/oozie
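The -info command above reports a job's status once. To wait for a workflow to finish, a script can poll the status repeatedly, much like the polling mechanism Oozie itself uses. This Python sketch assumes that the Oozie Web Services API endpoint /v1/job/<job-id>?show=info, under the same http://localhost:11000/oozie base URL, returns JSON with a status field; verify this against your Oozie version. The wait_for_job helper and its parameters are hypothetical.

```python
import json
import time
import urllib.request
from typing import Callable

OOZIE_URL = "http://localhost:11000/oozie"
# Workflow states after which the job will not change any more.
TERMINAL_STATES = {"SUCCEEDED", "KILLED", "FAILED"}


def fetch_status(job_id: str, oozie_url: str = OOZIE_URL) -> str:
    """Return a workflow job's current status from the Oozie REST API.

    Assumption: GET <oozie_url>/v1/job/<job_id>?show=info returns a JSON
    object with a "status" field.
    """
    url = f"{oozie_url}/v1/job/{job_id}?show=info"
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)["status"]


def wait_for_job(job_id: str, get_status: Callable[[str], str] = fetch_status,
                 interval: float = 10.0, max_polls: int = 360) -> str:
    """Poll a workflow job until it reaches a terminal state and return that state."""
    for _ in range(max_polls):
        status = get_status(job_id)
        if status in TERMINAL_STATES:
            return status
        time.sleep(interval)  # still PREP, RUNNING or SUSPENDED: wait and retry
    raise TimeoutError(f"job {job_id} not finished after {max_polls} polls")
```

For example, wait_for_job("0000023-140510051402767-oozie-oozi-W") would block until that job reaches SUCCEEDED, KILLED or FAILED (replace the ID with your own workflow job ID). Passing a different get_status function makes the polling loop easy to test without a running Oozie server.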