Environment
Similar product: Azkaban
I. Introduction
Features:
Architecture:
II. Installation and Configuration
Oozie configuration
Oozie sharelib
Web console address
Hue UI:
III. Common Client Commands
# Run a job:
[root@node1 oozie]# oozie job -oozie http://ip:11000/oozie/ -config job.properties -run
# Submit a job:
[root@node1 oozie]# oozie job -oozie http://ip:11000/oozie/ -config job.properties -submit
# Start a submitted job:
[root@node1 oozie]# oozie job -oozie http://ip:11000/oozie/ -config job.properties -start 0000003-150713234209387-oozie-oozi-W
# Kill a job:
[root@node1 oozie]# oozie job -oozie http://ip:11000/oozie/ -kill 0000002-150713234209387-oozie-oozi-W
# Check a job's execution status:
[root@node1 oozie]# oozie job -oozie http://ip:11000/oozie/ -config job.properties -info 0000003-150713234209387-oozie-oozi-W
Note: "run" actually combines "submit" and "start"; the two commands are merged into one.
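A few other subcommands are often handy as well. A hedged sketch against the same server address (the job IDs below are placeholders):
# View the logs of a job:
oozie job -oozie http://ip:11000/oozie/ -log 0000003-150713234209387-oozie-oozi-W
# Suspend and later resume a running job:
oozie job -oozie http://ip:11000/oozie/ -suspend 0000003-150713234209387-oozie-oozi-W
oozie job -oozie http://ip:11000/oozie/ -resume 0000003-150713234209387-oozie-oozi-W
# List recent workflow jobs:
oozie jobs -oozie http://ip:11000/oozie/ -len 20
# Check that the Oozie server itself is up:
oozie admin -oozie http://ip:11000/oozie/ -status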
IV. Oozie Job Configuration
Reference: running an MR job through an Oozie workflow in Hue
2. Usage via configuration files
2.2 workflow.xml
Workflow EL / HDFS EL functions
(3) Nodes / B. Action nodes
<decision name="[NODE-NAME]">
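Only the opening tag of the decision node survives above; its full shape is a switch with one or more case predicates and a mandatory default. A sketch with placeholder node names, reusing a predicate from the shell example later in this section:
<decision name="[NODE-NAME]">
    <switch>
        <!-- the first case whose EL predicate is true decides the transition -->
        <case to="[NODE-IF-TRUE]">${wf:actionData('shell-node')['my_output'] eq 'Hello Oozie'}</case>
        <!-- if no case matches, the default transition is taken -->
        <default to="[NODE-OTHERWISE]"/>
    </switch>
</decision>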
V. Examples
1. Oozie Shell
(1) Write job.properties
nameNode=hdfs://master:8020
jobTracker=master:8032
queueName=default
examplesRoot=examples
# Directory containing workflow.xml
oozie.wf.application.path=${nameNode}/user/workflow/oozie/shell
Note: job.properties does not need to be uploaded to HDFS; it is the local Linux path specified with -config when running oozie job ... .
(2) Write workflow.xml
<workflow-app xmlns="uri:oozie:workflow:0.4" name="shell-wf">
    <start to="shell-node"/>
    <action name="shell-node">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <exec>echo</exec>
            <argument>my_output=Hello Oozie</argument>
            <capture-output/>
        </shell>
        <ok to="check-output"/>
        <error to="fail"/>
    </action>
    <decision name="check-output">
        <switch>
            <case to="end">
                ${wf:actionData('shell-node')['my_output'] eq 'Hello Oozie'}
            </case>
            <default to="fail-output"/>
        </switch>
    </decision>
    <kill name="fail">
        <message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <kill name="fail-output">
        <message>Incorrect output, expected [Hello Oozie] but was [${wf:actionData('shell-node')['my_output']}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
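The <capture-output/> element works because the command writes key=value lines to standard output; Oozie parses them as Java properties and exposes them to later nodes through wf:actionData('shell-node'). For instance, a script could emit several values (the names here are illustrative):
#!/bin/bash
# every key=value line on stdout becomes an entry in wf:actionData('shell-node')
echo "my_output=Hello Oozie"
echo "record_count=42"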
Upload the file to the HDFS path hdfs://master:8020/user/workflow/oozie/shell, or create and edit workflow.xml directly in Hue's file browser.
(3) CLI
Run the start command (see the sketch below); it returns a job ID.
View it in the UI:
Click through for details:
View the job DAG
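For reference, launching this example and checking its status from the directory holding job.properties might look like the following (the server address is an assumption):
oozie job -oozie http://master:11000/oozie/ -config job.properties -run
oozie job -oozie http://master:11000/oozie/ -info 0000003-150713234209387-oozie-oozi-W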
2. Oozie FS
(1) Write job.properties
nameNode=hdfs://master:8020
jobTracker=master:8032
queueName=default
examplesRoot=examples
# Use the Oozie system sharelib
oozie.use.system.libpath=true
# Path of workflow.xml
oozie.wf.application.path=${nameNode}/user/examples/apps/fs/workflow.xml
(2) Write workflow.xml
<workflow-app xmlns="uri:oozie:workflow:0.2" name="fs">
    <start to="fs-node"/>
    <action name="fs-node">
        <fs>
            <delete path='/home/kongc/oozie'/>
            <mkdir path='/home/kongc/oozie1'/>
            <move source='/home/kongc/spark-application' target='/home/kongc/oozie1'/>
        </fs>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Map/Reduce failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
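Besides delete, mkdir and move, the fs action can also change permissions with chmod (and, in newer workflow schema versions, create empty files with touchz). A hedged sketch of the chmod form, with a placeholder path:
<fs>
    <!-- set permissions on the directory itself; dir-files='true' would also apply to files inside it -->
    <chmod path='/home/kongc/oozie1' permissions='755' dir-files='false'/>
</fs>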
3. Oozie Sqoop
(1) Write job.properties
nameNode=hdfs://master:8020
jobTracker=master:8032
queueName=default
examplesRoot=examples
# Use the Oozie system sharelib
oozie.use.system.libpath=true
# Directory containing workflow.xml
oozie.wf.application.path=${nameNode}/user/examples/apps/sqoop
Write the configuration file db.hsqldb.properties:
#HSQL Database Engine 1.8.0.5
#Tue Oct 05 11:20:19 SGT 2010
hsqldb.script_format=0
runtime.gc_interval=0
sql.enforce_strict_size=false
hsqldb.cache_size_scale=8
readonly=false
hsqldb.nio_data_file=true
hsqldb.cache_scale=14
version=1.8.0
hsqldb.default_table_type=memory
hsqldb.cache_file_scale=1
hsqldb.log_size=200
modified=no
hsqldb.cache_version=1.7.0
hsqldb.original_version=1.8.0
hsqldb.compatible_version=1.8.0
Write the SQL script db.hsqldb.script:
CREATE SCHEMA PUBLIC AUTHORIZATION DBA
CREATE MEMORY TABLE TT(I INTEGER NOT NULL PRIMARY KEY,S VARCHAR(256))
CREATE USER SA PASSWORD ""
GRANT DBA TO SA
SET WRITE_DELAY 10
SET SCHEMA PUBLIC
INSERT INTO TT VALUES(1,'a')
INSERT INTO TT VALUES(2,'a')
INSERT INTO TT VALUES(3,'a')
(2) Write workflow.xml
<?xml version="1.0" encoding="UTF-8"?>
<workflow-app xmlns="uri:oozie:workflow:0.2" name="sqoop-wf">
    <start to="sqoop-node"/>
    <action name="sqoop-node">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${nameNode}/user/oozie/${examplesRoot}/output-data/sqoop"/>
                <mkdir path="${nameNode}/user/oozie/${examplesRoot}/output-data"/>
            </prepare>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <command>import --connect jdbc:hsqldb:file:db.hsqldb --table TT --target-dir /user/oozie/${examplesRoot}/output-data/sqoop -m 1</command>
            <file>db.hsqldb.properties#db.hsqldb.properties</file>
            <file>db.hsqldb.script#db.hsqldb.script</file>
        </sqoop>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Sqoop failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
4. Oozie Java
(1) Write job.properties
nameNode=hdfs://master:8020
jobTracker=master:8032
queueName=default
examplesRoot=examples
# Use the Oozie system sharelib
oozie.use.system.libpath=true
# Directory containing workflow.xml
oozie.wf.application.path=${nameNode}/user/examples/apps/java-main
(2) Write workflow.xml
<workflow-app xmlns="uri:oozie:workflow:0.2" name="java-main-kc">
    <start to="java-node"/>
    <action name="java-node">
        <java>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <main-class>org.apache.oozie.example.DemoJavaMain</main-class>
            <arg>Hello</arg>
            <arg>Oozie!</arg>
        </java>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Java failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
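The java action supports a few more elements that are often useful: JVM options can be set with <java-opts>, and <capture-output/> lets downstream nodes read properties that the main class writes to the file named by the oozie.action.output.properties system property. A minimal sketch (the option value is illustrative):
<java>
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <main-class>org.apache.oozie.example.DemoJavaMain</main-class>
    <!-- illustrative JVM option -->
    <java-opts>-Xmx512m</java-opts>
    <arg>Hello</arg>
    <capture-output/>
</java>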
5. Oozie Hive
(1) Write job.properties
nameNode=hdfs://master:8020
jobTracker=master:8032
queueName=default
examplesRoot=examples
# Use the Oozie system sharelib
oozie.use.system.libpath=true
# Directory containing workflow.xml
oozie.wf.application.path=${nameNode}/user/examples/apps/hive
(2) Write workflow.xml
<workflow-app xmlns="uri:oozie:workflow:0.5" name="hive2-wf">
    <start to="hive2-node"/>
    <action name="hive2-node">
        <hive2 xmlns="uri:oozie:hive2-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${nameNode}/user/oozie/${examplesRoot}/output-data/hive2"/>
                <mkdir path="${nameNode}/user/oozie/${examplesRoot}/output-data"/>
            </prepare>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <jdbc-url>${jdbcURL}</jdbc-url>
            <script>script.q</script>
            <param>INPUT=/user/oozie/${examplesRoot}/input-data/table</param>
            <param>OUTPUT=/user/oozie/${examplesRoot}/output-data/hive2</param>
        </hive2>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Hive2 (Beeline) action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
(3) Write script.q
INSERT OVERWRITE DIRECTORY '${OUTPUT}' SELECT * FROM test_machine;
Note: the workflow also references ${jdbcURL}, the HiveServer2 JDBC URL, which must be defined in job.properties as well (for example jdbc:hive2://<hiveserver2-host>:10000/default).
6. Oozie Impala
(1) Write job.properties
nameNode=hdfs://master:8020
jobTracker=master:8032
queueName=default
examplesRoot=examples
# Use the Oozie system sharelib
oozie.use.system.libpath=true
# Directory containing workflow.xml
oozie.wf.application.path=${nameNode}/user/examples/apps/impala
EXEC=impala.sh
(2) Write workflow.xml
<workflow-app name="shell-impala" xmlns="uri:oozie:workflow:0.4">
    <start to="shell-impala-invalidate"/>
    <action name="shell-impala-invalidate">
        <shell xmlns="uri:oozie:shell-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <exec>${EXEC}</exec>
            <file>${EXEC}#${EXEC}</file>
        </shell>
        <ok to="end"/>
        <error to="kill"/>
    </action>
    <kill name="kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
(3) impala.sh
#!/bin/bash
impala-shell -i slave2:21000 -q "select count(*) from test_machine"
echo 'Hello Shell'
7. Oozie MapReduce
(1) Write job.properties
nameNode=hdfs://master:8020
jobTracker=master:8032
queueName=default
examplesRoot=examples
# Path of workflow.xml
oozie.wf.application.path=${nameNode}/user/examples/apps/map-reduce/workflow.xml
outputDir=map-reduce
(2) Write workflow.xml
<workflow-app xmlns="uri:oozie:workflow:0.2" name="map-reduce-wyl">
    <start to="mr-node"/>
    <action name="mr-node">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${nameNode}/user/oozie/${examplesRoot}/output-data/${outputDir}"/>
            </prepare>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
                <property>
                    <name>mapred.mapper.class</name>
                    <value>org.apache.oozie.example.SampleMapper</value>
                </property>
                <property>
                    <name>mapred.reducer.class</name>
                    <value>org.apache.oozie.example.SampleReducer</value>
                </property>
                <property>
                    <name>mapred.map.tasks</name>
                    <value>1</value>
                </property>
                <property>
                    <name>mapred.input.dir</name>
                    <value>/user/oozie/${examplesRoot}/input-data/text</value>
                </property>
                <property>
                    <name>mapred.output.dir</name>
                    <value>/user/oozie/${examplesRoot}/output-data/${outputDir}</value>
                </property>
            </configuration>
        </map-reduce>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Map/Reduce failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
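The example above uses the old mapred API classes. If the mapper and reducer are written against the new org.apache.hadoop.mapreduce API, the action configuration needs the new-api switches and the corresponding property names. A hedged sketch of just the relevant properties (the class names are placeholders):
<property>
    <name>mapred.mapper.new-api</name>
    <value>true</value>
</property>
<property>
    <name>mapred.reducer.new-api</name>
    <value>true</value>
</property>
<property>
    <name>mapreduce.job.map.class</name>
    <value>com.example.MyNewApiMapper</value>
</property>
<property>
    <name>mapreduce.job.reduce.class</name>
    <value>com.example.MyNewApiReducer</value>
</property>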
8. Oozie Spark
(1) Write job.properties
nameNode=hdfs://master:8020
jobTracker=master:8032
queueName=default
examplesRoot=examples
# Use the Oozie system sharelib
oozie.use.system.libpath=true
# Directory containing workflow.xml
oozie.wf.application.path=${nameNode}/user/examples/apps/spark
(2) Write workflow.xml
<workflow-app xmlns='uri:oozie:workflow:0.5' name='SparkFileCopy'>
    <start to='spark-node' />
    <action name='spark-node'>
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${nameNode}/user/oozie/${examplesRoot}/output-data/spark"/>
            </prepare>
            <master>${master}</master>
            <name>Spark-FileCopy</name>
            <class>org.apache.oozie.example.SparkFileCopy</class>
            <jar>${nameNode}/user/oozie/${examplesRoot}/apps/spark/lib/oozie-examples.jar</jar>
            <arg>${nameNode}/user/oozie/${examplesRoot}/input-data/text/data.txt</arg>
            <arg>${nameNode}/user/oozie/${examplesRoot}/output-data/spark</arg>
        </spark>
        <ok to="end" />
        <error to="fail" />
    </action>
    <kill name="fail">
        <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name='end' />
</workflow-app>
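Note that this workflow references ${master}, which the job.properties shown above does not define; it must be added there. A hedged example (the right value depends on how Spark is deployed, e.g. local[*] for local testing):
# Spark master for the spark action
master=yarn-cluster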
9. Oozie Coordinator (Scheduled Jobs)
(1) Write job.properties
nameNode=hdfs://master:8020
jobTracker=master:8032
queueName=default
examplesRoot=examples
oozie.coord.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/aggregator/coordinator.xml
start=2019-01-01T01:00Z
end=2019-01-01T03:00Z
(2) Write coordinator.xml
<coordinator-app name="aggregator-coord" frequency="${coord:hours(1)}" start="${start}" end="${end}" timezone="UTC" xmlns="uri:oozie:coordinator:0.2">
    <controls>
        <concurrency>1</concurrency>
    </controls>
    <datasets>
        <dataset name="raw-logs" frequency="${coord:minutes(20)}" initial-instance="2010-01-01T00:00Z" timezone="UTC">
            <uri-template>${nameNode}/user/${coord:user()}/${examplesRoot}/input-data/rawLogs/${YEAR}/${MONTH}/${DAY}/${HOUR}/${MINUTE}</uri-template>
        </dataset>
        <dataset name="aggregated-logs" frequency="${coord:hours(1)}" initial-instance="2010-01-01T01:00Z" timezone="UTC">
            <uri-template>${nameNode}/user/${coord:user()}/${examplesRoot}/output-data/aggregator/aggregatedLogs/${YEAR}/${MONTH}/${DAY}/${HOUR}</uri-template>
        </dataset>
    </datasets>
    <input-events>
        <data-in name="input" dataset="raw-logs">
            <start-instance>${coord:current(-2)}</start-instance>
            <end-instance>${coord:current(0)}</end-instance>
        </data-in>
    </input-events>
    <output-events>
        <data-out name="output" dataset="aggregated-logs">
            <instance>${coord:current(0)}</instance>
        </data-out>
    </output-events>
    <action>
        <workflow>
            <app-path>${nameNode}/user/${coord:user()}/${examplesRoot}/apps/aggregator</app-path>
            <configuration>
                <property>
                    <name>jobTracker</name>
                    <value>${jobTracker}</value>
                </property>
                <property>
                    <name>nameNode</name>
                    <value>${nameNode}</value>
                </property>
                <property>
                    <name>queueName</name>
                    <value>${queueName}</value>
                </property>
                <property>
                    <name>inputData</name>
                    <value>${coord:dataIn('input')}</value>
                </property>
                <property>
                    <name>outputData</name>
                    <value>${coord:dataOut('output')}</value>
                </property>
            </configuration>
        </workflow>
    </action>
</coordinator-app>
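A coordinator job is submitted with the same CLI commands as a workflow; the returned job ID ends in -C, and -info lists the materialized actions. A sketch with a placeholder server address and job ID:
oozie job -oozie http://master:11000/oozie/ -config job.properties -run
oozie job -oozie http://master:11000/oozie/ -info 0000010-150713234209387-oozie-oozi-C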
Notes:
- workflow.xml must be uploaded to the HDFS directory that oozie.wf.application.path in job.properties points to.
- oozie.use.system.libpath=true in job.properties tells Oozie to use the system sharelib.
- oozie.libpath={user.name}/apps/mymr in job.properties can point to the location of the jars exported for an MR job; without it you may get class-not-found errors.
- When Oozie dispatches a job it actually launches a MapReduce launcher job, and the queue name set in workflow.xml is the queue of that launcher job. So if you want the actual work to run in a specific queue, the queue must also be set inside the MR or Hive job itself. A combined job.properties sketch follows after this list.
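Putting the last two notes together, a job.properties for a self-packaged MR job might contain something like the following (the libpath value is a hypothetical HDFS directory holding the job's own jars):
oozie.use.system.libpath=true
# HDFS directory with the job's jars (hypothetical path)
oozie.libpath=${nameNode}/user/${user.name}/apps/mymr/lib
oozie.wf.application.path=${nameNode}/user/${user.name}/apps/mymr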