Log in to the Oozie Server node:
[root@manager93 ~]# cd /usr/hdp/3.0.1.0-187/oozie/doc
# The examples tarball does not have to be extracted here. If the extracted
# examples cannot run in your environment, delete them manually and restart
# the Oozie component to regenerate them.
[root@manager93 doc]# tar -zxvf oozie-examples.tar.gz
[root@manager93 oozie]# cd doc/examples/apps/
# Back up the spark example directory first
[root@manager93 apps]# cp -r spark spark.bak
[root@manager93 apps]# cd spark
[root@manager93 spark]# ll
total 8
-rw-r--r-- 1 oozie hadoop 1015 Sep 19  2018 job.properties
# The sample Spark jar is kept in the lib directory
drwxr-xr-x 2 oozie hadoop   32 May 19 20:19 lib
-rw-r--r-- 1 oozie hadoop 1920 Sep 19  2018 workflow.xml
# Copy the Spark examples jar that ships with Spark2
[root@manager93 spark]# cp /usr/hdp/3.0.1.0-187/spark2/examples/jars/spark-examples_2.11-2.3.1.3.0.1.0-187.jar lib/
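If the examples ever need to be regenerated as described in the comment above, the Oozie component can be restarted from the Ambari web UI, or via Ambari's REST API. A minimal sketch using the REST API follows; the admin:admin credentials, the Ambari port 8080, and the <cluster_name> placeholder are assumptions you must replace with your own values:

# Stop Oozie (state=INSTALLED), then start it again (state=STARTED)
curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT \
  -d '{"RequestInfo":{"context":"Stop Oozie"},"Body":{"ServiceInfo":{"state":"INSTALLED"}}}' \
  http://manager93.bigdata:8080/api/v1/clusters/<cluster_name>/services/OOZIE
curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT \
  -d '{"RequestInfo":{"context":"Start Oozie"},"Body":{"ServiceInfo":{"state":"STARTED"}}}' \
  http://manager93.bigdata:8080/api/v1/clusters/<cluster_name>/services/OOZIE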
<!-- The name attribute of workflow-app is the job name shown in the Oozie web UI (here: Spark-Oozie-Pi) -->
<workflow-app xmlns='uri:oozie:workflow:0.5' name='Spark-Oozie-Pi'>
    <start to='spark-node' />
    <action name='spark-node'>
        <!-- The spark element holds the Spark job configuration; ${key} maps to entries in job.properties -->
        <spark xmlns="uri:oozie:spark-action:0.1">
            <!-- YARN ResourceManager address -->
            <job-tracker>${jobTracker}</job-tracker>
            <!-- Active HDFS NameNode address -->
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/spark"/>
            </prepare>
            <!-- Spark --master parameter -->
            <master>${master}</master>
            <!-- Spark job name shown in the YARN UI, same as the --name parameter -->
            <name>Spark-Yarn-Pi</name>
            <!-- Spark entry class, same as the --class parameter -->
            <class>org.apache.spark.examples.SparkPi</class>
            <!-- Path to the application jar -->
            <jar>${nameNode}/user/${wf:user()}/${examplesRoot}/apps/spark/lib/spark-examples_2.11-2.3.1.3.0.1.0-187.jar</jar>
            <!-- Extra options passed when submitting the Spark job -->
            <spark-opts>${spark_opts}</spark-opts>
            <!-- Program arguments (commented out here) -->
            <!--
            <arg>${nameNode}/user/${wf:user()}/${examplesRoot}/input-data/text/data.txt</arg>
            <arg>${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/spark</arg>
            -->
        </spark>
        <ok to="end" />
        <error to="fail" />
    </action>
    <kill name="fail">
        <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name='end' />
</workflow-app>
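Before uploading, the workflow definition can be checked against the Oozie XML schema with the oozie CLI. A quick sketch; depending on the Oozie version, validation runs client-side with no server URL, or server-side with -oozie:

[oozie@manager93 spark]$ oozie validate workflow.xml
# or, if your Oozie version validates server-side:
[oozie@manager93 spark]$ oozie validate -oozie http://manager93.bigdata:11000/oozie workflow.xml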
# Active NameNode address
nameNode=hdfs://manager93.bigdata:8020
# YARN ResourceManager address
jobTracker=manager93.bigdata:8050
# Spark submit mode, same as the --master parameter
master=yarn-cluster
queueName=default
examplesRoot=examples
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/spark
spark_opts=--executor-memory 1G --num-executors 3 --executor-cores 2
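For reference, the Spark action defined above corresponds roughly to the following spark-submit invocation. This is only an illustrative sketch of how the workflow.xml elements and job.properties values map onto spark-submit flags, not a command Oozie literally prints:

# Rough mapping (illustrative only):
#   <master>     -> --master yarn --deploy-mode cluster   (from master=yarn-cluster)
#   <name>       -> --name Spark-Yarn-Pi
#   <class>      -> --class org.apache.spark.examples.SparkPi
#   <spark-opts> -> --executor-memory 1G --num-executors 3 --executor-cores 2
spark-submit \
  --master yarn --deploy-mode cluster \
  --name Spark-Yarn-Pi \
  --class org.apache.spark.examples.SparkPi \
  --executor-memory 1G --num-executors 3 --executor-cores 2 \
  hdfs://manager93.bigdata:8020/user/oozie/examples/apps/spark/lib/spark-examples_2.11-2.3.1.3.0.1.0-187.jar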
Create the HDFS directories and upload the example:
[root@manager93 spark]# su hdfs
[hdfs@manager93 spark]$ hadoop fs -mkdir -p /user/oozie/examples/apps
[hdfs@manager93 spark]$ cd /usr/hdp/3.0.1.0-187/oozie/doc/examples/apps/
[hdfs@manager93 apps]$ hadoop fs -put spark /user/oozie/examples/apps/
[hdfs@manager93 apps]$ hadoop fs -ls /user/oozie/examples/apps/spark
Found 3 items
-rw-r--r--   3 hdfs hdfs       1214 2021-06-02 17:22 /user/oozie/examples/apps/spark/job.properties
drwxr-xr-x   - hdfs hdfs          0 2021-06-02 17:22 /user/oozie/examples/apps/spark/lib
-rw-r--r--   3 hdfs hdfs       2648 2021-06-02 17:22 /user/oozie/examples/apps/spark/workflow.xml
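Because job.properties sets oozie.use.system.libpath=true, the Spark action also pulls jars from the Oozie sharelib. If the job later fails with class-not-found errors, it is worth confirming that a spark sharelib is actually registered, for example:

# List the sharelib entries matching "spark"
[oozie@manager93 ~]$ oozie admin -oozie http://manager93.bigdata:11000/oozie -shareliblist spark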
[root@manager93 ~]# cd /usr/hdp/3.0.1.0-187/oozie/doc/examples/apps/spark
[root@manager93 spark]# su oozie
[oozie@manager93 spark]$ oozie job --oozie http://manager93.bigdata:11000/oozie -config ./job.properties -run
job: 0000000-210602114459890-oozie-oozi-W
# View the job log
[oozie@manager93 spark]$ oozie job -oozie http://manager93.bigdata:11000/oozie -log 0000000-210602114459890-oozie-oozi-W
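The workflow's state can also be polled from the CLI while it runs, using the job id returned above:

# Show the overall status and per-action status of the workflow
[oozie@manager93 spark]$ oozie job -oozie http://manager93.bigdata:11000/oozie -info 0000000-210602114459890-oozie-oozi-W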
Because the cluster has Kerberos enabled and Chrome is not configured for Kerberos authentication, the Oozie UI page cannot be opened, so the job is monitored from the YARN UI instead:
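If the YARN UI is equally inconvenient to reach through the browser, the job can be watched from the command line after obtaining a Kerberos ticket. A sketch, where the keytab path, principal, and realm EXAMPLE.COM are assumptions to adjust for your cluster:

# Authenticate first (keytab path and principal are assumptions)
kinit -kt /etc/security/keytabs/oozie.service.keytab oozie/manager93.bigdata@EXAMPLE.COM
# List running applications; the Spark action appears as a YARN application
yarn application -list -appStates RUNNING
# Fetch aggregated logs once the application has finished
yarn logs -applicationId <application_id>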