After the Spark environment has been installed and deployed (the deployment steps are omitted here), you can develop Spark programs in Java. First, add the Spark dependencies to the Maven pom.xml:
<dependencies>
    <!-- spark-core: Spark core -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.12</artifactId>
        <version>2.4.8</version>
    </dependency>
    <!-- spark-streaming: Spark stream processing -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_2.12</artifactId>
        <version>2.4.8</version>
    </dependency>
</dependencies>
A minimal demo program that reads a text file and counts the lines containing the letter "h":

package com.demo;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class SparkDemo {
    public static void main(String[] args) {
        String readme = "D:\\spark\\CHANGES.txt";
        SparkConf conf = new SparkConf().setAppName("tiger's first spark app");
        conf.setMaster("local[2]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Read the specified file into an RDD and cache it
        JavaRDD<String> logData = sc.textFile(readme).cache();

        // Keep only the lines containing "h", then count them
        long num = logData.filter(new Function<String, Boolean>() {
            public Boolean call(String s) {
                return s.contains("h");
            }
        }).count();

        System.out.println("the count of lines containing h is " + num);

        sc.stop();
    }
}
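Since spark-core 2.4 runs on Java 8, the anonymous Function class above can also be written as a lambda. This is an equivalent sketch of the same filter-and-count step from SparkDemo:

// Equivalent to the anonymous Function: keep lines containing "h" and count them
long num = logData.filter(s -> s.contains("h")).count();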
The argument to conf.setMaster() tells Spark which cluster manager to connect to. Common values include:

- local: run Spark locally with a single worker thread
- local[N]: run locally with N worker threads (local[2] is used above)
- local[*]: run locally with as many worker threads as there are logical cores
- spark://HOST:PORT: connect to the master of a Spark standalone cluster
- yarn: connect to a YARN cluster, using the Hadoop configuration on the classpath

Rather than hard-coding the master, it is common to pass it in from outside, as sketched below.
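A minimal sketch of that pattern, assuming the master URL is passed as the first command-line argument (this args handling is an illustrative assumption, not part of the original demo):

import org.apache.spark.SparkConf;

public class MasterFromArgs {
    public static void main(String[] args) {
        // Hypothetical convention: the first argument is the master URL,
        // falling back to local[2] for local testing
        String master = args.length > 0 ? args[0] : "local[2]";
        SparkConf conf = new SparkConf()
                .setAppName("tiger's first spark app")
                .setMaster(master);
        System.out.println("using master: " + conf.get("spark.master"));
    }
}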
The test file D:\spark\CHANGES.txt used here contains:

Thing is test hello word test test hello word
Running the program prints the count of matching lines to the console, amid Spark's own log output. Seeing that count confirms the program runs correctly.
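To also see which lines matched rather than just how many, a small hedged variation on the demo (reusing logData from SparkDemo; collect() is only safe here because the test file is tiny) prints them:

// Bring the matching lines back to the driver and print them;
// avoid collect() on large datasets, as it loads everything into driver memory
java.util.List<String> matches = logData.filter(s -> s.contains("h")).collect();
for (String line : matches) {
    System.out.println("matched: " + line);
}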