The code is as follows:
package com.github.ralgond.sparkjavaapi.sql;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import static org.apache.spark.sql.functions.col;

public class Sql {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().getOrCreate();

        // Load the example JSON file into a DataFrame
        Dataset<Row> df = spark.read().json("examples/src/main/resources/people.json");
        df.show();

        // Attempt numeric addition on the string column "name"
        df.select(col("name").plus(1)).show();
    }
}
Running this code produces:
+----+-------+
| age|   name|
+----+-------+
|null|Michael|
|  30|   Andy|
|  19| Justin|
+----+-------+

+----------+
|(name + 1)|
+----------+
|      null|
|      null|
|      null|
+----------+
When Spark SQL finds that the value being operated on is not numeric, it neither raises an error nor appends the 1 to the string (e.g. producing Andy1); instead it sets the result to null.
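One way to think about this behavior: Spark first casts the operand to a numeric type, and in its lenient (non-ANSI) mode a failed cast yields null, which then propagates through the addition. The following plain-Java sketch mimics that cast-then-add logic; it uses no Spark APIs, and `tryCastToDouble` and `plusOne` are hypothetical helpers for illustration only:

```java
// Illustrative sketch of Spark SQL's lenient cast-then-add semantics.
// tryCastToDouble and plusOne are NOT Spark APIs; they only mimic the behavior.
public class NullPropagationSketch {

    // Mimics CAST(value AS DOUBLE) in lenient mode: an unparseable
    // string yields null instead of raising an error.
    static Double tryCastToDouble(String value) {
        if (value == null) return null;
        try {
            return Double.parseDouble(value);
        } catch (NumberFormatException e) {
            return null; // lenient cast: failure becomes null, not an exception
        }
    }

    // Mimics (col + 1): a null input propagates to a null result.
    static Double plusOne(Double value) {
        return value == null ? null : value + 1;
    }

    public static void main(String[] args) {
        System.out.println(plusOne(tryCastToDouble("Andy"))); // prints null
        System.out.println(plusOne(tryCastToDouble("30")));   // prints 31.0
        System.out.println(plusOne(tryCastToDouble(null)));   // prints null
    }
}
```

This matches the output above: every row of `name` fails the numeric cast, so every row of `(name + 1)` is null.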