For example, consider the following code:
package com.github.ralgond.sparkjavaapi.sql;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class Sql {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().getOrCreate();

        // Read a semicolon-separated CSV file with a header row,
        // with schema inference explicitly turned off.
        Dataset<Row> df = spark.read().format("csv")
                .option("sep", ";")
                .option("inferSchema", "false")
                .option("header", "true")
                .load("examples/src/main/resources/people.csv");

        df.show();
        df.printSchema();
    }
}
The schema printed by printSchema() is:
root
 |-- name: string (nullable = true)
 |-- age: string (nullable = true)
 |-- job: string (nullable = true)
After changing .option("inferSchema", "false") to .option("inferSchema", "true"), the schema becomes:
root
 |-- name: string (nullable = true)
 |-- age: integer (nullable = true)
 |-- job: string (nullable = true)
The inferSchema option defaults to false, so unless it is enabled every CSV column is read as a string.
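To make the effect of the option more concrete, here is a minimal plain-Java sketch of what column type inference amounts to conceptually. It is not Spark's actual implementation: the class SchemaInferenceSketch, its methods, and the three-type lattice (integer, double, string) are hypothetical simplifications, and the sample values in main are made up for illustration.

```java
import java.util.List;

// Simplified illustration only: this is NOT Spark's inference code.
// All names here are hypothetical.
class SchemaInferenceSketch {

    // Classify a single raw CSV field as "integer", "double", or "string".
    static String inferFieldType(String raw) {
        String value = raw.trim();
        try {
            Integer.parseInt(value);
            return "integer";
        } catch (NumberFormatException notAnInt) {
            // Not an int; try the next wider type.
        }
        try {
            Double.parseDouble(value);
            return "double";
        } catch (NumberFormatException notADouble) {
            return "string";
        }
    }

    // Pick the narrowest type that fits every value in the column.
    // With inference disabled, every column is simply a string.
    static String inferColumnType(List<String> values, boolean inferSchema) {
        if (!inferSchema || values.isEmpty()) {
            return "string";
        }
        String result = "integer";
        for (String value : values) {
            String fieldType = inferFieldType(value);
            if (fieldType.equals("string")) {
                return "string"; // widest type, no need to look further
            }
            if (fieldType.equals("double")) {
                result = "double"; // widen integer to double
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample column values.
        List<String> ages = List.of("30", "32");
        System.out.println("inferSchema=false: " + inferColumnType(ages, false));
        System.out.println("inferSchema=true:  " + inferColumnType(ages, true));
    }
}
```

The sketch also hints at why inference is off by default: deciding a column's type means examining the values first, which costs an extra pass over the data.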
Now consider the following code, which reads a JSON file instead:
package com.github.ralgond.sparkjavaapi.sql;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class Sql {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().getOrCreate();

        // Read a JSON file, again passing inferSchema=false.
        Dataset<Row> df = spark.read().format("json")
                .option("inferSchema", "false")
                .load("examples/src/main/resources/people.json");

        df.printSchema();
    }
}
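The output is shown below. Even with inferSchema set to false, age comes back as long: when no schema is supplied, the JSON reader infers one from the data regardless of this option.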
root
 |-- age: long (nullable = true)
 |-- name: string (nullable = true)