A DataFrame is Spark's abstraction for row-and-column data that carries a schema. DataFrames are widely used wherever big data is processed with SQL.
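As a minimal sketch of the SQL use case mentioned above, the snippet below registers a DataFrame as a temporary view and queries it with plain SQL. The object name `SqlOverDataFrame`, the helper `namesOlderThan`, the view name `people`, and the sample rows are illustrative assumptions, and the sketch assumes Spark SQL is available on the classpath.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object SqlOverDataFrame {
  // Returns the names of people older than minAge, computed with a SQL query.
  // (Hypothetical helper, for illustration only.)
  def namesOlderThan(spark: SparkSession, minAge: Int): Seq[String] = {
    import spark.implicits._

    // Sample rows (assumed data); toDF attaches the column names of the schema.
    val people: DataFrame = Seq(("min", 20), ("ho", 19), ("zi", 21)).toDF("name", "age")

    // Expose the DataFrame to SQL under a view name.
    people.createOrReplaceTempView("people")

    spark
      .sql(s"SELECT name FROM people WHERE age > $minAge ORDER BY name")
      .as[String]
      .collect()
      .toSeq
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .appName("sql-over-dataframe")
      .master("local[*]")
      .getOrCreate()
    try println(namesOlderThan(spark, 19).mkString(", "))
    finally spark.stop()
  }
}
```

The temporary view lives only for the lifetime of the session, so nothing is written to disk; the same query could also be expressed with DataFrame methods such as `filter` and `select`.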
spark.read
Operation
{"name":"min","age":20}
{"name":"ho", "age":19}
{"name":"zi", "age":21}
Code:
val dfJson = spark.read.format("json").load("/Users/testJson.json")
dfJson.show()
name,age,phone
min,20
ho,19
zi,21
Code:
val dfCsv = spark.read.format("csv").option("header", true).load("/Users/testCsv.csv")
dfCsv.show()
// Parquet files embed their own schema, so the CSV-only "header" option is not needed.
val dfParquet = spark.read.format("parquet").load("/Users/testParquet.parquet")
dfParquet.show()
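import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder()
  .appName("test")
  .master("local")
  .getOrCreate()

// Build a DataFrame from an in-memory sequence of tuples and name its columns.
val df = spark.createDataFrame(Seq(
  ("min", 20),
  ("ho", 19),
  ("zi", 21)
)).toDF("name", "age")
df.show()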
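Because a DataFrame always carries a schema, it can help to see one declared by hand instead of inferred from tuples. The following is a sketch only: the object name `ExplicitSchemaExample`, the helper `build`, and the choice of non-nullable columns are assumptions made for illustration.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object ExplicitSchemaExample {
  // Schema declared by hand: column names, types, and nullability (assumed here).
  val peopleSchema: StructType = StructType(Seq(
    StructField("name", StringType, nullable = false),
    StructField("age", IntegerType, nullable = false)
  ))

  // Builds the same three-row DataFrame as above, but from generic Rows plus a schema.
  def build(spark: SparkSession): DataFrame = {
    val rows = Seq(Row("min", 20), Row("ho", 19), Row("zi", 21))
    spark.createDataFrame(spark.sparkContext.parallelize(rows), peopleSchema)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .appName("explicit-schema")
      .master("local[*]")
      .getOrCreate()
    try {
      val df = build(spark)
      df.printSchema() // prints the column names and types
      df.show()
    } finally spark.stop()
  }
}
```

Declaring the schema up front avoids relying on type inference, which is often useful when the data comes from loosely typed sources such as CSV.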
df.write
Operation