
Spark SQL Data Sources: Hive Tables

This article introduces Hive tables as a Spark SQL data source. It should be a useful reference for anyone tackling this problem; if that's you, follow along with the examples below.

Spark SQL data sources (JSON files, Hive tables, Parquet files)

-- For JSON, see article 524

Hive tables

scala> val hivecontext = new org.apache.spark.sql.hive.HiveContext(sc)
warning: one deprecation (since 2.0.0); for details, enable `:setting -deprecation' or `:replay -deprecation'
22/06/24 14:29:08 WARN sql.SparkSession$Builder: Using an existing SparkSession; the static sql configurations will not take effect.
hivecontext: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@7c089fbc
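As the deprecation warning in the output hints, `HiveContext` has been deprecated since Spark 2.0 in favor of `SparkSession`. As a minimal sketch (the app name is hypothetical), the modern equivalent would look like this; in `spark-shell` a Hive-enabled session is usually already available as `spark`:

```scala
import org.apache.spark.sql.SparkSession

// Build (or reuse) a session connected to the Hive metastore.
val spark = SparkSession.builder()
  .appName("HiveTableDemo")   // hypothetical application name
  .enableHiveSupport()        // enables HiveQL and metastore access
  .getOrCreate()

// spark.sql(...) accepts the same statements as hivecontext.sql(...)
```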

scala> hivecontext.sql("CREATE TABLE IF NOT EXISTS Demo(id INT, name STRING, age INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' ")
22/06/24 14:31:36 WARN session.SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory.
res1: org.apache.spark.sql.DataFrame = []
Create the table.

scala> hivecontext.sql("CREATE TABLE IF NOT EXISTS mycdh.Demo(id INT, name STRING, age INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' ")
res5: org.apache.spark.sql.DataFrame = []
The first statement created Demo in the default database; here it is recreated in our own Hive database (mycdh). It is best to drop the default-database copy first to avoid mixing them up.
scala> hivecontext.sql("LOAD DATA INPATH 'hdfs://cdh1:9013/user/hive/employee.txt' INTO TABLE mycdh.Demo")
res12: org.apache.spark.sql.DataFrame = []
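Note that `LOAD DATA INPATH` moves (rather than copies) the file from its HDFS location into the table's warehouse directory, and the file must match the declared row format. For this table, each line of `employee.txt` would be comma-delimited with three fields (id, name, age); the age values below are hypothetical, since the query output only shows id and name:

```text
1201,satish,25
1202,krishna,26
1203,amith,24
```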
scala> val result = hivecontext.sql("FROM mycdh.Demo SELECT id,name")
result: org.apache.spark.sql.DataFrame = [id: int, name: string]

scala> result.show()
+----+--------+                                                                 
|  id|    name|
+----+--------+
|1201|  satish|
|1202| krishna|
|1203|   amith|
|1204|   javed|
|1205|  prudvi|
+----+--------+
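The `FROM ... SELECT` form used above is valid HiveQL. As a sketch, the same projection can also be written with the DataFrame API, assuming a Hive-enabled session (`spark` in `spark-shell`, Spark 2.0+) and the `mycdh.Demo` table created earlier:

```scala
// Load the Hive table as a DataFrame and project two columns.
val result = spark.table("mycdh.Demo").select("id", "name")
result.show()
```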

That concludes this article on Spark SQL Hive table data sources. We hope the articles we recommend are helpful, and we hope you will continue to support 为之网!