InsertIntoHiveDirCommand Logical Command¶
The examples below use Spark 2.4.5, Hive 2.3.6 and Hadoop 2.10.0.
InsertIntoHiveDirCommand is a SaveAsHiveFile.md[logical command] that writes the result of executing a structured query to a Hadoop DFS location.

InsertIntoHiveDirCommand is created when the HiveAnalysis logical resolution rule resolves an InsertIntoDir logical operator with the Hive format (e.g. INSERT OVERWRITE DIRECTORY ... USING hive statements).
[source, scala]
----
// The example does NOT work when executed
// "Data not in the BIGINT data type range so converted to null"
// It is enough to show the InsertIntoHiveDirCommand operator though
assert(spark.version == "2.4.5")

val tableName = "insert_into_hive_dir_demo"
sql(s"""CREATE TABLE IF NOT EXISTS $tableName (id LONG) USING hive""")

val locationUri = spark.sharedState.externalCatalog
  .getTable("default", tableName)
  .location
  .toString

val q = sql(s"""INSERT OVERWRITE DIRECTORY '$locationUri' USING hive SELECT 1L AS id""")

scala> q.explain(true)
== Parsed Logical Plan ==
'InsertIntoDir false, Storage(Location: hdfs://localhost:9000/user/hive/warehouse/insert_into_hive_dir_demo), hive, true
+- Project [1 AS id#49L]
   +- OneRowRelation

== Analyzed Logical Plan ==
InsertIntoHiveDirCommand false, Storage(Location: hdfs://localhost:9000/user/hive/warehouse/insert_into_hive_dir_demo), true, [id]
+- Project [1 AS id#49L]
   +- OneRowRelation

== Optimized Logical Plan ==
InsertIntoHiveDirCommand false, Storage(Location: hdfs://localhost:9000/user/hive/warehouse/insert_into_hive_dir_demo), true, [id]
+- Project [1 AS id#49L]
   +- OneRowRelation

== Physical Plan ==
Execute InsertIntoHiveDirCommand InsertIntoHiveDirCommand false, Storage(Location: hdfs://localhost:9000/user/hive/warehouse/insert_into_hive_dir_demo), true, [id]
+- *(1) Project [1 AS id#49L]
   +- Scan OneRowRelation[]

// FIXME Why does the following throw an exception?
// spark.table(tableName)
----
Creating Instance¶
InsertIntoHiveDirCommand takes the following to be created:

- [[isLocal]] isLocal flag
- [[storage]] CatalogStorageFormat
- [[query]] Structured query (as a ../spark-sql-LogicalPlan.md[LogicalPlan])
- [[overwrite]] overwrite flag
- [[outputColumnNames]] Names of the output columns
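The shape of the command can be sketched with simplified stand-ins for Spark's classes (all types below are hypothetical mirrors of the five constructor arguments, not the real classes in org.apache.spark.sql.hive.execution):

```scala
import java.net.URI

// Hypothetical, simplified stand-ins for Spark's classes
case class CatalogStorageFormat(locationUri: Option[URI])
sealed trait LogicalPlan
case object OneRowRelation extends LogicalPlan

// Mirrors the five constructor arguments described above
case class InsertIntoHiveDirCommand(
  isLocal: Boolean,
  storage: CatalogStorageFormat,
  query: LogicalPlan,
  overwrite: Boolean,
  outputColumnNames: Seq[String])

val cmd = InsertIntoHiveDirCommand(
  isLocal = false,
  storage = CatalogStorageFormat(
    Some(new URI("hdfs://localhost:9000/user/hive/warehouse/insert_into_hive_dir_demo"))),
  query = OneRowRelation,
  overwrite = true,
  outputColumnNames = Seq("id"))
```

This matches the analyzed plan shown earlier: InsertIntoHiveDirCommand false, Storage(Location: ...), true, [id].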
=== [[run]] Executing Logical Command -- run Method
[source, scala]
----
run(
  sparkSession: SparkSession,
  child: SparkPlan): Seq[Row]
----
NOTE: run is part of the DataWritingCommand contract.
run asserts that the table location of the given CatalogStorageFormat is specified (or throws an AssertionError).
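The location check can be sketched in plain Scala (a hypothetical helper, not Spark's actual code; the real check asserts on CatalogStorageFormat.locationUri):

```scala
import java.net.URI

// Hypothetical sketch: run requires a defined table location
// before writing anything, failing with an AssertionError otherwise
def assertLocationDefined(locationUri: Option[URI]): URI = {
  assert(locationUri.isDefined, "Directory path is required")
  locationUri.get
}
```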
run makes sure that there are no duplicates among the given output columns.
run creates a CatalogTable for the table location (and the VIEW table type) and HiveClientImpl.md#toHiveTable[converts it to a Hive Table metadata].
run sets the serialization.lib metadata property to the serde of the given CatalogStorageFormat, or to LazySimpleSerDe if the serde is not defined.
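The fallback can be sketched as a simple default (the helper below is hypothetical; the serde class name is Hive's actual LazySimpleSerDe, referenced as a string so the sketch has no Hive dependency):

```scala
// Hive's default serde, referenced by fully-qualified class name
val lazySimpleSerDe = "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"

// Hypothetical sketch: use the storage format's serde when defined,
// otherwise fall back to LazySimpleSerDe
def serializationLib(serde: Option[String]): String =
  serde.getOrElse(lazySimpleSerDe)
```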
run creates a new map-reduce job for execution (a Hadoop JobConf) with a new Hadoop Configuration (from the input SparkSession).
run prepares the path to write to (based on the given <<isLocal, isLocal>> flag) with SaveAsHiveFile.md#getExternalTmpPath[getExternalTmpPath].
run then writes the query result out with SaveAsHiveFile.md#saveAsHiveFile[saveAsHiveFile].
In the end, run removes the temporary directory with SaveAsHiveFile.md#deleteExternalTmpPath[deleteExternalTmpPath].
In case of any error (a Throwable), run throws a SparkException:

Failed inserting overwrite directory [locationUri]
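The error handling amounts to wrapping any failure with the destination in the message. A hypothetical sketch (SparkExceptionLike and withErrorWrapping are illustrative stand-ins, not Spark's actual classes):

```scala
// Hypothetical stand-in for org.apache.spark.SparkException
final class SparkExceptionLike(message: String, cause: Throwable)
  extends Exception(message, cause)

// Hypothetical sketch: wrap any Throwable from the write path
// in an exception carrying the destination URI
def withErrorWrapping[T](locationUri: String)(body: => T): T =
  try body
  catch {
    case t: Throwable =>
      throw new SparkExceptionLike(s"Failed inserting overwrite directory $locationUri", t)
  }
```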