== SaveAsHiveFile
:spark-version: 2.4.5
:hive-version: 2.3.6
:hadoop-version: 2.10.0
:url-hive-javadoc: https://hive.apache.org/javadocs/r{hive-version}/api
:url-hadoop-docs: https://hadoop.apache.org/docs/r{hadoop-version}
:url-hadoop-javadoc: {url-hadoop-docs}/api
`SaveAsHiveFile` is an extension of the ../DataWritingCommand.md[DataWritingCommand] contract for <<implementations, logical commands>> that write the result of a structured query out to a Hive-compatible file format.
[NOTE]
====
`SaveAsHiveFile` supports the `viewfs://` URI scheme (for <<newVersionExternalTempPath, new Hive versions>>).

Read up on ViewFs in the {url-hadoop-docs}/hadoop-project-dist/hadoop-hdfs/ViewFs.html[Hadoop official documentation].
====
[[implementations]]
.SaveAsHiveFiles
[cols="30,70",options="header",width="100%"]
|===
| SaveAsHiveFile
| Description

| InsertIntoHiveDirCommand.md[InsertIntoHiveDirCommand]
| [[InsertIntoHiveDirCommand]]

| InsertIntoHiveTable.md[InsertIntoHiveTable]
| [[InsertIntoHiveTable]]
|===
=== [[saveAsHiveFile]] saveAsHiveFile Method

[source, scala]
----
saveAsHiveFile(
  sparkSession: SparkSession,
  plan: SparkPlan,
  hadoopConf: Configuration,
  fileSinkConf: FileSinkDesc,
  outputLocation: String,
  customPartitionLocations: Map[TablePartitionSpec, String] = Map.empty,
  partitionAttributes: Seq[Attribute] = Nil): Set[String]
----
`saveAsHiveFile` sets Hadoop configuration properties when a compressed file output format is used (based on the `hive.exec.compress.output` configuration property).
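The compression handling can be sketched as follows; `hadoopConf` is modeled here as a plain mutable map (a stand-in for Hadoop's `Configuration`, an assumption made only to keep the sketch self-contained):

```scala
import scala.collection.mutable

// Stand-in for Hadoop's Configuration -- a plain property map (assumption,
// not Spark's actual code), pre-loaded with a compression setup.
val hadoopConf = mutable.Map(
  "hive.exec.compress.output" -> "true",
  "mapreduce.output.fileoutputformat.compress.codec" ->
    "org.apache.hadoop.io.compress.SnappyCodec")

// When hive.exec.compress.output is enabled, the MapReduce output-compression
// flag is propagated so the files written out are compressed.
val isCompressed =
  hadoopConf.getOrElse("hive.exec.compress.output", "false").toBoolean
if (isCompressed) {
  hadoopConf("mapreduce.output.fileoutputformat.compress") = "true"
}
```

The property names are the standard MapReduce output-compression keys; the surrounding control flow is illustrative only.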
`saveAsHiveFile` uses the `FileCommitProtocol` utility to instantiate a committer for the input `outputLocation` (based on the spark.sql.sources.commitProtocolClass configuration property).
`saveAsHiveFile` uses the `FileFormatWriter` utility to write out the result of executing the input physical operator (with a `HiveFileFormat` for the input `FileSinkDesc`, the new `FileCommitProtocol` committer, and the input arguments).
NOTE: `saveAsHiveFile` is used when the InsertIntoHiveDirCommand.md[InsertIntoHiveDirCommand] and InsertIntoHiveTable.md[InsertIntoHiveTable] logical commands are executed.
=== [[getExternalTmpPath]] getExternalTmpPath Method

[source, scala]
----
getExternalTmpPath(
  sparkSession: SparkSession,
  hadoopConf: Configuration,
  path: Path): Path
----
`getExternalTmpPath` finds the Hive version used: it requests the input ../SparkSession.md[SparkSession] for the ../SharedState.md#externalCatalog[ExternalCatalog] (that is expected to be a `HiveExternalCatalog`), requests it for the underlying `HiveClient`, and requests the client for the HiveClient.md#version[Hive version].
`getExternalTmpPath` then divides the supported Hive versions into old versions (`0.12.0` to `1.0.0`) that use the index.md#hive.exec.scratchdir[hive.exec.scratchdir] directory and new versions (`1.1.0` to `2.3.3`) that use the index.md#hive.exec.stagingdir[hive.exec.stagingdir] directory.
`getExternalTmpPath` creates the external temporary path accordingly: <<oldVersionExternalTempPath, oldVersionExternalTempPath>> for old Hive versions and <<newVersionExternalTempPath, newVersionExternalTempPath>> for new ones.
`getExternalTmpPath` throws an `IllegalStateException` for an unsupported Hive version:

----
Unsupported hive version: [hiveVersion]
----
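The version dispatch (including the exception for unsupported versions) can be sketched in plain Scala; the version sets and the helper name below are illustrative, derived only from the ranges stated above:

```scala
// Illustrative version sets (assumption: derived from the 0.12.0-1.0.0 and
// 1.1.0-2.3.3 ranges in the text, keyed by major.minor).
val oldVersions: Set[String] = Set("0.12", "0.13", "0.14", "1.0")
val newVersions: Set[String] = Set("1.1", "1.2", "2.0", "2.1", "2.2", "2.3")

// Hypothetical helper: pick the configuration property that names the
// temporary directory for a given Hive version.
def externalTmpDirProperty(hiveVersion: String): String =
  if (oldVersions(hiveVersion)) "hive.exec.scratchdir"
  else if (newVersions(hiveVersion)) "hive.exec.stagingdir"
  else throw new IllegalStateException(s"Unsupported hive version: $hiveVersion")
```

For example, `externalTmpDirProperty("0.13")` selects `hive.exec.scratchdir`, while `externalTmpDirProperty("2.3")` selects `hive.exec.stagingdir`.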
NOTE: `getExternalTmpPath` is used when the InsertIntoHiveDirCommand.md[InsertIntoHiveDirCommand] and InsertIntoHiveTable.md[InsertIntoHiveTable] logical commands are executed.
=== [[deleteExternalTmpPath]] deleteExternalTmpPath Method

[source, scala]
----
deleteExternalTmpPath(
  hadoopConf: Configuration): Unit
----
`deleteExternalTmpPath`...FIXME

NOTE: `deleteExternalTmpPath` is used when...FIXME
=== [[oldVersionExternalTempPath]] oldVersionExternalTempPath Internal Method

[source, scala]
----
oldVersionExternalTempPath(
  path: Path,
  hadoopConf: Configuration,
  scratchDir: String): Path
----
`oldVersionExternalTempPath`...FIXME

NOTE: `oldVersionExternalTempPath` is used when `SaveAsHiveFile` is requested for the <<getExternalTmpPath, external temporary path>> (for old Hive versions).
=== [[newVersionExternalTempPath]] newVersionExternalTempPath Internal Method

[source, scala]
----
newVersionExternalTempPath(
  path: Path,
  hadoopConf: Configuration,
  stagingDir: String): Path
----
`newVersionExternalTempPath`...FIXME

NOTE: `newVersionExternalTempPath` is used when `SaveAsHiveFile` is requested for the <<getExternalTmpPath, external temporary path>> (for new Hive versions).
=== [[getExtTmpPathRelTo]] getExtTmpPathRelTo Internal Method

[source, scala]
----
getExtTmpPathRelTo(
  path: Path,
  hadoopConf: Configuration,
  stagingDir: String): Path
----
`getExtTmpPathRelTo`...FIXME

NOTE: `getExtTmpPathRelTo` is used when `SaveAsHiveFile` is requested for the <<getExternalTmpPath, external temporary path>>.
=== [[getExternalScratchDir]] getExternalScratchDir Internal Method

[source, scala]
----
getExternalScratchDir(
  extURI: URI,
  hadoopConf: Configuration,
  stagingDir: String): Path
----
`getExternalScratchDir`...FIXME

NOTE: `getExternalScratchDir` is used when `SaveAsHiveFile` is requested to <<newVersionExternalTempPath, newVersionExternalTempPath>>.
=== [[getStagingDir]] getStagingDir Internal Method

[source, scala]
----
getStagingDir(
  inputPath: Path,
  hadoopConf: Configuration,
  stagingDir: String): Path
----
`getStagingDir`...FIXME

NOTE: `getStagingDir` is used when `SaveAsHiveFile` is requested to <<getExternalScratchDir, getExternalScratchDir>> and <<getExtTmpPathRelTo, getExtTmpPathRelTo>>.
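Although the method body is not described here (FIXME), a plausible sketch of how a staging-directory name could be derived from the input path and the `hive.exec.stagingdir` name follows; the logic, helper name, and paths are assumptions for illustration, not Spark's actual code:

```scala
// Hypothetical helper (assumption): place the staging directory under the
// input path, or reuse an existing staging prefix, suffixed with a unique
// per-execution identifier.
def stagingPathName(inputPath: String, stagingDir: String, executionId: String): String =
  if (!inputPath.contains(stagingDir))
    s"$inputPath/${stagingDir}_$executionId"
  else
    inputPath.substring(0, inputPath.indexOf(stagingDir) + stagingDir.length) +
      "_" + executionId
```

For example, `stagingPathName("/warehouse/t", ".hive-staging", "hive_x")` yields `/warehouse/t/.hive-staging_hive_x`.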
=== [[executionId]] executionId Internal Method

[source, scala]
----
executionId: String
----
`executionId`...FIXME

NOTE: `executionId` is used when...FIXME
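As a sketch, `executionId` could combine a timestamp with a random number to produce an identifier that is unique per execution; the exact format below is an assumption:

```scala
import java.text.SimpleDateFormat
import java.util.{Date, Locale}
import scala.util.Random

// Hypothetical shape of the identifier (assumption): a "hive_" prefix,
// a millisecond-precision timestamp, and a random suffix.
def executionId: String = {
  val format = new SimpleDateFormat("yyyy-MM-dd_HH-mm-ss_SSS", Locale.US)
  "hive_" + format.format(new Date) + "_" + math.abs(Random.nextLong)
}
```

Such an identifier is what the staging-directory names above would be suffixed with.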
=== [[createdTempDir]] createdTempDir Internal Registry

[source, scala]
----
createdTempDir: Option[Path] = None
----
`createdTempDir` is a Hadoop {url-hadoop-javadoc}/org/apache/hadoop/fs/Path.html[Path] of a staging directory.
`createdTempDir` is initialized when `SaveAsHiveFile` is requested for a <<getStagingDir, staging directory>>. The directory name is based on the index.md#hive.exec.stagingdir[hive.exec.stagingdir] configuration property.
`createdTempDir` is deleted when `SaveAsHiveFile` is requested to <<deleteExternalTmpPath, deleteExternalTmpPath>> (`deleteOnExit` is used).