= SaveAsHiveFile
:spark-version: 2.4.5
:hive-version: 2.3.6
:hadoop-version: 2.10.0
:url-hive-javadoc: https://hive.apache.org/javadocs/r{hive-version}/api
:url-hadoop-docs: https://hadoop.apache.org/docs/r{hadoop-version}
:url-hadoop-javadoc: {url-hadoop-docs}/api
SaveAsHiveFile is an extension of the ../DataWritingCommand.md[DataWritingCommand] contract for <<implementations, logical commands>> that can <<saveAsHiveFile, write the result of executing a structured query to a Hive table>>.
[NOTE]
====
SaveAsHiveFile supports the viewfs:// URI scheme (for <<newVersionExternalTempPath, new-version external temporary paths>>).

Read up on ViewFs in the {url-hadoop-docs}/hadoop-project-dist/hadoop-hdfs/ViewFs.html[Hadoop official documentation].
====
[[implementations]]
.SaveAsHiveFiles
[cols="30,70",options="header",width="100%"]
|===
| SaveAsHiveFile
| Description

| InsertIntoHiveDirCommand.md[InsertIntoHiveDirCommand]
| [[InsertIntoHiveDirCommand]]

| InsertIntoHiveTable.md[InsertIntoHiveTable]
| [[InsertIntoHiveTable]]

|===
=== [[saveAsHiveFile]] saveAsHiveFile Method

[source, scala]
----
saveAsHiveFile(
  sparkSession: SparkSession,
  plan: SparkPlan,
  hadoopConf: Configuration,
  fileSinkConf: FileSinkDesc,
  outputLocation: String,
  customPartitionLocations: Map[TablePartitionSpec, String] = Map.empty,
  partitionAttributes: Seq[Attribute] = Nil): Set[String]
----
saveAsHiveFile sets Hadoop configuration properties when a compressed file output format is used (based on hive.exec.compress.output configuration property).
saveAsHiveFile uses FileCommitProtocol utility to instantiate a committer for the input outputLocation based on the spark.sql.sources.commitProtocolClass configuration property.
saveAsHiveFile uses FileFormatWriter utility to write out the result of executing the input physical operator (with a HiveFileFormat for the input FileSinkDesc, the new FileCommitProtocol committer, and the input arguments).
saveAsHiveFile is used when InsertIntoHiveDirCommand and InsertIntoHiveTable logical commands are executed.
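The first two steps above can be sketched in plain Scala. This is a simplified model, not the actual implementation: `SimpleConf` stands in for Hadoop's `Configuration`, and `pickCommitter` for `FileCommitProtocol.instantiate`; all names here are illustrative only.

```scala
// SimpleConf stands in for Hadoop's Configuration (illustrative only).
final case class SimpleConf(props: Map[String, String] = Map.empty) {
  def get(key: String, default: String): String = props.getOrElse(key, default)
  def set(key: String, value: String): SimpleConf = copy(props + (key -> value))
}

// Step 1: turn on a compression-related Hadoop property when
// hive.exec.compress.output is enabled.
def withCompression(conf: SimpleConf): SimpleConf =
  if (conf.get("hive.exec.compress.output", "false").toBoolean)
    conf.set("mapreduce.output.fileoutputformat.compress", "true")
  else conf

// Step 2: choose the committer class for the output location based on the
// spark.sql.sources.commitProtocolClass property.
def pickCommitter(conf: SimpleConf, outputLocation: String): (String, String) =
  (conf.get(
     "spark.sql.sources.commitProtocolClass",
     "org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol"),
   outputLocation)

// Step 3 (writing out the physical plan with FileFormatWriter) needs a live
// Spark session and is not reproduced here.
val conf = withCompression(SimpleConf(Map("hive.exec.compress.output" -> "true")))
val (committerClass, location) = pickCommitter(conf, "/tmp/hive-output")
```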
=== [[getExternalTmpPath]] getExternalTmpPath Method
[source, scala]
----
getExternalTmpPath(
  sparkSession: SparkSession,
  hadoopConf: Configuration,
  path: Path): Path
----
getExternalTmpPath first finds the Hive version in use: it requests the input ../SparkSession.md[SparkSession] for the ../SharedState.md#externalCatalog[ExternalCatalog] (expected to be a HiveExternalCatalog), which is requested for the underlying HiveClient, which is in turn requested for the HiveClient.md#version[Hive version].
getExternalTmpPath divides (splits) the supported Hive versions into the ones (old versions) that use index.md#hive.exec.scratchdir[hive.exec.scratchdir] directory (0.12.0 to 1.0.0) and the ones (new versions) that use index.md#hive.exec.stagingdir[hive.exec.stagingdir] directory (1.1.0 to 2.3.3).
getExternalTmpPath then creates the external temporary path with <<oldVersionExternalTempPath, oldVersionExternalTempPath>> for the old Hive versions and <<newVersionExternalTempPath, newVersionExternalTempPath>> for the new ones.
getExternalTmpPath throws an IllegalStateException for unsupported Hive versions:

----
Unsupported hive version: [hiveVersion]
----
NOTE: getExternalTmpPath is used when InsertIntoHiveDirCommand.md[InsertIntoHiveDirCommand] and InsertIntoHiveTable.md[InsertIntoHiveTable] logical commands are executed.
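The Hive-version dispatch described above can be sketched as plain Scala. The version lists follow the text (0.12.0 to 1.0.0 use hive.exec.scratchdir, 1.1.0 to 2.3.3 use hive.exec.stagingdir) but are abridged here, and `pickTmpDirProperty` is a made-up helper name for illustration.

```scala
// Old Hive versions use hive.exec.scratchdir, new ones hive.exec.stagingdir
// (abridged lists; see the text for the exact ranges).
val oldHiveVersions = Set("0.12.0", "0.13.0", "0.13.1", "0.14.0", "1.0.0")
val newHiveVersions = Set("1.1.0", "1.2.0", "2.0.0", "2.1.0", "2.2.0", "2.3.0", "2.3.3")

// Pick the configuration property that controls the temporary directory,
// or fail for an unsupported version (with the message quoted above).
def pickTmpDirProperty(hiveVersion: String): String =
  if (oldHiveVersions(hiveVersion)) "hive.exec.scratchdir"
  else if (newHiveVersions(hiveVersion)) "hive.exec.stagingdir"
  else throw new IllegalStateException(s"Unsupported hive version: $hiveVersion")
```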
=== [[deleteExternalTmpPath]] deleteExternalTmpPath Method
[source, scala]
----
deleteExternalTmpPath(
  hadoopConf: Configuration): Unit
----
deleteExternalTmpPath...FIXME
NOTE: deleteExternalTmpPath is used when...FIXME
=== [[oldVersionExternalTempPath]] oldVersionExternalTempPath Internal Method
[source, scala]
----
oldVersionExternalTempPath(
  path: Path,
  hadoopConf: Configuration,
  scratchDir: String): Path
----
oldVersionExternalTempPath...FIXME
NOTE: oldVersionExternalTempPath is used when SaveAsHiveFile is requested to <<getExternalTmpPath, getExternalTmpPath>>.
=== [[newVersionExternalTempPath]] newVersionExternalTempPath Internal Method
[source, scala]
----
newVersionExternalTempPath(
  path: Path,
  hadoopConf: Configuration,
  stagingDir: String): Path
----
newVersionExternalTempPath...FIXME
NOTE: newVersionExternalTempPath is used when SaveAsHiveFile is requested to <<getExternalTmpPath, getExternalTmpPath>>.
=== [[getExtTmpPathRelTo]] getExtTmpPathRelTo Internal Method
[source, scala]
----
getExtTmpPathRelTo(
  path: Path,
  hadoopConf: Configuration,
  stagingDir: String): Path
----
getExtTmpPathRelTo...FIXME
NOTE: getExtTmpPathRelTo is used when SaveAsHiveFile is requested to <<newVersionExternalTempPath, newVersionExternalTempPath>>.
=== [[getExternalScratchDir]] getExternalScratchDir Internal Method
[source, scala]
----
getExternalScratchDir(
  extURI: URI,
  hadoopConf: Configuration,
  stagingDir: String): Path
----
getExternalScratchDir...FIXME
NOTE: getExternalScratchDir is used when SaveAsHiveFile is requested to <<newVersionExternalTempPath, newVersionExternalTempPath>>.
=== [[getStagingDir]] getStagingDir Internal Method
[source, scala]
----
getStagingDir(
  inputPath: Path,
  hadoopConf: Configuration,
  stagingDir: String): Path
----
getStagingDir...FIXME
NOTE: getStagingDir is used when SaveAsHiveFile is requested to <<getExternalScratchDir, getExternalScratchDir>> and <<getExtTmpPathRelTo, getExtTmpPathRelTo>>.
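As a rough illustration of what a staging-directory path could look like, the following hypothetical sketch builds a staging directory next to the input path, named after the configured stagingDir prefix plus a per-query suffix. Plain strings stand in for Hadoop Paths, and both the helper name `stagingPathFor` and the naming scheme are assumptions for illustration only.

```scala
// Build a staging path under the input path: "<inputPath>/<stagingDir><suffix>".
// Hypothetical helper; plain strings stand in for org.apache.hadoop.fs.Path.
def stagingPathFor(inputPath: String, stagingDir: String, suffix: String): String = {
  val parent = inputPath.stripSuffix("/")  // normalize a trailing slash
  s"$parent/$stagingDir$suffix"
}
```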
=== [[executionId]] executionId Internal Method
[source, scala]
----
executionId: String
----
executionId...FIXME
NOTE: executionId is used when...FIXME
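One way to produce such an id is to combine a timestamp with a random number. The sketch below is an assumption for illustration, modeled on Hive's `hive_<timestamp>_<random>` staging-directory naming convention; `newExecutionId` is a made-up name.

```scala
import java.text.SimpleDateFormat
import java.util.{Date, Locale, Random}

// Generate a unique execution id of the shape "hive_<timestamp>_<random>"
// (illustrative; the exact shape is an assumption).
def newExecutionId(): String = {
  val format = new SimpleDateFormat("yyyy-MM-dd_HH-mm-ss_SSS", Locale.US)
  "hive_" + format.format(new Date) + "_" + math.abs(new Random().nextLong())
}
```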
=== [[createdTempDir]] createdTempDir Internal Registry
[source, scala]
----
createdTempDir: Option[Path] = None
----
createdTempDir is a Hadoop {url-hadoop-javadoc}/org/apache/hadoop/fs/Path.html[Path] of a staging directory.
createdTempDir is initialized when SaveAsHiveFile is requested to <<oldVersionExternalTempPath, oldVersionExternalTempPath>> or <<getStagingDir, getStagingDir>>.
The name of the staging directory is controlled by the index.md#hive.exec.stagingdir[hive.exec.stagingdir] configuration property.
createdTempDir is deleted when SaveAsHiveFile is requested to <<deleteExternalTmpPath, deleteExternalTmpPath>> (and at normal termination of the JVM, since deleteOnExit is used).
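The lifecycle can be illustrated with a minimal plain-JVM sketch: remember the staging directory when it is created, register it for deletion at JVM shutdown (here via `java.io.File#deleteOnExit`, where SaveAsHiveFile works with Hadoop's FileSystem), and delete it explicitly on cleanup. The helper names are made up for this example.

```scala
import java.nio.file.{Files, Path}

// Registry of the staging directory (mirrors createdTempDir above).
var createdTempDir: Option[Path] = None

def createStagingDir(): Path = {
  val dir = Files.createTempDirectory("hive-staging")
  dir.toFile.deleteOnExit()   // safety net: removed at normal JVM termination
  createdTempDir = Some(dir)  // remembered for explicit deletion later
  dir
}

def deleteExternalTmpPath(): Unit = {
  createdTempDir.foreach(dir => Files.deleteIfExists(dir))
  createdTempDir = None
}
```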