Skip to content

HiveFileFormat

HiveFileFormat is a FileFormat for writing Hive tables.

[[shortName]] HiveFileFormat is a DataSourceRegister and registers itself as hive data source.

NOTE: Hive data source can only be used with tables and you cannot read or write files of Hive data source directly. Use DataFrameReader.table to load from or DataFrameWriter.saveAsTable to write data to a Hive table.

HiveFileFormat is <> exclusively when SaveAsHiveFile is requested to ../hive/SaveAsHiveFile.md#saveAsHiveFile[saveAsHiveFile] (when InsertIntoHiveDirCommand.md[InsertIntoHiveDirCommand] and InsertIntoHiveTable.md[InsertIntoHiveTable] logical commands are executed).

[[fileSinkConf]][[creating-instance]] HiveFileFormat takes a FileSinkDesc when created.

[[inferSchema]] HiveFileFormat throws a UnsupportedOperationException when requested to inferSchema.

inferSchema is not supported for hive data source.

=== [[prepareWrite]] Preparing Write Job -- prepareWrite Method

[source, scala]

prepareWrite( sparkSession: SparkSession, job: Job, options: Map[String, String], dataSchema: StructType): OutputWriterFactory


prepareWrite sets the mapred.output.format.class property to be the getOutputFileFormatClassName of the Hive TableDesc of the <>.

prepareWrite requests the HiveTableUtil helper object to configureJobPropertiesForStorageHandler.

prepareWrite requests the Hive Utilities helper object to copyTableJobPropertiesToConf.

In the end, prepareWrite creates a new OutputWriterFactory that creates a new HiveOutputWriter when requested for a new OutputWriter instance.

prepareWrite is part of the FileFormat abstraction.