InsertIntoHadoopFsRelationCommand Logical Command¶
InsertIntoHadoopFsRelationCommand is a logical command that writes the result of executing a query to an output path in the given format.
Creating Instance¶
InsertIntoHadoopFsRelationCommand takes the following to be created:

- Output Path (as a Hadoop Path)
- Static Partitions
- ifPartitionNotExists flag
- Partition Columns (Seq[Attribute])
- BucketSpec (if defined)
- FileFormat
- Options (Map[String, String])
- Query
- SaveMode
- CatalogTable (if available)
- FileIndex (if defined)
- Names of the output columns
InsertIntoHadoopFsRelationCommand is created when:

- OptimizedCreateHiveTableAsSelectCommand logical command is executed
- DataSource is requested to planForWritingFileFormat
- DataSourceAnalysis logical resolution rule is executed (for an InsertIntoStatement over a LogicalRelation with a HadoopFsRelation)
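For illustration, a file-based batch write is planned as an InsertIntoHadoopFsRelationCommand (a hypothetical spark-shell session; the output path is made up):

```scala
// Hypothetical example: writing out a partitioned parquet dataset.
// The write below is planned as InsertIntoHadoopFsRelationCommand.
val df = spark.range(5).selectExpr("id", "id % 2 AS p")
df.write
  .mode("overwrite")
  .partitionBy("p")
  .parquet("/tmp/demo")
// The command shows up in the executed plan of the write, e.g. in the
// SQL tab of the web UI:
// Execute InsertIntoHadoopFsRelationCommand file:/tmp/demo, false, [p], Parquet, ...
```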
Static Partitions¶
```scala
type TablePartitionSpec = Map[String, String]

staticPartitions: TablePartitionSpec
```
InsertIntoHadoopFsRelationCommand is given a specification of a table partition (as a mapping of partition column names to column values) when created.

Static partitions can only be given when InsertIntoHadoopFsRelationCommand is created by the DataSourceAnalysis post-hoc logical resolution rule (for an InsertIntoStatement over a LogicalRelation with a HadoopFsRelation).
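For illustration, a static partition value in an INSERT statement is what ends up in staticPartitions (a hypothetical example; the table name t is made up):

```scala
// Hypothetical example: the static partition value p = '1' below becomes
// staticPartitions = Map("p" -> "1") in the command.
spark.sql("CREATE TABLE t (id BIGINT, p STRING) USING parquet PARTITIONED BY (p)")
spark.sql("INSERT INTO t PARTITION (p = '1') SELECT id FROM range(3)")
```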
There will be no static partitions when InsertIntoHadoopFsRelationCommand is created for the following:

- OptimizedCreateHiveTableAsSelectCommand logical command
- DataSource when requested to planForWritingFileFormat
Dynamic Partition Inserts and dynamicPartitionOverwrite Flag¶
```scala
dynamicPartitionOverwrite: Boolean
```
InsertIntoHadoopFsRelationCommand defines a dynamicPartitionOverwrite flag to indicate whether dynamic partition inserts are enabled or not.
dynamicPartitionOverwrite is based on the following (in order of precedence):

- partitionOverwriteMode option (STATIC or DYNAMIC) in the parameters, if available
- spark.sql.sources.partitionOverwriteMode configuration property
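The precedence above can be sketched as a small function (a simplified sketch, not the actual implementation; the function name is made up):

```scala
// Simplified sketch: the per-write partitionOverwriteMode option wins over
// the session-wide spark.sql.sources.partitionOverwriteMode default.
def resolveOverwriteMode(
    options: Map[String, String],
    sessionDefault: String): String =
  options
    .get("partitionOverwriteMode")
    .map(_.toUpperCase)
    .getOrElse(sessionDefault.toUpperCase)

resolveOverwriteMode(Map("partitionOverwriteMode" -> "dynamic"), "STATIC")
// DYNAMIC
resolveOverwriteMode(Map.empty, "static")
// STATIC
```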
dynamicPartitionOverwrite is used when:

- DataSourceAnalysis logical resolution rule is executed (for dynamic partition overwrite)
- InsertIntoHadoopFsRelationCommand is executed
Executing Command¶
```scala
run(
  sparkSession: SparkSession,
  child: SparkPlan): Seq[Row]
```
run uses the spark.sql.hive.manageFilesourcePartitions configuration property to...FIXME
CAUTION: FIXME When is the catalogTable defined?

CAUTION: FIXME When is tracksPartitionsInCatalog of CatalogTable enabled?
run gets the partitionOverwriteMode option...FIXME
run uses the FileCommitProtocol utility to instantiate a committer based on the spark.sql.sources.commitProtocolClass configuration property, with the outputPath, the dynamicPartitionOverwrite flag, and a random job ID.
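That instantiation can be sketched as follows (a simplified sketch based on the FileCommitProtocol.instantiate utility; treat the exact arguments as an assumption):

```scala
import java.util.UUID
import org.apache.spark.internal.io.FileCommitProtocol

// Sketch: instantiate a committer with the class name from
// spark.sql.sources.commitProtocolClass, a random job ID, the output path,
// and the dynamicPartitionOverwrite flag.
val committer = FileCommitProtocol.instantiate(
  sparkSession.sessionState.conf.fileCommitProtocolClass,
  jobId = UUID.randomUUID().toString,
  outputPath = outputPath.toString,
  dynamicPartitionOverwrite = dynamicPartitionOverwrite)
```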
For insertion, run uses the FileFormatWriter utility to write out the query result and then...FIXME (does some table-specific "tasks").
Otherwise (for the non-insertion case), run simply prints out the following INFO message to the logs and finishes:

Skipping insertion into a relation that already exists.
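For illustration, the non-insertion branch is taken e.g. with SaveMode.Ignore against an output path that already exists (a hypothetical session; the path is made up):

```scala
// Hypothetical example: with SaveMode.Ignore and an existing output path,
// run skips the write and logs the INFO message above.
val df = spark.range(3).toDF("id")
df.write.mode("ignore").parquet("/tmp/existing")  // first write creates the path
df.write.mode("ignore").parquet("/tmp/existing")  // second write is skipped
```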
run makes sure that there are no duplicates in the outputColumnNames.
run is part of the DataWritingCommand abstraction.