Skip to content

InsertIntoHadoopFsRelationCommand Logical Command

InsertIntoHadoopFsRelationCommand is a logical command that writes the result of executing a query to an output path in the given format.

Creating Instance

InsertIntoHadoopFsRelationCommand takes the following to be created:

InsertIntoHadoopFsRelationCommand is created when:

Static Partitions

type TablePartitionSpec = Map[String, String]
staticPartitions: TablePartitionSpec

InsertIntoHadoopFsRelationCommand is given a specification of a table partition (as a mapping of column names to column values) when created.

Partitions can only be given when created for DataSourceAnalysis posthoc logical resolution rule when executed for a InsertIntoStatement over a LogicalRelation with a HadoopFsRelation

There will be no partitions when created for the following:

Dynamic Partition Inserts and dynamicPartitionOverwrite Flag

dynamicPartitionOverwrite: Boolean

InsertIntoHadoopFsRelationCommand defines a dynamicPartitionOverwrite flag to indicate whether dynamic partition inserts is enabled or not.

dynamicPartitionOverwrite is based on the following (in the order of precedence):

dynamicPartitionOverwrite is used when:

  • DataSourceAnalysis logical resolution rule is executed (for dynamic partition overwrite)
  • InsertIntoHadoopFsRelationCommand is executed

Executing Command

run(
  sparkSession: SparkSession,
  child: SparkPlan): Seq[Row]

run is part of the DataWritingCommand abstraction.


run creates a new Hadoop Configuration with the options and resolves the outputPath.

run uses the following to determine whether partitions are tracked by a catalog (partitionsTrackedByCatalog):

FIXME

When is the catalogTable defined?

FIXME

When is tracksPartitionsInCatalog enabled?

With partitions tracked by a catalog, run...FIXME

run uses FileCommitProtocol utility to instantiate a FileCommitProtocol based on the spark.sql.sources.commitProtocolClass with the following:

For insertion (doInsertion), run...FIXME

Otherwise (for a non-insertion case), run does nothing but prints out the following INFO message to the logs and finishes.

Skipping insertion into a relation that already exists.

run makes sure that there are no duplicates in the outputColumnNames.