ImplicitMetadataOperation¶
ImplicitMetadataOperation is an abstraction of operations that can update the metadata of a delta table (while writing out new data).
ImplicitMetadataOperation operations can update the schema of a delta table by merging or overwriting it.
Contract¶
Auto Schema Merging¶
canMergeSchema: Boolean
Controls Auto Schema Merging (evolution)
See:
Used when:
- ImplicitMetadataOperation is requested to update the metadata
canOverwriteSchema¶
canOverwriteSchema: Boolean
Used when:
- ImplicitMetadataOperation is requested to update the metadata
Implementations¶
Updating Metadata¶
updateMetadata(
  spark: SparkSession,
  txn: OptimisticTransaction,
  schema: StructType,
  partitionColumns: Seq[String],
  configuration: Map[String, String],
  isOverwriteMode: Boolean,
  rearrangeOnly: Boolean): Unit
Final Method
updateMetadata is a Scala final method and may not be overridden in subclasses.
Learn more in the Scala Language Specification.
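The shape of the contract and the effect of final can be pictured with a simplified stand-in. This is a sketch, not the Delta Lake source: MetadataOperationSketch, MergingWriter and the describe method are hypothetical names, and only the canMergeSchema and canOverwriteSchema members come from the contract above.

```scala
// Simplified stand-in for the contract; NOT the Delta Lake source code.
trait MetadataOperationSketch {
  // Abstract members that every implementation has to define
  def canMergeSchema: Boolean
  def canOverwriteSchema: Boolean

  // Hypothetical method; `final` prevents implementations from overriding it
  final def describe: String =
    s"canMergeSchema=$canMergeSchema, canOverwriteSchema=$canOverwriteSchema"
}

// Hypothetical implementation that allows schema merging only
class MergingWriter extends MetadataOperationSketch {
  override val canMergeSchema: Boolean = true
  override val canOverwriteSchema: Boolean = false
  // override def describe: String = "..." // would not compile: describe is final
}
```

Implementations only decide on the two flags, while the shared logic stays in one place and cannot be changed by them.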
updateMetadata is used when:
- MergeIntoCommand is executed (with canMergeSchema enabled)
- WriteIntoDelta command is requested to write
- DeltaSink is requested to add a streaming micro-batch
updateMetadata removes column mapping metadata (dropColumnMappingMetadata) from the given schema (that produces dataSchema).
updateMetadata then uses mergeSchema (with the dataSchema and the isOverwriteMode and canOverwriteSchema flags).
updateMetadata uses normalizePartitionColumns to normalize the given partition columns.
updateMetadata branches off based on the following conditions:
- Delta table is just being created
- Overwriting schema is enabled (i.e. isOverwriteMode and canOverwriteSchema flags are enabled, and either the schema is new or partitioning changed)
- Merging schema is enabled (i.e. the schema is new and canMergeSchema is enabled, but the partitioning has not changed)
- Data or Partitioning Schema has changed
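The conditions above can be read as a chain of checks. The following is a minimal sketch under two assumptions that the list does not state: the conditions are tried in the listed order with the first match winning, and nothing happens when none of them holds. All names (MetadataBranch, chooseBranch, NothingToUpdate and the parameter names) are hypothetical and are not Delta Lake APIs.

```scala
// Hypothetical names throughout; only the conditions come from the list above.
sealed trait MetadataBranch
case object TableBeingCreated extends MetadataBranch
case object OverwritingSchema extends MetadataBranch
case object MergingSchema extends MetadataBranch
case object SchemaOrPartitioningChanged extends MetadataBranch
// Assumption: the outcome when none of the four conditions holds
case object NothingToUpdate extends MetadataBranch

object MetadataBranching {
  // Assumption: conditions are checked in the listed order, first match wins
  def chooseBranch(
      tableBeingCreated: Boolean,
      isOverwriteMode: Boolean,
      canOverwriteSchema: Boolean,
      canMergeSchema: Boolean,
      isNewSchema: Boolean,
      isPartitioningChanged: Boolean): MetadataBranch =
    if (tableBeingCreated) TableBeingCreated
    else if (isOverwriteMode && canOverwriteSchema &&
      (isNewSchema || isPartitioningChanged)) OverwritingSchema
    else if (isNewSchema && canMergeSchema && !isPartitioningChanged) MergingSchema
    else if (isNewSchema || isPartitioningChanged) SchemaOrPartitioningChanged
    else NothingToUpdate
}
```

With this reading, a new schema together with changed partitioning never reaches the merging branch, even with canMergeSchema enabled; it falls through to the last condition unless overwriting the schema is allowed.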
Table Being Created¶
updateMetadata creates a new Metadata with the following:
- Uses the value of comment key (in the configuration) for the description
- FIXME
updateMetadata requests the given OptimisticTransaction to updateMetadata.
Overwriting Schema¶
updateMetadata...FIXME
Merging Schema¶
updateMetadata...FIXME
New Data or Partitioning Schema¶
updateMetadata...FIXME
isOverwriteMode¶
updateMetadata is given isOverwriteMode flag as follows:
- Always false for MergeIntoCommand with canMergeSchema enabled
- true for WriteIntoDelta in Overwrite save mode; false otherwise
- true for DeltaSink in Complete output mode; false otherwise
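The rules above amount to a small lookup per caller. The sketch below models them with stand-in types so that it runs without Spark on the classpath: SaveModeSketch and OutputModeSketch merely mimic Spark's SaveMode and OutputMode, and Caller, BatchWrite, StreamingSink and OverwriteModeFlag are hypothetical names that do not exist in Delta Lake.

```scala
// Stand-ins for Spark's SaveMode and OutputMode so the sketch runs without Spark
sealed trait SaveModeSketch
object SaveModeSketch {
  case object Append extends SaveModeSketch
  case object Overwrite extends SaveModeSketch
  case object ErrorIfExists extends SaveModeSketch
  case object Ignore extends SaveModeSketch
}

sealed trait OutputModeSketch
object OutputModeSketch {
  case object Append extends OutputModeSketch
  case object Complete extends OutputModeSketch
  case object Update extends OutputModeSketch
}

// Hypothetical model of the three callers of updateMetadata
sealed trait Caller
case object MergeWithSchemaMerging extends Caller                         // MergeIntoCommand
final case class BatchWrite(mode: SaveModeSketch) extends Caller          // WriteIntoDelta
final case class StreamingSink(outputMode: OutputModeSketch) extends Caller // DeltaSink

object OverwriteModeFlag {
  // Mirrors the three bullet points above
  def isOverwriteMode(caller: Caller): Boolean = caller match {
    case MergeWithSchemaMerging    => false
    case BatchWrite(mode)          => mode == SaveModeSketch.Overwrite
    case StreamingSink(outputMode) => outputMode == OutputModeSketch.Complete
  }
}
```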
rearrangeOnly¶
updateMetadata is given rearrangeOnly flag as follows:
- Always false for MergeIntoCommand with canMergeSchema enabled
- [rearrangeOnly](spark-connector/DeltaWriteOptionsImpl.md#rearrangeOnly) option for WriteIntoDelta
- false for DeltaSink
configuration¶
updateMetadata is given configuration as follows:
- The existing configuration (of the metadata of the transaction) for MergeIntoCommand with canMergeSchema enabled
- configuration of the WriteIntoDelta command (while writing out)
- Always empty for DeltaSink
Normalizing Partition Columns¶
normalizePartitionColumns(
  spark: SparkSession,
  partitionCols: Seq[String],
  schema: StructType): Seq[String]
normalizePartitionColumns...FIXME
mergeSchema¶
mergeSchema(
  txn: OptimisticTransaction,
  dataSchema: StructType,
  isOverwriteMode: Boolean,
  canOverwriteSchema: Boolean): StructType
mergeSchema...FIXME
New DomainMetadatas¶
getNewDomainMetadata(
  txn: OptimisticTransaction,
  canUpdateMetadata: Boolean,
  isReplacingTable: Boolean,
  clusterBySpecOpt: Option[ClusterBySpec] = None): Seq[DomainMetadata]
getNewDomainMetadata is empty (no DomainMetadata) if either of the following holds:
- The given canUpdateMetadata flag is false
- The given isReplacingTable flag is false and the delta table (of the given OptimisticTransaction) exists
canUpdateMetadata flag
The input canUpdateMetadata flag is exactly canUpdateMetadata of the given OptimisticTransaction.
isReplacingTable flag
The input isReplacingTable flag is true when the SaveMode is Overwrite and no replaceWhere option is specified.
For all other cases, getNewDomainMetadata does one of the following:
- When the delta table (of the given OptimisticTransaction) does not exist, getNewDomainMetadata gives a DomainMetadata for the given ClusterBySpec
- Otherwise, getNewDomainMetadata handles domain metadata for replacing a table (with the existing and the new clusteredDomainMetadata)
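The decision described above can be sketched as a function of three flags. This is an illustration only: DomainMetadataSketch, DomainMetadataDecision and the tableExists parameter are hypothetical, and the two steps whose details are not covered here (creating a DomainMetadata for a ClusterBySpec and handling a table replacement) are passed in as functions rather than guessed at.

```scala
// Simplified stand-in; NOT the Delta Lake DomainMetadata type
final case class DomainMetadataSketch(domain: String, configuration: String)

object DomainMetadataDecision {
  def newDomainMetadata(
      canUpdateMetadata: Boolean,
      isReplacingTable: Boolean,
      tableExists: Boolean,
      // Hypothetical hooks for the two steps that are not detailed above
      forNewTable: () => Seq[DomainMetadataSketch],
      forReplacedTable: () => Seq[DomainMetadataSketch]): Seq[DomainMetadataSketch] =
    // Empty if metadata cannot be updated, or an existing table is not being replaced
    if (!canUpdateMetadata || (!isReplacingTable && tableExists)) Seq.empty
    // The table does not exist yet
    else if (!tableExists) forNewTable()
    // The table exists and is being replaced
    else forReplacedTable()
}
```

Passing the two steps as functions keeps the sketch honest about what is known: only the branching comes from the description above.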
getNewDomainMetadata is used when:
- WriteIntoDelta command is requested to write data out
Logging¶
ImplicitMetadataOperation is a Scala trait and logging is configured using the logger of the implementations.