ImplicitMetadataOperation¶
ImplicitMetadataOperation is an abstraction of operations that can update the metadata of a delta table (while writing out new data).
ImplicitMetadataOperation operations can update the schema by merging or overwriting it.
Contract¶
Auto Schema Merging¶
canMergeSchema: Boolean
Controls Auto Schema Merging (evolution)
See:
Used when:
- ImplicitMetadataOperation is requested to update the metadata
canOverwriteSchema¶
canOverwriteSchema: Boolean
Used when:
- ImplicitMetadataOperation is requested to update the metadata
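The two contract members above can be pictured as abstract members of a Scala type. The following is a minimal sketch, not the Delta Lake source: the names ImplicitMetadataOperationSketch and AppendOnlyWrite are hypothetical, and both the use of a trait and the protected val modifiers are assumptions made for illustration.

```scala
object ContractSketch {
  // A simplified stand-in for the contract described above.
  // Not the Delta Lake source; the type and its modifiers are assumptions.
  trait ImplicitMetadataOperationSketch {
    // Controls auto schema merging (evolution)
    protected val canMergeSchema: Boolean
    // One of the two flags that must be enabled for overwriting the schema
    protected val canOverwriteSchema: Boolean

    // Exposed only so the example can inspect the flags
    def mergeAllowed: Boolean = canMergeSchema
    def overwriteAllowed: Boolean = canOverwriteSchema
  }

  // A hypothetical implementation that allows merging but not overwriting
  final class AppendOnlyWrite extends ImplicitMetadataOperationSketch {
    override protected val canMergeSchema: Boolean = true
    override protected val canOverwriteSchema: Boolean = false
  }
}
```

An implementation only has to supply the two flags; everything else (such as updating the metadata) would live in the shared abstraction.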
Implementations¶
Updating Metadata¶
updateMetadata(
spark: SparkSession,
txn: OptimisticTransaction,
schema: StructType,
partitionColumns: Seq[String],
configuration: Map[String, String],
isOverwriteMode: Boolean,
rearrangeOnly: Boolean): Unit
Final Method
updateMetadata is a Scala final method and may not be overridden in subclasses.
Learn more in the Scala Language Specification.
updateMetadata is used when:
- MergeIntoCommand is executed (with canMergeSchema enabled)
- WriteIntoDelta command is requested to write
- DeltaSink is requested to add a streaming micro-batch
updateMetadata uses dropColumnMappingMetadata to drop the column mapping metadata from the given schema (that produces dataSchema).
updateMetadata uses mergeSchema (with the dataSchema and the isOverwriteMode and canOverwriteSchema flags).
updateMetadata uses normalizePartitionColumns to normalize the given partition columns.
updateMetadata branches off based on the following conditions:
- Delta table is just being created
- Overwriting schema is enabled (i.e. isOverwriteMode and canOverwriteSchema flags are enabled, and either the schema is new or partitioning changed)
- Merging schema is enabled (i.e. the schema is new and canMergeSchema is enabled, but the partitioning has not changed)
- Data or Partitioning Schema has changed
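The branch selection above can be sketched as a pure function over a handful of booleans. This is a simplified model and not the Delta Lake code: every name in it (UpdateMetadataBranchesSketch, Inputs, chooseBranch, the Branch cases) is hypothetical. It also assumes that the conditions are checked in the order listed and that nothing is updated when none of them holds (the NoChange case).

```scala
object UpdateMetadataBranchesSketch {
  // Possible outcomes; NoChange is an assumption for the "no condition holds" case
  sealed trait Branch
  case object TableBeingCreated extends Branch
  case object OverwriteSchema extends Branch
  case object MergeSchema extends Branch
  case object SchemaOrPartitioningChanged extends Branch
  case object NoChange extends Branch

  // Hypothetical bundle of the facts the conditions refer to
  final case class Inputs(
      tableExists: Boolean,
      isNewSchema: Boolean,
      isPartitioningChanged: Boolean,
      isOverwriteMode: Boolean,
      canOverwriteSchema: Boolean,
      canMergeSchema: Boolean)

  // First matching condition wins (ordering is an assumption)
  def chooseBranch(in: Inputs): Branch = {
    val changed = in.isNewSchema || in.isPartitioningChanged
    if (!in.tableExists) TableBeingCreated
    else if (in.isOverwriteMode && in.canOverwriteSchema && changed) OverwriteSchema
    else if (in.isNewSchema && in.canMergeSchema && !in.isPartitioningChanged) MergeSchema
    else if (changed) SchemaOrPartitioningChanged
    else NoChange
  }
}
```

With an existing table, a new schema, unchanged partitioning and only canMergeSchema enabled, chooseBranch selects MergeSchema; turning canMergeSchema off moves the same inputs to the last listed branch.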
Table Being Created¶
updateMetadata creates a new Metadata with the following:
- Uses the value of comment key (in the configuration) for the description
- FIXME
updateMetadata requests the given OptimisticTransaction to updateMetadata.
Overwriting Schema¶
updateMetadata
...FIXME
Merging Schema¶
updateMetadata
...FIXME
New Data or Partitioning Schema¶
updateMetadata
...FIXME
isOverwriteMode¶
updateMetadata is given isOverwriteMode flag as follows:
- Only false for MergeIntoCommand with canMergeSchema enabled
- true for WriteIntoDelta in Overwrite save mode; false otherwise
- true for DeltaSink in Complete output mode; false otherwise
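The rules above can be sketched as a single pattern match over the caller. This is a toy model only: SaveMode and OutputMode here are simplified stand-ins (not Spark's types), and Caller, IsOverwriteModeSketch and the isOverwriteMode function are hypothetical names introduced for illustration.

```scala
object IsOverwriteModeSketch {
  // Simplified stand-ins for Spark's save and output modes (hypothetical)
  sealed trait SaveMode
  case object Append extends SaveMode
  case object Overwrite extends SaveMode

  sealed trait OutputMode
  case object AppendOutput extends OutputMode
  case object Complete extends OutputMode

  // The three callers of updateMetadata named in the text
  sealed trait Caller
  case object MergeIntoCommand extends Caller
  final case class WriteIntoDelta(mode: SaveMode) extends Caller
  final case class DeltaSink(outputMode: OutputMode) extends Caller

  // Mirrors the list above: one case per caller
  def isOverwriteMode(caller: Caller): Boolean = caller match {
    case MergeIntoCommand      => false
    case WriteIntoDelta(mode)  => mode == Overwrite
    case DeltaSink(outputMode) => outputMode == Complete
  }
}
```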
rearrangeOnly¶
updateMetadata is given rearrangeOnly flag as follows:
- Only false for MergeIntoCommand with canMergeSchema enabled
- rearrangeOnly option (of DeltaWriteOptionsImpl) for WriteIntoDelta
- false for DeltaSink
configuration¶
updateMetadata is given configuration as follows:
- The existing configuration (of the metadata of the transaction) for MergeIntoCommand with canMergeSchema enabled
- configuration of the WriteIntoDelta command (while writing out)
- Always empty for DeltaSink
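The rearrangeOnly and configuration arguments described above follow the same per-caller pattern and can be sketched together. As before, this is an illustrative model rather than the Delta Lake code: UpdateMetadataArgsSketch, Caller and the field names txnConfiguration and rearrangeOnlyOption are hypothetical.

```scala
object UpdateMetadataArgsSketch {
  sealed trait Caller
  // txnConfiguration: the existing configuration of the transaction's metadata
  final case class MergeIntoCommand(txnConfiguration: Map[String, String]) extends Caller
  // rearrangeOnlyOption: the value of the rearrangeOnly write option
  final case class WriteIntoDelta(
      rearrangeOnlyOption: Boolean,
      configuration: Map[String, String]) extends Caller
  case object DeltaSink extends Caller

  // rearrangeOnly: only WriteIntoDelta can turn it on (via its option)
  def rearrangeOnly(caller: Caller): Boolean = caller match {
    case _: MergeIntoCommand => false
    case w: WriteIntoDelta   => w.rearrangeOnlyOption
    case DeltaSink           => false
  }

  // configuration: existing, command-provided, or always empty
  def configuration(caller: Caller): Map[String, String] = caller match {
    case m: MergeIntoCommand => m.txnConfiguration
    case w: WriteIntoDelta   => w.configuration
    case DeltaSink           => Map.empty
  }
}
```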
Normalizing Partition Columns¶
normalizePartitionColumns(
spark: SparkSession,
partitionCols: Seq[String],
schema: StructType): Seq[String]
normalizePartitionColumns
...FIXME
mergeSchema¶
mergeSchema(
txn: OptimisticTransaction,
dataSchema: StructType,
isOverwriteMode: Boolean,
canOverwriteSchema: Boolean): StructType
mergeSchema
...FIXME
New DomainMetadatas¶
getNewDomainMetadata(
txn: OptimisticTransaction,
canUpdateMetadata: Boolean,
isReplacingTable: Boolean,
clusterBySpecOpt: Option[ClusterBySpec] = None): Seq[DomainMetadata]
getNewDomainMetadata is empty (no DomainMetadata) if either of the following holds:
- The given canUpdateMetadata flag is false
- The given isReplacingTable flag is false and the delta table (of the given OptimisticTransaction) exists
canUpdateMetadata flag
The input canUpdateMetadata flag is exactly canUpdateMetadata of the given OptimisticTransaction.
isReplacingTable flag
The input isReplacingTable flag is true when the SaveMode is Overwrite and no replaceWhere option is used.
For all other cases, getNewDomainMetadata does one of the following:
- When the delta table (of the given OptimisticTransaction) does not exist, getNewDomainMetadata gives a DomainMetadata for the given ClusterBySpec
- Otherwise, getNewDomainMetadata handles domain metadata for replacing a table (with the existing and the new clusteredDomainMetadata)
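The decision described in this section reduces to three boolean inputs and three outcomes, which the following sketch makes explicit. It models only which path is taken, not the DomainMetadata values themselves; NewDomainMetadataSketch, Outcome and its cases are hypothetical names, and tableExists stands in for whether the delta table of the given OptimisticTransaction exists.

```scala
object NewDomainMetadataSketch {
  // Which path getNewDomainMetadata would take (names are hypothetical)
  sealed trait Outcome
  case object Empty extends Outcome                // no DomainMetadata
  case object FromClusterBySpec extends Outcome    // table does not exist yet
  case object ReplaceTableHandling extends Outcome // existing table is replaced

  def outcome(
      canUpdateMetadata: Boolean,
      isReplacingTable: Boolean,
      tableExists: Boolean): Outcome =
    if (!canUpdateMetadata || (!isReplacingTable && tableExists)) Empty
    else if (!tableExists) FromClusterBySpec
    else ReplaceTableHandling
}
```

Note that ReplaceTableHandling is only reachable when all three inputs are true, since an existing table that is not being replaced falls into the Empty case.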
getNewDomainMetadata is used when:
- WriteIntoDelta command is requested to write data out
Logging¶
ImplicitMetadataOperation is an abstract class and logging is configured using the logger of the implementations.