ImplicitMetadataOperation

ImplicitMetadataOperation is an abstraction of operations that can update the metadata of a delta table (while writing out new data).

ImplicitMetadataOperation operations can update the table schema by either merging or overwriting it.

Contract

Auto Schema Merging

canMergeSchema: Boolean

Controls Auto Schema Merging (evolution)

See:

Used when:

canOverwriteSchema

canOverwriteSchema: Boolean
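
As a rough illustration of the contract, the two flags (canMergeSchema and canOverwriteSchema) can be modelled in plain Scala without Spark on the classpath. MetadataOperationContract and OptionsBackedOperation are hypothetical stand-ins, and reading the flags from mergeSchema and overwriteSchema string options is an assumption made for this sketch, not a description of how the actual implementations compute them.

```scala
// Simplified stand-in for the contract (hypothetical, not Delta Lake's trait).
trait MetadataOperationContract {
  protected def canMergeSchema: Boolean
  protected def canOverwriteSchema: Boolean
}

// Hypothetical implementation that resolves both flags from string options.
// The option keys and the parsing rule are assumptions of this sketch.
final class OptionsBackedOperation(options: Map[String, String])
    extends MetadataOperationContract {

  // An option counts as enabled only when its value is "true" (case-insensitive).
  private def flag(key: String): Boolean =
    options.get(key).exists(_.trim.equalsIgnoreCase("true"))

  protected val canMergeSchema: Boolean = flag("mergeSchema")
  protected val canOverwriteSchema: Boolean = flag("overwriteSchema")

  // Exposed only so the resolved flags can be inspected from outside.
  def flags: (Boolean, Boolean) = (canMergeSchema, canOverwriteSchema)
}
```

With this model, `new OptionsBackedOperation(Map("mergeSchema" -> "true")).flags` gives `(true, false)`: schema merging is allowed, schema overwriting is not.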

Used when:

Implementations

Updating Metadata

updateMetadata(
  spark: SparkSession,
  txn: OptimisticTransaction,
  schema: StructType,
  partitionColumns: Seq[String],
  configuration: Map[String, String],
  isOverwriteMode: Boolean,
  rearrangeOnly: Boolean): Unit
Final Method

updateMetadata is a Scala final method and may not be overridden in subclasses.

Learn more in the Scala Language Specification.
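
The effect of final can be shown with a minimal, self-contained sketch. Operation, AppendOperation and describe are hypothetical names made up for this example; only the final modifier itself mirrors how updateMetadata is declared.

```scala
// Stand-in trait (hypothetical); not Delta Lake's ImplicitMetadataOperation.
trait Operation {
  // Abstract member that implementations must supply.
  protected def canMergeSchema: Boolean

  // final: subclasses inherit this method but cannot override it.
  final def describe(): String =
    if (canMergeSchema) "schema merging enabled" else "schema merging disabled"
}

class AppendOperation extends Operation {
  protected val canMergeSchema: Boolean = true

  // Uncommenting the next line is a compile-time error,
  // because describe is declared final in Operation:
  // override def describe(): String = "custom"
}
```

Implementations customize behaviour only through the abstract members, while the shared logic in the final method stays the same for all of them.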

updateMetadata is used when:


updateMetadata drops the column mapping metadata from the given schema (dropColumnMappingMetadata), which produces the dataSchema.

updateMetadata merges the schema (mergeSchema) using the dataSchema and the isOverwriteMode and canOverwriteSchema flags.

updateMetadata normalizes the partition columns (normalizePartitionColumns).

updateMetadata branches off based on the following conditions:

  1. Delta table is just being created
  2. Overwriting schema is enabled (i.e. isOverwriteMode and canOverwriteSchema flags are enabled, and either the schema is new or partitioning changed)
  3. Merging schema is enabled (i.e. the schema is new and the canMergeSchema flag is enabled, but the partitioning has not changed)
  4. Data or Partitioning Schema has changed
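
The four conditions can be read as an ordered decision, where the first condition that holds selects the branch. The following is a minimal sketch of that reading in plain Scala. WriteState, Branch and UpdateMetadataBranches are hypothetical names, the boolean inputs are a simplified model of what updateMetadata inspects, and the NothingToUpdate fall-through is an assumption of this sketch.

```scala
// Simplified model of the inputs the four conditions depend on.
// All names are illustrative, not Delta Lake's.
final case class WriteState(
    tableExists: Boolean,
    isOverwriteMode: Boolean,
    canOverwriteSchema: Boolean,
    canMergeSchema: Boolean,
    isNewSchema: Boolean,
    isPartitioningChanged: Boolean)

sealed trait Branch
case object TableBeingCreated extends Branch
case object OverwritingSchema extends Branch
case object MergingSchema extends Branch
case object SchemaOrPartitioningChanged extends Branch
// Assumed fall-through for when none of the four conditions holds.
case object NothingToUpdate extends Branch

object UpdateMetadataBranches {
  // Conditions are checked in the order the document lists them;
  // the first one that holds wins.
  def choose(s: WriteState): Branch =
    if (!s.tableExists) TableBeingCreated
    else if (s.isOverwriteMode && s.canOverwriteSchema &&
      (s.isNewSchema || s.isPartitioningChanged)) OverwritingSchema
    else if (s.isNewSchema && s.canMergeSchema && !s.isPartitioningChanged) MergingSchema
    else if (s.isNewSchema || s.isPartitioningChanged) SchemaOrPartitioningChanged
    else NothingToUpdate
}
```

Because the checks are ordered, a write with a new schema and both the overwrite and merge flags enabled takes the overwriting branch in this model; the merging branch is reached only when overwriting does not apply.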

Table Being Created

updateMetadata creates a new Metadata with the following:

  • Uses the value of comment key (in the configuration) for the description
  • FIXME

updateMetadata requests the given OptimisticTransaction to updateMetadata.

Overwriting Schema

updateMetadata...FIXME

Merging Schema

updateMetadata...FIXME

New Data or Partitioning Schema

updateMetadata...FIXME

isOverwriteMode

updateMetadata is given isOverwriteMode flag as follows:

rearrangeOnly

updateMetadata is given rearrangeOnly flag as follows:

configuration

updateMetadata is given configuration as follows:

Normalizing Partition Columns

normalizePartitionColumns(
  spark: SparkSession,
  partitionCols: Seq[String],
  schema: StructType): Seq[String]

normalizePartitionColumns...FIXME

mergeSchema

mergeSchema(
  txn: OptimisticTransaction,
  dataSchema: StructType,
  isOverwriteMode: Boolean,
  canOverwriteSchema: Boolean): StructType

mergeSchema...FIXME

New DomainMetadatas

getNewDomainMetadata(
  txn: OptimisticTransaction,
  canUpdateMetadata: Boolean,
  isReplacingTable: Boolean,
  clusterBySpecOpt: Option[ClusterBySpec] = None): Seq[DomainMetadata]

getNewDomainMetadata is empty (no DomainMetadata) if either of the following holds:

  • The given canUpdateMetadata flag is false
  • The given isReplacingTable flag is false and the delta table (of the given OptimisticTransaction) exists

canUpdateMetadata flag

The input canUpdateMetadata flag is exactly canUpdateMetadata of the given OptimisticTransaction.

isReplacingTable flag

The input isReplacingTable flag is true when the SaveMode is Overwrite and no replaceWhere option is specified.

In all other cases (i.e. when neither condition for an empty result holds), getNewDomainMetadata does one of the following:

  1. When the delta table (of the given OptimisticTransaction) does not exist, getNewDomainMetadata gives a DomainMetadata for the given ClusterBySpec
  2. Otherwise, getNewDomainMetadata handles domain metadata for replacing a table (with the existing and the new clustered DomainMetadata)
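
The rules for getNewDomainMetadata reduce to a small decision over three booleans. The sketch below models that decision in plain Scala. WriteMode, DomainMetadataPlan and DomainMetadataDecision are hypothetical stand-ins (the real method works with OptimisticTransaction, ClusterBySpec and DomainMetadata), and the result is a label for the chosen path rather than actual DomainMetadata.

```scala
// Stand-in for the save mode of a write (hypothetical, not Spark's SaveMode).
sealed trait WriteMode
case object Append extends WriteMode
case object Overwrite extends WriteMode

// Labels for the three possible outcomes described in the text.
sealed trait DomainMetadataPlan
case object NoDomainMetadata extends DomainMetadataPlan
case object FromClusterBySpec extends DomainMetadataPlan
case object ReplaceTableHandling extends DomainMetadataPlan

object DomainMetadataDecision {
  // Overwrite with no replaceWhere option counts as replacing the table.
  def isReplacingTable(mode: WriteMode, replaceWhere: Option[String]): Boolean =
    mode == Overwrite && replaceWhere.isEmpty

  def plan(
      canUpdateMetadata: Boolean,
      isReplacingTable: Boolean,
      tableExists: Boolean): DomainMetadataPlan =
    // Empty result: metadata updates are disallowed, or the table exists
    // and is not being replaced.
    if (!canUpdateMetadata || (!isReplacingTable && tableExists)) NoDomainMetadata
    // New table: domain metadata comes from the clustering spec.
    else if (!tableExists) FromClusterBySpec
    // Existing table being replaced: reconcile existing and new clustering metadata.
    else ReplaceTableHandling
}
```

In this model, an append to an existing table never produces new domain metadata, while an overwrite without replaceWhere on an existing table goes through the replace-table handling.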

getNewDomainMetadata is used when:

Logging

ImplicitMetadataOperation is a Scala trait, so logging is configured using the logger of the implementations.