
GeneratedColumn Utility

GeneratedColumn is a utility for Generated Columns.

import org.apache.spark.sql.delta.GeneratedColumn

isGeneratedColumn

isGeneratedColumn(
  protocol: Protocol,
  field: StructField): Boolean
isGeneratedColumn(
  field: StructField): Boolean

isGeneratedColumn returns true when all of the following hold:

  1. satisfyGeneratedColumnProtocol holds for the given Protocol (checked by the two-argument variant only)
  2. The metadata of the given StructField (Spark SQL) contains the delta.generationExpression key.
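A minimal sketch of the two overloads follows. It assumes simplified, hypothetical stand-ins for Delta's Protocol and Spark SQL's StructField (the field metadata is modelled as a plain Map[String, String] rather than org.apache.spark.sql.types.Metadata), so it illustrates the rules above and is not the Delta Lake source.

```scala
// Hypothetical, simplified stand-ins for Delta's Protocol and Spark SQL's StructField
object IsGeneratedColumnSketch {
  final case class Protocol(minReaderVersion: Int, minWriterVersion: Int)
  final case class StructField(
      name: String,
      dataType: String,
      metadata: Map[String, String] = Map.empty)

  // Metadata key that holds the generation expression of a generated column
  val GenerationExpressionKey = "delta.generationExpression"

  // Generated Columns need a minWriterVersion of at least 4
  def satisfyGeneratedColumnProtocol(protocol: Protocol): Boolean =
    protocol.minWriterVersion >= 4

  // Field-only variant: the metadata has to carry a generation expression
  def isGeneratedColumn(field: StructField): Boolean =
    field.metadata.contains(GenerationExpressionKey)

  // Protocol-aware variant: both conditions have to hold
  def isGeneratedColumn(protocol: Protocol, field: StructField): Boolean =
    satisfyGeneratedColumnProtocol(protocol) && isGeneratedColumn(field)
}
```

For example, a field eventDate whose metadata maps delta.generationExpression to CAST(eventTime AS DATE) is a generated column under a protocol with minWriterVersion 4, but not under one with minWriterVersion 2.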

isGeneratedColumn is used when:

getGeneratedColumns

getGeneratedColumns(
  snapshot: Snapshot): Seq[StructField]

getGeneratedColumns checks satisfyGeneratedColumnProtocol (with the protocol of the given Snapshot) and, when it holds, returns the generated columns (based on the schema of the Metadata of the given Snapshot).
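The snapshot-level lookup can be sketched as below. Snapshot, Metadata, Protocol and StructField are hypothetical, reduced stand-ins that keep only the fields needed here, and returning an empty sequence when the protocol check fails is an assumption of this sketch.

```scala
// Hypothetical, reduced stand-ins for Delta's Snapshot, Metadata and Protocol
object GetGeneratedColumnsSketch {
  final case class Protocol(minReaderVersion: Int, minWriterVersion: Int)
  final case class StructField(name: String, metadata: Map[String, String] = Map.empty)
  final case class Metadata(schema: Seq[StructField])
  final case class Snapshot(protocol: Protocol, metadata: Metadata)

  val GenerationExpressionKey = "delta.generationExpression"

  def satisfyGeneratedColumnProtocol(protocol: Protocol): Boolean =
    protocol.minWriterVersion >= 4

  def isGeneratedColumn(field: StructField): Boolean =
    field.metadata.contains(GenerationExpressionKey)

  // No generated columns are reported unless the table protocol supports them
  def getGeneratedColumns(snapshot: Snapshot): Seq[StructField] =
    if (satisfyGeneratedColumnProtocol(snapshot.protocol))
      snapshot.metadata.schema.filter(isGeneratedColumn)
    else Nil
}
```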

getGeneratedColumns is used when:

enforcesGeneratedColumns

enforcesGeneratedColumns(
  protocol: Protocol,
  metadata: Metadata): Boolean

enforcesGeneratedColumns is true when the following all hold:


enforcesGeneratedColumns is used when:

satisfyGeneratedColumnProtocol

satisfyGeneratedColumnProtocol(
  protocol: Protocol): Boolean

satisfyGeneratedColumnProtocol is true when the minWriterVersion of the given Protocol is at least 4.

satisfyGeneratedColumnProtocol is used when:

addGeneratedColumnsOrReturnConstraints

addGeneratedColumnsOrReturnConstraints(
  deltaLog: DeltaLog,
  queryExecution: QueryExecution,
  schema: StructType,
  df: DataFrame): (DataFrame, Seq[Constraint])

addGeneratedColumnsOrReturnConstraints returns a DataFrame with the generated columns that are missing from the given DataFrame added, together with constraints for the generated columns that the given DataFrame already provides.

addGeneratedColumnsOrReturnConstraints finds the generated columns among the top-level columns of the given schema (StructType).

For every generated column already provided by the given DataFrame, addGeneratedColumnsOrReturnConstraints creates a Check constraint with the following:

  • Generated Column as the constraint name
  • An EqualNullSafe (<=>) expression that compares the generation expression of the column with the value provided by the user

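In the end, addGeneratedColumnsOrReturnConstraints uses the select operator on the given DataFrame (computing the missing generated columns from their generation expressions) and returns the result along with the constraints.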
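The per-column decision described above can be sketched without Spark. This is a hypothetical model, not the Delta Lake implementation: the DataFrame is reduced to the set of its top-level column names, the result is a list of SQL select expressions instead of a new DataFrame, and a Check constraint is just a name paired with a SQL string using <=>, Spark SQL's null-safe equality operator (EqualNullSafe). Column names are matched exactly, which is a simplification.

```scala
// Hypothetical model of addGeneratedColumnsOrReturnConstraints without Spark
object AddGeneratedColumnsSketch {
  // Stand-ins: a field keeps its metadata as a plain map,
  // and a Check constraint pairs a name with a SQL expression string
  final case class StructField(name: String, metadata: Map[String, String] = Map.empty)
  final case class Check(name: String, expression: String)

  val GenerationExpressionKey = "delta.generationExpression"
  val ConstraintName = "Generated Column"

  /** Returns the select expressions to apply and the constraints to enforce.
    * `inputColumns` stands in for the top-level columns of the given DataFrame. */
  def addGeneratedColumnsOrReturnConstraints(
      schema: Seq[StructField],
      inputColumns: Set[String]): (Seq[String], Seq[Check]) = {
    val planned: Seq[(String, Option[Check])] = schema.map { field =>
      field.metadata.get(GenerationExpressionKey) match {
        // Generated column provided by the user: keep the value,
        // but check it against the generation expression (null-safe)
        case Some(expr) if inputColumns.contains(field.name) =>
          (field.name, Some(Check(ConstraintName, s"${field.name} <=> ($expr)")))
        // Generated column missing from the input: compute it
        case Some(expr) =>
          (s"($expr) AS ${field.name}", None)
        // Regular column: select it as-is
        case None =>
          (field.name, None)
      }
    }
    (planned.map(_._1), planned.flatMap(_._2))
  }
}
```

In Spark, expressions of this shape could be passed to Dataset.selectExpr; the description above states that select is used on the given DataFrame.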

addGeneratedColumnsOrReturnConstraints is used when:

hasGeneratedColumns

hasGeneratedColumns(
  schema: StructType): Boolean

hasGeneratedColumns returns true if any of the top-level columns in the given StructType (Spark SQL) is a generated column.
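A sketch of hasGeneratedColumns under similar assumptions follows. StructField is a hypothetical stand-in whose children field models the fields of a nested struct column, so the example can show that only the top-level columns are inspected.

```scala
// Hypothetical stand-in for Spark SQL's StructField; `children` models the
// fields of a nested struct column (empty for non-struct columns)
object HasGeneratedColumnsSketch {
  final case class StructField(
      name: String,
      metadata: Map[String, String] = Map.empty,
      children: Seq[StructField] = Nil)

  val GenerationExpressionKey = "delta.generationExpression"

  def isGeneratedColumn(field: StructField): Boolean =
    field.metadata.contains(GenerationExpressionKey)

  // Only the top-level fields are inspected; nested children are ignored
  def hasGeneratedColumns(schema: Seq[StructField]): Boolean =
    schema.exists(isGeneratedColumn)
}
```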

hasGeneratedColumns is used when:

validateGeneratedColumns

validateGeneratedColumns(
  spark: SparkSession,
  schema: StructType): Unit

validateGeneratedColumns...FIXME

validateGeneratedColumns is used when: