GeneratedColumn Utility¶
GeneratedColumn is a utility for Generated Columns.
import org.apache.spark.sql.delta.GeneratedColumn
isGeneratedColumn¶
isGeneratedColumn(
protocol: Protocol,
field: StructField): Boolean
isGeneratedColumn(
field: StructField): Boolean
isGeneratedColumn returns true when the following all hold:
- satisfyGeneratedColumnProtocol
- The metadata of the given StructField (Spark SQL) contains (a binding for) the delta.generationExpression key.
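The two conditions can be sketched in plain Scala as follows. This is only an illustration, not Delta Lake's code: ProtocolSketch and FieldSketch are hypothetical stand-ins for Protocol and StructField, and the sketch assumes that the one-argument overload (which has no Protocol to inspect) checks the metadata key only.

```scala
// Hypothetical, simplified stand-ins for Delta's Protocol and Spark SQL's StructField.
// They model only what the two conditions above need.
final case class ProtocolSketch(minReaderVersion: Int, minWriterVersion: Int)
final case class FieldSketch(name: String, metadata: Map[String, String] = Map.empty)

object IsGeneratedColumnSketch {
  // The metadata key named in the text above.
  val GenerationExpressionKey = "delta.generationExpression"

  // Condition 1: the writer version is at least 4 (see satisfyGeneratedColumnProtocol).
  def satisfyGeneratedColumnProtocol(protocol: ProtocolSketch): Boolean =
    protocol.minWriterVersion >= 4

  // Condition 2 alone (assumption: no protocol check in this overload).
  def isGeneratedColumn(field: FieldSketch): Boolean =
    field.metadata.contains(GenerationExpressionKey)

  // Both conditions together.
  def isGeneratedColumn(protocol: ProtocolSketch, field: FieldSketch): Boolean =
    satisfyGeneratedColumnProtocol(protocol) && isGeneratedColumn(field)
}
```

With these stand-ins, a field that carries the key is not reported as generated under a protocol with writer version 3, but is under writer version 4.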
isGeneratedColumn is used when:
- ColumnWithDefaultExprUtils utility is used to removeDefaultExpressions and columnHasDefaultExpr
- GeneratedColumn utility is used to hasGeneratedColumns, getGeneratedColumns, enforcesGeneratedColumns and validateGeneratedColumns
getGeneratedColumns¶
getGeneratedColumns(
snapshot: Snapshot): Seq[StructField]
getGeneratedColumns checks satisfyGeneratedColumnProtocol (with the protocol of the given Snapshot) and returns the generated columns (based on the schema of the Metadata of the given Snapshot).
getGeneratedColumns is used when:
- PreprocessTableUpdate logical resolution rule is executed (and toCommand)
enforcesGeneratedColumns¶
enforcesGeneratedColumns(
protocol: Protocol,
metadata: Metadata): Boolean
enforcesGeneratedColumns is true when the following all hold:
- satisfyGeneratedColumnProtocol with the given Protocol
- There is at least one generated column in the table schema (of the given Metadata)
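A minimal sketch of how the two conditions combine, using hypothetical stand-in types (WriterProtocol, Column and TableMetadata are invented for illustration and are not Delta Lake classes). Only top-level columns are modeled.

```scala
// Hypothetical stand-ins for Protocol, StructField and Metadata.
final case class WriterProtocol(minWriterVersion: Int)
final case class Column(name: String, metadata: Map[String, String] = Map.empty)
final case class TableMetadata(schema: Seq[Column])

object EnforcesGeneratedColumnsSketch {
  private val GenerationExpressionKey = "delta.generationExpression"

  def enforcesGeneratedColumns(protocol: WriterProtocol, metadata: TableMetadata): Boolean =
    // Condition 1: the protocol supports generated columns (writer version at least 4).
    protocol.minWriterVersion >= 4 &&
      // Condition 2: at least one column in the schema carries a generation expression.
      metadata.schema.exists(_.metadata.contains(GenerationExpressionKey))
}
```

Either condition failing on its own is enough for the result to be false, so a table written with an older protocol is never subject to generated-column enforcement in this sketch.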
enforcesGeneratedColumns is used when:
- TransactionalWrite is requested to write data out (and normalizeData)
satisfyGeneratedColumnProtocol¶
satisfyGeneratedColumnProtocol(
protocol: Protocol): Boolean
satisfyGeneratedColumnProtocol is true when the minWriterVersion of the given Protocol is at least 4.
satisfyGeneratedColumnProtocol is used when:
- ColumnWithDefaultExprUtils utility is used to satisfyProtocol
- GeneratedColumn utility is used to isGeneratedColumn, getGeneratedColumns, enforcesGeneratedColumns and generatePartitionFilters
- AlterTableChangeColumnDeltaCommand is executed
- ImplicitMetadataOperation is requested to mergeSchema
addGeneratedColumnsOrReturnConstraints¶
addGeneratedColumnsOrReturnConstraints(
deltaLog: DeltaLog,
queryExecution: QueryExecution,
schema: StructType,
df: DataFrame): (DataFrame, Seq[Constraint])
addGeneratedColumnsOrReturnConstraints returns a DataFrame with generated columns (missing in the schema) and constraints for generated columns (existing in the schema).
addGeneratedColumnsOrReturnConstraints finds generated columns (among the top-level columns in the given schema (StructType)).
For every generated column, addGeneratedColumnsOrReturnConstraints creates a Check constraint with the following:
- Generated Column as the name
- EqualNullSafe expression that compares the generated column expression with the value provided by the user
In the end, addGeneratedColumnsOrReturnConstraints uses the select operator on the given DataFrame.
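The effect of a null-safe equality check on a user-provided value can be illustrated in plain Scala. This is a sketch under stated assumptions, not Delta Lake's implementation: PriceRow and the generation expression `price * 2` are hypothetical, and Option stands in for a nullable SQL value.

```scala
// Hypothetical row with a source column and a generated column (doubled = price * 2).
// None models SQL NULL.
final case class PriceRow(price: Option[Int], doubled: Option[Int])

object GeneratedColumnCheckSketch {
  // Hypothetical generation expression: NULL in, NULL out, as in SQL arithmetic.
  def generationExpression(row: PriceRow): Option[Int] = row.price.map(_ * 2)

  // Null-safe equality: two NULLs are equal, NULL and a value are not.
  def equalNullSafe[A](left: Option[A], right: Option[A]): Boolean = left == right

  // The check passes when the user-provided value matches the generated one.
  def passesCheck(row: PriceRow): Boolean =
    equalNullSafe(generationExpression(row), row.doubled)
}
```

Null-safe equality matters here because an ordinary SQL equality would evaluate to NULL, not true, when both the generated and the provided values are NULL.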
addGeneratedColumnsOrReturnConstraints is used when:
- TransactionalWrite is requested to write data out (and normalizeData)
hasGeneratedColumns¶
hasGeneratedColumns(
schema: StructType): Boolean
hasGeneratedColumns returns true if any of the top-level columns in the given StructType (Spark SQL) is a generated column.
hasGeneratedColumns is used when:
- OptimisticTransactionImpl is requested to verify a new metadata
- Protocol is requested for the required minimum protocol
- SchemaUtils utility is used to findDependentGeneratedColumns
validateGeneratedColumns¶
validateGeneratedColumns(
spark: SparkSession,
schema: StructType): Unit
validateGeneratedColumns...FIXME
validateGeneratedColumns is used when:
- OptimisticTransactionImpl is requested to verify a new metadata