DeltaColumnMappingBase (DeltaColumnMapping)¶
DeltaColumnMappingBase
is an abstraction of DeltaColumnMappings.
Implementations¶
Compatible Protocol¶
DeltaColumnMappingBase
defines a Protocol (with MIN_READER_VERSION and MIN_WRITER_VERSION) as the minimum protocol version for readers and writers of delta tables with column mapping.
- Protocol utility is used for requiredMinimumProtocol
- delta.columnMapping.mode configuration property
- delta.columnMapping.maxColumnId configuration property
- DeltaErrors is requested to changeColumnMappingModeOnOldProtocol (for error reporting)
Minimum Reader Version¶
DeltaColumnMappingBase
defines MIN_READER_VERSION
constant as 2
for the minimum version of the compatible readers of delta tables to satisfyColumnMappingProtocol.
Minimum Writer Version¶
DeltaColumnMappingBase
defines MIN_WRITER_VERSION
constant as 5
for the minimum version of the compatible writers to delta tables to satisfyColumnMappingProtocol.
createPhysicalSchema¶
createPhysicalSchema(
schema: StructType,
referenceSchema: StructType,
columnMappingMode: DeltaColumnMappingMode,
checkSupportedMode: Boolean = true): StructType
createPhysicalSchema
...FIXME
createPhysicalSchema
is used when:
- DeltaColumnMappingBase is requested to checkColumnIdAndPhysicalNameAssignments and createPhysicalAttributes
- DeltaParquetFileFormat is requested to prepare a schema
renameColumns¶
renameColumns(
schema: StructType): StructType
renameColumns
...FIXME
renameColumns
is used when:
Metadata
is requested for the physicalPartitionSchema
requiresNewProtocol¶
requiresNewProtocol(
metadata: Metadata): Boolean
requiresNewProtocol
is true
when the DeltaColumnMappingMode (of this delta table per the given Metadata) is either IdMapping or NameMapping. Otherwise, requiresNewProtocol
is false.
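The rule above can be sketched as a pattern match over the mode. This is a minimal illustration, not Delta Lake's source: the `Metadata` case class here is a hypothetical stand-in that carries only the column mapping mode, and the mode hierarchy is redeclared so the snippet is self-contained.

```scala
object RequiresNewProtocolSketch {
  // Simplified stand-ins for the three column mapping modes named in this page.
  sealed trait DeltaColumnMappingMode
  case object NoMapping extends DeltaColumnMappingMode
  case object IdMapping extends DeltaColumnMappingMode
  case object NameMapping extends DeltaColumnMappingMode

  // Hypothetical, minimal Metadata: only the column mapping mode of the table.
  final case class Metadata(columnMappingMode: DeltaColumnMappingMode)

  // true for IdMapping and NameMapping, false for NoMapping.
  def requiresNewProtocol(metadata: Metadata): Boolean =
    metadata.columnMappingMode match {
      case IdMapping | NameMapping => true
      case NoMapping               => false
    }
}
```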
requiresNewProtocol
is used when:
Protocol
utility is used to determine the required minimum protocol.
checkColumnIdAndPhysicalNameAssignments¶
checkColumnIdAndPhysicalNameAssignments(
schema: StructType,
mode: DeltaColumnMappingMode): Unit
checkColumnIdAndPhysicalNameAssignments
...FIXME
checkColumnIdAndPhysicalNameAssignments
is used when:
OptimisticTransactionImpl
is requested to verify the new metadata
dropColumnMappingMetadata¶
dropColumnMappingMetadata(
schema: StructType): StructType
dropColumnMappingMetadata
...FIXME
dropColumnMappingMetadata
is used when:
- DeltaLog is requested for a BaseRelation and for a DataFrame
- DeltaTableV2 is requested for the tableSchema
- AlterTableSetLocationDeltaCommand command is executed
- CreateDeltaTableCommand command is executed
- ImplicitMetadataOperation is requested to update the metadata
Mapping Virtual to Physical Field Name¶
getPhysicalName(
field: StructField): String
getPhysicalName
reads the delta.columnMapping.physicalName
key from the Metadata of the given StructField
(Spark SQL), if available (for column mapping). Otherwise, getPhysicalName
returns the name of the given StructField
(with no name changes).
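The lookup-with-fallback described above might look like the following sketch. It does not use Spark SQL: `Field` is a hypothetical stand-in for `StructField` whose metadata is a plain string map, so the snippet runs without any dependencies. Only the `delta.columnMapping.physicalName` key comes from the text.

```scala
object PhysicalNameSketch {
  // Metadata key named in the text above.
  val PhysicalNameKey = "delta.columnMapping.physicalName"

  // Hypothetical stand-in for Spark SQL's StructField: a name plus string-valued metadata.
  final case class Field(name: String, metadata: Map[String, String] = Map.empty)

  // Returns the physical name if the key is present, the field's own name otherwise.
  def getPhysicalName(field: Field): String =
    field.metadata.getOrElse(PhysicalNameKey, field.name)
}
```

A field without the key keeps its name, so tables with no column mapping are unaffected by the lookup.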
getPhysicalName
is used when:
- CheckpointV2 utility is used to extractPartitionValues
- ConflictChecker is requested to getPrettyPartitionMessage
- DeltaColumnMappingBase is requested to renameColumns, assignPhysicalNames and createPhysicalSchema
- DeltaLog utility is used to rewritePartitionFilters
- AlterTableChangeColumnDeltaCommand is executed
- ConvertToDeltaCommand utility is used to create an AddFile
- TahoeFileIndex is requested to makePartitionDirectories
- DataSkippingReaderBase is requested to getStatsColumnOpt
- StatisticsCollection is requested to collect statistics
verifyAndUpdateMetadataChange¶
verifyAndUpdateMetadataChange(
oldProtocol: Protocol,
oldMetadata: Metadata,
newMetadata: Metadata,
isCreatingNewTable: Boolean): Metadata
verifyAndUpdateMetadataChange
...FIXME
In the end, verifyAndUpdateMetadataChange
calls tryFixMetadata with the given newMetadata
and oldMetadata.
verifyAndUpdateMetadataChange
is used when:
OptimisticTransactionImpl
is requested to updateMetadataInternal
tryFixMetadata¶
tryFixMetadata(
oldMetadata: Metadata,
newMetadata: Metadata,
isChangingModeOnExistingTable: Boolean): Metadata
tryFixMetadata
reads the columnMapping.mode table property from the given newMetadata.
If the DeltaColumnMappingMode is IdMapping or NameMapping, tryFixMetadata
calls assignColumnIdAndPhysicalName with the given newMetadata,
oldMetadata
and the isChangingModeOnExistingTable
flag.
For NoMapping, tryFixMetadata
does nothing and returns the given newMetadata.
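The dispatch on the mode can be sketched as below. Several parts are assumptions made for illustration only: `Metadata` is a hypothetical stand-in holding just table properties, the property is assumed to be spelled `delta.columnMapping.mode` with the values `none`, `id` and `name`, and the `assign` parameter stands in for assignColumnIdAndPhysicalName, whose behavior is not described here.

```scala
object TryFixMetadataSketch {
  sealed trait DeltaColumnMappingMode
  case object NoMapping extends DeltaColumnMappingMode
  case object IdMapping extends DeltaColumnMappingMode
  case object NameMapping extends DeltaColumnMappingMode

  // Hypothetical stand-in for Delta Lake's Metadata: table properties only.
  final case class Metadata(configuration: Map[String, String])

  // Assumption: property name and values ("none", "id", "name"); a missing property means NoMapping.
  def modeOf(metadata: Metadata): DeltaColumnMappingMode =
    metadata.configuration.getOrElse("delta.columnMapping.mode", "none") match {
      case "id"   => IdMapping
      case "name" => NameMapping
      case "none" => NoMapping
      case other  =>
        throw new IllegalArgumentException(s"Unsupported column mapping mode: $other")
    }

  // `assign` is a placeholder for assignColumnIdAndPhysicalName (not modeled here).
  def tryFixMetadata(
      oldMetadata: Metadata,
      newMetadata: Metadata,
      isChangingModeOnExistingTable: Boolean)(
      assign: (Metadata, Metadata, Boolean) => Metadata): Metadata =
    modeOf(newMetadata) match {
      case IdMapping | NameMapping =>
        assign(newMetadata, oldMetadata, isChangingModeOnExistingTable)
      case NoMapping =>
        newMetadata // nothing to fix without column mapping
    }
}
```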
satisfyColumnMappingProtocol¶
satisfyColumnMappingProtocol(
protocol: Protocol): Boolean
satisfyColumnMappingProtocol
returns true
when all the following hold true:
- minWriterVersion of the given Protocol is at least 5
- minReaderVersion of the given Protocol is at least 2
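The two conditions above, together with the MIN_READER_VERSION and MIN_WRITER_VERSION constants from earlier in this page, can be sketched as follows. The `Protocol` case class is a hypothetical stand-in that carries only the two version numbers.

```scala
object ColumnMappingProtocolSketch {
  // Hypothetical stand-in for Delta Lake's Protocol: reader and writer versions only.
  final case class Protocol(minReaderVersion: Int, minWriterVersion: Int)

  // Constants as stated in this page.
  val MIN_READER_VERSION = 2
  val MIN_WRITER_VERSION = 5

  // Both the writer and the reader version must meet their minimum.
  def satisfyColumnMappingProtocol(protocol: Protocol): Boolean =
    protocol.minWriterVersion >= MIN_WRITER_VERSION &&
      protocol.minReaderVersion >= MIN_READER_VERSION
}
```

Under this sketch a protocol with reader version 2 and writer version 5 passes, while lowering either version below its minimum fails the check.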
Allowed Mapping Mode Change¶
allowMappingModeChange(
oldMode: DeltaColumnMappingMode,
newMode: DeltaColumnMappingMode): Boolean
allowMappingModeChange
is true
when either of the following holds true:
- There is no mode change (and the old and new modes are the same)
- There is a mode change from NoMapping old mode to NameMapping
Otherwise, allowMappingModeChange
is false.
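The two permitted cases can be expressed as a single boolean expression. The sketch below redeclares the mode hierarchy so that it is self-contained; it is an illustration of the rule as stated, not a copy of Delta Lake's code.

```scala
object MappingModeChangeSketch {
  sealed trait DeltaColumnMappingMode
  case object NoMapping extends DeltaColumnMappingMode
  case object IdMapping extends DeltaColumnMappingMode
  case object NameMapping extends DeltaColumnMappingMode

  // Allowed: no change at all, or an upgrade from NoMapping to NameMapping.
  def allowMappingModeChange(
      oldMode: DeltaColumnMappingMode,
      newMode: DeltaColumnMappingMode): Boolean =
    oldMode == newMode || (oldMode == NoMapping && newMode == NameMapping)
}
```

Every other transition, including NameMapping back to NoMapping and NoMapping to IdMapping, is rejected by this rule.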
DeltaColumnMapping¶
DeltaColumnMapping
is the only DeltaColumnMappingBase.
Supported Column Mapping Modes¶
supportedModes: Set[DeltaColumnMappingMode]
DeltaColumnMappingBase
defines supportedModes
value with NoMapping and NameMapping column mapping modes.
supportedModes
is used when:
DeltaColumnMappingBase
is requested to verifyAndUpdateMetadataChange and createPhysicalSchema
getColumnMappingMetadata¶
getColumnMappingMetadata(
field: StructField,
mode: DeltaColumnMappingMode): Metadata
Note
getColumnMappingMetadata
returns Spark SQL's Metadata, not Delta Lake's.
getColumnMappingMetadata
...FIXME
getColumnMappingMetadata
is used when:
DeltaColumnMappingBase
is requested to setColumnMetadata and createPhysicalSchema