DeltaConfigs (DeltaConfigsBase)¶
DeltaConfigs holds the supported table properties in Delta Lake.
Accessing DeltaConfigs¶
import org.apache.spark.sql.delta.OptimisticTransaction
val txn: OptimisticTransaction = ???
import org.apache.spark.sql.delta.actions.Metadata
val metadata: Metadata = txn.metadata
import org.apache.spark.sql.delta.DeltaConfigs
DeltaConfigs.CHANGE_DATA_FEED.fromMetaData(metadata)
System-Wide Defaults¶
The spark.databricks.delta.properties.defaults prefix is used for system-wide (global) defaults of table properties.
For every table property there is a corresponding global configuration property: the table property name without the delta. prefix, prepended with spark.databricks.delta.properties.defaults. These defaults apply to all new delta tables.
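For example, to make every new delta table append-only by default (a sketch; the table name and schema are illustrative, and a SparkSession with Delta Lake configured is assumed):

```scala
// Session-scoped default: applies to delta tables created from now on.
// The key suffix is the table property name without the delta. prefix.
spark.conf.set("spark.databricks.delta.properties.defaults.appendOnly", "true")

// The new table picks up delta.appendOnly = true from the session default.
spark.sql("CREATE TABLE events (id LONG) USING delta")
```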
Table Properties¶
All table properties start with delta. prefix.
appendOnly¶
delta.appendOnly
Makes a delta table append-only
When enabled, the table allows appends only and no updates or deletes (see the sketch after the list below).
Default: false
Used when:
- DeltaLog is requested to assertRemovable (that in turn uses the DeltaErrors utility to modifyAppendOnlyTableException)
- AppendOnlyTableFeature is requested to metadataRequiresFeatureToBeEnabled
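A minimal sketch of the append-only behavior (the events table is illustrative):

```scala
import spark.implicits._

spark.sql("ALTER TABLE events SET TBLPROPERTIES ('delta.appendOnly' = 'true')")

// Appends keep working...
Seq(2L).toDF("id").write.format("delta").mode("append").saveAsTable("events")

// ...while DELETE (and UPDATE) now fail with an exception raised via
// DeltaLog.assertRemovable (see the list above)
spark.sql("DELETE FROM events WHERE id = 2")
```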
autoOptimize¶
delta.autoOptimize
Deprecated
delta.autoOptimize is deprecated in favour of delta.autoOptimize.autoCompact table property since 3.1.0.
Whether this delta table will automagically optimize the layout of files during writes.
Default: false
autoOptimize.autoCompact¶
delta.autoOptimize.autoCompact
Enables Auto Compaction (see the example after the list below)
Default: false
Replaces delta.autoOptimize
delta.autoOptimize.autoCompact replaces the delta.autoOptimize table property since 3.1.0.
Used when:
- AutoCompactBase is requested for the type of Auto Compaction
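A sketch of enabling Auto Compaction at table creation (table name and schema are illustrative):

```scala
spark.sql("""
  CREATE TABLE events (id LONG) USING delta
  TBLPROPERTIES ('delta.autoOptimize.autoCompact' = 'true')
""")
```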
checkpointInterval¶
delta.checkpointInterval
How often to checkpoint the state of a delta table (at the end of transaction commit)
Default: 10
checkpointRetentionDuration¶
delta.checkpointRetentionDuration
How long to keep checkpoint files around before deleting them
Default: interval 2 days
The most recent checkpoint is never deleted. It is acceptable to keep checkpoint files beyond this duration until the next calendar day.
checkpoint.writeStatsAsJson¶
delta.checkpoint.writeStatsAsJson
Controls whether to write file statistics in the checkpoint in JSON format as the stats column.
Default: true
checkpoint.writeStatsAsStruct¶
delta.checkpoint.writeStatsAsStruct
Controls whether to write file statistics in the checkpoint in the struct format in the stats_parsed column and partition values as a struct as partitionValues_parsed
Default: undefined (Option[Boolean])
columnMapping.maxColumnId¶
delta.columnMapping.maxColumnId
Maximum columnId used in the schema so far for column mapping
Cannot be set
Default: 0
columnMapping.mode¶
delta.columnMapping.mode
DeltaColumnMappingMode to read and write parquet data files
| Name | Description |
|---|---|
| none | (default) A display name is the only valid identifier of a column |
| id | A column ID is the identifier of a column. This mode is used for tables converted from Iceberg; parquet files in this mode will also have corresponding field IDs for each column in their file schema. |
| name | The physical column name is the identifier of a column. Stored as part of StructField metadata in the schema. Used for reading statistics and partition values in the DeltaLog. |
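Switching an existing table to name mode requires a protocol upgrade, so the reader and writer versions are usually bumped in the same statement (a sketch; the events table is illustrative):

```scala
spark.sql("""
  ALTER TABLE events SET TBLPROPERTIES (
    'delta.columnMapping.mode' = 'name',
    'delta.minReaderVersion' = '2',
    'delta.minWriterVersion' = '5'
  )
""")
```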
Used when:
- DeltaColumnMappingBase is requested to tryFixMetadata (while OptimisticTransactionImpl is requested to update the metadata)
- DeltaErrors utility is used to create a DeltaColumnMappingUnsupportedException (while OptimisticTransactionImpl is requested to update the metadata)
- DeltaErrors utility is used to create a DeltaColumnMappingUnsupportedException (while ConvertToDeltaCommand is executed)
- Metadata is requested for the column mapping mode (while DeltaFileFormat is requested for the FileFormat)
compatibility.symlinkFormatManifest.enabled¶
delta.compatibility.symlinkFormatManifest.enabled
Whether to register the GenerateSymlinkManifest post-commit hook while committing a transaction or not
Default: false
dataSkippingNumIndexedCols¶
delta.dataSkippingNumIndexedCols
The number of columns to collect stats on for data skipping. -1 means collecting stats for all columns.
Default: 32
Must be larger than or equal to -1 (see the example after the list below).
Used when:
- Snapshot is requested for the maximum number of indexed columns
- TransactionalWrite is requested to write data out
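The value can be read back through the registered DeltaConfig (continuing the Accessing DeltaConfigs example above; DATA_SKIPPING_NUM_INDEXED_COLS is assumed here to be the constant behind this property):

```scala
import org.apache.spark.sql.delta.DeltaConfigs

// metadata comes from the Accessing DeltaConfigs example above
val numIndexedCols: Int =
  DeltaConfigs.DATA_SKIPPING_NUM_INDEXED_COLS.fromMetaData(metadata)
// -1 means statistics are collected for all columns
```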
deletedFileRetentionDuration¶
delta.deletedFileRetentionDuration
How long to keep logically deleted data files around before deleting them physically (to prevent failures in stale readers after compactions or partition overwrites)
Default: interval 1 week
enableChangeDataFeed¶
delta.enableChangeDataFeed
Enables Change Data Feed (see the example after the list below)
Default: false
Legacy configuration: enableChangeDataCapture
Used when:
- Protocol is requested for the requiredMinimumProtocol
- DeleteCommand is requested to rewriteFiles
- MergeIntoCommand is requested to writeAllChanges
- UpdateCommand is requested to shouldOutputCdc
- CDCReader is requested to isCDCEnabledOnTable
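A sketch of enabling the feed and reading changes (the events table and the starting version are illustrative):

```scala
spark.sql("ALTER TABLE events SET TBLPROPERTIES ('delta.enableChangeDataFeed' = 'true')")

// Changes are tracked only from the version the property was enabled at
val changes = spark.read
  .format("delta")
  .option("readChangeFeed", "true")
  .option("startingVersion", 1)
  .table("events")
```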
enableDeletionVectors¶
delta.enableDeletionVectors
Enables Deletion Vectors
Default: false
Used when:
- DeletionVectorsTableFeature is requested to metadataRequiresFeatureToBeEnabled
- DeletionVectorUtils is requested to deletionVectorsWritable
- Protocol is requested to assertTablePropertyConstraintsSatisfied
- UniversalFormat is requested to enforceHudiDependencies
enableExpiredLogCleanup¶
delta.enableExpiredLogCleanup
Controls Log Cleanup
Default: true
Used when:
- MetadataCleanup is requested for whether to clean up expired log files and checkpoints
enableFullRetentionRollback¶
delta.enableFullRetentionRollback
Controls whether or not a delta table can be rolled back to any point within logRetentionDuration. When disabled, the table can only be rolled back within checkpointRetentionDuration.
Default: true
enableRowTracking¶
delta.enableRowTracking
Enables Row Tracking
Default: false
Used when:
- DeltaErrorsBase is requested to convertToDeltaRowTrackingEnabledWithoutStatsCollection
- RowId is requested to isEnabled
- RowTracking is requested to isEnabled
- RowTrackingFeature is requested to metadataRequiresFeatureToBeEnabled
logRetentionDuration¶
delta.logRetentionDuration
How long to keep obsolete logs around before deleting them. Delta can keep logs beyond the duration until the next calendar day to avoid constantly creating checkpoints.
Default: interval 30 days (CalendarInterval)
Examples: 2 weeks, 365 days (months and years are not accepted). See the setting example after the list below.
Used when:
- MetadataCleanup is requested for the deltaRetentionMillis
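A sketch of changing the retention (the events table and the interval are illustrative; note that months and years are rejected):

```scala
spark.sql("""
  ALTER TABLE events SET TBLPROPERTIES (
    'delta.logRetentionDuration' = 'interval 60 days'
  )
""")
```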
minReaderVersion¶
delta.minReaderVersion
The protocol reader version
Default: 1
This property is not stored as a table property in the Metadata action. It is stored as its own action. Having it modelled as a table property makes it easier to upgrade the protocol and view the version.
minWriterVersion¶
delta.minWriterVersion
The protocol writer version
Default: 3
This property is not stored as a table property in the Metadata action. It is stored as its own action. Having it modelled as a table property makes it easier to upgrade the protocol and view the version.
randomizeFilePrefixes¶
delta.randomizeFilePrefixes
Whether to use a random prefix in a file path instead of partition information (may be needed for tables with a very high volume of S3 calls so requests are better partitioned across S3 servers)
Default: false
randomPrefixLength¶
delta.randomPrefixLength
The length of the random prefix in a file path for randomizeFilePrefixes
Default: 2
sampleRetentionDuration¶
delta.sampleRetentionDuration
How long to keep delta sample files around before deleting them
Default: interval 7 days
universalFormat.enabledFormats¶
delta.universalFormat.enabledFormats
A comma-separated list of table formats to enable for Universal Format (UniForm); see the example after the lists below
Default: (empty)
Supported values:
- hudi
- iceberg
Used when:
- ReorgTableForUpgradeUniformHelper is requested to doRewrite
- UniversalFormat is requested to enforceIcebergInvariantsAndDependencies, hudiEnabled, icebergEnabled
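A sketch of enabling UniForm with Iceberg at table creation (table name and schema are illustrative; depending on the Delta version, an additional IcebergCompat property such as delta.enableIcebergCompatV2 may also be required):

```scala
spark.sql("""
  CREATE TABLE events (id LONG) USING delta
  TBLPROPERTIES ('delta.universalFormat.enabledFormats' = 'iceberg')
""")
```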
Building Configuration¶
buildConfig[T](
key: String,
defaultValue: String,
fromString: String => T,
validationFunction: T => Boolean,
helpMessage: String,
minimumProtocolVersion: Option[Protocol] = None): DeltaConfig[T]
buildConfig creates a DeltaConfig for the given key (with the delta. prefix added) and adds it to the entries internal registry.
buildConfig is used to define all of the configuration properties in a type-safe way and (as a side effect) register them with the system-wide entries internal registry.
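A simplified sketch (not the actual Delta source) of how a boolean property such as delta.enableChangeDataFeed could be registered with buildConfig:

```scala
val CHANGE_DATA_FEED: DeltaConfig[Boolean] = buildConfig[Boolean](
  key = "enableChangeDataFeed",   // registered as delta.enableChangeDataFeed
  defaultValue = "false",
  fromString = _.toBoolean,       // parse the stored String value
  validationFunction = _ => true, // any boolean is acceptable
  helpMessage = "needs to be a boolean.")
```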
System-Wide Configuration Entries Registry¶
entries: HashMap[String, DeltaConfig[_]]
DeltaConfigs utility (a Scala object) uses the entries internal registry to track the DeltaConfigs by their key.
New entries are added in buildConfig.
entries is used when:
- mergeGlobalConfigs is executed (to look up the registered table properties)
mergeGlobalConfigs¶
mergeGlobalConfigs(
sqlConfs: SQLConf,
tableConf: Map[String, String],
protocol: Protocol): Map[String, String]
mergeGlobalConfigs finds all spark.databricks.delta.properties.defaults-prefixed configuration properties (among the registered entries) and merges them with the given table properties (tableConf).
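A minimal, self-contained sketch of the assumed merge semantics (simplified; not the actual implementation): session defaults are translated to delta.-prefixed keys, and explicit table properties are assumed to take precedence.

```scala
// A simplified, assumed model of the merge (not the actual implementation)
def mergeGlobalConfigsSketch(
    sessionConfs: Map[String, String],
    tableConf: Map[String, String]): Map[String, String] = {
  val prefix = "spark.databricks.delta.properties.defaults."
  val globalDefaults = sessionConfs.collect {
    case (key, value) if key.startsWith(prefix) =>
      s"delta.${key.stripPrefix(prefix)}" -> value
  }
  globalDefaults ++ tableConf // explicit table properties win (assumed)
}
```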
mergeGlobalConfigs is used when:
- OptimisticTransactionImpl is requested to withGlobalConfigDefaults
- InitialSnapshot is created
validateConfigurations¶
validateConfigurations(
configurations: Map[String, String]): Map[String, String]
validateConfigurations...FIXME
validateConfigurations is used when:
- DeltaCatalog is requested to verifyTableAndSolidify, alterTable
- CloneTableBase is requested to runInternal
- DeltaDataSource is requested to create a BaseRelation
normalizeConfigKeys¶
normalizeConfigKeys(
propKeys: Seq[String]): Seq[String]
normalizeConfigKeys...FIXME
normalizeConfigKeys is used when:
- AlterTableUnsetPropertiesDeltaCommand is executed