DeltaConfigs (DeltaConfigsBase)¶
DeltaConfigs holds the supported table properties in Delta Lake.
Accessing DeltaConfigs¶
```scala
import org.apache.spark.sql.delta.OptimisticTransaction
val txn: OptimisticTransaction = ???

import org.apache.spark.sql.delta.actions.Metadata
val metadata: Metadata = txn.metadata

import org.apache.spark.sql.delta.DeltaConfigs
DeltaConfigs.CHANGE_DATA_FEED.fromMetaData(metadata)
```
System-Wide Defaults¶
The spark.databricks.delta.properties.defaults prefix is used for system-wide (global) table properties.
For every table property (without the delta. prefix) there is a corresponding system-wide configuration property with the spark.databricks.delta.properties.defaults prefix that defines the default value of the table property for all delta tables.
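The mapping between the two key namespaces can be sketched as follows (a simplified illustration of the naming convention only, not the actual Delta Lake code):

```scala
// For a "delta."-prefixed table property, derive the name of the
// corresponding system-wide (global) default configuration property.
def globalDefaultKey(tableProperty: String): String = {
  val defaultsPrefix = "spark.databricks.delta.properties.defaults."
  require(tableProperty.startsWith("delta."), s"Not a delta table property: $tableProperty")
  defaultsPrefix + tableProperty.stripPrefix("delta.")
}
```

For example, `delta.appendOnly` is backed by the `spark.databricks.delta.properties.defaults.appendOnly` session configuration.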
Table Properties¶
All table properties start with the delta. prefix.
appendOnly¶
delta.appendOnly
Turns a table into append-only
When enabled, a table allows appends only and no updates or deletes.
Default: false
Used when:

- DeltaLog is requested to assertRemovable (that in turn uses the DeltaErrors utility to create a modifyAppendOnlyTableException)
- AppendOnlyTableFeature is requested to metadataRequiresFeatureToBeEnabled
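As a hedged sketch of how such a boolean property is resolved from a table's configuration map (simplified; not the actual DeltaConfig.fromMetaData implementation):

```scala
// Resolve delta.appendOnly from a table's configuration map.
// When the property is absent, the default (false) applies.
def isAppendOnly(tableConfiguration: Map[String, String]): Boolean =
  tableConfiguration.get("delta.appendOnly").exists(_.trim.toBoolean)
```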
autoOptimize¶
delta.autoOptimize
Deprecated
delta.autoOptimize
is deprecated in favour of the delta.autoOptimize.autoCompact table property since 3.1.0.
Whether this delta table will automagically optimize the layout of files during writes.
Default: false
autoOptimize.autoCompact¶
delta.autoOptimize.autoCompact
Enables Auto Compaction
Default: false
Replaces delta.autoOptimize
delta.autoOptimize.autoCompact
replaces the delta.autoOptimize table property since 3.1.0.
Used when:
- AutoCompactBase is requested for the type of Auto Compaction
checkpointInterval¶
delta.checkpointInterval
How often to checkpoint the state of a delta table (at the end of a transaction commit)
Default: 10
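A minimal sketch of how the interval could be applied at commit time (an assumption about the mechanics, not the actual OptimisticTransaction code):

```scala
// With checkpointInterval = 10, a checkpoint would be written whenever
// the committed version is a (non-zero) multiple of the interval.
def shouldCheckpoint(committedVersion: Long, checkpointInterval: Int): Boolean =
  committedVersion != 0 && committedVersion % checkpointInterval == 0
```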
checkpointRetentionDuration¶
delta.checkpointRetentionDuration
How long to keep checkpoint files around before deleting them
Default: interval 2 days
The most recent checkpoint is never deleted. It is acceptable to keep checkpoint files beyond this duration until the next calendar day.
checkpoint.writeStatsAsJson¶
delta.checkpoint.writeStatsAsJson
Controls whether to write file statistics in the checkpoint in JSON format as the stats column.
Default: true
checkpoint.writeStatsAsStruct¶
delta.checkpoint.writeStatsAsStruct
Controls whether to write file statistics in the checkpoint in the struct format in the stats_parsed column and partition values as a struct in the partitionValues_parsed column.
Default: undefined (Option[Boolean])
columnMapping.maxColumnId¶
delta.columnMapping.maxColumnId
Maximum columnId used in the schema so far for column mapping
Cannot be set
Default: 0
columnMapping.mode¶
delta.columnMapping.mode
DeltaColumnMappingMode to read and write parquet data files
| Name | Description |
|------|-------------|
| none | (default) A display name is the only valid identifier of a column |
| id | A column ID is the identifier of a column. This mode is used for tables converted from Iceberg, and parquet files in this mode will also have corresponding field IDs for each column in their file schema. |
| name | The physical column name is the identifier of a column. Stored as part of StructField metadata in the schema. Used for reading statistics and partition values in the DeltaLog. |
Used when:
- DeltaColumnMappingBase is requested to tryFixMetadata (while OptimisticTransactionImpl is requested to update the metadata)
- DeltaErrors utility is used to create a DeltaColumnMappingUnsupportedException (while OptimisticTransactionImpl is requested to update the metadata)
- DeltaErrors utility is used to create a DeltaColumnMappingUnsupportedException (while ConvertToDeltaCommand is executed)
- Metadata is requested for the column mapping mode (while DeltaFileFormat is requested for the FileFormat)
compatibility.symlinkFormatManifest.enabled¶
delta.compatibility.symlinkFormatManifest.enabled
Whether to register the GenerateSymlinkManifest post-commit hook while committing a transaction or not
Default: false
dataSkippingNumIndexedCols¶
delta.dataSkippingNumIndexedCols
The number of columns to collect stats on for data skipping. -1 means collecting stats for all columns.
Default: 32
Must be larger than or equal to -1.
Used when:
- Snapshot is requested for the maximum number of indexed columns
- TransactionalWrite is requested to write data out
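The column-selection rule can be sketched as follows (simplified; the real implementation works on the table schema, not plain column names):

```scala
// Select the columns to collect statistics on: the first
// numIndexedCols columns, or all columns when the value is -1.
def statsColumns(columns: Seq[String], numIndexedCols: Int): Seq[String] = {
  require(numIndexedCols >= -1, "must be larger than or equal to -1")
  if (numIndexedCols == -1) columns else columns.take(numIndexedCols)
}
```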
deletedFileRetentionDuration¶
delta.deletedFileRetentionDuration
How long to keep logically deleted data files around before deleting them physically (to prevent failures in stale readers after compactions or partition overwrites)
Default: interval 1 week
enableChangeDataFeed¶
delta.enableChangeDataFeed
Enables Change Data Feed
Default: false
Legacy configuration: enableChangeDataCapture
Used when:
- Protocol is requested for the requiredMinimumProtocol
- DeleteCommand is requested to rewriteFiles
- MergeIntoCommand is requested to writeAllChanges
- UpdateCommand is requested to shouldOutputCdc
- CDCReader is requested to isCDCEnabledOnTable
enableDeletionVectors¶
delta.enableDeletionVectors
Enables Deletion Vectors
Default: false
Used when:
- DeletionVectorsTableFeature is requested to metadataRequiresFeatureToBeEnabled
- DeletionVectorUtils is requested to deletionVectorsWritable
- Protocol is requested to assertTablePropertyConstraintsSatisfied
- UniversalFormat is requested to enforceHudiDependencies
enableExpiredLogCleanup¶
delta.enableExpiredLogCleanup
Controls Log Cleanup
Default: true
Used when:
- MetadataCleanup is requested for whether to clean up expired log files and checkpoints
enableFullRetentionRollback¶
delta.enableFullRetentionRollback
Controls whether or not a delta table can be rolled back to any point within logRetentionDuration. When disabled, the table can only be rolled back within checkpointRetentionDuration.
Default: true
enableRowTracking¶
delta.enableRowTracking
Default: false
Used when:
- DeltaErrorsBase is requested to convertToDeltaRowTrackingEnabledWithoutStatsCollection
- RowId is requested to isEnabled
- RowTracking is requested to isEnabled
- RowTrackingFeature is requested to metadataRequiresFeatureToBeEnabled
logRetentionDuration¶
delta.logRetentionDuration
How long to keep obsolete logs around before deleting them. Delta can keep logs beyond the duration until the next calendar day to avoid constantly creating checkpoints.
Default: interval 30 days (CalendarInterval)
Examples: 2 weeks, 365 days (months and years are not accepted)
Used when:
- MetadataCleanup is requested for the deltaRetentionMillis
minReaderVersion¶
delta.minReaderVersion
The protocol reader version
Default: 1
This property is not stored as a table property in the Metadata action. It is stored as its own action. Having it modelled as a table property makes it easier to upgrade and view the version.
minWriterVersion¶
delta.minWriterVersion
The protocol writer version
Default: 3
This property is not stored as a table property in the Metadata action. It is stored as its own action. Having it modelled as a table property makes it easier to upgrade and view the version.
randomizeFilePrefixes¶
delta.randomizeFilePrefixes
Whether to use a random prefix in a file path instead of partition information (may be required for very high volume S3 calls to better be partitioned across S3 servers)
Default: false
randomPrefixLength¶
delta.randomPrefixLength
The length of the random prefix in a file path for randomizeFilePrefixes
Default: 2
sampleRetentionDuration¶
delta.sampleRetentionDuration
How long to keep delta sample files around before deleting them
Default: interval 7 days
universalFormat.enabledFormats¶
delta.universalFormat.enabledFormats
A comma-separated list of table formats
Default: (empty)
Supported values:
hudi
iceberg
Used when:
- ReorgTableForUpgradeUniformHelper is requested to doRewrite
- UniversalFormat is requested to enforceIcebergInvariantsAndDependencies, hudiEnabled, icebergEnabled
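Parsing and validating the comma-separated list can be sketched as follows (a simplified illustration; the actual validation in Delta Lake may differ):

```scala
// Split the enabledFormats value on commas and check every entry
// against the supported table formats.
val supportedFormats = Set("hudi", "iceberg")

def parseEnabledFormats(value: String): Seq[String] = {
  val formats = value.split(",").map(_.trim).filter(_.nonEmpty).toSeq
  require(formats.forall(supportedFormats.contains), s"Unsupported format in: $value")
  formats
}
```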
Building Configuration¶
```scala
buildConfig[T](
  key: String,
  defaultValue: String,
  fromString: String => T,
  validationFunction: T => Boolean,
  helpMessage: String,
  minimumProtocolVersion: Option[Protocol] = None): DeltaConfig[T]
```
buildConfig creates a DeltaConfig for the given key (with the delta. prefix added) and adds it to the entries internal registry.

buildConfig is used to define all of the configuration properties in a type-safe way and (as a side effect) register them with the system-wide entries internal registry.
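The pattern can be sketched as follows (names with a Sketch suffix and the registry key casing are assumptions; this is a simplified model, not the actual implementation):

```scala
import scala.collection.mutable

// A type-safe table property definition: the full key, its default
// value, and a parser from the stored String representation.
case class DeltaConfigSketch[T](key: String, defaultValue: String, fromString: String => T)

// Registry of all defined configs, keyed by the lower-cased key.
val entriesSketch = mutable.HashMap.empty[String, DeltaConfigSketch[_]]

def buildConfigSketch[T](
    key: String,
    defaultValue: String,
    fromString: String => T,
    validationFunction: T => Boolean,
    helpMessage: String): DeltaConfigSketch[T] = {
  // Validate the default value eagerly using the supplied function.
  require(validationFunction(fromString(defaultValue)), helpMessage)
  val config = DeltaConfigSketch(s"delta.$key", defaultValue, fromString)
  entriesSketch.put(key.toLowerCase(java.util.Locale.ROOT), config)
  config
}

// Defining a property registers it as a side effect.
val APPEND_ONLY = buildConfigSketch[Boolean](
  "appendOnly", "false", _.toBoolean, _ => true, "needs to be a boolean")
```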
System-Wide Configuration Entries Registry¶
```scala
entries: HashMap[String, DeltaConfig[_]]
```
The DeltaConfigs utility (a Scala object) uses entries as the internal registry of DeltaConfigs by their key.
New entries are added in buildConfig.
entries
is used when:
mergeGlobalConfigs¶
```scala
mergeGlobalConfigs(
  sqlConfs: SQLConf,
  tableConf: Map[String, String],
  protocol: Protocol): Map[String, String]
```

mergeGlobalConfigs finds all spark.databricks.delta.properties.defaults-prefixed table properties among the entries.
mergeGlobalConfigs
is used when:
- OptimisticTransactionImpl is requested to withGlobalConfigDefaults
- InitialSnapshot is created
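Conceptually, the merge can be sketched as follows (simplified, and the precedence of explicit table properties over session defaults is an assumption about the behaviour):

```scala
// Session defaults under the defaults prefix become "delta."-prefixed
// table properties; explicit table properties win over them.
val mergePrefix = "spark.databricks.delta.properties.defaults."

def mergeGlobalConfigsSketch(
    sqlConfs: Map[String, String],
    tableConf: Map[String, String]): Map[String, String] = {
  val globals = sqlConfs.collect {
    case (k, v) if k.startsWith(mergePrefix) =>
      ("delta." + k.stripPrefix(mergePrefix)) -> v
  }
  globals ++ tableConf // table-level properties take precedence
}
```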
validateConfigurations¶
```scala
validateConfigurations(
  configurations: Map[String, String]): Map[String, String]
```

validateConfigurations ...FIXME
validateConfigurations
is used when:
- DeltaCatalog is requested to verifyTableAndSolidify, alterTable
- CloneTableBase is requested to runInternal
- DeltaDataSource is requested to create a BaseRelation
normalizeConfigKeys¶
```scala
normalizeConfigKeys(
  propKeys: Seq[String]): Seq[String]
```

normalizeConfigKeys ...FIXME
normalizeConfigKeys
is used when:
- AlterTableUnsetPropertiesDeltaCommand is executed