AddFile¶
AddFile is a FileAction that represents an action of adding a new file to a delta table.
Creating Instance¶
AddFile takes the following to be created:
- File Path
- Partition values (Map[String, String])
- Size (in bytes)
- Modification time
- dataChange flag
- JSON-encoded File Statistics
- DeletionVectorDescriptor
- Base Row ID
- Default Row Commit Version
- Clustering Provider (default: undefined)
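To make the list above concrete, the following is a minimal sketch of a case class with the same shape. It is a simplified stand-in, not the actual AddFile class: the names `AddFileLike` and `DeletionVectorDescriptorSketch`, the field names, and the `Option` types for the row-tracking fields are all assumptions made for illustration.

```scala
object AddFileSketch {
  // Hypothetical placeholder for DeletionVectorDescriptor (its single field is an assumption).
  final case class DeletionVectorDescriptorSketch(cardinality: Long)

  // Simplified stand-in for AddFile; field names and types are assumptions
  // chosen to mirror the list of constructor arguments.
  final case class AddFileLike(
      path: String,
      partitionValues: Map[String, String],
      size: Long, // in bytes
      modificationTime: Long,
      dataChange: Boolean,
      stats: String = null, // JSON-encoded, undefined (null) by default
      deletionVector: DeletionVectorDescriptorSketch = null,
      baseRowId: Option[Long] = None,
      defaultRowCommitVersion: Option[Long] = None,
      clusteringProvider: Option[String] = None) // undefined by default

  // Only the first five arguments are given; the rest fall back to their defaults.
  val example: AddFileLike = AddFileLike(
    path = "part-00000.snappy.parquet",
    partitionValues = Map("date" -> "2024-01-01"),
    size = 1024L,
    modificationTime = 0L,
    dataChange = true)
}
```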
AddFile is created when:
- ConvertUtilsBase is requested to createAddFile
- DelayedCommitProtocol is requested to buildActionFromAddedFile
- TahoeChangeFileIndex is requested to matchingFiles
- TahoeRemoveFileIndex is requested to matchingFiles
- DeltaSource is requested to filterAndGetIndexedFiles (for a sentinel)
dataChange¶
AddFile is given a dataChange flag when created.
dataChange is enabled (true) when:
- ConvertUtilsBase is requested to createAddFile
- DelayedCommitProtocol is requested to buildActionFromAddedFile
dataChange is disabled (false) when:
- TahoeChangeFileIndex is requested to matchingFiles
- DeltaSource is requested to filterAndGetIndexedFiles (for a sentinel)
dataChange can also be specified when:
- TahoeRemoveFileIndex is requested to matchingFiles
File Statistics¶
stats: String
AddFile can be given JSON-encoded file statistics when created.
The statistics are undefined (null) by default.
The statistics can be defined when:
- ConvertToDeltaCommandUtils is requested to computeStats
- TransactionalWrite is requested to write data out (and spark.databricks.delta.stats.collect configuration property is enabled)
- StatisticsCollection is requested to recompute statistics for a delta table (seems to be used for testing only)
stats is used when:
- AddFile is requested for parsedStatsFields
numLogicalRecords¶
numLogicalRecords is the numLogicalRecords value of the parsedStatsFields, if available.
Lazy Value
numLogicalRecords is a Scala lazy value to guarantee that the code to initialize it is executed once only (when accessed for the first time) and the computed value never changes afterwards.
Learn more in the Scala Language Specification.
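The once-only initialization of a lazy value can be illustrated with a small, self-contained sketch. The object and member names below (`LazyValueDemo`, `expensive`, `initializations`) are hypothetical and unrelated to Delta Lake.

```scala
object LazyValueDemo {
  // Counts how many times the initializer of `expensive` runs.
  var initializations: Int = 0

  // The body is not executed here, only on first access.
  lazy val expensive: Long = {
    initializations += 1
    42L
  }

  // Two accesses: the initializer runs on the first one only,
  // the second access returns the cached value.
  val first: Long = expensive
  val second: Long = expensive
}
```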
parsedStatsFields¶
parsedStatsFields: Option[ParsedStatsFields]
parsedStatsFields takes the value of numRecords from the stats (if available) and subtracts numDeletedRecords from it.
Lazy Value
parsedStatsFields is a Scala lazy value to guarantee that the code to initialize it is executed once only (when accessed for the first time) and the computed value never changes afterwards.
Learn more in the Scala Language Specification.
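The arithmetic behind parsedStatsFields (numRecords minus numDeletedRecords) can be sketched as follows. This is an illustration under stated assumptions, not the actual implementation: `ParsedStatsFieldsLike` and `parse` are hypothetical names, returning `None` for missing stats is an assumption, and the regular expression is a naive substitute for a real JSON parser.

```scala
object ParsedStatsSketch {
  // Simplified stand-in for ParsedStatsFields (only the field discussed here).
  final case class ParsedStatsFieldsLike(numLogicalRecords: Option[Long])

  // Naive extraction of a "numRecords" number from a JSON string.
  // Real code would use a proper JSON parser instead of a regex.
  private val NumRecords = """"numRecords"\s*:\s*(\d+)""".r

  def parse(stats: String, numDeletedRecords: Long): Option[ParsedStatsFieldsLike] =
    if (stats == null || stats.isEmpty) None // assumption: no stats, no parsed fields
    else {
      // Logical records = physical records minus records marked as deleted.
      val numLogical = NumRecords
        .findFirstMatchIn(stats)
        .map(_.group(1).toLong - numDeletedRecords)
      Some(ParsedStatsFieldsLike(numLogical))
    }
}
```

With stats of `{"numRecords":10}` and 3 deleted records, `ParsedStatsSketch.parse` yields 7 logical records.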
Converting to SingleAction¶
wrap: SingleAction
wrap is part of the Action abstraction.
wrap creates a new SingleAction with the add field set to this AddFile.
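A minimal sketch of the wrapping described above, using simplified stand-in types. `AddFileLike`, `RemoveFileLike` and `SingleActionLike` are hypothetical names, and the `remove` field is an assumption added only to show that the other fields stay unset.

```scala
object WrapSketch {
  // Simplified stand-ins; the real actions carry many more fields.
  final case class AddFileLike(path: String) {
    // Wraps this action so that only the `add` field is set.
    def wrap: SingleActionLike = SingleActionLike(add = this)
  }
  final case class RemoveFileLike(path: String)

  // One nullable field per action type (only two are shown here).
  final case class SingleActionLike(
      add: AddFileLike = null,
      remove: RemoveFileLike = null)

  val wrapped: SingleActionLike = AddFileLike("part-00000.parquet").wrap
}
```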
Converting to RemoveFile with Defaults¶
remove: RemoveFile
remove creates a RemoveFile for the path (with the current time and dataChange flag enabled).
remove is used when:
- MergeIntoCommand is executed
- WriteIntoDelta is requested to write (with Overwrite mode)
- DeltaSink is requested to add a streaming micro-batch (with Complete output mode)
Converting to RemoveFile¶
removeWithTimestamp(
timestamp: Long = System.currentTimeMillis(),
dataChange: Boolean = true): RemoveFile
removeWithTimestamp creates a new RemoveFile action for the path with the given timestamp and dataChange flag.
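The relationship between remove and removeWithTimestamp can be sketched with simplified stand-in types. `AddFileLike` and `RemoveFileLike` (including its `deletionTimestamp` field) are hypothetical; only the method signature and the default values follow the text above.

```scala
object RemoveSketch {
  // Simplified stand-in for RemoveFile; the field names are assumptions.
  final case class RemoveFileLike(
      path: String,
      deletionTimestamp: Option[Long],
      dataChange: Boolean)

  final case class AddFileLike(path: String) {
    def removeWithTimestamp(
        timestamp: Long = System.currentTimeMillis(),
        dataChange: Boolean = true): RemoveFileLike =
      RemoveFileLike(path, Some(timestamp), dataChange)

    // Relies on the defaults: the current time and dataChange enabled.
    def remove: RemoveFileLike = removeWithTimestamp()
  }

  val file: AddFileLike = AddFileLike("part-00000.parquet")
  val withDefaults: RemoveFileLike = file.remove
  // A caller that rewrites files without changing data turns dataChange off explicitly.
  val noDataChange: RemoveFileLike =
    file.removeWithTimestamp(timestamp = 1000L, dataChange = false)
}
```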
dataChange Flag is Disabled for OptimizeTableCommand
dataChange is true (enabled) by default.
dataChange can only be changed (to false) when:
- AddFile is requested to removeRows (that does not change the dataChange flag though)
- OptimizeExecutor is requested to runOptimizeBinJob
It is only OptimizeTableCommand that explicitly turns dataChange off (false).
removeWithTimestamp is used when:
- AddFile is requested to create a RemoveFile action with the defaults, removeRows
- CreateDeltaTableCommand, DeleteCommand, OptimizeTableCommand, RestoreTableCommand and UpdateCommand commands are executed
- DMLWithDeletionVectorsHelper is requested to processUnmodifiedData
- DeltaCommand is requested to removeFilesFromPaths
removeRows¶
removeRows(
deletionVector: DeletionVectorDescriptor,
updateStats: Boolean,
dataChange: Boolean = true): (AddFile, RemoveFile)
removeRows...FIXME
removeRows is used when:
- DMLWithDeletionVectorsHelper is requested to processUnmodifiedData
tag¶
tag(
tag: AddFile.Tags.KeyType): Option[String]
tag gets the value of the given tag.
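A typed tag lookup of this kind can be sketched as below. All names are assumptions for illustration: `KeyType` here is a local stand-in for AddFile.Tags.KeyType, the `tags` map on `AddFileLike` is assumed, and `OtherKey` is a purely hypothetical key.

```scala
object TagSketch {
  // Local stand-in for AddFile.Tags.KeyType: a typed wrapper around a tag name.
  sealed abstract class KeyType(val name: String)
  object InsertionTime extends KeyType("INSERTION_TIME")
  object OtherKey extends KeyType("OTHER_KEY") // hypothetical key, for illustration only

  // Assumption: tags are kept as an optional (possibly null) string-to-string map.
  final case class AddFileLike(path: String, tags: Map[String, String] = null) {
    // Null-safe lookup of the tag value by the key's name.
    def tag(tag: KeyType): Option[String] =
      Option(tags).flatMap(_.get(tag.name))
  }

  val tagged: AddFileLike =
    AddFileLike("f.parquet", Map("INSERTION_TIME" -> "1700000000000"))
  val untagged: AddFileLike = AddFileLike("g.parquet")
}
```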
tag is used when:
- AddFile is requested for an insertionTime (that does not seem to be used anywhere)
numLogicalRecords¶
numLogicalRecords is the numLogicalRecords value of the parsedStatsFields, if available.
numLogicalRecords is used when:
- DeleteCommandMetrics is requested to getDeletedRowsFromAddFilesAndUpdateMetrics
- MergeIntoCommand is requested to writeInsertsOnlyWhenNoMatchedClauses
- TransactionalWrite is requested to writeFiles
- WriteIntoDelta is requested to registerReplaceWhereMetrics