AddFile¶
AddFile is a FileAction that represents an action of adding a new file to a delta table.
Creating Instance¶
AddFile takes the following to be created:
- File Path
- Partition values (Map[String, String])
- Size (in bytes)
- Modification time
- dataChange flag
- JSON-encoded File Statistics
- DeletionVectorDescriptor
- Base Row ID
- Default Row Commit Version
- Clustering Provider (default: undefined)
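The list above can be pictured as a case class. This is a minimal sketch with a hypothetical AddFileModel, not the real Delta Lake class; the deletion vector, base row ID and default row commit version are left out for brevity, and all field names are assumptions.

```scala
// Hypothetical, simplified stand-in for AddFile (illustration only).
object AddFileModelSketch {
  final case class AddFileModel(
      path: String,
      partitionValues: Map[String, String],
      size: Long,                                 // in bytes
      modificationTime: Long,
      dataChange: Boolean,
      stats: String = null,                       // JSON-encoded statistics, undefined by default
      clusteringProvider: Option[String] = None)  // undefined by default

  // Only the required fields are given; the optional ones keep their defaults.
  val file = AddFileModel(
    path = "part-00000.snappy.parquet",
    partitionValues = Map("date" -> "2024-01-01"),
    size = 1024L,
    modificationTime = 1700000000000L,
    dataChange = true)
}
```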
AddFile is created when:
- ConvertUtilsBase is requested to createAddFile
- DelayedCommitProtocol is requested to buildActionFromAddedFile
- TahoeChangeFileIndex is requested to matchingFiles
- TahoeRemoveFileIndex is requested to matchingFiles
- DeltaSource is requested to filterAndGetIndexedFiles (for a sentinel)
dataChange¶
AddFile is given the dataChange flag when created.
dataChange is enabled (true) when:
- ConvertUtilsBase is requested to createAddFile
- DelayedCommitProtocol is requested to buildActionFromAddedFile
dataChange is disabled (false) when:
- TahoeChangeFileIndex is requested to matchingFiles
- DeltaSource is requested to filterAndGetIndexedFiles (for a sentinel)
dataChange can also be specified when:
- TahoeRemoveFileIndex is requested to matchingFiles
File Statistics¶
stats: String
AddFile can be given JSON-encoded file statistics when created.
The statistics are undefined (null) by default.
The statistics can be defined when:
- ConvertToDeltaCommandUtils is requested to computeStats
- TransactionalWrite is requested to write data out (and spark.databricks.delta.stats.collect configuration property is enabled)
- StatisticsCollection is requested to recompute statistics for a delta table (seems to be used for testing only)
stats is used when:
- AddFile is requested for parsedStatsFields
numLogicalRecords¶
numLogicalRecords is the numLogicalRecords of the parsedStatsFields, if available.
Lazy Value
numLogicalRecords is a Scala lazy value to guarantee that the code to initialize it is executed once only (when accessed for the first time) and the computed value never changes afterwards.
Learn more in the Scala Language Specification.
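The once-only initialization of a lazy value can be observed with a counter. The following is a generic Scala sketch, unrelated to the Delta Lake code base; all names in it are made up for illustration.

```scala
object LazyValueSketch {
  // Counts how many times the initializer body runs.
  var initializations = 0

  // The body is evaluated on first access only; later reads reuse the cached value.
  lazy val expensive: Int = {
    initializations += 1
    42
  }

  val first = expensive   // triggers the initializer
  val second = expensive  // served from the cached value
}
```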
parsedStatsFields¶
parsedStatsFields: Option[ParsedStatsFields]
parsedStatsFields takes the value of numRecords in the stats, if available, minus the numDeletedRecords.
Lazy Value
parsedStatsFields is a Scala lazy value to guarantee that the code to initialize it is executed once only (when accessed for the first time) and the computed value never changes afterwards.
Learn more in the Scala Language Specification.
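The arithmetic described above (numRecords minus numDeletedRecords, only when statistics are present) can be sketched as follows. This is an assumption-laden illustration: the helper name and the regex-based lookup are hypothetical stand-ins for a proper JSON parser, and numDeletedRecords is passed in as a plain parameter.

```scala
object LogicalRecordsSketch {
  // Hypothetical shortcut: pulls numRecords out of the JSON text with a regex.
  private val NumRecords = "\"numRecords\"\\s*:\\s*(\\d+)".r

  // None when stats are undefined (null) or carry no numRecords field.
  def numLogicalRecords(stats: String, numDeletedRecords: Long): Option[Long] =
    Option(stats)
      .flatMap(s => NumRecords.findFirstMatchIn(s))
      .map(m => m.group(1).toLong - numDeletedRecords)
}
```

For example, statistics with numRecords of 10 and 3 deleted records give Some(7), while undefined statistics give None.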
Converting to SingleAction¶
wrap: SingleAction
wrap is part of the Action abstraction.
wrap creates a new SingleAction with the add field set to this AddFile.
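A minimal sketch of the wrapping, using hypothetical AddFileModel and SingleActionModel classes rather than the real Delta Lake types (the remove field is included only to show that the other fields stay unset):

```scala
object WrapSketch {
  // Hypothetical container with one field per action type; only add is set by wrap.
  final case class SingleActionModel(add: AddFileModel = null, remove: AnyRef = null)

  final case class AddFileModel(path: String) {
    // Puts this file action into the add field of a new container.
    def wrap: SingleActionModel = SingleActionModel(add = this)
  }

  val wrapped = AddFileModel("part-00000.parquet").wrap
}
```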
Converting to RemoveFile with Defaults¶
remove: RemoveFile
remove creates a RemoveFile for the path (with the current time and dataChange flag enabled).
remove is used when:
- MergeIntoCommand is executed
- WriteIntoDelta is requested to write (with Overwrite mode)
- DeltaSink is requested to add a streaming micro-batch (with Complete output mode)
Converting to RemoveFile¶
removeWithTimestamp(
timestamp: Long = System.currentTimeMillis(),
dataChange: Boolean = true): RemoveFile
removeWithTimestamp creates a new RemoveFile action for the path with the given timestamp and dataChange flag.
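How remove relates to removeWithTimestamp can be sketched with simplified models. AddFileModel and RemoveFileModel are hypothetical classes introduced for illustration, and the deletionTimestamp field name is an assumption; only the default arguments mirror the signature above.

```scala
object RemoveSketch {
  // Hypothetical, simplified stand-in for RemoveFile.
  final case class RemoveFileModel(
      path: String,
      deletionTimestamp: Option[Long],
      dataChange: Boolean)

  final case class AddFileModel(path: String) {
    def removeWithTimestamp(
        timestamp: Long = System.currentTimeMillis(),
        dataChange: Boolean = true): RemoveFileModel =
      RemoveFileModel(path, Some(timestamp), dataChange)

    // remove relies on the defaults: the current time and dataChange enabled.
    def remove: RemoveFileModel = removeWithTimestamp()
  }
}
```

A caller that must not signal a data change would pass dataChange = false explicitly, for example removeWithTimestamp(dataChange = false).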
dataChange Flag is Disabled for OptimizeTableCommand
dataChange is true (enabled) by default.
dataChange can only be changed (to false) when:
- AddFile is requested to removeRows (that does not change dataChange flag though)
- OptimizeExecutor is requested to runOptimizeBinJob
It is only OptimizeTableCommand that explicitly turns dataChange off (false).
removeWithTimestamp is used when:
- AddFile is requested to create a RemoveFile action with the defaults and to removeRows
- CreateDeltaTableCommand, DeleteCommand, OptimizeTableCommand, RestoreTableCommand and UpdateCommand commands are executed
- DMLWithDeletionVectorsHelper is requested to processUnmodifiedData
- DeltaCommand is requested to removeFilesFromPaths
removeRows¶
removeRows(
deletionVector: DeletionVectorDescriptor,
updateStats: Boolean,
dataChange: Boolean = true): (AddFile, RemoveFile)
removeRows
...FIXME
removeRows is used when:
- DMLWithDeletionVectorsHelper is requested to processUnmodifiedData
tag¶
tag(
tag: AddFile.Tags.KeyType): Option[String]
tag gets the value of the given tag.
tag is used when:
- AddFile is requested for an insertionTime (that does not seem to be used anywhere)
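A tag lookup of this shape could look like the sketch below. It assumes that tags are kept in a Map[String, String] that may be undefined (null) and that a key carries a name; TagKey and AddFileModel are hypothetical types, since the source only names AddFile.Tags.KeyType.

```scala
object TagSketch {
  // Hypothetical key type (illustration only).
  final case class TagKey(name: String)

  final case class AddFileModel(path: String, tags: Map[String, String] = null) {
    // Option(...) guards against an undefined (null) tags map.
    def tag(key: TagKey): Option[String] = Option(tags).flatMap(_.get(key.name))
  }
}
```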
numLogicalRecords¶
numLogicalRecords is the numLogicalRecords of the parsedStatsFields, if available.
numLogicalRecords is used when:
- DeleteCommandMetrics is requested to getDeletedRowsFromAddFilesAndUpdateMetrics
- MergeIntoCommand is requested to writeInsertsOnlyWhenNoMatchedClauses
- TransactionalWrite is requested to writeFiles
- WriteIntoDelta is requested to registerReplaceWhereMetrics