AddFile¶
AddFile is a FileAction that represents an action of adding a file to a delta table.
Creating Instance¶
AddFile takes the following to be created:
- Path
- Partition values (Map[String, String])
- Size (in bytes)
- Modification time
- dataChange flag
- File Statistics
AddFile is created when:
- ConvertToDeltaCommand is executed (for every data file to import)
- DelayedCommitProtocol is requested to commit a task (after a successful write) (for optimistic transactional writers)
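The constructor arguments listed above can be pictured as a plain case class. The following is a minimal sketch, not the Delta Lake source: the object name AddFileSketch, the field names, the epoch-milliseconds unit for the modification time and the example values are all assumptions made for illustration.

```scala
// Simplified, self-contained model for illustration only (not the Delta Lake source).
object AddFileSketch {
  // Stand-in for the FileAction abstraction
  sealed trait FileAction {
    def path: String
    def dataChange: Boolean
  }

  final case class AddFile(
      path: String,
      partitionValues: Map[String, String],
      size: Long,              // in bytes
      modificationTime: Long,  // assumed to be milliseconds since the epoch
      dataChange: Boolean,
      stats: String = null     // JSON-encoded file statistics, undefined by default
  ) extends FileAction

  // Hypothetical values for a data file of a table partitioned by `date`
  val example: AddFile = AddFile(
    path = "date=2024-01-01/part-00000.snappy.parquet",
    partitionValues = Map("date" -> "2024-01-01"),
    size = 1024L,
    modificationTime = 1704067200000L,
    dataChange = true)
}
```

Leaving out the last argument keeps the statistics undefined, which matches the default described in the File Statistics section.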
File Statistics¶
AddFile can be given JSON-encoded file statistics when created.
The statistics are undefined (null) by default.
The statistics can be defined when:
- TransactionalWrite is requested to write data out (and spark.databricks.delta.stats.collect configuration property is enabled)
- StatisticsCollection utility is used to recompute statistics for a delta table (which seems unused, though)
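As a rough illustration of the difference between undefined and defined statistics, the sketch below attaches a JSON string to an otherwise unchanged AddFile. The case class is a trimmed-down stand-in, the object name is hypothetical, and the JSON keys (numRecords, minValues, maxValues, nullCount) follow my reading of the per-file statistics in the Delta transaction protocol; treat the exact shape as an assumption.

```scala
// Trimmed-down stand-in for illustration only (not the Delta Lake source).
object AddFileStatsSketch {
  final case class AddFile(
      path: String,
      size: Long,
      dataChange: Boolean,
      stats: String = null) // undefined unless statistics were collected

  // No statistics given: the field stays null
  val withoutStats: AddFile =
    AddFile(path = "part-00000.snappy.parquet", size = 1024L, dataChange = true)

  // Assumed shape of the JSON-encoded statistics for a file with a single `id` column
  val statsJson: String =
    """{"numRecords":3,"minValues":{"id":1},"maxValues":{"id":3},"nullCount":{"id":0}}"""

  // Same file, now carrying statistics
  val withStats: AddFile = withoutStats.copy(stats = statsJson)
}
```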
Converting to SingleAction¶
wrap: SingleAction
wrap is part of the Action abstraction.
wrap creates a new SingleAction with the add field set to this AddFile.
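The wrapping described above can be sketched as follows. This is a simplified model under stated assumptions: the SingleAction here has only the add field (the real one presumably has a field per action type), and the object name WrapSketch is hypothetical.

```scala
// Simplified model for illustration only (not the Delta Lake source).
object WrapSketch {
  // Assumed: a container in which exactly one field is set for a given action
  final case class SingleAction(add: AddFile = null)

  // Stand-in for the Action abstraction that declares wrap
  trait Action {
    def wrap: SingleAction
  }

  final case class AddFile(path: String) extends Action {
    // Sets the add field to this AddFile
    override def wrap: SingleAction = SingleAction(add = this)
  }
}
```

With these definitions, WrapSketch.AddFile("part-00000.parquet").wrap.add gives back an AddFile equal to the one that was wrapped.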
Converting to RemoveFile with Defaults¶
remove: RemoveFile
remove creates a RemoveFile for the path (with the current time and dataChange flag enabled).
remove is used when:
- MergeIntoCommand is executed
- WriteIntoDelta is requested to write (with Overwrite mode)
- DeltaSink is requested to add a streaming micro-batch (with Complete output mode)
Converting to RemoveFile¶
removeWithTimestamp(
timestamp: Long = System.currentTimeMillis(),
dataChange: Boolean = true): RemoveFile
removeWithTimestamp creates a new RemoveFile action for the path with the given timestamp and dataChange flag.
removeWithTimestamp is used when:
- AddFile is requested to create a RemoveFile action with the defaults
- CreateDeltaTableCommand, DeleteCommand and UpdateCommand commands are executed
- DeltaCommand is requested to removeFilesFromPaths
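The relationship between remove and removeWithTimestamp can be sketched as below. The case classes are simplified stand-ins, and the RemoveFile field names (deletionTimestamp in particular) and the object name RemoveSketch are assumptions; only the method signatures and default values come from the text above.

```scala
// Simplified model for illustration only (not the Delta Lake source).
object RemoveSketch {
  // Assumed field names; the timestamp is optional in this sketch
  final case class RemoveFile(
      path: String,
      deletionTimestamp: Option[Long],
      dataChange: Boolean = true)

  final case class AddFile(path: String, size: Long) {
    // Explicit variant: the caller may choose the timestamp and the dataChange flag
    def removeWithTimestamp(
        timestamp: Long = System.currentTimeMillis(),
        dataChange: Boolean = true): RemoveFile =
      RemoveFile(path, Some(timestamp), dataChange)

    // Defaults variant: current time and dataChange enabled
    def remove: RemoveFile = removeWithTimestamp()
  }

  val file: AddFile = AddFile("part-00000.parquet", size = 1024L)

  // A fixed timestamp and dataChange disabled (hypothetical values)
  val explicit: RemoveFile = file.removeWithTimestamp(timestamp = 42L, dataChange = false)

  // Everything left to the defaults
  val defaults: RemoveFile = file.remove
}
```

A caller that replaces the whole content of a table could, under the same assumptions, turn every existing AddFile into a RemoveFile with files.map(_.remove).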