OptimizeExecutor

OptimizeExecutor is a DeltaCommand with SQLMetricsReporting.

Creating Instance

OptimizeExecutor takes the following to be created:

OptimizeExecutor is created when:

  • OptimizeTableCommand is requested to run

optimize

optimize(): Seq[Row]

optimize reads the following configuration properties:

  • spark.databricks.delta.optimize.minFileSize
  • spark.databricks.delta.optimize.maxFileSize
  • spark.databricks.delta.optimize.maxThreads

optimize requests the DeltaLog to startTransaction.

optimize requests the OptimisticTransaction for the files matching the partition predicates.

optimize finds the files with size below the spark.databricks.delta.optimize.minFileSize threshold (the files considered for compaction) and groups them by partition values.
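The selection step can be sketched in plain Scala. `FileInfo` below is a hypothetical stand-in for Delta's AddFile action, and the threshold value is illustrative:

```scala
// Hypothetical stand-in for Delta's AddFile action (illustrative only)
case class FileInfo(path: String, partitionValues: Map[String, String], size: Long)

// Files smaller than the minFileSize threshold are candidates for compaction,
// grouped by partition values so every partition is compacted separately.
def candidateFilesByPartition(
    files: Seq[FileInfo],
    minFileSize: Long): Map[Map[String, String], Seq[FileInfo]] =
  files
    .filter(_.size < minFileSize)
    .groupBy(_.partitionValues)

val files = Seq(
  FileInfo("a.parquet", Map("p" -> "1"), 10),
  FileInfo("b.parquet", Map("p" -> "1"), 500),
  FileInfo("c.parquet", Map("p" -> "2"), 20))

// b.parquet is above the threshold and is left out
val grouped = candidateFilesByPartition(files, minFileSize = 100)
```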

optimize groups the files into bins (of at most the spark.databricks.delta.optimize.maxFileSize size each).

Note

A bin is a group of files, whose total size does not exceed the desired size. They will be coalesced into a single output file.
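The bin-packing step can be sketched as a greedy pass over the file sizes. This is a simplified sketch of the idea, not the actual OptimizeExecutor implementation (which works on file actions, not raw sizes):

```scala
import scala.collection.mutable.ArrayBuffer

// Greedy bin packing: accumulate files into the current bin until adding the
// next file would push the bin's total size over maxBinSize, then start a new bin.
def groupFilesIntoBins(sizes: Seq[Long], maxBinSize: Long): Seq[Seq[Long]] = {
  val bins = ArrayBuffer.empty[Seq[Long]]
  var currentBin = ArrayBuffer.empty[Long]
  var currentSize = 0L
  for (size <- sizes) {
    if (currentSize + size > maxBinSize && currentBin.nonEmpty) {
      bins += currentBin.toSeq
      currentBin = ArrayBuffer.empty[Long]
      currentSize = 0L
    }
    currentBin += size
    currentSize += size
  }
  if (currentBin.nonEmpty) bins += currentBin.toSeq
  bins.toSeq
}

// 40 + 50 fit within 100; 30 starts a new bin, which 20 also fits into
val bins = groupFilesIntoBins(Seq(40L, 50L, 30L, 20L), maxBinSize = 100L)
```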

optimize creates a ForkJoinPool with spark.databricks.delta.optimize.maxThreads threads (with the OptimizeJob thread prefix). The task pool is then used to parallelize the submission of runCompactBinJob optimization jobs to Spark.
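The parallel submission can be sketched with a plain java.util.concurrent.ForkJoinPool. `compactBin` below is a hypothetical placeholder for the runCompactBinJob work, and the thread count stands in for spark.databricks.delta.optimize.maxThreads:

```scala
import java.util.concurrent.{Callable, ForkJoinPool}
import scala.jdk.CollectionConverters._

// Pool sized by the maxThreads setting; each bin is an independent task,
// so bins are compacted in parallel.
val maxThreads = 4 // stands in for spark.databricks.delta.optimize.maxThreads
val pool = new ForkJoinPool(maxThreads)

// Hypothetical placeholder for runCompactBinJob
def compactBin(bin: Seq[Long]): Long = bin.sum

val bins: Seq[Seq[Long]] = Seq(Seq(40L, 50L), Seq(30L, 20L))
val tasks = bins.map { bin =>
  new Callable[Long] { override def call(): Long = compactBin(bin) }
}

// invokeAll blocks until every task completes; results keep the input order
val results =
  try pool.invokeAll(tasks.asJava).asScala.map(_.get()).toSeq
  finally pool.shutdown()
```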

Once the compaction jobs are done, optimize commits the transaction (the resulting actions to the log), but only when there were any AddFiles produced.

In the end, optimize returns a Row with the data path (of the Delta table) and the optimize statistics.

optimize is used when:

  • OptimizeTableCommand is requested to run