From Optimize performance with file management:

To improve query speed, Delta Lake on Databricks supports the ability to optimize the layout of data stored in cloud storage. Delta Lake on Databricks supports two layout algorithms: bin-packing and Z-Ordering.

As of Delta Lake 2.0.0, the above quote applies to the open source version, too.

The OPTIMIZE command can be executed with the OPTIMIZE SQL statement or programmatically through the DeltaTable API.
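A minimal sketch of how the SQL form of the command can be put together, assuming the grammar `OPTIMIZE <table> [WHERE <partition predicate>] [ZORDER BY (<columns>)]`. The helper `build_optimize_sql` and the table and column names are hypothetical and only illustrate the shape of the statement; they are not part of Delta Lake.

```python
from typing import Optional, Sequence


def build_optimize_sql(
    table: str,
    where: Optional[str] = None,
    zorder_by: Sequence[str] = (),
) -> str:
    """Assemble an OPTIMIZE statement (hypothetical helper, not a Delta Lake API)."""
    parts = [f"OPTIMIZE {table}"]
    if where:
        # Optional predicate that limits the command to a subset of the table.
        parts.append(f"WHERE {where}")
    if zorder_by:
        # Optional clustering columns for Z-Ordering.
        parts.append(f"ZORDER BY ({', '.join(zorder_by)})")
    return " ".join(parts)


if __name__ == "__main__":
    # Plain bin-packing (compaction) of a hypothetical table.
    print(build_optimize_sql("events"))
    # Compaction restricted by a predicate, clustered by a hypothetical column.
    print(build_optimize_sql("events", where="date >= '2022-01-01'", zorder_by=["user_id"]))
```

The resulting string could then be passed to `spark.sql(...)` in a session with Delta Lake configured. The helper does no quoting or validation, so it is only suitable for trusted input.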


In bin-packing (a.k.a. file compaction) mode, the OPTIMIZE command compacts small files together: files smaller than a configurable minimum size are rewritten into larger files of up to a configurable maximum size.
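The idea behind bin-packing can be sketched in a few lines of Python. This is an illustration only, not Delta Lake's implementation: the function `plan_compaction`, its parameters `min_file_size` and `max_file_size`, and the greedy strategy are all assumptions made for the example, and sizes are in arbitrary units.

```python
from typing import Dict, List


def plan_compaction(
    file_sizes: Dict[str, int], min_file_size: int, max_file_size: int
) -> List[List[str]]:
    """Group small files into bins whose total size stays within max_file_size."""
    # Only files below the minimum size are candidates for compaction.
    candidates = sorted(
        (size, name) for name, size in file_sizes.items() if size < min_file_size
    )
    bins: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for size, name in candidates:
        # Close the current bin when the next file would overflow it.
        if current and current_size + size > max_file_size:
            bins.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        bins.append(current)
    # A bin with a single file would only be rewritten as-is, so drop it.
    return [b for b in bins if len(b) > 1]


if __name__ == "__main__":
    sizes = {"a": 10, "b": 20, "c": 30, "d": 500, "e": 45}
    print(plan_compaction(sizes, min_file_size=100, max_file_size=64))
    # → [['a', 'b', 'c']]
```

In the example, file `d` is already large enough to be left alone, and `e` ends up alone in its bin, so only `a`, `b` and `c` would be merged into one file.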


OPTIMIZE can specify ZORDER BY columns for multi-dimensional clustering.
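To build some intuition for multi-dimensional clustering, here is a textbook sketch of a Z-order (Morton) value for two non-negative integer columns. It interleaves the bits of both values so that rows close in both dimensions tend to end up close in the sort order. The function name `z_value` and the fixed bit width are assumptions for the example; Delta Lake's own implementation may differ in detail.

```python
def z_value(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one Z-order (Morton) value."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)      # bits of x go to even positions
        z |= ((y >> i) & 1) << (2 * i + 1)  # bits of y go to odd positions
    return z


if __name__ == "__main__":
    # Sorting a 2x2 grid by Z-value visits the points in a "Z" shape.
    points = [(x, y) for y in range(2) for x in range(2)]
    print(sorted(points, key=lambda p: z_value(*p)))
    # → [(0, 0), (1, 0), (0, 1), (1, 1)]
```

Sorting rows by such a value before writing files means each file covers a compact region in both columns, which is what lets min/max file statistics skip more data for filters on either column.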


The OPTIMIZE command uses a configurable number of threads for compaction.


Demo: Optimize

Learning More

There seem to be so many articles and academic papers about clustering algorithms based on space-filling curves. I'm hoping that one day I'll have read enough to develop my own intuition about Z-order multi-dimensional optimization. If you know good articles about this space (pun intended), let me know. I'll collect them here for future reference (so others can learn along).

Thank you! 🙏

  1. Z-order curve