As it was well said: "Delta is a storage format while Spark is an execution engine...to separate storage from compute." Accordingly, Delta Lake can also run with other execution engines, such as Trino or Apache Flink.
Delta Lake 2.1.0rc1 supports Apache Spark 3.3.0 (cf. build.sbt).
Delta Lake uses OptimisticTransaction for transactional writes. A commit is successful when the transaction can write its actions to a delta file (in the transaction log). If the delta file for the commit version already exists, the transaction is retried.
More importantly, multiple queries can write to the same delta table simultaneously (at exactly the same time).
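The commit-and-retry behaviour can be illustrated with a toy model that uses only the Python standard library. This is a minimal sketch, not Delta Lake's code: the helper names `try_commit` and `commit_with_retries` are hypothetical, a local temporary directory stands in for the transaction log, and the conflict checking that a real transaction performs before retrying is left out.

```python
import json
import os
import tempfile


def try_commit(log_dir, version, actions):
    """Try to write the actions as the delta file of the given version.

    Returns False if a delta file for that version already exists.
    (Hypothetical helper for illustration only.)
    """
    path = os.path.join(log_dir, f"{version:020d}.json")
    try:
        # O_EXCL makes creation fail if the file is already there,
        # so only one writer can claim a given version.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        for action in actions:
            f.write(json.dumps(action) + "\n")
    return True


def commit_with_retries(log_dir, first_version, actions, max_attempts=10):
    """Retry with the next version until a delta file can be written."""
    version = first_version
    for _ in range(max_attempts):
        if try_commit(log_dir, version, actions):
            return version
        version += 1  # the version was taken by another writer
    raise RuntimeError("too many concurrent commits")


log_dir = tempfile.mkdtemp()
v1 = commit_with_retries(log_dir, 0, [{"add": {"path": "part-0.parquet"}}])
# A second writer that also starts from version 0 lands on version 1.
v2 = commit_with_retries(log_dir, 0, [{"add": {"path": "part-1.parquet"}}])
print(v1, v2)  # 0 1
```

Both writers succeed because each one ends up owning a distinct version of the log, which is the essence of the optimistic approach.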
TransactionalWrite is an interface for writing out data to a delta table.
The following commands and operations can transactionally write new data files out to a data directory of a delta table:
Delta Lake provides the following programmatic APIs:
Delta Lake supports batch and streaming queries (Spark SQL and Structured Streaming, respectively) using delta format.
To fine-tune queries over data in Delta Lake, use options.
Structured queries can write (transactionally) to a delta table using the following interfaces:
- WriteIntoDelta command for batch queries (Spark SQL)
- DeltaSink for streaming queries (Spark Structured Streaming)
Delta Lake supports reading and writing in batch queries:
Delta Lake supports reading and writing in streaming queries:
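Reading and writing with the delta format in batch and streaming queries could look as follows in PySpark. This is a minimal sketch: the function names are hypothetical, and the `spark` argument is assumed to be a SparkSession that already has Delta Lake on its classpath and configured.

```python
def write_batch(df, path):
    """Append a DataFrame to the delta table at `path` (batch write)."""
    df.write.format("delta").mode("append").save(path)


def read_batch(spark, path):
    """Load the delta table at `path` as a batch DataFrame."""
    return spark.read.format("delta").load(path)


def read_stream(spark, path):
    """Load the delta table at `path` as a streaming DataFrame."""
    return spark.readStream.format("delta").load(path)


def write_stream(streaming_df, path, checkpoint_dir):
    """Start a streaming query that writes to the delta table at `path`."""
    return (
        streaming_df.writeStream.format("delta")
        .option("checkpointLocation", checkpoint_dir)
        .start(path)
    )
```

The only difference between the batch and streaming variants is the entry point (`read` and `write` versus `readStream` and `writeStream`); the format name is `delta` in both cases.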
Delta Tables in Logical Query Plans
Put simply, a delta table is a HadoopFsRelation with a TahoeFileIndex in logical query plans.
Concurrent Blind Append Transactions
Blind append transactions are marked in the commit info to distinguish them from read-modify-appends (deletes, merges or updates), and are assumed not to conflict with concurrent transactions.
Blind append transactions allow for concurrent updates.
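The distinction can be sketched with a toy model. Everything here is an assumption made for illustration: the `Txn` class and the `conflicts` function are hypothetical, and the conflict rule is deliberately much simpler than what Delta Lake actually checks. Only the `isBlindAppend` flag in the commit info mirrors the real log.

```python
from dataclasses import dataclass, field


@dataclass
class Txn:
    """Hypothetical, simplified view of a transaction."""

    added_files: set
    read_files: set = field(default_factory=set)
    removed_files: set = field(default_factory=set)

    @property
    def is_blind_append(self):
        # Only adds data: reads nothing and removes nothing.
        return not self.read_files and not self.removed_files

    def commit_info(self):
        return {"isBlindAppend": self.is_blind_append}


def conflicts(txn, winning_txn):
    """Would `txn` conflict with a concurrent `winning_txn` that committed first?

    Simplified rule (an assumption of this sketch): a blind append never
    conflicts; any other transaction conflicts when a file it read has
    been removed by the winning transaction.
    """
    if txn.is_blind_append:
        return False
    return bool(txn.read_files & winning_txn.removed_files)


append_a = Txn(added_files={"a.parquet"})
append_b = Txn(added_files={"b.parquet"})
update = Txn(
    added_files={"c2.parquet"},
    read_files={"c.parquet"},
    removed_files={"c.parquet"},
)
delete = Txn(added_files=set(), read_files={"c.parquet"}, removed_files={"c.parquet"})

print(append_a.commit_info())         # {'isBlindAppend': True}
print(conflicts(append_a, append_b))  # False
print(conflicts(update, delete))      # True
```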
Delta Lake supports Generated Columns.
Delta Lake introduces table constraints to ensure data quality and integrity (during writes).
Exception Public API
Delta Lake exposes the exceptions raised on conflicts between concurrent operations as a public API.
Simplified Storage Configuration
Delta Lake 1.2.0
Compacting Small Files (Optimize)
Delta Lake 1.2.0 introduces a new OPTIMIZE SQL command for compacting small files into larger ones.
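As a minimal sketch, the command could be issued from PySpark as shown below. The helper names `optimize_statement` and `optimize_table` are hypothetical, `spark` is assumed to be a SparkSession with the Delta Lake SQL extensions enabled, and the optional `WHERE` clause is assumed to reference partition columns only.

```python
def optimize_statement(table_path, partition_filter=None):
    """Build an OPTIMIZE statement for a path-based delta table."""
    stmt = f"OPTIMIZE delta.`{table_path}`"
    if partition_filter:
        # Assumption: the predicate refers to partition columns only.
        stmt += f" WHERE {partition_filter}"
    return stmt


def optimize_table(spark, table_path, partition_filter=None):
    """Run OPTIMIZE through the given SparkSession."""
    return spark.sql(optimize_statement(table_path, partition_filter))


print(optimize_statement("/tmp/events", "date >= '2022-01-01'"))
# OPTIMIZE delta.`/tmp/events` WHERE date >= '2022-01-01'
```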
Delta Lake 1.2.0 introduces support for Data Skipping.
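The idea behind data skipping can be shown with a toy example: given per-file minimum and maximum values of a column, a query with an equality filter only needs to scan files whose range may contain the value. This is a sketch under stated assumptions, not Delta Lake's implementation; the `files_to_scan` function and the layout of `file_stats` are hypothetical.

```python
def files_to_scan(file_stats, column, value):
    """Keep only files whose [min, max] range for `column` may contain `value`."""
    selected = []
    for path, stats in file_stats.items():
        lo, hi = stats[column]
        if lo <= value <= hi:
            selected.append(path)
    return selected


# Hypothetical per-file statistics: column name -> (min, max)
file_stats = {
    "part-0.parquet": {"id": (0, 99)},
    "part-1.parquet": {"id": (100, 199)},
    "part-2.parquet": {"id": (200, 299)},
}

print(files_to_scan(file_stats, "id", 150))  # ['part-1.parquet']
```

Two of the three files are skipped without being opened, which is where the savings come from.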