The Internals of Delta Lake 0.5.0
Delta Lake uses OptimisticTransaction for transactional writes. A commit is successful when the transaction can write its actions to a new delta file in the transaction log. If the delta file for the commit version already exists (i.e. another transaction committed first), the transaction is retried at a newer version.
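A hedged sketch of this commit path, using the internal org.apache.spark.sql.delta API (internal signatures can change between releases; the table path and transaction identifier below are made up):

import org.apache.spark.sql.delta.{DeltaLog, DeltaOperations}
import org.apache.spark.sql.delta.actions.SetTransaction

// Look up the transaction log of a delta table (the path is hypothetical)
val deltaLog = DeltaLog.forTable(spark, "/tmp/delta/events")

// Start an optimistic transaction and commit a single action;
// commit writes the actions to the next delta file in the transaction log
// and retries at a newer version if that delta file already exists
val txn = deltaLog.startTransaction()
val committedVersion = txn.commit(
  Seq(SetTransaction("myApp", 1L, Some(System.currentTimeMillis()))),
  DeltaOperations.ManualUpdate)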
Structured queries can write (transactionally) to a delta table using the following interfaces: the WriteIntoDelta command for batch queries (Spark SQL) and the DeltaSink for streaming queries (Spark Structured Streaming).
More importantly, multiple queries can write to the same delta table concurrently.
Delta Lake provides the DeltaTable API (io.delta.tables) to programmatically access delta tables. A DeltaTable instance can be created for an existing parquet table (DeltaTable.convertToDelta) or for an existing delta table (DeltaTable.forPath).
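A minimal sketch of both entry points (the paths are hypothetical):

import io.delta.tables.DeltaTable

// Convert a parquet table in place and get back a DeltaTable handle
val converted = DeltaTable.convertToDelta(spark, "parquet.`/tmp/parquet/events`")

// Access an existing delta table by path
val events = DeltaTable.forPath(spark, "/tmp/delta/events")
events.toDF.show()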
Delta Lake supports Spark SQL and Structured Streaming using the delta format.
Delta Lake supports reading from and writing to delta tables in batch queries, as sketched below.
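A minimal sketch of a batch read and write (the paths are made up):

// Batch read of a delta table
val df = spark.read.format("delta").load("/tmp/delta/events")

// Batch (transactional) write to another delta table
df.write.format("delta").mode("append").save("/tmp/delta/events-archive")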
Delta Lake supports reading from and writing to delta tables in streaming queries, as sketched below.
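A minimal sketch of a streaming read and write (the paths and checkpoint location are made up):

// Streaming read of a delta table (DeltaSource)
val stream = spark.readStream.format("delta").load("/tmp/delta/events")

// Streaming (transactional) write to a delta table (DeltaSink)
stream.writeStream
  .format("delta")
  .option("checkpointLocation", "/tmp/delta/_checkpoints/events")
  .start("/tmp/delta/events-sink")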
In order to "install" and use Delta Lake in a Spark application (e.g.
--packages command-line option.
/*
./bin/spark-shell \
  --packages io.delta:delta-core_2.12:0.5.0 \
  --conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension
*/
assert(spark.isInstanceOf[org.apache.spark.sql.SparkSession])
assert(spark.version.matches("2.4.[2-4]"), "Delta Lake supports Spark 2.4.2+")

val input = spark
  .read
  .format("delta")
  .option("path", "delta")
  .load