DeltaDataSource — Entry Point of Delta Data Source

DeltaDataSource is a DataSourceRegister and acts as the entry point to all features provided by the delta data source.

DeltaDataSource is a RelationProvider.

DeltaDataSource is a StreamSinkProvider that creates a streaming sink for streaming queries (Spark Structured Streaming).

DataSourceRegister for delta alias

DeltaDataSource is a DataSourceRegister and registers itself under the delta alias.

Reading From Delta Table

assert(spark.isInstanceOf[org.apache.spark.sql.SparkSession])
spark.read.format("delta")
spark.readStream.format("delta")

Writing To Delta Table

assert(df.isInstanceOf[org.apache.spark.sql.Dataset[_]])
df.write.format("delta")
df.writeStream.format("delta")

Read up on DataSourceRegister in The Internals of Spark SQL online book.
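
The delta alias can be tried end to end with a batch write followed by a batch read. The following is a minimal sketch, assuming Spark and the Delta Lake library are on the classpath; the object name DeltaRoundTrip, the temporary directory and the local master are illustrative choices, not part of Delta Lake. Depending on the Delta Lake version, extra SparkSession configuration may be needed.

```scala
import java.nio.file.Files

import org.apache.spark.sql.SparkSession

object DeltaRoundTrip {
  // Writes five rows to a fresh delta table and returns the number of rows read back.
  def run(): Long = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("delta-round-trip")
      .getOrCreate()
    try {
      // A child of a new temporary directory, so the target path does not exist yet
      val path = Files.createTempDirectory("delta-demo").resolve("table").toString

      // Batch write using the delta alias
      spark.range(5).write.format("delta").save(path)

      // Batch read using the delta alias
      spark.read.format("delta").load(path).count()
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit =
    println(s"Rows read back: ${run()}")
}
```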

DeltaDataSource is registered using META-INF/services/org.apache.spark.sql.sources.DataSourceRegister:

org.apache.spark.sql.delta.sources.DeltaDataSource
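
The service registration above can be inspected with the Java ServiceLoader mechanism. This is a rough sketch of an alias lookup, not the code Spark itself runs: the object FindDataSourceByAlias and the method providersFor are hypothetical names introduced for illustration, and the sketch assumes that every registered provider on the classpath can be instantiated.

```scala
import java.util.ServiceLoader

import scala.collection.JavaConverters._

import org.apache.spark.sql.sources.DataSourceRegister

object FindDataSourceByAlias {
  // Hypothetical helper: class names of all registered data sources
  // whose shortName matches the given alias (case-insensitively).
  def providersFor(alias: String): List[String] =
    ServiceLoader.load(classOf[DataSourceRegister])
      .iterator()
      .asScala
      .filter(_.shortName().equalsIgnoreCase(alias))
      .map(_.getClass.getName)
      .toList

  def main(args: Array[String]): Unit =
    // With Delta Lake on the classpath this should list the class registered above
    providersFor("delta").foreach(println)
}
```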

RelationProvider — Creating Insertable HadoopFsRelation For Batch Queries

DeltaDataSource is a RelationProvider for reading (loading) data from a delta table in a structured query.

Read up on RelationProvider in The Internals of Spark SQL online book.

createRelation(
  sqlContext: SQLContext,
  parameters: Map[String, String]): BaseRelation

createRelation…​FIXME

In the end, createRelation requests the DeltaLog for an insertable HadoopFsRelation.

CreatableRelationProvider

DeltaDataSource is a CreatableRelationProvider for writing out the result of a structured query.

Creating Streaming Source (Structured Streaming) — createSource Method

DeltaDataSource is a StreamSourceProvider.

Creating Streaming Sink (Structured Streaming) — createSink Method

DeltaDataSource is a StreamSinkProvider that creates a streaming sink for Spark Structured Streaming.

DeltaDataSource supports Append and Complete output modes only.

In the end, DeltaDataSource creates a DeltaSink.
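
The output mode restriction can be expressed as a simple predicate over Spark's OutputMode. This is only a sketch of the rule stated above, not the code of createSink: DeltaSinkModes and isSupported are hypothetical names, and how DeltaDataSource reports an unsupported mode is not shown.

```scala
import org.apache.spark.sql.streaming.OutputMode

object DeltaSinkModes {
  // Hypothetical helper (not part of Delta Lake): true only for the two
  // output modes that the text above says the delta sink supports.
  def isSupported(mode: OutputMode): Boolean =
    mode == OutputMode.Append() || mode == OutputMode.Complete()

  def main(args: Array[String]): Unit =
    Seq(OutputMode.Append(), OutputMode.Complete(), OutputMode.Update())
      .foreach(mode => println(s"$mode supported: ${isSupported(mode)}"))
}
```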

sourceSchema Method

sourceSchema(
  sqlContext: SQLContext,
  schema: Option[StructType],
  providerName: String,
  parameters: Map[String, String]): (String, StructType)

sourceSchema is part of the StreamSourceProvider contract (Spark Structured Streaming) to return the name and schema of the streaming source.

sourceSchema…​FIXME

getTimeTravelVersion Internal Method

getTimeTravelVersion(
  parameters: Map[String, String]): Option[DeltaTimeTravelSpec]

getTimeTravelVersion…​FIXME

getTimeTravelVersion is used exclusively when DeltaDataSource is requested to create a relation (as a RelationProvider).
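
From the user's side, time travel is requested with reader options when loading a delta table. The sketch below uses the versionAsOf reader option of Delta Lake; that getTimeTravelVersion is the method that interprets this option is an assumption, since its description is still missing above. The object name DeltaTimeTravelDemo and the temporary path are illustrative, and Spark and Delta Lake are assumed to be on the classpath.

```scala
import java.nio.file.Files

import org.apache.spark.sql.SparkSession

object DeltaTimeTravelDemo {
  // Creates two versions of a delta table and returns the row count of version 0.
  def run(): Long = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("delta-time-travel")
      .getOrCreate()
    try {
      val path = Files.createTempDirectory("delta-tt").resolve("table").toString

      // Version 0: three rows
      spark.range(3).write.format("delta").save(path)
      // Version 1: three more rows appended
      spark.range(3, 6).write.format("delta").mode("append").save(path)

      // Load the table as of version 0 (before the append)
      spark.read
        .format("delta")
        .option("versionAsOf", 0L)
        .load(path)
        .count()
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit =
    println(s"Rows in version 0: ${run()}")
}
```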

extractDeltaPath Utility

extractDeltaPath(
  dataset: Dataset[_]): Option[String]

extractDeltaPath…​FIXME

extractDeltaPath does not seem to be used anywhere.