

DeltaDataSource is a DataSourceRegister and is the entry point to all the features provided by the delta data source.

DeltaDataSource is a RelationProvider.

DeltaDataSource is a StreamSinkProvider that creates a streaming sink for streaming queries (Spark Structured Streaming).

DataSourceRegister and delta Alias

DeltaDataSource is a DataSourceRegister and registers the delta alias.


Read up on DataSourceRegister in The Internals of Spark SQL online book.

DeltaDataSource is registered using the META-INF/services/org.apache.spark.sql.sources.DataSourceRegister service file.
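Alias resolution can be pictured with a small self-contained model. The sketch below does not use Spark: Register, DeltaLike, CsvLike and lookup are hypothetical stand-ins for DataSourceRegister, the registered providers and the lookup step, and the case-insensitive comparison is an assumption about how the alias is matched.

```scala
object AliasLookupSketch {
  // Hypothetical stand-in for Spark SQL's DataSourceRegister
  trait Register { def shortName(): String }

  final class DeltaLike extends Register { def shortName(): String = "delta" }
  final class CsvLike extends Register { def shortName(): String = "csv" }

  // In Spark the candidates would be discovered through the service file;
  // in this sketch they are simply passed in as a sequence.
  def lookup(alias: String, providers: Seq[Register]): Option[Register] =
    providers.find(_.shortName().equalsIgnoreCase(alias))
}
```

With such a registration in place, a query can refer to the data source by the delta alias instead of a fully-qualified class name.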

RelationProvider for Batch Queries

DeltaDataSource is a RelationProvider for reading (loading) data from a delta table in a structured query.


Read up on RelationProvider in The Internals of Spark SQL online book.

createRelation(
  sqlContext: SQLContext,
  parameters: Map[String, String]): BaseRelation

createRelation reads the path option from the given parameters.

createRelation verifies the given parameters.

createRelation extracts time travel specification from the given parameters.

In the end, createRelation creates a DeltaTableV2 (for the path option and the time travel specification) and requests it for an insertable HadoopFsRelation.

createRelation throws an IllegalArgumentException when the path option is not specified:

'path' is not specified
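The steps above can be sketched as a simplified, Spark-free model. RelationPlan is a hypothetical result type (the real method returns a BaseRelation backed by a DeltaTableV2), the versionAsOf option name is an assumption, and the parameter verification step is left out.

```scala
object CreateRelationSketch {
  // Hypothetical result type that stands in for the relation built by DeltaTableV2
  final case class RelationPlan(path: String, versionAsOf: Option[String])

  def createRelation(parameters: Map[String, String]): RelationPlan = {
    // Step 1: the path option is required
    val path = parameters.getOrElse(
      "path",
      throw new IllegalArgumentException("'path' is not specified"))

    // Step 2 (verifying the parameters) is omitted in this sketch

    // Step 3: extract a time travel specification, if any
    // ("versionAsOf" is an assumed option name)
    val version = parameters.get("versionAsOf")

    // Step 4: build the relation for the path and the time travel specification
    RelationPlan(path, version)
  }
}
```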

Source Schema

sourceSchema(
  sqlContext: SQLContext,
  schema: Option[StructType],
  providerName: String,
  parameters: Map[String, String]): (String, StructType)

sourceSchema creates a DeltaLog for a Delta table in the directory specified by the required path option (in the parameters) and returns the delta name with the schema (of the Delta table).

sourceSchema throws an IllegalArgumentException when the path option has not been specified:

'path' is not specified

sourceSchema throws an AnalysisException when the path option uses time travel:

Cannot time travel views, subqueries or streams.

sourceSchema is part of the StreamSourceProvider abstraction (Spark Structured Streaming).


DeltaDataSource is a CreatableRelationProvider for writing out the result of a structured query.


Read up on CreatableRelationProvider in The Internals of Spark SQL online book.

Creating Streaming Source

DeltaDataSource is a StreamSourceProvider.

Creating Streaming Sink

DeltaDataSource is a StreamSinkProvider for a streaming sink for Structured Streaming.

DeltaDataSource supports Append and Complete output modes only.

In the end, DeltaDataSource creates a DeltaSink.
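The output mode check can be modelled without Spark as follows. Mode and its cases are hypothetical stand-ins for Structured Streaming's OutputMode, createSink returns a description string rather than a real DeltaSink, and the error text is illustrative only.

```scala
object SinkModeSketch {
  // Hypothetical stand-ins for Structured Streaming output modes
  sealed trait Mode
  case object Append extends Mode
  case object Complete extends Mode
  case object Update extends Mode

  // Only Append and Complete lead to a sink; any other mode is rejected
  def createSink(path: String, mode: Mode): Either[String, String] = mode match {
    case Append | Complete => Right(s"DeltaSink($path, $mode)")
    case other             => Left(s"Output mode $other is not supported")
  }
}
```

Returning an Either keeps the sketch easy to test; the actual data source is expected to fail the streaming query instead of returning a value.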

Loading Table

getTable(
  schema: StructType,
  partitioning: Array[Transform],
  properties: java.util.Map[String, String]): Table


getTable is part of the TableProvider (Spark SQL 3.0.0) abstraction.



getTimeTravelVersion(
  parameters: Map[String, String]): Option[DeltaTimeTravelSpec]


getTimeTravelVersion is used when DeltaDataSource is requested to create a relation (as a RelationProvider).
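A minimal sketch of how a time travel specification could be derived from the parameters is shown below. TimeTravelSpec is a hypothetical simplification of DeltaTimeTravelSpec, and the timestampAsOf and versionAsOf option names, the case-insensitive lookup and the rejection of both options together are assumptions, not details confirmed by this page.

```scala
import java.util.Locale

object TimeTravelSketch {
  // Hypothetical simplification of DeltaTimeTravelSpec
  final case class TimeTravelSpec(timestamp: Option[String], version: Option[Long])

  def getTimeTravelVersion(parameters: Map[String, String]): Option[TimeTravelSpec] = {
    // Normalize the keys so option names match regardless of case
    val opts = parameters.map { case (k, v) => k.toLowerCase(Locale.ROOT) -> v }

    (opts.get("timestampasof"), opts.get("versionasof")) match {
      case (Some(_), Some(_)) =>
        throw new IllegalArgumentException("Specify either a timestamp or a version, not both")
      case (Some(ts), None) => Some(TimeTravelSpec(Some(ts), None))
      case (None, Some(v))  => Some(TimeTravelSpec(None, Some(v.toLong)))
      case (None, None)     => None // no time travel requested
    }
  }
}
```

An empty result means the relation is created for the latest version of the table.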


parsePathIdentifier(
  spark: SparkSession,
  userPath: String): (Path, Seq[(String, String)], Option[DeltaTimeTravelSpec])


parsePathIdentifier is used when DeltaTableV2 is requested for metadata (for a non-catalog table).

Last update: 2020-12-11