
DeltaDataSource

DeltaDataSource is a DataSourceRegister and the entry point to all the features of the delta data source.

DeltaDataSource is a RelationProvider.

DeltaDataSource is a StreamSinkProvider that creates a streaming sink for streaming queries (Structured Streaming).

DataSourceRegister and delta Alias

DeltaDataSource is a DataSourceRegister (Spark SQL) and registers the delta alias.

DeltaDataSource is registered using META-INF/services/org.apache.spark.sql.sources.DataSourceRegister:

org.apache.spark.sql.delta.sources.DeltaDataSource

RelationProvider for Batch Queries

DeltaDataSource is a RelationProvider (Spark SQL) for reading (loading) data from a delta table in a structured query.

createRelation(
  sqlContext: SQLContext,
  parameters: Map[String, String]): BaseRelation

createRelation reads the path option from the given parameters.

createRelation verifies the given parameters.

createRelation extracts time travel specification from the given parameters.

In the end, createRelation creates a DeltaTableV2 (for the path option and the time travel specification) and requests it for an insertable HadoopFsRelation.

createRelation throws an IllegalArgumentException when the path option is not specified:

'path' is not specified
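
The parameter handling described above can be modelled without Spark. The following is a minimal sketch, not Delta Lake's actual code: CreateRelationSketch and TimeTravelSpec are hypothetical names, and the versionAsOf and timestampAsOf option names are an assumption, since this page does not list the options that carry the time travel specification.

```scala
// A simplified, Spark-free model of how createRelation handles its parameters.
object CreateRelationSketch {
  // Hypothetical stand-in for DeltaTimeTravelSpec (not the real class).
  final case class TimeTravelSpec(version: Option[Long], timestamp: Option[String])

  // Reads the required path option or fails with the documented message.
  def pathOption(parameters: Map[String, String]): String =
    parameters.getOrElse(
      "path",
      throw new IllegalArgumentException("'path' is not specified"))

  // Assumption: time travel is requested with versionAsOf or timestampAsOf.
  def timeTravelSpec(parameters: Map[String, String]): Option[TimeTravelSpec] = {
    val version = parameters.get("versionAsOf").map(_.toLong)
    val timestamp = parameters.get("timestampAsOf")
    if (version.isEmpty && timestamp.isEmpty) None
    else Some(TimeTravelSpec(version, timestamp))
  }
}
```

With these helpers, CreateRelationSketch.pathOption(Map.empty) fails with the message shown above, while a map that holds only a path yields no time travel specification.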

Source Schema

sourceSchema(
  sqlContext: SQLContext,
  schema: Option[StructType],
  providerName: String,
  parameters: Map[String, String]): (String, StructType)

sourceSchema creates a DeltaLog for the Delta table in the directory given by the required path option (in the parameters) and returns the delta name together with the schema of the Delta table.

sourceSchema throws an IllegalArgumentException when the path option has not been specified:

'path' is not specified

sourceSchema throws an AnalysisException when the path option uses time travel:

Cannot time travel views, subqueries or streams.

sourceSchema is part of the StreamSourceProvider abstraction (Spark Structured Streaming).
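
The time travel check can be illustrated with a small, Spark-free sketch. It assumes that a path requests time travel through a trailing @v<version> or @<17-digit timestamp> suffix; this page does not spell out the syntax, so treat the patterns as an assumption. SourceSchemaSketch is a hypothetical name, and IllegalStateException stands in for Spark's AnalysisException to keep the example dependency-free.

```scala
// A simplified model of rejecting time travel paths for streaming sources.
object SourceSchemaSketch {
  // Assumed suffixes: "@v12" (version) or "@20210319000000000" (timestamp).
  private val VersionSuffix = ".*@[vV]\\d+$"
  private val TimestampSuffix = ".*@\\d{17}$"

  // True when the path ends with one of the assumed time travel suffixes.
  def usesTimeTravel(path: String): Boolean =
    path.matches(VersionSuffix) || path.matches(TimestampSuffix)

  // Returns the path unchanged, or fails with the documented message.
  // IllegalStateException is a stand-in for AnalysisException.
  def checkedStreamingPath(path: String): String = {
    if (usesTimeTravel(path)) {
      throw new IllegalStateException(
        "Cannot time travel views, subqueries or streams.")
    }
    path
  }
}
```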

CreatableRelationProvider

DeltaDataSource is a CreatableRelationProvider (Spark SQL) for writing out the result of a structured query.

Creating Streaming Source

DeltaDataSource is a StreamSourceProvider (Spark Structured Streaming) for a streaming source in streaming queries.

Creating Streaming Sink

DeltaDataSource is a StreamSinkProvider (Spark Structured Streaming) for a streaming sink in streaming queries.

DeltaDataSource supports Append and Complete output modes only.

In the end, DeltaDataSource creates a DeltaSink.
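
The output mode restriction amounts to a simple guard before the sink is created. The sketch below models it with plain strings rather than Spark's OutputMode; SinkOutputModeSketch is a hypothetical name and the error message is illustrative, not Delta Lake's actual wording.

```scala
// A simplified, Spark-free model of the sink's output mode check.
object SinkOutputModeSketch {
  // Output modes the text lists as supported by the Delta streaming sink.
  val supported: Set[String] = Set("append", "complete")

  // Returns the normalized mode, or fails for anything else (e.g. "update").
  def validate(outputMode: String): String = {
    val mode = outputMode.toLowerCase(java.util.Locale.ROOT)
    require(supported.contains(mode), s"Output mode $outputMode is not supported")
    mode
  }
}
```

In a streaming query the mode is chosen with writeStream.outputMode(...), so under this model a query that uses the update mode would be rejected before any DeltaSink is created.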

Loading Table

getTable(
  schema: StructType,
  partitioning: Array[Transform],
  properties: java.util.Map[String, String]): Table

getTable...FIXME

getTable is part of the TableProvider (Spark SQL 3.0.0) abstraction.

Utilities

getTimeTravelVersion

getTimeTravelVersion(
  parameters: Map[String, String]): Option[DeltaTimeTravelSpec]

getTimeTravelVersion...FIXME

getTimeTravelVersion is used when DeltaDataSource is requested to create a relation (as a RelationProvider).

parsePathIdentifier

parsePathIdentifier(
  spark: SparkSession,
  userPath: String): (Path, Seq[(String, String)], Option[DeltaTimeTravelSpec])

parsePathIdentifier...FIXME

parsePathIdentifier is used when DeltaTableV2 is requested for metadata (for a non-catalog table).


Last update: 2021-03-19