DeltaDataSource¶
DeltaDataSource is a DataSourceRegister and is the entry point to all the features provided by the delta data source.
DeltaDataSource is a RelationProvider for batch queries.
DeltaDataSource is a StreamSinkProvider that provides a streaming sink for streaming queries (Structured Streaming).
DataSourceRegister and delta Alias¶
DeltaDataSource is a DataSourceRegister and registers the delta alias.
Tip
Read up on DataSourceRegister in The Internals of Spark SQL online book.
DeltaDataSource is registered using META-INF/services/org.apache.spark.sql.sources.DataSourceRegister:
org.apache.spark.sql.delta.sources.DeltaDataSource
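To illustrate how an alias can select a registered provider, here is a minimal, self-contained sketch that does not depend on Spark. ShortNameRegister, FakeDeltaDataSource, FakeParquetSource and resolve are hypothetical stand-ins introduced for this example, and the case-insensitive match is an assumption of the sketch rather than a statement about Spark's lookup code.

```scala
object AliasLookupSketch {
  // Hypothetical stand-in that mirrors the shape of Spark's DataSourceRegister.
  trait ShortNameRegister {
    def shortName(): String
  }

  // Hypothetical providers. In Spark they would be discovered through the
  // META-INF/services registration file rather than listed by hand.
  class FakeDeltaDataSource extends ShortNameRegister {
    override def shortName(): String = "delta"
  }
  class FakeParquetSource extends ShortNameRegister {
    override def shortName(): String = "parquet"
  }

  val registered: Seq[ShortNameRegister] =
    Seq(new FakeParquetSource, new FakeDeltaDataSource)

  // Returns the provider registered under the given alias, if any.
  // Case-insensitive matching is an assumption of this sketch.
  def resolve(alias: String): Option[ShortNameRegister] =
    registered.find(_.shortName().equalsIgnoreCase(alias))
}
```

In a real application the alias is what is passed to format, as in spark.read.format("delta").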
RelationProvider for Batch Queries¶
DeltaDataSource is a RelationProvider for reading (loading) data from a delta table in a structured query.
Tip
Read up on RelationProvider in The Internals of Spark SQL online book.
createRelation(
  sqlContext: SQLContext,
  parameters: Map[String, String]): BaseRelation
createRelation reads the path option from the given parameters.
createRelation verifies the given parameters.
createRelation extracts the time travel specification from the given parameters.
In the end, createRelation creates a DeltaTableV2 (for the path option and the time travel specification) and requests it for an insertable HadoopFsRelation.
createRelation throws an IllegalArgumentException when the path option is not specified:
'path' is not specified
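The required-option check can be sketched in plain Scala. This is only an illustration of the behavior described above: requirePath is a hypothetical helper name, and the sketch uses an ordinary Map without any Spark types.

```scala
object CreateRelationSketch {
  // Returns the value of the required path option, or fails with the
  // error message quoted above when the option is missing.
  def requirePath(parameters: Map[String, String]): String =
    parameters.getOrElse(
      "path",
      throw new IllegalArgumentException("'path' is not specified"))
}
```

In a batch query the path option is typically supplied through load, for example spark.read.format("delta").load("/tmp/delta/events") (the path is illustrative).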
Source Schema¶
sourceSchema(
  sqlContext: SQLContext,
  schema: Option[StructType],
  providerName: String,
  parameters: Map[String, String]): (String, StructType)
sourceSchema creates a DeltaLog for a Delta table in the directory specified by the required path option (in the parameters) and returns the delta name together with the schema of the Delta table.
sourceSchema throws an IllegalArgumentException when the path option has not been specified:
'path' is not specified
sourceSchema throws an AnalysisException when the path option uses time travel:
Cannot time travel views, subqueries or streams.
sourceSchema is part of the StreamSourceProvider abstraction (Spark Structured Streaming).
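The order of the checks described above can be modeled without Spark. In this sketch Schema, usesTimeTravel and tableSchema are hypothetical stand-ins for StructType, the time travel detection and the DeltaLog lookup, and a plain UnsupportedOperationException replaces AnalysisException to keep the example dependency-free.

```scala
object SourceSchemaSketch {
  // Hypothetical stand-in for a table schema: column name -> type name.
  type Schema = Seq[(String, String)]

  // Mirrors the checks described above. `usesTimeTravel` and `tableSchema`
  // are assumptions that replace the real time travel parsing and DeltaLog.
  def sourceSchema(
      parameters: Map[String, String],
      usesTimeTravel: String => Boolean,
      tableSchema: String => Schema): (String, Schema) = {
    val path = parameters.getOrElse(
      "path",
      throw new IllegalArgumentException("'path' is not specified"))
    if (usesTimeTravel(path)) {
      // The text above names AnalysisException; a plain exception is used
      // here so that the sketch compiles without Spark.
      throw new UnsupportedOperationException(
        "Cannot time travel views, subqueries or streams.")
    }
    ("delta", tableSchema(path))
  }
}
```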
CreatableRelationProvider¶
DeltaDataSource is a CreatableRelationProvider for writing out the result of a structured query.
Tip
Read up on CreatableRelationProvider in The Internals of Spark SQL online book.
Creating Streaming Source¶
DeltaDataSource is a StreamSourceProvider.
Tip
Read up on StreamSourceProvider in The Internals of Spark Structured Streaming online book.
Creating Streaming Sink¶
DeltaDataSource is a StreamSinkProvider for a streaming sink for Structured Streaming.
Tip
Read up on StreamSinkProvider in The Internals of Spark Structured Streaming online book.
DeltaDataSource supports Append and Complete output modes only.
In the end, DeltaDataSource creates a DeltaSink.
Tip
Consult the demo Using Delta Lake (as Streaming Sink) in Streaming Queries.
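The output mode restriction can be illustrated with a small, Spark-free sketch. Mode and its three cases are hypothetical stand-ins for the Structured Streaming output modes, validate is a hypothetical helper, and the error text is invented for this example.

```scala
object OutputModeSketch {
  // Hypothetical stand-ins for the Structured Streaming output modes.
  sealed trait Mode
  case object Append extends Mode
  case object Complete extends Mode
  case object Update extends Mode

  // Accepts Append and Complete and rejects every other mode.
  // The error text is illustrative, not the message Delta Lake reports.
  def validate(mode: Mode): Either[String, Mode] = mode match {
    case Append | Complete => Right(mode)
    case other => Left(s"$other output mode is not supported")
  }
}
```

Returning an Either keeps the sketch easy to test. Whether the real check throws an exception or reports the problem some other way is not covered here.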
Loading Table¶
getTable(
  schema: StructType,
  partitioning: Array[Transform],
  properties: java.util.Map[String, String]): Table
getTable ...FIXME
getTable is part of the TableProvider (Spark SQL 3.0.0) abstraction.
Utilities¶
getTimeTravelVersion¶
getTimeTravelVersion(
  parameters: Map[String, String]): Option[DeltaTimeTravelSpec]
getTimeTravelVersion ...FIXME
getTimeTravelVersion is used when DeltaDataSource is requested to create a relation (as a RelationProvider).
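The body of getTimeTravelVersion is not described here. As an illustration only, the following sketch shows one way a time travel specification could be derived from reader options. It assumes Delta Lake's documented versionAsOf and timestampAsOf options. TimeTravelSpec is a hypothetical stand-in for DeltaTimeTravelSpec, and the handling of conflicting or malformed values (including the error messages) is an assumption.

```scala
import scala.util.Try

object TimeTravelOptionsSketch {
  // Hypothetical stand-in for DeltaTimeTravelSpec.
  final case class TimeTravelSpec(timestamp: Option[String], version: Option[Long])

  // Builds a spec from the options, or None when no time travel is requested.
  def timeTravelSpec(parameters: Map[String, String]): Option[TimeTravelSpec] = {
    val timestamp = parameters.get("timestampAsOf")
    val version = parameters.get("versionAsOf")
    (timestamp, version) match {
      case (Some(_), Some(_)) =>
        // Assumption: the two options are mutually exclusive.
        throw new IllegalArgumentException(
          "Provide either 'timestampAsOf' or 'versionAsOf', not both")
      case (Some(ts), None) =>
        Some(TimeTravelSpec(Some(ts), None))
      case (None, Some(v)) =>
        // Assumption: a version must parse as a whole number.
        val parsed = Try(v.toLong).getOrElse(
          throw new IllegalArgumentException(s"'versionAsOf' must be a whole number: $v"))
        Some(TimeTravelSpec(None, Some(parsed)))
      case (None, None) =>
        None
    }
  }
}
```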
parsePathIdentifier¶
parsePathIdentifier(
  spark: SparkSession,
  userPath: String): (Path, Seq[(String, String)], Option[DeltaTimeTravelSpec])
parsePathIdentifier ...FIXME
parsePathIdentifier is used when DeltaTableV2 is requested for metadata (for a non-catalog table).
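The body of parsePathIdentifier is not described here, but its return type suggests that a user path is split into a table path, partition filters and an optional time travel specification. The sketch below covers only one piece of that idea and is an assumption, not the actual implementation: it splits off a trailing version written in the @v syntax that Delta Lake documents for paths (for example /tmp/delta/events@v12). PathVersionSketch and splitVersion are hypothetical names.

```scala
object PathVersionSketch {
  // Matches a trailing "@v<digits>" suffix, e.g. "/tmp/delta/events@v12".
  // At most 18 digits are accepted so that toLong cannot overflow.
  private val VersionSuffix = """^(.*)@[vV](\d{1,18})$""".r

  // Splits a user path into the table path and an optional version.
  def splitVersion(userPath: String): (String, Option[Long]) = userPath match {
    case VersionSuffix(path, version) => (path, Some(version.toLong))
    case _ => (userPath, None)
  }
}
```

Partition filters and timestamp-based suffixes are deliberately left out of this sketch.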