DataSourceV2Utils Utility¶
DataSourceV2Utils is a utility to extractSessionConfigs and getTableFromProvider for batch and streaming reads and writes.
extractSessionConfigs¶
extractSessionConfigs(
source: TableProvider,
conf: SQLConf): Map[String, String]
Note
extractSessionConfigs supports data sources with SessionConfigSupport only.
extractSessionConfigs requests the SessionConfigSupport data source for the custom key prefix of its configuration options. The prefix is then used to find all configuration options in the given SQLConf whose keys start with spark.datasource.[keyPrefix].
extractSessionConfigs returns the matching options with the spark.datasource.[keyPrefix] prefix removed from their keys (e.g. spark.datasource.keyPrefix.k1 becomes k1).
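The prefix matching described above can be sketched in plain Scala. This is a minimal illustration and not Spark's own code: SessionConfigSketch and extractByPrefix are hypothetical names, the keyPrefix argument stands in for the value a SessionConfigSupport data source reports, and a plain Map stands in for the key-value pairs of a SQLConf.

```scala
import java.util.regex.Pattern

// Hypothetical stand-in for the behaviour described above; not part of Spark.
object SessionConfigSketch {

  // keyPrefix: assumed to be what a SessionConfigSupport source would report.
  // allConfs:  assumed to be the key-value pairs held by a SQLConf.
  def extractByPrefix(
      keyPrefix: String,
      allConfs: Map[String, String]): Map[String, String] = {
    require(keyPrefix != null, "The key prefix must not be null")

    // Matches spark.datasource.<keyPrefix>.<rest> and captures <rest>.
    // Pattern.quote treats the prefix literally (a choice made for this sketch).
    val prefixed =
      ("spark\\.datasource\\." + Pattern.quote(keyPrefix) + "\\.(.+)").r

    // Keep only the matching keys, with the prefix stripped.
    allConfs.collect { case (prefixed(shortKey), value) => shortKey -> value }
  }
}
```

With a prefix of demo, an entry such as spark.datasource.demo.k1 is returned under the key k1, while options of other data sources and unrelated session options are left out.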
extractSessionConfigs is used when:
- DataFrameReader is requested to load data
- DataFrameWriter is requested to save data
- (Spark Structured Streaming) DataStreamReader is requested to load data from a streaming data source
- (Spark Structured Streaming) DataStreamWriter is requested to start a streaming query
Creating Table (using TableProvider)¶
getTableFromProvider(
provider: TableProvider,
options: CaseInsensitiveStringMap,
userSpecifiedSchema: Option[StructType]): Table
getTableFromProvider creates a Table for the given TableProvider, options and user-defined schema.
getTableFromProvider is used when:
- DataFrameWriter is requested to save data
- DataSourceV2Utils is requested to loadV2Source
- DataStreamReader (Spark Structured Streaming) is requested to load data from a streaming data source
- DataStreamWriter (Spark Structured Streaming) is requested to start a streaming query
Load V2 Source¶
loadV2Source(
sparkSession: SparkSession,
provider: TableProvider,
userSpecifiedSchema: Option[StructType],
extraOptions: CaseInsensitiveMap[String],
source: String,
paths: String*): Option[DataFrame]
loadV2Source extracts the session configs (using extractSessionConfigs) and adds the given paths if specified.
For the given TableProvider being a SupportsCatalogOptions, loadV2Source ...FIXME
In the end, for a SupportsRead table with BATCH_READ capability, loadV2Source creates a DataFrame with a DataSourceV2Relation logical operator. Otherwise, loadV2Source gives no DataFrame (None).
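The final branch can be modelled with a few stand-in types. This is only a sketch of the decision described above, under the assumption that it reduces to a type check plus a capability check. Every name in it (LoadV2SourceSketch, Capability, ReadableTable, PlainTable, toRelation) is hypothetical, and a String stands in for the resulting DataFrame.

```scala
// Hypothetical model of the decision above; these are not Spark's own types.
object LoadV2SourceSketch {

  sealed trait Capability
  case object BatchRead extends Capability  // stands in for BATCH_READ
  case object BatchWrite extends Capability // any other capability

  trait Table {
    def name: String
    def capabilities: Set[Capability]
  }
  trait SupportsRead extends Table

  final case class ReadableTable(name: String, capabilities: Set[Capability])
    extends SupportsRead
  final case class PlainTable(name: String, capabilities: Set[Capability])
    extends Table

  // Some(...) only for a readable table that declares batch-read support;
  // the String stands in for a DataFrame over a DataSourceV2Relation.
  def toRelation(table: Table): Option[String] = table match {
    case t: SupportsRead if t.capabilities.contains(BatchRead) =>
      Some("DataSourceV2Relation(" + t.name + ")")
    case _ =>
      None
  }
}
```

Returning an Option lets the caller tell "no V2 read path for this table" apart from a failure and take another route.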
loadV2Source is used when:
- DataFrameReader.load operator is used
- CreateTempViewUsing logical operator is executed