DataSourceV2Utils Utility
DataSourceV2Utils is a utility to extractSessionConfigs and getTableFromProvider for batch and streaming reads and writes.
extractSessionConfigs
extractSessionConfigs(
source: TableProvider,
conf: SQLConf): Map[String, String]
Note
extractSessionConfigs supports data sources with SessionConfigSupport only.
extractSessionConfigs requests the SessionConfigSupport data source for the custom key prefix of its configuration options. The prefix is used to find all configuration options in the given SQLConf whose keys are in the format of spark.datasource.[keyPrefix].
extractSessionConfigs returns the matching options with the spark.datasource.[keyPrefix] prefix removed from their keys (e.g. spark.datasource.[keyPrefix].k1 becomes k1).
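The prefix handling described above can be illustrated with a minimal sketch in plain Scala. It is not Spark's implementation: a plain Map stands in for SQLConf, the key prefix is passed in directly rather than requested from a SessionConfigSupport data source, and the SessionConfigsSketch object is a hypothetical name used only for this illustration.

```scala
// A simplified sketch, NOT Spark's code.
// Assumptions: a Map[String, String] replaces SQLConf, and keyPrefix is
// given directly instead of coming from a SessionConfigSupport source.
object SessionConfigsSketch {

  def extractSessionConfigs(
      keyPrefix: String,
      sessionConfs: Map[String, String]): Map[String, String] = {
    // Full prefix that qualifying keys must start with
    val prefix = s"spark.datasource.$keyPrefix."
    sessionConfs.collect {
      // Keep only keys with the prefix and a non-empty remainder,
      // then drop the prefix from the key
      case (key, value) if key.startsWith(prefix) && key.length > prefix.length =>
        key.stripPrefix(prefix) -> value
    }
  }
}
```

With a session that holds spark.datasource.demo.k1 and an unrelated spark.sql.shuffle.partitions option, calling the sketch with the demo key prefix would keep only the first option, under the key k1.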
extractSessionConfigs is used when:
- DataFrameReader is requested to load data
- DataFrameWriter is requested to save data
- (Spark Structured Streaming) DataStreamReader is requested to load data from a streaming data source
- (Spark Structured Streaming) DataStreamWriter is requested to start a streaming query
Creating Table (using TableProvider)
getTableFromProvider(
provider: TableProvider,
options: CaseInsensitiveStringMap,
userSpecifiedSchema: Option[StructType]): Table
getTableFromProvider creates a Table for the given TableProvider, options and user-defined schema.
getTableFromProvider is used when:
- DataFrameWriter is requested to save data
- DataSourceV2Utils is requested to loadV2Source
- DataStreamReader (Spark Structured Streaming) is requested to load data from a streaming data source
- DataStreamWriter (Spark Structured Streaming) is requested to start a streaming query
Load V2 Source
loadV2Source(
sparkSession: SparkSession,
provider: TableProvider,
userSpecifiedSchema: Option[StructType],
extraOptions: CaseInsensitiveMap[String],
source: String,
paths: String*): Option[DataFrame]
loadV2Source extracts the session configs (using extractSessionConfigs) and adds the given paths, if specified.
For the given TableProvider being a SupportsCatalogOptions, loadV2Source...FIXME
In the end, for a SupportsRead table with the BATCH_READ capability, loadV2Source creates a DataFrame with a DataSourceV2Relation logical operator. Otherwise, loadV2Source gives no DataFrame (None).
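The final step (a DataFrame only for a readable table with the BATCH_READ capability, None otherwise) can be modelled with a small, Spark-free sketch. All types below (Capability, Table, SupportsRead, DemoTable, WriteOnlyTable, RelationFrame) are hypothetical stand-ins defined only for this illustration and are not Spark's classes of similar names.

```scala
// A simplified model of the last step of loadV2Source, NOT Spark's code.
// Every type here is a hypothetical stand-in for the Spark counterpart.
object LoadV2SourceSketch {

  sealed trait Capability
  case object BatchRead extends Capability   // models BATCH_READ
  case object BatchWrite extends Capability  // any other capability

  trait Table { def capabilities: Set[Capability] }
  trait SupportsRead extends Table

  final case class DemoTable(capabilities: Set[Capability]) extends SupportsRead
  final case class WriteOnlyTable(capabilities: Set[Capability]) extends Table

  // Models a DataFrame backed by a DataSourceV2Relation over the table
  final case class RelationFrame(table: Table)

  def toDataFrame(table: Table): Option[RelationFrame] = table match {
    // Both conditions must hold: the table is readable AND supports batch reads
    case readable: SupportsRead if readable.capabilities.contains(BatchRead) =>
      Some(RelationFrame(readable))
    // Any other table yields no DataFrame
    case _ => None
  }
}
```

Returning an Option lets the caller decide what to do when the table does not qualify instead of failing inside the utility.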
loadV2Source is used when:
- DataFrameReader.load operator is used
- CreateTempViewUsing logical operator is executed