DataSourceV2Utils Utility

DataSourceV2Utils is a utility used to extract session configs (extractSessionConfigs) and create tables (getTableFromProvider) for batch and streaming reads and writes.

extractSessionConfigs

extractSessionConfigs(
  source: TableProvider,
  conf: SQLConf): Map[String, String]

Note

extractSessionConfigs supports data sources with SessionConfigSupport only.

extractSessionConfigs requests the SessionConfigSupport data source for the custom key prefix of its configuration options. The key prefix is then used to find all the configuration options in the given SQLConf whose keys start with spark.datasource.[keyPrefix].

extractSessionConfigs returns the matching options with the spark.datasource.[keyPrefix] prefix removed from their keys (e.g. spark.datasource.keyPrefix.k1 becomes k1).
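The prefix matching and stripping can be sketched without Spark on the classpath. This is a minimal illustration only: SessionConfigsSketch, extract and allConfs are hypothetical names (allConfs stands in for the properties of a SQLConf), and quoting the key prefix in the regular expression is a choice of this sketch, not something this page says Spark does.

```scala
// Hypothetical stand-in for illustration; not part of Spark.
object SessionConfigsSketch {

  // keyPrefix plays the role of the custom key prefix of a SessionConfigSupport source.
  // allConfs plays the role of all the properties in the given SQLConf.
  def extract(keyPrefix: String, allConfs: Map[String, String]): Map[String, String] = {
    require(keyPrefix != null, "The key prefix must not be null")

    // Matches spark.datasource.[keyPrefix].[rest] and captures [rest].
    // The prefix is quoted so that it is matched literally (an assumption of this sketch).
    val KeyRegex =
      ("^spark\\.datasource\\." + java.util.regex.Pattern.quote(keyPrefix) + "\\.(.+)$").r

    // Keep only the matching keys, with the prefix removed.
    allConfs.collect { case (KeyRegex(rest), value) => rest -> value }
  }
}
```

With the key prefix demo, a property spark.datasource.demo.k1 set to v1 comes back as k1 -> v1, while properties outside the spark.datasource.demo namespace are dropped.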

extractSessionConfigs is used when:

  • DataFrameReader is requested to load data
  • DataFrameWriter is requested to save data
  • (Spark Structured Streaming) DataStreamReader is requested to load data from a streaming data source
  • (Spark Structured Streaming) DataStreamWriter is requested to start a streaming query

Creating Table (using TableProvider)

getTableFromProvider(
  provider: TableProvider,
  options: CaseInsensitiveStringMap,
  userSpecifiedSchema: Option[StructType]): Table

getTableFromProvider creates a Table for the given TableProvider, options and optional user-specified schema.
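How the optional schema could steer table creation is sketched below with simplified stand-in types. Everything here is an assumption made for illustration, not taken from this page: SketchTable and SketchProvider are hypothetical, a schema is reduced to a list of column names, and the rule that a user-specified schema is only accepted when the provider supports external metadata reflects a reading of the Spark 3.x sources that may differ between versions.

```scala
// Hypothetical, simplified stand-ins; not Spark's Table and TableProvider.
object TableFromProviderSketch {

  trait SketchTable {
    def schema: Seq[String]
  }

  trait SketchProvider {
    // Whether the provider accepts a schema supplied from the outside (assumed rule).
    def supportsExternalMetadata: Boolean = false
    def inferSchema(options: Map[String, String]): Seq[String]
    def getTable(schema: Seq[String], options: Map[String, String]): SketchTable
  }

  def getTableFromProvider(
      provider: SketchProvider,
      options: Map[String, String],
      userSpecifiedSchema: Option[Seq[String]]): SketchTable =
    userSpecifiedSchema match {
      // A user-specified schema is passed on only if the provider accepts one.
      case Some(schema) if provider.supportsExternalMetadata =>
        provider.getTable(schema, options)
      case Some(_) =>
        throw new UnsupportedOperationException(
          "The provider does not support a user-specified schema")
      // No schema given: let the provider infer it from the options.
      case None =>
        provider.getTable(provider.inferSchema(options), options)
    }
}
```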


getTableFromProvider is used when:

Load V2 Source

loadV2Source(
  sparkSession: SparkSession,
  provider: TableProvider,
  userSpecifiedSchema: Option[StructType],
  extraOptions: CaseInsensitiveMap[String],
  source: String,
  paths: String*): Option[DataFrame]

loadV2Source extracts the session configs (using extractSessionConfigs) and adds the given paths, if specified.

For the given TableProvider being a SupportsCatalogOptions, loadV2Source...FIXME

In the end, for a SupportsRead table with the BATCH_READ capability, loadV2Source creates a DataFrame with a DataSourceV2Relation logical operator. Otherwise, loadV2Source returns None (no DataFrame).
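The final decision can be sketched as a single pattern match. The types below are hypothetical stand-ins (SketchTable, SupportsReadSketch, Relation and the Capability values are not Spark classes), and Relation merely plays the role of a DataFrame built on a DataSourceV2Relation.

```scala
// Hypothetical stand-ins for illustration; not Spark's classes.
object LoadV2SourceSketch {

  sealed trait Capability
  case object BatchRead extends Capability
  case object BatchWrite extends Capability

  trait SketchTable {
    def capabilities: Set[Capability]
  }

  // Marker for tables that can be read from.
  trait SupportsReadSketch extends SketchTable

  // Plays the role of a DataFrame over a DataSourceV2Relation.
  final case class Relation(table: SketchTable)

  def toRelation(table: SketchTable): Option[Relation] = table match {
    // Only a readable table that declares the batch-read capability qualifies.
    case t: SupportsReadSketch if t.capabilities.contains(BatchRead) => Some(Relation(t))
    // Anything else gives no result.
    case _ => None
  }
}
```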


loadV2Source is used when: