FileDataSourceV2 Table Providers¶

FileDataSourceV2 is an extension of the TableProvider abstraction for file-based table providers.

Contract¶

fallbackFileFormat¶

fallbackFileFormat: Class[_ <: FileFormat]

A V1 FileFormat class of this file-based data source

See:

ParquetDataSourceV2

Used when:

DDLUtils is requested to checkDataColNames
DataSource is requested for the providingClass (for resolving data source relation for catalog tables)
PreprocessTableCreation logical analysis rule is executed

Table¶

getTable(
  options: CaseInsensitiveStringMap): Table
getTable(
  options: CaseInsensitiveStringMap,
  schema: StructType): Table
getTable(
  schema: StructType,
  partitioning: Array[Transform],
  properties: Map[String, String]): Table // part of the TableProvider abstraction

A Table of this table provider

See:

ParquetDataSourceV2

Used when:

FileDataSourceV2 is requested for a table (as a TableProvider) and to infer a schema (inferSchema)

Implementations¶

AvroDataSourceV2
CSVDataSourceV2
JsonDataSourceV2
OrcDataSourceV2
ParquetDataSourceV2
TextDataSourceV2

DataSourceRegister¶

FileDataSourceV2 is a DataSourceRegister.

Schema Inference¶

inferSchema(
  options: CaseInsensitiveStringMap): StructType

inferSchema is part of the TableProvider abstraction.

inferSchema requests the Table for the schema.

If the Table is not available yet, inferSchema first creates one (using getTable with the given options) and "saves" it for later use (in the t internal registry).
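The "create once, reuse later" behaviour can be modelled without Spark. The following is a minimal sketch, not Spark's code: `SimpleTable`, `CachingProvider`, and `loads` are hypothetical names, a plain `Map[String, String]` stands in for `CaseInsensitiveStringMap`, and a `Seq[String]` of column names stands in for `StructType`.

```scala
// Hypothetical stand-in for a Table: it only carries a schema (column names).
final case class SimpleTable(schema: Seq[String])

// Hypothetical provider that creates a table on first use and reuses it afterwards.
class CachingProvider(load: Map[String, String] => SimpleTable) {
  // Plays the role of the internal registry: empty until the first request.
  private var cached: Option[SimpleTable] = None

  // Counts how many times a table was actually created (for illustration only).
  var loads: Int = 0

  def inferSchema(options: Map[String, String]): Seq[String] = {
    val table = cached.getOrElse {
      val created = load(options)
      loads += 1
      cached = Some(created)
      created
    }
    table.schema
  }
}
```

In this model, a second call to `inferSchema` returns the schema of the table created by the first call, even if the options differ.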

Table Name¶

getTableName(
  map: CaseInsensitiveStringMap,
  paths: Seq[String]): String

getTableName uses the short name (of this DataSourceRegister) and the given paths to create the following table name (possibly redacting sensitive parts per spark.sql.redaction.string.regex):

[short name] [comma-separated paths]
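The name format can be illustrated with a small, Spark-free sketch. `TableNameSketch`, `tableName`, and the `Redacted` placeholder text are assumptions made for illustration. The sketch only joins the parts and masks regex matches, and leaves out any other path processing Spark may perform.

```scala
import scala.util.matching.Regex

object TableNameSketch {
  // Assumed placeholder for redacted text; the text Spark actually uses may differ.
  val Redacted = "*********(redacted)"

  // Builds "[short name] [comma-separated paths]" and masks every regex match.
  def tableName(
      shortName: String,
      paths: Seq[String],
      redaction: Option[Regex]): String = {
    val name = shortName + " " + paths.mkString(",")
    // Without a redaction pattern the name is returned unchanged.
    redaction.fold(name)(_.replaceAllIn(name, Regex.quoteReplacement(Redacted)))
  }
}
```

For example, `TableNameSketch.tableName("parquet", Seq("/data/a", "/data/b"), None)` gives `parquet /data/a,/data/b`.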

Paths¶

getPaths(
  map: CaseInsensitiveStringMap): Seq[String]

getPaths concatenates the values of the paths and path keys (from the given map).
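A rough model of the concatenation is sketched below. It rests on several assumptions not stated above: the paths value is taken to be a JSON-encoded array of strings, it is parsed with a naive regular expression rather than a JSON library, and keys are lower-cased to mimic case-insensitive lookup. `PathsSketch` is a hypothetical name.

```scala
import java.util.Locale

object PathsSketch {
  // Naive extraction of string literals from a JSON array such as ["a","b"].
  // Assumption: entries contain no escaped quotes; use a JSON parser in real code.
  private val Quoted = "\"([^\"]*)\"".r

  def getPaths(options: Map[String, String]): Seq[String] = {
    // Normalize keys to mimic a case-insensitive map.
    val opts = options.map { case (k, v) => k.toLowerCase(Locale.ROOT) -> v }
    // Zero or more entries from the "paths" key.
    val multiple = opts.get("paths").toSeq
      .flatMap(json => Quoted.findAllMatchIn(json).map(_.group(1)))
    // Zero or one entry from the "path" key.
    val single = opts.get("path").toSeq
    multiple ++ single
  }
}
```

In this sketch the entries of paths come first, followed by the single path value when present.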