FileDataSourceV2 Table Providers

FileDataSourceV2 is an extension of the TableProvider abstraction for file-based table providers.

Contract

fallbackFileFormat

fallbackFileFormat: Class[_ <: FileFormat]

A V1 FileFormat class of this file-based data source

Used when:

  • DDLUtils is requested to checkDataColNames
  • DataSource is requested for the providingClass (for resolving data source relation for catalog tables)
  • PreprocessTableCreation logical analysis rule is executed

Table

getTable(
  options: CaseInsensitiveStringMap): Table
getTable(
  options: CaseInsensitiveStringMap,
  schema: StructType): Table
getTable(
  schema: StructType,
  partitioning: Array[Transform],
  properties: Map[String, String]): Table // Part of the TableProvider abstraction

A Table of this table provider

Implementations

  • AvroDataSourceV2
  • CSVDataSourceV2
  • JsonDataSourceV2
  • OrcDataSourceV2
  • ParquetDataSourceV2
  • TextDataSourceV2

DataSourceRegister

FileDataSourceV2 is a DataSourceRegister.

Schema Inference

inferSchema(
  options: CaseInsensitiveStringMap): StructType

inferSchema is part of the TableProvider abstraction.


inferSchema requests the Table for the schema.

If the Table is not available yet, inferSchema creates it first and "saves" it for later reuse (in the internal t registry).
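
The "create once, reuse later" behaviour described above can be illustrated with a minimal sketch. It has no Spark dependency: SketchSchema, SketchTable and SketchProvider are hypothetical stand-ins for StructType, Table and a FileDataSourceV2 implementation, and the createTable function is an assumption that takes the place of the table-creating getTable.

```scala
// Simplified, hypothetical stand-ins for Spark's StructType and Table
final case class SketchSchema(fieldNames: Seq[String])
final case class SketchTable(name: String, schema: SketchSchema)

// Hypothetical provider that mimics "create the Table once, reuse it later"
class SketchProvider(createTable: Map[String, String] => SketchTable) {
  // Plays the role of the t registry: empty until the first inferSchema call
  private var t: Option[SketchTable] = None

  def inferSchema(options: Map[String, String]): SketchSchema = {
    // Create and remember the table only when none has been saved yet
    if (t.isEmpty) t = Some(createTable(options))
    // Ask the saved table for its schema
    t.get.schema
  }
}
```

Calling inferSchema twice on the same SketchProvider invokes createTable only once; the second call is answered from the saved table.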

Table Name

getTableName(
  map: CaseInsensitiveStringMap,
  paths: Seq[String]): String

getTableName uses the short name and the given paths to create the following table name (possibly redacting sensitive parts per spark.sql.redaction.string.regex):

[short name] [comma-separated paths]
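
The name format above can be sketched in plain Scala. TableNameSketch, its tableName helper and the [redacted] replacement text are hypothetical and only illustrate the shape of the result. The optional regular expression stands in for the value of spark.sql.redaction.string.regex, and the sketch ignores any other processing of the paths that the real method may perform.

```scala
import scala.util.matching.Regex

object TableNameSketch {
  // Hypothetical helper: "[short name] [comma-separated paths]",
  // with every match of the optional redaction pattern masked
  def tableName(
      shortName: String,
      paths: Seq[String],
      redaction: Option[Regex]): String = {
    val name = shortName + " " + paths.mkString(",")
    // "[redacted]" is an illustrative placeholder, not Spark's actual text
    redaction.fold(name)(_.replaceAllIn(name, "[redacted]"))
  }
}
```

For example, TableNameSketch.tableName("parquet", Seq("/data/a", "/data/b"), None) gives parquet /data/a,/data/b.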

Paths

getPaths(
  map: CaseInsensitiveStringMap): Seq[String]

getPaths concatenates the values of the paths and path keys (from the given map).
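
A minimal sketch of the concatenation follows, under stated assumptions. PathsSketch is hypothetical and a plain Map with lower-cased keys stands in for CaseInsensitiveStringMap. The text above does not say how multiple paths are encoded in the paths value, so the sketch takes a decodePaths function as a parameter rather than guessing the format.

```scala
import java.util.Locale

object PathsSketch {
  // Hypothetical stand-in for getPaths: values under "paths" come first,
  // followed by the single value under "path" (when present)
  def getPaths(
      options: Map[String, String],
      decodePaths: String => Seq[String]): Seq[String] = {
    // Emulate case-insensitive lookups by normalizing the keys
    val normalized = options.map { case (k, v) => k.toLowerCase(Locale.ROOT) -> v }
    val multiple = normalized.get("paths").map(decodePaths).getOrElse(Seq.empty)
    val single = normalized.get("path").toSeq
    multiple ++ single
  }
}
```

With a comma-splitting decoder (used for illustration only), the options paths=/a,/b and PATH=/c yield the three paths /a, /b and /c, in that order.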