ParquetDataSourceV2

ParquetDataSourceV2 is the FileDataSourceV2 (and hence, indirectly, a DataSourceRegister) of the Parquet data source.

ParquetDataSourceV2 uses ParquetTable for scanning and writing.

ParquetDataSourceV2 is registered in META-INF/services/org.apache.spark.sql.sources.DataSourceRegister.
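The registration can be observed with Java's ServiceLoader. The following is a minimal sketch, assuming it is pasted into spark-shell (so that Spark SQL is on the classpath); the variable names are illustrative only. Iterating instantiates every registered provider, so a provider that cannot be loaded in a given environment may raise a ServiceConfigurationError.

```scala
import java.util.ServiceLoader
import org.apache.spark.sql.sources.DataSourceRegister

// Walk every DataSourceRegister listed in META-INF/services files on the classpath.
val providers = ServiceLoader.load(classOf[DataSourceRegister]).iterator()
while (providers.hasNext) {
  val provider = providers.next()
  // Print the implementation classes that answer to the "parquet" alias.
  if (provider.shortName() == "parquet") {
    println(provider.getClass.getName)
  }
}
```

The printed class names should include ParquetDataSourceV2 if the service file is picked up as described above.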

Creating Instance

ParquetDataSourceV2 takes no arguments to be created.

ParquetDataSourceV2 is created when:

Creating Table

Signature
getTable(
  options: CaseInsensitiveStringMap): Table
getTable(
  options: CaseInsensitiveStringMap,
  schema: StructType): Table

getTable is part of the FileDataSourceV2 abstraction.

getTable creates a ParquetTable with the following:

| Property | Value |
|----------|-------|
| name | Table name from the paths (and based on the given options) |
| paths | Paths (in the given options) |
| userSpecifiedSchema | The given schema, if given |
| fallbackFileFormat | ParquetFileFormat |
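Both getTable variants can be called directly to see how the options and the schema feed the resulting table. This is a sketch only, assuming a spark-shell session (getTable needs an active SparkSession); the path `/tmp/demo.parquet` and the two-column schema are hypothetical and used purely for illustration. The sketch does not touch the files themselves, so the path does not have to exist for these calls.

```scala
import java.util.{HashMap => JHashMap}
import org.apache.spark.sql.execution.datasources.v2.parquet.ParquetDataSourceV2
import org.apache.spark.sql.types.{LongType, StringType, StructType}
import org.apache.spark.sql.util.CaseInsensitiveStringMap

// Options carrying the path to read (hypothetical location).
val raw = new JHashMap[String, String]()
raw.put("path", "/tmp/demo.parquet")
val options = new CaseInsensitiveStringMap(raw)

val source = new ParquetDataSourceV2()

// Variant 1: no user-specified schema.
val inferred = source.getTable(options)

// Variant 2: the given schema becomes userSpecifiedSchema (hypothetical columns).
val schema = new StructType().add("id", LongType).add("name", StringType)
val withSchema = source.getTable(options, schema)

// The table name is derived from the paths in the options.
println(inferred.name())
println(withSchema.name())
```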

shortName

Signature
shortName(): String

shortName is part of the DataSourceRegister abstraction.

shortName is the following text:

parquet
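As a quick check, the alias can be read off a fresh instance. A minimal sketch, assuming Spark SQL is on the classpath (for example in spark-shell):

```scala
import org.apache.spark.sql.execution.datasources.v2.parquet.ParquetDataSourceV2

// ParquetDataSourceV2 takes no constructor arguments.
val source = new ParquetDataSourceV2()

// The alias is the name users pass to format(...), e.g. spark.read.format("parquet").
assert(source.shortName() == "parquet")
```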

fallbackFileFormat

Signature
fallbackFileFormat: Class[_ <: FileFormat]

fallbackFileFormat is part of the FileDataSourceV2 abstraction.

fallbackFileFormat is ParquetFileFormat.
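The fallback class can be inspected the same way. A minimal sketch, assuming Spark SQL is on the classpath (for example in spark-shell):

```scala
import org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat
import org.apache.spark.sql.execution.datasources.v2.parquet.ParquetDataSourceV2

// fallbackFileFormat returns a Class object, not an instance of the file format.
val fallback = new ParquetDataSourceV2().fallbackFileFormat
assert(fallback == classOf[ParquetFileFormat])
println(fallback.getName)
```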