ParquetTable
ParquetTable is a FileTable of ParquetDataSourceV2 in the Parquet Data Source.
ParquetTable uses ParquetScanBuilder for scanning and ParquetWrite for writing.
Creating Instance
ParquetTable takes the following to be created:
- Name
- SparkSession
- Case-insensitive options
- Paths
- User-specified schema
- Fallback FileFormat
ParquetTable is created when:
- ParquetDataSourceV2 is requested for a Table
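The options arrive as a case-insensitive map (a CaseInsensitiveStringMap, as the newScanBuilder signature below shows), so a caller may spell an option key in any capitalization. The following is a minimal sketch of that idea only, assuming a hypothetical CaseInsensitiveOptions class rather than Spark's actual implementation: keys are lower-cased with Locale.ROOT when the map is built and again on every lookup. The key mergeSchema is used purely as an example.

```scala
import java.util.Locale

// Hypothetical stand-in for the idea behind Spark's CaseInsensitiveStringMap
// (not its implementation): keys are lower-cased with Locale.ROOT when the
// map is built and again on every lookup.
final class CaseInsensitiveOptions(original: Map[String, String]) {
  private val normalized: Map[String, String] =
    original.map { case (k, v) => k.toLowerCase(Locale.ROOT) -> v }

  // Look up an option regardless of how the caller capitalized the key.
  def get(key: String): Option[String] =
    normalized.get(key.toLowerCase(Locale.ROOT))

  // True when an option with this key (in any capitalization) was given.
  def contains(key: String): Boolean =
    normalized.contains(key.toLowerCase(Locale.ROOT))
}
```

With this sketch, `new CaseInsensitiveOptions(Map("mergeSchema" -> "true")).get("MERGESCHEMA")` yields `Some("true")`.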
Format Name
formatName is the following text:
Parquet
Schema Inference
Signature:
    inferSchema(files: Seq[FileStatus]): Option[StructType]
inferSchema is part of the FileTable abstraction.
inferSchema infers the schema of the Parquet files using the table's options and the given Hadoop FileStatuses of the input files.
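Because ParquetTable also takes an optional user-specified schema, inferSchema only matters when no schema was given, and its Option result allows for "nothing could be inferred". The sketch below models that fallback with hypothetical, simplified types (Field, Schema, FileInfo, SchemaResolution) instead of Spark's StructType and FileStatus. Its inference rule (take the first file's schema, standing in for a Parquet footer) is a toy. Spark's real inference reads Parquet footers and honors the options. The error wording is likewise illustrative.

```scala
// Hypothetical, simplified stand-ins for StructType and FileStatus.
final case class Field(name: String, dataType: String)
final case class Schema(fields: Seq[Field])
final case class FileInfo(path: String, footerSchema: Schema)

object SchemaResolution {
  val formatName: String = "Parquet"

  // Toy inference: use the schema recorded for the first file.
  // None models "there was nothing to infer a schema from".
  def inferSchema(files: Seq[FileInfo]): Option[Schema] =
    files.headOption.map(_.footerSchema)

  // A user-specified schema wins; otherwise fall back to inference,
  // and fail when inference produced nothing.
  def dataSchema(userSpecified: Option[Schema], files: Seq[FileInfo]): Schema =
    userSpecified
      .orElse(inferSchema(files))
      .getOrElse(throw new IllegalStateException(
        s"Unable to infer schema for $formatName. It must be specified manually."))
}
```

The design point is the ordering: inference is never consulted when a schema is supplied, so the cost of reading file metadata is only paid when needed.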
Creating ScanBuilder
Signature:
    newScanBuilder(options: CaseInsensitiveStringMap): ParquetScanBuilder
newScanBuilder is part of the SupportsRead abstraction.
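newScanBuilder creates a ParquetScanBuilder with the following:
- SparkSession
- FileIndex
- Schema (of this table)
- Data schema (of this table)
- The given options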
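The hand-off from a table to its scan builder can be sketched with hypothetical stand-in types (ToyScan, ToyScanBuilder, ToyParquetScan, ToyParquetScanBuilder, ToyParquetTable), none of which are Spark classes. The sketch only illustrates the shape: each read gets a fresh builder that carries the table's state plus the per-read options, and a scan exists only after build() is called. Column pruning and filter pushdown are deliberately not modeled, and a plain list of paths stands in for the FileIndex.

```scala
// Hypothetical stand-ins for the DataSource V2 read API (Scan, ScanBuilder).
trait ToyScan { def readSchema: Seq[String] }
trait ToyScanBuilder { def build(): ToyScan }

final case class ToyParquetScan(readSchema: Seq[String], options: Map[String, String])
  extends ToyScan

// Shaped like a scan builder that receives the table's file index (here just
// paths), full schema, data schema and the per-read options.
final case class ToyParquetScanBuilder(
    fileIndex: Seq[String],
    schema: Seq[String],
    dataSchema: Seq[String],
    options: Map[String, String]) extends ToyScanBuilder {
  // Toy behavior: read every column of the full schema (no pruning modeled).
  override def build(): ToyScan = ToyParquetScan(schema, options)
}

final case class ToyParquetTable(
    paths: Seq[String],
    dataSchema: Seq[String],
    partitionColumns: Seq[String]) {
  // Toy full schema: data columns followed by partition columns.
  def schema: Seq[String] = dataSchema ++ partitionColumns

  // Every call yields a new builder bound to the caller's options.
  def newScanBuilder(options: Map[String, String]): ToyParquetScanBuilder =
    ToyParquetScanBuilder(paths, schema, dataSchema, options)
}
```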
Creating WriteBuilder
Signature:
    newWriteBuilder(info: LogicalWriteInfo): WriteBuilder
newWriteBuilder is part of the SupportsWrite abstraction.
newWriteBuilder creates a WriteBuilder that creates a ParquetWrite (when requested to build a Write).
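The deferred construction described above (a WriteBuilder first, the ParquetWrite only on build) can be sketched as an anonymous builder class that closes over the table's state. All names here (ToyWriteInfo, ToyWrite, ToyWriteBuilder, ToyParquetWrite, ToyWritableTable) are hypothetical stand-ins, and the set of "supported" type names is invented only to show that the table's data-type check travels with the write.

```scala
// Hypothetical stand-ins for LogicalWriteInfo, Write and WriteBuilder.
final case class ToyWriteInfo(queryId: String, columns: Seq[String])
trait ToyWrite
trait ToyWriteBuilder { def build(): ToyWrite }

// Carries the table paths, the format name, the table's data-type check
// and the write info.
final case class ToyParquetWrite(
    paths: Seq[String],
    formatName: String,
    supportsDataType: String => Boolean,
    info: ToyWriteInfo) extends ToyWrite

final class ToyWritableTable(paths: Seq[String]) {
  val formatName: String = "Parquet"

  // Hypothetical check, used only to show that the function is handed over.
  def supportsDataType(dataType: String): Boolean =
    Set("int", "string").contains(dataType)

  // The builder is an anonymous class; no write exists until build() is called.
  def newWriteBuilder(info: ToyWriteInfo): ToyWriteBuilder = new ToyWriteBuilder {
    override def build(): ToyWrite =
      ToyParquetWrite(paths, formatName, supportsDataType(_), info)
  }
}
```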
supportsDataType
Signature:
    supportsDataType(dataType: DataType): Boolean
supportsDataType is part of the FileTable abstraction.
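supportsDataType is true for all AtomicTypes and for the following complex DataTypes, provided that their nested types (array elements, map keys and values, struct fields, and the underlying SQL type of a user-defined type) are themselves supported: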
- ArrayType
- MapType
- StructType
- UserDefinedType
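The rule above is recursive: a complex type is accepted only if everything nested inside it is accepted, bottoming out at atomic types. A minimal sketch over a hypothetical, simplified type hierarchy (the Toy* names stand in for Spark's DataType classes and are not Spark APIs):

```scala
// Hypothetical, simplified model of a DataType hierarchy.
sealed trait ToyType
sealed trait ToyAtomic extends ToyType
case object ToyInt extends ToyAtomic
case object ToyString extends ToyAtomic
case object ToyOther extends ToyType // stands in for any other, unsupported type
final case class ToyArray(elementType: ToyType) extends ToyType
final case class ToyMap(keyType: ToyType, valueType: ToyType) extends ToyType
final case class ToyStruct(fields: Seq[(String, ToyType)]) extends ToyType
final case class ToyUdt(sqlType: ToyType) extends ToyType

object ToySupport {
  // Atomic types are supported; complex types only when every nested type is;
  // a user-defined type is judged by its underlying SQL type.
  def supportsDataType(dataType: ToyType): Boolean = dataType match {
    case _: ToyAtomic       => true
    case ToyArray(element)  => supportsDataType(element)
    case ToyMap(key, value) => supportsDataType(key) && supportsDataType(value)
    case ToyStruct(fields)  => fields.forall { case (_, t) => supportsDataType(t) }
    case ToyUdt(sqlType)    => supportsDataType(sqlType)
    case _                  => false
  }
}
```

For example, an array of ToyOther is rejected even though arrays are in the list, because its element type is not supported.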