# ParquetScanBuilder

`ParquetScanBuilder` is a `FileScanBuilder` that supports filter pushdown (`SupportsPushDownFilters`).
## Creating Instance

`ParquetScanBuilder` takes the following to be created:

- `SparkSession`
- `PartitioningAwareFileIndex`
- Schema
- Data Schema
- Case-Insensitive Options

`ParquetScanBuilder` is created when `ParquetTable` is requested to `newScanBuilder`.
## Building Scan

```scala
build(): Scan
```

`build` creates a `ParquetScan` (with the readDataSchema, the readPartitionSchema and the pushedParquetFilters).

`build` is part of the `ScanBuilder` abstraction.
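The builder-then-build flow can be illustrated with a simplified, hypothetical sketch (the class names, `Seq[String]` schemas, and string filters below are stand-ins for illustration, not the real Spark signatures): the builder accumulates pushed filters alongside the pruned schemas, and `build` freezes them into an immutable scan description.

```scala
// Hedged sketch with hypothetical, simplified stand-ins for the real classes.
trait Scan

final case class ParquetScanSketch(
    readDataSchema: Seq[String],      // pruned data columns to read
    readPartitionSchema: Seq[String], // partition columns to read
    pushedFilters: Seq[String]        // filters handed to the Parquet reader
) extends Scan

final class ParquetScanBuilderSketch(
    dataColumns: Seq[String],
    partitionColumns: Seq[String]) {

  private var pushed: Seq[String] = Nil

  // Remembers what was pushed down (a real builder would also return
  // the filters it could NOT handle, for Spark to re-evaluate).
  def pushFilters(filters: Seq[String]): Unit =
    pushed = filters

  def pushedFilters: Seq[String] = pushed

  // Assembles the Scan from the collected state.
  def build(): Scan =
    ParquetScanSketch(dataColumns, partitionColumns, pushed)
}

val builder = new ParquetScanBuilderSketch(Seq("id", "name"), Seq("dt"))
builder.pushFilters(Seq("id > 5"))
val scan = builder.build().asInstanceOf[ParquetScanSketch]
assert(scan.pushedFilters == Seq("id > 5"))
```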
## Pushed Filters

```scala
pushedFilters(): Array[Filter]
```

`pushedFilters` is the pushedParquetFilters.

`pushedFilters` is part of the `SupportsPushDownFilters` abstraction.
## pushedParquetFilters

```scala
pushedParquetFilters: Array[Filter]
```

!!! note "Lazy Value"
    `pushedParquetFilters` is a Scala **lazy value**: the code to initialize it is executed only once, when it is accessed for the first time, and the computed value never changes afterwards. Learn more in the Scala Language Specification.
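The lazy-initialization behavior described above can be demonstrated with a few lines of plain Scala (a self-contained sketch, unrelated to Spark's own code):

```scala
// The initializer of a lazy val runs at most once, on first access,
// and the computed value is cached forever after.
var initCount = 0

lazy val computedOnce: Int = {
  initCount += 1
  42
}

assert(initCount == 0)      // declared, but not yet initialized
val first = computedOnce    // first access triggers the one-time initializer
val second = computedOnce   // served from the cached value
assert(first == 42 && second == 42)
assert(initCount == 1)      // the initializer ran exactly once
```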
`pushedParquetFilters` creates a `ParquetFilters` with the readDataSchema (converted to a Parquet `MessageType`) and the following configuration properties:

- `spark.sql.parquet.filterPushdown.date`
- `spark.sql.parquet.filterPushdown.timestamp`
- `spark.sql.parquet.filterPushdown.decimal`
- `spark.sql.parquet.filterPushdown.string.startsWith`
- `spark.sql.parquet.pushdown.inFilterThreshold`
- `spark.sql.caseSensitive`

`pushedParquetFilters` requests the `ParquetFilters` for the convertibleFilters.

`pushedParquetFilters` is used when `ParquetScanBuilder` is requested for the pushedFilters and to build.
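The convertible-filters idea above can be sketched in isolation (a hedged, hypothetical stand-in for `ParquetFilters.convertibleFilters` — the filter case classes and the selection rule below are illustrative only): out of all the filters pushed down by Spark, only those that can be translated into Parquet-level predicates are kept.

```scala
// Hedged sketch: a tiny filter algebra standing in for Spark's Filter classes.
sealed trait Filter
final case class GreaterThan(column: String, value: Int) extends Filter
final case class StringContains(column: String, substring: String) extends Filter

// Hypothetical stand-in for ParquetFilters.convertibleFilters: keep only
// the filter shapes this sketch's "Parquet reader" can evaluate
// (here, just comparisons).
def convertibleFilters(pushed: Seq[Filter]): Seq[Filter] =
  pushed.collect { case f: GreaterThan => f }

val pushed = Seq(GreaterThan("id", 5), StringContains("name", "x"))
assert(convertibleFilters(pushed) == Seq(GreaterThan("id", 5)))
```

The non-convertible filters are not lost: Spark still evaluates them on the rows the scan returns; pushdown is an optimization, not a correctness requirement.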
## supportsNestedSchemaPruning

```scala
supportsNestedSchemaPruning: Boolean
```

`supportsNestedSchemaPruning` is always `true`.

`supportsNestedSchemaPruning` is part of the `FileScanBuilder` abstraction.