ParquetScanBuilder

ParquetScanBuilder builds ParquetScans.

Creating Instance

ParquetScanBuilder takes the following to be created:

ParquetScanBuilder is created when:

Signature

build(): Scan

build is part of the ScanBuilder abstraction.

build creates a ParquetScan with the following:

pushedAggregations: Option[Aggregation]

ParquetScanBuilder defines a pushedAggregations registry for an optional Aggregation.

pushedAggregations is undefined (None) when ParquetScanBuilder is created and can only be assigned in pushAggregation.

pushedAggregations controls the finalSchema. When pushedAggregations is undefined, build uses readDataSchema as the finalSchema of the ParquetScan.

pushedAggregations is used to create a ParquetScan.
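The relationship between pushedAggregations and the schema handed to the scan can be modelled in a few lines of plain Scala. This is only a simplified sketch, not Spark's code: `Agg`, `Schema`, and `selectSchema` are hypothetical stand-ins for Aggregation, StructType, and the logic in build.

```scala
object FinalSchemaSketch {
  // Hypothetical stand-ins (not Spark classes)
  final case class Agg(functions: List[String])
  final case class Schema(columns: List[String])

  // With no pushed aggregation the scan reads readDataSchema;
  // otherwise it keeps the schema registered when the aggregation was pushed.
  def selectSchema(
      pushedAggregations: Option[Agg],
      readDataSchema: Schema,
      finalSchema: Schema): Schema =
    if (pushedAggregations.isEmpty) readDataSchema else finalSchema
}
```

The two schemas are mutually exclusive in this model: regular column pruning yields readDataSchema, while a pushed aggregation yields the aggregate schema.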

Signature

pushAggregation(
  aggregation: Aggregation): Boolean

pushAggregation is part of the SupportsPushDownAggregates abstraction.

pushAggregation does nothing and returns false when spark.sql.parquet.aggregatePushdown is disabled.

If a schema can be determined for the given Aggregation, pushAggregation registers it as finalSchema and the given Aggregation as pushedAggregations, and returns true.

Otherwise, pushAggregation returns false.
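The three outcomes of pushAggregation described above can be sketched as a small state machine. This is an illustrative model under stated assumptions, not the Spark implementation: `Builder`, `Agg`, `Schema`, and the `schemaFor` function are hypothetical, with `schemaFor` standing in for whatever decides whether the aggregation can be answered by the data source.

```scala
object PushAggregationSketch {
  // Hypothetical stand-ins for Spark's Aggregation and StructType
  final case class Agg(functions: List[String])
  final case class Schema(columns: List[String])

  // aggregatePushdownEnabled mirrors spark.sql.parquet.aggregatePushdown.
  // schemaFor is a placeholder: it returns None when no schema can be
  // determined for the aggregation.
  final class Builder(
      aggregatePushdownEnabled: Boolean,
      schemaFor: Agg => Option[Schema]) {

    var finalSchema: Schema = Schema(Nil)
    var pushedAggregations: Option[Agg] = None

    def pushAggregation(aggregation: Agg): Boolean =
      if (!aggregatePushdownEnabled) {
        false // feature disabled: nothing is registered
      } else {
        schemaFor(aggregation) match {
          case Some(schema) =>
            // Schema determined: register both and report success
            finalSchema = schema
            pushedAggregations = Some(aggregation)
            true
          case None =>
            false // aggregation cannot be pushed down
        }
      }
  }
}
```

Note that a false result leaves pushedAggregations undefined, so a later build would fall back to readDataSchema.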

Signature

pushDataFilters(
  dataFilters: Array[Filter]): Array[Filter]

pushDataFilters is part of the FileScanBuilder abstraction.

spark.sql.parquet.filterPushdown

pushDataFilters does nothing and returns no Filters (an empty array) when spark.sql.parquet.filterPushdown is disabled.

pushDataFilters creates a ParquetFilters with the readDataSchema (converted into the corresponding parquet schema) and the following configuration properties:

In the end, pushDataFilters requests the ParquetFilters for the convertibleFilters for the given dataFilters and returns them.
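The flag-gated behaviour of pushDataFilters can be illustrated with a toy model. Everything here is an assumption made for the sketch: the `Filter` hierarchy is a much-reduced stand-in for Spark's, and `convertibleFilters` is a placeholder that simply keeps the filters this model knows how to translate, rather than the real ParquetFilters logic.

```scala
object PushDataFiltersSketch {
  // Hypothetical, much-simplified filter type
  sealed trait Filter
  final case class EqualTo(attribute: String, value: Any) extends Filter
  final case class Unsupported(description: String) extends Filter

  // Placeholder for ParquetFilters.convertibleFilters: keep only the
  // filters that this toy model can translate.
  def convertibleFilters(filters: Seq[Filter]): Seq[Filter] =
    filters.collect { case f: EqualTo => f }

  // filterPushdownEnabled mirrors spark.sql.parquet.filterPushdown.
  def pushDataFilters(
      filterPushdownEnabled: Boolean,
      dataFilters: Array[Filter]): Array[Filter] =
    if (filterPushdownEnabled) convertibleFilters(dataFilters.toSeq).toArray
    else Array.empty[Filter]
}
```

In this model, filters that cannot be converted are simply not reported as pushed.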

Signature

supportsNestedSchemaPruning: Boolean

supportsNestedSchemaPruning is part of the FileScanBuilder abstraction.

supportsNestedSchemaPruning is enabled (true).