ParquetScan¶
ParquetScan is a FileScan.
Creating Instance¶
ParquetScan takes the following to be created:
- SparkSession
- Hadoop Configuration
- PartitioningAwareFileIndex
- Data schema
- Read data schema
- Read partition schema
- Pushed Filters
- Case-insensitive options
- Partition filter expressions (optional)
- Data filter expressions (optional)
ParquetScan is created when:
- ParquetScanBuilder is requested to build a Scan
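One way to see a ParquetScan being created is to route the parquet format through the DataSource V2 read path and inspect the physical plan. The following is a minimal sketch, not a definitive recipe: it assumes Spark 3.x on the classpath (run as a script or in a REPL), that clearing spark.sql.sources.useV1SourceList is enough to select the V2 path, and the app name, temporary path, and filter are made up for illustration.

```scala
import java.nio.file.Files

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.datasources.v2.BatchScanExec
import org.apache.spark.sql.execution.datasources.v2.parquet.ParquetScan

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("parquet-scan-demo") // hypothetical name
  // Assumption: an empty V1 source list makes parquet reads use DataSource V2
  .config("spark.sql.sources.useV1SourceList", "")
  .getOrCreate()

// Write a tiny parquet dataset to a fresh temporary location (illustrative only)
val path = Files.createTempDirectory("parquet-scan").resolve("demo").toString
spark.range(5).write.parquet(path)

// Reading with a filter gives the scan builder something to push down
val df = spark.read.parquet(path).where("id > 2")

// Collect the Scan of every BatchScanExec and keep the ParquetScans
val parquetScans = df.queryExecution.sparkPlan
  .collect { case b: BatchScanExec => b.scan }
  .collect { case p: ParquetScan => p }

parquetScans.foreach { scan =>
  println(s"read data schema: ${scan.readDataSchema.simpleString}")
  println(s"pushed filters:   ${scan.pushedFilters.mkString(", ")}")
}
```

With the default V1 source list the same query is planned with a V1 file scan, so parquetScans would be empty.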
createReaderFactory¶
createReaderFactory(): PartitionReaderFactory
createReaderFactory creates a ParquetPartitionReaderFactory (with the Hadoop Configuration broadcast).
createReaderFactory adds the following properties to the Hadoop Configuration before broadcasting it (to executors).
Name | Value |
---|---|
ParquetInputFormat.READ_SUPPORT_CLASS | ParquetReadSupport |
others |
createReaderFactory is part of the Batch abstraction.
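The set-then-broadcast step described above can be approximated outside of ParquetScan. This is only a sketch of the idea, not the Spark source: it assumes Spark 3.x (where SerializableConfiguration is a public developer API) and parquet-hadoop on the classpath, copies the session's Hadoop Configuration rather than using the one ParquetScan is given, and sets just the one property named in the table. The app name and the value names are hypothetical.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.parquet.hadoop.ParquetInputFormat
import org.apache.spark.sql.SparkSession
import org.apache.spark.util.SerializableConfiguration

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("broadcast-conf-demo") // hypothetical name
  .getOrCreate()

// Work on a copy so the session-wide Hadoop configuration stays untouched
val hadoopConf = new Configuration(spark.sparkContext.hadoopConfiguration)

// Register the read support class under the key parquet-hadoop looks up
hadoopConf.set(
  ParquetInputFormat.READ_SUPPORT_CLASS,
  "org.apache.spark.sql.execution.datasources.parquet.ParquetReadSupport")

// Hadoop's Configuration is not java.io.Serializable, hence the wrapper
val broadcastConf =
  spark.sparkContext.broadcast(new SerializableConfiguration(hadoopConf))

// Executors would read the property back from the broadcast value
val readSupport = broadcastConf.value.value.get(ParquetInputFormat.READ_SUPPORT_CLASS)
println(readSupport)
```

Broadcasting ships the configuration to each executor once instead of serializing it with every task.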
isSplitable¶
isSplitable(
path: Path): Boolean
isSplitable is true.
isSplitable is part of the FileScan abstraction.
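To illustrate what a true isSplitable allows, the following plain-Scala sketch cuts a file into byte ranges of at most a given size, or keeps it whole when splitting is not allowed. It is a simplified model rather than Spark's planning code: FileSplit, splitFile, and the sizes are hypothetical names and values chosen for the example.

```scala
// Hypothetical model of a byte range within a single file
final case class FileSplit(start: Long, length: Long)

// Cut a file of `fileLength` bytes into ranges of at most `maxSplitBytes`,
// or return it as one range when the file must not be split
def splitFile(fileLength: Long, maxSplitBytes: Long, isSplitable: Boolean): Seq[FileSplit] = {
  require(maxSplitBytes > 0, "maxSplitBytes must be positive")
  if (isSplitable) {
    (0L until fileLength by maxSplitBytes).map { offset =>
      // The last range is shorter when the length is not a multiple of the limit
      FileSplit(offset, math.min(maxSplitBytes, fileLength - offset))
    }
  } else {
    Seq(FileSplit(0L, fileLength))
  }
}

val splits = splitFile(fileLength = 300L, maxSplitBytes = 128L, isSplitable = true)
splits.foreach(println)
// prints FileSplit(0,128), FileSplit(128,128), FileSplit(256,44), one per line
```

Each range can then be handed to a separate task, which is why a splittable format lets one large file be read in parallel.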