
PartitionReaderFactory

PartitionReaderFactory is an abstraction of factories that create partition readers for InputPartitions: row-based readers (PartitionReader<InternalRow>) and, optionally, columnar readers (PartitionReader<ColumnarBatch>).

Contract

Creating Columnar PartitionReader

PartitionReader<ColumnarBatch> createColumnarReader(
    InputPartition partition)

Creates a PartitionReader of ColumnarBatches for a columnar scan (to read data) of the given InputPartition.

By default, createColumnarReader throws an UnsupportedOperationException:

Cannot create columnar reader.
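The default behaviour can be illustrated with a minimal Java sketch. It assumes simplified stand-ins for Spark's InputPartition, PartitionReader, InternalRow and ColumnarBatch, declared inline so the block compiles without Spark on the classpath; they are not the real Spark classes. RowOnlyFactory and DefaultsDemo are hypothetical names. A factory that implements only createReader inherits both columnar defaults:

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.Serializable;

// Simplified stand-ins (hypothetical), NOT the real Spark classes.
interface InputPartition extends Serializable {}

interface PartitionReader<T> extends Closeable {
  boolean next() throws IOException; // advance to the next record
  T get();                            // return the current record
}

final class InternalRow {}
final class ColumnarBatch {}

// Same shape as the contract above: createReader is abstract,
// the two columnar methods have default implementations.
interface PartitionReaderFactory extends Serializable {
  PartitionReader<InternalRow> createReader(InputPartition partition);

  default PartitionReader<ColumnarBatch> createColumnarReader(InputPartition partition) {
    throw new UnsupportedOperationException("Cannot create columnar reader.");
  }

  default boolean supportColumnarReads(InputPartition partition) {
    return false;
  }
}

// Hypothetical row-only factory that keeps both columnar defaults.
final class RowOnlyFactory implements PartitionReaderFactory {
  @Override
  public PartitionReader<InternalRow> createReader(InputPartition partition) {
    // An empty reader: there are no records to return.
    return new PartitionReader<InternalRow>() {
      @Override public boolean next() { return false; }
      @Override public InternalRow get() { throw new IllegalStateException("no current row"); }
      @Override public void close() {}
    };
  }
}

final class DefaultsDemo {
  public static void main(String[] args) {
    PartitionReaderFactory factory = new RowOnlyFactory();
    InputPartition partition = new InputPartition() {};
    System.out.println(factory.supportColumnarReads(partition)); // prints false
    try {
      factory.createColumnarReader(partition);
    } catch (UnsupportedOperationException e) {
      System.out.println(e.getMessage()); // prints Cannot create columnar reader.
    }
  }
}
```

Because both columnar methods are defaults, a purely row-based data source only has to implement createReader.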

Used when:

Creating PartitionReader

PartitionReader<InternalRow> createReader(
    InputPartition partition)

Creates a PartitionReader of InternalRows for a row-based scan (to read data) of the given InputPartition.

Used when:

  • DataSourceRDD is requested to compute a partition
  • ContinuousDataSourceRDD (Spark Structured Streaming) is requested to compute a partition
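A row-based factory and the read loop a caller runs over its reader can be sketched in Java as follows. This is a minimal sketch under stated assumptions: InputPartition, PartitionReader, InternalRow (here a one-field row) and PartitionReaderFactory (with the columnar methods omitted) are simplified stand-ins, not the Spark classes, and RangeInputPartition, RangeReaderFactory and RowScan are hypothetical. RowScan.readPartition only approximates what a caller such as DataSourceRDD does when computing a partition: create one reader per partition, iterate with next/get, and close the reader when done.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins (hypothetical), NOT the real Spark classes.
interface InputPartition extends Serializable {}

interface PartitionReader<T> extends Closeable {
  boolean next() throws IOException; // advance to the next record
  T get();                            // return the current record
}

final class InternalRow {
  final long value; // single-column row, for illustration only
  InternalRow(long value) { this.value = value; }
}

// Row-based part of the contract only; columnar methods omitted here.
interface PartitionReaderFactory extends Serializable {
  PartitionReader<InternalRow> createReader(InputPartition partition);
}

// Hypothetical partition covering the half-open range [start, end).
final class RangeInputPartition implements InputPartition {
  final long start;
  final long end;
  RangeInputPartition(long start, long end) { this.start = start; this.end = end; }
}

// Hypothetical factory: one reader per partition, one row per number.
final class RangeReaderFactory implements PartitionReaderFactory {
  @Override
  public PartitionReader<InternalRow> createReader(InputPartition partition) {
    RangeInputPartition range = (RangeInputPartition) partition;
    return new PartitionReader<InternalRow>() {
      private long current = range.start - 1;

      @Override public boolean next() { return ++current < range.end; }
      @Override public InternalRow get() { return new InternalRow(current); }
      @Override public void close() { /* nothing to release */ }
    };
  }
}

final class RowScan {
  // Reads one partition to completion; try-with-resources closes the reader.
  static List<Long> readPartition(PartitionReaderFactory factory, InputPartition partition) {
    List<Long> values = new ArrayList<>();
    try (PartitionReader<InternalRow> reader = factory.createReader(partition)) {
      while (reader.next()) {
        values.add(reader.get().value);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return values;
  }

  public static void main(String[] args) {
    System.out.println(readPartition(new RangeReaderFactory(), new RangeInputPartition(3, 6)));
    // prints [3, 4, 5]
  }
}
```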

supportColumnarReads

boolean supportColumnarReads(
    InputPartition partition)

Controls whether a columnar scan (and hence createColumnarReader) can be used for the given InputPartition.

By default, supportColumnarReads returns false, i.e. columnar scans are not supported.
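How supportColumnarReads gates createColumnarReader can be shown with a hedged Java sketch. It again uses simplified stand-ins rather than the Spark classes (ColumnarBatch here is a single column of longs exposing numRows()), and ArrayInputPartition, its columnar flag, ArrayReaderFactory and ScanDispatch.countRecords are all hypothetical. The caller takes the columnar path only when the factory reports support for the partition, and otherwise falls back to createReader; this is an illustration of the contract, not how Spark's own physical operators are written.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Simplified stand-ins (hypothetical), NOT the real Spark classes.
interface InputPartition extends Serializable {}

interface PartitionReader<T> extends Closeable {
  boolean next() throws IOException;
  T get();
}

final class InternalRow {
  final long value;
  InternalRow(long value) { this.value = value; }
}

// Stand-in batch: a single column of longs.
final class ColumnarBatch {
  final long[] column;
  ColumnarBatch(long[] column) { this.column = column; }
  int numRows() { return column.length; }
}

interface PartitionReaderFactory extends Serializable {
  PartitionReader<InternalRow> createReader(InputPartition partition);

  default PartitionReader<ColumnarBatch> createColumnarReader(InputPartition partition) {
    throw new UnsupportedOperationException("Cannot create columnar reader.");
  }

  default boolean supportColumnarReads(InputPartition partition) {
    return false;
  }
}

// Hypothetical in-memory partition; `columnar` marks it as readable in batches.
final class ArrayInputPartition implements InputPartition {
  final long[] data;
  final boolean columnar;
  ArrayInputPartition(long[] data, boolean columnar) { this.data = data; this.columnar = columnar; }
}

// Hypothetical factory that decides columnar support per partition.
final class ArrayReaderFactory implements PartitionReaderFactory {
  @Override
  public boolean supportColumnarReads(InputPartition partition) {
    return ((ArrayInputPartition) partition).columnar;
  }

  @Override
  public PartitionReader<ColumnarBatch> createColumnarReader(InputPartition partition) {
    long[] data = ((ArrayInputPartition) partition).data;
    // Returns the whole partition as one batch.
    return new PartitionReader<ColumnarBatch>() {
      private boolean consumed = false;
      @Override public boolean next() {
        if (consumed) return false;
        consumed = true;
        return true;
      }
      @Override public ColumnarBatch get() { return new ColumnarBatch(data); }
      @Override public void close() {}
    };
  }

  @Override
  public PartitionReader<InternalRow> createReader(InputPartition partition) {
    long[] data = ((ArrayInputPartition) partition).data;
    return new PartitionReader<InternalRow>() {
      private int index = -1;
      @Override public boolean next() { return ++index < data.length; }
      @Override public InternalRow get() { return new InternalRow(data[index]); }
      @Override public void close() {}
    };
  }
}

final class ScanDispatch {
  // Hypothetical caller: uses createColumnarReader only if supportColumnarReads allows it.
  static long countRecords(PartitionReaderFactory factory, InputPartition partition) {
    try {
      long count = 0;
      if (factory.supportColumnarReads(partition)) {
        try (PartitionReader<ColumnarBatch> reader = factory.createColumnarReader(partition)) {
          while (reader.next()) count += reader.get().numRows();
        }
      } else {
        try (PartitionReader<InternalRow> reader = factory.createReader(partition)) {
          while (reader.next()) count++;
        }
      }
      return count;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Both paths return the same record count for the same data; only the unit of reading differs (whole batches versus single rows).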

Used when:

Implementations

  • ContinuousPartitionReaderFactory
  • FilePartitionReaderFactory
  • KafkaBatchReaderFactory
  • MemoryStreamReaderFactory
  • RateStreamMicroBatchReaderFactory