RowDataSourceScanExec Leaf Physical Operator
RowDataSourceScanExec is a DataSourceScanExec (and so indirectly a leaf physical operator) for scanning data from a BaseRelation.
RowDataSourceScanExec is an InputRDDCodegen.
Performance Metrics
| Key | Name (in web UI) | Description |
|---|---|---|
| numOutputRows | number of output rows | Number of output rows |
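The idea behind numOutputRows is simply that every row leaving the operator bumps a counter. In Spark the counter is an SQLMetric (an accumulator) that tasks update and the driver aggregates. The sketch below is a Spark-free model of only the counting part, in a single JVM. `RowCountMetric` and `counted` are hypothetical names, not Spark API.

```scala
// Spark-free model of a row-count metric such as numOutputRows.
// RowCountMetric is a hypothetical stand-in for Spark's SQLMetric.
object RowCountSketch {
  final class RowCountMetric {
    private var count = 0L
    def add(n: Long): Unit = count += n
    def value: Long = count
  }

  // Wraps one partition's iterator so that every row flowing through is counted.
  // Iterator.map is lazy, so the metric only changes when a consumer pulls rows.
  def counted[A](rows: Iterator[A], metric: RowCountMetric): Iterator[A] =
    rows.map { row =>
      metric.add(1)
      row
    }
}
```

Wrapping two "partitions" (for example `Iterator(1, 2, 3)` and `Iterator(4, 5)`) with the same metric leaves its value at 0 until the rows are consumed, and at 5 afterwards. Counting lazily, as rows are pulled, means the metric reflects rows actually produced rather than rows that merely exist in the source.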
Creating Instance
RowDataSourceScanExec takes the following to be created:
- Output Schema (Attributes)
- Required Schema (StructType)
- Data Source Filter Predicates
- Handled Data Source Filter Predicates
- RDD[InternalRow]
- BaseRelation
- Optional TableIdentifier
RowDataSourceScanExec is created when:
- DataSourceStrategy execution planning strategy is executed (for LogicalRelation logical operators)
Metadata
metadata: Map[String, String]
metadata is part of the DataSourceScanExec abstraction.
metadata marks with * (star) every filter predicate that is also one of the handled filter predicates.
Note
Filter predicates marked with * (star) are pushed down to the relation (aka data source) and handled by it, i.e. the data source evaluates them itself so Spark does not have to re-evaluate them.
In the end, metadata creates the following mapping:
- ReadSchema with the required schema converted to its catalog string representation (StructType.catalogString)
- PushedFilters with the marked and unmarked filter predicates
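The marking logic can be modelled without Spark. The sketch below is a minimal illustration, assuming filters are plain strings (in Spark they are `org.apache.spark.sql.sources.Filter` instances) and the read schema is already given as its catalog string. `ScanArgs` and `metadata` here are hypothetical names. A `Seq` is used for the filters so the output order is deterministic, whereas Spark keeps them in a `Set`.

```scala
// Spark-free model of how the metadata map of RowDataSourceScanExec is assembled.
// Filters are plain strings here; ScanArgs is a hypothetical stand-in for the
// operator's required schema, filter and handled-filter arguments.
object ScanMetadataSketch {
  final case class ScanArgs(
      readSchemaCatalogString: String,
      filters: Seq[String],
      handledFilters: Set[String])

  def metadata(args: ScanArgs): Map[String, String] = {
    // Prefix every filter the data source reported as handled with '*'.
    val markedFilters = args.filters.map { f =>
      if (args.handledFilters.contains(f)) s"*$f" else f
    }
    Map(
      "ReadSchema" -> args.readSchemaCatalogString,
      "PushedFilters" -> markedFilters.mkString("[", ", ", "]"))
  }
}
```

For filters `IsNotNull(id)` and `EqualTo(name,x)`, where only the first one is handled, PushedFilters becomes `[*IsNotNull(id), EqualTo(name,x)]`, which is the shape that shows up in the scan's description in `explain` output.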
createUnsafeProjection
createUnsafeProjection: Boolean
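createUnsafeProjection is always true, since the rows of the input RDD can be arbitrary InternalRows (not necessarily UnsafeRows) and so have to be converted with an UnsafeProjection.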
createUnsafeProjection is part of the InputRDDCodegen abstraction.
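What the flag decides can be sketched in plain Scala, under the simplifying assumption that "projection" just means converting each row into one canonical, compact format. `Row`, `GenericRow` and `CompactRow` are hypothetical stand-ins for Spark's InternalRow, GenericInternalRow and UnsafeRow, and `produce` is not Spark API; the real InputRDDCodegen expresses this choice in generated Java code.

```scala
// Spark-free model of the choice that createUnsafeProjection controls.
// Row, GenericRow and CompactRow are hypothetical stand-ins for Spark's
// InternalRow, GenericInternalRow and UnsafeRow.
object InputRddSketch {
  sealed trait Row { def values: Vector[Any] }
  final case class GenericRow(values: Vector[Any]) extends Row
  final case class CompactRow(values: Vector[Any]) extends Row

  // Stand-in for an UnsafeProjection: converts any row into the compact format.
  def toCompact(row: Row): CompactRow = CompactRow(row.values)

  // With createUnsafeProjection = true the input may hold arbitrary rows, so every
  // row is converted; with false the rows are assumed to be compact already.
  def produce(input: Iterator[Row], createUnsafeProjection: Boolean): Iterator[Row] =
    if (createUnsafeProjection) input.map(toCompact) else input
}
```

Because RowDataSourceScanExec always sets the flag to true, it takes the converting branch: a `GenericRow(Vector(1, "a"))` comes out as `CompactRow(Vector(1, "a"))`, so downstream operators can rely on a single row format.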
Input RDD
inputRDD: RDD[InternalRow]
inputRDD is the RDD[InternalRow] that RowDataSourceScanExec was created with.
inputRDD is part of the InputRDDCodegen abstraction.