DataSourceV2Relation Leaf Logical Operator¶
DataSourceV2Relation is a leaf logical operator that represents a scan over tables with support for BATCH_READ (at the very least).
DataSourceV2Relation is an ExposesMetadataColumns and can add extra metadata columns to the output columns.
Creating Instance¶
DataSourceV2Relation takes the following to be created:
- Table
- Output Columns
- CatalogPlugin
- Identifier
- Options
- TimeTravelSpec
DataSourceV2Relation is created (indirectly) using the create utility (and withMetadataColumns).
CatalogPlugin¶
DataSourceV2Relation can be given a CatalogPlugin when created.
The CatalogPlugin can be as follows:
- Current Catalog for a single-part table reference
- v2SessionCatalog for global temp views
- Custom Catalog by name
Creating DataSourceV2Relation¶
create(
table: Table,
catalog: Option[CatalogPlugin],
identifier: Option[Identifier]): DataSourceV2Relation
create(
table: Table,
catalog: Option[CatalogPlugin],
identifier: Option[Identifier],
options: CaseInsensitiveStringMap): DataSourceV2Relation
create replaces CharType and VarcharType types in the schema of the given Table with "annotated" StringType (as the query engine doesn't support char/varchar).
In the end, create uses the new schema to create a DataSourceV2Relation.
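The type replacement described above can be sketched in plain Scala. This is a simplified model, not Spark's code: the `DataType`, `Field` and `RawTypeKey` definitions are hypothetical stand-ins for Spark's types and for whatever annotation Spark attaches to the field, and only top-level fields are handled.

```scala
// Simplified stand-ins; NOT the real org.apache.spark.sql.types classes.
object CharVarcharSketch {
  sealed trait DataType { def typeName: String }
  case object StringType extends DataType { val typeName = "string" }
  case object IntegerType extends DataType { val typeName = "int" }
  final case class CharType(length: Int) extends DataType {
    def typeName: String = s"char($length)"
  }
  final case class VarcharType(length: Int) extends DataType {
    def typeName: String = s"varchar($length)"
  }

  final case class Field(
      name: String,
      dataType: DataType,
      metadata: Map[String, String] = Map.empty)

  // Hypothetical metadata key that remembers the original type (the "annotation").
  val RawTypeKey = "rawType"

  // Replaces char/varchar fields with StringType and records the original
  // type name in the field metadata. All other fields are left untouched.
  def replaceCharVarcharWithString(schema: Seq[Field]): Seq[Field] = schema.map {
    case f @ Field(_, t @ (CharType(_) | VarcharType(_)), _) =>
      f.copy(dataType = StringType, metadata = f.metadata + (RawTypeKey -> t.typeName))
    case other => other
  }
}
```

With this model, a `Field("code", CharType(3))` becomes a `StringType` field whose metadata still carries `char(3)`, so the original length can be recovered later.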
create is used when:
- CatalogV2Util utility is used to loadRelation
- DataFrameWriter is requested to insertInto, saveAsTable and saveInternal
- DataSourceV2Strategy execution planning strategy is requested to invalidateCache
- RenameTableExec physical command is executed
- ResolveTables logical resolution rule is executed
- ResolveRelations logical resolution rule is executed (and requested to lookupRelation)
- DataFrameReader is requested to load data
Metadata Columns¶
LogicalPlan
metadataOutput: Seq[AttributeReference]
metadataOutput is part of the LogicalPlan abstraction.
metadataOutput checks whether this Table is a SupportsMetadataColumns. If so, metadataOutput requests this Table for its metadata columns.
Otherwise, metadataOutput returns no metadata columns (Nil).
Lazy Value
metadataOutput is a Scala lazy value to guarantee that the code to initialize it is executed once only (when accessed for the first time) and the computed value never changes afterwards.
Learn more in the Scala Language Specification.
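The once-only initialization of a lazy value can be demonstrated with a small, self-contained sketch. The `Relation` class, its `loadMetadataColumns` parameter and the `initCount` counter are hypothetical and exist only to make the behavior observable; they are not part of Spark.

```scala
object LazyValSketch {
  // `loadMetadataColumns` stands in for the (possibly expensive) lookup of metadata columns.
  final class Relation(loadMetadataColumns: () => Seq[String]) {
    // Counts how many times the initializer below has run.
    var initCount = 0

    // The block runs on first access only; later accesses reuse the computed value.
    lazy val metadataOutput: Seq[String] = {
      initCount += 1
      loadMetadataColumns()
    }
  }
}
```

Creating a `Relation` leaves `initCount` at 0. Accessing `metadataOutput` any number of times raises it to 1 and no further.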
Add Metadata Columns to Output Columns¶
ExposesMetadataColumns
withMetadataColumns(): DataSourceV2Relation
withMetadataColumns is part of the ExposesMetadataColumns abstraction.
withMetadataColumns creates a DataSourceV2Relation with the extra metadata columns (if there are any) added to the output columns of this relation.
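How metadataOutput and withMetadataColumns fit together can be sketched with a simplified model. The `Table`, `SupportsMetadataColumns` and `Relation` types below are hypothetical stand-ins that use plain column names instead of attribute references. Skipping columns that are already in the output, and returning the same relation when nothing is added, are assumptions of this sketch.

```scala
object MetadataColumnsSketch {
  trait Table { def name: String }
  // Stand-in for a table that can expose extra (metadata) columns.
  trait SupportsMetadataColumns extends Table {
    def metadataColumns: Seq[String]
  }

  final case class Relation(table: Table, output: Seq[String]) {
    // Metadata columns of the table, or Nil when the table does not support them.
    lazy val metadataOutput: Seq[String] = table match {
      case t: SupportsMetadataColumns => t.metadataColumns
      case _ => Nil
    }

    // Returns a relation with the missing metadata columns appended to the output.
    def withMetadataColumns(): Relation = {
      val missing = metadataOutput.filterNot(output.contains)
      if (missing.nonEmpty) copy(output = output ++ missing) else this
    }
  }
}
```

In this model, a table without metadata columns yields the relation unchanged, while a table that exposes `_file` turns an output of `id` into `id, _file`.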
Required Table Capabilities¶
TableCapabilityCheck is used to assert the following regarding DataSourceV2Relation and the Table:
- Table supports BATCH_READ
- Table supports BATCH_WRITE or V1_BATCH_WRITE for AppendData (append in batch mode)
- Table supports BATCH_WRITE with OVERWRITE_DYNAMIC for OverwritePartitionsDynamic (dynamic overwrite in batch mode)
- Table supports BATCH_WRITE, V1_BATCH_WRITE or OVERWRITE_BY_FILTER possibly with TRUNCATE for OverwriteByExpression (truncate in batch mode and overwrite by filter in batch mode)
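The rules above can be modeled as a function from an operation and a set of table capabilities to a pass/fail result. This is a minimal sketch, not TableCapabilityCheck itself: the `Capability` and `Operation` types are hypothetical stand-ins, and the handling of OverwriteByExpression is one reading of the last rule (batch write is always required; a truncate accepts either TRUNCATE or OVERWRITE_BY_FILTER, while any other filter requires OVERWRITE_BY_FILTER).

```scala
object CapabilityCheckSketch {
  sealed trait Capability
  case object BATCH_READ extends Capability
  case object BATCH_WRITE extends Capability
  case object V1_BATCH_WRITE extends Capability
  case object OVERWRITE_DYNAMIC extends Capability
  case object OVERWRITE_BY_FILTER extends Capability
  case object TRUNCATE extends Capability

  sealed trait Operation
  case object Read extends Operation
  case object AppendData extends Operation
  case object OverwritePartitionsDynamic extends Operation
  // truncate = true models an overwrite whose filter matches every row.
  final case class OverwriteByExpression(truncate: Boolean) extends Operation

  private def supportsBatchWrite(caps: Set[Capability]): Boolean =
    caps(BATCH_WRITE) || caps(V1_BATCH_WRITE)

  // True when a table with the given capabilities passes the check for the operation.
  def isSupported(op: Operation, caps: Set[Capability]): Boolean = op match {
    case Read => caps(BATCH_READ)
    case AppendData => supportsBatchWrite(caps)
    case OverwritePartitionsDynamic => caps(BATCH_WRITE) && caps(OVERWRITE_DYNAMIC)
    case OverwriteByExpression(true) =>
      supportsBatchWrite(caps) && (caps(TRUNCATE) || caps(OVERWRITE_BY_FILTER))
    case OverwriteByExpression(false) =>
      supportsBatchWrite(caps) && caps(OVERWRITE_BY_FILTER)
  }
}
```

For example, a table that declares only BATCH_READ passes the read check and fails every write check in this model.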