DataSourceV2Relation Leaf Logical Operator¶
DataSourceV2Relation is a leaf logical operator that represents a scan over tables with support for BATCH_READ (at the very least).
DataSourceV2Relation is a NamedRelation.
DataSourceV2Relation is an ExposesMetadataColumns.
Creating Instance¶
DataSourceV2Relation takes the following to be created:
- Table
- Output AttributeReferences
- CatalogPlugin
- (optional) Identifier
- Case-Insensitive Options
DataSourceV2Relation is created (indirectly) using the create utility and withMetadataColumns.
CatalogPlugin¶
DataSourceV2Relation can be given a CatalogPlugin when created.
The CatalogPlugin can be one of the following:
- Current Catalog for a single-part table reference
- v2SessionCatalog for global temp views
- Custom Catalog by name
Creating DataSourceV2Relation¶
create(
  table: Table,
  catalog: Option[CatalogPlugin],
  identifier: Option[Identifier]): DataSourceV2Relation
create(
  table: Table,
  catalog: Option[CatalogPlugin],
  identifier: Option[Identifier],
  options: CaseInsensitiveStringMap): DataSourceV2Relation
create replaces CharType and VarcharType types in the schema of the given Table with "annotated" StringType (as the query engine doesn't support char/varchar).
In the end, create uses the new schema to create a DataSourceV2Relation.
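The type replacement can be illustrated with a small self-contained model. This is a sketch only: the `DataType` hierarchy, `Field`, `RawTypeKey` and `replaceCharVarcharWithString` below are hypothetical stand-ins rather than Spark's own classes, and only top-level fields are handled. It shows the idea of swapping char/varchar for a string type while keeping the original type as an annotation in the field metadata.

```scala
object CharVarcharSketch {
  // Hypothetical, simplified data types (not Spark's DataType hierarchy)
  sealed trait DataType
  case object StringType extends DataType
  case object IntegerType extends DataType
  final case class CharType(length: Int) extends DataType
  final case class VarcharType(length: Int) extends DataType

  // A schema field with free-form metadata
  final case class Field(
      name: String,
      dataType: DataType,
      metadata: Map[String, String] = Map.empty)

  // Hypothetical metadata key under which the original type is recorded
  val RawTypeKey = "rawType"

  // Replaces char/varchar fields with "annotated" string fields;
  // all other fields are returned unchanged
  def replaceCharVarcharWithString(schema: Seq[Field]): Seq[Field] =
    schema.map { field =>
      field.dataType match {
        case CharType(n)    => annotate(field, s"char($n)")
        case VarcharType(n) => annotate(field, s"varchar($n)")
        case _              => field
      }
    }

  private def annotate(field: Field, rawType: String): Field =
    field.copy(
      dataType = StringType,
      metadata = field.metadata + (RawTypeKey -> rawType))
}
```

Keeping the original type in the metadata means the length constraint is not lost even though the engine only ever sees a string column.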
create is used when:
- CatalogV2Util utility is used to loadRelation
- DataFrameWriter is requested to insertInto, saveAsTable and saveInternal
- DataSourceV2Strategy execution planning strategy is requested to invalidateCache
- RenameTableExec physical command is executed
- ResolveTables logical resolution rule is executed
- ResolveRelations logical resolution rule is executed (and requested to lookupRelation)
- DataFrameReader is requested to load data
MultiInstanceRelation¶
DataSourceV2Relation is a MultiInstanceRelation.
Metadata Columns¶
LogicalPlan
metadataOutput: Seq[AttributeReference]
metadataOutput is part of the LogicalPlan abstraction.
metadataOutput requests the Table for the metadata columns (if it is a SupportsMetadataColumns).
metadataOutput filters out metadata columns with the same name as regular output columns.
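The filtering step can be sketched as follows. The names `Attr`, `metadataOutput` (as a standalone function) and `sameName` are hypothetical and exist only for this illustration. The optional `metadataColumns` argument models a table that may or may not be a SupportsMetadataColumns, and the case-insensitive name comparison is an assumption of the sketch.

```scala
object MetadataOutputSketch {
  // Hypothetical, simplified attribute (not Spark's AttributeReference)
  final case class Attr(name: String)

  // metadataColumns is None when the table exposes no metadata columns.
  // sameName decides when two column names clash (case-insensitive here,
  // which is an assumption of this sketch).
  def metadataOutput(
      output: Seq[Attr],
      metadataColumns: Option[Seq[Attr]],
      sameName: (String, String) => Boolean = (a, b) => a.equalsIgnoreCase(b)): Seq[Attr] =
    metadataColumns match {
      case Some(cols) =>
        // Drop every metadata column that is shadowed by a regular output column
        cols.filterNot(col => output.exists(out => sameName(col.name, out.name)))
      case None =>
        Nil
    }
}
```

With this model, a regular column named `_partition` hides a metadata column of the same name, so only the non-conflicting metadata columns are exposed.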
Creating DataSourceV2Relation with Metadata Columns¶
withMetadataColumns(): DataSourceV2Relation
withMetadataColumns creates a DataSourceV2Relation with the extra metadataOutput (for the output attributes) if defined.
withMetadataColumns is used when:
- AddMetadataColumns logical resolution rule is executed
Required Table Capabilities¶
TableCapabilityCheck is used to assert the following regarding DataSourceV2Relation and the Table:
- Table supports BATCH_READ
- Table supports BATCH_WRITE or V1_BATCH_WRITE for AppendData (append in batch mode)
- Table supports BATCH_WRITE with OVERWRITE_DYNAMIC for OverwritePartitionsDynamic (dynamic overwrite in batch mode)
- Table supports BATCH_WRITE or V1_BATCH_WRITE together with OVERWRITE_BY_FILTER (or, when truncating, with TRUNCATE or OVERWRITE_BY_FILTER) for OverwriteByExpression (truncate in batch mode and overwrite by filter in batch mode)
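The checks listed above can be modelled as a single predicate over a set of capabilities. This is a sketch of one reading of those rules, not the TableCapabilityCheck code itself: `Capability`, `Operation` and `isSupported` are hypothetical names, and treating TRUNCATE as an alternative to OVERWRITE_BY_FILTER only when truncating is an assumption of the sketch.

```scala
object CapabilityCheckSketch {
  // Hypothetical stand-ins named after the table capabilities
  sealed trait Capability
  case object BATCH_READ extends Capability
  case object BATCH_WRITE extends Capability
  case object V1_BATCH_WRITE extends Capability
  case object OVERWRITE_DYNAMIC extends Capability
  case object OVERWRITE_BY_FILTER extends Capability
  case object TRUNCATE extends Capability

  // Hypothetical stand-ins for the logical operators being checked
  sealed trait Operation
  case object Scan extends Operation
  case object AppendData extends Operation
  case object OverwritePartitionsDynamic extends Operation
  final case class OverwriteByExpression(truncate: Boolean) extends Operation

  def isSupported(op: Operation, caps: Set[Capability]): Boolean = {
    // Either batch write flavour is accepted for appends and filter overwrites
    val batchWrite = caps(BATCH_WRITE) || caps(V1_BATCH_WRITE)
    op match {
      case Scan                         => caps(BATCH_READ)
      case AppendData                   => batchWrite
      case OverwritePartitionsDynamic   => caps(BATCH_WRITE) && caps(OVERWRITE_DYNAMIC)
      case OverwriteByExpression(true)  => batchWrite && (caps(TRUNCATE) || caps(OVERWRITE_BY_FILTER))
      case OverwriteByExpression(false) => batchWrite && caps(OVERWRITE_BY_FILTER)
    }
  }
}
```

For example, a table that reports only BATCH_READ passes the scan check but fails every write check in this model.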
Name¶
name requests the Table for the name.
Simple Node Description¶
simpleString gives the following (with the output and the name):
RelationV2[output] [name]
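As a rough illustration of that format, the description can be assembled from the output attributes and the table name. `Attr` and this standalone `simpleString` are hypothetical, and the `name#id` rendering of an attribute as well as the table name used in the usage note are assumptions of the sketch.

```scala
object SimpleStringSketch {
  // Hypothetical attribute rendered as name#id (an assumption of this sketch)
  final case class Attr(name: String, exprId: Long) {
    override def toString: String = s"$name#$exprId"
  }

  // Builds "RelationV2[<output>] <name>" from the output attributes and the table name
  def simpleString(output: Seq[Attr], name: String): String = {
    val columns = output.mkString("[", ", ", "]")
    s"RelationV2$columns $name"
  }
}
```

Calling `SimpleStringSketch.simpleString(Seq(Attr("id", 0), Attr("name", 1)), "testcat.ns.tbl")` in this model yields `RelationV2[id#0, name#1] testcat.ns.tbl`.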
skipSchemaResolution¶
NamedRelation
skipSchemaResolution: Boolean
skipSchemaResolution is part of the NamedRelation abstraction.
skipSchemaResolution is enabled (true) when the Table supports the ACCEPT_ANY_SCHEMA capability.