DataSourceV2Relation Leaf Logical Operator

DataSourceV2Relation is a leaf logical operator that represents a scan over tables with support for BATCH_READ (at the very least).

DataSourceV2Relation is a NamedRelation.

DataSourceV2Relation is an ExposesMetadataColumns.

Creating Instance

DataSourceV2Relation takes the following to be created:

  • Table
  • Output AttributeReferences
  • CatalogPlugin
  • (optional) Identifier
  • Case-Insensitive Options

DataSourceV2Relation is created (indirectly) using the create utility and withMetadataColumns.

CatalogPlugin

DataSourceV2Relation can be given a CatalogPlugin when created.

The CatalogPlugin can be as follows:

Creating DataSourceV2Relation

create(
  table: Table,
  catalog: Option[CatalogPlugin],
  identifier: Option[Identifier]): DataSourceV2Relation
create(
  table: Table,
  catalog: Option[CatalogPlugin],
  identifier: Option[Identifier],
  options: CaseInsensitiveStringMap): DataSourceV2Relation

create replaces CharType and VarcharType in the schema of the given Table with "annotated" StringType (as the query engine doesn't support char/varchar).

In the end, create uses the new schema to create a DataSourceV2Relation.
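The type replacement described above can be illustrated with a minimal, self-contained sketch. It does not use Spark's classes: `DataType`, `Field`, `RawTypeKey` and `replaceCharVarcharWithString` are hypothetical stand-ins introduced here, and the sketch only handles top-level fields (nested types are ignored). The "annotation" is modelled as a metadata entry that remembers the original type.

```scala
object CharVarcharSketch {
  // Simplified stand-ins, NOT Spark's org.apache.spark.sql.types classes.
  sealed trait DataType
  case object StringType extends DataType
  case object IntegerType extends DataType
  final case class CharType(length: Int) extends DataType
  final case class VarcharType(length: Int) extends DataType

  final case class Field(
      name: String,
      dataType: DataType,
      metadata: Map[String, String] = Map.empty)

  // Hypothetical metadata key, used by this sketch only.
  val RawTypeKey = "rawType"

  // Replaces top-level char/varchar fields with StringType and records
  // the original type in the field metadata (the "annotation").
  def replaceCharVarcharWithString(schema: Seq[Field]): Seq[Field] =
    schema.map { field =>
      field.dataType match {
        case CharType(n) =>
          field.copy(
            dataType = StringType,
            metadata = field.metadata + (RawTypeKey -> s"char($n)"))
        case VarcharType(n) =>
          field.copy(
            dataType = StringType,
            metadata = field.metadata + (RawTypeKey -> s"varchar($n)"))
        case _ => field // all other types are left untouched
      }
    }
}
```

Keeping the original type in metadata is what lets a later step still recover the declared length even though the field is processed as a plain string.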


create is used when:

MultiInstanceRelation

DataSourceV2Relation is a MultiInstanceRelation.

Metadata Columns

LogicalPlan
metadataOutput: Seq[AttributeReference]

metadataOutput is part of the LogicalPlan abstraction.

metadataOutput requests the Table for the metadata columns (if it is a SupportsMetadataColumns).

metadataOutput filters out metadata columns with the same name as regular output columns.
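The two steps above can be sketched as follows. This is a simplified model, not Spark's code: `Attr` is a hypothetical name-only stand-in for AttributeReference, an `Option` models whether the Table is a SupportsMetadataColumns, and names are compared exactly (how Spark resolves name clashes is not covered here).

```scala
object MetadataOutputSketch {
  // Simplified stand-in for an attribute reference (name only).
  final case class Attr(name: String)

  // tableMetadataColumns is None when the table is not a
  // SupportsMetadataColumns (an assumption of this model).
  def metadataOutput(
      output: Seq[Attr],
      tableMetadataColumns: Option[Seq[Attr]]): Seq[Attr] = {
    val outputNames = output.map(_.name).toSet
    tableMetadataColumns
      .getOrElse(Seq.empty)
      // Drop metadata columns that clash with regular output columns.
      .filterNot(col => outputNames.contains(col.name))
  }
}
```

For example, with output columns `id` and `index` and metadata columns `index` and `_partition`, only `_partition` survives the filter.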

Creating DataSourceV2Relation with Metadata Columns

withMetadataColumns(): DataSourceV2Relation

withMetadataColumns creates a DataSourceV2Relation with the metadataOutput added to the output attributes (if there are any metadata columns).
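A minimal sketch of this behaviour, assuming that "if defined" means "if metadataOutput is non-empty" and that the relation is returned unchanged otherwise. `Relation` is a hypothetical stand-in with string column names, and its metadataOutput is a fixed field rather than something computed from a Table.

```scala
object WithMetadataColumnsSketch {
  // Hypothetical, simplified relation: columns are plain names.
  final case class Relation(output: Seq[String], metadataOutput: Seq[String]) {
    // Appends the metadata columns to the output, or returns this
    // relation as-is when there is nothing to add.
    def withMetadataColumns(): Relation =
      if (metadataOutput.nonEmpty) copy(output = output ++ metadataOutput)
      else this
  }
}
```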

withMetadataColumns is used when:

Required Table Capabilities

TableCapabilityCheck is used to assert the following regarding DataSourceV2Relation and the Table:

  1. Table supports BATCH_READ
  2. Table supports BATCH_WRITE or V1_BATCH_WRITE for AppendData (append in batch mode)
  3. Table supports BATCH_WRITE with OVERWRITE_DYNAMIC for OverwritePartitionsDynamic (dynamic overwrite in batch mode)
  4. Table supports BATCH_WRITE or V1_BATCH_WRITE together with OVERWRITE_BY_FILTER (or TRUNCATE when truncating) for OverwriteByExpression (truncate in batch mode and overwrite by filter in batch mode)
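The checks listed above can be modelled as a small function over a set of capabilities. This is a sketch under assumptions, not TableCapabilityCheck itself: the `Capability` and `Operation` types and the `check` function are hypothetical, the "batch read" label is made up for the sketch, and item 4 is read as "a batch-write capability plus TRUNCATE or OVERWRITE_BY_FILTER when truncating, OVERWRITE_BY_FILTER otherwise".

```scala
object CapabilityCheckSketch {
  // Stand-ins for the table capabilities named in the list above.
  sealed trait Capability
  case object BATCH_READ extends Capability
  case object BATCH_WRITE extends Capability
  case object V1_BATCH_WRITE extends Capability
  case object OVERWRITE_DYNAMIC extends Capability
  case object OVERWRITE_BY_FILTER extends Capability
  case object TRUNCATE extends Capability

  // Hypothetical model of the operations that are checked.
  sealed trait Operation
  case object Read extends Operation
  case object AppendData extends Operation
  case object OverwritePartitionsDynamic extends Operation
  final case class OverwriteByExpression(truncate: Boolean) extends Operation

  private def supportsBatchWrite(caps: Set[Capability]): Boolean =
    caps.contains(BATCH_WRITE) || caps.contains(V1_BATCH_WRITE)

  // Returns None when the check passes, or Some(unsupported operation).
  def check(caps: Set[Capability], op: Operation): Option[String] = {
    val (ok, description) = op match {
      case Read =>
        (caps.contains(BATCH_READ), "batch read")
      case AppendData =>
        (supportsBatchWrite(caps), "append in batch mode")
      case OverwritePartitionsDynamic =>
        (caps.contains(BATCH_WRITE) && caps.contains(OVERWRITE_DYNAMIC),
          "dynamic overwrite in batch mode")
      case OverwriteByExpression(truncate) =>
        if (truncate) {
          (supportsBatchWrite(caps) &&
            (caps.contains(TRUNCATE) || caps.contains(OVERWRITE_BY_FILTER)),
            "truncate in batch mode")
        } else {
          (supportsBatchWrite(caps) && caps.contains(OVERWRITE_BY_FILTER),
            "overwrite by filter in batch mode")
        }
    }
    if (ok) None else Some(description)
  }
}
```

Returning the description of the failed operation (rather than a Boolean) mirrors the parenthesised phrases in the list, which read like the operation names used in error messages.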

Name

NamedRelation
name: String

name is part of the NamedRelation abstraction.

name requests the Table for the name.

Simple Node Description

TreeNode
simpleString(
  maxFields: Int): String

simpleString is part of the TreeNode abstraction.

simpleString gives the following (with the output and the name):

RelationV2[output] [name]
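As an illustration of the format above, here is a hypothetical stand-alone helper. It assumes that `[name]` in the template is a placeholder (so the name is printed without brackets) and uses a simple ", ..." marker when there are more than maxFields attributes; Spark's own truncation text may differ.

```scala
object SimpleStringSketch {
  // Shows at most maxFields output attributes. The truncation marker
  // is an illustration only, not Spark's exact format.
  def simpleString(output: Seq[String], name: String, maxFields: Int): String = {
    val shown = output.take(maxFields)
    val suffix = if (output.length > maxFields) ", ..." else ""
    val fields = shown.mkString("[", ", ", suffix + "]")
    s"RelationV2$fields $name"
  }
}
```

With the output attributes `id#0L` and `name#1` and the (made-up) table name `testcat.ns.t`, this sketch yields `RelationV2[id#0L, name#1] testcat.ns.t`.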

skipSchemaResolution

NamedRelation
skipSchemaResolution: Boolean

skipSchemaResolution is part of the NamedRelation abstraction.

skipSchemaResolution is enabled (true) when the Table supports ACCEPT_ANY_SCHEMA capability.