BaseRelation — Collection of Tuples with Schema

BaseRelation is an abstraction of relations that are collections of tuples (rows) with a known schema.

BaseRelation represents an external data source that datasets can be loaded from or written to.

BaseRelation is "created" when DataSource is requested to resolve a relation.

BaseRelation is then transformed into a DataFrame when SparkSession is requested to create a DataFrame.
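The following is a minimal sketch of that last step, assuming Spark 3.x and its SparkSession.baseRelationToDataFrame method. DemoRelation and DemoRelationApp are hypothetical names used for illustration only, and the relation also mixes in the TableScan trait (an assumption, so that the rows can actually be scanned).

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SQLContext, SparkSession}
import org.apache.spark.sql.sources.{BaseRelation, TableScan}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Hypothetical relation over two hard-coded rows (illustration only).
class DemoRelation(override val sqlContext: SQLContext)
    extends BaseRelation with TableScan {

  // Schema of the tuples of the relation
  override val schema: StructType = StructType(Seq(
    StructField("id", IntegerType, nullable = false),
    StructField("name", StringType, nullable = true)))

  // TableScan makes the relation readable: it produces all the rows.
  override def buildScan(): RDD[Row] =
    sqlContext.sparkContext.parallelize(Seq(Row(1, "one"), Row(2, "two")))
}

object DemoRelationApp {
  // Asks SparkSession to create a DataFrame for the relation
  def demoDataFrame(spark: SparkSession): DataFrame =
    spark.baseRelationToDataFrame(new DemoRelation(spark.sqlContext))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("base-relation-demo")
      .getOrCreate()
    try demoDataFrame(spark).show()
    finally spark.stop()
  }
}
```

The schema of the resulting DataFrame is the schema reported by the relation.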

Note

"Relation" and "table" used to be synonyms, but the Connector API in Spark 3 changed that by introducing a separate Table abstraction.

Contract

SQLContext

sqlContext: SQLContext

SQLContext

Schema

schema: StructType

Schema of the tuples of the relation

Size

sizeInBytes: Long

Estimated size of the relation (in bytes)

Default: spark.sql.defaultSizeInBytes configuration property

sizeInBytes is used when LogicalRelation is requested for statistics (and they are not available in a catalog).
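A custom relation that knows how much data it holds can report a better estimate than the default. The sketch below assumes a hypothetical FileBackedRelation over a single local file and uses the file length as the estimate. The class name and the one-column schema are illustrative and not part of Spark.

```scala
import java.io.File

import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.sources.BaseRelation
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical relation backed by a single local file (illustration only).
class FileBackedRelation(
    override val sqlContext: SQLContext,
    file: File) extends BaseRelation {

  override val schema: StructType =
    StructType(Seq(StructField("line", StringType, nullable = true)))

  // Report the file length (in bytes) instead of falling back to
  // the spark.sql.defaultSizeInBytes configuration property.
  override def sizeInBytes: Long = file.length()
}
```

An accurate estimate matters because the statistics feed size-based planning decisions.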

Needs Conversion

needConversion: Boolean

Controls type conversion (whether or not JVM objects inside Rows need to be converted to Catalyst types, e.g. java.lang.String to UTF8String)

Default: true

Note

It is recommended to leave needConversion enabled (as is) for custom data sources (outside Spark SQL).

Used when DataSourceStrategy execution planning strategy is executed (and does the RDD conversion from RDD[Row] to RDD[InternalRow]).
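To illustrate what the conversion means for a single value, the sketch below uses CatalystTypeConverters.convertToCatalyst, an internal Catalyst utility, assuming Spark 3.x. ConversionDemo is a hypothetical name, and the snippet shows only the String case rather than the full RDD[Row] to RDD[InternalRow] conversion.

```scala
import org.apache.spark.sql.catalyst.CatalystTypeConverters
import org.apache.spark.unsafe.types.UTF8String

object ConversionDemo {
  // External (JVM) value, as a data source would put it in a Row
  val external: Any = "spark"

  // The same value in Catalyst's internal representation
  val internal: Any = CatalystTypeConverters.convertToCatalyst(external)

  def main(args: Array[String]): Unit = {
    println(external.getClass.getName)
    println(internal.getClass.getName)
  }
}
```

A relation that disables needConversion would have to produce such internal values itself, which is why leaving the default is the safer choice for custom data sources.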

Unhandled Filters

unhandledFilters(
  filters: Array[Filter]): Array[Filter]

Filter predicates that the relation does not support (handle) natively

Default: the input filters (as it is considered safe to double evaluate filters regardless of whether they could be supported or not)

Used when DataSourceStrategy execution planning strategy is executed (and is requested to selectFilters).
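As a sketch of overriding the default, consider a hypothetical EqualityOnlyRelation whose backend is assumed to evaluate equality predicates on its own. It returns every filter except EqualTo, so Spark evaluates only the remaining ones after the scan. The class name and the schema are illustrative.

```scala
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.sources.{BaseRelation, EqualTo, Filter, GreaterThan}
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

// Hypothetical relation whose backend handles equality predicates only.
class EqualityOnlyRelation(override val sqlContext: SQLContext)
    extends BaseRelation {

  override val schema: StructType =
    StructType(Seq(StructField("id", IntegerType, nullable = false)))

  // Filters that Spark still has to evaluate after the scan:
  // everything except EqualTo, which is assumed to be handled natively.
  override def unhandledFilters(filters: Array[Filter]): Array[Filter] =
    filters.filterNot(_.isInstanceOf[EqualTo])
}

object UnhandledFiltersDemo {
  def main(args: Array[String]): Unit = {
    // unhandledFilters does not touch sqlContext, so null is enough here
    val relation = new EqualityOnlyRelation(null)
    val filters = Array[Filter](EqualTo("id", 1), GreaterThan("id", 0))
    relation.unhandledFilters(filters).foreach(println)
  }
}
```

Reporting a filter as handled is only safe when the relation really applies it during the scan. Otherwise rows that should have been filtered out end up in the result.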

Implementations