BaseRelation — Collection of Tuples with Schema
BaseRelation is an abstraction of relations that are collections of tuples (rows) with a known schema.
BaseRelation represents an external data source with data to load datasets from or write to.
BaseRelation is "created" when DataSource is requested to resolve a relation.
BaseRelation is then transformed into a DataFrame when SparkSession is requested to create a DataFrame.
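The transformation into a DataFrame can be sketched as follows. This is a minimal illustration for spark-shell or a Scala script, not how DataSource itself resolves relations: `PeopleRelation`, its columns, and its rows are hypothetical, and the sketch assumes the `TableScan` trait and `SparkSession.baseRelationToDataFrame` from the public data source API.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext, SparkSession}
import org.apache.spark.sql.sources.{BaseRelation, TableScan}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Hypothetical relation over a fixed in-memory collection (illustration only)
class PeopleRelation(val sqlContext: SQLContext) extends BaseRelation with TableScan {

  // The known schema of the tuples (rows) of this relation
  override def schema: StructType = StructType(Seq(
    StructField("id", IntegerType, nullable = false),
    StructField("name", StringType, nullable = false)))

  // TableScan: produce all the rows of the relation
  override def buildScan(): RDD[Row] =
    sqlContext.sparkContext.parallelize(Seq(Row(1, "Ann"), Row(2, "Bob")))
}

val spark = SparkSession.builder()
  .master("local[1]")
  .appName("base-relation-demo")
  .getOrCreate()

// Ask SparkSession to create a DataFrame for the relation
val people = spark.baseRelationToDataFrame(new PeopleRelation(spark.sqlContext))
people.show()
```

The resulting DataFrame takes its columns from the relation's schema.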
"Relation" and "table" used to be synonyms, but Connector API in Spark 3 changed it with Table abstraction.
schema: StructType
Schema of the tuples of the relation
sizeInBytes: Long
Estimated size of the relation (in bytes)
Default: spark.sql.defaultSizeInBytes configuration property
sizeInBytes is used when LogicalRelation is requested for statistics (and they are not available in a catalog).
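A relation that knows its own size can override the estimate. The following is a hedged sketch for spark-shell or a Scala script: `SizedRelation` and the 1024-byte figure are made up for illustration, and reading the value back through `queryExecution.optimizedPlan.stats` assumes Spark 2.3 or later.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext, SparkSession}
import org.apache.spark.sql.sources.{BaseRelation, TableScan}
import org.apache.spark.sql.types.{LongType, StructField, StructType}

// Hypothetical relation that reports its own size estimate (illustration only)
class SizedRelation(val sqlContext: SQLContext, estimatedBytes: Long)
    extends BaseRelation with TableScan {

  override def schema: StructType =
    StructType(Seq(StructField("value", LongType, nullable = false)))

  // Replaces the default that comes from spark.sql.defaultSizeInBytes
  override def sizeInBytes: Long = estimatedBytes

  override def buildScan(): RDD[Row] =
    sqlContext.sparkContext.parallelize(Seq(Row(1L), Row(2L), Row(3L)))
}

val spark = SparkSession.builder()
  .master("local[1]")
  .appName("size-in-bytes-demo")
  .getOrCreate()

val df = spark.baseRelationToDataFrame(
  new SizedRelation(spark.sqlContext, estimatedBytes = 1024L))

// The relation is not backed by a catalog table, so the plan statistics
// fall back to the relation's own estimate
val planSize: BigInt = df.queryExecution.optimizedPlan.stats.sizeInBytes
println(s"Estimated size: $planSize bytes")
```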
needConversion: Boolean
Controls type conversion (whether or not JVM objects inside Rows need to be converted to Catalyst types)
It is recommended to leave needConversion enabled (as is) for custom data sources (outside Spark SQL).
Used when DataSourceStrategy execution planning strategy is executed (and converts the RDD of Rows to Catalyst types).
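As a sketch of the recommended setup, a custom relation can emit ordinary JVM objects and leave the conversion to Spark. `GreetingRelation` is hypothetical, and the explicit override only spells out what is assumed here to be the default (`true`). As far as I know, a `java.lang.String` value in a `StringType` column is one such object that gets converted to Catalyst's internal string type.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext, SparkSession}
import org.apache.spark.sql.sources.{BaseRelation, TableScan}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical relation whose rows hold plain JVM objects (illustration only)
class GreetingRelation(val sqlContext: SQLContext) extends BaseRelation with TableScan {

  override def schema: StructType =
    StructType(Seq(StructField("greeting", StringType, nullable = false)))

  // Spelled out for clarity; assumed to match the default
  override def needConversion: Boolean = true

  // Rows carry java.lang.String values, not Catalyst-internal ones
  override def buildScan(): RDD[Row] =
    sqlContext.sparkContext.parallelize(Seq(Row("hello"), Row("hi")))
}

val spark = SparkSession.builder()
  .master("local[1]")
  .appName("need-conversion-demo")
  .getOrCreate()

val greetings: Seq[String] = spark
  .baseRelationToDataFrame(new GreetingRelation(spark.sqlContext))
  .collect()
  .map(_.getString(0))
  .toSeq
```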
unhandledFilters(filters: Array[Filter]): Array[Filter]
Filter predicates that the relation does not support (handle) natively
Default: the input filters (as it is considered safe to evaluate filters twice, regardless of whether they could be supported or not)
Used when DataSourceStrategy execution planning strategy is executed (and selectFilters).
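A relation that evaluates some filters itself can report only the remaining ones as unhandled. The sketch below, for spark-shell or a Scala script, assumes the `PrunedFilteredScan` trait and the `EqualTo` and `StringStartsWith` filters from `org.apache.spark.sql.sources`. `KeyedRelation`, its data, and the choice to handle only equality on `id` are hypothetical.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext, SparkSession}
import org.apache.spark.sql.sources.{BaseRelation, EqualTo, Filter, PrunedFilteredScan, StringStartsWith}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Hypothetical relation that natively handles only `id = <int>` (illustration only)
class KeyedRelation(val sqlContext: SQLContext) extends BaseRelation with PrunedFilteredScan {

  private val data = Seq((1, "a"), (2, "b"), (3, "c"))

  override def schema: StructType = StructType(Seq(
    StructField("id", IntegerType, nullable = false),
    StructField("name", StringType, nullable = false)))

  private def handled(filter: Filter): Boolean = filter match {
    case EqualTo("id", _: Int) => true
    case _                     => false
  }

  // Report back every filter this relation does not evaluate itself
  override def unhandledFilters(filters: Array[Filter]): Array[Filter] =
    filters.filterNot(handled)

  override def buildScan(requiredColumns: Array[String], filters: Array[Filter]): RDD[Row] = {
    // Apply the natively handled filters on the driver before creating the RDD
    val ids = filters.collect { case EqualTo("id", v: Int) => v }
    val kept = data.filter { case (id, _) => ids.forall(_ == id) }
    // Emit only the requested columns, in the requested order
    val rows = kept.map { case (id, name) =>
      Row.fromSeq(requiredColumns.toSeq.map { case "id" => id; case "name" => name })
    }
    sqlContext.sparkContext.parallelize(rows)
  }
}

val spark = SparkSession.builder()
  .master("local[1]")
  .appName("unhandled-filters-demo")
  .getOrCreate()

val relation = new KeyedRelation(spark.sqlContext)

// Calling the contract method directly: only the non-equality filter comes back
val unhandled = relation.unhandledFilters(
  Array(EqualTo("id", 2), StringStartsWith("name", "b")))

// End to end: Spark evaluates whatever the relation reports as unhandled
val matched = spark.baseRelationToDataFrame(relation)
  .filter("id = 2 AND name LIKE 'b%'")
  .collect()
  .map(r => (r.getInt(0), r.getString(1)))
  .toSeq
```

Returning the input filters unchanged (the default) stays correct for any relation, since Spark then re-applies all of them after the scan. Narrowing the result as above only avoids the duplicate evaluation.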
- ConsoleRelation (Spark Structured Streaming)