Data Source Filter Predicate

Filter is the contract for filter predicates that can be pushed down to a relation (aka data source).

Filter is used when:

(Data Source API V1) BaseRelation is requested for unhandled filter predicates (and hence BaseRelation implementations, e.g. JDBCRelation)
(Data Source API V1) PrunedFilteredScan is requested to build a scan (and hence PrunedFilteredScan implementations, e.g. JDBCRelation)
FileFormat is requested to buildReader (and hence FileFormat implementations, e.g. OrcFileFormat, CSVFileFormat, JsonFileFormat, TextFileFormat and Spark MLlib's LibSVMFileFormat)
FileFormat is requested to build a Data Reader with partition column values appended (and hence FileFormat implementations, e.g. OrcFileFormat, ParquetFileFormat)
RowDataSourceScanExec is created (for a simple text representation in a query plan tree)
DataSourceStrategy execution planning strategy is requested to pruneFilterProject (when executed for LogicalRelation logical operators with a PrunedFilteredScan or a PrunedScan)
DataSourceStrategy execution planning strategy is requested to selectFilters
JDBCRDD is created and requested to scanTable
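
To make the first usage above more concrete, here is a minimal sketch of how a Data Source V1 relation might decide which pushed-down filters it cannot evaluate itself, which is the role of BaseRelation's unhandledFilters. It assumes spark-sql is on the classpath. `FilterPushdownSketch`, `canHandle` and `referencedColumns` are hypothetical names introduced for illustration, and the set of "supported" predicates is an arbitrary assumption, not what any built-in data source does.

```scala
import org.apache.spark.sql.sources.{And, EqualTo, Filter, GreaterThan, IsNotNull}

object FilterPushdownSketch {

  // Hypothetical capability check: assume the source can evaluate
  // only these predicates (and conjunctions of them).
  def canHandle(filter: Filter): Boolean = filter match {
    case _: EqualTo | _: GreaterThan | _: IsNotNull => true
    case And(left, right) => canHandle(left) && canHandle(right)
    case _ => false
  }

  // Same shape as BaseRelation.unhandledFilters: return the filters
  // that Spark has to re-evaluate after the scan.
  def unhandledFilters(filters: Array[Filter]): Array[Filter] =
    filters.filterNot(canHandle)

  // Column names referenced by the given filters, via Filter.references.
  def referencedColumns(filters: Array[Filter]): Array[String] =
    filters.flatMap(_.references.toSeq).distinct
}
```

A relation could return the result of such a function from its own unhandledFilters so that Spark keeps only the predicates the source did not apply.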

```scala
package org.apache.spark.sql.sources

abstract class Filter {
  // only required methods that have no implementation
  // the others follow
  def references: Array[String]
}
```

Filter Contract

| Method | Description |
|--------|-------------|
| references | Column references, i.e. the list of column names that are referenced by a filter |

Used when:

=== Finding Column References in Any Value -- findReferences Method


findReferences takes the references from the value if it is a Filter; otherwise it returns an empty array.
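
The behaviour described above can be sketched with a small stand-alone model. The classes below are simplified local stand-ins that only borrow the names of Spark's Filter, EqualTo and IsNull; they are not Spark's own classes, and the enclosing `FindReferencesSketch` object is a hypothetical name used for illustration.

```scala
object FindReferencesSketch {

  // Simplified stand-in for the Filter contract (not Spark's class).
  abstract class Filter {
    def references: Array[String]

    // Returns the references of `value` when it is itself a Filter,
    // and an empty array for any other kind of value.
    protected def findReferences(value: Any): Array[String] = value match {
      case f: Filter => f.references
      case _ => Array.empty
    }
  }

  // A filter whose value may be a plain literal or another filter.
  case class EqualTo(attribute: String, value: Any) extends Filter {
    override def references: Array[String] =
      Array(attribute) ++ findReferences(value)
  }

  // A filter that references a single column only.
  case class IsNull(attribute: String) extends Filter {
    override def references: Array[String] = Array(attribute)
  }
}
```

In this model a literal value contributes no column names, while a nested filter contributes its own references to the enclosing filter.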