Data Source Filter Predicate
`Filter` is the <<contract, contract>> of data source filter predicates.
`Filter` is used when:

- (Data Source API V1) `BaseRelation` is requested for unhandled filter predicates (and hence `BaseRelation` implementations, e.g. JDBCRelation)
- (Data Source API V1) `PrunedFilteredScan` is requested to build a scan (and hence `PrunedFilteredScan` implementations, e.g. JDBCRelation)
- `FileFormat` is requested to buildReader (and hence `FileFormat` implementations, e.g. `OrcFileFormat`, `CSVFileFormat`, `JsonFileFormat`, `TextFileFormat` and Spark MLlib's `LibSVMFileFormat`)
- `FileFormat` is requested to build a Data Reader with partition column values appended (and hence `FileFormat` implementations, e.g. `OrcFileFormat`, ParquetFileFormat)
- `RowDataSourceScanExec` is RowDataSourceScanExec.md#creating-instance[created] (for a DataSourceScanExec.md#simpleString[simple text representation (in a query plan tree)])
- `DataSourceStrategy` execution planning strategy is requested to pruneFilterProject (when executed for LogicalRelation.md[LogicalRelation] logical operators with a PrunedFilteredScan or a PrunedScan)
- `DataSourceStrategy` execution planning strategy is requested to selectFilters
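The first use case above (a relation reporting the filters it cannot evaluate itself) can be sketched as follows. This is a minimal, self-contained illustration: `Filter`, `EqualTo` and `StringContains` are simplified stand-ins defined locally rather than Spark's own classes, and the rule "only equality is handled" is an assumption made for the example.

```scala
// Simplified stand-ins for illustration only (not Spark's classes).
object UnhandledFiltersSketch {
  abstract class Filter {
    def references: Array[String]
  }

  final case class EqualTo(attribute: String, value: Any) extends Filter {
    def references: Array[String] = Array(attribute)
  }

  final case class StringContains(attribute: String, value: String) extends Filter {
    def references: Array[String] = Array(attribute)
  }

  // Hypothetical relation that can evaluate equality predicates itself
  // and reports every other filter back as unhandled.
  def unhandledFilters(filters: Array[Filter]): Array[Filter] =
    filters.filter {
      case _: EqualTo => false // handled by the relation
      case _          => true  // has to be re-evaluated by the caller
    }

  val pushed: Array[Filter] =
    Array(EqualTo("id", 1), StringContains("name", "spark"))

  val unhandled: Seq[Filter] = unhandledFilters(pushed).toSeq
}
```

In this sketch `UnhandledFiltersSketch.unhandled` holds only the `StringContains` filter, i.e. the predicate that would still have to be applied after the scan.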
[[contract]]
[source, scala]
----
package org.apache.spark.sql.sources

abstract class Filter {
  // only required methods that have no implementation
  // the others follow
  def references: Array[String]
}
----
.Filter Contract
[cols="1,2",options="header",width="100%"]
|===
| Method
| Description

| references
a| [[references]] Column references, i.e. the list of column names that are referenced by a filter

Used when:

- `Filter` is requested to <<findReferences, find column references in any value>>
- <>, <> and <> filters are requested for the <<references, column references>>

|===
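To make the `references` contract concrete, here is a minimal sketch of one leaf filter and one composite filter. The classes are simplified stand-ins defined locally (assumed names `EqualTo` and `And`), not Spark's own implementations; the composite simply concatenates the references of its children.

```scala
// Simplified stand-ins for illustration only (not Spark's classes).
object FilterReferencesSketch {
  abstract class Filter {
    // the single abstract method of the contract
    def references: Array[String]
  }

  // Hypothetical leaf filter that references exactly one column
  final case class EqualTo(attribute: String, value: Any) extends Filter {
    def references: Array[String] = Array(attribute)
  }

  // Hypothetical composite filter: concatenates the references of both children
  final case class And(left: Filter, right: Filter) extends Filter {
    def references: Array[String] = left.references ++ right.references
  }

  val predicate: Filter = And(EqualTo("id", 1), EqualTo("name", "spark"))

  // Column names referenced anywhere in the predicate
  val columns: Seq[String] = predicate.references.toSeq
}
```

Here `FilterReferencesSketch.columns` contains `id` followed by `name`.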
=== [[findReferences]] Finding Column References in Any Value -- `findReferences` Method
[source, scala]
----
findReferences(value: Any): Array[String]
----
`findReferences` takes the <<references, column references>> of the `value` if it is a `Filter`, or returns an empty array otherwise.
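The behaviour described above can be sketched with a pattern match on the value. As before, this is a self-contained illustration with locally defined stand-in classes (`IsNotNull` and `EqualTo` are assumed names for the example), not a copy of Spark's code.

```scala
// Simplified stand-ins for illustration only (not Spark's classes).
object FindReferencesSketch {
  abstract class Filter {
    def references: Array[String]

    // References of the value if it is itself a Filter, an empty array otherwise
    protected def findReferences(value: Any): Array[String] = value match {
      case f: Filter => f.references
      case _         => Array.empty[String]
    }
  }

  final case class IsNotNull(attribute: String) extends Filter {
    def references: Array[String] = Array(attribute)
  }

  // Hypothetical filter whose value may be a literal or a nested filter
  final case class EqualTo(attribute: String, value: Any) extends Filter {
    def references: Array[String] = Array(attribute) ++ findReferences(value)
  }

  // A literal value contributes no extra references
  val plain: Seq[String] = EqualTo("id", 42).references.toSeq

  // A nested filter contributes its own references
  val nested: Seq[String] = EqualTo("id", IsNotNull("name")).references.toSeq
}
```

In this sketch `plain` holds just `id`, while `nested` holds `id` and `name`.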