DataFiltersBuilder

DataFiltersBuilder builds data filters for Data Skipping.

DataFiltersBuilder is used when DataSkippingReaderBase is requested for the filesForScan with data filters and spark.databricks.delta.stats.skipping enabled.

Creating Instance

DataFiltersBuilder takes the following to be created:

DataFiltersBuilder is created when:

StatsProvider

DataFiltersBuilder creates a StatsProvider (for the getStatsColumnOpt) when created.

Creating DataSkippingPredicate

apply(
  dataFilter: Expression): Option[DataSkippingPredicate]

apply calls constructDataFilters with the given dataFilter expression.


apply is used when:

constructDataFilters

constructDataFilters(
  dataFilter: Expression): Option[DataSkippingPredicate]

constructDataFilters creates a DataSkippingPredicate for expression types that can be used for data skipping.
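The Option return type can be illustrated with a minimal sketch. It assumes a toy expression model (Column, IsNull, IsNotNull, Opaque) and a hypothetical SkippingPredicate class; none of these are the Spark Catalyst or Delta Lake classes, and the predicate is kept as plain text for readability. Supported filters give Some, anything else gives None.

```scala
// Toy expression model for illustration only; NOT Spark's Catalyst classes.
sealed trait Expr
final case class Column(name: String) extends Expr
final case class IsNull(child: Expr) extends Expr
final case class IsNotNull(child: Expr) extends Expr
final case class Opaque(description: String) extends Expr

// Hypothetical stand-in for DataSkippingPredicate: just the predicate text.
final case class SkippingPredicate(text: String)

object SkippingSketch {
  // Some(...) when the filter can be rewritten over file statistics,
  // None when the expression type is not usable for data skipping.
  def construct(filter: Expr): Option[SkippingPredicate] = filter match {
    case IsNull(Column(c))    => Some(SkippingPredicate(s"nullCount($c) > 0"))
    case IsNotNull(Column(c)) => Some(SkippingPredicate(s"nullCount($c) < numRecords"))
    case _                    => None
  }
}
```

In this sketch a None result means the filter contributes nothing to file pruning, so no file is skipped because of it.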


constructDataFilters...FIXME

For IsNull with a skipping-eligible column, constructDataFilters requests the StatsProvider for the getPredicateWithStatType for nullCount to build a Catalyst expression to match files with null count larger than zero.

nullCount > Literal(0)

For IsNotNull with a skipping-eligible column, constructDataFilters creates StatsColumns for nullCount (of the column) and numRecords.

constructDataFilters requests the StatsProvider for the getPredicateWithStatsColumns for the two StatsColumns to build a Catalyst expression to match files with null count less than the row count.

nullCount < numRecords
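The effect of the two null-count predicates on file pruning can be sketched in plain Scala. The FileStats class, the file names, and the helper functions are hypothetical; only the comparisons nullCount > 0 and nullCount < numRecords come from the description above. The real predicates are Catalyst expressions evaluated over per-file statistics, not Scala functions.

```scala
// Hypothetical per-file statistics for a single column (illustrative only).
final case class FileStats(path: String, numRecords: Long, nullCount: Long)

object NullSkipping {
  // IsNull(col): a file can contain a match only if it has at least one null.
  def mayMatchIsNull(s: FileStats): Boolean = s.nullCount > 0L

  // IsNotNull(col): a file can contain a match only if not every row is null.
  def mayMatchIsNotNull(s: FileStats): Boolean = s.nullCount < s.numRecords

  val files: Seq[FileStats] = Seq(
    FileStats("part-0", numRecords = 100L, nullCount = 0L),   // no nulls
    FileStats("part-1", numRecords = 100L, nullCount = 100L), // all nulls
    FileStats("part-2", numRecords = 100L, nullCount = 7L))   // mixed

  def main(args: Array[String]): Unit = {
    // IsNull keeps part-1 and part-2; part-0 is skipped.
    println(files.filter(mayMatchIsNull).map(_.path))
    // IsNotNull keeps part-0 and part-2; part-1 is skipped.
    println(files.filter(mayMatchIsNotNull).map(_.path))
  }
}
```

Both checks are conservative: a file is kept whenever the statistics cannot rule out a matching row.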

constructDataFilters...FIXME

constructLiteralInListDataFilters

constructLiteralInListDataFilters(
  a: Expression,
  possiblyNullValues: Seq[Any]): Option[DataSkippingPredicate]

constructLiteralInListDataFilters...FIXME