DataFiltersBuilder¶
DataFiltersBuilder builds data filters for Data Skipping.
DataFiltersBuilder is used when DataSkippingReaderBase is requested for the filesForScan with data filters and spark.databricks.delta.stats.skipping enabled.
Creating Instance¶
DataFiltersBuilder takes the following to be created:
- SparkSession (Spark SQL)
- DeltaDataSkippingType
DataFiltersBuilder is created when:
- DataSkippingReaderBase is requested to filesForScan (with data filters and spark.databricks.delta.stats.skipping enabled)
StatsProvider¶
DataFiltersBuilder creates a StatsProvider (for the getStatsColumnOpt) when created.
Creating DataSkippingPredicate¶
apply(
  dataFilter: Expression): Option[DataSkippingPredicate]
apply calls constructDataFilters for the given dataFilter expression.
apply is used when:
- DataSkippingReaderBase is requested to filesForScan (with data filters and spark.databricks.delta.stats.skipping enabled)
constructDataFilters¶
constructDataFilters(
  dataFilter: Expression): Option[DataSkippingPredicate]
constructDataFilters creates a DataSkippingPredicate for expression types that can be used for data skipping.
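The Option return type suggests that an expression with no usable skipping rule simply yields no predicate. The following is a minimal, Spark-free sketch of that kind of dispatch. It assumes a hypothetical toy expression model (Expr, Column, IsNull, IsNotNull, Other) and renders predicates as plain strings; none of these are the real Catalyst or Delta Lake classes.

```scala
// Hypothetical toy model for illustration only.
// These are NOT the Catalyst Expression or Delta Lake DataSkippingPredicate classes.
sealed trait Expr
final case class Column(name: String) extends Expr
final case class IsNull(child: Expr) extends Expr
final case class IsNotNull(child: Expr) extends Expr
final case class Other(description: String) extends Expr

object SkippingSketch {
  // Some(predicate over file statistics) for supported expression shapes,
  // None for everything else (no data skipping for that filter).
  def toSkippingPredicate(filter: Expr): Option[String] = filter match {
    case IsNull(Column(c))    => Some(s"nullCount($c) > 0")
    case IsNotNull(Column(c)) => Some(s"nullCount($c) < numRecords")
    case _                    => None
  }
}
```

In this sketch, a caller that receives None would fall back to scanning every file for that filter, which keeps the result correct at the cost of skipping nothing.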
constructDataFilters...FIXME
For IsNull with a skipping-eligible column, constructDataFilters requests the StatsProvider for the getPredicateWithStatType for nullCount to build a Catalyst expression to match files with null count larger than zero.
nullCount > Literal(0)
For IsNotNull with a skipping-eligible column, constructDataFilters creates StatsColumns for the following:
- nullCount (of the column)
- numRecords
constructDataFilters requests the StatsProvider for the getPredicateWithStatsColumns for the two StatsColumns to build a Catalyst expression to match files with null count less than the row count.
nullCount < numRecords
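To see what the two null-related predicates mean for individual files, here is a small sketch that evaluates them against per-file statistics. FileStats, NullSkipping and NullSkippingDemo are hypothetical names introduced for illustration, not Delta Lake classes. The sketch also assumes that a file with no statistic for the column is kept rather than skipped.

```scala
// Hypothetical per-file statistics; field names mirror the stats named above.
final case class FileStats(
    path: String,
    numRecords: Long,
    nullCount: Map[String, Long])

object NullSkipping {
  // IsNull(col): keep files where nullCount > 0.
  // Assumption of this sketch: a missing statistic means the file is kept.
  def keepForIsNull(col: String)(f: FileStats): Boolean =
    f.nullCount.get(col).forall(_ > 0)

  // IsNotNull(col): keep files where nullCount < numRecords,
  // i.e. at least one row in the file has a non-null value.
  def keepForIsNotNull(col: String)(f: FileStats): Boolean =
    f.nullCount.get(col).forall(_ < f.numRecords)
}

object NullSkippingDemo {
  val files = Seq(
    FileStats("a.parquet", numRecords = 100, nullCount = Map("id" -> 0L)),   // no nulls
    FileStats("b.parquet", numRecords = 100, nullCount = Map("id" -> 100L)), // all nulls
    FileStats("c.parquet", numRecords = 100, nullCount = Map("id" -> 7L)),   // mixed
    FileStats("d.parquet", numRecords = 100, nullCount = Map.empty))         // no stats

  // Files that survive each filter
  val forIsNull: Seq[String] =
    files.filter(NullSkipping.keepForIsNull("id")).map(_.path)
  val forIsNotNull: Seq[String] =
    files.filter(NullSkipping.keepForIsNotNull("id")).map(_.path)
}
```

With these sample files, `id IS NULL` skips only a.parquet (no nulls at all) and `id IS NOT NULL` skips only b.parquet (every row is null). The mixed file and the file without statistics are read in both cases.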
constructDataFilters...FIXME
constructLiteralInListDataFilters¶
constructLiteralInListDataFilters(
  a: Expression,
  possiblyNullValues: Seq[Any]): Option[DataSkippingPredicate]
constructLiteralInListDataFilters...FIXME