# HDFSBackedStateStoreProvider

`HDFSBackedStateStoreProvider` is a `StateStoreProvider` that uses a Hadoop DFS-compatible file system for versioned state checkpointing.

`HDFSBackedStateStoreProvider` is the default `StateStoreProvider` per the `spark.sql.streaming.stateStore.providerClass` internal configuration property.
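Since it is the default, the property rarely needs to be set explicitly, but it can be overridden, e.g. in `conf/spark-defaults.conf` (shown here pointing back at the default provider):

```text
spark.sql.streaming.stateStore.providerClass=org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider
```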
## Creating Instance

`HDFSBackedStateStoreProvider` takes no arguments to be created.

`HDFSBackedStateStoreProvider` is created when `StateStoreProvider` is requested to `createAndInit`.
## getMetricsForProvider

```scala
getMetricsForProvider(): Map[String, Long]
```

`getMetricsForProvider` returns the following performance metrics:

* memoryUsedBytes
* loadedMapCacheHitCount
* loadedMapCacheMissCount
* stateOnCurrentVersionSizeBytes

`getMetricsForProvider` is used when `HDFSBackedStateStore` is requested for performance metrics.
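A minimal sketch of how such a metrics map can be assembled from thread-safe counters (hypothetical class and method names; only the two cache counters are modeled here, not Spark's actual internals):

```scala
import java.util.concurrent.atomic.LongAdder

// Simplified model of a provider that counts state-cache hits and misses
// and exposes them as a Map[String, Long], as getMetricsForProvider does.
class ProviderMetrics {
  private val loadedMapCacheHitCount = new LongAdder
  private val loadedMapCacheMissCount = new LongAdder

  def recordHit(): Unit = loadedMapCacheHitCount.increment()
  def recordMiss(): Unit = loadedMapCacheMissCount.increment()

  // Sum the concurrent counters into an immutable snapshot.
  def getMetricsForProvider(): Map[String, Long] = Map(
    "loadedMapCacheHitCount" -> loadedMapCacheHitCount.sum(),
    "loadedMapCacheMissCount" -> loadedMapCacheMissCount.sum())
}
```

`LongAdder` is used because the real counters are updated from task threads while metrics may be read concurrently.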
## Loading Specified Version of State (Store) For Update

```scala
getStore(
  version: Long): StateStore
```

`getStore` is part of the `StateStoreProvider` abstraction.

`getStore` requests `getLoadedMapForStore` for the given version.

`getStore` prints out the following INFO message to the logs:

```text
Retrieved version [version] of [this] for update
```

In the end, `getStore` creates a new `HDFSBackedStateStore` for the given version and the new state.
## Supported Custom Metrics

```scala
supportedCustomMetrics: Seq[StateStoreCustomMetric]
```

`supportedCustomMetrics` is part of the `StateStoreProvider` abstraction.

### loadedMapCacheHitCount

Count of cache hits on the states cache in the provider

### loadedMapCacheMissCount

Count of cache misses on the states cache in the provider

### stateOnCurrentVersionSizeBytes

Estimated size of the state only on the current version
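These custom metrics surface in a streaming query's progress (`StreamingQueryProgress`) under `stateOperators` as `customMetrics`; the shape below is a sketch with illustrative values only:

```json
{
  "stateOperators": [{
    "customMetrics": {
      "loadedMapCacheHitCount": 12,
      "loadedMapCacheMissCount": 3,
      "stateOnCurrentVersionSizeBytes": 17408
    }
  }]
}
```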
## Logging

Enable `ALL` logging level for the `org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider` logger to see what happens inside.

Add the following line to `conf/log4j.properties`:

```text
log4j.logger.org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider=ALL
```

Refer to Logging.