HDFSBackedStateStoreProvider¶
HDFSBackedStateStoreProvider is a StateStoreProvider that uses a Hadoop DFS-compatible file system for versioned state checkpointing.
HDFSBackedStateStoreProvider is the default StateStoreProvider per the spark.sql.streaming.stateStore.providerClass internal configuration property.
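As an illustration, the provider class could also be set explicitly (a sketch; `spark` is assumed to be an active `SparkSession`, and the value is simply the fully-qualified class name of `HDFSBackedStateStoreProvider`):

```scala
// Setting the state store provider explicitly (this is already the default)
spark.conf.set(
  "spark.sql.streaming.stateStore.providerClass",
  "org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider")
```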
Creating Instance¶
HDFSBackedStateStoreProvider takes no arguments to be created.
HDFSBackedStateStoreProvider is created when:
StateStoreProvider is requested to createAndInit
getMetricsForProvider¶
getMetricsForProvider(): Map[String, Long]
getMetricsForProvider returns the following performance metrics:

- memoryUsedBytes
- loadedMapCacheHitCount
- loadedMapCacheMissCount
getMetricsForProvider is used when:
HDFSBackedStateStore is requested for performance metrics
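A simplified sketch of what getMetricsForProvider could look like (the metric names follow the supported custom metrics below; this is not the exact Spark implementation):

```scala
// Sketch only: loadedMaps is the provider's in-memory cache of versioned
// state maps; the two counters track hits and misses on that cache.
def getMetricsForProvider(): Map[String, Long] = synchronized {
  Map(
    "memoryUsedBytes" -> SizeEstimator.estimate(loadedMaps),
    "loadedMapCacheHitCount" -> loadedMapCacheHitCount.sum(),
    "loadedMapCacheMissCount" -> loadedMapCacheMissCount.sum())
}
```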
Loading Specified Version of State (Store) For Update¶
getStore(
version: Long): StateStore
getStore is part of the StateStoreProvider abstraction.
getStore loads the state map for the given version (getLoadedMapForStore).
getStore prints out the following INFO message to the logs:
Retrieved version [version] of [this] for update
In the end, getStore creates a new HDFSBackedStateStore for the given version and the new state.
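The flow above can be sketched as follows (an illustration only, not the exact implementation; `getLoadedMapForStore` resolves a version either from the in-memory cache or from the delta and snapshot files on the checkpoint location):

```scala
// Sketch of the getStore flow
override def getStore(version: Long): StateStore = synchronized {
  require(version >= 0, "Version cannot be less than 0")
  // Load the state map for the requested version
  val newMap = getLoadedMapForStore(version)
  logInfo(s"Retrieved version $version of $this for update")
  // Wrap the versioned state in a store that accepts updates
  new HDFSBackedStateStore(version, newMap)
}
```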
Supported Custom Metrics¶
supportedCustomMetrics: Seq[StateStoreCustomMetric]
supportedCustomMetrics is part of the StateStoreProvider abstraction.
loadedMapCacheHitCount¶
count of cache hit on states cache in provider
loadedMapCacheMissCount¶
count of cache miss on states cache in provider
stateOnCurrentVersionSizeBytes¶
estimated size of state only on current version
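These custom metrics surface in the `customMetrics` of the `stateOperators` entries of a `StreamingQueryProgress` (a sketch, assuming `query` is an active `StreamingQuery` with at least one progress update):

```scala
// Reading the provider's custom metrics off the last query progress
query.lastProgress.stateOperators.foreach { op =>
  println(op.customMetrics.get("loadedMapCacheHitCount"))
  println(op.customMetrics.get("loadedMapCacheMissCount"))
  println(op.customMetrics.get("stateOnCurrentVersionSizeBytes"))
}
```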
Logging¶
Enable ALL logging level for org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider logger to see what happens inside.
Add the following line to conf/log4j.properties:
log4j.logger.org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider=ALL
Refer to Logging.