SQLHadoopMapReduceCommitProtocol
SQLHadoopMapReduceCommitProtocol is a HadoopMapReduceCommitProtocol (Spark Core) that can use the spark.sql.sources.outputCommitterClass configuration property to pick the actual Hadoop OutputCommitter.
spark.sql.sources.commitProtocolClass
SQLHadoopMapReduceCommitProtocol is the default value of the spark.sql.sources.commitProtocolClass configuration property.
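The following is a minimal sketch of how that default applies. It assumes a plain Map[String, String] stands in for Spark's SQLConf, and the object and method names are hypothetical. When the property is unset, the fully-qualified name of SQLHadoopMapReduceCommitProtocol is used.

```scala
// Hypothetical stand-in for resolving spark.sql.sources.commitProtocolClass.
// A plain Map replaces Spark's SQLConf so the sketch runs without Spark.
object CommitProtocolClassLookup {
  val Key = "spark.sql.sources.commitProtocolClass"
  val Default =
    "org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol"

  // Returns the configured class name, or the default when the key is absent
  def resolve(conf: Map[String, String]): String =
    conf.getOrElse(Key, Default)
}
```

With an empty configuration, `CommitProtocolClassLookup.resolve(Map.empty)` yields the SQLHadoopMapReduceCommitProtocol class name. Any explicitly configured value takes precedence.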
Creating Instance
SQLHadoopMapReduceCommitProtocol takes the following to be created:
- Job ID
- Path
- dynamicPartitionOverwrite flag (default: false)
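The three constructor arguments can be mirrored by a plain case class. CommitProtocolArgs is hypothetical and not a Spark class. The sketch assumes the job ID and path are passed as strings, and it only illustrates that dynamicPartitionOverwrite may be omitted, in which case it defaults to false.

```scala
// Hypothetical stand-in mirroring the arguments SQLHadoopMapReduceCommitProtocol
// is created with (assumption: job ID and path are plain strings)
final case class CommitProtocolArgs(
    jobId: String,
    path: String,
    dynamicPartitionOverwrite: Boolean = false)

object CommitProtocolArgsExample {
  // Omitting the flag falls back to the default (false)
  val staticOverwrite = CommitProtocolArgs("job-0001", "/tmp/output")

  // Dynamic partition overwrite has to be requested explicitly
  val dynamicOverwrite =
    CommitProtocolArgs("job-0002", "/tmp/output", dynamicPartitionOverwrite = true)
}
```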
Setting Up OutputCommitter
setupCommitter(context: TaskAttemptContext): OutputCommitter
setupCommitter allows specifying a custom, user-defined Hadoop OutputCommitter using the spark.sql.sources.outputCommitterClass configuration property (in the Hadoop Configuration of the given Hadoop TaskAttemptContext).
setupCommitter first takes the default OutputCommitter from the parent HadoopMapReduceCommitProtocol (for the given Hadoop TaskAttemptContext).
If the spark.sql.sources.outputCommitterClass configuration property is defined, setupCommitter uses it to create an OutputCommitter that replaces the default one, and prints out the following INFO message to the logs:
Using user defined output committer class [className]
In the end, setupCommitter prints out the following INFO message to the logs:
Using output committer class [className]
setupCommitter is part of the HadoopMapReduceCommitProtocol (Spark Core) abstraction.
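The decision setupCommitter makes can be sketched in plain Scala. This is not Spark's implementation, and several pieces are hypothetical stand-ins:

- OutputCommitter is a trait defined here in place of Hadoop's class.
- A Map[String, String] stands in for the Hadoop Configuration.
- A factories map replaces the reflection Spark uses to instantiate the configured class.
- A logInfo function stands in for INFO logging.

These substitutions let the sketch run without Hadoop on the classpath.

```scala
// Hypothetical stand-in for Hadoop's OutputCommitter
trait OutputCommitter { def className: String }

final class DefaultCommitter extends OutputCommitter {
  val className = "DefaultCommitter"
}

final class CustomCommitter extends OutputCommitter {
  val className = "CustomCommitter"
}

object SetupCommitterSketch {
  val OutputCommitterClassKey = "spark.sql.sources.outputCommitterClass"

  def setupCommitter(
      conf: Map[String, String],                     // stands in for the Hadoop Configuration
      factories: Map[String, () => OutputCommitter], // stands in for reflective instantiation
      parentCommitter: () => OutputCommitter,        // stands in for the parent's default committer
      logInfo: String => Unit): OutputCommitter = {
    // Start from the default committer of the parent commit protocol
    var committer = parentCommitter()

    // Replace it only when a committer class has been configured
    conf.get(OutputCommitterClassKey).foreach { className =>
      logInfo(s"Using user defined output committer class $className")
      committer = factories(className)()
    }

    // Always report the committer that ends up being used
    logInfo(s"Using output committer class ${committer.className}")
    committer
  }
}
```

The two log lines mirror the INFO messages quoted above. With the property unset, only the second one is emitted and the parent's default committer is returned.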
Logging
Enable ALL logging level for the org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol logger to see what happens inside.
Add the following lines to conf/log4j2.properties:
logger.SQLHadoopMapReduceCommitProtocol.name = org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol
logger.SQLHadoopMapReduceCommitProtocol.level = all
Refer to Logging.