SQLHadoopMapReduceCommitProtocol

SQLHadoopMapReduceCommitProtocol is a HadoopMapReduceCommitProtocol (Spark Core) that uses the spark.sql.sources.outputCommitterClass configuration property to determine the actual Hadoop OutputCommitter.

spark.sql.sources.commitProtocolClass

SQLHadoopMapReduceCommitProtocol is the default value of the spark.sql.sources.commitProtocolClass configuration property.
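
A minimal sketch of setting the property explicitly when a SparkSession is created. It assumes spark-sql is on the classpath; the local master and the application name are illustrative only. Setting the property to SQLHadoopMapReduceCommitProtocol merely restates the default.

```scala
import org.apache.spark.sql.SparkSession

// Illustrative local session; master and appName are assumptions for this demo
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("commit-protocol-demo")
  // Restates the default commit protocol explicitly
  .config(
    "spark.sql.sources.commitProtocolClass",
    "org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol")
  .getOrCreate()

// Prints the commit protocol class in effect for this session
println(spark.conf.get("spark.sql.sources.commitProtocolClass"))
```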

Creating Instance

SQLHadoopMapReduceCommitProtocol takes the following to be created:

  • Job ID
  • Path
  • dynamicPartitionOverwrite flag (default: false)
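
The following sketch shows two ways an instance could be created: directly through the constructor, and through FileCommitProtocol.instantiate (Spark Core) with the class name. The job ID and the output path are made-up values for illustration, and the snippet assumes spark-sql on the classpath (e.g. in spark-shell).

```scala
import org.apache.spark.internal.io.FileCommitProtocol
import org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol

// Hypothetical values for illustration only
val jobId = java.util.UUID.randomUUID().toString
val outputPath = "/tmp/commit-protocol-demo"

// Direct construction; dynamicPartitionOverwrite defaults to false
val direct = new SQLHadoopMapReduceCommitProtocol(jobId, outputPath)

// Reflective construction from a class name
// (arguments: className, jobId, outputPath, dynamicPartitionOverwrite)
val viaFactory = FileCommitProtocol.instantiate(
  classOf[SQLHadoopMapReduceCommitProtocol].getName,
  jobId,
  outputPath,
  false)
```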

Setting Up OutputCommitter

setupCommitter(
  context: TaskAttemptContext): OutputCommitter

setupCommitter allows specifying a user-defined Hadoop OutputCommitter using the spark.sql.sources.outputCommitterClass configuration property (read from the Hadoop Configuration of the given Hadoop TaskAttemptContext).

setupCommitter first requests the parent HadoopMapReduceCommitProtocol for the default OutputCommitter (for the given Hadoop TaskAttemptContext).

If the spark.sql.sources.outputCommitterClass configuration property is defined, setupCommitter creates an OutputCommitter of that class and uses it instead of the default one. setupCommitter prints out the following INFO message to the logs:

Using user defined output committer class [className]

In the end, setupCommitter prints out the following INFO message to the logs:

Using output committer class [className]

setupCommitter is part of the HadoopMapReduceCommitProtocol (Spark Core) abstraction.
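
The lookup step described above can be approximated with the Hadoop API alone. This is a simplified sketch, not the actual implementation: it only resolves the class from a Hadoop Configuration and prints a message, and it does not instantiate the committer. FileOutputCommitter is used purely as an example value, and hadoop-mapreduce-client-core is assumed to be on the classpath.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapreduce.OutputCommitter
import org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter

val key = "spark.sql.sources.outputCommitterClass"

// Empty configuration (no default resources loaded) with the property defined
val conf = new Configuration(false)
conf.set(key, classOf[FileOutputCommitter].getName)

// Resolves the configured class; gives null when the property is not defined
val clazz = conf.getClass(key, null, classOf[OutputCommitter])

Option(clazz) match {
  case Some(c) =>
    println(s"Using user defined output committer class ${c.getCanonicalName}")
  case None =>
    println("Property not defined, so the default committer stays in place")
}
```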

Logging

Enable the ALL logging level for the org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol logger to see what happens inside.

Add the following lines to conf/log4j2.properties:

logger.SQLHadoopMapReduceCommitProtocol.name = org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol
logger.SQLHadoopMapReduceCommitProtocol.level = all
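
As an alternative for interactive sessions, the level can also be raised programmatically. This sketch assumes a Spark version that ships Log4j 2 (with log4j-core on the classpath, as in spark-shell) and affects the current JVM only.

```scala
import org.apache.logging.log4j.Level
import org.apache.logging.log4j.core.config.Configurator

// Raises the logger to ALL for the running JVM (not persisted to any file)
Configurator.setLevel(
  "org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol",
  Level.ALL)
```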

Refer to Logging.