Skip to content

SQLHadoopMapReduceCommitProtocol

SQLHadoopMapReduceCommitProtocol is a HadoopMapReduceCommitProtocol (Spark Core) that allows for a custom user-defined Hadoop OutputCommitter based on spark.sql.sources.outputCommitterClass configuration property.

SQLHadoopMapReduceCommitProtocol is the default value of spark.sql.sources.commitProtocolClass configuration property.

SQLHadoopMapReduceCommitProtocol is Serializable.

Creating Instance

SQLHadoopMapReduceCommitProtocol takes the following to be created:

dynamicPartitionOverwrite

SQLHadoopMapReduceCommitProtocol can be given dynamicPartitionOverwrite flag when created. Unless given, dynamicPartitionOverwrite is disabled (false).

Setting Up OutputCommitter

HadoopMapReduceCommitProtocol
setupCommitter(
  context: TaskAttemptContext): OutputCommitter

setupCommitter is part of the HadoopMapReduceCommitProtocol (Spark Core) abstraction.

setupCommitter allows specifying a custom user-defined Hadoop OutputCommitter based on spark.sql.sources.outputCommitterClass configuration property (in the Hadoop Configuration of the given Hadoop TaskAttemptContext).


setupCommitter takes the default parent OutputCommitter (for the given Hadoop TaskAttemptContext) unless spark.sql.sources.outputCommitterClass configuration property is defined (that overrides the parent's OutputCommitter).

If spark.sql.sources.outputCommitterClass is defined, setupCommitter prints out the following INFO message to the logs:

Using user defined output committer class [className]

In the end, setupCommitter prints out the following INFO message to the logs (and returns the OutputCommitter):

Using output committer class [className]

Logging

Enable ALL logging level for org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol logger to see what happens inside.

Add the following line to conf/log4j2.properties:

logger.SQLHadoopMapReduceCommitProtocol.name = org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol
logger.SQLHadoopMapReduceCommitProtocol.level = all

Refer to Logging.