SQLHadoopMapReduceCommitProtocol¶
SQLHadoopMapReduceCommitProtocol
is a HadoopMapReduceCommitProtocol
(Spark Core) that allows for a custom user-defined Hadoop OutputCommitter based on spark.sql.sources.outputCommitterClass configuration property.
SQLHadoopMapReduceCommitProtocol
is the default value of spark.sql.sources.commitProtocolClass configuration property.
SQLHadoopMapReduceCommitProtocol
is Serializable
.
Creating Instance¶
SQLHadoopMapReduceCommitProtocol
takes the following to be created:
- Job ID
- Path
- dynamicPartitionOverwrite
dynamicPartitionOverwrite¶
SQLHadoopMapReduceCommitProtocol
can be given dynamicPartitionOverwrite
flag when created. Unless given, dynamicPartitionOverwrite
is disabled (false
).
Setting Up OutputCommitter¶
HadoopMapReduceCommitProtocol
setupCommitter(
context: TaskAttemptContext): OutputCommitter
setupCommitter
is part of the HadoopMapReduceCommitProtocol
(Spark Core) abstraction.
setupCommitter
allows specifying a custom user-defined Hadoop OutputCommitter based on spark.sql.sources.outputCommitterClass configuration property (in the Hadoop Configuration of the given Hadoop TaskAttemptContext).
setupCommitter
takes the default parent OutputCommitter
(for the given Hadoop TaskAttemptContext) unless spark.sql.sources.outputCommitterClass configuration property is defined (that overrides the parent's OutputCommitter
).
If spark.sql.sources.outputCommitterClass is defined, setupCommitter
prints out the following INFO message to the logs:
Using user defined output committer class [className]
In the end, setupCommitter
prints out the following INFO message to the logs (and returns the OutputCommitter
):
Using output committer class [className]
Logging¶
Enable ALL
logging level for org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol
logger to see what happens inside.
Add the following line to conf/log4j2.properties
:
logger.SQLHadoopMapReduceCommitProtocol.name = org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol
logger.SQLHadoopMapReduceCommitProtocol.level = all
Refer to Logging.