
UDFRegistration

UDFRegistration is a facade to a session-scoped FunctionRegistry to register user-defined functions (UDFs) and user-defined aggregate functions (UDAFs).

Creating Instance

UDFRegistration takes the following to be created:

  • FunctionRegistry

UDFRegistration is created when:

  • BaseSessionStateBuilder is requested for a SessionState

Accessing UDFRegistration

UDFRegistration is available as SparkSession.udf.

import org.apache.spark.sql.UDFRegistration
assert(spark.udf.isInstanceOf[UDFRegistration])
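A plain Scala function can be registered directly through spark.udf (a quick sketch; the strlen function name is just for illustration):

// Registers a temporary function under the (illustrative) name strlen
spark.udf.register("strlen", (s: String) => s.length)

// The registered name can be used in SQL
spark.sql("SELECT strlen('Spark')").show()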

SessionState

UDFRegistration is used to create a SessionState.

Registering UserDefinedFunction

register(
  name: String,
  udf: UserDefinedFunction): UserDefinedFunction

register associates the given name with the given UserDefinedFunction.

register requests the FunctionRegistry to createOrReplaceTempFunction under the given name, with scala_udf as the source name, and with a function builder based on the type of the given UserDefinedFunction.
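As a quick sketch of how register is typically used (the my_upper name is just for illustration), a UserDefinedFunction created with functions.udf can be registered and then referenced by name in SQL:

import org.apache.spark.sql.functions.udf

// A column-based user-defined function (UserDefinedFunction)
val toUpperUdf = udf { (s: String) => s.toUpperCase }

// Associates the (illustrative) name my_upper with the UserDefinedFunction
spark.udf.register("my_upper", toUpperUdf)

// The registered name can now be used in SQL
spark.sql("SELECT my_upper('hello')").show()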

Registering User-Defined Python Function

registerPython(
  name: String,
  udf: UserDefinedPythonFunction): Unit

registerPython prints out the following DEBUG message to the logs:

Registering new PythonUDF:
name: [name]
command: [command]
envVars: [envVars]
pythonIncludes: [pythonIncludes]
pythonExec: [pythonExec]
dataType: [dataType]
pythonEvalType: [pythonEvalType]
udfDeterministic: [udfDeterministic]

In the end, registerPython requests the FunctionRegistry to createOrReplaceTempFunction (under the given name, with a builder factory and the python_udf source name).


registerPython is used when:

  • UDFRegistration (PySpark) is requested to register

Logging

Enable ALL logging level for org.apache.spark.sql.UDFRegistration logger to see what happens inside.

Add the following line to conf/log4j2.properties:

logger.UDFRegistration.name = org.apache.spark.sql.UDFRegistration
logger.UDFRegistration.level = all

Refer to Logging.