UserDefinedFunction

UserDefinedFunction is a Python class in the pyspark.sql.udf module.

from pyspark.sql.udf import UserDefinedFunction

Creating Instance

UserDefinedFunction takes the following to be created:

  • Function (Callable)
  • Return Type (default: StringType)
  • Name (default: None)
  • Eval Type (default: SQL_BATCHED_UDF)
  • deterministic flag (default: True)

UserDefinedFunction is created when:

  • _create_udf (from pyspark.sql.udf module) is executed

_judf_placeholder

UserDefinedFunction initializes _judf_placeholder to None when created.

_judf_placeholder is set to the result of _create_judf for the func the first time UserDefinedFunction is requested for _judf.

_judf_placeholder is exposed through the _judf property.

_judf_placeholder is reset to None when UserDefinedFunction is requested to asNondeterministic.

__call__

__call__(
  self,
  *cols: "ColumnOrName") -> Column

Emulating callable objects

Instances of arbitrary classes can be made callable by defining a __call__() method in their class.

__call__ is called when an instance is "called" as a function.

Learn more in section 3.3.6, Emulating callable objects, of the Python Language Reference.
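As an illustration of the callable-object protocol described above, here is a minimal pure-Python sketch. The class name CallableWrapper and the variable shout are hypothetical and are not part of PySpark; the sketch only shows how defining __call__ lets an instance be invoked like a function.

```python
class CallableWrapper:
    """Hypothetical wrapper (not a PySpark class) that makes its instances callable."""

    def __init__(self, func):
        # Keep the wrapped function, much like a UDF object keeps its func.
        self.func = func

    def __call__(self, *cols):
        # Runs when the instance itself is "called" as a function.
        return [self.func(col) for col in cols]


shout = CallableWrapper(str.upper)

# Calling the instance is the same as calling its __call__ method.
result = shout("a", "b")
same_result = shout.__call__("a", "b")
```

Here shout("a", "b") and shout.__call__("a", "b") are equivalent, which is the mechanism that lets a UserDefinedFunction instance be applied to columns.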

With profiler_collector enabled, __call__...FIXME

Otherwise, __call__ takes the _judf (as judf) and uses it to create a PythonUDF.

In the end, __call__ creates a Column with the PythonUDF.

_judf

@property
_judf(
  self) -> JavaObject

_judf calls _create_judf with the func unless the _judf_placeholder has already been initialized.

In the end, _judf returns the _judf_placeholder.
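The lazy initialization described above can be sketched in plain Python. This is only an approximation under stated assumptions: LazyUdfSketch and create_count are hypothetical names, and the tuple returned by _create_judf stands in for the JVM object that the real method creates through the _jvm bridge.

```python
class LazyUdfSketch:
    """Hypothetical stand-in for the placeholder caching used by UserDefinedFunction."""

    def __init__(self, func):
        self.func = func
        self._judf_placeholder = None  # nothing created yet
        self.create_count = 0          # illustration only: counts creations

    def _create_judf(self, func):
        # Assumption: a plain tuple replaces the JVM-side object.
        self.create_count += 1
        return ("judf", func.__name__)

    @property
    def _judf(self):
        # Create the object on first access only, then reuse the cached one.
        if self._judf_placeholder is None:
            self._judf_placeholder = self._create_judf(self.func)
        return self._judf_placeholder

    def asNondeterministic(self):
        # Resetting the placeholder forces a fresh creation on the next access.
        self._judf_placeholder = None
        return self


udf_sketch = LazyUdfSketch(len)
first = udf_sketch._judf   # triggers creation
second = udf_sketch._judf  # returns the cached object
```

Accessing _judf twice creates the underlying object once; calling asNondeterministic clears the cache, so the next access creates it again.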


_judf is used when:

  • UserDefinedFunction is requested to __call__
  • UDFRegistration is requested to register

Creating Java UserDefinedPythonFunction

_create_judf(
  self,
  func: Callable[..., Any]) -> JavaObject

_create_judf uses the _jvm bridge to create a UserDefinedPythonFunction with the following:


_create_judf is used when: