UserDefinedFunction¶
UserDefinedFunction is a Python class in the pyspark.sql.udf module.
from pyspark.sql.udf import UserDefinedFunction
Creating Instance¶
UserDefinedFunction takes the following to be created:
- Function (Callable)
- Return Type (default: StringType)
- Name (default: None)
- Eval Type (default: SQL_BATCHED_UDF)
- deterministic flag (default: True)
UserDefinedFunction is created when:
- _create_udf (from the pyspark.sql.udf module) is executed
_judf_placeholder¶
UserDefinedFunction initializes _judf_placeholder to be None when created.
_judf_placeholder is assigned the result of _create_judf for the func when UserDefinedFunction is requested for the _judf.
_judf_placeholder is available as _judf.
_judf_placeholder can be reset (to None) when UserDefinedFunction is requested to asNondeterministic.
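The placeholder lifecycle described above is a lazy-initialization pattern. The following is a minimal pure-Python sketch of that pattern only, not the PySpark source: the class name LazyHandleSketch, the build_count counter, and the tuple returned by its _create_judf are hypothetical stand-ins for the JVM-side object.

```python
from typing import Any, Callable, Optional


class LazyHandleSketch:
    """Hypothetical stand-in that mirrors the placeholder lifecycle only."""

    def __init__(self, func: Callable[..., Any]) -> None:
        self.func = func
        self.deterministic = True
        self._judf_placeholder: Optional[Any] = None  # nothing built yet
        self.build_count = 0  # illustration only: counts expensive builds

    def _create_judf(self, func: Callable[..., Any]) -> Any:
        # Stand-in for the expensive JVM-side construction (an assumption).
        self.build_count += 1
        return ("judf", func.__name__, self.deterministic)

    @property
    def _judf(self) -> Any:
        # Build on first access, then reuse the cached value.
        if self._judf_placeholder is None:
            self._judf_placeholder = self._create_judf(self.func)
        return self._judf_placeholder

    def asNondeterministic(self) -> "LazyHandleSketch":
        # Changing the flag discards the cached value so it is rebuilt later.
        self.deterministic = False
        self._judf_placeholder = None
        return self
```

In this sketch, reading _judf twice triggers a single build, and calling asNondeterministic forces a rebuild on the next access.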
__call__¶
__call__(
self,
*cols: "ColumnOrName") -> Column
Emulating callable objects
Instances of arbitrary classes can be made callable by defining a __call__() method in their class.
__call__ is called when an instance is "called" as a function.
Learn more in 3.3.6. Emulating callable objects.
With profiler_collector enabled, __call__...FIXME
Otherwise, __call__ assigns the _judf as the judf and creates a PythonUDF.
In the end, __call__ creates a Column with the PythonUDF.
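As a rough illustration of the callable-object mechanism and the non-profiler path above, the sketch below shows an instance that, when called with column names, returns a description of the call rather than running the wrapped function. CallableUdfSketch and ExprSketch are hypothetical names; they stand in for UserDefinedFunction and for the PythonUDF wrapped in a Column, and they do not touch the JVM.

```python
from typing import Any, Callable, Tuple


class ExprSketch:
    """Hypothetical stand-in for the expression a Column would wrap."""

    def __init__(self, name: str, args: Tuple[str, ...]) -> None:
        self.name = name
        self.args = args


class CallableUdfSketch:
    """Hypothetical stand-in for a callable UDF wrapper."""

    def __init__(self, func: Callable[..., Any], name: str) -> None:
        self.func = func
        self._name = name

    def __call__(self, *cols: str) -> ExprSketch:
        # Calling the instance does not run func; it only records
        # which function should be applied to which columns.
        return ExprSketch(self._name, tuple(cols))


to_upper = CallableUdfSketch(lambda s: s.upper(), "to_upper")
expr = to_upper("first_name")  # equivalent to to_upper.__call__("first_name")
```

The point of the design is that the call site looks like an ordinary function call, while evaluation is deferred to whatever later consumes the returned expression.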
_judf¶
@property
_judf(
self) -> JavaObject
_judf calls _create_judf for the func unless the _judf_placeholder has already been initialized.
In the end, _judf returns the _judf_placeholder.
_judf is used when:
Creating Java UserDefinedPythonFunction¶
_create_judf(
self,
func: Callable[..., Any]) -> JavaObject
_create_judf uses the _jvm bridge to create a UserDefinedPythonFunction with the following:
- _name
- SimplePythonFunction (with a pickled version) of the given func and the returnType
- The returnType (parsed from JSON format to Java)
- evalType
- deterministic
_create_judf is used when: