UserDefinedFunction¶
UserDefinedFunction is a Python class in the pyspark.sql.udf module.
from pyspark.sql.udf import UserDefinedFunction
Creating Instance¶
UserDefinedFunction takes the following to be created:
- Function (Callable)
- Return Type (default: StringType)
- Name (default: None)
- Eval Type (default: SQL_BATCHED_UDF)
- deterministic flag (default: True)
UserDefinedFunction is created when:
- _create_udf (from the pyspark.sql.udf module) is executed
_judf_placeholder¶
UserDefinedFunction initializes _judf_placeholder to None when created.
_judf_placeholder is set to the result of _create_judf for the func when UserDefinedFunction is requested for the _judf.
_judf_placeholder is available as _judf.
_judf_placeholder is reset (to None) when UserDefinedFunction is requested to asNondeterministic.
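The lifecycle above (empty on creation, filled on first access, cleared on asNondeterministic) can be sketched without Spark. This is a minimal illustration only: LazyUdfSketch, the builds counter, and the tuple returned by its _create_judf are hypothetical stand-ins, not PySpark code, and no JVM object is created.

```python
from typing import Any, Callable, Optional


class LazyUdfSketch:
    """Hypothetical stand-in for UserDefinedFunction; no Spark or JVM involved."""

    def __init__(self, func: Callable[..., Any], deterministic: bool = True) -> None:
        self.func = func
        self.deterministic = deterministic
        self._judf_placeholder: Optional[Any] = None  # nothing built yet
        self.builds = 0  # illustration only: counts how often the object is built

    def _create_judf(self, func: Callable[..., Any]) -> Any:
        # Stand-in for the expensive JVM-side object creation.
        self.builds += 1
        return ("judf", func.__name__, self.deterministic)

    @property
    def _judf(self) -> Any:
        # Build on first access, then reuse the cached value.
        if self._judf_placeholder is None:
            self._judf_placeholder = self._create_judf(self.func)
        return self._judf_placeholder

    def asNondeterministic(self) -> "LazyUdfSketch":
        # Changing the flag invalidates the cached object,
        # so the next access to _judf rebuilds it.
        self.deterministic = False
        self._judf_placeholder = None
        return self


def plus_one(x: int) -> int:
    return x + 1


sketch = LazyUdfSketch(plus_one)
first = sketch._judf    # triggers the first build
second = sketch._judf   # served from the placeholder
rebuilt = sketch.asNondeterministic()._judf  # placeholder was reset, so rebuilt
```

The reset matters because the cached object captures the deterministic flag at build time; without clearing the placeholder, the flag change would not reach the object that is eventually used.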
__call__¶
__call__(
self,
*cols: "ColumnOrName") -> Column
Emulating callable objects
Instances of arbitrary classes can be made callable by defining a __call__() method in their class.
__call__ is called when an instance is "called" as a function.
Learn more in 3.3.6. Emulating callable objects.
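As a quick illustration of the note above, here is a minimal callable class in plain Python. Greeter and its attributes are hypothetical names chosen for this sketch; the variadic *names parameter loosely mirrors the *cols parameter in the signature above.

```python
class Greeter:
    """Hypothetical class; its instances are callable because it defines __call__."""

    def __init__(self, greeting: str) -> None:
        self.greeting = greeting

    def __call__(self, *names: str) -> str:
        # Runs when the instance is used with function-call syntax.
        return f"{self.greeting}, {' and '.join(names)}!"


greet = Greeter("Hello")
message = greet("Ada", "Grace")  # same as greet.__call__("Ada", "Grace")
```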
With profiler_collector enabled, __call__ ...FIXME
Otherwise, __call__ assigns the _judf as the judf and creates a PythonUDF.
In the end, __call__ creates a Column with the PythonUDF.
_judf¶
@property
_judf(
self) -> JavaObject
_judf calls _create_judf for the func unless the _judf_placeholder has already been initialized.
In the end, _judf returns the _judf_placeholder.
_judf is used when:
Creating Java UserDefinedPythonFunction¶
_create_judf(
self,
func: Callable[..., Any]) -> JavaObject
_create_judf uses the _jvm bridge to create a UserDefinedPythonFunction with the following:
- _name
- SimplePythonFunction (with a pickled version) of the given func and the returnType
- The returnType (parsed from JSON format to Java)
- evalType
- deterministic
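The shape of the arguments listed above can be sketched in plain Python. This is an assumption-laden illustration, not the PySpark implementation: JudfArgsSketch and build_judf_args are hypothetical names, the standard library pickle and json modules stand in for PySpark's own serialization and for the data type's JSON form, the return type is a plain string, and the eval type value 100 is only a placeholder number. Nothing here talks to a JVM.

```python
import json
import pickle
from typing import Any, Callable, NamedTuple


class JudfArgsSketch(NamedTuple):
    """Hypothetical record of the values that would be handed to the JVM side."""

    name: str
    command: bytes         # pickled (func, return type) pair
    return_type_json: str  # return type in JSON form, to be parsed on the other side
    eval_type: int
    deterministic: bool


def build_judf_args(
    name: str,
    func: Callable[..., Any],
    return_type: str,
    eval_type: int,
    deterministic: bool = True,
) -> JudfArgsSketch:
    # The function and its return type travel together as one pickled payload.
    command = pickle.dumps((func, return_type))
    # The return type is also passed separately, serialized as JSON.
    return JudfArgsSketch(name, command, json.dumps(return_type), eval_type, deterministic)


# The built-in len is used because the standard pickle module can serialize it by reference.
args = build_judf_args("length", len, "integer", 100)
restored_func, restored_type = pickle.loads(args.command)
```

Standard pickle serializes functions by reference, which is why the sketch uses a built-in; serializing lambdas or locally defined functions needs a different serializer.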
_create_judf is used when: