worker.py
worker.py is a Python module in the pyspark package.
from pyspark import worker
Entry Point
Top-Level Code Environment
If the module is executed in the top-level code environment (and not initialized from an import statement), its __name__ is set to the string __main__.
Sometimes "top-level code" is called an entry point to the application.
Learn more in the __main__ — Top-level code environment.
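The top-level check can be observed directly. The following is a minimal sketch that uses the standard library's runpy module to execute the same module source twice: once with run_name set to "__main__" (which is what python3 -m does) and once under an ordinary module name, which is how an imported module sees itself. The module source and the run_module helper are illustrative only.

```python
import runpy
import tempfile
from pathlib import Path

# Hypothetical module that records whether its top-level guard fired.
MODULE_SOURCE = '''
ran_as_entry_point = False
if __name__ == "__main__":
    ran_as_entry_point = True
'''


def run_module(run_name: str) -> bool:
    """Execute MODULE_SOURCE with the given __name__ and report the guard result."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "demo_module.py"
        path.write_text(MODULE_SOURCE)
        # run_path returns the module's globals after execution.
        namespace = runpy.run_path(str(path), run_name=run_name)
    return namespace["ran_as_entry_point"]


print(run_module("__main__"))     # True: executed as top-level code
print(run_module("demo_module"))  # False: __name__ is the module name
```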
When executed in the top-level code environment (e.g., python3 -m), worker.py reads the following environment variables:
| Environment Variable | Description |
|---|---|
| PYTHON_WORKER_FACTORY_PORT | Port the JVM listens on |
| PYTHON_WORKER_FACTORY_SECRET | Secret used to authenticate with the JVM |
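A minimal sketch of reading the two variables, assuming the port is an integer and the secret an opaque string. WorkerFactoryConfig and read_factory_config are hypothetical names, not part of pyspark; passing the mapping explicitly keeps the function testable without touching the real process environment.

```python
import os
from typing import Mapping, NamedTuple


class WorkerFactoryConfig(NamedTuple):
    """Hypothetical holder for the connection details handed over by the JVM."""
    port: int
    secret: str


def read_factory_config(environ: Mapping[str, str] = os.environ) -> WorkerFactoryConfig:
    # Both variables are required: a missing one raises KeyError,
    # just as indexing os.environ directly would.
    port = int(environ["PYTHON_WORKER_FACTORY_PORT"])
    secret = environ["PYTHON_WORKER_FACTORY_SECRET"]
    return WorkerFactoryConfig(port=port, secret=secret)


# Usage with an explicit mapping instead of the real environment.
config = read_factory_config({
    "PYTHON_WORKER_FACTORY_PORT": "50123",
    "PYTHON_WORKER_FACTORY_SECRET": "s3cret",
})
print(config)  # WorkerFactoryConfig(port=50123, secret='s3cret')
```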
worker.py calls local_connect_and_auth with the port and the secret, which gives it a sock_file.
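The connect-and-authenticate step can be pictured as a small handshake. The sketch below is modeled on, not copied from, pyspark's local_connect_and_auth: it assumes the worker connects to localhost, sends the secret as a 4-byte big-endian length followed by UTF-8 bytes, and expects the JVM to answer "ok". connect_and_auth, fake_jvm and handshake_succeeds are hypothetical helpers; fake_jvm plays the JVM side in a thread so the example runs on its own.

```python
import socket
import struct
import threading
from typing import BinaryIO, Tuple


def write_with_length(data: bytes, stream: BinaryIO) -> None:
    # Frame: 4-byte big-endian signed length, then the payload bytes.
    stream.write(struct.pack("!i", len(data)))
    stream.write(data)


def read_with_length(stream: BinaryIO) -> bytes:
    (length,) = struct.unpack("!i", stream.read(4))
    return stream.read(length)


def connect_and_auth(port: int, secret: str) -> Tuple[BinaryIO, socket.socket]:
    """Hypothetical stand-in for local_connect_and_auth."""
    sock = socket.create_connection(("127.0.0.1", port), timeout=15)
    sock_file = sock.makefile("rwb", 65536)  # one buffered file for both directions
    write_with_length(secret.encode("utf-8"), sock_file)
    sock_file.flush()
    reply = read_with_length(sock_file).decode("utf-8")
    if reply != "ok":
        sock_file.close()
        sock.close()
        raise RuntimeError("authentication with the JVM failed")
    return sock_file, sock


def fake_jvm(server: socket.socket, expected_secret: str) -> None:
    # Plays the JVM side: accept one worker, check its secret, reply.
    conn, _ = server.accept()
    with conn, conn.makefile("rwb") as stream:
        received = read_with_length(stream).decode("utf-8")
        write_with_length(b"ok" if received == expected_secret else b"err", stream)
        stream.flush()


def handshake_succeeds(jvm_secret: str, worker_secret: str) -> bool:
    with socket.socket() as server:
        server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        server.listen(1)
        jvm = threading.Thread(target=fake_jvm, args=(server, jvm_secret))
        jvm.start()
        try:
            sock_file, sock = connect_and_auth(server.getsockname()[1], worker_secret)
        except RuntimeError:
            return False
        finally:
            jvm.join()
        sock_file.close()
        sock.close()
        return True


print(handshake_succeeds("s3cret", "s3cret"))  # True
print(handshake_succeeds("s3cret", "guess"))   # False
```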
worker.py then uses write_int to write the PID of the Python process to the sock_file.
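write_int puts a single integer on a binary stream. A minimal sketch, assuming 4-byte big-endian signed framing ("!i" in struct notation), with an in-memory BytesIO standing in for the socket file; read_int is included only to show how the receiving side could decode the PID.

```python
import io
import os
import struct
from typing import BinaryIO


def write_int(value: int, stream: BinaryIO) -> None:
    # 4-byte, big-endian, signed integer.
    stream.write(struct.pack("!i", value))


def read_int(stream: BinaryIO) -> int:
    data = stream.read(4)
    if len(data) < 4:
        raise EOFError("stream ended before a full int was read")
    return struct.unpack("!i", data)[0]


sock_file = io.BytesIO()  # stands in for the socket file
write_int(os.getpid(), sock_file)
sock_file.flush()

sock_file.seek(0)
print(read_int(sock_file) == os.getpid())  # True
```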
Finally, worker.py calls main with the sock_file as both the input and the output file.
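Passing the same sock_file as both infile and outfile works because a socket file opened in "rwb" mode supports reading and writing. The sketch below uses socket.socketpair to wire a hypothetical JVM end to a worker end inside one process. toy_main is a made-up stand-in for main that just doubles one integer, and the JVM end sends its request up front only to keep the demo single-threaded.

```python
import os
import socket
import struct
from typing import BinaryIO


def write_int(value: int, stream: BinaryIO) -> None:
    stream.write(struct.pack("!i", value))


def read_int(stream: BinaryIO) -> int:
    return struct.unpack("!i", stream.read(4))[0]


def toy_main(infile: BinaryIO, outfile: BinaryIO) -> None:
    """Hypothetical stand-in for worker.main: reads one int, writes it doubled."""
    write_int(read_int(infile) * 2, outfile)
    outfile.flush()


def entry_point(sock_file: BinaryIO) -> None:
    # Same order as described above: report the PID, flush, then run main.
    write_int(os.getpid(), sock_file)
    sock_file.flush()
    toy_main(sock_file, sock_file)  # one file serves as input and output


jvm_sock, worker_sock = socket.socketpair()
with jvm_sock, worker_sock:
    jvm_file = jvm_sock.makefile("rwb")
    worker_file = worker_sock.makefile("rwb")
    write_int(21, jvm_file)  # the "JVM" sends its request
    jvm_file.flush()
    entry_point(worker_file)
    pid = read_int(jvm_file)
    answer = read_int(jvm_file)
    jvm_file.close()
    worker_file.close()

print(pid == os.getpid(), answer)  # True 42
```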
main
main(infile, outfile)
main reads the PYTHON_FAULTHANDLER_DIR environment variable.
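One plausible use of PYTHON_FAULTHANDLER_DIR, stated here as an assumption, is a directory that receives a per-process log which Python's faulthandler module writes tracebacks to on a fatal error such as a segfault. enable_fault_log is a hypothetical helper illustrating that idea with the standard faulthandler API; it does nothing when the variable is unset.

```python
import faulthandler
import os
import tempfile
from typing import Mapping, Optional, TextIO


def enable_fault_log(environ: Mapping[str, str]) -> Optional[TextIO]:
    """Hypothetical: if PYTHON_FAULTHANDLER_DIR is set, send fatal-error
    tracebacks to <dir>/<pid>; otherwise leave faulthandler alone."""
    log_dir = environ.get("PYTHON_FAULTHANDLER_DIR")
    if not log_dir:
        return None
    log_file = open(os.path.join(log_dir, str(os.getpid())), "w")
    faulthandler.enable(file=log_file)
    return log_file


with tempfile.TemporaryDirectory() as tmp:
    log = enable_fault_log({"PYTHON_FAULTHANDLER_DIR": tmp})
    enabled = faulthandler.is_enabled()
    log_name = os.path.basename(log.name)
    faulthandler.disable()  # detach before the file is closed and deleted
    log.close()

print(enabled, log_name == str(os.getpid()))  # True True
```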
main then performs a number of initializations.
FIXME Review the initializations
main calls read_udfs, which returns the following:
- func
- profiler
- deserializer
- serializer
main then requests the deserializer to load_stream from the given infile and executes func (with the split_index and the deserialized stream). The results are written to the outfile using the serializer's dump_stream.
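The core of this step is a deserialize, apply, serialize pipeline. The sketch below assumes a hypothetical length-prefixed pickle serializer (PySpark chooses its real serializers elsewhere, and their framing differs) and a toy func with the same (split_index, iterator) shape; the profiler that read_udfs also returns is left out.

```python
import pickle
import struct
from io import BytesIO
from typing import Any, BinaryIO, Callable, Iterable, Iterator


class LengthPrefixedPickleSerializer:
    """Hypothetical serializer: each record is a 4-byte big-endian length plus
    a pickled payload; a length of -1 marks the end of the stream."""

    END_OF_STREAM = -1

    def dump_stream(self, iterator: Iterable[Any], stream: BinaryIO) -> None:
        for obj in iterator:
            data = pickle.dumps(obj)
            stream.write(struct.pack("!i", len(data)))
            stream.write(data)
        stream.write(struct.pack("!i", self.END_OF_STREAM))

    def load_stream(self, stream: BinaryIO) -> Iterator[Any]:
        # Lazily yields records until the end marker is read.
        while True:
            (length,) = struct.unpack("!i", stream.read(4))
            if length == self.END_OF_STREAM:
                return
            yield pickle.loads(stream.read(length))


def process(split_index: int, func: Callable, deserializer, serializer,
            infile: BinaryIO, outfile: BinaryIO) -> None:
    iterator = deserializer.load_stream(infile)  # deserialize the input
    out_iter = func(split_index, iterator)       # apply the user function
    serializer.dump_stream(out_iter, outfile)    # serialize the results


# Hypothetical func with the (split_index, iterator) shape.
def func(split_index: int, rows: Iterator[Any]) -> Iterator[Any]:
    return (row * 10 + split_index for row in rows)


ser = LengthPrefixedPickleSerializer()
infile = BytesIO()
ser.dump_stream([1, 2, 3], infile)
infile.seek(0)

outfile = BytesIO()
process(3, func, ser, ser, infile, outfile)
print(list(ser.load_stream(BytesIO(outfile.getvalue()))))  # [13, 23, 33]
```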
Afterwards, main performs a number of post-processing steps.
FIXME Review the post-processings
read_udfs
read_udfs(pickleSer, infile, eval_type)
read_udfs ...FIXME
read_single_udf
read_single_udf(pickleSer, infile, eval_type, runner_conf, udf_index)
read_single_udf ...FIXME