worker.py is a Python module in the pyspark package.
from pyspark import worker
Top-Level Code Environment
If the module is executed in the top-level code environment (and not initialized from an import statement), its
__name__ is set to the string '__main__'.
Sometimes "top-level code" is called an entry point to the application.
Learn more in __main__ — Top-level code environment in the official Python documentation.
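A minimal illustration of that guard (the helper below is hypothetical, not part of worker.py):

```python
def run_as_script() -> str:
    # hypothetical stand-in for worker.py's top-level work
    return "top-level code environment"

# __name__ is "__main__" only when the file is executed directly
# (e.g. python worker.py), not when it is imported (import worker)
if __name__ == "__main__":
    banner = run_as_script()
```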
When executed in the top-level code environment,
worker.py reads the following environment variables:
| Environment Variable | Description |
|---|---|
| PYTHON_WORKER_FACTORY_PORT | Port the JVM listens to |
| PYTHON_WORKER_FACTORY_SECRET | Authorization Secret |
worker.py connects to the JVM using local_connect_and_auth (that gives a sock_file).
worker.py sends (write_int) the PID of the Python process to the JVM.
In the end,
worker.py executes main (with the
sock_file for both the input and output files).
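The start-up sequence can be sketched as follows. local_connect_and_auth and write_int are simplified stand-ins for the PySpark helpers (the real local_connect_and_auth also performs an authentication handshake with the secret), and bootstrap is a hypothetical name for the top-level code path:

```python
import os
import socket
import struct


def local_connect_and_auth(port: int, auth_secret: str):
    # Simplified sketch: connect to the JVM on localhost and return a
    # buffered socket file. The real helper also authenticates with
    # auth_secret before returning.
    sock = socket.create_connection(("127.0.0.1", port))
    sock_file = sock.makefile("rwb", 65536)
    return sock_file, sock


def write_int(value: int, stream) -> None:
    # PySpark serializes ints as 4-byte big-endian values
    stream.write(struct.pack("!i", value))


def bootstrap():
    # mirrors worker.py's top-level code path (hypothetical helper name)
    java_port = int(os.environ["PYTHON_WORKER_FACTORY_PORT"])
    auth_secret = os.environ["PYTHON_WORKER_FACTORY_SECRET"]
    sock_file, _ = local_connect_and_auth(java_port, auth_secret)
    write_int(os.getpid(), sock_file)
    sock_file.flush()
    return sock_file
```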
main( infile, outfile)
main enables faulthandler if the PYTHON_FAULTHANDLER_DIR environment variable is set.
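A hedged sketch of that setup, assuming a per-PID log file under PYTHON_FAULTHANDLER_DIR (the helper name is illustrative, modeled on worker.py):

```python
import faulthandler
import os

# keep the log file object alive so its descriptor stays valid
_faulthandler_log_file = None


def maybe_enable_faulthandler():
    # hypothetical helper: enable faulthandler only when
    # PYTHON_FAULTHANDLER_DIR is set, logging to a per-PID file
    global _faulthandler_log_file
    log_dir = os.environ.get("PYTHON_FAULTHANDLER_DIR")
    if not log_dir:
        return None
    path = os.path.join(log_dir, str(os.getpid()))
    _faulthandler_log_file = open(path, "w")
    faulthandler.enable(file=_faulthandler_log_file)
    return path
```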
main does a lot of initializations.
FIXME Review the initializations
main uses read_udfs that gives the following: a function to execute (func), a profiler, a deserializer and a serializer.
main then deserializes (load_stream) the given
infile and executes
func (with the
split_index and the deserialized stream).
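The read-execute-write cycle can be sketched with a toy length-prefixed serialization format; load_stream, dump_stream and func below are simplified stand-ins for what read_udfs gives in the real worker.py:

```python
import io
import struct


def load_stream(infile):
    # toy deserializer: length-prefixed UTF-8 strings
    while True:
        header = infile.read(4)
        if len(header) < 4:
            return
        (length,) = struct.unpack("!i", header)
        yield infile.read(length).decode("utf-8")


def dump_stream(iterator, outfile):
    # toy serializer: mirror image of load_stream
    for record in iterator:
        data = record.encode("utf-8")
        outfile.write(struct.pack("!i", len(data)))
        outfile.write(data)


def func(split_index, iterator):
    # stand-in for the deserialized UDF chain
    return ("%d:%s" % (split_index, record.upper()) for record in iterator)


def run(split_index, infile, outfile):
    # the heart of main: deserialize, execute, serialize
    out_iter = func(split_index, load_stream(infile))
    dump_stream(out_iter, outfile)
```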
main does a lot of post-processing.
FIXME Review the post-processings
read_udfs(pickleSer, infile, eval_type)
read_single_udf(pickleSer, infile, eval_type, runner_conf, udf_index)
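read_single_udf reads a chain of pickled functions and composes them into a single callable. A hedged sketch of that composition (the helper name is illustrative, not the real signature):

```python
import pickle
from functools import reduce


def compose_chained_funcs(pickled_funcs):
    # hypothetical helper modeled on read_single_udf's chaining:
    # each function is applied to the result of the previous one
    funcs = [pickle.loads(p) for p in pickled_funcs]
    chain = lambda f, g: lambda *args: g(f(*args))
    return reduce(chain, funcs)
```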