
PythonGatewayServer

PythonGatewayServer is a command-line application (process) that starts a Py4JServer on an ephemeral port.

PythonGatewayServer is the Python runner for the pyspark shell script (Spark Core).

main

main creates a Py4JServer and requests it to start.

main requests the Py4JServer for the listening port (boundPort) and prints out the following DEBUG message to the logs:

Started PythonGatewayServer on port [boundPort]
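
As a rough illustration of what "ephemeral port" and boundPort mean here, the following Python sketch binds a plain TCP socket to port 0 and then asks the socket which port the operating system picked. It is not Py4JServer code; the function name bind_ephemeral and the loopback host are assumptions made for this example only.

```python
import socket


def bind_ephemeral(host="127.0.0.1"):
    """Bind a listening socket to an OS-chosen port; return (socket, bound_port).

    Hypothetical helper for illustration, not part of Spark or Py4J.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, 0))  # port 0 asks the OS for any free (ephemeral) port
    server.listen(1)
    bound_port = server.getsockname()[1]  # the port that was actually assigned
    return server, bound_port


if __name__ == "__main__":
    server, bound_port = bind_ephemeral()
    print(f"Started server on port {bound_port}")
    server.close()
```

Because the port is only known after binding, the server has to report it back to whoever needs to connect, which is the role of the connection info file.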

main uses the _PYSPARK_DRIVER_CONN_INFO_PATH environment variable as the path of a connection info file (for the associated Python process) and writes the listening port and the secret to it.
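
The exchange through a connection info file can be sketched in Python as follows. The byte layout used here (a 4-byte big-endian port, a 4-byte big-endian length, then the UTF-8 bytes of the secret) is an assumption made for illustration, as are the function names write_conn_info and read_conn_info. The writer goes through a temporary file and renames it, so a reader never sees a half-written file.

```python
import os
import struct
import tempfile


def write_conn_info(path, port, secret):
    """Write port and secret to `path` (assumed layout, see lead-in)."""
    secret_bytes = secret.encode("utf-8")
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory, then rename it into place.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix="connection", suffix=".info")
    with os.fdopen(fd, "wb") as out:
        out.write(struct.pack(">i", port))               # listening port
        out.write(struct.pack(">i", len(secret_bytes)))  # length of the secret
        out.write(secret_bytes)                          # the secret itself
    os.replace(tmp_path, path)


def read_conn_info(path):
    """Read back (port, secret) written by write_conn_info."""
    with open(path, "rb") as info:
        (port,) = struct.unpack(">i", info.read(4))
        (length,) = struct.unpack(">i", info.read(4))
        secret = info.read(length).decode("utf-8")
    return port, secret
```

A reader that knows the path only needs read_conn_info(path) to obtain both values and connect to the server.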

main then pauses (blocks) until the Python driver finishes. It does so by reading from the standard input, a read that blocks until input data is available, the end of the stream is detected, or an exception is thrown.
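
The "block on standard input until the other side goes away" idea can be sketched in Python like this. The function name wait_until_closed is hypothetical; the sketch discards whatever it reads and returns once the end of the stream is reached, while any I/O exception simply propagates to the caller.

```python
import io
import sys


def wait_until_closed(stream):
    """Block until `stream` reports end-of-stream; return the bytes discarded."""
    discarded = 0
    while True:
        chunk = stream.read(1)  # blocks until data arrives or the stream ends
        if not chunk:           # empty result means end of stream
            return discarded
        discarded += len(chunk)


if __name__ == "__main__":
    # With a real pipe this returns only when the parent closes its end.
    wait_until_closed(sys.stdin.buffer)
    print("Exiting: standard input was closed")
```

When the parent process holds the write end of the pipe, its exit closes the pipe, so the child notices the parent's termination without any extra signalling.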

In the end, once the Python driver finishes, main prints out the following DEBUG message to the logs:

Exiting due to broken pipe from Python driver

main prints out the following ERROR message to the logs and exits when the listening port is -1:

[server] failed to bind; exiting

_PYSPARK_DRIVER_CONN_INFO_PATH

PythonGatewayServer uses the _PYSPARK_DRIVER_CONN_INFO_PATH environment variable for the path of a connection info file used for communication between this process and the Python process.

_PYSPARK_DRIVER_CONN_INFO_PATH is configured when the java_gateway.py module is requested to launch_gateway.
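
The launcher side of this handshake could look roughly like the Python sketch below. It is not a copy of java_gateway.py: the function launch_and_wait, the file name conn.info, the timeout, and the stand-in child script are all assumptions made for illustration. The launcher picks a path in a fresh temporary directory, passes it to the child through the environment variable, and polls until the file appears or the child exits.

```python
import os
import shutil
import subprocess
import sys
import tempfile
import time

ENV_VAR = "_PYSPARK_DRIVER_CONN_INFO_PATH"

# Hypothetical stand-in for the server process: it writes a payload to the
# path from the environment variable, then blocks until stdin is closed.
STAND_IN_CHILD = """
import os, sys
path = os.environ["_PYSPARK_DRIVER_CONN_INFO_PATH"]
with open(path + ".tmp", "wb") as f:
    f.write(b"hello")
os.replace(path + ".tmp", path)
sys.stdin.buffer.read()
"""


def launch_and_wait(command, timeout=10.0):
    """Start `command` and return (process, raw connection info bytes)."""
    conn_info_dir = tempfile.mkdtemp()
    try:
        conn_info_file = os.path.join(conn_info_dir, "conn.info")
        env = dict(os.environ)
        env[ENV_VAR] = conn_info_file  # tell the child where to write
        proc = subprocess.Popen(command, env=env, stdin=subprocess.PIPE)
        deadline = time.monotonic() + timeout
        # Poll until the file shows up or the child dies.
        while proc.poll() is None and not os.path.isfile(conn_info_file):
            if time.monotonic() > deadline:
                proc.kill()
                raise TimeoutError("no connection info within the timeout")
            time.sleep(0.1)
        if not os.path.isfile(conn_info_file):
            raise RuntimeError("child exited before writing connection info")
        with open(conn_info_file, "rb") as info:
            payload = info.read()
        return proc, payload
    finally:
        shutil.rmtree(conn_info_dir, ignore_errors=True)


if __name__ == "__main__":
    proc, payload = launch_and_wait([sys.executable, "-c", STAND_IN_CHILD])
    print(payload)
    proc.stdin.close()  # closing the pipe lets the stand-in child exit
    proc.wait()
```

Keeping the pipe to the child's standard input open for the lifetime of the launcher ties the child's lifetime to the launcher's.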

Logging

Enable ALL logging level for org.apache.spark.api.python.PythonGatewayServer logger to see what happens inside.

Add the following lines to conf/log4j2.properties:

logger.PythonGatewayServer.name = org.apache.spark.api.python.PythonGatewayServer
logger.PythonGatewayServer.level = all

Refer to Logging.