The web UI (also known as the Spark UI) is the web interface of a running Spark application that lets you monitor the execution of jobs from a web browser such as Firefox or Google Chrome. When a SparkContext launches, a web UI displaying useful information about the application is started on port 4040 on the driver. How you access the Spark web UI differs depending on whether the application is still running or has finished its execution.
You can also use the web UI after the application has finished its execution by persisting all the events using EventLoggingListener. EventLoggingListener cannot do this on its own, however; it must be combined with the Spark history server. Together, these two features provide the following:
- A list of scheduler stages and tasks
- A summary of RDD sizes
- Memory usage
- Environmental information
- Information about the running executors
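Wiring the two pieces together can be sketched as follows; the log directory path is an example and must match the value you later set for spark.eventLog.dir. These commands assume a local Spark installation pointed to by SPARK_HOME:

```shell
# Point the history server at the directory where EventLoggingListener
# writes its event files (the path here is an example)
export SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=file:///tmp/spark-events"

# Start the Spark history server; by default it serves on port 18080
$SPARK_HOME/sbin/start-history-server.sh
```

Once the history server is up, completed applications are listed at http://localhost:18080.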
You can access the UI at http://<driver-node>:4040 in a web browser. For example, a Spark job submitted and running in local mode can be accessed at http://localhost:4040.
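The same information is also exposed over the web UI's REST API, which is handy for scripting. As a sketch, assuming an application is running locally on the default port:

```shell
# List the applications known to the running UI's REST endpoint;
# the response is a JSON array of application summaries
curl -s http://localhost:4040/api/v1/applications
```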
As long as the job is running, its stages can be observed in the Spark UI. To view the web UI after the job has finished its execution, however, set spark.eventLog.enabled to true before submitting your Spark jobs. This makes Spark persist all the events needed to render the UI to storage such as the local filesystem or HDFS.
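One way to make this setting apply to every submission, rather than passing it on the command line each time, is to put it in conf/spark-defaults.conf. A minimal sketch (the log directory is an example and must exist before the job is submitted):

```
spark.eventLog.enabled  true
spark.eventLog.dir      file:///tmp/spark-events
```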
In the previous chapter, we saw how to submit a Spark job to a cluster. Let's reuse one of the commands for submitting the k-means clustering, as follows:
# Run the application in local mode on 8 cores
$SPARK_HOME/bin/spark-submit \
  --class org.apache.spark.examples.KMeansDemo \
  --master local[8] \
  KMeansDemo-0.1-SNAPSHOT-jar-with-dependencies.jar \
  Saratoga_NY_Homes.txt
If you submit the job using the preceding command, you will not be able to see the status of jobs that have finished their execution; to preserve the events for later viewing, set the following two options:
spark.eventLog.enabled=true
spark.eventLog.dir=file:///home/username/log
By setting the preceding two configuration variables, we ask the Spark driver to enable event logging and save the events at file:///home/username/log.
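One caveat worth noting: Spark does not create the event log directory for you, and the job submission fails if it does not exist. Create it beforehand; the path below is an example and must match whatever you set for spark.eventLog.dir:

```shell
# Create the event log directory before submitting the job;
# -p makes this a no-op if the directory already exists
mkdir -p /tmp/spark-events
```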
In summary, with these changes, your submit command looks as follows:
# Run the application in local mode on 8 cores
$SPARK_HOME/bin/spark-submit \
  --conf "spark.eventLog.enabled=true" \
  --conf "spark.eventLog.dir=file:///home/username/log" \
  --class org.apache.spark.examples.KMeansDemo \
  --master local[8] \
  KMeansDemo-0.1-SNAPSHOT-jar-with-dependencies.jar \
  Saratoga_NY_Homes.txt
Once the application is running, the Spark web UI provides the following tabs:
- Jobs
- Stages
- Storage
- Environment
- Executors
- SQL
Note that not all of these tabs may be visible at once, as some are created lazily on demand; for example, a Streaming tab appears only while a streaming job is running.