Monitoring MapReduce

In the current Hadoop implementation, all the metrics required to monitor MapReduce can be obtained at the JobTracker level. There is no need to monitor individual TaskTrackers, at least not at the alert level. Instead, a periodic report on the number of alive and dead TaskTrackers should be sent out to track the overall framework health.

JobTracker checks

The following is the list of host-level resources to monitor on a JobTracker:

  • Check if the server is reachable using ping. Type: critical
  • Check disk space on the log and system volumes. The JobTracker doesn't preserve state on the local filesystem, but not being able to write to its log files due to low disk space will cause issues. Type: critical
  • Check swap usage on the server. A minimal sketch of these host-level checks follows this list. Type: critical
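
If you use a monitoring framework such as Nagios, each of these host-level checks can be implemented as a small plugin. The following is a minimal sketch in Python; the log volume path, thresholds, and exit codes (0 for OK, 2 for critical) are illustrative assumptions rather than Hadoop defaults, and the swap check reads /proc/meminfo, so it is Linux-specific.

    #!/usr/bin/env python
    """Minimal host-level checks for a JobTracker node: disk space and swap."""
    import os
    import sys

    LOG_VOLUME = "/var/log/hadoop"   # hypothetical log directory; adjust to your layout
    DISK_CRIT_PCT = 90               # critical when the log volume is more than 90% full
    SWAP_CRIT_PCT = 25               # critical when more than 25% of swap is in use

    def disk_used_pct(path):
        """Return how full (in percent) the filesystem holding path is."""
        st = os.statvfs(path)
        total = st.f_blocks * st.f_frsize
        free = st.f_bavail * st.f_frsize
        return 100.0 * (total - free) / total

    def swap_used_pct():
        """Return the percentage of swap in use, read from /proc/meminfo."""
        info = {}
        with open("/proc/meminfo") as f:
            for line in f:
                key, value = line.split(":", 1)
                info[key] = int(value.split()[0])  # values are reported in kB
        if info.get("SwapTotal", 0) == 0:
            return 0.0
        return 100.0 * (info["SwapTotal"] - info["SwapFree"]) / info["SwapTotal"]

    if __name__ == "__main__":
        problems = []
        disk = disk_used_pct(LOG_VOLUME)
        if disk > DISK_CRIT_PCT:
            problems.append("log volume %.0f%% full" % disk)
        swap = swap_used_pct()
        if swap > SWAP_CRIT_PCT:
            problems.append("swap %.0f%% used" % swap)
        if problems:
            print("CRITICAL: " + ", ".join(problems))
            sys.exit(2)
        print("OK: disk %.0f%% full, swap %.0f%% used" % (disk, swap))
        sys.exit(0)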

The following checks are specific to the JobTracker process:

  • Monitor memory usage. You can track the JobTracker heap usage by checking the HeapMemoryUsage.used and HeapMemoryUsage.max variables. Type: critical
  • Checking the SummaryJson.nodes and SummaryJson.alive status variables will give you an idea of what portion of the TaskTrackers is available at any given moment. There is no strict threshold for this metric: your jobs will run even if only one TaskTracker is available, but performance will obviously deteriorate significantly. Choose a threshold based on your cluster size and adjust it over time as the failure trend becomes clear. A sample check for this and the memory metric is sketched after this list. Type: critical
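
These metrics are exposed through JMX. The sketch below assumes the JobTracker web UI serves the /jmx JSON servlet on its default port (50030) and that the bean and attribute names match your Hadoop version; verify them by opening the /jmx page of your JobTracker first. The host name and thresholds are placeholders.

    #!/usr/bin/env python
    """Sketch of a JobTracker JMX check: heap usage and alive TaskTrackers."""
    import json
    import sys
    from urllib.request import urlopen

    JMX_URL = "http://jobtracker.example.com:50030/jmx"   # hypothetical host name
    HEAP_CRIT_PCT = 90      # critical when more than 90% of the heap is used
    ALIVE_CRIT_PCT = 75     # critical when fewer than 75% of TaskTrackers are alive

    def fetch_beans(url):
        """Fetch the /jmx servlet output and index the beans by name."""
        data = json.loads(urlopen(url).read().decode("utf-8"))
        return dict((bean["name"], bean) for bean in data["beans"])

    if __name__ == "__main__":
        beans = fetch_beans(JMX_URL)

        # HeapMemoryUsage comes from the standard JVM memory MBean.
        heap = beans["java.lang:type=Memory"]["HeapMemoryUsage"]
        heap_pct = 100.0 * heap["used"] / heap["max"]

        # SummaryJson is a JSON string on the JobTracker info bean; the bean
        # name may differ between Hadoop releases, so check the /jmx output.
        info = beans["Hadoop:service=JobTracker,name=JobTrackerInfo"]
        summary = json.loads(info["SummaryJson"])
        alive_pct = 100.0 * summary["alive"] / summary["nodes"]

        if heap_pct > HEAP_CRIT_PCT or alive_pct < ALIVE_CRIT_PCT:
            print("CRITICAL: heap %.0f%% used, %.0f%% of TaskTrackers alive"
                  % (heap_pct, alive_pct))
            sys.exit(2)
        print("OK: heap %.0f%% used, %.0f%% of TaskTrackers alive"
              % (heap_pct, alive_pct))
        sys.exit(0)

Combining both thresholds in one plugin keeps the number of JMX round trips down; split them into separate checks if you prefer per-metric alerts.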

The JobTracker can blacklist worker nodes that consistently perform slowly or fail too often. You should monitor the total number of blacklisted TaskTrackers by looking at the SummaryJson.blacklisted metric; a sample check is sketched below. Type: critical
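
The same /jmx servlet can serve this check. The following short sketch assumes the same endpoint and bean name as in the previous example; the threshold of two blacklisted nodes is only an illustration and should be scaled with your cluster size.

    #!/usr/bin/env python
    """Sketch of a blacklisted-TaskTrackers check against the JobTracker /jmx servlet."""
    import json
    import sys
    from urllib.request import urlopen

    JMX_URL = "http://jobtracker.example.com:50030/jmx"   # hypothetical host name
    BLACKLISTED_CRIT = 2    # example threshold; tune to your cluster size

    beans = json.loads(urlopen(JMX_URL).read().decode("utf-8"))["beans"]
    info = next(bean for bean in beans
                if bean["name"] == "Hadoop:service=JobTracker,name=JobTrackerInfo")
    summary = json.loads(info["SummaryJson"])

    if summary["blacklisted"] > BLACKLISTED_CRIT:
        print("CRITICAL: %d TaskTrackers blacklisted" % summary["blacklisted"])
        sys.exit(2)
    print("OK: %d TaskTrackers blacklisted" % summary["blacklisted"])
    sys.exit(0)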
