Configuring automated alerts

Configuring automated alerts is one of the most important SRE tasks today, because troubleshooting distributed microservices is complex. SRE teams are adopting new toolsets and technologies to collect monitoring data and send automated alerts.

Several monitoring tools collect data and send it to automated alerting systems. These include the following:

  • SignalFX
  • Runscope
  • AppDynamics
  • Check_MK/Nagios
  • WebMetric
  • AlertSite

PagerDuty, OpsGenie, and automated emails sent through CI/CD tools are all widely used to alert SREs whenever a metric tracked by one of the preceding monitoring systems reaches or crosses a configured threshold.
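
As a rough illustration of the threshold-then-alert flow, the following Python sketch checks a metric against a threshold and builds a trigger event in the shape accepted by the PagerDuty Events API v2. The function names, the sample host name, and the routing key are hypothetical placeholders, and the sketch only builds the payload; actually sending it (an HTTP POST of the JSON body to the URL shown) is left out.

```python
import json

# PagerDuty Events API v2 endpoint; the event would be POSTed here as JSON.
PAGERDUTY_EVENTS_URL = "https://events.pagerduty.com/v2/enqueue"


def crossed(value, threshold):
    """Return True when the metric reaches or crosses the threshold."""
    return value >= threshold


def build_trigger_event(routing_key, summary, source, severity="critical"):
    """Build a trigger event body (hypothetical helper, not part of any SDK).

    severity must be one of: critical, error, warning, info.
    """
    return {
        "routing_key": routing_key,  # integration key of the PagerDuty service
        "event_action": "trigger",
        "payload": {
            "summary": summary,
            "source": source,
            "severity": severity,
        },
    }


cpu_percent = 91.5   # sample reading, assumed for illustration
threshold = 80.0

if crossed(cpu_percent, threshold):
    event = build_trigger_event(
        routing_key="YOUR_INTEGRATION_KEY",  # placeholder, not a real key
        summary=f"CPU utilization {cpu_percent}% >= {threshold}%",
        source="db-host-01",                 # hypothetical host name
    )
    print(json.dumps(event, indent=2))
```

In practice the monitoring tool usually performs the threshold check itself and the alerting service only receives the resulting event, so a script like this is mostly useful for custom checks that no existing tool covers.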

Let's think about some practical examples that we use on a day-to-day basis:

  • AWS RDS CPU utilization surpasses the 80% threshold. You can use AWS SNS to send an automated alert to an email address. That address can belong to a PagerDuty service, which then alerts the user by phone, email, or SMS.
  • Check_MK can be configured to alert a user about expired HTTPS SSL certificates.
  • SignalFX can be configured to alert a user when the IIS connection queue is long.
  • External monitoring tools such as WebMetric, Runscope, and AlertSite can be configured to send alerts about public-facing websites that are not reachable.
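
The RDS example above can be sketched in Python. The text only names AWS SNS; this sketch assumes the alert originates from a CloudWatch alarm on the `CPUUtilization` metric that publishes to an SNS topic, which is one common way to wire it up. The helper name, alarm name, instance identifier, topic ARN, and the period and evaluation settings are all illustrative assumptions. The function only builds the parameters, so it runs without AWS credentials.

```python
def rds_cpu_alarm_params(db_instance_id, sns_topic_arn, threshold=80.0):
    """Build keyword arguments for CloudWatch's put_metric_alarm call.

    Hypothetical helper: the alarm name and the period and evaluation
    settings are illustrative choices, not recommendations.
    """
    return {
        "AlarmName": f"rds-cpu-high-{db_instance_id}",
        "Namespace": "AWS/RDS",
        "MetricName": "CPUUtilization",
        "Dimensions": [
            {"Name": "DBInstanceIdentifier", "Value": db_instance_id},
        ],
        "Statistic": "Average",
        "Period": 300,             # seconds per evaluated data point
        "EvaluationPeriods": 2,    # consecutive breaching periods required
        "Threshold": threshold,    # percent CPU
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],  # SNS topic notified on ALARM
    }


params = rds_cpu_alarm_params(
    db_instance_id="orders-db",  # hypothetical instance identifier
    sns_topic_arn="arn:aws:sns:us-east-1:123456789012:sre-alerts",  # placeholder ARN
)
print(params["AlarmName"], params["Threshold"])

# With boto3 installed and credentials configured, the alarm could be created with:
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**params)
```

Subscribing the PagerDuty service's integration email address to the SNS topic would then complete the chain described in the first bullet.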