Tip - Proactive & Smarter Monitoring

Customer

ANZ Bank

Description

This client wanted to monitor its web servers proactively and resolve failures in a completely automated way, with no team involvement. The requirements were:

  • Proactively monitor 10 different URLs (via synthetic HTTP transactions)

  • Alert when 3 of these URLs fail

  • Automate creation of a ticket in ServiceNow

  • Auto-restart the 3 failing web servers

Solution

  • Proactively monitor 10 different URLs (via synthetic HTTP transactions)

    • Add 10 web servers in Data Sources (Germain workspace > left menu > Data Sources)

      image-20211202-164731.png


    • Add 10 “http scenarios” (Germain workspace > left menu > wizard > Http Scenario Component Deployment)

      image-20211202-171731.png
      image-20211202-171848.png
      image-20211202-171954.png
      image-20211202-172130.png


    • Set up a single KPI that matches the synthetic transactions generated by the HTTP monitors (Germain workspace > Analytics > KPIs; Add New Configuration):

      image-20211202-231354.png


    • Create a fact-based SLA to color individual synthetic transactions based on success / failure state:

      image-20211202-231509.png
      image-20211202-231550.png
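To make the success / failure distinction behind the fact-based SLA concrete, the following is a minimal Python sketch of what one synthetic HTTP transaction might evaluate. It is not Germain's implementation: the names `CheckResult`, `http_status` and `check_url`, and the rule that a 2xx or 3xx status counts as success, are assumptions chosen for illustration.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CheckResult:
    url: str
    ok: bool                 # True when the transaction counts as a success
    status: Optional[int]    # HTTP status code, or None if the server was unreachable
    elapsed_s: float         # wall-clock duration of the check in seconds


def http_status(url: str, timeout: float = 10.0) -> int:
    """Issue a GET request and return the HTTP status code."""
    request = urllib.request.Request(url, method="GET")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses still carry a status code.
        return error.code


def check_url(url: str, fetch: Callable[[str], int] = http_status) -> CheckResult:
    """Run one synthetic transaction and classify it as success or failure."""
    start = time.monotonic()
    try:
        status: Optional[int] = fetch(url)
    except OSError:
        # Connection refused, DNS failure, timeout: the server is unreachable.
        status = None
    elapsed = time.monotonic() - start
    # Assumption: any 2xx or 3xx status is a success.
    ok = status is not None and 200 <= status < 400
    return CheckResult(url=url, ok=ok, status=status, elapsed_s=elapsed)
```

The `fetch` parameter exists only so that the classification rule can be exercised without a network connection; in normal use `check_url(url)` performs a real request.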


  • Alert when 3 of these URLs fail

    • Create an aggregate SLA to trigger when 3 or more servers are not reachable. The schedule of the aggregate SLA should match the one selected for the HTTP monitors (for example, if your monitors connect every 5 minutes, also evaluate the SLA every 5 minutes):

      image-20211202-231653.png
      image-20211202-231842.png
      image-20211202-231924.png
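The aggregate condition itself is simple: count the URLs whose latest check failed and compare that count with the threshold. A rough Python sketch of that rule follows, assuming the latest results are available as a mapping from URL to success flag. The function names and the data shape are hypothetical and do not reflect how Germain stores facts.

```python
from typing import Dict, List


def failed_urls(latest_results: Dict[str, bool]) -> List[str]:
    """Return the URLs whose most recent synthetic transaction failed."""
    return sorted(url for url, ok in latest_results.items() if not ok)


def aggregate_sla_breached(latest_results: Dict[str, bool], threshold: int = 3) -> bool:
    """True when at least `threshold` monitored URLs are currently failing."""
    return len(failed_urls(latest_results)) >= threshold
```

Evaluating this rule on the same schedule as the monitors means each evaluation sees exactly one fresh result per URL, which is why the two schedules should match.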


  • Automate creation of a ticket in ServiceNow

    • As part of the final step, select your ServiceNow (SNOW) HTTP Action so that it is executed when there are 3 or more failures:

      image-20211202-232106.png
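For orientation, an HTTP action that opens a ServiceNow incident typically sends a POST to the ServiceNow Table API (`/api/now/table/incident`). The Python sketch below only builds such a request, under assumptions: the instance URL, the credentials, the use of basic authentication and the field values are placeholders, and the function name `build_incident_request` is hypothetical. The actual action is configured in Germain as shown in the screenshot.

```python
import base64
import json
import urllib.request
from typing import List


def build_incident_request(
    instance_url: str, user: str, password: str, failing: List[str]
) -> urllib.request.Request:
    """Build (but do not send) a POST that would create a ServiceNow incident."""
    payload = {
        "short_description": f"{len(failing)} monitored URLs are failing",
        "description": "Failing URLs:\n" + "\n".join(failing),
    }
    # Assumption: the instance accepts HTTP basic authentication.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=instance_url.rstrip("/") + "/api/now/table/incident",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": "Basic " + token,
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; keeping construction separate from sending makes the payload easy to inspect before wiring it into an automated action.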


  • Auto-restart the 3 failing web servers

    • Once the SLA has been created, you can set up an action to restart the web servers in case of failure. (Germain workspace > Automation > SSH; Add New Configuration)

      image-20211203-171327.png
      image-20211203-171429.png


      image-20211203-171600.png
      The exact command depends on your OS and web server.

      image-20211203-171737.png
      Select the fact-based SLA created earlier as the trigger for your new action.
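
As an illustration of the kind of restart command involved, the Python sketch below assembles an SSH invocation for a Linux host that manages its web server with systemd. Everything specific here is an assumption: the host name, the user, the service name (`nginx`), the use of `sudo`, and the helper names `build_restart_command` and `restart_web_server`. Other platforms need a different command, for example `iisreset /restart` for IIS on Windows.

```python
import shlex
import subprocess
from typing import List


def build_restart_command(host: str, user: str, service: str = "nginx") -> List[str]:
    """Return the argument list for restarting `service` on `host` over SSH."""
    # Assumption: a systemd-based Linux host where `user` may run sudo without a prompt.
    remote_command = f"sudo systemctl restart {shlex.quote(service)}"
    # BatchMode=yes makes ssh fail instead of waiting for a password prompt.
    return ["ssh", "-o", "BatchMode=yes", f"{user}@{host}", remote_command]


def restart_web_server(host: str, user: str, service: str = "nginx") -> None:
    """Run the restart command; raises CalledProcessError if it fails."""
    subprocess.run(build_restart_command(host, user, service), check=True)
```

For example, `build_restart_command("web01.example.com", "ops")` yields the arguments for `ssh -o BatchMode=yes ops@web01.example.com "sudo systemctl restart nginx"`.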