In this article, we have gathered a number of ideas and suggestions about setting up good alerts when monitoring IT systems and the integrations between them. In our view, a good alert must have a clear purpose, be timely, and get the attention of the right people.
We will explore these qualities of alerts through the following three questions and relate them to the capabilities of Aino.io.
1. We have integration monitoring, but do we need alerts?
Every alert created for integration monitoring should have a purpose. An alert without one just causes noise and reduces the effectiveness of other alerts. Most importantly, the recipients of the alert should know exactly why they receive it and how they should react. If a clear purpose cannot be identified, it is better to disable the alert.
2. Which process conditions require alerts?
Alerts should be set up to catch priority issues and unexpected situations. Usually, not every single error or failed transaction requires anyone's attention.
You should find an acceptable level of process errors for each monitored step of the process and alert only when that threshold is exceeded. You may also want to trigger an alert when the volume of successful transactions drops below or rises above the normal level.
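As a rough illustration of the threshold idea, the following Python sketch evaluates one monitored step over one check window. The `StepStats` structure, the `should_alert` function and all threshold values are hypothetical and chosen only for this example; they are not part of Aino.io.

```python
from dataclasses import dataclass


@dataclass
class StepStats:
    """Transaction counts for one monitored step in one check window (hypothetical)."""
    succeeded: int
    failed: int


def should_alert(stats: StepStats,
                 max_error_rate: float,
                 min_success: int,
                 max_success: int) -> list[str]:
    """Return the reasons to alert; an empty list means no alert is needed."""
    reasons = []
    total = stats.succeeded + stats.failed
    # Individual failures are tolerated; only the error rate over the window counts.
    if total > 0 and stats.failed / total > max_error_rate:
        reasons.append("error rate above threshold")
    # Unusually low or high volumes can signal a problem even without any errors.
    if stats.succeeded < min_success:
        reasons.append("successful volume below normal")
    elif stats.succeeded > max_success:
        reasons.append("successful volume above normal")
    return reasons


# Assumed thresholds: tolerate up to 5 % errors, expect 50 to 500 successes per window.
print(should_alert(StepStats(succeeded=98, failed=2), 0.05, 50, 500))
print(should_alert(StepStats(succeeded=80, failed=20), 0.05, 50, 500))
```

With these assumed thresholds, two failures out of a hundred transactions stay silent, while twenty failures produce an alert.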
For each monitored process, you should consider the following two questions:
1. Does the situation require human intervention?
If not, there is no point having an alert for it.
2. What is the potential impact of a failure in the monitored process?
If it is not a priority issue, alerts might not be needed or they could be set up to trigger at longer intervals.
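The two questions above can be read as a small decision rule. The sketch below is one possible way to write it down in Python; the function name and the concrete intervals (30 minutes and six hours) are assumptions made for illustration, and a team could just as well decide to have no alert at all for low-priority processes.

```python
from datetime import timedelta
from typing import Optional


def choose_check_interval(needs_human: bool, high_priority: bool) -> Optional[timedelta]:
    """Return how often to check the alert condition, or None for no alert."""
    # Question 1: if nobody has to act, an alert only adds noise.
    if not needs_human:
        return None
    # Question 2: high-impact failures justify frequent checks (assumed value).
    if high_priority:
        return timedelta(minutes=30)
    # Lower-impact processes are checked at a longer interval (assumed value).
    return timedelta(hours=6)


print(choose_check_interval(needs_human=False, high_priority=True))
print(choose_check_interval(needs_human=True, high_priority=True))
print(choose_check_interval(needs_human=True, high_priority=False))
```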
3. How frequently should the alert conditions be checked?
The frequency of the alert check determines how often the alert may be triggered if the trigger conditions are met. This is one of the most important settings to be configured for alerts.
An alert that is set up with good intentions may turn into annoying and easily disregarded spam if it triggers too often. On the other hand, if the check is done too seldom, the reaction to emerging problems may be delayed too much.
A balance has to be found between fast reaction time and less spam. Is it truly necessary to get the alert every 30 minutes or once an hour? Or is it enough to check the process four times a day, with a potential reaction time of up to six hours?
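The trade-off can be made concrete with a little arithmetic. The following Python sketch assumes evenly spaced checks and a problem that persists for the whole day; the function names are made up for this example.

```python
from datetime import timedelta

ONE_DAY = timedelta(days=1)


def worst_case_delay(checks_per_day: int) -> timedelta:
    """Longest time a problem can go unnoticed with evenly spaced checks."""
    return ONE_DAY / checks_per_day


def max_alerts_per_day(check_interval: timedelta) -> int:
    """Upper bound on alerts per day if the condition is met at every check."""
    return ONE_DAY // check_interval


print(worst_case_delay(4))                        # four checks a day
print(max_alerts_per_day(timedelta(minutes=30)))  # a check every 30 minutes
print(max_alerts_per_day(timedelta(hours=1)))     # a check once an hour
```

Four checks a day give a worst-case delay of six hours, whereas a 30-minute interval can produce up to 48 alerts a day for a single persistent problem.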
Tip: In Aino.io you can choose when an alert sends email notifications. You can receive one email when the alert starts to trigger and another when the triggering stops. Alternatively, you can be notified every time the alert is triggered.
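To show how much the two notification styles in the tip differ in volume, here is a generic Python sketch. It does not use or reproduce Aino.io's implementation or API; `NotifyMode` and `notifications` are hypothetical names, and the input is simply a list of check results where `True` means the alert condition was met.

```python
from enum import Enum


class NotifyMode(Enum):
    ON_CHANGE = "on_change"          # one message at start, one at stop
    EVERY_TRIGGER = "every_trigger"  # one message for each triggering check


def notifications(check_results, mode):
    """Return (check_index, event) pairs for the messages that would be sent."""
    sent = []
    was_triggering = False
    for index, triggering in enumerate(check_results):
        if mode is NotifyMode.EVERY_TRIGGER:
            if triggering:
                sent.append((index, "triggered"))
        elif triggering and not was_triggering:
            sent.append((index, "started"))
        elif was_triggering and not triggering:
            sent.append((index, "stopped"))
        was_triggering = triggering
    return sent


# A problem that lasts for three consecutive checks.
results = [False, True, True, True, False]
print(notifications(results, NotifyMode.ON_CHANGE))
print(notifications(results, NotifyMode.EVERY_TRIGGER))
```

For a problem spanning three checks, the first mode sends two messages (start and stop) and the second sends three; the gap grows with every additional check during which the problem persists.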