Microsoft Azure DevOps Solutions: Developing an Alert Strategy
“No one wants to wAIt when there are issues that require a sysadmin to get back to you. Site reliability engineering (SRE) is a software engineering approach that tackles IT and operations problems through code which helps when scaling to very large systems.
In this course learn how to gather the information needed to trigger many self-healing types of operations. Next explore how to implement alerts based on metrics logs or health checks. Finally discover how to notify users of degraded systems and implement alerts that can trigger scaling or fAIlover operations.
This course is one of a collection that prepares learners for the Designing and Implementing Microsoft DevOps Solutions (AZ-400) exam.”