SRE Metric Management: Software Reliability Monitoring and Reporting
“Once SRE metrics have been identified, site reliability engineers (SREs) must know how to perform fault analysis on a system, classify defects, and monitor and report data. In this course, you’ll explore the tools and best practices for carrying out these procedures.
You’ll begin by identifying various fault analysis methods and tools. You’ll then classify software defects and bugs, with a focus on severity and priority.
Next, you’ll investigate strategies for monitoring APIs and explore some tools used for this task. You’ll then examine in detail several tools for collecting, analyzing, and reporting metric data using a customizable dashboard, including those that make up the ELK Stack: Elasticsearch, Logstash, and Kibana. Furthermore, you’ll explore the data collection tool Beats and the beneficial use cases for Elasticsearch notifications.”