SRE Metric Management: Software Reliability Monitoring and Reporting
Once SRE metrics have been identified, site reliability engineers (SREs) must know how to perform fault analysis on a system, classify defects, and monitor and report data. In this course, you'll explore the tools and best practices for carrying out these procedures. You'll begin by identifying various fault analysis methods and tools. You'll then classify software defects and bugs, with a focus on severity and priority. Next, you'll investigate strategies for monitoring APIs and explore some tools used for this task. You'll then examine in detail several tools for collecting, analyzing, and reporting metric data using a customizable dashboard, including those that make up the ELK Stack: Elasticsearch, Logstash, and Kibana. Finally, you'll explore the data collection tool Beats and the beneficial use cases for Elasticsearch notifications.