SRE Troubleshooting: Tools

“Site reliability engineers (SREs) are typically good problem solvers. They need to think logically to identify problems, correct them, and prevent them from happening again.
In this course, you'll explore several built-in and open-source troubleshooting tools SREs can use to resolve system issues. You'll start by examining the techniques of logging, whitebox monitoring, and blackbox monitoring used to track system events. You'll then work with the built-in Windows troubleshooting tools, namely Event Viewer, Resource Monitor, and System Information.
Next, you'll use Google Cloud Dataflow to process logs before outlining the purpose and benefits of the StatsD standard and the /api/search endpoint. Lastly, you'll identify how Google's Dapper is used for troubleshooting distributed systems and how the open standards tool Prometheus is used for instrumenting software and exposing metrics.”