SRE Team Management: Managing Operational Loads
To ensure and maintain a system's functional state, site reliability engineers (SREs) must learn how to identify, calculate, and manage the system's operational load, which generally falls into three categories: ongoing operational activities, tickets, and pages. In this course, you'll explore these categories in detail. You'll start by outlining methods for managing operational loads at the team level, including the use of support ticketing systems and service level objectives. Next, you'll investigate toil, a term used to describe the operational work associated with running and maintaining a production service. You'll outline steps for identifying, calculating, and eliminating toil, and examine the adverse effects toil can have on a team. Additionally, you'll outline how to work with interrupts and distinguish between the crucial metrics used for managing them. Lastly, you'll identify the human factors to consider when dealing with interrupts, including efficiency, distractibility, and respect.