SRE Team Management: Managing Operational Loads
“To ensure and maintain a system's functional state, site reliability engineers (SREs) must learn how to identify, calculate, and manage a system's operational load, which generally falls into three categories: ongoing operational activities, tickets, and pages.
In this course, you'll explore these categories in detail. You'll start by outlining methods for managing operational loads at the team level and using support ticketing systems and service level objectives.
Next, you'll investigate toil, a term used to describe the operational work associated with running and maintaining a production service. You'll outline steps for identifying, calculating, and eliminating toil and examine the adverse effects toil can have on a team.
Additionally, you'll outline how to work with interrupts and distinguish between crucial metrics used for managing them. Lastly, you'll identify the human element factors to consider when dealing with interrupts, including efficiency, distractibility, and respect.”