SRE Emergency & Incident Response: Responding to Emergencies


Site Reliability Engineers (SREs) are responsible for assigning the appropriate resources and responsibilities to deal effectively with unexpected emergencies. To do this, SREs should ensure the proper processes and teams are in place before an emergency occurs. In this course, you'll explore the different types of emergencies and outline how to plan for them. You'll examine the causes of test-induced, change-induced, and process-induced emergencies and how to respond to each, as well as what's involved in proactive approaches to emergency testing and planning. Next, you'll outline the critical steps for correctly documenting emergencies, including the history of outages and mistakes. You'll then differentiate between business continuity and disaster recovery planning, and outline how to create both types of plans and conduct a business impact analysis. Lastly, you'll explore some IT recovery strategies.