The Major Incident Manager is responsible for coordinating all resources called to address critical incidents requiring attention and/or resolution from all technology teams. This position will both perform the hands-on live coordination of major incident recovery efforts and create and audit incident response standards and service levels for all teams who manage the technology used to deliver and support services. These teams include both internal support and third-party partners and vendors. The Major Incident Manager is expected to maintain technical contacts and a responsive relationship with all the in-house and vendor technical support resources needed to maintain acceptable production availability of all backbone and tier 1 systems.
What differentiates Major Incident Management from Incident Management is the likelihood that Major Incidents will be wholly unique or unforeseeable events that established procedures do not mitigate well. The Major Incident Manager is an authoritative technical expert across multiple areas of knowledge who is able to improvise an organizational response while staying focused on service restoration, data security, and timely and appropriate communication, both internally and to affected client or vendor partners.
Although root cause analysis and problem resolution are activities the Major Incident Manager will contribute to, the MIM’s primary focus is on service restoration. This individual will need to exercise judgement under crisis conditions to stay focused on the swiftest path to service restoration, and will need to determine whether pursuing workarounds or temporary fixes is quicker than full root cause resolution. The MIM will decide how best to implement fixes in ways that both allow services to continue and give technical teams the time and opportunity to research and resolve root causes properly. To do this effectively, the MIM will need to be knowledgeable enough about infrastructure, network, data and software architecture to evaluate the appropriateness of recommended interventions.
The Major Incident Manager will work closely with the Business Continuity Management Office, as the conditions of a Major Incident will often precede or overlap events that may be declared Disasters and invoke Disaster Recovery procedures. The MIM will be a key decision maker in declaring disasters and invoking Crisis Management and Disaster Recovery procedures.
The MIM will be responsible for documenting Major Incidents according to established Incident Management standards. This position will also be responsible for modifying and improving incident management procedures as necessary, with the goal of aligning the incident documentation and reporting standards to the division’s Service Management strategy. The MIM will convene and conduct post-incident reviews and both make and implement recommendations for architecture and process improvements to prevent recurrences of major incidents and to improve response time and effectiveness.
In addition to the above responsibilities, the Major Incident Manager will serve as a backup resource for the Incident Manager. The Incident Manager will in turn serve as a backup for the Major Incident Manager. Although the scope of each role varies, many of the required skills overlap, and these positions are best served by working closely together to standardize processes and service levels.
Due to the 24/7 nature of operations, the Major Incident Manager may be asked to maintain on-call availability during evening and weekend hours.
Because an accurate knowledge base is a key asset in the timely recovery of services, the Major Incident Manager will also be asked to actively contribute to the writing and regular auditing of the Service Management system’s knowledge base and configuration management database.