Opportunities

Major Incident Manager

Location: Los Angeles, CA
Job Elements

 

Basic Function

 
The Major Incident Manager is responsible for coordinating all resources called on to address critical incidents that require attention or resolution from all technology teams. This position will both perform hands-on, live coordination of major incident recovery efforts and create and audit incident response standards and service levels for all teams that manage the technology used to deliver and support services. These teams include both internal support and third-party partners and vendors. The Major Incident Manager is expected to maintain technical contacts and a responsive relationship with all the in-house and vendor technical support resources needed to maintain acceptable production availability of all backbone and tier 1 systems.
 
What differentiates Major Incident Management from Incident Management is the likelihood that Major Incidents may be wholly unique or unforeseeable events that are not well mitigated by established procedures. The Major Incident Manager is an authoritative technical expert across multiple areas of knowledge who is able to improvise an organizational response while staying focused on service restoration, data security, and timely and appropriate communication, both internally and to affected client or vendor partners.
 
Although root cause analysis and problem resolution are activities the Major Incident Manager will contribute to, the MIM's primary focus is on service restoration. This individual will need to exercise judgement under crisis conditions to maintain a focus on the swiftest time to service restoration, and will need to determine whether pursuing a workaround or temporary fix is quicker than a full root cause resolution. The MIM will decide how best to implement fixes in ways that both allow services to continue and give technical teams the time and opportunity to research and resolve root causes properly. To do this effectively, the MIM will need to be knowledgeable enough about infrastructure, network, data and software architecture to evaluate the appropriateness of recommended interventions.
 
The Major Incident Manager will work closely with the Business Continuity Management Office, as the conditions of a Major Incident will often precede or overlap events that may be declared Disasters and invoke Disaster Recovery procedures. The MIM will be a key decision maker in declaring disasters and invoking Crisis Management and Disaster Recovery procedures.
 
The MIM will be responsible for documenting Major Incidents according to established Incident Management standards. This position will also be responsible for modifying and improving incident management procedures as necessary, with the goal of aligning the incident documentation and reporting standards to the division's Service Management strategy. The MIM will convene and conduct post-incident reviews and both make and implement recommendations for architecture and process improvements to prevent recurrences of major incidents and to improve response time and effectiveness.
 
In addition to the above responsibilities, the Major Incident Manager will also serve as a backup resource for the Incident Manager. The Incident Manager will in turn serve as a backup for the Major Incident Manager. Although the scope of each role varies, many of the required skills overlap, and these positions are best served by working closely together to standardize processes and service levels.
 
Due to the 24/7 nature of operations, the Major Incident Manager may be asked to maintain on-call availability during evening and weekend hours.
 
As an accurate knowledge base is a key asset in timely recovery of services, the Major Incident Manager will also be asked to actively contribute to the writing and regular auditing of the Service Management system’s knowledge base and configuration management database.

Requirements

  • Bachelor's degree in business or computer science is highly desired.
  • 5+ years of demonstrated experience providing incident management or disaster recovery for large corporate applications in a networked environment that includes in-house developed systems, vendor-installed systems and third-party external service providers.
  • Proven analytical talent in evaluating symptoms and prioritizing appropriate troubleshooting activities in emergencies.
  • Strong knowledge of multi-platform (Windows and Unix) server operating systems and environments, both as on-premises datacenter infrastructure and in cloud service instances.
  • Good knowledge of web servers (IIS and Apache Tomcat).
  • Understanding of local and wide area networks, protocols and simple network troubleshooting preferred.
  • Understanding of administration tools for database troubleshooting (e.g. SQL Enterprise Manager) preferred.
  • Excellent communication and interpersonal skills, including a strong ability to create positive and professional business relationships with technical teams (both internal and third-party), internal clients and business users.
  • Proven ability to communicate effectively with executives and corporate communications staff under crisis conditions, relaying accurate, complete information to appropriate channels in a timely manner.
  • Strong commitment to working as a team and providing excellent customer service.
  • Exposure to banking or other financial services software systems preferred.