Our industry is starting to go through a transformational shift & we intend to lead it. As talent becomes the main differentiator between failure & success, organizations must attract, engage & develop their people more than ever. To do so, they need powerful & sophisticated tools, which take the pain out of HR management & empower employees & people leaders. That's where we come in.
Lifion by ADP is expanding our startup-style operation in NYC in order to accelerate new technical innovation across UI, Search, Platform Technology, IaaS, Big Data, Social, etc. The concept & vision behind the strategy is "Innovate like a Startup" with the goal of delivering highly automated, intelligent & predictive solutions to the market. Our goal is to have specialized teams of superstars focused on these areas to keep pace with market trends & quickly incubate & deliver capabilities that dramatically increase the value of our solutions for clients.
The incident commander (IC) is responsible for managing an incident to its resolution as quickly as possible, coordinating with teams, communicating outward & planning next steps. During a major outage or incident, the IC must make decisions, delegate to appropriate teams, & create multiple backup plans in order to minimize the time to resolution.
The IC will have superb listening & delegation skills, delegating tasks to appropriate teams & using their expertise as input for next steps. This person must be able to weigh alternatives & keep multiple paths open to avoid delay in moving the restoration effort forward.
The Incident Manager is also responsible for keeping a clear communication line to senior stakeholders & those not immediately involved in the triage effort. Additionally, the IC will work with the teams to document & analyze the issue post-mortem to prevent future incidents.
- Excellent communication skills, both verbal & written.
- A high-level knowledge of incident management best practices & systems
- Problem-solving skills
- The ability to make quick, confident decisions
- Listening & synthesis skills
- Previous experience with major incidents (either as a participant or an observer)
- Leadership skills: the ability to take command in a high-stress situation
- Solve problems relating to mission-critical services & create solutions to prevent problem recurrence, with the goal of automating response to all non-exceptional service conditions.
- Understand the operational complexity of a microservice architecture
- Increase efficiency by identifying & addressing performance bottlenecks
- Define, track, review & report on Service Level Objectives (SLOs), Service Level Indicators (SLIs), System Availability, & the progress & outcomes related to reliability initiatives.
- Capable of decision making & leadership without oversight, as well as influencing others without hierarchy (both upwards & laterally)
- Ability to manage incidents & keep everyone calm & focused on solving issues, including removing people who distract from the immediate service restoration, regardless of the level of the person causing the distraction.
- Planning backups, rollbacks, & next steps before & during an incident.
- At least 5 years of combined experience in software engineering & automated test engineering
- Strong production experience with cloud native services (AWS, Azure, GCP)
- Familiarity with Git SCM & one or more repository managers such as GitHub, GitLab, Stash, Bitbucket, or Gerrit.
- Experience with lightweight development methodologies such as Agile (Scrum &/or Kanban)