BetterCloud is a high-energy, high-growth product company seeking an enterprising individual to join our growing team as a Director of SRE. You will be led by BetterCloud's Chief Architect (VP of Platform). The ideal candidate is a strong engineering leader responsible for making decisions that will continue to evolve BetterCloud's cloud strategy for our key product offering, with deep expertise in analyzing complex systems, anticipating problems & finding ways to mitigate risk. By incorporating knowledge of SRE processes, the perfect candidate will effect change & lead development of innovative improvements & world-class practices. You are expected to have a solid understanding of Cloud Architecture (Google Cloud Platform, AWS, Azure) & be able to drive processes like system & resilience testing, service management, & enabling insights (application telemetry, tracing, & log aggregation).
BetterCloud's Platform Team handles 1B+ events a day & is growing rapidly. We're big believers in continuous delivery & Scrum to keep learning & improving. Our products live on Google's Cloud Platform. Our microservices technology stack includes React/Angular (on the front end), Java, Scala, & Go on the application side, & a mix of relational & NoSQL solutions on the back end including MySQL, Bigtable, Elasticsearch, & Google's Cloud Datastore. For stream processing, we use Kafka & Flink.
You'll be leading & managing our SRE team, which manages our highly available, high-transaction infrastructure (e.g., Kafka, Flink, Compute, MySQL/CloudSQL, Elasticsearch, Jenkins, Terraform, Packer, Chef, Consul, Nginx, HAProxy), using continuous delivery, DevOps, & Scrum principles & practices to deliver.
We don't expect anyone to have experience with all of these technologies. We're simply looking for a seasoned engineering leader who loves to lead & mentor high-functioning, technically astute teams, loves to learn, has worked in high-performance cloud environments, & understands what it takes to manage infrastructure & system reliability at scale.
The Site Reliability Engineering (SRE) leader at BetterCloud manages a team of tech leads & highly skilled cloud engineers responsible for the overall health & operational design of our systems. You'll use a combination of technical depth, organizational skills, & verbal agility to lead a team of some of the best engineers in the industry.
You would be a strong engineering leader responsible for making decisions that would define cloud strategy for key product offerings. You have deep expertise in analyzing complex systems, anticipating problems & finding ways to mitigate risk. By incorporating your knowledge of SRE processes, you will effect change & lead development of innovative improvements & world-class practices. You are expected to have a good understanding of Cloud Architecture (AWS, Azure, Oracle Cloud) & you should be able to drive processes like system & resilience testing, building platform software components, & enabling insights (application telemetry, tracing, & log aggregation).
If you have these qualities & are willing to hit the ground running, then we're the place for you!
10+ years of experience building enterprise-level software solutions
5+ years (most recently) leading & managing high-performance technical teams
Has current technical knowledge of leading Cloud ecosystems, & an inspiring track record of innovative DevOps & tooling development leadership
A solid understanding of Scrum or Kanban & how to apply their respective practices in a pragmatic way. Setting priorities & technical roadmaps for the team. Keeping the team focused on priorities & value of their delivery
Strong sense of ownership & accountability
Ability to establish technical strategies & roadmaps for the team, document & educate Product, Engineering, Security, & other internal stakeholders of the value of implementing projects supporting the strategy/roadmap
Experience running/operating & delivering high volume, elastic scaling solutions in public cloud(s)
Solid understanding & management of cloud & other technology COGS
Ability to mentor & coach highly skilled technical engineers of various levels of experience, providing career pathing & growth for your team
Mindset of striving to delight our Customers, external & internal, which includes the Engineering, Security, & Customer Success teams
A bachelor's degree in computer science or engineering
Champion Engineering rigor in operational practices, & foster a culture of rapid innovation & operational excellence throughout the organization
Inspire engineering & operational team members to take & handle calculated risks, & provide steady leadership during the inevitable service interruptions. Help solve urgent technical problems & make tough final decisions with the Customer & risk at the forefront of decisions
Uplift & maintain product platforms, infrastructure & technology controls; drive development & timely delivery of interrelated, highly technical operational services that support our rapidly growing global cloud services
Focus on evolving the technology stack to improve availability, scale, performance, security, ease of maintenance, running services optimally, & guiding the Cloud Continuous Delivery Model through automation & tooling
Identify & implement automation opportunities to drive down repetitive processes, reduce technical debt & improve system reliability; enable all environment provisioning, maintenance, deprovisioning via IaaC scripting or equivalent
Manage systems to demanding availability targets from product teams (99.99%+); standardize the measurement of SLO & SLA across the system platform
Manage system stability & overall reliability for the foundation PaaS + IaaS platform by partnering with service owners, & product teams who will handle application support
Focus on customer success by driving consumption of cloud resources through the best possible experience (up-time, MTTR, minimal latency, optimized performance), & measuring it
Evolve & improve DR plans for the overall platform
Provide career management for team engineers; inspire through understanding the purpose of work, providing a path toward mastery, & giving autonomy to accomplish priorities
Act as a mentor, leader, & scrum master for the SRE team, working with Product Managers, Engineering Directors & the Chief Architect to prioritize work items as part of a consolidated backlog, plan out work for the team, & ensure commitments are met & value achieved as committed
Work collaboratively with engineering peers & Enterprise Architects to develop the roadmap for teams & then execute against the roadmap
Advocate for continuous delivery, constantly look for ways to make your team more efficient. Embrace KPIs like quality metrics, cycle time, & SLAs, & treat them as tools we can use to improve & work toward mastery
Actively drive a culture of quality & innovation across groups by setting a high bar for code/automation quality, processes, & standards, & reinforce this through reviews & discussions to build it right
Compensation | Benefits
Competitive base salary
Full benefits package
Career growth with an industry innovator
BetterCloud is an Equal Opportunity Employer, including disabled & vets.