Sumo Logic // cloud-based log analysis platform

Senior Site Reliability Engineer - Product Area Focus

Location

  • Remote from India.

Summary of role

Own availability, the most important product feature, by continually striving for sustained operational excellence of Sumo's planet-scale observability & security products. Work alongside your global SRE team, executing projects in your product-area-specific reliability roadmap to optimize operations, increase efficiency in our use of cloud resources & our developers' time, harden our security posture, & increase our developers' feature velocity. Work closely with multiple teams to optimize the operations of their microservices & improve the lives of the engineers within your product area engineering teams.

Responsibilities

  • Support the engineering teams within your product area by maintaining & executing a reliability roadmap of opportunities to improve reliability, maintainability, security, efficiency, & velocity, & help realize those opportunities.
  • Collaborate with development infrastructure, Global SRE, & your product area engineering teams to establish & continually refine your reliability roadmap.
  • Participate in defining, evolving, & managing SLOs for several teams within your product area.
  • Participate in on-call rotations within your product area to understand operations workload so you can continually work to improve the on-call experience & reduce operational workload for running microservices & related components.
  • Complete projects to optimize & tune on-call experience for your engineering teams.
  • Continually improve the lifecycle of microservices & architectural components from inception & design, through deployment, operation, & refinement.
  • Write code & automation to reduce operational workload, increase efficiency, improve security posture, eliminate toil, & enable Sumo's developers to deliver features more rapidly.
  • Work closely with the developer infrastructure teams to expedite adoption of developer infrastructure tools that advance your reliability roadmap, by identifying the needs of your supported engineering teams & contributing features & bug fixes back when needed.
  • Scale systems sustainably through mechanisms like automation, & evolve systems by pushing for changes that improve reliability & velocity.
  • Facilitate blame-free root cause analysis meetings for incidents to learn & drive improvement.
  • Participate in & continually improve our global IRC (incident response coordination) for all products.
  • Drive root cause identification & issue resolution with the teams.
  • Work inside of a fast-paced iterative environment.

Required Qualifications & Skills

  • Cloud native application development experience leveraging best practices & design patterns
  • Strong debugging & troubleshooting skills across the entire technology stack
  • Deep understanding of AWS Networking, Compute, Storage, & managed services.
  • Competency with modern CI/CD tooling like Kubernetes, Terraform, Ansible & Jenkins
  • Experience with full life cycle support of services, from creation to production support
  • Versed in Infrastructure as Code practices using technologies like Terraform or CloudFormation
  • Ability to author production-ready code in at least one of the following: Java, Scala, or Go.
  • Experience with Linux systems & comfortable on the command line
  • Understand & apply modern approaches to cloud-native software security
  • Experienced with agile frameworks, such as Scrum & Kanban, & how to operate within these frameworks to continually deliver value.
  • Flexible & willing to step into new roles & responsibilities
  • Willingness to learn & use Sumo Logic products for solving reliability & security issues
  • Bachelor's or Master's degree in Computer Science, Electrical Engineering, or another scientific or technical discipline
  • 6+ years of industry experience.

Desirable Skills

  • Experience using Sumo Logic products or other observability products for reliability & security
  • Experienced with planet-scale product development
  • Running & operating SaaS products on AWS Cloud with expert-level proficiency
  • Experience with streaming technologies like Kafka, Kafka Streams, or KSQL
  • Expert-level experience in one or more of: Java, Go, Scala, or Python
  • Expert-level experience in one or more of: Terraform, Jenkins, Kubernetes
  • Extensive experience running & tuning JVM workloads at scale

About Us

Sumo Logic empowers the people who power modern, digital business.  Through its SaaS analytics platform, Sumo Logic enables customers to deliver reliable & secure cloud-native applications. The Sumo Logic Continuous Intelligence Platform helps practitioners & developers ensure application reliability, secure & protect against modern security threats, & gain insights into their cloud infrastructures. Customers around the world rely on Sumo Logic to get powerful real-time analytics & insights across observability & security solutions for their cloud-native applications. For more information, visit www.sumologic.com.

 
 