Location: Austin, TX or open to 100% remote
Summary of role
Own availability, the most important product feature, by continually striving for sustained operational excellence of Sumo's planet-scale observability & security products. Work with your global SRE team to optimize operations, increase efficiency in our use of cloud resources & our developers' time, harden the security posture, & increase the feature velocity of our developers. Work closely with multiple teams on assisted engagements to optimize the operations of their microservices.
Responsibilities
- Continually improve the lifecycle of microservices & architectural components from inception & design, through deployment, operation, & refinement.
- Participate in defining, evolving, & managing SLOs
- Write code & automation to reduce operational workload, increase efficiency, improve security posture, eliminate toil, & enable Sumo's developers to deliver features more rapidly.
- Scale systems sustainably through mechanisms like automation, & evolve systems by pushing for changes that improve reliability & velocity.
- Facilitate blame-free root cause analysis meetings for incidents to learn & drive improvement
- Participate in & continually improve our global IRC (incident response coordination) for all products.
- Drive root cause identification & issue resolution with the various teams.
- Work inside of a fast-paced iterative environment.
Required Qualifications & Skills
- Cloud-native application development experience leveraging best practices & design patterns
- Strong debugging & troubleshooting skills across the entire technology stack
- Deep understanding of AWS Networking, Compute, Storage, & managed services.
- Competency with modern CI/CD tooling like Kubernetes, Terraform, Ansible & Jenkins
- Experience with full life cycle support of services, from creation to production support
- Versed in Infrastructure as Code practices using technologies like Terraform or CloudFormation
- Ability to author production-ready code in at least one of the following: Java, Scala, or Go.
- Experience with Linux systems & at home on the command line
- Understand & apply modern approaches to cloud-native software security
- Experience with agile frameworks, such as Scrum & Kanban, & with operating within them to continually deliver value.
- Flexible & willing to step into new roles & responsibilities
- Willingness to learn & use Sumo Logic products for solving reliability & security issues
- Bachelor's or Master's degree in Computer Science, Electrical Engineering, or another scientific or technical discipline
- 6+ years of industry experience.
Desirable Skills
- Experience using Sumo Logic products or other observability products for reliability & security
- Experience with planet-scale product development
- Running & operating SaaS products on AWS Cloud with expert-level proficiency
- Experience with streaming technologies like Kafka, Kafka Streams, or KSQL
- Expert-level experience in one or more of: Java, Go, Scala, or Python
- Expert-level experience in one or more of: Terraform, Jenkins, Kubernetes
- Extensive experience running & tuning JVM workloads at scale
About Us
Sumo Logic (NASDAQ: SUMO) empowers the people who power modern, digital business. Through its SaaS analytics platform, Sumo Logic enables customers to deliver reliable & secure cloud-native applications. The Sumo Logic Continuous Intelligence Platform helps practitioners & developers ensure application reliability, secure & protect against modern security threats, & gain insights into their cloud infrastructures. Customers around the world rely on Sumo Logic to get powerful real-time analytics & insights across observability & security solutions for their cloud-native applications.
Other Details:
- Competitive base salary + bonus + RSUs
- Unlimited PTO + 12 company holidays + 4 quarterly wellness days
- 100% remote or in office: your choice
- Employee stock purchase plan (ESPP)
- Medical, Dental, Vision
- Paid parental leave