Senior Software Engineer I at DigitalOcean in Austin

Senior Software Engineer I
DigitalOcean // cloud infrastructure management

We want people who are passionate about building apps that they & their peers will love. As a Software Engineer on the Insights team at DigitalOcean, you will write software that powers the company-wide observability platform used by engineering teams across the organization. You will help define the next generation of cloud services & create flexible & powerful observability solutions that our internal teams will leverage to support our growing customer base. You will make developers' lives (both inside DO & in the 'wild') easier by building new systems & improving the efficiency & performance of existing ones.

The Insights team is responsible for building & operating DigitalOcean's internal observability platform, handling metrics, traces, & logs at scale.

If your passion is building reliable, scalable observability systems that teams rely on every day, DigitalOcean is the right place for you.

What You'll Be Doing:

Develop & maintain our Insights observability platform, including metrics collection, storage, visualization, & alerting systems.
Own & drive initiatives independently from concept to production, with minimal oversight.
Engineer solutions to meet both internal teams' & customers' observability needs.
Create scalable services that are performant & highly reliable in a distributed environment.
Take part in an on-call rotation & lead incident response efforts when needed.
Operate complex distributed systems at scale with high reliability objectives.
Maintain & improve our observability platform with a focus on enhancing reliability & performance.
Mentor teammates & transfer knowledge through design docs, pairing sessions, & code reviews.
Technologies we use: Kubernetes, Go, gRPC, MySQL, Redis, Kafka, Prometheus, Grafana, OpenTelemetry, & others

What You'll Add to DigitalOcean:

Language: Strong proficiency in Go with at least 3 years of experience in designing, building, & shipping production-grade Go applications (required, not just desired).
Proven Site Reliability Engineering (SRE) background with experience in implementing & maintaining reliable, scalable systems.
Hands-on experience operating & managing Kafka clusters at scale.
Extensive experience in designing, building, & running distributed systems in production environments.
Demonstrable experience with observability platforms & tools (e.g., Prometheus, Grafana, OpenTelemetry).
Familiarity with SLIs/SLOs & incident response best practices.
Experience in one or more of the following areas:

Distributed databases such as MongoDB, Redis, MySQL, or PostgreSQL
Fully managed infrastructure solutions
Serverless components
Kubernetes
Containers & Container Registries

Demonstrated ability to navigate the complexity of distributed systems to operate them in production.
Strong ability to contribute meaningfully to discussions on architectures, implementations, design patterns, & processes with the ability to succinctly convey ideas to peers & mentees.
Effective knowledge transfer skills & ability to mentor team members.
Experience in Agile software development methodologies.
Extensive experience working within a microservice architecture, with deep knowledge of both asynchronous, event-driven processing (particularly Kafka) & synchronous gRPC/HTTP-based requests.
Comfortable with rapid execution, learning from failure, & building for scale & reliability.
Experience working effectively on teams that operate across multiple time zones.

Why You'll Like Working for DigitalOcean

We innovate with purpose. You'll be a part of a cutting-edge technology company with an upward trajectory that is proud to simplify cloud & AI so builders can spend more time creating software that changes the world. As a member of the team, you will be a Shark who thinks big, bold, & scrappy, like an owner with a bias for action & a powerful sense of responsibility for customers, products, employees, & decisions.
We prioritize career development. At DO, you'll do the best work of your career. You will work with some of the smartest & most interesting people in the industry. We are a high-performance organization that will always challenge you to think big. Our organizational development team will provide you with resources to ensure you keep growing. We provide employees with reimbursement for relevant conferences, training, & education. All employees have access to LinkedIn Learning's 10,000+ courses to support their continued growth & development.
We care about your well-being. Regardless of your location, we will provide you with a competitive array of benefits to support you, from our Employee Assistance Program to Local Employee Meetups to a flexible time-off policy, to name a few. While the philosophy around our benefits is the same worldwide, specific benefits may vary based on local regulations & preferences.
We reward our employees. The salary range for this position is $120,000 - $142,700 based on market data, relevant years of experience, & skills. You may qualify for a bonus in addition to base salary; bonus amounts are determined based on company & individual performance. We also provide equity compensation to eligible employees, including equity grants upon hire & the option to participate in our Employee Stock Purchase Program.
We value diversity & inclusion. We are an equal-opportunity employer, & recognize that diversity of thought & background builds stronger teams & products to serve our customers. We approach diversity & inclusion seriously & thoughtfully. We do not discriminate on the basis of race, religion, color, ancestry, national origin, caste, sex, sexual orientation, gender, gender identity or expression, age, disability, medical condition, pregnancy, genetic makeup, marital status, or military service.

*This is a remote role
