As a key member of our fleet performance team, you will optimize & enhance core system performance, ensuring our products consistently exceed customer expectations.
You will be responsible for ensuring performance metrics are both accurate & easily accessible to our users, empowering them with transparent insights into our systems' capabilities & efficiency. This commitment to data transparency & accuracy will be fundamental in building trust & driving informed decision-making across the organization.
What You'll Be Doing:
- Develop & implement comprehensive performance metrics, analysis tools, & reporting systems
- Lead initiatives to enhance shared infrastructure, balancing performance optimization with rigorous security standards
- Conduct in-depth performance analysis of the Linux kernel, virtualization layer, storage, & network stack to devise optimization strategies
- Identify system bottlenecks proactively & drive optimizations across the hypervisor software stack
- Work cross-functionally to harness new performance capabilities from evolving hardware architectures
- Enhance test frameworks & pipelines to ensure robust performance validation
- Investigate & resolve virtual machine downtime & performance issues in our production environment
- Participate in on-call rotations as needed to support system reliability
What You'll Add to DigitalOcean:
- Bachelor's or Master's degree in Computer Science, Mathematics, Statistics, or Computer/Electrical Engineering, or equivalent work experience
- Extensive knowledge of Linux kernel, hypervisors, & open-source operating systems
- 5+ years of experience with performance measurement tools such as profilers, eBPF, XDP, fio, TPC-C, MLPerf, & NCCL
- 5+ years developing strategies for managing, monitoring, & analyzing infrastructure, applications & services
- Strong proficiency in Go, Python, and/or Ruby
- Deep understanding of kernel performance aspects, including scheduling, context switching, & hardware acceleration
- Expertise in distributed systems performance, including tracing & debugging methodologies
- Demonstrated ability to solve complex problems at scale
- Excellent cross-team collaboration & communication skills
- Leadership experience in skills development & mentorship
- Professional-level written & spoken English with strong presentation abilities
Preferred Qualifications:
- Experience with observability platforms such as Splunk, Prometheus, Grafana, Elastic, or Dynatrace
- Experience with Chef, AWX, and/or Kubernetes
- Familiarity with x86_64 and/or ARM architectures
- Successful history of upstreaming Linux kernel patches
- In-depth knowledge of at least one Linux subsystem (CPU scheduling, memory management, file system, I/O, etc.)
- Experience in developing & deploying ML-based solutions for anomaly detection & dynamic load balancing
This role offers a unique opportunity to drive performance optimization at scale, leveraging cutting-edge technologies & AI/ML solutions to tackle complex infrastructure challenges. As our systems rapidly expand, you'll play a pivotal role in ensuring our architecture remains efficient, resilient, & future-ready.
Why You'll Like Working for DigitalOcean:
- We innovate with purpose. You'll be part of a cutting-edge technology company with an upward trajectory that is proud to simplify cloud & AI so builders can spend more time creating software that changes the world. As a member of the team, you will be a Shark who thinks big, bold, & scrappy, like an owner with a bias for action & a powerful sense of responsibility for customers, products, employees, & decisions.
- We prioritize career development. At DO, you'll do the best work of your career. You will work with some of the smartest & most interesting people in the industry. We are a high-performance organization that will always challenge you to think big. Our organizational development team will provide you with resources to ensure you keep growing. We provide employees with reimbursement for relevant conferences, training, & education. All employees have access to LinkedIn Learning's 10,000+ courses to support their continued growth & development.
- We care about your well-being. Regardless of your location, we will provide you with a competitive array of benefits to support your overall well-being, from a one-time work-from-home stipend to a wellness allowance to a flexible time-off policy, to name a few. While the philosophy around our benefits is the same worldwide, specific benefits may vary based on local regulations & preferences.
- We reward our employees. The salary range for this position is $165,000 to $210,000, based on market data, relevant years of experience, & skills. You may qualify for a bonus in addition to base salary; bonus amounts are determined based on company & individual performance. We also provide equity compensation to eligible employees, including equity grants upon hire & the option to participate in our Employee Stock Purchase Program.
- We value diversity & inclusion. We are an equal-opportunity employer, & recognize that diversity of thought & background builds stronger teams & products to serve our customers. We approach diversity & inclusion seriously & thoughtfully. We do not discriminate on the basis of race, religion, color, ancestry, national origin, caste, sex, sexual orientation, gender, gender identity or expression, age, disability, medical condition, pregnancy, genetic makeup, marital status, or military service.
*This is a remote role