Sr. Site Reliability Engineer II at DoubleVerify in NYC

Sr. Site Reliability Engineer II
DoubleVerify // digital media measurement software & analytics

Location: New York, NY - 3 days per week on-site (Required)

Who We Are

DoubleVerify (DV) is a leading independent provider of marketing measurement software, data, & analytics. We authenticate the quality & effectiveness of digital media for the world's largest brands & media platforms, ensuring media transparency & accountability. Since 2008, DV has empowered hundreds of Fortune 500 companies to maximize their media investments by delivering best-in-class solutions across the digital ecosystem, contributing to a stronger, safer, & more secure digital advertising industry. Learn more at www.doubleverify.com.

Position Overview

As a Senior Site Reliability Engineer (SRE) at DoubleVerify, you will play a critical role in building & scaling our SRE team. This dual-role position requires both hands-on technical expertise & a passion for mentoring & educating team members. You will be responsible for implementing & promoting SRE best practices, including the development & monitoring of Service Level Indicators (SLIs), Service Level Objectives (SLOs), & Service Level Agreements (SLAs). Your contributions will ensure the reliability, scalability, & performance of our digital media measurement platforms, directly impacting our mission of delivering media transparency & accountability.

Responsibilities

Team Development & Mentorship: Build & grow the SRE team by recruiting, mentoring, & educating team members on SRE principles, promoting a culture of reliability & automation.

Technical Contributor: Contribute directly to the design, implementation, & maintenance of highly available infrastructure & services, with a focus on automation to minimize manual intervention.

SLA/SLO/SLI Management: Define, monitor, & report on SLIs, SLOs, & SLAs to ensure alignment with business objectives & user expectations. Use these metrics to drive reliability improvements & guide decision-making.

Incident Management & Response: Develop & implement robust incident response processes, including on-call rotations & post-incident reviews, to minimize downtime & prevent recurrence.

Collaboration & Communication: Partner closely with development, operations, & product teams to integrate reliability into the software development lifecycle, promoting cross-functional collaboration.

Continuous Improvement: Analyze system performance data to identify areas for improvement, implementing solutions to enhance reliability, scalability, & efficiency.

Requirements

Experience: 5+ years in site reliability engineering, DevOps, or a related field, with experience mentoring & educating other engineers.

Technical Proficiency: Expertise in Linux/Unix systems administration, cloud platforms (AWS, GCP, or Azure), & container orchestration tools like Kubernetes.

Programming Skills: Proficiency in scripting & programming languages such as Python, Go, or Bash for automation & tool development.

Monitoring & Observability: Experience with monitoring & logging tools such as Prometheus, Grafana, Splunk, or Nagios. Proven ability to develop & track SLIs, SLOs, & SLAs.

Automation & Infrastructure as Code: Hands-on experience automating infrastructure & deployments using tools like Terraform, Ansible, or Chef.

Communication & Mentorship: Strong verbal & written communication skills, with a passion for mentoring & educating team members on technical concepts & SRE best practices.

Problem-Solving Aptitude: Exceptional analytical skills with a proactive approach to identifying & resolving system issues.

Team Collaboration: Ability to work both independently & collaboratively within a team environment.

Preferred Qualifications

Advanced Education: Bachelor's or Master's degree in Computer Science, Engineering, or a related field.

Certifications: Relevant industry certifications such as AWS Certified DevOps Engineer, Google Professional Cloud DevOps Engineer, or Certified Kubernetes Administrator (CKA).

Security Awareness: Familiarity with security best practices in cloud & containerized environments.

Configuration Management: Experience with infrastructure as code & configuration management tools like Terraform, Ansible, or Chef.

Why Join Us

At DoubleVerify, we are committed to fostering an inclusive & dynamic workplace where employees can bring their authentic selves to work. We value passion, accountability, collaboration, & innovation, believing that diverse perspectives drive better business outcomes. Join us to contribute to a mission-driven company dedicated to enhancing the digital advertising ecosystem.

DoubleVerify is an Equal Opportunity Employer. We celebrate diversity & are committed to creating an inclusive environment for all employees.

The successful candidate's starting salary will be determined based on a number of non-discriminatory factors, including qualifications for the role, level, skills, experience, location, & balancing internal equity relative to peers at DV.
The estimated salary range for this role, based on the qualifications set forth in the job description, is $107,000 to $231,000. This role will also be eligible for bonus/commission (as applicable), equity, & benefits.
The range above reflects the expectations laid out in the job description; however, we are often open to a wide variety of profiles & recognize that the person we hire may be more or less experienced than the job description as posted.

Not-so-fun fact: Research shows that while men apply to jobs when they meet an average of 60% of job criteria, women & other marginalized groups tend to apply only when they check every box. So if you think you have what it takes but you're not sure that you check every box, apply anyway!