WHY BOX NEEDS YOU
The Observability Platforms team provides an end-to-end experience that enables Box engineers to better understand the behavior of the features, services, & infrastructure they own & maintain, using frameworks, tools, APIs & visualizations. The team also educates product, infrastructure, & systems teams on how to appropriately monitor the features & services they own, provides visualizations for monitoring distributed systems, gives guidance for reducing operational overhead, & supports the delivery of unmatched availability to our customers.
We need a Sr. SRE with experience designing, implementing, & operating Observability frameworks at very large scale, & who is well versed in the operation of scaled architectures. You should have deep operational knowledge of distributed systems & how to avoid limitations through innovative design.
The main focus of the Observability Team is to build frameworks & systems that can manage the performance of Box systems while scaling to billions of events per second. Additionally, we are responsible for standardizing observability across engineering teams, driving designs for high-performing services & fostering great observability practices. We build, scale, & operate low-latency, high-throughput data systems that power the high resiliency of Box systems. You will help us execute on this vision & ensure that Box continues to ship scalable services that meet our customers' high performance expectations.
We are looking for big thinkers & innovators who have experience working with scalable distributed systems & have a passion for high performance & reliability. We are a small team with big ambitions that values impact & is not afraid of huge, gnarly problems. If this excites you, come join us!
WHAT YOU'LL DO
- You're going to have the unique opportunity to build, improve, & support our Observability (o11y) platform. You will get to work with cutting-edge technologies that are defining the future of Box's cloud platforms. You will have visibility & impact across all of Engineering.
- Provide o11y products like ELK, Splunk, Sensu, Prometheus, AppDynamics, Dynatrace, etc. to engineering teams for centralized logging, APM tooling, monitoring & alerting, & distributed tracing.
- You'll collaborate with other engineers on the team to foster solid engineering principles & represent our engineering values.
- As a senior member of the team, you'll use both technical & relational skills to lead large-scale projects to completion.
- Manage, maintain & scale the infrastructure behind the telemetry frameworks used throughout Box's infrastructure, cloud services, & products to capture, transport, store & analyze telemetry data. Scale the observability infrastructure to support petabytes of logs & billions of metric data points daily.
- You'll collaborate, influence & drive for improvement across scrum teams.
- You'll provide additional support & run proofs of concept (POCs) on new projects & frameworks for Observability.
- Define observability best practices from an SRE perspective & educate platform consumers on them.
- Participate in deep technical design discussions within your team & across partner teams, & ensure that we're building the right systems.
WHO YOU ARE
- You take an SRE-centric approach to everything you build or manage, ensuring reliability, availability & security.
- You have 7+ years of experience in a DevOps type of role.
- You act like an owner & strive to do work you're proud of, both technically & in your team interactions.
- You are a self-starter & a strong supporter of self-service & automation within O11y (Observability).
- Deep knowledge of operating system fundamentals (Linux) & core internet technologies, including TCP/IP, DNS, NAT, SDN.
- Proven production service troubleshooting skills that span applications, systems & network within a primarily Linux environment.
- Solid understanding of infrastructure automation tools (Puppet, Ansible, or the like).
- Experience using industry-standard DevOps CI/CD frameworks (Jenkins, Spinnaker, or the like).
- Solid experience building automation & frameworks, preferably with Python & Go.
- Experience running containerized services in Private/Public Cloud (GCP, AWS).
- Experience building & managing metrics- & data-driven observability platforms & peripherals.
- Experience in managing O11y (Observability) is a plus.
- You have a fair understanding of technologies like Elasticsearch, Apache Storm or other DAG technologies, & streaming technologies like Kafka, Pub/Sub, or Kinesis.
- You have built distributed, high-throughput & low-latency systems with a strong focus on availability, resilience, & durability.
- Remote Friendly
BENEFITS
EQUAL OPPORTUNITY
We are an equal opportunity employer & value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.