At Lyft, we care deeply about delivering the best transportation experience for both drivers & passengers. The Networking team is responsible for all the network traffic to make the best ride possible, from our mobile app to our internal microservice architecture. This means providing the most reliable network seamlessly so that our engineers can build platforms that scale. This also means providing tooling to either make the network easy to understand or abstract the network completely.
As a Reliability Software Engineer embedded in the Networking team, you will build creative engineering solutions to operational problems. You will help operate one of the largest Envoy-based service meshes in the industry. Your job is to eliminate operational burdens through automation. Your job will be dynamic & different day-to-day.
- Build & deploy open-source Envoy to the entire fleet & create systems to make that process faster, iterative, & reliable
- Investigate how network configurations are being tuned & figure out how to set them automatically or abstract them away
- Proactively identify potential outages & build systems to triage & fix
- Be first responders to incidents & work with the rest of the team to ensure these incidents never happen again
- Figure out how to automate & reduce the operational burden on the service mesh running on Kubernetes
- Build & foster partnerships throughout the organization with a devotion to exceptional customer experience
- Never settle for the status quo; deliver operational excellence for Networking, Service Mesh & Edge
- Contribute to designing & building configuration, testing & deployment automation frameworks
- Be the first responder to incidents. Help triage, debug & pull in engineers to mitigate incidents & make our systems better
Note that these skills are not requirements. Even if you do not have any of them, we encourage you to apply if you are interested in the work or have other relevant experience.
- Experience working with and/or operating Envoy/Linkerd/Nginx or any other networking proxy
- Ability to eliminate manual operations through automation, with advanced skills in automation tooling
- Experience with monitoring & logging management products such as ELK, Wavefront, SignalFx, CloudWatch, StackDriver, etc.
- Experience debugging complex problems that span multiple systems & expertise in incident response methodologies, planning, testing, & execution
- Proficiency in high-level programming languages & scripting languages such as Golang & Python
- Strong cloud expertise (AWS, Azure, GCP, OCI)
- Familiarity with any networking discipline, such as load balancers, API gateways, DNS management, HTTP/2, gRPC, etc.
- Familiarity with container technology such as Docker & Kubernetes
- Hands-on experience implementing & maintaining configuration controls through infrastructure-as-code
- Great medical, dental, & vision insurance options
- In addition to 11 observed holidays, salaried team members have unlimited paid time off; hourly team members have 15 days of paid time off
- 401(k) plan to help save for your future
- 18 weeks of paid parental leave. Biological, adoptive, & foster parents are all eligible
- Monthly commuter subsidy to cover your transit to work & 20% off all Lyft rides
Lyft is an Equal Employment Opportunity employer that proudly pursues & hires a diverse workforce. Lyft does not make hiring or employment decisions on the basis of race, color, religion or religious belief, ethnic or national origin, nationality, sex, gender, gender-identity, sexual orientation, disability, age, military or veteran status, or any other basis protected by applicable local, state, or federal laws or prohibited by Company policy. Lyft also strives for a healthy & safe workplace & strictly prohibits harassment of any kind. Pursuant to the San Francisco Fair Chance Ordinance & other similar state laws & local ordinances, & its internal policy, Lyft will also consider for employment qualified applicants with arrest & conviction records.