Intelligent Retail Lab (IRL) is part of Walmart's Store No.8, an innovation hub formed by the world's largest retailer focused on identifying & investing in trends & technologies reshaping the shopping experience.
IRL's mission is to revolutionize in-store experiences, leveraging emerging technology to help define & deliver on evolving customer expectations. Its success requires a cross-functional, mission-based team that is highly entrepreneurial, collaborative & passionate about solving unsolved problems. As Walmart's Applied Artificial Intelligence incubator, we work with the bleeding edge of technology to define the future of retail shopping.
As a Site Reliability Engineer, you will work with a range of technologies that power the platform & development of IRL. Through standards, best practices & choosing the right technologies, we aim to maintain a robust environment capable of moving at the speed IRL requires to remain at the forefront of innovation.
What you'll do:
- Work as a part of a team developing production-ready applied artificial intelligence software & massively scalable distributed systems.
- Work with the global Systems & Infrastructure Platform team to implement organizational practices for software development, CI, CD, containerization, & Kubernetes operations.
- Work with your development team & the Systems & Infrastructure Platform team to analyze software & system performance & optimization opportunities.
- Work with the Systems & Infrastructure Platform team to provide real-world usage & failure scenarios in order to continually improve the reliability & stability of AI systems running in real-world environments.
- Participate in an on-call rotation to ensure site availability & reliability.
Skills & Experience Required:
- 2+ years of experience in a technical support, DevOps, Systems Administration, or SRE position.
- BS in Computer Science or a similar field is desired.
- Ability to distill complex technical challenges to actionable & explainable decisions in a fast-paced CI/CD environment.
- Comfortable working in a variety of programming or scripting languages (primarily Python, TypeScript, & Rust).
- Experience with creation & maintenance of CI/CD systems (for example, Azure DevOps) is desirable.
- Experience with Apache Kafka or other stream-processing platforms is desirable.
- Experience working with Microsoft Azure and/or Google Cloud Platform is desirable.
- A desire to learn, improve, & help solve unsolved problems is a must.