Custora exists to help our customers improve the relationships they have with their own customers. We do this by ingesting data about every interaction a company has with each of their customers & then making predictions using that data about how those customers will behave in the future. Our customers then use these predictions to tailor their communications.
Data engineers at Custora work on the pipelines that sit at the core of this architecture. We work to make these pipelines faster, more fault tolerant & to expand their scope.
The volume of data is large: we're working with 7 of the top 20 largest retailers in the world (+ many more not in the top 20), & are ingesting data from them both in a regular batch & in near-real time.
We've carefully selected the types of data to ingest to favor high-signal data, so we care deeply about maintaining the correctness & completeness of the data being ingested, as our models (and therefore the output of our product) depend on it.
Your work directly impacts both the predictions we are able to make & the day-to-day performance our customers experience when using our product.
Getting more specific, you will:
- Design & build complex data pipelines on the Spark platform, ingesting both batch & real time datasets
- Work with our data science team to deploy predictive models at scale
- Build tools to continuously validate incoming data & proactively identify & communicate data anomalies before they turn into problems
We're a small team, so you'll be working on (and be able to meaningfully contribute to) high-impact projects from your first day.
Sure, but what's it really like?
Inspired by Basecamp, we work in ~8-week product cycles. First, we work together (engineering + product) to identify the projects we think will have the biggest impact on our company goals. Here's an example of a recent project we conceptualized & delivered over one of these cycles:
- Migrate self-managed Spark cluster to EMR
To lower overall cost, & to be able to easily scale to handle bigger datasets & processing volumes, we recently switched from a self-managed Spark cluster to Amazon's Elastic MapReduce (EMR) service.
Our challenge was to move terabytes of data used by our clients with no downtime. Our initial concerns were performance on the EMR cluster & the migration process itself, because some clients ingest a combination of live data & batch-based data. The scale of our data meant that we had to switch several clients at a time to EMR.
After ensuring that Hive backed by S3 was performant enough, we built tools to move vast amounts of data in parallel, to redirect requests to the correct cluster as clients were being moved, & to validate the data after migration. Along the way, we also had to reshape the data (in terms of partition size) to ensure efficiency of copying & loading into the Hive database.
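To make the redirect & validate steps concrete, here is a minimal Python sketch of the idea, not our actual tooling. It assumes validation compares per-partition row counts & that a client's requests are only redirected once every partition matches. The names `MigrationRouter`, `validate_partitions`, `migrate_client` & the cluster labels `"legacy"` / `"emr"` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MigrationRouter:
    """Tracks which clients have been moved & routes requests accordingly."""
    migrated: set = field(default_factory=set)

    def mark_migrated(self, client_id: str) -> None:
        self.migrated.add(client_id)

    def cluster_for(self, client_id: str) -> str:
        # Hypothetical cluster labels, used for illustration only.
        return "emr" if client_id in self.migrated else "legacy"


def validate_partitions(source_counts: dict, target_counts: dict) -> list:
    """Return the sorted partition keys whose row counts differ,
    including partitions missing on either side."""
    keys = set(source_counts) | set(target_counts)
    return sorted(k for k in keys if source_counts.get(k) != target_counts.get(k))


def migrate_client(client_id: str, source_counts: dict,
                   target_counts: dict, router: MigrationRouter) -> list:
    """Redirect a client to the new cluster only if every partition validates.
    Returns the list of mismatched partitions (empty on success)."""
    mismatches = validate_partitions(source_counts, target_counts)
    if not mismatches:
        router.mark_migrated(client_id)
    return mismatches
```

In this sketch a client whose copy is incomplete simply stays on the old cluster, so a failed validation never causes downtime; the copy can be retried & re-validated later.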
While we make use of a wide variety of tools, our primary web stack is ES6/React & Ruby on Rails deployed on AWS. We make extensive use of R for statistical analysis, & our primary data stores are Hive, MySQL, & Redis.
What it's like to work here:
- On Monday we eat & meet as a team to chat projects & progress.
- We're 50 genuinely nice people; 25 men & 25 women. We work together & experiment with how to do things.
- We move quickly. You build something & the next day it comes to life. You see & feel an immediate impact with the collective efforts of the team.
- We're building a company & a team we love. We're in it for the long run.
- Read more about what makes us, us here.
- Find out more here or here.
Custora is an equal opportunity employer. We value diversity. We don't discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, marital status, veteran status, disability status, or socioeconomic status.
Requirements:
- 5 or more years of experience as a software engineer.
- Degree in Computer Science or a deep competency achieved via other means.
- Familiarity with Ruby on Rails, AWS, & SQL-based databases.
- High standards for code quality & maintainability.
Nice to Haves:
- Experience with R, Scala, Spark, and/or Chef.
- Consistent record of delivering significant features or building out platforms & services.
- Experience working in e-commerce.
Benefits:
- We're a flexible work environment
- Competitive salary & meaningful equity
- Health, dental & vision insurance (100% covered)
- Free lunch every day, plus free water, hot & cold!
- Unlimited vacation: take as much time as you need (we recommend at least 3 weeks)
- Monthly unlimited MetroCard