PulsePoint // programmatic targeting, distribution & content marketing
 
Engineering, Full Time    New York    Posted: Wednesday, May 27, 2020
 
   
 
JOB DETAILS
 

The PulsePoint Data Engineering team plays a key role in our technology company, which is experiencing exponential growth. Our data pipeline processes over 80 billion impressions a day (>20 TB of data, 220 TB uncompressed). This data is used to generate reports, update budgets, & drive our optimization engines. We do all this while running against extremely tight SLAs, providing stats & reports as close to real-time as possible.

The most exciting part about working at PulsePoint is the enormous potential for personal & professional growth. We are always seeking new & better tools to help us meet challenges, such as adopting proven open-source technologies to make our data infrastructure more nimble, scalable & robust. Some of the cutting-edge technologies we have recently implemented are Kafka, Spark Streaming, Presto, Airflow, & Kubernetes.

What you'll be doing:

  • Design, build & maintain reliable & scalable enterprise-level distributed transactional data processing systems for scaling the existing business & supporting new business initiatives
  • Optimize jobs to utilize Kafka, Hadoop, Presto, Spark Streaming & Kubernetes resources in the most efficient way
  • Monitor & provide transparency into data quality across systems (accuracy, consistency, completeness, etc.)
  • Increase accessibility & effectiveness of data (work with analysts, data scientists, & developers to build/deploy tools & datasets that fit their use cases)
  • Collaborate within a small team with diverse technology backgrounds
  • Provide mentorship & guidance to junior team members

Team Responsibilities:

  • Installation, upkeep, maintenance & monitoring of Kafka, Hadoop, Presto, RDBMS
  • Ingest, validate & process internal & third party data
  • Create, maintain & monitor data flows in Hive, SQL & Presto for consistency, accuracy & lag time
  • Maintain & enhance framework for jobs (primarily aggregate jobs in Hive)
  • Create different consumers for data in Kafka using Spark Streaming for near-real-time aggregation (see the sketch after this list)
  • Train Developers/Analysts on tools to pull data
  • Tool evaluation/selection/implementation
  • Backups/Retention/High Availability/Capacity Planning
  • Review/Approval - DDL for databases, Hive framework jobs & Spark Streaming jobs, to make sure they meet our standards
  • 24x7 on-call rotation for production support
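
One of the bullets above mentions building Kafka consumers with Spark Streaming for near-real-time aggregation. Below is a minimal sketch of what such a consumer could look like, written with PySpark Structured Streaming; the broker address, topic name, & impression schema are illustrative assumptions, not PulsePoint's actual setup.

  # Minimal Kafka -> windowed-aggregation sketch (PySpark Structured Streaming).
  # Assumes the spark-sql-kafka connector package is on the classpath.
  from pyspark.sql import SparkSession
  from pyspark.sql.functions import col, from_json, window
  from pyspark.sql.types import LongType, StringType, StructField, StructType

  spark = SparkSession.builder.appName("impression-rollup").getOrCreate()

  # Hypothetical impression event schema
  schema = StructType([
      StructField("campaign_id", StringType()),
      StructField("impressions", LongType()),
      StructField("event_time", StringType()),
  ])

  raw = (spark.readStream
         .format("kafka")
         .option("kafka.bootstrap.servers", "kafka:9092")  # placeholder brokers
         .option("subscribe", "impressions")               # hypothetical topic
         .load())

  events = (raw.selectExpr("CAST(value AS STRING) AS json")
            .select(from_json(col("json"), schema).alias("e"))
            .select("e.*")
            .withColumn("event_time", col("event_time").cast("timestamp")))

  # One-minute tumbling-window roll-up per campaign, tolerating 5 minutes of late data
  rollup = (events
            .withWatermark("event_time", "5 minutes")
            .groupBy(window(col("event_time"), "1 minute"), col("campaign_id"))
            .sum("impressions"))

  (rollup.writeStream
   .outputMode("update")
   .format("console")  # swap for a real sink (Kafka, HDFS, etc.) in practice
   .start()
   .awaitTermination())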

Technologies We Use:

  • Airflow - job scheduling (see the DAG sketch after this list)
  • Docker - packaged container images with all dependencies
  • Graphite/Beacon - monitoring data flows
  • Hive - SQL data warehouse layer for data in HDFS
  • Impala - faster SQL layer on top of Hive
  • Kafka - distributed commit-log storage
  • Kubernetes - distributed cluster resource manager
  • Presto - fast parallel data warehouse & data federation layer
  • Spark Streaming - near-real-time aggregation
  • SQL Server - reliable OLTP RDBMS
  • Sqoop - import/export data to & from RDBMS
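
Since Airflow handles job scheduling here, a minimal DAG sketch is included below to give a feel for how a recurring pipeline might be wired up; the DAG id, task commands, & script/HQL file names are hypothetical, not actual PulsePoint jobs.

  # Hypothetical hourly pipeline: validate landing data, then run a Hive aggregate job.
  from datetime import datetime, timedelta

  from airflow import DAG
  from airflow.operators.bash_operator import BashOperator

  default_args = {
      "owner": "data-eng",
      "retries": 2,
      "retry_delay": timedelta(minutes=5),
  }

  with DAG(
      dag_id="hourly_impression_rollup",
      default_args=default_args,
      start_date=datetime(2020, 5, 1),
      schedule_interval="@hourly",
      catchup=False,
  ) as dag:
      validate = BashOperator(
          task_id="validate_landing_data",
          bash_command="python validate_landing.py --hour '{{ ts }}'",  # hypothetical script
      )
      aggregate = BashOperator(
          task_id="run_hive_aggregate",
          bash_command="hive -f aggregate_impressions.hql",  # hypothetical HQL job
      )

      # Only run the aggregate once the hour's landing data has been validated
      validate >> aggregate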

Required Skills:

  • BA/BS degree in Computer Science or related field
  • 5+ years of software engineering experience
  • Knowledge of & exposure to distributed production systems (e.g., Hadoop) is a huge plus
  • Knowledge of & exposure to cloud migration is a plus
  • Proficiency in Linux
  • Fluency in Python; experience in Scala/Java is a huge plus
  • Strong understanding of RDBMS & SQL
  • Passion for engineering & computer science around data
  • Willingness to participate in 24x7 on-call rotation
 
 
 