Pluralsight
developer training by experts
 
Engineering, Full Time       Posted: Thursday, August 08, 2019
 
   
 
JOB DETAILS
  Job Description

Our Data Engineering team builds & maintains a secure, scalable, flexible & user-friendly analytics hub that allows us to make informed & data-driven decisions. They also construct & curate business-critical data sets that allow us to realize the value of all the data we collect.

A Data Engineer utilizes a multidisciplinary approach to providing ETL solutions for the business, combining technical, analytical, & domain knowledge. The perfect applicant for this role has strong development skills, experience transforming & profiling data to determine risks associated with proposed analytics solutions, a willingness to continually interface with analysts in order to determine an optimal approach, & an eagerness to explore data sources to understand the availability, utility, & integrity of our data.

What you'll own:

Data pipeline / ETL development:

  • Building & enhancing data curation pipelines using tools like SQL, Python, Glue, Spark & other AWS technologies
  • Focusing on curation of data lake data to produce trusted datasets for analytics teams

Data Curation:

  • Processing & cleansing data from a variety of sources to transform collected data into an accessible & curated state for Analysts & Data Scientists
  • Migrating self-serve data pipelines to centrally managed ETL pipelines
  • Advanced SQL development & performance tuning
  • Some exposure to Spark, Glue or other distributed processing frameworks helpful
  • Work with business data stewards & analytics team to research & identify data quality issues to be resolved in the curation process

Data Modeling:

  • Design & build master dimensions to support analytic data requirements
  • Replacing legacy data structures with new datasets sourced from streaming data feeds from the core product & other operational systems
  • Design, build & support pipelines to deliver business critical datasets
  • Resolve complex data design issues & provide optimal solutions that meet business requirements & benefit system performance

Query Engine Expertise & Performance Tuning:

  • Assist Analytics teams with tuning efforts
  • Curated dataset design for performance

Orchestration:

  • Management of job scheduling
  • Dependency management mapping & support
  • Documentation of issue resolution procedures

Data Access:

  • Design & management of data access controls mapped to curated datasets

Leveraging DevOps best practices, such as IaC & CI/CD, to build upon a scalable & extensible data environment

Experience you'll need:

  • Strong experience designing & building end-to-end data pipelines
  • Extensive SQL development experience
  • Knowledge of data management fundamentals & data storage principles

Data modeling:

  • Normalization
  • Dimensional/OLAP design & data warehousing
  • Master data management patterns
  • Modeling trade-offs impacting data management & processing/query performance
  • Knowledge of distributed systems as they pertain to data storage, data processing & querying
  • Extensive experience in ETL & DB performance tuning
  • Hands-on experience with a scripting language (Python, bash, etc.)
  • Some experience with Hadoop, Spark, Kafka, Impala, or other big data technologies helpful

Familiarity with the technology stacks available for:

  • Metadata management: Data Governance, Data Quality, MDM, Lineage, Data Catalog, etc.
  • Data management, data processing & curation: Postgres, Hadoop, Hive, Impala, Presto, Spark, Glue, etc.

Experience in data modeling for batch processing & streaming data feeds; structured & unstructured data

Experience in data security / access management, data cataloging & overall data environment management

Experience with cloud services such as AWS & APIs helpful

You'd be a great fit if your current track record looks like this:

  • 5+ years of progressive experience in data engineering & data warehousing
  • Experience with a variety of data management platforms (e.g. RDBMS (Postgres), Hadoop (CDH, EMR))
  • Experience with high-performance query engines (Hive, Impala, Presto, Athena, MPP engines like Redshift)
  • Strong capability to manipulate & analyze complex, high-volume data from a variety of sources
  • Effective communication skills with technical team members as well as business partners, including the ability to distill complex ideas into straightforward language
  • Ability to problem solve independently & prioritize work based on the anticipated business value

Additional Information

All your information will be kept confidential according to EEO guidelines.

 
 
 