Our Data Engineering team builds & maintains a secure, scalable, flexible & user-friendly analytics hub that allows us to make informed & data-driven decisions. They also construct & curate business-critical data sets that allow us to realize the value of all the data we collect.
A Data Engineer utilizes a multidisciplinary approach to providing ETL solutions for the business, combining technical, analytical, & domain knowledge. The perfect applicant for this role has strong development skills, experience transforming & profiling data to determine risks associated with proposed analytics solutions, a willingness to continually interface with analysts in order to determine an optimal approach, & an eagerness to explore data sources to understand the availability, utility, & integrity of our data.
What you'll own:
Data pipeline / ETL development:
- Building & enhancing data curation pipelines using tools like SQL, Python, Glue, Spark & other AWS technologies
- Focusing on data curation on top of data lake data to produce trusted datasets for analytics teams
- Processing & cleansing data from a variety of sources to transform collected data into an accessible & curated state for Analysts & Data Scientists
- Migrating self-serve data pipelines to centrally managed ETL pipelines
- Advanced SQL development & performance tuning
- Some exposure to Spark, Glue or other distributed processing frameworks helpful
- Work with business data stewards & analytics team to research & identify data quality issues to be resolved in the curation process
- Design & build master dimensions to support analytic data requirements
- Replacing legacy data structures with new datasets sourced from streaming data feeds from the core product & other operational systems
- Design, build & support pipelines to deliver business-critical datasets
- Resolve complex data design issues & provide optimal solutions that meet business requirements & benefit system performance
Query Engine Expertise & Performance Tuning:
- Assist Analytics teams with tuning efforts
- Curated dataset design for performance
- Management of job scheduling
- Dependency management mapping & support
- Documentation of issue resolution procedures
- Design & management of data access controls mapped to curated datasets
Leveraging DevOps best practices, such as IaC & CI/CD, to build upon a scalable & extensible data environment
Experience you'll need:
- Strong experience designing & building end-to-end data pipelines
- Extensive SQL development experience
- Knowledge of data management fundamentals & data storage principles
- Dimensional/OLAP design & data warehousing
- Master data management patterns
- Modeling trade-offs impacting data management & processing/query performance
- Knowledge of distributed systems as they pertain to data storage, data processing & querying
- Extensive experience in ETL & DB performance tuning
- Hands-on experience with a scripting language (Python, bash, etc.)
- Some experience with Hadoop, Spark, Kafka, Impala, or other big data technologies helpful
Familiarity with the technology stacks available for:
- Metadata management: Data Governance, Data Quality, MDM, Lineage, Data Catalog, etc.
- Data management, data processing & curation: Postgres, Hadoop, Hive, Impala, Presto, Spark, Glue, etc.
Experience in data modeling for batch processing & streaming data feeds; structured & unstructured data
Experience in data security / access management, data cataloging & overall data environment management
Experience with cloud services such as AWS & APIs helpful
You'd be a great fit if your current track record looks like this:
- 5+ years of progressive experience in data engineering & data warehousing
- Experience with a variety of data management platforms (e.g. RDBMS (Postgres), Hadoop (CDH, EMR))
- Experience with high-performance query engines (Hive, Impala, Presto, Athena, MPP engines like Redshift)
- Strong capability to manipulate & analyze complex, high-volume data from a variety of sources
- Effective communication skills with technical team members as well as business partners. Able to distill complex ideas into straightforward language
- Ability to problem solve independently & prioritize work based on the anticipated business value
Working at Pluralsight
Founded in 2004 & trusted by Fortune 500 companies, Pluralsight is the technology skills platform organizations & individuals in 150+ countries count on to create progress for the world.
Our platform helps technologists master their craft & take control of their careers. We empower businesses everywhere to build adaptable teams, speed up release cycles & become scalable, reliable & secure. We come to work every day knowing we're helping our customers build the skills that power innovation.
And we don't let fear, egos or drama distract us from our mission. Our mission to democratize technology skills is what drives us & our values are at the helm of how we work together. It's our commitment to practicing them day in, day out that enables our performance. We're adults, & we treat each other that way. We have the autonomy to do our jobs & the transparency to eliminate office politics, & we trust each other to do the right thing. We thrive in an environment with creativity around every corner, challenges that keep us on our toes, & peers who inspire us to be the best we can be. We bring different viewpoints, backgrounds & experiences, & united by our mission, we are one.
Bring yourself. Pluralsight is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity or expression, sexual orientation, national origin, genetics, disability, age or veteran status.