Mission Description:
Advanced Analytics and AI are high on the agenda at our client and we are looking to strengthen our internal team of AI experts.
In this regard, we are searching for an outstanding data engineer proficient in Python and Spark. Your role will be crucial in advancing the development of analytics workflows to generate insights, implement prescriptive analytics, and create decision support applications. The initial focus of this position will be data engineering within the pharmaceutical manufacturing sector. As part of our team, you will collaborate with other accomplished professionals and contribute to cutting-edge projects.
Responsibilities:
Develop and operate data pipelines processing large, complex datasets as input for analytics and machine learning
Help to define the analytical scope and data for projects, including investigating data sources, designing new features and data integration flows.
Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
Create data tools for analytics and data scientist team members that assist them in building and optimizing their results.
Use a diverse array of technologies and data science toolsets as needed, primarily Python, Spark and Pandas, but also Jupyter, Denodo, Azure ML, Azure DevOps, Docker, Databricks, Git, SQL, …
Communicate ideas, approaches and results with peers and stakeholders
Requirements:
Mastery of Python, Spark and Pandas to create ETL pipelines for data scientists to use; knowledge of one or more data pipeline frameworks is a plus
At least 3 years of intensive hands-on experience as a full-stack Python data engineer: Python, Spark, Pandas, NumPy, SciPy, visualization (Matplotlib), data pipeline orchestration (e.g. Kedro)
Good knowledge of and experience with version control systems (Git)
Good knowledge of and experience with databases
Experience in extracting, cleaning, preparing and modeling data. Experience with command-line scripting, data structures, and algorithms.
Advanced degree in a relevant discipline such as Statistics, Applied Mathematics, Operations Research/Optimization, Computer Science, Computational/Theoretical Physics, Data Science/Visualization, Machine Learning, Electrical/Computer Engineering or Health Sciences (e.g. Bioengineering/Bioinformatics)
Ability to work across structured, semi-structured, and unstructured data
Strong presentation and communication skills with fellow data engineers, data scientists, and non-technical stakeholders.
Ability to work individually and in teams (agile).
Experience with the healthcare / pharmaceutical industry is a plus