Our client, a top consulting firm, is looking for a Python expert.
The consultant will be responsible for building, optimizing, and maintaining data pipelines into and within an RL-based personalization engine environment (e.g., creating API connections between the RL-based personalization engine and other systems).
More details will be provided by the client during discussions.
• Strong code development practices in Python >= 3.7, with a high degree of rigor and high code standards
• Experience in quality-assuring data engineering code, e.g., by reviewing pull requests
• Strong capabilities in data management using:
– Relational methods/systems (SQL)
– Object storage/big data approaches (AWS S3, HDFS, Azure Data Lake)
– Distributed computing frameworks (such as Apache Spark)
• Strong capabilities in data storage layer design, in both the physical sense (data asset organization, data type choice, data compression, data formats) and the logical sense (data cardinality and normal forms, primary/foreign-key relationships, integrity constraints)
• Strong capabilities in PySpark, covering data management and performance tuning, at data scales above 1 TB
• Experienced in code versioning and release management with Git, e.g., following the Gitflow approach
• Experienced in unit testing and static code analysis/linting, using e.g. pytest, flake8, black, and isort
• Hands-on experience with a workflow orchestrator, preferably Apache Airflow
• Basic knowledge of DevOps/cloud-native approaches: minimally Docker, ideally also Kubernetes and Terraform
• Experienced in Python-based machine learning using scikit-learn, preferably also Spark ML