- You will build large-scale data pipelines with data processing frameworks such as Scio and Spark on Google Cloud Platform, using Scala, Python and SQL
- Write testable, efficient and reusable code while applying standard methodologies in continuous integration and delivery
- Help drive optimization, testing, monitoring and tooling to improve data quality
- Collaborate with other Software Engineers, Data Scientists and other partners, tackling learning and leadership opportunities that will arise every single day
- Contribute to modeling, crafting and maintaining data solutions to improve usability and accessibility to data

# Requirements

- You have some Data Engineering experience and you know how to work with high-volume, heterogeneous data, preferably with distributed systems such as BigTable or Cassandra in cloud environments like GCP or AWS
- You have experience with one or more higher-level JVM-based data processing frameworks such as Beam, Dataflow, Crunch, Scalding, Storm, Spark, Flink, or something we didn't list, but not just BigQuery, Pig, Hive or other SQL-like abstractions
- You have worked with Kubernetes and Docker, as well as Luigi, Airflow, or similar tools
- You are familiar with data modelling concepts (conceptualizing schemas, using top-down and bottom-up methodologies) and are able to empathize with users trying to understand how to find and use data
- You are passionate about crafting clean code and have experience in coding and building data pipelines while being pragmatic and understanding the tradeoffs between the perfect solution and a working solution
- You care about agile software processes, reliability, and responsible experimentation
- You understand the value of collaboration and partnership within teams and are comfortable working both independently and collaboratively (pairing and mobbing

