Project Description

The goal of this long-term project was to migrate the data warehouse from PostgreSQL to the Apache Hadoop stack. As part of this work, an ETL process covering more than 30 data sources, including Google Analytics 360, the Facebook API, Mixpanel, and Webtrekk, was maintained. This made it possible to extract and calculate business-critical key performance indicators across data source boundaries.

Technologies

Python, Scala, Apache Spark, Pentaho, Apache Hadoop, PostgreSQL

Timeframe

07/2015 - 06/2016