Project Description
The goal for the long-term project was to migrate the data warehouse from PostgreSQL basis to the Apache Hadoop stack. To achieve all objectives, an ETL process with more than 30 data sources, incl. Google Analytics 360, Facebook API, Mixpanel, and Webtrekk were maintained. With this in mind, it became possible to extract and calculate business-critical key performance indicators irrespective of data source borders.
Technologies
Python, Scala, Apache Spark, Pentaho, Apache Hadoop, PostgreSQL
Timeframe
07/2015 - 06/2016