Fallou Tall


Experiences

Profile and analyze relevant data
Develop data processing pipelines
Profile and optimize SQL queries
Put the data processing pipelines into production
Automate quality assurance testing
Monitor production status
Develop a PySpark library for quality assurance testing
Stack: Azure, AWS, Databricks, Python, Snowflake, Spark, SQL

Migrate Flume, Pig, Spark 1 and Sqoop workflows to Spark 2
Orchestrate workflows on the new cluster with Oozie
Migrate Oracle databases to Hive
Develop a Spark library to boost the data engineers' productivity
Stack: Hadoop (Hive, HBase, HDFS, Oozie...), Scala, Spark, Oracle

Extract raw QoE and NPS data from HDFS, transform it, and save it to the Hive data warehouse via Spark
Explore then prepare QoE data for modeling with PySpark and Pandas
Predict customer QoE by machine learning with Scikit-learn
Deploy the QoE prediction model as a Flask API and store the results in SQL Server
Automate the monthly retraining of the QoE prediction model (interval determined experimentally)
Extract churn data from Hive, transform it, and save it to the Hive data warehouse via Spark
Analyze the correlation between NPS and QoE, and between churn and QoE, with PySpark and Pandas, then visualize the results in a Tableau dashboard
Explore then prepare churn data for modeling with Spark
Develop customer churn prediction models (base, recharge and data) by machine learning with SparkML
Deploy churn prediction models in batch mode via Spark and store results in SQL Server
Automate the monthly retraining of the churn prediction models (interval determined experimentally) with Oozie
Report traffic alarms and alerts in real time for the dynamic management of Orange sites
Predict outages at these sites by machine learning based on the alarm and alert data
Extract fiber data from Hive, transform it, and save it to the Hive data warehouse via Spark
Develop algorithms with SparkML for recommending fiber to customers and recommending areas for fiber deployment to Orange
Deploy recommendation models in batch mode via Spark and store results in SQL Server
Orchestrate data processing pipelines with Oozie
Stack: Hadoop (HDFS, Hive, Oozie), Scala, Spark, SQL Server, Tableau, Python, Scikit-Learn, Flask

Set up the microservices architecture composed of Kubernetes, Kafka, Cassandra, Spark and Node.js
Extract CDR and probes data from Kafka, then transform and store it in Cassandra via Spark Streaming
Develop a model for locating the living and working places of Orange customers in Dakar from CDR data with Spark
Develop a model for determining the origin-destination matrix of Orange customers in Dakar from probes data with Spark
Validate algorithms with the urban transportation service data and demographic data
Extrapolate the results obtained to the entire population of Dakar
Predict population movements in Dakar by machine learning with SparkML based on origin-destination data combined with probes data
Dockerize and deploy the Spark applications on Kubernetes
Stack: Cassandra, Kafka, Kubernetes, Scala, Spark

Design and implement the architecture of the application
Scrape HR data from the HR platform with Beautiful Soup
Store scraped data into Google Cloud Storage
Clean and prepare training data for modeling
Define the intents and entities then manually create some dialog flows
Automatically generate dialog flows with Rasa Interactive
Develop an intent classification model with Rasa-NLU and TensorFlow
Develop an entity recognition model with Rasa-NLU and spaCy
Develop a chatbot response prediction model with Rasa-Core and TensorFlow
Dockerize then connect the app to the Facebook Messenger API by setting up a webhook
Deploy the chatbot on Google Cloud Platform via App Engine Flex
Stack: Beautiful Soup, GCP, Messenger API, Python, Rasa-Core, Rasa-NLU, spaCy, TensorFlow

Review the literature on job scheduling algorithms
Extract Slurm log history from MySQL with Pandas
Explore then prepare data for modeling with Pandas
Develop a clustering model of applications that run on the system by machine learning with Scikit-Learn
Develop a supercomputer user classification model by machine learning with Scikit-Learn
Deploy models as REST APIs
Develop an energy-efficient job scheduling algorithm in Python based on the prediction of the resource consumption of jobs and their owners
Stack: Anaconda, MySQL, Python, Scikit-Learn

Education

Skills

Apache Hadoop

Advanced
Apache Spark

Advanced
Cloudera/Hortonworks

Good
Kafka

Good

AWS

Advanced
Azure

Advanced
Databricks

Advanced
GCP

Advanced
Snowflake

Advanced

Machine Learning

Advanced
Deep Learning

Good
Statistics

Advanced
Data Structures & Algorithms

Advanced
Problem Solving

Good

English

Advanced
French

Expert
Wolof

Expert

Python

Advanced
R

Good
Scala

Advanced
SQL

Advanced

Agile methodology

Advanced
Atlassian

Advanced
GitHub

Advanced

Certifications

AWS Certified Data Analytics - Specialty (ongoing)

- April 2022


Databricks Developer Essentials

- January 2022


Databricks Certified Associate Developer for Apache Spark

- January 2022


Databricks Certified Associate Developer for Apache Spark

- October 2021


Azure Data Engineer Associate - MCID: 991749803

- October 2021


Google Cloud Professional Data Engineer

- December 2020



Lead Data Engineer - Azure, Databricks and Google Cloud Certified

Lead Data Engineer: View 360 project

Senior Data Engineer (consultant): Hadoop Cluster Migration

Data Engineer & Data Scientist (Consultant): Customer Experience Management

Data Engineer & Data Scientist (Consultant): Implementation of a digital data monetization platform

Data Scientist: Development of an HR chatbot

Data Scientist: Optimization of supercomputer energy consumption by artificial intelligence

Cooperative Master in Mathematical Sciences - Major Big Data

African Institute for Mathematical Sciences (AIMS - Senegal)

Bachelor in Applied Mathematics

Cheikh Anta Diop University (UCAD - Senegal)

Deep Learning Specialization

deeplearning.ai

Big Data

Cloud

Data Science

Languages

Programming

Project Management
