Development and integration of NLP/LLM/AI features in an AWS environment
* Automatic question answering by querying a large document database
* Timeline of publication summaries based on user-defined parameters (topic and location)
* Document deduplication
* Named entity recognition
* Document classification
Translation of customer needs into data science problems
Abstraction and modeling of business problems
Prototyping of features and models
Sodexo (6 months)
* Construction of a data processing pipeline in a distributed environment
* Construction of a restaurant attendance forecasting model to reduce waste
tools: Dataiku, PySpark, Azure, time series, LSTM, Prophet, SARIMA
BNP Paribas (6 months)
* Development of a disaster clustering model
* Topic extraction
* Text classification
* Text translation
tools: Python, Scikit-Learn, NLP
Airbus (one year)
* Setting up a data processing pipeline in Palantir's Foundry environment
* Data engineering
tools: Palantir Foundry, PySpark, Hive, Scala, Spark
Computer Science Researcher
Laboratoire Bordelais de Recherche en Informatique
January 2014
to February 2017
Optimization of the time and space cost of computing Skyline queries in relational databases
Estimation of query result size
Approximate computation
Identification of relationships (in particular functional dependencies) between columns
Pre-computation and data structures
Multidimensional data analysis and correlation detection
tools: Java, C++, Big Data
Lecturer in statistics, SPSS
Université de Bordeaux
September 2015
to September 2016
Introduction to statistics with SPSS
Student assessment
Education
Engineer Statistician
Ecole Nationale de la Statistique et de l'Analyse de l'Information (ENSAI - Rennes)
September 2011
to November 2013
Data processing and analysis, statistical information systems
Skills
Data Science
Data processing
Data analysis
Decision support models
Classification, Clustering
Machine Learning
Data Mining
AI, LLMs
Tools
SAS (certification)
R, SPSS, MATLAB, SPAD
Scikit-learn, TensorFlow, PyTorch, Keras, MLOps
Tableau, Power BI
AWS (EC2, EBS, SageMaker, OpenSearch, ...)
GCP, Microsoft Azure, Dataiku, Palantir
Jupyter Notebook, JupyterLab, PyCharm
Elasticsearch
Computer Science
Python, Java, C++, C, VBA
HTML5, JavaScript, PHP, CSS
Databases, SQL, NoSQL, PostgreSQL, MySQL, Oracle