Pierre Canavelli

machine learning scientist & data engineer

Working remotely from Paris

Location and travel

Paris, France
Carries out assignments mostly remotely


Skills (25)

  • NLP
  • Frameworks

Pierre in a few words

I am a PhD machine learning scientist and a graduate of the École Normale Supérieure in Paris, with 7 years of experience applying AI to academic research and real-world problems. I am used to working with difficult data (e.g. terabyte-sized, extremely imbalanced or dirty datasets) spanning a variety of industries, including financial time series; highly sensitive, anonymized health records; and noisy, low-resolution video. I have worked on tasks including fraud detection, predictive maintenance, rare disease diagnosis, video processing, and customer sentiment analysis.

I do most of my ML work in Python, SQL, and Julia, and implement models using TensorFlow, scikit-learn, and a handful of common libraries such as XGBoost, LightGBM, and CatBoost. I am adamant about following software engineering best practices: all production code must be modular, unit-tested, documented, human-readable, and compliant with Google's coding standards.

Having spent most of my career doubling as a data engineer, I am also well-versed in setting up, expanding, and maintaining data mining, ETL, feature engineering, and warehousing pipelines using the PySpark and Dask frameworks. I also have strong experience working with real-time data streaming pipelines using Kafka, Faust and ClickHouse.

Thanks to my past experience as a technical consultant for major management consulting firms, I am a clear and concise communicator, and I welcome client-facing work. I particularly enjoy working in diverse, small-to-medium-sized teams alongside colleagues from different backgrounds and fields of expertise.



High tech

Machine Learning Scientist & Engineer


January 2020 - Present (1 year and 10 months)

Shadow is building the future of personal computers. I joined the company with a dual mission: to expand its data engineering stack and to bring machine learning to the company. Current activities include:

- Expanding, improving, and maintaining our data streaming pipelines using Kafka, Faust and ClickHouse.

- Democratising access to data by creating interactive dashboards and accessible endpoints using Metabase and FastAPI.

- Leading AI/ML projects spanning anomaly detection, NLP, causal analysis and supervised learning.

- Providing expertise in deep learning and engineering best practices on R&D projects focused on real-time video data.

- Mentoring analysts, data scientists and R&D engineers.

- Writing and maintaining Spellbook, our in-house ETL and ML/DL library.


Pharmaceutical industry

Data Scientist

London, United Kingdom

May 2019 - October 2019 (5 months)

IQVIA is the world leader in healthcare information technology. I had the pleasure of joining their Predictive Analytics team, where I worked on developing new machine learning models for the early diagnosis of rare diseases. It was all going great. But then, Brexit happened.

- Designed and delivered machine learning models for the predictive diagnosis of rare diseases among the US population.

- Wrote and maintained Python packages automating the ETL, feature engineering and QC of terabyte-scale datasets.

- Provided ad hoc statistical data analysis for US- and UK-based healthcare consulting teams.



Data Scientist

London, United Kingdom

November 2018 - May 2019 (6 months)

Resonate's mission is to improve the efficiency, safety and sustainability of the rail industry using data science and machine learning. I worked with Resonate's Data team as a full-stack data scientist, splitting my time between designing, training and deploying predictive algorithms, and building up our data engineering stack.

- Trained and deployed machine learning models for the forecasting and prevention of disruptive events on the National Rail network.

- Designed and implemented end-to-end machine learning pipelines automating the data preparation, training and evaluation of ML models on EC2 instances.

- Initiated a four-person effort to refactor, optimise and document 5,000+ lines of legacy code, leading to performance improvements of up to 200x and full compliance with Google's coding standards.

- Implemented and maintained data ingestion pipelines using AWS S3, Glue, Athena, QuickSight, and SPICE, delivering customer-facing dashboards and static reports.


Data Science Fellow

London, United Kingdom

September 2018 - November 2018 (2 months)

