I) Developed ML (NLP) product features from the ground up, i.e. from data acquisition to a production-ready API.
- Goal: Automatically extract key data from scanned contract documents: main dates, durations, and parties.
- Data Acquisition: Managed remote teams to create several datasets of labeled contracts used to train ML models.
- Cleaning: detected and corrected mislabeled examples; balanced classes.
- Machine learning: Built a Python package that:
1) Extracts a collection of potential key data from a contracts database.
2) Trains binary classifiers (Random Forest) to select the right value from that collection. For example, trained a model to choose a contract’s effective date among all its detected dates, using a large database of contracts.
3) Applies the “data extraction + trained model” pipeline to customers’ contracts.
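A minimal sketch of the candidate-selection step described above, using scikit-learn's Random Forest on toy data. The feature names and training examples are hypothetical illustrations, not the package's actual schema: each candidate date found in a contract becomes one feature row, the classifier scores each row as "is the effective date" or not, and the highest-scoring candidate is returned.

```python
# Sketch of "pick the right date among detected candidates" with a Random Forest.
# Features and data below are hypothetical, for illustration only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Each row = one candidate date detected in a contract:
# [relative_position_in_doc (0..1), near_keyword_effective (0/1), in_header (0/1)]
X_train = np.array([
    [0.05, 1, 1],  # early, near the word "effective", in header -> correct
    [0.90, 0, 0],  # late in the document, no cue -> wrong
    [0.10, 1, 0],
    [0.50, 0, 1],
    [0.07, 1, 1],
    [0.80, 0, 0],
])
y_train = np.array([1, 0, 1, 0, 1, 0])  # 1 = this candidate is the effective date

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X_train, y_train)

def pick_effective_date(candidates, features):
    """Score every candidate date and return the most likely one."""
    probs = clf.predict_proba(features)[:, 1]  # probability of class 1
    return candidates[int(np.argmax(probs))]

# A new contract with three detected dates:
candidates = ["2020-01-01", "2021-06-30", "2019-12-15"]
features = np.array([
    [0.06, 1, 1],  # strong positional and keyword cues
    [0.85, 0, 0],
    [0.40, 0, 1],
])
best = pick_effective_date(candidates, features)
```

The framing turns a multi-way extraction problem into a binary "correct candidate vs. not" classification, which is easy to train on labeled contracts where only the true date is marked positive.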
II) Microservice delivering ML predictions
Built an AWS-based pipeline with Terraform that serves machine learning predictions through an API and stores them in CockroachDB.
- The POC involved deep learning with Keras.
- Built a project workflow to reconcile data scientists’ R&D processes with agile software development.
- Built a clean data science environment: code and experiments versioned with GitHub, data and models versioned with DVC and Amazon S3.
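An illustrative command sequence for that kind of DVC + S3 setup (a configuration sketch, assuming DVC is installed; the file path and bucket name are placeholders, not the actual project's):

```shell
git init
dvc init                                         # creates .dvc/ config, tracked by git
dvc remote add -d storage s3://my-ml-bucket/dvc  # placeholder S3 bucket as default remote

dvc add data/contracts.csv                       # version a dataset (writes contracts.csv.dvc)
git add data/contracts.csv.dvc .gitignore
git commit -m "Track labeled contracts dataset with DVC"

dvc push                                         # upload tracked data/models to S3
```

Git then versions only the small `.dvc` pointer files, so datasets and trained models stay reproducible without bloating the repository.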
Data wrangling, visualization, and machine learning for several companies, mainly banks. Worked mostly on recommendation systems.
Main tools involved during missions:
- R: tidyverse, RMarkdown, RShiny.
- Python: pandas, scikit-learn, Flask with PostgreSQL.
- Machine learning algorithms: most often linear regression, CART, Random Forest, and XGBoost.
Worked under the supervision of Matthew Dailey at the AIT Vision and Graphics Lab. Built deep learning software with Caffe: given a database of surveillance videos and images of a person, the software identifies the videos in which the person appears and the corresponding timestamps. Purpose: detection of criminals.