A Spark application is made of:

  • several execution processes…

In software engineering, continuous integration and continuous deployment (CI/CD) is a growing trend that consists of frequently updating and releasing code through automation. Every change to the codebase goes through an automated pipeline: it is tested and, once merged into the main branch, deployed (in this case, a new version of your code is released).

The principle behind CI is that by testing every new addition to your codebase, you can catch bugs early and improve the quality of your code before it is deployed.
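As a rough illustration of the test-then-deploy gate described above, here is a minimal sketch using only Python's standard `unittest` module. The function under test (`add`) and the release step (`deploy`) are hypothetical placeholders; a real CI service would run an equivalent step automatically on every change, and its configuration is not shown here.

```python
import unittest


def add(a: int, b: int) -> int:
    """Hypothetical function under test."""
    return a + b


class AddTests(unittest.TestCase):
    """Tests the pipeline runs on every change."""

    def test_positive(self):
        self.assertEqual(add(2, 3), 5)

    def test_negative(self):
        self.assertEqual(add(-1, -1), -2)


def run_tests() -> bool:
    """Run the test suite and report whether every test passed."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(AddTests)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()


def deploy() -> str:
    """Hypothetical placeholder for the release step (e.g. publishing a new version)."""
    return "released"


def pipeline() -> str:
    # Gate: the change is deployed only if the whole test suite passes.
    if not run_tests():
        return "blocked: tests failed"
    return deploy()


if __name__ == "__main__":
    print(pipeline())
```

Because the gate returns early on failure, a broken change never reaches the deploy step, which is the "catch bugs before deployment" idea in code form.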

The principle behind CD is: an efficient deployment pipeline will allow you to release more often with little…

The great advantage of containers over traditional virtual machines is that they are lightweight, as they do…

Recent developments in data storage and processing have been driven by the increasing amount and complexity of data available to individuals and companies. Most of these advances require sophisticated and powerful hardware, and aspiring data scientists must be able to understand and master these new tools.

For instance, given the computing power required to train algorithms, training is usually carried out “on the cloud” (i.e. by remote access to a virtual machine), which avoids buying and maintaining very expensive hardware, especially when peak usage is only occasional. …

Figure: large data set with low variance (panels: original dataset, k-nearest neighbours).
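The excerpt only names k-nearest neighbours; how the post applies it (classification, regression or smoothing) is not visible here. Assuming a classification setting, here is a minimal pure-Python sketch of a k-NN classifier: it labels a query point by majority vote among the k training points closest to it in Euclidean distance. The function name `knn_predict` and any data you feed it are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence, Tuple

Point = Tuple[float, float]


def knn_predict(train: Sequence[Tuple[Point, str]], query: Point, k: int = 3) -> str:
    """Hypothetical helper: classify `query` by majority vote among its k nearest training points."""
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training points")
    # Sort the labelled training points by Euclidean distance to the query.
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    # Count the labels of the k closest points; ties go to the label seen first (the closest).
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]
```

The choice of k trades off noise against smoothness: a small k follows individual points closely, while a larger k averages over more neighbours.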

OpenCV is an open-source computer vision library launched in 1999 by Intel Research. It is written in C++, but Python and Matlab bindings are available. The project has been supported by Willow Garage since 2008 and is under active development. OpenCV provides tools for many computer vision applications such as image and gesture recognition, motion tracking, mobile robotics… Computer vision is closely related to machine learning, so OpenCV has a module that implements many traditional algorithms. More recently, OpenCV 3 added support for deep learning algorithms.

I decided to do some experiments quite close to the…

Setting up Xcode for C++ projects is a four-step process:

a. defining the build…

This tutorial covers installing the library (on macOS only for now), running simple code, and a tour of its rich interface. An alternative to Armadillo is Eigen.

Armadillo is developed and maintained by NICTA (National Information and Communications Technology Australia). Their goal was to provide an interface as easy to use as Matlab’s for users of the low-level language that is…

René-Jean Corneille

Principal Data Scientist. I write about machine learning, C++ and Python coding.

