An Introduction to Machine Learning

So. What is Machine Learning? As a Data Scientist, I want to give you a clear answer to this question, while explaining the concept of Artificial Intelligence. Here is an introduction to Machine Learning that will allow you to understand how this technology works and its various applications.

Definition

Machine Learning is a field of study in AI that aims to give machines the ability to learn. This very powerful technology has enabled the development of autonomous cars, voice recognition, and many of the so-called "intelligent" systems of the last 10 years.

The term Machine Learning was coined by Arthur Samuel in 1959, after he developed one of the first checkers programs with artificial intelligence. This program learned to play checkers by itself, without being explicitly told by its developer how to play well. How did it do it? We'll see later in this article.

Following this work, Arthur Samuel formulated the historic definition of Machine Learning:

"Field of study that gives computers the ability to learn without being explicitly programmed"

Statistical Learning

Today, we have a more mathematical way to define Machine Learning. We call it Statistical Learning because most of the algorithms used in Machine Learning are actually statistical models developed from data. Examples of such models include decision trees, linear regression and Bayesian models.

How Does Machine Learning Work?

To understand how a machine can learn from data, we need to look at the 3 learning paradigms of Machine Learning:

Supervised Learning
Unsupervised Learning
Reinforcement Learning

Supervised Learning

Definition

Supervised Learning is used to develop predictive models, i.e. models that can predict a certain value y according to variables x_1, x_2, etc.

To develop such models, we must first provide the machine with a large quantity of example pairs (x, y). This is called a dataset. Then, we ask the machine to find an approximation function that best represents the relationship $x \rightarrow y$ present in our data. For this, we use an optimization algorithm that minimizes the differences between the function's predictions and the y values in the dataset.
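
To make this concrete, here is a minimal sketch in Python of a machine learning a relationship from a tiny dataset. It assumes a straight-line model y ≈ a * x + b, the mean squared error as the measure of "difference", and gradient descent as the optimization algorithm. These are illustrative choices rather than the only way to do it, and the toy data and the names fit_line, learning_rate and steps are invented for this example.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y ≈ a * x + b by gradient descent on the mean squared error."""
    a, b = 0.0, 0.0  # start from an arbitrary guess
    n = len(xs)
    for _ in range(steps):
        # Prediction errors of the current line on every example (x, y).
        errors = [a * x + b - y for x, y in zip(xs, ys)]
        # Gradient of the mean squared error with respect to a and b.
        grad_a = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Step in the direction that reduces the error.
        a -= learning_rate * grad_a
        b -= learning_rate * grad_b
    return a, b


# Toy dataset (invented for illustration): y was generated as 2 * x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

a, b = fit_line(xs, ys)
print(f"learned model: y = {a:.3f} * x + {b:.3f}")
```

On this toy dataset the learned slope and intercept should end up very close to 2 and 1, the values used to generate the data. The machine was never told those numbers: it found them by reducing its prediction errors step by step.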

The applications of Supervised Learning are many, very many! We can divide them into 2 categories of problems: regression and classification.

Regression Problems

Regression problems correspond to situations in which the machine must predict the value of a quantitative variable (continuous variable).

Examples of y (target) variables:

the price of an apartment
the evolution of the climate
the price of a stock on the stock market

Classification Problems

Classification problems correspond to situations in which the machine must predict the value of a qualitative variable (discrete variable). In other words, the machine must classify what it is given into classes.

Examples of y (target) variables:

Spam / non-spam email
Cancer / no Cancer
Cat / Dog picture
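
As an illustration, here is a small classification sketch in Python. It uses the k-nearest-neighbours method, which is just one of many possible classifiers, and a made-up toy dataset in which each email is described by two hypothetical features (number of links, number of exclamation marks). The name predict_label and the numbers are assumptions chosen for the example.

```python
from collections import Counter


def predict_label(examples, new_point, k=3):
    """Classify new_point by majority vote among its k nearest labelled examples."""
    def squared_distance(p, q):
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

    # Sort the labelled examples from closest to farthest and keep the k closest.
    nearest = sorted(examples, key=lambda ex: squared_distance(ex[0], new_point))[:k]
    # The most frequent label among those k examples wins.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy dataset (invented): x = (number of links, number of exclamation marks), y = class.
emails = [
    ((8, 6), "spam"), ((7, 9), "spam"), ((9, 7), "spam"),
    ((1, 0), "not spam"), ((0, 1), "not spam"), ((2, 1), "not spam"),
]

print(predict_label(emails, (8, 8)))  # prints "spam"
print(predict_label(emails, (1, 1)))  # prints "not spam"
```

The target y here is a class label rather than a number, which is exactly what separates classification from regression.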

Unsupervised Learning: No Variable y

Another learning method for developing Machine Learning programs is Unsupervised Learning. This method is used when our dataset does not contain examples that indicate what we are looking for. Wait... let me give you an example!

Look at these 6 pictures. Can you group them into 2 families according to their similarity?

Of course you can! It's actually quite simple. You don't need to know whether they are animal cells, bacteria or proteins to learn how to classify these images. Your brain actually recognized common structures in the data shown to you.

In unsupervised learning, we have a dataset x with no variable y, and the machine learns to recognize structures in the data x that we show it.

We can thus group data into clusters (this is called clustering), detect anomalies, or reduce the dimensionality of very rich data by compressing it into fewer dimensions.
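
Clustering can be sketched in a few lines of Python. The example below assumes the k-means algorithm (one common clustering method among others) and six made-up 2D points standing in for the 6 pictures. The function name k_means and the choice of initial centroids are illustrative assumptions.

```python
def k_means(points, centroids, iterations=10):
    """Basic k-means: alternate between assigning points and moving centroids."""
    centroids = list(centroids)
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [
            min(
                range(len(centroids)),
                key=lambda c: (p[0] - centroids[c][0]) ** 2
                + (p[1] - centroids[c][1]) ** 2,
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return labels, centroids


# Six unlabelled points (invented): there is no y, only x.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.5)]

# Assumption: start from two of the points as initial centroids.
labels, centroids = k_means(points, centroids=[points[0], points[-1]])
print(labels)  # prints [0, 0, 0, 1, 1, 1]
```

Nobody told the program which family each point belongs to. It recovers the two groups only from how close the points are to one another.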

Reinforcement Learning: Learning in The Environment

In Reinforcement Learning, the machine acts in an environment and receives a reward (a bonus) when its actions turn out well. We develop a program that pushes the machine to maximize this reward, and the machine then learns from its own past mistakes in order to improve over time. It's a little bit like when we learn to ride a bike: at first we don't succeed at all, but as we ride, we instinctively develop our balance so as to avoid the mistakes of the past that made us fall!
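
This trial-and-error loop can be sketched in Python with one of the simplest reinforcement learning settings, a multi-armed bandit played by an epsilon-greedy agent. Everything here is an assumption made for illustration: the function name run_bandit, the three actions, and their fixed rewards (real environments are usually noisy and far richer).

```python
import random


def run_bandit(payouts, steps=500, epsilon=0.1, seed=0):
    """Epsilon-greedy agent: learn by trial and error which action pays most."""
    rng = random.Random(seed)
    n_actions = len(payouts)
    estimates = [0.0] * n_actions  # current guess of each action's reward
    counts = [0] * n_actions       # how often each action has been tried
    total_reward = 0.0

    for step in range(steps):
        if step < n_actions:
            action = step  # try every action once at the start
        elif rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore: pick a random action
        else:
            # Exploit: pick the action that has paid best so far.
            action = max(range(n_actions), key=lambda i: estimates[i])

        reward = payouts[action]  # feedback from the environment
        total_reward += reward
        counts[action] += 1
        # Move the estimate towards the observed reward (running average).
        estimates[action] += (reward - estimates[action]) / counts[action]

    best_action = max(range(n_actions), key=lambda i: estimates[i])
    return best_action, estimates, total_reward


# Hypothetical environment: three actions whose rewards are unknown to the agent.
best_action, estimates, total_reward = run_bandit([0.2, 0.8, 0.5])
print(best_action)  # prints 1, the action with the highest reward
```

The agent is never told which action is best. It tries actions, observes the reward (the "bonus"), and gradually favours the one that has paid off most, while still exploring occasionally.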

Conclusion: Machine Learning is The New Electricity, It Gives Everything a New Direction.

In 2022, Machine Learning is already all around us. In fact, you probably use it hundreds of times a day without even realizing it. Every time you perform a search in Google, it's a Machine Learning model that has learned how to rank the most relevant results on the first page from millions of possible web pages.

When you post a photo of yourself on Facebook, there is a Machine Learning algorithm that manages to identify you because it has learned to recognize faces in photos.

Machine Learning has already started to change the face of our world. It is revolutionizing the transportation industry with the autonomous car and powering our connected devices, such as the iPhone, through voice recognition and computer vision. It helps diagnose cancer, in some studies as accurately as specialist doctors, and it is used to improve banking security and computer security and to support judicial decisions. Even the agricultural industry and the art world are affected by Machine Learning.

If you have read this whole article, it means that you are very interested in Machine Learning, and I congratulate you! Don't hesitate to contact me with any questions.

Data World