Machine learning


Machine learning (ML) is the science of getting computers to make decisions without being told how to make them. It is an expansive collection of theory and techniques that explores the mathematical foundations of learning, modeling and making decisions entirely from data and entirely by machines, without direct instruction from humans. It is a broad field that draws on many others, such as statistics, mathematics, physics, psychology, neuroscience and computer science.

Formalism

At heart, machine learning is about making decisions. Or rather, getting a machine to make them for you. There is some input question, and you want the computer to answer it. Let's formalize this idea: call $x$ the observation[1], which is what the model knows when it makes a decision. Then, the machine must follow some procedure to compute the response or output $y$; this procedure is represented by some function $f(x)$. Generally speaking, the decision-making process is, very simply,

$$y = f(x)$$

meaning that applying some procedure $f$ to the observation $x$ gives a response $y$. Here $x$ belongs to the space of all possible observations $X$, and $y$ to the space of all possible responses $Y$. The procedure links these two spaces: $f: X \mapsto Y$.
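As a minimal sketch of this formalism, the snippet below writes a decision procedure by hand. Everything in it is a hypothetical illustration, not part of the original text: the observation space is taken to be temperatures in degrees Celsius, the response space is the pair of labels "coat" and "no coat", and the 15-degree cutoff is an arbitrary assumption. No learning happens here; a human wrote the procedure.

```python
# Hypothetical illustration: X = temperatures in degrees Celsius,
# Y = {"coat", "no coat"}. The 15.0 cutoff is an arbitrary assumption.
Observation = float  # an element x of the observation space X
Response = str       # an element y of the response space Y


def f(x: Observation) -> Response:
    """A hand-written decision procedure mapping an observation to a response."""
    return "coat" if x < 15.0 else "no coat"


x = 8.5   # the observation
y = f(x)  # the response, y = f(x)
print(y)
```

The type aliases play the role of the two spaces, and the signature of `f` plays the role of the mapping between them.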

The end goal of machine learning is to give the machine an algorithmic method to figure out $f$ by itself. We provide the observations, and sometimes also the responses as examples, and the machine learns how to connect $X$ to $Y$. In other words, it invents $f$ autonomously. It's common to call this function $f_\text{predict}$, because it predicts a response from an observation.

The fundamental problem of machine learning, then, isn't to predict $y$ from $x$, but rather to get the machine to autonomously learn a procedure $f_\text{predict}$ that will do so. In mathematical terms, the problem is to find, in the space $\mathcal{F}_{X\mapsto Y}$ of all functions that go from $X$ to $Y$, an $f_\text{predict}$ that does what we want.
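To make "finding a function in a space of functions" concrete, here is a rough sketch under strong simplifying assumptions. The full function space is far too large to enumerate, so the sketch restricts itself to a tiny, hypothetical family of threshold rules, and it takes "does what we want" to mean "makes the fewest mistakes on a handful of made-up examples". The data, the family and the names `make_threshold_rule` and `count_errors` are all invented for illustration.

```python
from typing import Callable, List, Tuple

# Made-up examples of (observation, desired response) pairs.
examples: List[Tuple[float, str]] = [
    (5.0, "coat"),
    (10.0, "coat"),
    (18.0, "no coat"),
    (25.0, "no coat"),
]


def make_threshold_rule(t: float) -> Callable[[float], str]:
    """Build one candidate function: 'coat' below the threshold t."""
    def rule(x: float) -> str:
        return "coat" if x < t else "no coat"
    return rule


def count_errors(rule: Callable[[float], str],
                 data: List[Tuple[float, str]]) -> int:
    """Number of examples on which the candidate disagrees with the desired response."""
    return sum(1 for x, y in data if rule(x) != y)


# A tiny, hypothetical slice of the function space: one rule per integer threshold.
candidate_thresholds = range(0, 31)

# Brute-force search: keep the candidate with the fewest errors.
best_t = min(candidate_thresholds,
             key=lambda t: count_errors(make_threshold_rule(t), examples))
f_predict = make_threshold_rule(best_t)

print(best_t, f_predict(7.0))
```

The machine is never told the cutoff; it picks one from the candidates by checking them against the examples. Practical methods replace the brute-force loop with something far more efficient, but the shape of the problem is the same.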

Now that we have a formal definition of the problem, it's time to look for a solution. How do we get a machine to learn? How do we explore the space $\mathcal{F}_{X\mapsto Y}$ to find our $f_\text{predict}$?

Well, as it happens, there are many ways. This shouldn't be that surprising considering that, if you think about it, the fundamental problem is incredibly vague and broad. As such, it admits many solutions, each with its own pros and cons. The most straightforward and intuitive one is to get the model to learn by example: this is called supervised machine learning, referring to the fact that the machine learns from curated, supervised examples. Another major paradigm is its opposite: unsupervised machine learning, where the model is given a lot of observations with no desired responses and is expected to draw conclusions about them by itself. These are, by and large, the two major branches of machine learning.
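The difference between the two branches shows up in the shape of the data. The sketch below is only an illustration with made-up numbers. For the supervised case it uses a nearest-example rule, and for the unsupervised case it splits the observations at the largest gap. Both techniques are stand-ins chosen for brevity, not methods prescribed by the text, and the function names are hypothetical.

```python
from typing import List, Tuple

# Supervised: every observation comes with a desired response.
labelled: List[Tuple[float, str]] = [
    (5.0, "coat"), (10.0, "coat"), (18.0, "no coat"), (25.0, "no coat"),
]


def predict_supervised(x: float) -> str:
    """Answer with the response of the closest labelled observation."""
    _, nearest_y = min(labelled, key=lambda example: abs(example[0] - x))
    return nearest_y


# Unsupervised: observations only, no desired responses.
observations: List[float] = [5.0, 10.0, 18.0, 25.0, 7.0, 22.0]


def split_at_largest_gap(xs: List[float]) -> Tuple[List[float], List[float]]:
    """Group the sorted observations into two clusters, cutting at the widest gap."""
    s = sorted(xs)
    gaps = [s[i + 1] - s[i] for i in range(len(s) - 1)]
    cut = gaps.index(max(gaps)) + 1
    return s[:cut], s[cut:]


print(predict_supervised(12.0))
print(split_at_largest_gap(observations))
```

In the supervised half, the examples tell the machine what the right answers look like. In the unsupervised half, it only discovers that the observations fall into two groups; nothing tells it what those groups mean.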

Footnotes

  1. It goes by many names: observation, input, instance, data point...