An Equation-Free Explanation of Machine Learning

Over the past few years there’s been a growing interest in artificial intelligence, with many articles about its accomplishments, its potential and its implications for the future. These articles often reference how the software “learns” or how it “chooses” the best solution, but few of us understand what these terms actually mean when applied to a machine. To many people, machine learning is just a magical black box.

My interest in AI grew substantially after reading the Artificial Intelligence Revolution article on Wait But Why. I then looked further into DeepMind and found some of the amazing things machine learning can do today. It was all very interesting but I needed to know what was happening inside the black box. I knew that you could feed in a bunch of data at one end and then answers start popping out of the other but at the time I had no idea how it worked.

I’ve just finished week 8 of 11 of a Machine Learning course from Coursera and I feel I have a good understanding of at least some machine learning algorithms. This is probably still nothing compared to highly developed AIs like DeepMind or Watson, but it’s definitely a start.

So, now that I have this introductory knowledge I hope to give others a peek inside the black box. What follows is my attempt to explain logistic regression, one type of machine learning, without using any mathematical equations. This simplification means glossing over some elements, but I hope to retain the most important core concepts.

On with the learning!

Logistic regression is used to group data into pre-defined categories. It first learns from an initial set of data with group labels and then applies that knowledge to new, unlabelled examples. One common example would be sorting emails into spam or not spam. The example below (definitely not a typical use case) will hopefully help you understand how that learning takes place.

Imagine a large park full of people. Each of these people supports either the Red Team or the Blue Team. These teams could represent football clubs, Trump/Clinton supporters, pick your poison, but I’ll just stick to calling them Red and Blue. A few of the supporters on the Red Team are wearing Red Hats and a few of the supporters on the Blue Team are wearing Blue Hats, but most of the supporters are hatless. Each team has a favourite pub, the Red Pub and the Blue Pub. The pubs are located at opposite ends of the park, and for the most part the Red and Blue teams will flock towards their favourite pub.

We have the task of distributing Red or Blue Hats to every hatless person. Each hatless person is already a Red or Blue Team supporter, but we have to guess which team they support before giving out the hats. To do this we send out an autonomous drone to fly over the park with a giant curtain.

The drone will need to learn on its own where the Red and Blue Team supporters are standing and then use the curtain to divide the two groups. The drone has no data about the pubs or the layout of the park so it has no way of knowing how the supporters will be distributed. It also has no cameras so it can’t even see the lay of the land. It does however have a GPS and it knows the location of the park so it starts out by just hovering in a random location above the supporters.

Even though the drone cannot see the supporters it can hear them. Each hat is equipped with a microphone that can record only the sounds of the person wearing it. The supporters are very vocal so if they are on the wrong side of the curtain they will yell and complain. The microphones pick up only the complaints of the supporters with hats (our labelled data in machine learning terms) while the hatless supporters (unlabelled data) remain silent to our drone.

Conveniently, the volume of each yelling supporter is directly related to how far they are from the curtain: the further a supporter is on the wrong side of the curtain, the louder they will yell. The total volume from all of the hat-wearing, yelling supporters is a measurement of the amount of error in the drone’s location and orientation.
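For readers who would rather see this as code than as yelling, here is a minimal sketch of the “total volume” measurement. It follows the analogy literally, so the error is simply the summed distance of every wrongly placed hat wearer; real logistic regression uses a smoother measure of error. The names signed_distance and total_volume, and the way the curtain is stored as a point plus an angle, are my own assumptions for illustration.

```python
import math

def signed_distance(point, curtain):
    """Signed distance from a person to the curtain (a straight line).

    curtain = (x0, y0, angle): a point the line passes through and the
    line's direction in radians. By the convention assumed here, positive
    values are the Red side and negative values are the Blue side.
    """
    x, y = point
    x0, y0, angle = curtain
    # Unit vector perpendicular to the curtain's direction.
    nx, ny = -math.sin(angle), math.cos(angle)
    return (x - x0) * nx + (y - y0) * ny

def total_volume(hat_wearers, curtain):
    """Sum the yelling of every hat wearer standing on the wrong side."""
    total = 0.0
    for point, team in hat_wearers:
        d = signed_distance(point, curtain)
        wrong_side = (team == "red" and d < 0) or (team == "blue" and d > 0)
        if wrong_side:
            total += abs(d)  # further from the curtain means louder
    return total

# Three hat wearers and a curtain running east-west through the origin.
hat_wearers = [((0, 2), "red"), ((0, -3), "red"), ((1, 1), "blue")]
print(total_volume(hat_wearers, (0.0, 0.0, 0.0)))
```

Only the hat wearers appear in the list, which mirrors the fact that the hatless supporters are silent to the drone.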

Next the drone makes a few very small movements. First it moves a tiny amount east and then west and records the difference in the error (the volume of the angry supporters). It then moves north and south and tests again. Finally it rotates clockwise and counter-clockwise and makes further measurements. Each test helps to identify a direction to move that will lower the overall error. The drone then raises the curtain, moves to a new position based on its findings, lowers the curtain again and repeats the test.
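The nudge-and-listen routine above can be sketched in a few lines. This is only an illustration of the idea: the function name probe_and_step, the nudge and step_size values, and the stand-in error function are all assumptions of mine, not part of any library. The drone’s position is a list of three numbers (east, north, rotation), and error_fn plays the role of the microphones.

```python
def probe_and_step(error_fn, position, nudge=1e-3, step_size=0.1):
    """Nudge each control a tiny amount both ways, then move downhill.

    position is [east, north, rotation]; error_fn returns the total
    volume heard at a given position. Returns the new position.
    """
    slopes = []
    for i in range(len(position)):
        forward = list(position)
        backward = list(position)
        forward[i] += nudge
        backward[i] -= nudge
        # How quickly the error changes as this one control changes.
        slopes.append((error_fn(forward) - error_fn(backward)) / (2 * nudge))
    # Move against the slope, so that the error goes down.
    return [p - step_size * s for p, s in zip(position, slopes)]

def pretend_error(position):
    """Stand-in for the microphones: quietest at east=1, north=-2, rotation=0."""
    east, north, rotation = position
    return (east - 1) ** 2 + (north + 2) ** 2 + rotation ** 2

position = [0.0, 0.0, 0.0]
for _ in range(200):
    position = probe_and_step(pretend_error, position)
print(position)
```

In practice the direction is usually worked out with a formula rather than by physically testing each nudge, but the effect is the same: find the downhill direction, take a step, repeat.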

With each of the drone’s moves, the error drops…

…until the lowest level of error is found and no movement in any direction would improve it.

At this point the groups have been split, the hats are distributed to each side and hopefully most supporters get the correct coloured hat. The drone has successfully used a logistic regression algorithm to find the best straight-line placement to separate the data (Red and Blue Team supporters).
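For the curious, here is roughly what the whole story looks like as a small program. Treat it as a sketch under my own assumptions rather than a definitive implementation: the names train and hand_out_hat, the learning rate and the number of steps are made up for illustration. Unlike the drone, it turns each person’s position into a probability of being Red (using the so-called sigmoid function) and works out the downhill direction directly instead of testing tiny nudges.

```python
import math

def sigmoid(z):
    """Squash any number into the range 0..1, read as 'probability of Red'."""
    return 1.0 / (1.0 + math.exp(-z))

def train(points, labels, learning_rate=0.1, steps=2000):
    """Learn the curtain from the hat wearers (labels: 1 = Red, 0 = Blue).

    Returns (w_x, w_y, bias), which together describe a straight line.
    """
    w_x = w_y = bias = 0.0
    n = len(points)
    for _ in range(steps):
        g_x = g_y = g_b = 0.0
        for (x, y), label in zip(points, labels):
            # How far the current guess is from the true label.
            err = sigmoid(w_x * x + w_y * y + bias) - label
            g_x += err * x
            g_y += err * y
            g_b += err
        # Step in the direction that lowers the average error.
        w_x -= learning_rate * g_x / n
        w_y -= learning_rate * g_y / n
        bias -= learning_rate * g_b / n
    return w_x, w_y, bias

def hand_out_hat(point, weights):
    """Guess the team of a hatless supporter from where they are standing."""
    w_x, w_y, bias = weights
    x, y = point
    return "red" if sigmoid(w_x * x + w_y * y + bias) >= 0.5 else "blue"

# Hat wearers: Red near one end of the park, Blue near the other.
points = [(1.5, 2.0), (2.0, 1.0), (2.5, 2.5), (-1.5, -2.0), (-2.0, -1.0), (-2.5, -2.5)]
labels = [1, 1, 1, 0, 0, 0]
weights = train(points, labels)
print(hand_out_hat((3.0, 3.0), weights), hand_out_hat((-3.0, -3.0), weights))
```

The six labelled points stand in for the hat wearers, and the two points passed to hand_out_hat are hatless supporters the program has never heard from.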

The key thing to note is that the drone was never given any specific instructions; it just listens to the labelled data (the people with the hats) and makes adjustments until it fits. Because of this simple ruleset (continue moving in a direction that lowers the volume of yelling supporters until the lowest level is reached), the drone could be dropped in any number of different parks with different arrangements of people and each time it would learn the best way to divide the groups.

This example only covers a very simple problem where the data can be separated into two groups by a straight line, but logistic regression can also be used to split multiple groups along much more complex curves. If you’re interested in learning more, let me know in the comments and I might extend this analogy in another post.

If you would like to read more about machine learning, I would recommend this article, which needs to be read on a large screen to take full advantage of the interactive animations.