I wondered about this for the longest time. I couldn't wrap my head around it. But eventually, after reading the math, I got a very good sense of what is actually happening behind the scenes. Unfortunately, the math is too complicated for laypeople, and so the fundamentals of machine learning remain inaccessible, locked behind a math wall.
So, I'm going to use very little math to explain how machines actually learn, and trust me, by the end of it, you'll know exactly what's going on. But please bear with the simple math, for as Einstein said:
Everything should be made as simple as possible, but no simpler.
Let's start with a simple problem: multiplication. Let's assume we have a multiplication table for an unknown number, like so:
? x 1 = 3
? x 2 = 6
? x 3 = 9
? x 4 = 12
And we need a machine to figure out what the unknown number is.
The entire exercise is a very simple machine learning problem. We have a list of input numbers 1,2,3,4 and a list of output numbers 3,6,9,12. The "model" we're assuming is that there exists a hidden number, that, when multiplied by the input, returns the output. We just need to find what the hidden number "?" is.
Looks simple if you know division, but let's say you don't know division. You only know how to multiply. What are you going to do?
Well, you could try plugging in different numbers instead of ? and seeing which one fits.
1 wouldn't fit, 2 wouldn't fit, and 3 would fit perfectly. But imagine an arbitrary, unknown multiplication table, and imagine having to plug in every number from 1 up to the answer. That could be thousands, millions, or billions of trials, depending on how large the unknown number is...
We need a systematic way, an "algorithm", that the machine can use to solve the problem. So we start with a number, say 1, plug in 1 instead of ?, and see the results.
1 x 1 = 1
1 x 2 = 2
1 x 3 = 3
1 x 4 = 4
Now let's see how "off" we are from the target. For that, let's take the difference between the "target" and the result from our current guess, 1. We do this for each multiplication in the table.
? x 1 = 3 and 1 x 1 = 1. So the difference between them is 3 - 1 = 2.
? x 2 = 6 and 1 x 2 = 2. So the difference between them is 6 - 2 = 4.
Doing this for the entire table, we get:
3 - 1 = 2
6 - 2 = 4
9 - 3 = 6
12 - 4 = 8
Ok, we have a list of numbers, (2,4,6,8). Now what? Well we need a single number to denote how "off" we are from the target. So let's just add them up. 2 + 4 + 6 + 8 = 20.
Now this "20" is an important number. It says that with our current guess, we're off by 20. In machine learning terminology, it's called the "cost". Ideally, we want our cost to be 0, meaning the difference between the result from our guess and the target is negligible or non-existent. In other words, if the cost is 0, the guess is perfect.
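The cost calculation above can be sketched in a few lines of Python (the names `inputs`, `targets`, and `cost` are just illustrative, not from any library):

```python
inputs = [1, 2, 3, 4]
targets = [3, 6, 9, 12]

def cost(guess):
    # Sum of (target - guess * input) over the whole table.
    return sum(t - guess * x for x, t in zip(inputs, targets))

print(cost(1))  # 20: exactly the "off by 20" we computed by hand
```

Note that a perfect guess gives a cost of 0: `cost(3)` returns 0.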
Now, based on this, we need to guess the next number we're going to try. We could try a number above 1, or a number below 1. So the direction in which we move towards needs to be determined.
For this, we do a small in-place experiment. We take a small increment of 0.01 and a small decrement of -0.01 and see what happens to the "cost".
So, we're going to plug in 2 numbers, 1.01 and 0.99.
1.01 x 1 = 1.01
1.01 x 2 = 2.02
1.01 x 3 = 3.03
1.01 x 4 = 4.04
0.99 x 1 = 0.99
0.99 x 2 = 1.98
0.99 x 3 = 2.97
0.99 x 4 = 3.96
Cost with 1.01:
(3 - 1.01) + (6 - 2.02) + (9 - 3.03) + (12 - 4.04) = 19.9
Cost with 0.99:
(3 - 0.99) + (6 - 1.98) + (9 - 2.97) + (12 - 3.96) = 20.1
When our guess was 1, our cost was 20. With 1.01, our cost became 19.9 and hence reduced by "0.1". Since our cost is a substitute for how "off" we are from the target number, reduction in cost means we're getting closer to the target number.
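This nudge experiment is easy to express in code. Here's a minimal sketch, reusing the same hypothetical `cost` function (nudging by 0.01 in each direction and comparing):

```python
inputs = [1, 2, 3, 4]
targets = [3, 6, 9, 12]

def cost(guess):
    return sum(t - guess * x for x, t in zip(inputs, targets))

base = cost(1.0)   # 20
up   = cost(1.01)  # roughly 19.9
down = cost(0.99)  # roughly 20.1

# Nudging upward brought the cost closer to 0, so increase the guess.
direction = +1 if abs(up) < abs(base) else -1
print(direction)  # 1
```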
So based on the above experiment, we conclude that we need to increase our guess number. What's a good number above 1? Well.. 2 is good.
What's the cost with 2? It comes to 10. Getting closer! We then repeat the same experiment, and see if we need to increase or decrease. Repeating the process over and over, we will finally get the guess number as 3.
What happens if we overshoot 3 and go from 2 straight to 4? Well, let's see what happens to the cost.
3 - (4 x 1) = 3 - 4 = -1
6 - (4 x 2) = 6 - 8 = -2
... so on.
What's our cost? -10. But we need our cost to be as close to 0 as possible. Calculating the direction from the guess value of 4, with the same nudge experiment using 4.01 and 3.99, we get the following costs:
with 4.01 => cost is -10.1
with 3.99 => cost is -9.9
Since -9.9 is closer to 0 than -10.1, decreasing the guess brings the cost down towards 0, and so our next guess should be a number below 4.
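Putting the pieces together, the whole guessing procedure becomes a simple loop. This is only a sketch: the step size of 0.1 and the stopping tolerance are arbitrary choices I've made for illustration.

```python
inputs = [1, 2, 3, 4]
targets = [3, 6, 9, 12]

def cost(guess):
    return sum(t - guess * x for x, t in zip(inputs, targets))

guess = 1.0
step = 0.1  # how far we move each round (an illustrative choice)
while abs(cost(guess)) > 1e-6:
    # Probe both directions; move whichever way brings the cost closer to 0.
    if abs(cost(guess + 0.01)) < abs(cost(guess - 0.01)):
        guess += step
    else:
        guess -= step

print(round(guess, 3))  # 3.0
```

Starting from 4 instead of 1, the same loop walks the guess back down, just as described above.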
It seems trivial, but the experiment we did, adding and subtracting 0.01 and checking the impact on the cost, is essentially estimating the "gradient": how the cost changes as we change the guess.
The method by which we calculate the cost of a guess, i.e., taking the result from the guess, subtracting it from the target, and summing up the differences, is called the "cost function".
With these 2 concepts in mind, we can now proceed to the heart of the mechanical brain.
The learning machine and how it learns
Our brain is a complex machine, but at the heart of it lies the humble neuron. The neuron is a very simple thing. It has multiple "incoming" telephone cables from other "neurons" and it has a single "outgoing" telephone cable.
So, it can receive from multiple other neurons, but can only send to 1 other neuron. The neuron fires, or "sends" when enough incoming connections are firing into it. What is "enough"?
That's where the "learning" is. The neuron adjusts the "volume" on incoming telephone cables, and then listens.
If the total sum of the incoming "sound" reaches a "threshold", it will fire.
Let's say we have a simple neuron with 3 incoming connections from 3 different neurons. On each incoming line, there's a volume dial where we can set the volume for that connection. This means that even if the incoming signal from an input neuron is as loud as possible, when the neuron's volume dial for that telephone line is set to 0, it won't create any sound in the neuron.
Mathematically, it can be represented as the following:
Incoming Signal 1 x Preset Volume for Signal 1 = Final Signal from neuron 1
Incoming Signal 2 x Preset Volume for Signal 2 = Final Signal from neuron 2
Incoming Signal 3 x Preset Volume for Signal 3 = Final Signal from neuron 3
Please note that the "Preset Volume" for each signal is learned by the neuron. It can increase or decrease over time.
Notice how similarly this is structured to our multiplication table problem. The hidden numbers that need to be calculated are the Preset Volumes for each incoming signal.
Next, we need to sum all the final signals, meaning:
Final Signal From Neuron 1 + Final Signal From Neuron 2 + Final Signal From Neuron 3 = The total signal volume that is incoming.
Next, the neuron needs to fire if the total signal volume is greater than a threshold amount, and that too is a property of the neuron that it will learn.
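The whole neuron can be sketched in a few lines. The function name, signal values, and volume settings below are all illustrative assumptions, not taken from any particular library:

```python
def neuron_fires(signals, volumes, threshold):
    # Scale each incoming signal by its volume dial, then sum them up.
    total = sum(s * v for s, v in zip(signals, volumes))
    # The neuron fires only if the total loudness clears the threshold.
    return total > threshold

# Example: 0.8 + 0.2 + 0.0 = 1.0, which clears a threshold of 0.5.
print(neuron_fires([1.0, 0.5, 0.0], [0.8, 0.4, 0.9], threshold=0.5))  # True
```

In machine learning terms, the volume dials are the neuron's "weights", and both the weights and the threshold are what get adjusted during learning.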
Stay tuned for the next part of this article, where we'll tie everything together and create a neural network. Then we'll see how it's actually trained by data.
Core Maitri is an enterprise software consultancy specializing in Excel-to-Web, AI Integration, and Enterprise Application Development services. Our approach is deeply consultative, rooted in understanding problems at their core and then validating our solutions through iterative feedback.