When we don’t know exactly how something works, we can try to estimate it with a model that includes parameters we can adjust. If we didn’t know how to convert kilometres to miles, we might use a linear function as a model, with an adjustable gradient.

A good way of refining these models is to adjust the parameters based on how wrong the model is compared to known true examples.
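Here is a minimal sketch of that idea for the kilometres-to-miles predictor. The model is `miles = c * km`, and `c` is nudged toward the value that removes the error on one known-true example (100 km is about 62.137 miles). The specific update rule (error divided by the input, scaled by a `rate` of 0.5) and the starting guess are assumptions chosen for illustration, not necessarily the method the text goes on to use.

```python
def predict(km, c):
    """Linear model: miles = c * km, where c is the adjustable parameter."""
    return c * km


def refine(c, km, true_miles, rate=0.5):
    """Adjust c using the error against one known-true example.

    Dividing the error by km turns it into a correction for c; `rate`
    (an assumed value) controls how much of that correction is applied.
    """
    error = true_miles - predict(km, c)
    return c + rate * error / km


c = 0.5  # arbitrary starting guess for the gradient
for step in range(20):
    c = refine(c, km=100.0, true_miles=62.137)

print(round(c, 4))  # 0.6214
```

Because each step applies only half of the correction, the remaining error halves on every pass, so `c` settles close to 0.62137 after a handful of iterations.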

Classifying is Not Very Different from Predicting

We called the above simple machine a predictor, because it takes an input and makes a prediction of what the output should be. We refined that prediction by adjusting an internal parameter, informed by the error we saw when comparing with a known-true example.

Now look at the following graph showing the measured widths and lengths of garden bugs.

You can clearly see two groups. The caterpillars are thin and long, and the ladybirds are wide and short.

Remember the predictor that tried to work out the correct number of miles given kilometres? That predictor had an adjustable linear function at its heart. Remember, linear functions give straight lines when you plot their output against input. The adjustable parameter c changed the slope of that straight line.

What happens if we place a straight line over that plot?

We can’t use the line in the same way we did before – to convert one number (kilometres) into another (miles), but perhaps we can use the line to separate different kinds of things.

In the plot above, if the line divided the caterpillars from the ladybirds, then it could be used to classify an unknown bug based on its measurements. The line above doesn’t do this yet because half the caterpillars are on the same side of the dividing line as the ladybirds.

Let’s try a different line, by adjusting the slope again, and see what happens.

This time the line is even less useful! It doesn’t separate the two kinds of bugs at all.

Let’s have another go:

That’s much better! This line neatly separates caterpillars from ladybirds. We can now use this line as a classifier of bugs.

We are assuming that there are no other kinds of bugs that we haven’t seen – but that’s ok for now, we’re simply trying to illustrate the idea of a simple classifier.

Imagine that next time our computer uses a robot arm to pick up a new bug and measure its width and length. It could then use the above line to classify it correctly as a caterpillar or a ladybird.

In the following plot, you can see that the unknown bug is a caterpillar because it lies above the line. This classification is simple but already pretty powerful!
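The "which side of the line" test can be sketched in a few lines. This assumes the dividing line passes through the origin, so it has the form length = slope × width, and uses a hypothetical slope of 1.0; the measurements in the usage lines are illustrative, not taken from the plot.

```python
def classify(width, length, slope):
    """Classify a bug by which side of the line length = slope * width it lies on.

    Points above the line (longer than the line predicts for their width)
    are called caterpillars; points on or below it are called ladybirds.
    """
    if length > slope * width:
        return "caterpillar"
    return "ladybird"


slope = 1.0  # hypothetical slope for a line that separates the two groups
print(classify(1.0, 3.0, slope))  # caterpillar
print(classify(3.0, 1.0, slope))  # ladybird
```

Changing `slope` rotates the line about the origin, which is exactly the adjustment the text makes when it tries different lines over the plot.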

We’ve seen how a linear function inside our simple predictors can be used to classify previously unseen data.

But we’ve skipped over a crucial element. How do we get the right slope? How do we improve a line we know isn’t a good divider between the two kinds of bugs?

The answer to that is again at the very heart of how neural networks learn, and we’ll look at this next.

Training A Simple Classifier

We want to train our linear classifier to correctly classify bugs as ladybirds or caterpillars. We saw above that this is simply about refining the slope of the dividing line that separates the two groups of points on a plot of bug width and length.

How do we do this?

Rather than develop some mathematical theory upfront, let’s try to feel our way forward by trying to do it. We’ll understand the mathematics better that way.

We do need some examples to learn from. The following table shows two examples, just to keep this exercise simple.

Example   Width   Length   Bug
1         3.0     1.0      ladybird
2         1.0     3.0      caterpillar

We have an example of a bug which has width 3.0 and length 1.0, which we know is a ladybird. We also have an example of a bug which is longer at 3.0 and thinner at 1.0, which is a caterpillar.

This is a set of examples which we know to be the truth. It is these examples that will help refine the slope of the classifier function. Examples of truth used to teach a predictor or a classifier are called the training data.
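One way to see what training data is for is to store the two examples from the table and count how many of them a candidate line gets wrong. This sketch again assumes a dividing line through the origin (length = slope × width); the candidate slopes are arbitrary values picked for illustration.

```python
# The two training examples from the table: (width, length, bug)
training_data = [
    (3.0, 1.0, "ladybird"),
    (1.0, 3.0, "caterpillar"),
]


def classify(width, length, slope):
    # Above the line length = slope * width -> caterpillar, otherwise ladybird
    return "caterpillar" if length > slope * width else "ladybird"


def count_mistakes(slope, examples):
    """Count how many known-true examples the line with this slope gets wrong."""
    return sum(1 for w, l, bug in examples if classify(w, l, slope) != bug)


for slope in (0.25, 1.0, 4.0):
    print(slope, count_mistakes(slope, training_data))
# 0.25 1
# 1.0 0
# 4.0 1
```

A slope that is too shallow puts the ladybird above the line, and one that is too steep puts the caterpillar below it; only a slope in between classifies both examples correctly. Using the errors on these examples to move the slope in the right direction is the refinement step the text is building toward.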