02. Neural Network Classification with TensorFlow

Okay, we've seen how to deal with a regression problem in TensorFlow; now let's look at how we can approach a classification problem.

A classification problem involves predicting whether something is one thing or another.

For example, you might want to:

Predict whether or not someone has heart disease based on their health parameters. This is called binary classification since there are only two options.
Decide whether a photo is of food, a person or a dog. This is called multi-class classification since there are more than two options.
Predict what categories should be assigned to a Wikipedia article. This is called multi-label classification since a single article could have more than one category assigned.
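Here's a small NumPy sketch of what the labels (y) typically look like for each of these three problem types. The example arrays, including the three "Wikipedia category" columns, are made up for illustration.

```python
import numpy as np

# Binary classification: one label per sample, either 0 or 1
# (e.g. 1 = has heart disease, 0 = doesn't)
y_binary = np.array([0, 1, 1, 0])

# Multi-class classification: exactly one of K classes per sample.
# Integer labels (0 = food, 1 = person, 2 = dog) ...
y_multiclass = np.array([0, 2, 1, 2])
# ... or the same labels one-hot encoded (one column per class)
y_multiclass_one_hot = np.eye(3)[y_multiclass]

# Multi-label classification: several classes can be "on" for the same sample
# (columns = hypothetical categories "science", "history", "sport")
y_multilabel = np.array([[1, 0, 0],
                         [1, 1, 0],
                         [0, 0, 1],
                         [0, 1, 1]])

print(y_multiclass_one_hot)
# Each one-hot row sums to 1 (one class); multi-label rows can sum to more than 1
print(y_multiclass_one_hot.sum(axis=1))  # [1. 1. 1. 1.]
print(y_multilabel.sum(axis=1))          # [1 2 1 2]
```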

In this notebook, we're going to work through a number of different classification problems with TensorFlow. In other words, we'll take a set of inputs and predict which class that set of inputs belongs to.

What we're going to cover

Specifically, we're going to go through doing the following with TensorFlow:

Architecture of a classification model
Input shapes and output shapes
- X: features/data (inputs)
- y: labels (outputs)
  - "What class do the inputs belong to?"
Creating custom data to view and fit
Steps in modelling for binary and multiclass classification
- Creating a model
- Compiling a model
  - Defining a loss function
  - Setting up an optimizer
    - Finding the best learning rate
  - Creating evaluation metrics
- Fitting a model (getting it to find patterns in our data)
- Improving a model
The power of non-linearity
Evaluating classification models
- Visualizing the model ("visualize, visualize, visualize")
- Looking at training curves
- Comparing predictions to ground truth (using our evaluation metrics)

How you can use this notebook

You can read through the descriptions and the code (it should all run, except for the cells which error on purpose), but there's a better option.

Write all of the code yourself.

Yes. I'm serious. Create a new notebook and rewrite each line yourself. Investigate it and see if you can break it. If it breaks, why?

You don't have to write the text descriptions but writing the code yourself is a great way to get hands-on experience.

Don't worry if you make mistakes; we all do. The way to get better and make fewer mistakes is to write more code.

Typical architecture of a classification neural network

The word typical is on purpose.

Because the architecture of a classification neural network can widely vary depending on the problem you're working on.

However, there are some fundamentals all deep neural networks contain:

An input layer.
Some hidden layers.
An output layer.

Much of the rest is up to the data analyst creating the model.

The following are some standard values you'll often use in your classification neural networks.

Hyperparameter	Binary Classification	Multiclass classification
Input layer shape	Same as number of features (e.g. 5 for age, sex, height, weight, smoking status in heart disease prediction)	Same as binary classification
Hidden layer(s)	Problem specific, minimum = 1, maximum = unlimited	Same as binary classification
Neurons per hidden layer	Problem specific, generally 10 to 100	Same as binary classification
Output layer shape	1 (one class or the other)	1 per class (e.g. 3 for food, person or dog photo)
Hidden activation	Usually ReLU (rectified linear unit)	Same as binary classification
Output activation	Sigmoid	Softmax
Loss function	Cross entropy (`tf.keras.losses.BinaryCrossentropy` in TensorFlow)	Cross entropy (`tf.keras.losses.CategoricalCrossentropy` in TensorFlow)
Optimizer	SGD (stochastic gradient descent), Adam	Same as binary classification

Table 1: Typical architecture of a classification network. Source: Adapted from page 295 of the book Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow by Aurélien Géron.
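To make the table a little more concrete, here's a minimal NumPy sketch of the forward pass it describes: 5 input features, one hidden layer of 10 ReLU neurons, then either a single sigmoid output (binary) or 3 softmax outputs (multiclass). It's an illustration under stated assumptions, not TensorFlow code: the `dense()` helper is hypothetical, its weights are random rather than trained, and the input data is made up.

```python
import numpy as np

rng = np.random.default_rng(42)

def relu(z):
    # Typical hidden activation: max(0, z) element-wise
    return np.maximum(0, z)

def sigmoid(z):
    # Binary output activation: squashes each value into (0, 1)
    return 1 / (1 + np.exp(-z))

def softmax(z):
    # Multiclass output activation: each row becomes a probability distribution
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    exp_z = np.exp(z)
    return exp_z / exp_z.sum(axis=1, keepdims=True)

def dense(x, n_units):
    # Hypothetical fully connected layer with random (untrained) weights: x @ W + b
    W = rng.normal(scale=0.1, size=(x.shape[-1], n_units))
    b = np.zeros(n_units)
    return x @ W + b

# 4 made-up patients x 5 features (e.g. age, sex, height, weight, smoking status)
X_demo = rng.normal(size=(4, 5))

hidden = relu(dense(X_demo, 10))          # hidden layer: 10 neurons, ReLU
binary_probs = sigmoid(dense(hidden, 1))  # binary: 1 output unit + sigmoid
multi_probs = softmax(dense(hidden, 3))   # multiclass: 1 unit per class + softmax

print(binary_probs.shape)  # (4, 1)
print(multi_probs.shape)   # (4, 3)
```

The sigmoid output gives one probability (of the positive class) per sample, while each softmax row sums to 1 across the classes, which is why the output layer shape differs between the two columns of the table.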

Don't worry if not much of the above makes sense right now; we'll get plenty of experience as we go through this notebook.

Let's start by importing TensorFlow as the common alias tf. For this notebook, make sure you're using version 2.x+.

In [1]:

import tensorflow as tf
print(tf.__version__)

import datetime
print(f"Notebook last run (end-to-end): {datetime.datetime.now()}")

2.13.0
Notebook last run (end-to-end): 2023-10-12 04:07:12.774646

Creating data to view and fit

We could start by importing a classification dataset, but let's practice making some of our own classification data.

🔑 Note: It's common practice to get you and the model you build working on a toy (or simple) dataset before moving to your actual problem. Treat it as a rehearsal before the actual experiment(s).

Since classification is predicting whether something is one thing or another, let's make some data to reflect that.

To do so, we'll use Scikit-Learn's make_circles() function.
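Before calling it, it can help to see roughly what make_circles() produces. The sketch below is a hypothetical `make_toy_circles()` written with NumPy, not Scikit-Learn's actual source: it places half the points on a circle of radius 1 (label 0) and half on a smaller circle of radius `factor` (label 1), adds Gaussian noise and shuffles. As far as we understand, this mirrors the idea behind make_circles() (whose `factor` parameter defaults to 0.8), but it won't reproduce the exact numbers you'll see below.

```python
import numpy as np

def make_toy_circles(n_samples=1000, noise=0.03, factor=0.8, seed=42):
    """Hypothetical re-implementation of the idea behind make_circles():
    points on a big circle (label 0) and a smaller circle inside it (label 1)."""
    rng = np.random.default_rng(seed)
    n_outer = n_samples // 2
    n_inner = n_samples - n_outer

    # Evenly spaced angles around each circle
    theta_outer = np.linspace(0, 2 * np.pi, n_outer, endpoint=False)
    theta_inner = np.linspace(0, 2 * np.pi, n_inner, endpoint=False)

    outer = np.c_[np.cos(theta_outer), np.sin(theta_outer)]           # radius 1
    inner = np.c_[np.cos(theta_inner), np.sin(theta_inner)] * factor  # radius = factor

    X = np.vstack([outer, inner])
    y = np.hstack([np.zeros(n_outer, dtype=int), np.ones(n_inner, dtype=int)])

    # Jiggle every point with Gaussian noise, then shuffle the samples
    X = X + rng.normal(scale=noise, size=X.shape)
    order = rng.permutation(n_samples)
    return X[order], y[order]

X_toy, y_toy = make_toy_circles()
print(X_toy.shape, y_toy.shape)  # (1000, 2) (1000,)
print(np.bincount(y_toy))        # [500 500]
```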

In [2]:

from sklearn.datasets import make_circles

# Make 1000 examples
n_samples = 1000

# Create circles
X, y = make_circles(n_samples,
                    noise=0.03,
                    random_state=42)

Wonderful, now that we've created some data, let's look at the features (X) and labels (y).

In [3]:

# Check out the features
X

Out[3]:

array([[ 0.75424625,  0.23148074],
       [-0.75615888,  0.15325888],
       [-0.81539193,  0.17328203],
       ...,
       [-0.13690036, -0.81001183],
       [ 0.67036156, -0.76750154],
       [ 0.28105665,  0.96382443]])

In [4]:

# See the first 10 labels
y[:10]

Out[4]:

array([1, 1, 1, 1, 0, 1, 1, 1, 1, 0])

Okay, we've seen some of our data and labels. How about we move towards visualizing them?

🔑 Note: One important step of starting any kind of machine learning project is to become one with the data. And one of the best ways to do this is to visualize the data you're working with as much as possible. The data explorer's motto is "visualize, visualize, visualize".

We'll start with a DataFrame.

In [5]:

# Make dataframe of features and labels
import pandas as pd
circles = pd.DataFrame({"X0":X[:, 0], "X1":X[:, 1], "label":y})
circles.head()

Out[5]:

	X0	X1	label
0	0.754246	0.231481	1
1	-0.756159	0.153259	1
2	-0.815392	0.173282	1
3	-0.393731	0.692883	1
4	0.442208	-0.896723	0

What kind of labels are we dealing with?

In [6]:

# Check out the different labels
circles.label.value_counts()

Out[6]:

1    500
0    500
Name: label, dtype: int64

Alright, looks like we're dealing with a binary classification problem. It's binary because there are only two labels (0 or 1).

If there were more label options (e.g. 0, 1, 2, 3 or 4), it would be called multiclass classification.
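Counting the unique labels is a quick way to tell which kind of problem you have. `classification_type()` below is a hypothetical helper written for illustration; it assumes 1-D integer labels, so it won't recognise multi-label problems (whose labels are rows of 0s and 1s).

```python
import numpy as np

def classification_type(labels):
    """Hypothetical helper: 'binary' for exactly 2 unique labels, 'multiclass' for more."""
    n_classes = len(np.unique(labels))
    if n_classes == 2:
        return "binary"
    if n_classes > 2:
        return "multiclass"
    return "only one class present"

print(classification_type(np.array([1, 1, 0, 1, 0])))     # binary
print(classification_type(np.array([0, 1, 2, 3, 4, 2])))  # multiclass
```

Passed the circles labels y from above, it would return "binary".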

Let's take our visualization a step further and plot our data.

In [7]:

# Visualize with a plot
import matplotlib.pyplot as plt
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.RdYlBu);

[Image: scatter plot of X[:, 0] against X[:, 1], with points colored by label]

Nice! From the plot, can you guess what kind of model we might want to build?

How about we try to build one to classify blue or red dots? That is, a model that is able to distinguish blue dots from red dots.

🛠 Practice: Before pushing forward, you might want to spend 10 minutes playing around with the TensorFlow Playground. Try adjusting the different hyperparameters you see and click play to see a neural network train. I think you'll find the data very similar to what we've just created.

Input and output shapes

One of the most common issues you'll run into when building neural networks is shape mismatches.

More specifically, mismatches between the shape of the input data and the shape the model expects, and between the shape of the model's output and the shape of the labels.

In our case, we want to input X and get our model to predict y.

So let's check out the shapes of X and y.

In [8]:

# Check the shapes of our features and labels
X.shape, y.shape

Out[8]:

((1000, 2), (1000,))

Hmm, where do these numbers come from?

In [9]:

# Check how many samples we have
len(X), len(y)

Out[9]:

(1000, 1000)

So we've got as many X values as we do y values, which makes sense.

Let's check out one example of each.

In [10]:

# View the first example of features and labels
X[0], y[0]

Out[10]:

(array([0.75424625, 0.23148074]), 1)

Alright, so we've got two X features which lead to one y value.

This means our neural network has to accept an input tensor whose last dimension is two (one value per feature) and output a tensor with at least one value.

🤔 Note: y having a shape of (1000,) can seem confusing. This is because each y value is a scalar (a single value) and therefore has no dimensions of its own; the 1000 is simply the number of samples. For now, think of your output shape as being at least the same size as one example of y (in our case, the output from our neural network has to be at least one value).
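A quick way to see where these shapes come from is to mimic a single dense unit with NumPy. This sketch uses random stand-in data with the same shapes as X and y (not the circles data itself): a weight matrix of shape (number of features, number of units) turns (1000, 2) inputs into (1000, 1) outputs, a single label has shape (), and weights built for the wrong number of features trigger a shape mismatch.

```python
import numpy as np

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(1000, 2))     # same shape as our circles features
y_demo = rng.integers(0, 2, size=1000)  # same shape as our labels: (1000,)

# A single dense unit stores one weight per input feature: shape (2, 1)
W = rng.normal(size=(2, 1))
b = np.zeros(1)
outputs = X_demo @ W + b
print(outputs.shape)                # (1000, 1): one output value per sample

# Each label is a scalar, so a single example has no dimensions at all
print(y_demo[0].shape)              # ()
print(y_demo.reshape(-1, 1).shape)  # (1000, 1), if you want labels shaped like the outputs

# Weights built for 3 features can't be applied to 2-feature inputs
W_wrong = rng.normal(size=(3, 1))
mismatch_caught = False
try:
    X_demo @ W_wrong
except ValueError as err:
    mismatch_caught = True
    print("Shape mismatch:", err)
```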

Steps in modelling

Now that we know what data we have as well as the input and output shapes, let's see how we'd build a neural network to model it.

In TensorFlow, there are typically 3 fundamental steps to creating and training a model.

Creating a model - piece together the layers of a neural network yourself (using the functional or sequential API) or import a previously built model (known as transfer learning).
Compiling a model - defining how a model's performance should be measured (loss/metrics) as well as defining how it should improve (optimizer).
Fitting a model - letting the model try to find patterns in the data (how does X get to y).

Let's see these in action using the Sequential API to build a model for our classification data. Then we'll step through each.

In [11]:

# Set random seed
tf.random.set_seed(42)

# 1. Create the model using the Sequential API
model_1 = tf.keras.Sequential([
  tf.keras.layers.Dense(1)
])

# 2. Compile the model
model_1.compile(loss=tf.keras.losses.BinaryCrossentropy(), # binary since we are working with 2 classes (0 & 1)
                optimizer=tf.keras.optimizers.SGD(),
                metrics=['accuracy'])

# 3. Fit the model
model_1.fit(X, y, epochs=5)

Epoch 1/5
32/32 [==============================] - 5s 3ms/step - loss: 5.9177 - accuracy: 0.4800
Epoch 2/5
32/32 [==============================] - 0s 3ms/step - loss: 5.1146 - accuracy: 0.4620
Epoch 3/5
32/32 [==============================] - 0s 3ms/step - loss: 4.6022 - accuracy: 0.4720
Epoch 4/5
32/32 [==============================] - 0s 3ms/step - loss: 7.7125 - accuracy: 0.5000
Epoch 5/5
32/32 [==============================] - 0s 3ms/step - loss: 7.7125 - accuracy: 0.5000

Out[11]:

<keras.src.callbacks.History at 0x7a25080c58d0>

Looking at the accuracy metric, our model performs poorly (50% accuracy on a binary classification problem with balanced classes, like ours with 500 examples of each label, is the equivalent of guessing), but what if we trained it for longer?
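Why does 50% count as guessing? With 500 samples of each class, even a "model" that ignores its inputs entirely lands there. A small NumPy check with two hypothetical baselines:

```python
import numpy as np

# Balanced labels like ours: 500 of each class
y_balanced = np.array([0] * 500 + [1] * 500)

# A "model" that ignores its inputs and always predicts class 1
always_one = np.ones_like(y_balanced)

# A "model" that flips a coin for every sample
rng = np.random.default_rng(42)
random_guess = rng.integers(0, 2, size=y_balanced.shape)

print(np.mean(always_one == y_balanced))    # 0.5
print(np.mean(random_guess == y_balanced))  # close to 0.5
```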

In [12]:

# Train our model for longer (more chances to look at the data)
model_1.fit(X, y, epochs=200, verbose=0) # set verbose=0 to remove training updates
model_1.evaluate(X, y)

32/32 [==============================] - 0s 2ms/step - loss: 7.7125 - accuracy: 0.5000

Out[12]:

[7.712474346160889, 0.5]

Even after 200 passes over the data, it's still performing as if it's guessing.

What if we added an extra layer and trained for a little longer?

In [13]:

# Set random seed
tf.random.set_seed(42)

# 1. Create the model (same as model_1 but with an extra layer)
model_2 = tf.keras.Sequential([
  tf.keras.layers.Dense(1), # add an extra layer
  tf.keras.layers.Dense(1)
])

# 2. Compile the model
model_2.compile(loss=tf.keras.losses.BinaryCrossentropy(),
                optimizer=tf.keras.optimizers.SGD(),
                metrics=['accuracy'])

# 3. Fit the model
model_2.fit(X, y, epochs=100, verbose=0) # set verbose=0 to make the output print less

Out[13]:

<keras.src.callbacks.History at 0x7a24b4b323e0>

In [14]:

# Evaluate the model
model_2.evaluate(X, y)

32/32 [==============================] - 0s 2ms/step - loss: 0.6934 - accuracy: 0.5000

Out[14]:

[0.6933949589729309, 0.5]

Still no better than guessing (~50% accuracy)... hmm...?
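One clue is the loss itself. Binary cross-entropy for a prediction of 0.5 on every sample is ln 2 ≈ 0.693, so a loss of 0.6934 is consistent with model_2 outputting roughly 0.5 for everything, whatever the input. The functions below are a from-scratch NumPy sketch of the binary cross-entropy formula and of accuracy (not Keras' implementation) for checking this by hand.

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; y_prob are predicted probabilities of class 1."""
    y_prob = np.clip(y_prob, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples where the thresholded prediction matches the label."""
    y_pred = (y_prob >= threshold).astype(int)
    return np.mean(y_pred == y_true)

y_true = np.array([1, 0, 1, 0])
confident_right = np.array([0.9, 0.1, 0.8, 0.2])
unsure = np.array([0.5, 0.5, 0.5, 0.5])

print(binary_cross_entropy(y_true, confident_right))  # ~0.164
print(binary_cross_entropy(y_true, unsure))           # ~0.693 (= ln 2)
print(accuracy(y_true, confident_right))              # 1.0
print(accuracy(y_true, unsure))                       # 0.5
```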

Let's remind ourselves of a few more ways we can improve our models.

Improving a model

To improve our model, we can alter almost every part of the 3 steps we went through before.

Creating a model - here you might want to add more layers, increase the number of hidden units (also called neurons) within each layer, or change the activation function of each layer.
Compiling a model - you might want to choose a different optimization function (such as the Adam optimizer, which is usually pretty good for many problems) or perhaps change the learning rate of the optimization function.
Fitting a model - perhaps you could fit a model for more epochs (leave it training for longer).

There are many different ways to potentially improve a neural network. Some of the most common include: increasing the number of layers (making the network deeper), increasing the number of hidden units (making the network wider) and changing the learning rate. Because these values are all human-changeable, they're referred to as hyperparameters, and the practice of trying to find the best hyperparameters is referred to as hyperparameter tuning.
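Hyperparameter tuning often starts with a grid: list a few values for each hyperparameter and try every combination. The sketch below only enumerates the combinations with Python's `itertools.product`; the search space values are hypothetical, and building, fitting and evaluating a model for each configuration is left out.

```python
from itertools import product

# Hypothetical search space for the kinds of changes listed above
search_space = {
    "n_layers": [1, 2, 3],           # deeper
    "hidden_units": [10, 100],       # wider
    "learning_rate": [0.001, 0.01],  # size of each optimizer step
}

# Every combination of values is one candidate model configuration
configs = [dict(zip(search_space, values)) for values in product(*search_space.values())]

print(len(configs))  # 12 (= 3 * 2 * 2)
print(configs[0])    # {'n_layers': 1, 'hidden_units': 10, 'learning_rate': 0.001}
```

The number of configurations multiplies with every hyperparameter you add, which is why grids are usually kept small (or replaced by random search) when each configuration means training a model.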

How about we try adding more neurons, an extra layer and our friend the Adam optimizer?

Surely doing this will result in predictions better than guessing...

Note: The following message (below this one) can be ignored if you're running TensorFlow 2.8.0+; the error seems to have been fixed.

Note: If you're using TensorFlow 2.7.0+ (but not 2.8.0+), the original code from the following cells may have caused some errors. The cells have since been updated to fix those errors. You can see explanations of what happened at the following resources:

In [15]:

# Set random seed
tf.random.set_seed(42)

# 1. Create the model (this time 3 layers)
model_3 = tf.keras.Sequential([
  # Before TensorFlow 2.7.0
  # tf.keras.layers.Dense(100), # add 100 dense neurons

  # With TensorFlow 2.7.0
  # tf.keras.layers.Dense(100, input_shape=(None, 1)), # add 100 dense neurons

  ## After TensorFlow 2.8.0 ##
  tf.keras.layers.Dense(100), # add 100 dense neurons
  tf.keras.layers.Dense(10), # add another layer with 10 neurons
  tf.keras.layers.Dense(1)
])

# 2. Compile the model
model_3.compile(loss=tf.keras.losses.BinaryCrossentropy(),
                optimizer=tf.keras.optimizers.Adam(), # use Adam instead of SGD
                metrics=['accuracy'])

# 3. Fit the model
model_3.fit(X, y, epochs=100, verbose=1) # fit for 100 passes of the data

Epoch 1/100
32/32 [==============================] - 2s 3ms/step - loss: 3.5433 - accuracy: 0.4520
Epoch 2/100
32/32 [==============================] - 0s 3ms/step - loss: 1.0533 - accuracy: 0.4910
Epoch 3/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7210 - accuracy: 0.5000
Epoch 4/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6998 - accuracy: 0.5000
Epoch 5/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6952 - accuracy: 0.4830
Epoch 6/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6941 - accuracy: 0.4910
Epoch 7/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6938 - accuracy: 0.4940
Epoch 8/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6945 - accuracy: 0.4990
Epoch 9/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6943 - accuracy: 0.4880
Epoch 10/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6945 - accuracy: 0.4550
Epoch 11/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6954 - accuracy: 0.4490
Epoch 12/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6946 - accuracy: 0.4860
Epoch 13/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6958 - accuracy: 0.4920
Epoch 14/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6949 - accuracy: 0.5150
Epoch 15/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6961 - accuracy: 0.4720
Epoch 16/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6943 - accuracy: 0.4880
Epoch 17/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6983 - accuracy: 0.4930
Epoch 18/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6945 - accuracy: 0.4730
Epoch 19/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6954 - accuracy: 0.5030
Epoch 20/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6957 - accuracy: 0.4600
Epoch 21/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6956 - accuracy: 0.4790
Epoch 22/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6948 - accuracy: 0.4440
Epoch 23/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6943 - accuracy: 0.4850
Epoch 24/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6962 - accuracy: 0.4690
Epoch 25/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6973 - accuracy: 0.5070
Epoch 26/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6978 - accuracy: 0.4870
Epoch 27/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6965 - accuracy: 0.5010
Epoch 28/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6947 - accuracy: 0.4690
Epoch 29/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6974 - accuracy: 0.4860
Epoch 30/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6997 - accuracy: 0.4870
Epoch 31/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6947 - accuracy: 0.5060
Epoch 32/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6965 - accuracy: 0.4710
Epoch 33/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6955 - accuracy: 0.4590
Epoch 34/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6972 - accuracy: 0.4770
Epoch 35/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6961 - accuracy: 0.5020
Epoch 36/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6946 - accuracy: 0.4680
Epoch 37/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6952 - accuracy: 0.4980
Epoch 38/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6976 - accuracy: 0.4930
Epoch 39/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6950 - accuracy: 0.4750
Epoch 40/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6966 - accuracy: 0.4970
Epoch 41/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6978 - accuracy: 0.4840
Epoch 42/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6994 - accuracy: 0.4770
Epoch 43/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6954 - accuracy: 0.5060
Epoch 44/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6982 - accuracy: 0.4900
Epoch 45/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6960 - accuracy: 0.5040
Epoch 46/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6948 - accuracy: 0.4810
Epoch 47/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6974 - accuracy: 0.5120
Epoch 48/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6967 - accuracy: 0.4930
Epoch 49/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6975 - accuracy: 0.4820
Epoch 50/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6970 - accuracy: 0.4640
Epoch 51/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6977 - accuracy: 0.4810
Epoch 52/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6963 - accuracy: 0.5080
Epoch 53/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6969 - accuracy: 0.5070
Epoch 54/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6997 - accuracy: 0.5120
Epoch 55/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6952 - accuracy: 0.5180
Epoch 56/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6965 - accuracy: 0.4890
Epoch 57/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6980 - accuracy: 0.4730
Epoch 58/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6965 - accuracy: 0.5080
Epoch 59/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7003 - accuracy: 0.4970
Epoch 60/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7009 - accuracy: 0.4930
Epoch 61/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6997 - accuracy: 0.4710
Epoch 62/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6965 - accuracy: 0.4950
Epoch 63/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6948 - accuracy: 0.4840
Epoch 64/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6961 - accuracy: 0.4920
Epoch 65/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6993 - accuracy: 0.4830
Epoch 66/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6966 - accuracy: 0.4980
Epoch 67/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6977 - accuracy: 0.4490
Epoch 68/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6958 - accuracy: 0.5060
Epoch 69/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6949 - accuracy: 0.5280
Epoch 70/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6987 - accuracy: 0.4720
Epoch 71/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6976 - accuracy: 0.4720
Epoch 72/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6965 - accuracy: 0.5010
Epoch 73/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6960 - accuracy: 0.4890
Epoch 74/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6958 - accuracy: 0.5050
Epoch 75/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6960 - accuracy: 0.5070
Epoch 76/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6980 - accuracy: 0.4810
Epoch 77/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6972 - accuracy: 0.4990
Epoch 78/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6972 - accuracy: 0.4710
Epoch 79/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7007 - accuracy: 0.5130
Epoch 80/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6978 - accuracy: 0.5000
Epoch 81/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6975 - accuracy: 0.5020
Epoch 82/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6966 - accuracy: 0.4880
Epoch 83/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7003 - accuracy: 0.4510
Epoch 84/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6964 - accuracy: 0.5010
Epoch 85/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6968 - accuracy: 0.4660
Epoch 86/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7005 - accuracy: 0.4940
Epoch 87/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6976 - accuracy: 0.4540
Epoch 88/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6981 - accuracy: 0.4570
Epoch 89/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6982 - accuracy: 0.4740
Epoch 90/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6978 - accuracy: 0.4560
Epoch 91/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6980 - accuracy: 0.4880
Epoch 92/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6984 - accuracy: 0.4730
Epoch 93/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6982 - accuracy: 0.4710
Epoch 94/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7001 - accuracy: 0.4790
Epoch 95/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6972 - accuracy: 0.4560
Epoch 96/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6979 - accuracy: 0.4860
Epoch 97/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6972 - accuracy: 0.4580
Epoch 98/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6987 - accuracy: 0.4800
Epoch 99/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6975 - accuracy: 0.5080
Epoch 100/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6967 - accuracy: 0.4810

Out[15]:

<keras.src.callbacks.History at 0x7a249bd86620>

Still!

We've pulled out a few tricks but our model isn't even doing better than guessing.

Let's make some visualizations to see what's happening.

🔑 Note: Whenever your model is performing strangely or there's something going on with your data you're not quite sure of, remember these three words: visualize, visualize, visualize. Inspect your data, inspect your model, inspect your model's predictions.

To visualize our model's predictions we're going to create a function plot_decision_boundary() which:

Takes in a trained model, features (X) and labels (y).
Creates a meshgrid of the different X values.
Makes predictions across the meshgrid.
Plots the predictions as well as a line between the different zones (where each unique class falls).

If this sounds confusing, let's see it in code and then see the output.

🔑 Note: If you're ever unsure of what a function does, try unraveling it and writing it line by line for yourself to see what it does. Break it into small parts and see what each part outputs.
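Following that advice, here is the core of the meshgrid trick unraveled step by step with NumPy. To keep it self-contained, `fake_predict()` is a hypothetical stand-in for `model.predict()` that labels every point within radius 0.9 of the origin as class 1; with a real trained model the flattening and reshaping steps stay the same.

```python
import numpy as np

# Hypothetical stand-in for model.predict(): "inner circle" (1.0) if a point is
# within radius 0.9 of the origin, else 0.0, returned with shape (n_points, 1)
def fake_predict(points):
    radius = np.sqrt((points ** 2).sum(axis=1))
    return (radius < 0.9).astype(float).reshape(-1, 1)

# Create a meshgrid: x and y coordinates covering the plotting area
xx, yy = np.meshgrid(np.linspace(-1.1, 1.1, 100),
                     np.linspace(-1.1, 1.1, 100))
print(xx.shape, yy.shape)  # (100, 100) (100, 100)

# Flatten both grids and pair them up into (x, y) points
x_in = np.c_[xx.ravel(), yy.ravel()]
print(x_in.shape)          # (10000, 2)

# Predict on every grid point, then reshape back to the grid's shape
y_pred = fake_predict(x_in)
y_grid = np.round(y_pred[:, 0]).reshape(xx.shape)
print(y_grid.shape)        # (100, 100), ready for plt.contourf(xx, yy, y_grid)
```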

In [16]:

import numpy as np
import matplotlib.pyplot as plt # used for plotting below (also imported earlier in the notebook)

def plot_decision_boundary(model, X, y):
  """
  Plots the decision boundary created by a model predicting on X.
  This function has been adapted from two phenomenal resources:
   1. CS231n - https://cs231n.github.io/neural-networks-case-study/
   2. Made with ML basics - https://github.com/GokuMohandas/MadeWithML/blob/main/notebooks/08_Neural_Networks.ipynb
  """
  # Define the axis boundaries of the plot and create a meshgrid
  x_min, x_max = X[:, 0].min() - 0.1, X[:, 0].max() + 0.1
  y_min, y_max = X[:, 1].min() - 0.1, X[:, 1].max() + 0.1
  xx, yy = np.meshgrid(np.linspace(x_min, x_max, 100),
                       np.linspace(y_min, y_max, 100))

  # Create X values (we're going to predict on all of these)
  x_in = np.c_[xx.ravel(), yy.ravel()] # flatten both grids and stack them as columns -> shape (n_points, 2): https://numpy.org/devdocs/reference/generated/numpy.c_.html

  # Make predictions using the trained model
  y_pred = model.predict(x_in)

  # Check for multi-class
  if model.output_shape[-1] > 1: # checks the final dimension of the model's output shape, if this is > (greater than) 1, it's multi-class
    print("doing multiclass classification...")
    # We have to reshape our predictions to get them ready for plotting
    y_pred = np.argmax(y_pred, axis=1).reshape(xx.shape)
  else:
    print("doing binary classifcation...")
    y_pred = np.round(np.max(y_pred, axis=1)).reshape(xx.shape)

  # Plot decision boundary
  plt.contourf(xx, yy, y_pred, cmap=plt.cm.RdYlBu, alpha=0.7)
  plt.scatter(X[:, 0], X[:, 1], c=y, s=40, cmap=plt.cm.RdYlBu)
  plt.xlim(xx.min(), xx.max())
  plt.ylim(yy.min(), yy.max())

Now that we've got a function to plot our model's decision boundary (the cut-off point it's making between red and blue dots), let's try it out.

In [17]:

# Check out the predictions our model is making
plot_decision_boundary(model_3, X, y)

313/313 [==============================] - 0s 1ms/step
doing binary classifcation...

Looks like our model is trying to draw a straight line through the data.
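Why a straight line? Assuming model_3's layers have no activation functions (like the recreated version later in this notebook, with the same 100, 10 and 1 units), stacking them adds no flexibility: three linear layers collapse into a single one, y = X @ W + b, and a single linear layer can only split the plane with a straight line. A NumPy sketch with random weights in model_3's shapes (the values are made up):

```python
import numpy as np

rng = np.random.default_rng(42)

# Random weights with the same shapes as model_3's layers (2 -> 100 -> 10 -> 1)
W1, b1 = rng.normal(size=(2, 100)), rng.normal(size=100)
W2, b2 = rng.normal(size=(100, 10)), rng.normal(size=10)
W3, b3 = rng.normal(size=(10, 1)), rng.normal(size=1)

X_demo = rng.normal(size=(5, 2))  # 5 made-up 2D points

# Forward pass through three Dense layers with no activation function
stacked = ((X_demo @ W1 + b1) @ W2 + b2) @ W3 + b3

# The same computation collapsed into ONE linear layer: y = X @ W + b
W = W1 @ W2 @ W3              # shape (2, 1)
b = (b1 @ W2 + b2) @ W3 + b3  # shape (1,)
collapsed = X_demo @ W + b

print(np.allclose(stacked, collapsed))  # True
```

Since the whole stack reduces to X @ W + b, the points where the output equals some threshold t satisfy W[0]·x1 + W[1]·x2 + b = t, which is the equation of a straight line.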

What's wrong with doing this?

The main issue is our data isn't separable by a straight line.

On a regression problem where the target does follow a straight line, our model might work. In fact, let's try it.
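The regression data created in the next cell follows an exact straight line, y = x + 100, which is the kind of pattern a stack of linear layers can capture. A quick NumPy check of that relationship and of the 150/50 train/test split:

```python
import numpy as np

# Same data as the next cell
X_regression = np.arange(0, 1000, 5)
y_regression = np.arange(100, 1100, 5)

print(len(X_regression), len(y_regression))        # 200 200
print(np.all(y_regression - X_regression == 100))  # True: y = x + 100

# The 150/50 split used below
X_reg_train, X_reg_test = X_regression[:150], X_regression[150:]
print(X_reg_train.shape, X_reg_test.shape)  # (150,) (50,)
```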

In [18]:

# Set random seed
tf.random.set_seed(42)

# Create some regression data
X_regression = np.arange(0, 1000, 5)
y_regression = np.arange(100, 1100, 5)

# Split it into training and test sets
X_reg_train = X_regression[:150]
X_reg_test = X_regression[150:]
y_reg_train = y_regression[:150]
y_reg_test = y_regression[150:]

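# Fit our model to the data (heads up: this cell errors on purpose, see below)
# Note: Before TensorFlow 2.7.0, this line would work
# model_3.fit(X_reg_train, y_reg_train, epochs=100)

# After TensorFlow 2.7.0, Keras no longer adds the extra dimension to 1D inputs for us, so we add it ourselves. See here for more: https://github.com/mrdbourke/tensorflow-deep-learning/discussions/278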
model_3.fit(tf.expand_dims(X_reg_train, axis=-1),
            y_reg_train,
            epochs=100)

Epoch 1/100

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-18-2f683d96fa34> in <cell line: 19>()
     17 
     18 # After TensorFlow 2.7.0, see here for more: https://github.com/mrdbourke/tensorflow-deep-learning/discussions/278
---> 19 model_3.fit(tf.expand_dims(X_reg_train, axis=-1),
     20             y_reg_train,
     21             epochs=100)

/usr/local/lib/python3.10/dist-packages/keras/src/utils/traceback_utils.py in error_handler(*args, **kwargs)
     68             # To get the full stack trace, call:
     69             # `tf.debugging.disable_traceback_filtering()`
---> 70             raise e.with_traceback(filtered_tb) from None
     71         finally:
     72             del filtered_tb

/usr/local/lib/python3.10/dist-packages/keras/src/engine/training.py in tf__train_function(iterator)
     13                 try:
     14                     do_return = True
---> 15                     retval_ = ag__.converted_call(ag__.ld(step_function), (ag__.ld(self), ag__.ld(iterator)), None, fscope)
     16                 except:
     17                     do_return = False

ValueError: in user code:

    File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/training.py", line 1338, in train_function  *
        return step_function(self, iterator)
    File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/training.py", line 1322, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/training.py", line 1303, in run_step  **
        outputs = model.train_step(data)
    File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/training.py", line 1080, in train_step
        y_pred = self(x, training=True)
    File "/usr/local/lib/python3.10/dist-packages/keras/src/utils/traceback_utils.py", line 70, in error_handler
        raise e.with_traceback(filtered_tb) from None
    File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/input_spec.py", line 280, in assert_input_compatibility
        raise ValueError(

    ValueError: Exception encountered when calling layer 'sequential_2' (type Sequential).
    
    Input 0 of layer "dense_3" is incompatible with the layer: expected axis -1 of input shape to have value 2, but received input with shape (None, 1)
    
    Call arguments received by layer 'sequential_2' (type Sequential):
      • inputs=tf.Tensor(shape=(None, 1), dtype=int64)
      • training=True
      • mask=None

In [19]:

model_3.summary()

Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense_3 (Dense)             (None, 100)               300       
                                                                 
 dense_4 (Dense)             (None, 10)                1010      
                                                                 
 dense_5 (Dense)             (None, 1)                 11        
                                                                 
=================================================================
Total params: 1321 (5.16 KB)
Trainable params: 1321 (5.16 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________

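Oh wait... model_3 was built and compiled for our binary classification problem: its first layer expects inputs with 2 features (like X), but our regression data only has 1 feature.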

No trouble, we can recreate it for a regression problem.
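The error message and the summary tell the same story: dense_3 has 300 parameters, which is 2 input features × 100 units + 100 biases, so the layer was built for 2-feature inputs, whereas our regression inputs have shape (150, 1) after tf.expand_dims. A NumPy sketch of that arithmetic (dense_params() is a hypothetical helper written for illustration, and np.expand_dims() stands in for tf.expand_dims()):

```python
import numpy as np

def dense_params(in_features, units):
    """Hypothetical helper: one weight per (input, unit) pair plus one bias per unit."""
    return in_features * units + units

# model_3's layers as built on the 2-feature classification data
layer_params = [dense_params(2, 100), dense_params(100, 10), dense_params(10, 1)]
print(layer_params, sum(layer_params))  # [300, 1010, 11] 1321, matching the summary

# Our regression inputs: 150 single numbers, given an extra dimension for the model
X_reg_train = np.arange(0, 750, 5)  # same values as X_regression[:150]
print(X_reg_train.shape, np.expand_dims(X_reg_train, axis=-1).shape)  # (150,) (150, 1)

# A first layer built for 1 input feature would have fewer parameters
print(dense_params(1, 100))  # 200
```

That's why recreating the model (rather than only recompiling it) fixes the problem: a fresh Sequential model gets built for the input shape it first sees.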

In [20]:

# Setup random seed
tf.random.set_seed(42)

# Recreate the model
model_3 = tf.keras.Sequential([
  tf.keras.layers.Dense(100),
  tf.keras.layers.Dense(10),
  tf.keras.layers.Dense(1)
])

# Change the loss and metrics of our compiled model
model_3.compile(loss=tf.keras.losses.mae, # change the loss function to be regression-specific
                optimizer=tf.keras.optimizers.Adam(),
                metrics=['mae']) # change the metric to be regression-specific

# Fit the recompiled model
model_3.fit(tf.expand_dims(X_reg_train, axis=-1),
            y_reg_train,
            epochs=100)

Epoch 1/100
5/5 [==============================] - 1s 5ms/step - loss: 447.6102 - mae: 447.6102
Epoch 2/100
5/5 [==============================] - 0s 4ms/step - loss: 313.3108 - mae: 313.3108
Epoch 3/100
5/5 [==============================] - 0s 4ms/step - loss: 182.3118 - mae: 182.3118
Epoch 4/100
5/5 [==============================] - 0s 4ms/step - loss: 58.6317 - mae: 58.6317
Epoch 5/100
5/5 [==============================] - 0s 4ms/step - loss: 83.0025 - mae: 83.0025
Epoch 6/100
5/5 [==============================] - 0s 4ms/step - loss: 86.4671 - mae: 86.4671
Epoch 7/100
5/5 [==============================] - 0s 3ms/step - loss: 49.6151 - mae: 49.6151
Epoch 8/100
5/5 [==============================] - 0s 4ms/step - loss: 57.7703 - mae: 57.7703
Epoch 9/100
5/5 [==============================] - 0s 3ms/step - loss: 50.1641 - mae: 50.1641
Epoch 10/100
5/5 [==============================] - 0s 4ms/step - loss: 47.7698 - mae: 47.7698
Epoch 11/100
5/5 [==============================] - 0s 4ms/step - loss: 48.6194 - mae: 48.6194
Epoch 12/100
5/5 [==============================] - 0s 4ms/step - loss: 43.2044 - mae: 43.2044
Epoch 13/100
5/5 [==============================] - 0s 4ms/step - loss: 42.6293 - mae: 42.6293
Epoch 14/100
5/5 [==============================] - 0s 4ms/step - loss: 42.4557 - mae: 42.4557
Epoch 15/100
5/5 [==============================] - 0s 4ms/step - loss: 41.9446 - mae: 41.9446
Epoch 16/100
5/5 [==============================] - 0s 4ms/step - loss: 41.7290 - mae: 41.7290
Epoch 17/100
5/5 [==============================] - 0s 4ms/step - loss: 41.5669 - mae: 41.5669
Epoch 18/100
5/5 [==============================] - 0s 4ms/step - loss: 41.2955 - mae: 41.2955
Epoch 19/100
5/5 [==============================] - 0s 4ms/step - loss: 41.1274 - mae: 41.1274
Epoch 20/100
5/5 [==============================] - 0s 4ms/step - loss: 41.1121 - mae: 41.1121
Epoch 21/100
5/5 [==============================] - 0s 4ms/step - loss: 41.1519 - mae: 41.1519
Epoch 22/100
5/5 [==============================] - 0s 4ms/step - loss: 41.0565 - mae: 41.0565
Epoch 23/100
5/5 [==============================] - 0s 4ms/step - loss: 41.1133 - mae: 41.1133
Epoch 24/100
5/5 [==============================] - 0s 60ms/step - loss: 41.0074 - mae: 41.0074
Epoch 25/100
5/5 [==============================] - 0s 4ms/step - loss: 40.9870 - mae: 40.9870
Epoch 26/100
5/5 [==============================] - 0s 4ms/step - loss: 41.0249 - mae: 41.0249
Epoch 27/100
5/5 [==============================] - 0s 3ms/step - loss: 40.8403 - mae: 40.8403
Epoch 28/100
5/5 [==============================] - 0s 4ms/step - loss: 40.9965 - mae: 40.9965
Epoch 29/100
5/5 [==============================] - 0s 4ms/step - loss: 41.0225 - mae: 41.0225
Epoch 30/100
5/5 [==============================] - 0s 4ms/step - loss: 40.8058 - mae: 40.8058
Epoch 31/100
5/5 [==============================] - 0s 4ms/step - loss: 41.3589 - mae: 41.3589
Epoch 32/100
5/5 [==============================] - 0s 4ms/step - loss: 41.0084 - mae: 41.0084
Epoch 33/100
5/5 [==============================] - 0s 4ms/step - loss: 41.0701 - mae: 41.0701
Epoch 34/100
5/5 [==============================] - 0s 4ms/step - loss: 41.2035 - mae: 41.2035
Epoch 35/100
5/5 [==============================] - 0s 3ms/step - loss: 40.5885 - mae: 40.5885
Epoch 36/100
5/5 [==============================] - 0s 4ms/step - loss: 41.0615 - mae: 41.0615
Epoch 37/100
5/5 [==============================] - 0s 4ms/step - loss: 40.6438 - mae: 40.6438
Epoch 38/100
5/5 [==============================] - 0s 4ms/step - loss: 40.3412 - mae: 40.3412
Epoch 39/100
5/5 [==============================] - 0s 4ms/step - loss: 40.6498 - mae: 40.6498
Epoch 40/100
5/5 [==============================] - 0s 4ms/step - loss: 40.4421 - mae: 40.4421
Epoch 41/100
5/5 [==============================] - 0s 4ms/step - loss: 40.3558 - mae: 40.3558
Epoch 42/100
5/5 [==============================] - 0s 4ms/step - loss: 40.3041 - mae: 40.3041
Epoch 43/100
5/5 [==============================] - 0s 4ms/step - loss: 40.5277 - mae: 40.5277
Epoch 44/100
5/5 [==============================] - 0s 4ms/step - loss: 40.1808 - mae: 40.1808
Epoch 45/100
5/5 [==============================] - 0s 4ms/step - loss: 40.6292 - mae: 40.6292
Epoch 46/100
5/5 [==============================] - 0s 3ms/step - loss: 40.4382 - mae: 40.4382
Epoch 47/100
5/5 [==============================] - 0s 4ms/step - loss: 40.1801 - mae: 40.1801
Epoch 48/100
5/5 [==============================] - 0s 4ms/step - loss: 40.2386 - mae: 40.2386
Epoch 49/100
5/5 [==============================] - 0s 3ms/step - loss: 40.7914 - mae: 40.7914
Epoch 50/100
5/5 [==============================] - 0s 3ms/step - loss: 40.1259 - mae: 40.1259
Epoch 51/100
5/5 [==============================] - 0s 4ms/step - loss: 40.4617 - mae: 40.4617
Epoch 52/100
5/5 [==============================] - 0s 4ms/step - loss: 40.8686 - mae: 40.8686
Epoch 53/100
5/5 [==============================] - 0s 4ms/step - loss: 41.0441 - mae: 41.0441
Epoch 54/100
5/5 [==============================] - 0s 4ms/step - loss: 41.1022 - mae: 41.1022
Epoch 55/100
5/5 [==============================] - 0s 4ms/step - loss: 42.1498 - mae: 42.1498
Epoch 56/100
5/5 [==============================] - 0s 4ms/step - loss: 42.3732 - mae: 42.3732
Epoch 57/100
5/5 [==============================] - 0s 4ms/step - loss: 40.9868 - mae: 40.9868
Epoch 58/100
5/5 [==============================] - 0s 4ms/step - loss: 40.4022 - mae: 40.4022
Epoch 59/100
5/5 [==============================] - 0s 4ms/step - loss: 41.1762 - mae: 41.1762
Epoch 60/100
5/5 [==============================] - 0s 4ms/step - loss: 40.0280 - mae: 40.0280
Epoch 61/100
5/5 [==============================] - 0s 4ms/step - loss: 39.4377 - mae: 39.4377
Epoch 62/100
5/5 [==============================] - 0s 4ms/step - loss: 40.2227 - mae: 40.2227
Epoch 63/100
5/5 [==============================] - 0s 4ms/step - loss: 39.7379 - mae: 39.7379
Epoch 64/100
5/5 [==============================] - 0s 4ms/step - loss: 39.4851 - mae: 39.4851
Epoch 65/100
5/5 [==============================] - 0s 4ms/step - loss: 39.8735 - mae: 39.8735
Epoch 66/100
5/5 [==============================] - 0s 4ms/step - loss: 39.5498 - mae: 39.5498
Epoch 67/100
5/5 [==============================] - 0s 3ms/step - loss: 39.5944 - mae: 39.5944
Epoch 68/100
5/5 [==============================] - 0s 3ms/step - loss: 39.5087 - mae: 39.5087
Epoch 69/100
5/5 [==============================] - 0s 3ms/step - loss: 39.3758 - mae: 39.3758
Epoch 70/100
5/5 [==============================] - 0s 4ms/step - loss: 39.8543 - mae: 39.8543
Epoch 71/100
5/5 [==============================] - 0s 4ms/step - loss: 41.2924 - mae: 41.2924
Epoch 72/100
5/5 [==============================] - 0s 4ms/step - loss: 39.0309 - mae: 39.0309
Epoch 73/100
5/5 [==============================] - 0s 3ms/step - loss: 39.7582 - mae: 39.7582
Epoch 74/100
5/5 [==============================] - 0s 4ms/step - loss: 39.1621 - mae: 39.1621
Epoch 75/100
5/5 [==============================] - 0s 4ms/step - loss: 39.9158 - mae: 39.9158
Epoch 76/100
5/5 [==============================] - 0s 4ms/step - loss: 40.2419 - mae: 40.2419
Epoch 77/100
5/5 [==============================] - 0s 4ms/step - loss: 38.9030 - mae: 38.9030
Epoch 78/100
5/5 [==============================] - 0s 3ms/step - loss: 39.5400 - mae: 39.5400
Epoch 79/100
5/5 [==============================] - 0s 3ms/step - loss: 39.2044 - mae: 39.2044
Epoch 80/100
5/5 [==============================] - 0s 3ms/step - loss: 38.7893 - mae: 38.7893
Epoch 81/100
5/5 [==============================] - 0s 3ms/step - loss: 38.8879 - mae: 38.8879
Epoch 82/100
5/5 [==============================] - 0s 4ms/step - loss: 38.9441 - mae: 38.9441
Epoch 83/100
5/5 [==============================] - 0s 4ms/step - loss: 38.6721 - mae: 38.6721
Epoch 84/100
5/5 [==============================] - 0s 4ms/step - loss: 38.7601 - mae: 38.7601
Epoch 85/100
5/5 [==============================] - 0s 4ms/step - loss: 39.0045 - mae: 39.0045
Epoch 86/100
5/5 [==============================] - 0s 4ms/step - loss: 38.9378 - mae: 38.9378
Epoch 87/100
5/5 [==============================] - 0s 3ms/step - loss: 38.3988 - mae: 38.3988
Epoch 88/100
5/5 [==============================] - 0s 3ms/step - loss: 38.5840 - mae: 38.5840
Epoch 89/100
5/5 [==============================] - 0s 3ms/step - loss: 38.4868 - mae: 38.4868
Epoch 90/100
5/5 [==============================] - 0s 4ms/step - loss: 38.3730 - mae: 38.3730
Epoch 91/100
5/5 [==============================] - 0s 3ms/step - loss: 38.2209 - mae: 38.2209
Epoch 92/100
5/5 [==============================] - 0s 4ms/step - loss: 38.3540 - mae: 38.3540
Epoch 93/100
5/5 [==============================] - 0s 4ms/step - loss: 38.6931 - mae: 38.6931
Epoch 94/100
5/5 [==============================] - 0s 4ms/step - loss: 37.9931 - mae: 37.9931
Epoch 95/100
5/5 [==============================] - 0s 4ms/step - loss: 38.0585 - mae: 38.0585
Epoch 96/100
5/5 [==============================] - 0s 4ms/step - loss: 38.4031 - mae: 38.4031
Epoch 97/100
5/5 [==============================] - 0s 4ms/step - loss: 38.0610 - mae: 38.0610
Epoch 98/100
5/5 [==============================] - 0s 4ms/step - loss: 38.3810 - mae: 38.3810
Epoch 99/100
5/5 [==============================] - 0s 4ms/step - loss: 38.4900 - mae: 38.4900
Epoch 100/100
5/5 [==============================] - 0s 4ms/step - loss: 37.9673 - mae: 37.9673

Out[20]:

<keras.src.callbacks.History at 0x7a24c5b179a0>

Okay, it seems like our model is learning something: the MAE value trends down over the epochs. Let's plot its predictions.
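For reference, MAE (the loss and metric we compiled with) is the average absolute difference between the targets and the predictions, so a final value around 38 means the predictions are off by roughly 38 on average on the training data. A minimal NumPy version of the calculation with made-up numbers (a sketch, not tf.keras.losses.mae itself):

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    """Average absolute distance between targets and predictions."""
    return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))

# Made-up targets and predictions, just to show the arithmetic
y_true = [100, 200, 300]
y_pred = [110, 190, 300]
print(mean_absolute_error(y_true, y_pred))  # (10 + 10 + 0) / 3, about 6.667
```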

In [21]:

# Make predictions with our trained model (adding the extra dimension, just like we did for training)
y_reg_preds = model_3.predict(tf.expand_dims(X_reg_test, axis=-1))

# Plot the model's predictions against our regression data
plt.figure(figsize=(10, 7))
plt.scatter(X_reg_train, y_reg_train, c='b', label='Training data')
plt.scatter(X_reg_test, y_reg_test, c='g', label='Testing data')
plt.scatter(X_reg_test, y_reg_preds.squeeze(), c='r', label='Predictions')
plt.legend();

2/2 [==============================] - 0s 4ms/step

Okay, the predictions aren't perfect (if the predictions were perfect, the red would line up with the green), but they look better than complete guessing.

So this means our model must be learning something...

There must be something we're missing out on for our classification problem.

The missing piece: Non-linearity

Okay, so we saw our neural network can model straight lines (with results a little better than guessing).

What about non-straight (non-linear) lines?

If we're going to model our classification data (the red and blue circles), we're going to need some non-linear lines.
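Here's a tiny illustration of what a non-linear activation buys us. Two neurons that each compute a straight line, passed through ReLU (the activation we'll use shortly) and added together, produce |x|, a bent line that no single straight line can match. Without the activation, the same two neurons just add up to another straight line. The weights (+1 and -1, no bias) are hand-picked for illustration:

```python
import numpy as np

def relu(x):
    """ReLU activation: keeps positive values, zeroes out negative ones."""
    return np.maximum(0, x)

x = np.linspace(-3, 3, 7)  # [-3, -2, -1, 0, 1, 2, 3]

# Two "neurons" (weights +1 and -1, no bias) with ReLU, summed by an output unit
v_shape = relu(1 * x) + relu(-1 * x)
print(v_shape)  # [3. 2. 1. 0. 1. 2. 3.], i.e. |x|, a bent line

# The same two neurons WITHOUT an activation cancel out to a flat (straight) line
print(1 * x + (-1 * x))  # all zeros
```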

🔨 Practice: Before we get to the next steps, I'd encourage you to play around with the TensorFlow Playground for 10 minutes (check out what its data has in common with our own classification data). In particular, try the setting labelled "Activation". Once you're done, come back.

Did you try out the activation options? If so, what did you find?

If you didn't, don't worry, let's see it in code.

We're going to replicate the neural network you can see at this link: TensorFlow Playground.

The neural network we're going to recreate with TensorFlow code (a simple neural net created with TensorFlow Playground). See it live at TensorFlow Playground.

The main change we'll add to models we've built before is the use of the activation keyword.
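The activation keyword sets what a Dense layer does to its weighted sum: the layer computes activation(X @ W + b), and with no activation (the default, same as tf.keras.activations.linear) the sum passes through unchanged, so a linear hidden layer is still just a straight-line transformation. A NumPy sketch of that computation for a single unit (dense_forward() and the hand-picked weights are hypothetical, not Keras internals):

```python
import numpy as np

def dense_forward(X, W, b, activation=None):
    """Hypothetical sketch of a Dense layer: activation(X @ W + b); None means linear."""
    z = X @ W + b
    return z if activation is None else activation(z)

def relu(z):
    """ReLU activation: max(0, z), element-wise."""
    return np.maximum(0, z)

X_demo = np.array([[1.0, 2.0], [3.0, 0.5]])  # two made-up 2-feature samples
W = np.array([[0.5], [-1.0]])                # hand-picked weights (2 inputs -> 1 unit)
b = np.array([0.1])

print(dense_forward(X_demo, W, b))        # linear: about [[-1.4], [1.1]]
print(dense_forward(X_demo, W, b, relu))  # relu:   about [[0.0], [1.1]]
```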

In [22]:

# Set the random seed
tf.random.set_seed(42)

# Create the model
model_4 = tf.keras.Sequential([
  tf.keras.layers.Dense(1, activation=tf.keras.activations.linear), # 1 hidden layer with linear activation
  tf.keras.layers.Dense(1) # output layer
])

# Compile the model
model_4.compile(loss=tf.keras.losses.binary_crossentropy,
                optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), # note: "lr" used to be what was used, now "learning_rate" is favoured
                metrics=["accuracy"])

# Fit the model
history = model_4.fit(X, y, epochs=100)

Epoch 1/100
32/32 [==============================] - 1s 3ms/step - loss: 4.2584 - accuracy: 0.5000
Epoch 2/100
32/32 [==============================] - 0s 3ms/step - loss: 4.0332 - accuracy: 0.5000
Epoch 3/100
32/32 [==============================] - 0s 3ms/step - loss: 3.8635 - accuracy: 0.4970
Epoch 4/100
32/32 [==============================] - 0s 3ms/step - loss: 3.6277 - accuracy: 0.4670
Epoch 5/100
32/32 [==============================] - 0s 3ms/step - loss: 3.3772 - accuracy: 0.4490
Epoch 6/100
32/32 [==============================] - 0s 3ms/step - loss: 3.0635 - accuracy: 0.4460
Epoch 7/100
32/32 [==============================] - 0s 3ms/step - loss: 2.7472 - accuracy: 0.4450
Epoch 8/100
32/32 [==============================] - 0s 3ms/step - loss: 2.1925 - accuracy: 0.4470
Epoch 9/100
32/32 [==============================] - 0s 3ms/step - loss: 1.1191 - accuracy: 0.4790
Epoch 10/100
32/32 [==============================] - 0s 3ms/step - loss: 0.9511 - accuracy: 0.4900
Epoch 11/100
32/32 [==============================] - 0s 3ms/step - loss: 0.9207 - accuracy: 0.4880
Epoch 12/100
32/32 [==============================] - 0s 3ms/step - loss: 0.8991 - accuracy: 0.4850
Epoch 13/100
32/32 [==============================] - 0s 3ms/step - loss: 0.8814 - accuracy: 0.4770
Epoch 14/100
32/32 [==============================] - 0s 3ms/step - loss: 0.8662 - accuracy: 0.4720
Epoch 15/100
32/32 [==============================] - 0s 3ms/step - loss: 0.8530 - accuracy: 0.4620
Epoch 16/100
32/32 [==============================] - 0s 3ms/step - loss: 0.8421 - accuracy: 0.4510
Epoch 17/100
32/32 [==============================] - 0s 3ms/step - loss: 0.8320 - accuracy: 0.4470
Epoch 18/100
32/32 [==============================] - 0s 3ms/step - loss: 0.8227 - accuracy: 0.4420
Epoch 19/100
32/32 [==============================] - 0s 3ms/step - loss: 0.8146 - accuracy: 0.4330
Epoch 20/100
32/32 [==============================] - 0s 3ms/step - loss: 0.8070 - accuracy: 0.4290
Epoch 21/100
32/32 [==============================] - 0s 3ms/step - loss: 0.8001 - accuracy: 0.4210
Epoch 22/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7937 - accuracy: 0.4140
Epoch 23/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7877 - accuracy: 0.4090
Epoch 24/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7823 - accuracy: 0.4090
Epoch 25/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7771 - accuracy: 0.4100
Epoch 26/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7722 - accuracy: 0.4140
Epoch 27/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7677 - accuracy: 0.4220
Epoch 28/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7633 - accuracy: 0.4320
Epoch 29/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7594 - accuracy: 0.4350
Epoch 30/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7557 - accuracy: 0.4460
Epoch 31/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7522 - accuracy: 0.4480
Epoch 32/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7489 - accuracy: 0.4470
Epoch 33/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7460 - accuracy: 0.4540
Epoch 34/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7432 - accuracy: 0.4520
Epoch 35/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7406 - accuracy: 0.4590
Epoch 36/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7381 - accuracy: 0.4620
Epoch 37/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7357 - accuracy: 0.4640
Epoch 38/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7335 - accuracy: 0.4640
Epoch 39/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7314 - accuracy: 0.4660
Epoch 40/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7295 - accuracy: 0.4670
Epoch 41/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7276 - accuracy: 0.4720
Epoch 42/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7259 - accuracy: 0.4740
Epoch 43/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7241 - accuracy: 0.4780
Epoch 44/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7227 - accuracy: 0.4770
Epoch 45/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7212 - accuracy: 0.4770
Epoch 46/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7198 - accuracy: 0.4760
Epoch 47/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7186 - accuracy: 0.4760
Epoch 48/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7173 - accuracy: 0.4780
Epoch 49/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7161 - accuracy: 0.4810
Epoch 50/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7150 - accuracy: 0.4780
Epoch 51/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7139 - accuracy: 0.4800
Epoch 52/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7129 - accuracy: 0.4820
Epoch 53/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7119 - accuracy: 0.4850
Epoch 54/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7110 - accuracy: 0.4850
Epoch 55/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7101 - accuracy: 0.4880
Epoch 56/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7092 - accuracy: 0.4870
Epoch 57/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7085 - accuracy: 0.4860
Epoch 58/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7078 - accuracy: 0.4910
Epoch 59/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7072 - accuracy: 0.4870
Epoch 60/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7064 - accuracy: 0.4900
Epoch 61/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7058 - accuracy: 0.4890
Epoch 62/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7053 - accuracy: 0.4900
Epoch 63/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7046 - accuracy: 0.4900
Epoch 64/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7041 - accuracy: 0.4900
Epoch 65/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7036 - accuracy: 0.4910
Epoch 66/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7031 - accuracy: 0.4870
Epoch 67/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7026 - accuracy: 0.4880
Epoch 68/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7022 - accuracy: 0.4880
Epoch 69/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7017 - accuracy: 0.4860
Epoch 70/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7013 - accuracy: 0.4860
Epoch 71/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7009 - accuracy: 0.4870
Epoch 72/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7005 - accuracy: 0.4900
Epoch 73/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7002 - accuracy: 0.4890
Epoch 74/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6999 - accuracy: 0.4890
Epoch 75/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6996 - accuracy: 0.4900
Epoch 76/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6993 - accuracy: 0.4900
Epoch 77/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6990 - accuracy: 0.4890
Epoch 78/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6987 - accuracy: 0.4890
Epoch 79/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6984 - accuracy: 0.4890
Epoch 80/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6981 - accuracy: 0.4870
Epoch 81/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6978 - accuracy: 0.4850
Epoch 82/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6976 - accuracy: 0.4850
Epoch 83/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6974 - accuracy: 0.4860
Epoch 84/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6971 - accuracy: 0.4870
Epoch 85/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6969 - accuracy: 0.4880
Epoch 86/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6968 - accuracy: 0.4880
Epoch 87/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6965 - accuracy: 0.4900
Epoch 88/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6964 - accuracy: 0.4860
Epoch 89/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6963 - accuracy: 0.4840
Epoch 90/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6962 - accuracy: 0.4900
Epoch 91/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6959 - accuracy: 0.4890
Epoch 92/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6958 - accuracy: 0.4880
Epoch 93/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6957 - accuracy: 0.4910
Epoch 94/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6956 - accuracy: 0.4860
Epoch 95/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6954 - accuracy: 0.4880
Epoch 96/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6953 - accuracy: 0.4850
Epoch 97/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6952 - accuracy: 0.4880
Epoch 98/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6951 - accuracy: 0.4880
Epoch 99/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6950 - accuracy: 0.4850
Epoch 100/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6948 - accuracy: 0.4900

Okay, our model's final accuracy (0.4900) is a little worse than guessing.
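For context, with two classes of equal size, always predicting the same class scores 50% accuracy, so 0.4900 means the model hasn't beaten a constant guess. (The 0.5000 accuracy at the first epoch hints that our classes are balanced, but that's an inference, so check it on your own y.) A quick NumPy baseline with made-up labels; majority_class_baseline() is a hypothetical helper:

```python
import numpy as np

def majority_class_baseline(y):
    """Hypothetical helper: accuracy of always predicting the most common class (labels 0/1)."""
    y = np.asarray(y)
    share_of_ones = y.mean()
    return max(share_of_ones, 1 - share_of_ones)

# Made-up balanced labels: 500 zeros and 500 ones
y_demo = np.array([0] * 500 + [1] * 500)
print(majority_class_baseline(y_demo))  # 0.5
```

Calling majority_class_baseline(y) on the real labels gives the bar any classifier needs to clear.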

Let's remind ourselves what our data looks like.

In [23]:

# Check out our data
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.RdYlBu);

And let's see how our model is making predictions on it.

In [24]:

# Check the decision boundary (blue is blue class, yellow is the crossover, red is red class)
plot_decision_boundary(model_4, X, y)

313/313 [==============================] - 0s 1ms/step
doing binary classifcation...

Well, it looks like we're getting a straight (linear) line prediction again.

But our data is non-linear (not a straight line)...

What we're going to have to do is add some non-linearity to our model.

To do so, we'll use the activation parameter in one of our layers.
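One thing to keep in mind: with only one hidden unit, the model below still sees each input only through a single weighted sum, X @ w1 + b1. ReLU bends the output along that one direction, but every point on a line perpendicular to w1 gets the same prediction, so the model can at best draw a straight decision boundary. A NumPy sketch with random weights (tiny_model() is a hypothetical stand-in for a 2 → 1 (ReLU) → 1 network, not model_5 itself):

```python
import numpy as np

def relu(z):
    """ReLU activation: max(0, z), element-wise."""
    return np.maximum(0, z)

rng = np.random.default_rng(0)
w1, b1 = rng.normal(size=2), rng.normal()  # hidden layer: 2 inputs -> 1 ReLU unit
w2, b2 = rng.normal(), rng.normal()        # output layer: 1 -> 1, no activation

def tiny_model(X):
    """X only enters through the single sum X @ w1 + b1."""
    return w2 * relu(X @ w1 + b1) + b2

# Two different points that share the same projection onto w1
p = np.array([1.0, 2.0])
along_boundary = np.array([-w1[1], w1[0]])  # perpendicular to w1
q = p + 3.0 * along_boundary

print(np.isclose(p @ w1, q @ w1))                                   # True
print(np.allclose(tiny_model(p[None, :]), tiny_model(q[None, :])))  # True: same prediction
```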

In [25]:

# Set random seed
tf.random.set_seed(42)

# Create a model with a non-linear activation
model_5 = tf.keras.Sequential([
  tf.keras.layers.Dense(1, activation=tf.keras.activations.relu), # can also do activation='relu'
  tf.keras.layers.Dense(1) # output layer
])

# Compile the model
model_5.compile(loss=tf.keras.losses.binary_crossentropy,
                optimizer=tf.keras.optimizers.Adam(),
                metrics=["accuracy"])

# Fit the model
history = model_5.fit(X, y, epochs=100)

Epoch 1/100
32/32 [==============================] - 1s 3ms/step - loss: 7.7125 - accuracy: 0.5000
Epoch 2/100
32/32 [==============================] - 0s 3ms/step - loss: 7.7125 - accuracy: 0.5000
[... epochs 3 to 99 omitted: each one reports exactly the same loss: 7.7125 - accuracy: 0.5000 ...]
Epoch 100/100
32/32 [==============================] - 0s 3ms/step - loss: 7.7125 - accuracy: 0.5000

Hmm... still not learning...

What if we increased the number of neurons and layers?

Say, 2 hidden layers with 4 neurons each, both using the ReLU activation (ReLU, pronounced "rel-u", is short for rectified linear unit)?
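As a sanity check on the size of this network, here's a rough parameter count. It assumes 2 input features, since our X has 2 columns. Each Dense layer has one weight per input per neuron, plus one bias per neuron. The helper name is hypothetical. Once model_6 is built, model_6.summary() should report the same total.

```python
def dense_param_count(n_inputs, n_units):
    # Each unit has one weight per input plus one bias
    return n_inputs * n_units + n_units

# 2 input features -> Dense(4) -> Dense(4) -> Dense(1)
layer_sizes = [2, 4, 4, 1]
per_layer = [dense_param_count(n_in, n_out)
             for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
total_params = sum(per_layer)
print(per_layer, total_params)  # [12, 20, 5] 37
```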

To see this network in action, check out the TensorFlow Playground demo.

The neural network we're going to recreate with TensorFlow code. See it live at TensorFlow Playground.

Let's try.

Note: In the course, Daniel used lr instead of learning_rate. This updated notebook uses learning_rate instead of lr.

In [26]:

# Set random seed
tf.random.set_seed(42)

# Create a model
model_6 = tf.keras.Sequential([
  tf.keras.layers.Dense(4, activation=tf.keras.activations.relu), # hidden layer 1, 4 neurons, ReLU activation
  tf.keras.layers.Dense(4, activation=tf.keras.activations.relu), # hidden layer 2, 4 neurons, ReLU activation
  tf.keras.layers.Dense(1) # output layer
])

# Compile the model
model_6.compile(loss=tf.keras.losses.binary_crossentropy,
                optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), # Adam's default learning rate is 0.001
                metrics=['accuracy'])

# Fit the model
history = model_6.fit(X, y, epochs=100)

Epoch 1/100
32/32 [==============================] - 2s 3ms/step - loss: 4.3069 - accuracy: 0.5000
Epoch 2/100
32/32 [==============================] - 0s 3ms/step - loss: 4.0916 - accuracy: 0.5000
Epoch 3/100
32/32 [==============================] - 0s 3ms/step - loss: 3.9820 - accuracy: 0.4520
Epoch 4/100
32/32 [==============================] - 0s 3ms/step - loss: 3.8302 - accuracy: 0.4150
Epoch 5/100
32/32 [==============================] - 0s 3ms/step - loss: 3.7048 - accuracy: 0.4500
Epoch 6/100
32/32 [==============================] - 0s 3ms/step - loss: 3.5944 - accuracy: 0.4650
Epoch 7/100
32/32 [==============================] - 0s 3ms/step - loss: 3.1921 - accuracy: 0.4650
Epoch 8/100
32/32 [==============================] - 0s 3ms/step - loss: 2.6196 - accuracy: 0.4680
Epoch 9/100
32/32 [==============================] - 0s 3ms/step - loss: 1.3332 - accuracy: 0.4740
Epoch 10/100
32/32 [==============================] - 0s 3ms/step - loss: 0.9951 - accuracy: 0.4750
Epoch 11/100
32/32 [==============================] - 0s 3ms/step - loss: 0.9489 - accuracy: 0.4750
Epoch 12/100
32/32 [==============================] - 0s 3ms/step - loss: 0.9082 - accuracy: 0.4730
Epoch 13/100
32/32 [==============================] - 0s 3ms/step - loss: 0.8345 - accuracy: 0.4720
Epoch 14/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7435 - accuracy: 0.4690
Epoch 15/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7288 - accuracy: 0.4550
Epoch 16/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7245 - accuracy: 0.4620
Epoch 17/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7213 - accuracy: 0.4630
Epoch 18/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7186 - accuracy: 0.4670
Epoch 19/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7166 - accuracy: 0.4600
Epoch 20/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7143 - accuracy: 0.4560
Epoch 21/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7127 - accuracy: 0.4600
Epoch 22/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7107 - accuracy: 0.4580
Epoch 23/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7090 - accuracy: 0.4590
Epoch 24/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7075 - accuracy: 0.4640
Epoch 25/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7061 - accuracy: 0.4690
Epoch 26/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7046 - accuracy: 0.4640
Epoch 27/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7033 - accuracy: 0.4720
Epoch 28/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7012 - accuracy: 0.4650
Epoch 29/100
32/32 [==============================] - 0s 3ms/step - loss: 0.7002 - accuracy: 0.4660
Epoch 30/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6987 - accuracy: 0.4650
Epoch 31/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6973 - accuracy: 0.4660
Epoch 32/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6961 - accuracy: 0.4710
Epoch 33/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6944 - accuracy: 0.4760
Epoch 34/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6935 - accuracy: 0.4690
Epoch 35/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6922 - accuracy: 0.4850
Epoch 36/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6907 - accuracy: 0.4740
Epoch 37/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6895 - accuracy: 0.4940
Epoch 38/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6887 - accuracy: 0.4770
Epoch 39/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6875 - accuracy: 0.4680
Epoch 40/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6867 - accuracy: 0.4800
Epoch 41/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6862 - accuracy: 0.4850
Epoch 42/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6847 - accuracy: 0.4640
Epoch 43/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6837 - accuracy: 0.4730
Epoch 44/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6841 - accuracy: 0.4590
Epoch 45/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6820 - accuracy: 0.4520
Epoch 46/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6812 - accuracy: 0.4500
Epoch 47/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6813 - accuracy: 0.4730
Epoch 48/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6798 - accuracy: 0.4570
Epoch 49/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6791 - accuracy: 0.5190
Epoch 50/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6784 - accuracy: 0.5360
Epoch 51/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6779 - accuracy: 0.5270
Epoch 52/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6773 - accuracy: 0.5030
Epoch 53/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6766 - accuracy: 0.5350
Epoch 54/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6757 - accuracy: 0.5400
Epoch 55/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6756 - accuracy: 0.5410
Epoch 56/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6749 - accuracy: 0.5430
Epoch 57/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6744 - accuracy: 0.5420
Epoch 58/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6741 - accuracy: 0.5430
Epoch 59/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6737 - accuracy: 0.5420
Epoch 60/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6729 - accuracy: 0.5400
Epoch 61/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6727 - accuracy: 0.5410
Epoch 62/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6722 - accuracy: 0.5420
Epoch 63/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6715 - accuracy: 0.5390
Epoch 64/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6712 - accuracy: 0.5410
Epoch 65/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6709 - accuracy: 0.5400
Epoch 66/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6704 - accuracy: 0.5390
Epoch 67/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6701 - accuracy: 0.5400
Epoch 68/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6698 - accuracy: 0.5400
Epoch 69/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6691 - accuracy: 0.5420
Epoch 70/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6691 - accuracy: 0.5420
Epoch 71/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6684 - accuracy: 0.5410
Epoch 72/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6682 - accuracy: 0.5440
Epoch 73/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6680 - accuracy: 0.5480
Epoch 74/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6673 - accuracy: 0.5400
Epoch 75/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6675 - accuracy: 0.5500
Epoch 76/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6671 - accuracy: 0.5420
Epoch 77/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6666 - accuracy: 0.5530
Epoch 78/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6661 - accuracy: 0.5440
Epoch 79/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6657 - accuracy: 0.5490
Epoch 80/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6648 - accuracy: 0.5570
Epoch 81/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6647 - accuracy: 0.5580
Epoch 82/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6643 - accuracy: 0.5510
Epoch 83/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6637 - accuracy: 0.5460
Epoch 84/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6628 - accuracy: 0.5590
Epoch 85/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6625 - accuracy: 0.5540
Epoch 86/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6624 - accuracy: 0.5600
Epoch 87/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6615 - accuracy: 0.5560
Epoch 88/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6606 - accuracy: 0.5590
Epoch 89/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6605 - accuracy: 0.5650
Epoch 90/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6604 - accuracy: 0.5480
Epoch 91/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6588 - accuracy: 0.5580
Epoch 92/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6587 - accuracy: 0.5580
Epoch 93/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6586 - accuracy: 0.5490
Epoch 94/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6574 - accuracy: 0.5610
Epoch 95/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6567 - accuracy: 0.5560
Epoch 96/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6559 - accuracy: 0.5560
Epoch 97/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6551 - accuracy: 0.5590
Epoch 98/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6535 - accuracy: 0.5580
Epoch 99/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6538 - accuracy: 0.5760
Epoch 100/100
32/32 [==============================] - 0s 3ms/step - loss: 0.6523 - accuracy: 0.5640

In [27]:

# Evaluate the model
model_6.evaluate(X, y)

32/32 [==============================] - 0s 2ms/step - loss: 0.6508 - accuracy: 0.5640

Out[27]:

[0.6507635116577148, 0.5640000104904175]

We're still only hitting about 56% accuracy; our model is practically no better than guessing.
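For context on "as good as guessing": with two balanced classes, a model that always predicts the same class already scores 50% accuracy. model_5's constant 0.5000 accuracy suggests our classes are balanced, but that is an inference. Here's a quick baseline check, using a hypothetical helper name:

```python
import numpy as np

def majority_class_baseline(y):
    """Accuracy of always predicting the most common label."""
    _, counts = np.unique(y, return_counts=True)
    return counts.max() / len(y)

# Toy labels with two balanced classes (500 of each)
y_balanced = np.array([0] * 500 + [1] * 500)
print(majority_class_baseline(y_balanced))  # 0.5
```

Any model worth keeping should clearly beat this baseline.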

How do the predictions look?

In [28]:

# Check out the predictions using 2 hidden layers
plot_decision_boundary(model_6, X, y)

313/313 [==============================] - 0s 1ms/step
doing binary classifcation...

What gives?

It seems like our model is the same as the one in the TensorFlow Playground, but it's still drawing straight lines...

Ideally, the yellow region (the decision boundary) would form a ring between the red circle and the blue circle.

Okay, okay, let's model this circle once and for all.

One more model (I promise... actually, I'm going to have to break that promise... we'll be building plenty more models).

This time we'll change the activation function on our output layer too. Remember the architecture of a classification model? For binary classification, the output layer activation is usually the Sigmoid activation function.
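Why does the output activation matter? Keras' binary_crossentropy loss, with its default from_logits=False, expects the model's outputs to already be probabilities between 0 and 1. A sigmoid output layer provides that. Alternatively, you can keep a linear output layer and use tf.keras.losses.BinaryCrossentropy(from_logits=True).

The NumPy sketch below (helper names are hypothetical) shows that both formulations give the same loss. The first function corresponds to a sigmoid output with the default loss; the second corresponds to raw outputs with from_logits=True.

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def bce_from_probs(y_true, p, eps=1e-7):
    # Binary cross-entropy on probabilities (clipped to avoid log(0))
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def bce_from_logits(y_true, z):
    # Numerically stable binary cross-entropy computed directly on raw outputs (logits)
    return np.mean(np.maximum(z, 0) - z * y_true + np.log1p(np.exp(-np.abs(z))))

y_true = np.array([0.0, 1.0, 1.0, 0.0])
logits = np.array([-2.0, 3.0, 0.5, 1.0])  # raw, unbounded outputs of a linear layer
probs = sigmoid(logits)                    # what a sigmoid output layer would return

print(probs)
print(bce_from_probs(y_true, probs), bce_from_logits(y_true, logits))
```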

In [29]:

# Set random seed
tf.random.set_seed(42)

# Create a model
model_7 = tf.keras.Sequential([
  tf.keras.layers.Dense(4, activation=tf.keras.activations.relu), # hidden layer 1, ReLU activation
  tf.keras.layers.Dense(4, activation=tf.keras.activations.relu), # hidden layer 2, ReLU activation
  tf.keras.layers.Dense(1, activation=tf.keras.activations.sigmoid) # output layer, sigmoid activation
])

# Compile the model
model_7.compile(loss=tf.keras.losses.binary_crossentropy,
                optimizer=tf.keras.optimizers.Adam(),
                metrics=['accuracy'])

# Fit the model
history = model_7.fit(X, y, epochs=100, verbose=0)

In [30]:

# Evaluate our model
model_7.evaluate(X, y)

32/32 [==============================] - 0s 2ms/step - loss: 0.2082 - accuracy: 0.9950

Out[30]:

[0.20816797018051147, 0.9950000047683716]

Woah! It looks like our model is getting some incredible results, let's check them out.

In [31]:

# View the predictions of the model with relu and sigmoid activations
plot_decision_boundary(model_7, X, y)

313/313 [==============================] - 0s 1ms/step
doing binary classifcation...

Nice! It looks like our model is almost perfectly (apart from a few examples) separating the two circles.

🤔 Question: What's wrong with the predictions we've made? Are we really evaluating our model correctly here? Hint: what data did the model learn on and what did we predict on?

Before we answer that, it's important to recognize what we've just covered.

🔑 Note: The combination of linear (straight lines) and non-linear (non-straight lines) functions is one of the key fundamentals of neural networks.

Think of it like this:

If I gave you an unlimited number of straight lines and non-straight lines, what kind of patterns could you draw?

That's essentially what neural networks do to find patterns in data.
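Here's a small illustration of that idea: with ReLU, a few straight lines can be combined into bent shapes. This NumPy sketch (not from the original notebook) builds |x| and a triangular "bump" out of ReLUs applied to straight lines. Stacking many such pieces is, loosely speaking, how a ReLU network can carve out a curved boundary like our circles.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

x = np.linspace(-3, 3, 13)

# Two straight lines (y = x and y = -x), each passed through ReLU, then added: gives |x|
v_shape = relu(x) + relu(-x)

# A triangular bump: 0 for |x| >= 1, rising to 1 at x = 0 (equal to max(0, 1 - |x|))
bump = relu(x + 1) - 2 * relu(x) + relu(x - 1)

print(v_shape)
print(bump)
```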

Now you might be thinking, "but I haven't seen a linear function or a non-linear function before..."

Oh but you have.

We've been using them the whole time.

They're what power the layers in the models we just built.

To get some intuition about the activation functions we've just used, let's create them and then try them on some toy data.

In [32]:

# Create a toy tensor (similar to the data we pass into our model)
A = tf.cast(tf.range(-10, 10), tf.float32)
A

Out[32]:

<tf.Tensor: shape=(20,), dtype=float32, numpy=
array([-10.,  -9.,  -8.,  -7.,  -6.,  -5.,  -4.,  -3.,  -2.,  -1.,   0.,
         1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.],
      dtype=float32)>

How does this look?

In [33]:

# Visualize our toy tensor
plt.plot(A);

A straight (linear) line!

Nice, now let's recreate the sigmoid function and see what it does to our data. You can also find a pre-built sigmoid function at tf.keras.activations.sigmoid.

In [34]:

# Sigmoid - https://www.tensorflow.org/api_docs/python/tf/keras/activations/sigmoid
def sigmoid(x):
  return 1 / (1 + tf.exp(-x))

# Use the sigmoid function on our tensor
sigmoid(A)

Out[34]:

<tf.Tensor: shape=(20,), dtype=float32, numpy=
array([4.5397868e-05, 1.2339458e-04, 3.3535014e-04, 9.1105117e-04,
       2.4726230e-03, 6.6928510e-03, 1.7986210e-02, 4.7425874e-02,
       1.1920292e-01, 2.6894143e-01, 5.0000000e-01, 7.3105854e-01,
       8.8079703e-01, 9.5257413e-01, 9.8201376e-01, 9.9330717e-01,
       9.9752742e-01, 9.9908900e-01, 9.9966466e-01, 9.9987662e-01],
      dtype=float32)>

And how does it look?

In [35]:

# Plot sigmoid modified tensor
plt.plot(sigmoid(A));

A non-straight (non-linear) line!

Okay, how about the ReLU function (ReLU turns all negatives to 0 and positive numbers stay the same)?

In [36]:

# ReLU - https://www.tensorflow.org/api_docs/python/tf/keras/activations/relu
def relu(x):
  return tf.maximum(0, x)

# Pass toy tensor through ReLU function
relu(A)

Out[36]:

<tf.Tensor: shape=(20,), dtype=float32, numpy=
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 2., 3., 4., 5., 6.,
       7., 8., 9.], dtype=float32)>

How does the ReLU-modified tensor look?

In [37]:

# Plot ReLU-modified tensor
plt.plot(relu(A));

Another non-straight line!

Well, how about TensorFlow's linear activation function?

In [38]:

# Linear - https://www.tensorflow.org/api_docs/python/tf/keras/activations/linear (returns the input unmodified)
tf.keras.activations.linear(A)

Out[38]:

<tf.Tensor: shape=(20,), dtype=float32, numpy=
array([-10.,  -9.,  -8.,  -7.,  -6.,  -5.,  -4.,  -3.,  -2.,  -1.,   0.,
         1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.],
      dtype=float32)>

Hmm, it looks like our inputs are unmodified...

In [39]:

# Does the linear activation change anything?
A == tf.keras.activations.linear(A)

Out[39]:

<tf.Tensor: shape=(20,), dtype=bool, numpy=
array([ True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True])>

Okay, so now it makes sense why a model using only linear activation functions doesn't really learn anything here: the linear activation function doesn't change our input data in any way.

Whereas with our non-linear functions, our data gets transformed. A neural network uses these kinds of transformations at a large scale to find patterns between its inputs and outputs.
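There's a second reason linear-only models stay stuck on straight lines. Stacking layers that each compute a linear function still produces a linear function overall, so extra linear layers add no new shapes. The NumPy sketch below (random weights, purely illustrative) checks that two layers with linear activations collapse into a single equivalent layer.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 2))  # 6 examples, 2 features

# Two stacked "layers" with linear (identity) activation
W1, b1 = rng.normal(size=(2, 4)), rng.normal(size=4)
W2, b2 = rng.normal(size=(4, 1)), rng.normal(size=1)
two_layers = (x @ W1 + b1) @ W2 + b2

# The same mapping written as ONE linear layer
W_combined = W1 @ W2
b_combined = b1 @ W2 + b2
one_layer = x @ W_combined + b_combined

print(np.allclose(two_layers, one_layer))  # True
```

Putting a ReLU between the two layers breaks this algebra, which is exactly what lets the network bend its decision boundary.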

Now, rather than dive into the guts of neural networks, we're going to keep coding and applying what we've learned to different problems. If you want a more in-depth look at what's going on behind the scenes, check out the Extra Curriculum section below.

📖 Resource: For more on activation functions, check out the machine learning cheatsheet page on them.

Evaluating and improving our classification model¶

If you answered the question above, you might've picked up what we've been doing wrong.

We've been evaluating our model on the same data it was trained on.

A better approach would be to split our data into training, validation (optional) and test sets.

Once we've done that, we'll train our model on the training set (let it find patterns in the data) and then see how well it learned the patterns by using it to predict values on the test set.
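Splitting by slicing (first 800 rows for training, last 200 for testing) assumes the rows are already in random order. If your data might be sorted, for example by class, shuffling before splitting is safer. Here's a NumPy sketch with a hypothetical helper name. scikit-learn's train_test_split does the same job and shuffles by default.

```python
import numpy as np

def shuffled_train_test_split(X, y, test_fraction=0.2, seed=42):
    """Hypothetical helper: shuffle row indices, then slice into train/test sets."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(X))  # random order of row indices
    n_test = int(len(X) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Toy data: 1000 rows, 2 features, labels sorted by class (worst case for plain slicing)
X_demo = np.random.default_rng(0).normal(size=(1000, 2))
y_demo = np.array([0] * 500 + [1] * 500)

X_tr, X_te, y_tr, y_te = shuffled_train_test_split(X_demo, y_demo)
print(X_tr.shape, X_te.shape)  # (800, 2) (200, 2)
print(y_te.mean())             # a mix of both classes, unlike y_demo[800:] (all 1s)
```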

Let's do it.

In [40]:

# How many examples are in the whole dataset?
len(X)

Out[40]:

1000

In [41]:

# Split data into train and test sets
X_train, y_train = X[:800], y[:800] # 80% of the data for the training set
X_test, y_test = X[800:], y[800:] # 20% of the data for the test set

# Check the shapes of the data
X_train.shape, X_test.shape # 800 examples in the training set, 200 examples in the test set

Out[41]:

((800, 2), (200, 2))

Great, now we've got training and test sets, let's model the training data and evaluate what our model has learned on the test set.
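When we evaluate on the test set, the "accuracy" Keras reports for a sigmoid output is roughly the share of predicted probabilities that land on the right side of a 0.5 threshold. Here's a minimal NumPy sketch using made-up probabilities; the helper name and values are hypothetical. With a trained model, you would threshold the outputs of model_8.predict(X_test) and compare them against y_test.

```python
import numpy as np

def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of thresholded predictions that match the labels."""
    # Predict class 1 when the probability is above the threshold
    y_pred = (y_prob > threshold).astype(y_true.dtype)
    return float(np.mean(y_pred == y_true))

# Hypothetical sigmoid outputs for 5 test examples
y_true = np.array([0, 1, 1, 0, 1])
y_prob = np.array([0.10, 0.80, 0.40, 0.30, 0.95])
print(binary_accuracy(y_true, y_prob))  # 0.8
```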

In [42]:

# Set random seed
tf.random.set_seed(42)

# Create the model (same as model_7)
model_8 = tf.keras.Sequential([
  tf.keras.layers.Dense(4, activation="relu"), # hidden layer 1, using "relu" for activation (same as tf.keras.activations.relu)
  tf.keras.layers.Dense(4, activation="relu"),
  tf.keras.layers.Dense(1, activation="sigmoid") # output layer, using 'sigmoid' for the output
])

# Compile the model
model_8.compile(loss=tf.keras.losses.binary_crossentropy,
                optimizer=tf.keras.optimizers.Adam(learning_rate=0.01), # increase learning rate from 0.001 to 0.01 for faster learning
                metrics=['accuracy'])

# Fit the model
history = model_8.fit(X_train, y_train, epochs=25)

Epoch 1/25
25/25 [==============================] - 1s 3ms/step - loss: 0.6902 - accuracy: 0.5075
Epoch 2/25
25/25 [==============================] - 0s 3ms/step - loss: 0.6878 - accuracy: 0.5325
Epoch 3/25
25/25 [==============================] - 0s 3ms/step - loss: 0.6862 - accuracy: 0.5200
Epoch 4/25
25/25 [==============================] - 0s 3ms/step - loss: 0.6830 - accuracy: 0.5462
Epoch 5/25
25/25 [==============================] - 0s 3ms/step - loss: 0.6770 - accuracy: 0.5700
Epoch 6/25
25/25 [==============================] - 0s 3ms/step - loss: 0.6676 - accuracy: 0.5450
Epoch 7/25
25/25 [==============================] - 0s 3ms/step - loss: 0.6511 - accuracy: 0.6650
Epoch 8/25
25/25 [==============================] - 0s 3ms/step - loss: 0.6341 - accuracy: 0.7075
Epoch 9/25
25/25 [==============================] - 0s 3ms/step - loss: 0.6085 - accuracy: 0.7588
Epoch 10/25
25/25 [==============================] - 0s 3ms/step - loss: 0.5821 - accuracy: 0.7538
Epoch 11/25
25/25 [==============================] - 0s 3ms/step - loss: 0.5565 - accuracy: 0.7763
Epoch 12/25
25/25 [==============================] - 0s 3ms/step - loss: 0.5284 - accuracy: 0.7962
Epoch 13/25
25/25 [==============================] - 0s 3ms/step - loss: 0.5035 - accuracy: 0.8037
Epoch 14/25
25/25 [==============================] - 0s 3ms/step - loss: 0.4585 - accuracy: 0.8450
Epoch 15/25
25/25 [==============================] - 0s 3ms/step - loss: 0.4156 - accuracy: 0.8838
Epoch 16/25
25/25 [==============================] - 0s 3ms/step - loss: 0.3831 - accuracy: 0.8975
Epoch 17/25
25/25 [==============================] - 0s 3ms/step - loss: 0.3542 - accuracy: 0.9125
Epoch 18/25
25/25 [==============================] - 0s 3ms/step - loss: 0.3288 - accuracy: 0.9312
Epoch 19/25
25/25 [==============================] - 0s 3ms/step - loss: 0.2976 - accuracy: 0.9588
Epoch 20/25
25/25 [==============================] - 0s 3ms/step - loss: 0.2850 - accuracy: 0.9450
Epoch 21/25
25/25 [==============================] - 0s 3ms/step - loss: 0.2571 - accuracy: 0.9563
Epoch 22/25
25/25 [==============================] - 0s 3ms/step - loss: 0.2353 - accuracy: 0.9613
Epoch 23/25
25/25 [==============================] - 0s 3ms/step - loss: 0.2209 - accuracy: 0.9725
Epoch 24/25
25/25 [==============================] - 0s 3ms/step - loss: 0.2108 - accuracy: 0.9588
Epoch 25/25
25/25 [==============================] - 0s 3ms/step - loss: 0.2010 - accuracy: 0.9700

In [43]:

# Evaluate our model on the test set
loss, accuracy = model_8.evaluate(X_test, y_test)
print(f"Model loss on the test set: {loss}")
print(f"Model accuracy on the test set: {100*accuracy:.2f}%")

7/7 [==============================] - 0s 3ms/step - loss: 0.2052 - accuracy: 0.9700
Model loss on the test set: 0.20516908168792725
Model accuracy on the test set: 97.00%

97% accuracy on the test set? Nice!

Now, when we started to create model_8 we said it was going to be the same as model_7, but that was a little white lie.

That's because we changed a few things:

The activation parameter - We used strings ("relu" & "sigmoid") instead of library paths (tf.keras.activations.relu); in TensorFlow, both offer the same functionality.
The learning_rate parameter - We increased the Adam optimizer's learning rate from 0.001 to 0.01 (a 10x increase). Older code sometimes uses the deprecated alias lr for this parameter.
- You can think of the learning rate as how big a step the model takes each time it updates its weights. A higher learning rate lets a model learn faster; however, there's such a thing as a learning rate that's too high, where the model's updates overshoot and it doesn't learn anything. We'll see a trick to find a good learning rate soon.
The number of epochs - We lowered the number of epochs (using the epochs parameter) from 100 to 25, but our model still got an incredible result on both the training and test sets.
- One of the reasons our model performed well in even fewer epochs than before is that we increased the learning rate. Remember, a single epoch is the model looking at the whole training set once, so 25 epochs means the model gets 25 chances to learn the patterns in the data.
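The "too high a learning rate" point can be made concrete with a toy sketch. This is plain gradient descent on the one-variable function f(w) = w², not Adam and not a neural network, so treat it only as an illustration of step size. Each update moves w against the gradient (2w), scaled by the learning rate.

```python
def gradient_descent(lr, steps=25, w0=5.0):
    """Minimise f(w) = w**2 (gradient 2*w) with plain gradient descent."""
    w = w0
    for _ in range(steps):
        grad = 2 * w          # derivative of w**2
        w = w - lr * grad     # step against the gradient, scaled by the learning rate
    return w

for lr in (0.01, 0.1, 1.1):
    print(f"lr={lr}: w after 25 steps = {gradient_descent(lr):.4f}")
```

The minimum is at w = 0. With lr=0.01, w only gets from 5 to about 3 in 25 steps (too slow). With lr=0.1, it ends up very close to 0. With lr=1.1, every step overshoots further than the last and w blows up (too high).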

We know our model is performing well based on the evaluation metrics but let's see how it performs visually.

In [44]:

# Plot the decision boundaries for the training and test sets
plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
plt.title("Train")
plot_decision_boundary(model_8, X=X_train, y=y_train)
plt.subplot(1, 2, 2)
plt.title("Test")
plot_decision_boundary(model_8, X=X_test, y=y_test)
plt.show()

313/313 [==============================] - 0s 1ms/step
doing binary classifcation...
313/313 [==============================] - 0s 1ms/step
doing binary classifcation...

Check that out! How cool. With a few tweaks, our model is now predicting the blue and red circles almost perfectly.

Plot the loss curves¶

Looking at the plots above, we can see the outputs of our model are very good.

But how did our model go whilst it was learning?

As in, how did the performance change every time the model had a chance to look at the data (once every epoch)?

To figure this out, we can check the loss curves (also referred to as the learning curves).

You might've seen we've been using the variable history when calling the fit() function on a model (fit() returns a History object).

This is where we'll get the information for how our model is performing as it learns.

Let's see how we might use it.

In [45]:

# You can access the information in the history variable using the .history attribute
pd.DataFrame(history.history)

Out[45]:

	loss	accuracy
0	0.690183	0.50750
1	0.687798	0.53250
2	0.686171	0.52000
3	0.683011	0.54625
4	0.677036	0.57000
5	0.667617	0.54500
6	0.651129	0.66500
7	0.634132	0.70750
8	0.608484	0.75875
9	0.582073	0.75375
10	0.556544	0.77625
11	0.528435	0.79625
12	0.503492	0.80375
13	0.458543	0.84500
14	0.415571	0.88375
15	0.383102	0.89750
16	0.354211	0.91250
17	0.328809	0.93125
18	0.297566	0.95875
19	0.285039	0.94500
20	0.257121	0.95625
21	0.235265	0.96125
22	0.220926	0.97250
23	0.210754	0.95875
24	0.201047	0.97000

Inspecting the outputs, we can see the loss values going down and the accuracy going up.
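`history.history` is an ordinary Python dictionary: each metric name maps to a list with one value per epoch. So you can also query it directly instead of eyeballing the DataFrame. The sketch below uses a hand-copied stand-in for it, holding the last five rows of the table above, so it runs without TensorFlow.

```python
# Stand-in for history.history: the last five rows (epochs 20-24) of the table above
history_dict = {
    "loss": [0.257121, 0.235265, 0.220926, 0.210754, 0.201047],
    "accuracy": [0.95625, 0.96125, 0.97250, 0.95875, 0.97000],
}
first_epoch = 20  # index of the first row we copied

# Find the epoch with the highest accuracy and the epoch with the lowest loss
best_acc = max(history_dict["accuracy"])
best_epoch = first_epoch + history_dict["accuracy"].index(best_acc)
lowest_loss = min(history_dict["loss"])
lowest_loss_epoch = first_epoch + history_dict["loss"].index(lowest_loss)

print(f"Highest accuracy {best_acc} at epoch {best_epoch}")
print(f"Lowest loss {lowest_loss} at epoch {lowest_loss_epoch}")
```

Notice that the highest accuracy (epoch 22) and the lowest loss (epoch 24) don't land on the same epoch. The two metrics usually trend together, but not in lockstep.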

How's it look (visualize, visualize, visualize)?

In [46]:

# Plot the loss curves
pd.DataFrame(history.history).plot()
plt.title("Model_8 training curves")

Out[46]:

Text(0.5, 1.0, 'Model_8 training curves')

Beautiful. This is the ideal plot we'd be looking for when dealing with a classification problem: loss going down, accuracy going up.

🔑 Note: For many problems, the loss function going down means the model is improving (the predictions it's making are getting closer to the ground truth labels).
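To see why a falling loss means better predictions, here's a pure-Python sketch of binary cross-entropy, the loss we compiled our models with: the average of -(y·log(p) + (1-y)·log(1-p)) over the examples. The small `eps` clipping keeps log() away from exactly 0. Keras applies a similar safeguard internally, but treat the exact value here as an assumption. One useful consequence: a model that predicts 0.5 for everything scores ln(2) ≈ 0.693. That's roughly where the training logs above start, i.e. coin-flip guessing.

```python
import math

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip so log() never sees exactly 0 or 1
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

labels = [1, 0, 1, 0]
print(binary_crossentropy(labels, [0.5, 0.5, 0.5, 0.5]))      # ~0.6931, i.e. ln(2): pure guessing
print(binary_crossentropy(labels, [0.7, 0.3, 0.7, 0.3]))      # lower: predictions closer to the labels
print(binary_crossentropy(labels, [0.95, 0.05, 0.95, 0.05]))  # lower still
```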

Finding the best learning rate¶

Aside from the architecture itself (the layers, number of neurons, activations, etc.), the most important hyperparameter you can tune for your neural network models is the learning rate.

In model_8 you saw we raised the Adam optimizer's learning rate from its default of 0.001 to 0.01.

And you might be wondering why we did this.

Put it this way, it was a lucky guess.

I just decided to try a higher learning rate and see how the model went.

Now you might be thinking, "Seriously? You can do that?"

And the answer is yes. You can change any of the hyperparameters of your neural networks.

With practice, you'll start to see which hyperparameters work and which don't.

That's an important thing to understand about machine learning and deep learning in general. It's very experimental. You build a model and evaluate it, build a model and evaluate it.

That being said, I want to introduce you to a trick which will help you find the optimal learning rate (at least to begin training with) for your models going forward.

To do so, we're going to use the following:

A learning rate callback.
- You can think of a callback as an extra piece of functionality you can add to your model while it's training.
Another model (we could reuse one of the models above, but we're practicing building models here).
A modified loss curves plot.

We'll go through each with code, then explain what's going on.

🔑 Note: The default hyperparameters of many neural network building blocks in TensorFlow are set up in a way that usually works right out of the box (e.g. the Adam optimizer's default settings can usually get good results on many datasets). So it's a good idea to try the defaults first, then adjust as needed.
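Here's what the callback actually does. Keras's `LearningRateScheduler` calls the schedule function you pass it at the start of every epoch, then sets the optimizer's learning rate to the value it returns. The pure-Python sketch below imitates only that part. `fake_fit` is a hypothetical stand-in for `model.fit()` with no TensorFlow involved, and `schedule` is the same formula as the lambda in the next cell.

```python
def schedule(epoch):
    """Same formula as the lambda passed to LearningRateScheduler in the next cell."""
    return 1e-4 * 10 ** (epoch / 20)

def fake_fit(epochs, schedule):
    """Hypothetical stand-in for model.fit(): shows *when* the schedule is consulted."""
    lrs_used = []
    for epoch in range(epochs):
        lr = schedule(epoch)   # like the callback at the start of each epoch: ask for this epoch's lr
        lrs_used.append(lr)    # ...a real fit() would now train one epoch with this lr...
    return lrs_used

lrs_used = fake_fit(100, schedule)
print(f"{lrs_used[0]:.4g} {lrs_used[20]:.4g} {lrs_used[40]:.4g} {lrs_used[99]:.4g}")
# 0.0001 0.001 0.01 8.913
```

Because the epoch sits in the exponent, each epoch multiplies the learning rate by 10**(1/20) ≈ 1.122, i.e. 10x every 20 epochs. This matches the `lr` column in the training log (1.0000e-04, 1.1220e-04, ..., 8.9125 at epoch 100).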

In [47]:

# Set random seed
tf.random.set_seed(42)

# Create a model (same as model_8)
model_9 = tf.keras.Sequential([
  tf.keras.layers.Dense(4, activation="relu"),
  tf.keras.layers.Dense(4, activation="relu"),
  tf.keras.layers.Dense(1, activation="sigmoid")
])

# Compile the model
model_9.compile(loss="binary_crossentropy", # we can use strings here too
              optimizer="Adam", # same as tf.keras.optimizers.Adam() with default settings
              metrics=["accuracy"])

# Create a learning rate scheduler callback
lr_scheduler = tf.keras.callbacks.LearningRateScheduler(lambda epoch: 1e-4 * 10**(epoch/20)) # start at 1e-4 and multiply the learning rate by 10**(1/20) (~1.12) every epoch, i.e. 10x every 20 epochs

# Fit the model (passing the lr_scheduler callback)
history = model_9.fit(X_train,
                      y_train,
                      epochs=100,
                      callbacks=[lr_scheduler])

Epoch 1/100
25/25 [==============================] - 1s 3ms/step - loss: 0.6918 - accuracy: 0.5088 - lr: 1.0000e-04
Epoch 2/100
25/25 [==============================] - 0s 3ms/step - loss: 0.6918 - accuracy: 0.5038 - lr: 1.1220e-04
Epoch 3/100
25/25 [==============================] - 0s 3ms/step - loss: 0.6917 - accuracy: 0.5038 - lr: 1.2589e-04
Epoch 4/100
25/25 [==============================] - 0s 3ms/step - loss: 0.6917 - accuracy: 0.5025 - lr: 1.4125e-04
Epoch 5/100
25/25 [==============================] - 0s 3ms/step - loss: 0.6917 - accuracy: 0.5063 - lr: 1.5849e-04
Epoch 6/100
25/25 [==============================] - 0s 3ms/step - loss: 0.6916 - accuracy: 0.5050 - lr: 1.7783e-04
Epoch 7/100
25/25 [==============================] - 0s 3ms/step - loss: 0.6915 - accuracy: 0.5088 - lr: 1.9953e-04
Epoch 8/100
25/25 [==============================] - 0s 3ms/step - loss: 0.6915 - accuracy: 0.5075 - lr: 2.2387e-04
Epoch 9/100
25/25 [==============================] - 0s 3ms/step - loss: 0.6914 - accuracy: 0.5088 - lr: 2.5119e-04
Epoch 10/100
25/25 [==============================] - 0s 3ms/step - loss: 0.6913 - accuracy: 0.5088 - lr: 2.8184e-04
Epoch 11/100
25/25 [==============================] - 0s 3ms/step - loss: 0.6912 - accuracy: 0.5075 - lr: 3.1623e-04
Epoch 12/100
25/25 [==============================] - 0s 3ms/step - loss: 0.6912 - accuracy: 0.5100 - lr: 3.5481e-04
Epoch 13/100
25/25 [==============================] - 0s 3ms/step - loss: 0.6910 - accuracy: 0.5113 - lr: 3.9811e-04
Epoch 14/100
25/25 [==============================] - 0s 3ms/step - loss: 0.6909 - accuracy: 0.5138 - lr: 4.4668e-04
Epoch 15/100
25/25 [==============================] - 0s 3ms/step - loss: 0.6907 - accuracy: 0.5150 - lr: 5.0119e-04
Epoch 16/100
25/25 [==============================] - 0s 3ms/step - loss: 0.6905 - accuracy: 0.5188 - lr: 5.6234e-04
Epoch 17/100
25/25 [==============================] - 0s 3ms/step - loss: 0.6903 - accuracy: 0.5225 - lr: 6.3096e-04
Epoch 18/100
25/25 [==============================] - 0s 3ms/step - loss: 0.6901 - accuracy: 0.5238 - lr: 7.0795e-04
Epoch 19/100
25/25 [==============================] - 0s 3ms/step - loss: 0.6899 - accuracy: 0.5250 - lr: 7.9433e-04
Epoch 20/100
25/25 [==============================] - 0s 3ms/step - loss: 0.6895 - accuracy: 0.5288 - lr: 8.9125e-04
Epoch 21/100
25/25 [==============================] - 0s 3ms/step - loss: 0.6891 - accuracy: 0.5300 - lr: 0.0010
Epoch 22/100
25/25 [==============================] - 0s 3ms/step - loss: 0.6885 - accuracy: 0.5375 - lr: 0.0011
Epoch 23/100
25/25 [==============================] - 0s 3ms/step - loss: 0.6880 - accuracy: 0.5350 - lr: 0.0013
Epoch 24/100
25/25 [==============================] - 0s 3ms/step - loss: 0.6868 - accuracy: 0.5412 - lr: 0.0014
Epoch 25/100
25/25 [==============================] - 0s 3ms/step - loss: 0.6856 - accuracy: 0.5350 - lr: 0.0016
Epoch 26/100
25/25 [==============================] - 0s 3ms/step - loss: 0.6844 - accuracy: 0.5387 - lr: 0.0018
Epoch 27/100
25/25 [==============================] - 0s 3ms/step - loss: 0.6830 - accuracy: 0.5425 - lr: 0.0020
Epoch 28/100
25/25 [==============================] - 0s 3ms/step - loss: 0.6811 - accuracy: 0.5437 - lr: 0.0022
Epoch 29/100
25/25 [==============================] - 0s 3ms/step - loss: 0.6794 - accuracy: 0.5475 - lr: 0.0025
Epoch 30/100
25/25 [==============================] - 0s 3ms/step - loss: 0.6773 - accuracy: 0.5587 - lr: 0.0028
Epoch 31/100
25/25 [==============================] - 0s 3ms/step - loss: 0.6743 - accuracy: 0.5600 - lr: 0.0032
Epoch 32/100
25/25 [==============================] - 0s 3ms/step - loss: 0.6712 - accuracy: 0.5788 - lr: 0.0035
Epoch 33/100
25/25 [==============================] - 0s 3ms/step - loss: 0.6668 - accuracy: 0.5938 - lr: 0.0040
Epoch 34/100
25/25 [==============================] - 0s 3ms/step - loss: 0.6611 - accuracy: 0.6300 - lr: 0.0045
Epoch 35/100
25/25 [==============================] - 0s 3ms/step - loss: 0.6541 - accuracy: 0.6513 - lr: 0.0050
Epoch 36/100
25/25 [==============================] - 0s 3ms/step - loss: 0.6456 - accuracy: 0.6400 - lr: 0.0056
Epoch 37/100
25/25 [==============================] - 0s 3ms/step - loss: 0.6316 - accuracy: 0.6862 - lr: 0.0063
Epoch 38/100
25/25 [==============================] - 0s 3ms/step - loss: 0.6141 - accuracy: 0.6850 - lr: 0.0071
Epoch 39/100
25/25 [==============================] - 0s 3ms/step - loss: 0.5986 - accuracy: 0.7287 - lr: 0.0079
Epoch 40/100
25/25 [==============================] - 0s 3ms/step - loss: 0.5662 - accuracy: 0.7588 - lr: 0.0089
Epoch 41/100
25/25 [==============================] - 0s 3ms/step - loss: 0.5288 - accuracy: 0.7663 - lr: 0.0100
Epoch 42/100
25/25 [==============================] - 0s 3ms/step - loss: 0.5150 - accuracy: 0.7588 - lr: 0.0112
Epoch 43/100
25/25 [==============================] - 0s 3ms/step - loss: 0.5029 - accuracy: 0.7837 - lr: 0.0126
Epoch 44/100
25/25 [==============================] - 0s 3ms/step - loss: 0.4472 - accuracy: 0.8213 - lr: 0.0141
Epoch 45/100
25/25 [==============================] - 0s 3ms/step - loss: 0.3744 - accuracy: 0.8662 - lr: 0.0158
Epoch 46/100
25/25 [==============================] - 0s 3ms/step - loss: 0.2755 - accuracy: 0.9312 - lr: 0.0178
Epoch 47/100
25/25 [==============================] - 0s 3ms/step - loss: 0.1886 - accuracy: 0.9600 - lr: 0.0200
Epoch 48/100
25/25 [==============================] - 0s 3ms/step - loss: 0.1305 - accuracy: 0.9812 - lr: 0.0224
Epoch 49/100
25/25 [==============================] - 0s 3ms/step - loss: 0.1161 - accuracy: 0.9762 - lr: 0.0251
Epoch 50/100
25/25 [==============================] - 0s 3ms/step - loss: 0.0870 - accuracy: 0.9900 - lr: 0.0282
Epoch 51/100
25/25 [==============================] - 0s 3ms/step - loss: 0.1532 - accuracy: 0.9325 - lr: 0.0316
Epoch 52/100
25/25 [==============================] - 0s 3ms/step - loss: 0.0809 - accuracy: 0.9775 - lr: 0.0355
Epoch 53/100
25/25 [==============================] - 0s 3ms/step - loss: 0.1048 - accuracy: 0.9600 - lr: 0.0398
Epoch 54/100
25/25 [==============================] - 0s 3ms/step - loss: 0.0923 - accuracy: 0.9600 - lr: 0.0447
Epoch 55/100
25/25 [==============================] - 0s 3ms/step - loss: 0.1136 - accuracy: 0.9538 - lr: 0.0501
Epoch 56/100
25/25 [==============================] - 0s 3ms/step - loss: 0.2233 - accuracy: 0.9200 - lr: 0.0562
Epoch 57/100
25/25 [==============================] - 0s 3ms/step - loss: 0.1119 - accuracy: 0.9563 - lr: 0.0631
Epoch 58/100
25/25 [==============================] - 0s 3ms/step - loss: 0.2204 - accuracy: 0.9125 - lr: 0.0708
Epoch 59/100
25/25 [==============================] - 0s 3ms/step - loss: 0.0781 - accuracy: 0.9725 - lr: 0.0794
Epoch 60/100
25/25 [==============================] - 0s 3ms/step - loss: 0.0548 - accuracy: 0.9787 - lr: 0.0891
Epoch 61/100
25/25 [==============================] - 0s 3ms/step - loss: 0.0916 - accuracy: 0.9688 - lr: 0.1000
Epoch 62/100
25/25 [==============================] - 0s 3ms/step - loss: 0.4185 - accuracy: 0.8838 - lr: 0.1122
Epoch 63/100
25/25 [==============================] - 0s 3ms/step - loss: 0.1851 - accuracy: 0.9350 - lr: 0.1259
Epoch 64/100
25/25 [==============================] - 0s 3ms/step - loss: 0.0893 - accuracy: 0.9688 - lr: 0.1413
Epoch 65/100
25/25 [==============================] - 0s 3ms/step - loss: 0.0730 - accuracy: 0.9775 - lr: 0.1585
Epoch 66/100
25/25 [==============================] - 0s 3ms/step - loss: 0.3190 - accuracy: 0.9050 - lr: 0.1778
Epoch 67/100
25/25 [==============================] - 0s 3ms/step - loss: 0.6285 - accuracy: 0.6950 - lr: 0.1995
Epoch 68/100
25/25 [==============================] - 0s 3ms/step - loss: 0.4084 - accuracy: 0.7812 - lr: 0.2239
Epoch 69/100
25/25 [==============================] - 0s 3ms/step - loss: 0.1803 - accuracy: 0.9350 - lr: 0.2512
Epoch 70/100
25/25 [==============================] - 0s 3ms/step - loss: 0.1364 - accuracy: 0.9500 - lr: 0.2818
Epoch 71/100
25/25 [==============================] - 0s 3ms/step - loss: 0.3171 - accuracy: 0.9013 - lr: 0.3162
Epoch 72/100
25/25 [==============================] - 0s 3ms/step - loss: 0.2477 - accuracy: 0.9150 - lr: 0.3548
Epoch 73/100
25/25 [==============================] - 0s 3ms/step - loss: 0.1354 - accuracy: 0.9525 - lr: 0.3981
Epoch 74/100
25/25 [==============================] - 0s 3ms/step - loss: 0.3741 - accuracy: 0.8712 - lr: 0.4467
Epoch 75/100
25/25 [==============================] - 0s 3ms/step - loss: 0.3701 - accuracy: 0.8550 - lr: 0.5012
Epoch 76/100
25/25 [==============================] - 0s 3ms/step - loss: 0.2392 - accuracy: 0.9125 - lr: 0.5623
Epoch 77/100
25/25 [==============================] - 0s 3ms/step - loss: 0.1684 - accuracy: 0.9312 - lr: 0.6310
Epoch 78/100
25/25 [==============================] - 0s 3ms/step - loss: 0.1817 - accuracy: 0.9312 - lr: 0.7079
Epoch 79/100
25/25 [==============================] - 0s 3ms/step - loss: 0.4512 - accuracy: 0.8250 - lr: 0.7943
Epoch 80/100
25/25 [==============================] - 0s 3ms/step - loss: 0.6677 - accuracy: 0.5938 - lr: 0.8913
Epoch 81/100
25/25 [==============================] - 0s 3ms/step - loss: 0.6764 - accuracy: 0.5400 - lr: 1.0000
Epoch 82/100
25/25 [==============================] - 0s 3ms/step - loss: 0.6678 - accuracy: 0.5775 - lr: 1.1220
Epoch 83/100
25/25 [==============================] - 0s 3ms/step - loss: 0.7416 - accuracy: 0.4900 - lr: 1.2589
Epoch 84/100
25/25 [==============================] - 0s 3ms/step - loss: 0.7138 - accuracy: 0.5063 - lr: 1.4125
Epoch 85/100
25/25 [==============================] - 0s 3ms/step - loss: 0.7237 - accuracy: 0.5038 - lr: 1.5849
Epoch 86/100
25/25 [==============================] - 0s 3ms/step - loss: 0.7751 - accuracy: 0.5063 - lr: 1.7783
Epoch 87/100
25/25 [==============================] - 0s 3ms/step - loss: 0.7665 - accuracy: 0.5063 - lr: 1.9953
Epoch 88/100
25/25 [==============================] - 0s 3ms/step - loss: 0.7566 - accuracy: 0.5163 - lr: 2.2387
Epoch 89/100
25/25 [==============================] - 0s 3ms/step - loss: 0.7552 - accuracy: 0.4938 - lr: 2.5119
Epoch 90/100
25/25 [==============================] - 0s 3ms/step - loss: 0.7816 - accuracy: 0.5238 - lr: 2.8184
Epoch 91/100
25/25 [==============================] - 0s 3ms/step - loss: 0.8113 - accuracy: 0.5213 - lr: 3.1623
Epoch 92/100
25/25 [==============================] - 0s 3ms/step - loss: 0.7351 - accuracy: 0.4888 - lr: 3.5481
Epoch 93/100
25/25 [==============================] - 0s 3ms/step - loss: 0.7429 - accuracy: 0.5063 - lr: 3.9811
Epoch 94/100
25/25 [==============================] - 0s 3ms/step - loss: 0.7601 - accuracy: 0.5063 - lr: 4.4668
Epoch 95/100
25/25 [==============================] - 0s 3ms/step - loss: 0.8247 - accuracy: 0.4863 - lr: 5.0119
Epoch 96/100
25/25 [==============================] - 0s 3ms/step - loss: 0.7877 - accuracy: 0.4737 - lr: 5.6234
Epoch 97/100
25/25 [==============================] - 0s 3ms/step - loss: 0.8081 - accuracy: 0.5013 - lr: 6.3096
Epoch 98/100
25/25 [==============================] - 0s 3ms/step - loss: 0.9653 - accuracy: 0.4963 - lr: 7.0795
Epoch 99/100
25/25 [==============================] - 0s 3ms/step - loss: 0.9762 - accuracy: 0.4913 - lr: 7.9433
Epoch 100/100
25/25 [==============================] - 0s 3ms/step - loss: 0.8582 - accuracy: 0.4613 - lr: 8.9125

Now that our model has finished training, let's have a look at the training history.

In [48]:

# Check out the history
pd.DataFrame(history.history).plot(figsize=(10,7), xlabel="epochs");

As you can see, the learning rate increases exponentially as the number of epochs increases.

You can also see that the model's accuracy shoots up (and its loss drops) once the learning rate reaches a certain range, then becomes unstable again as the learning rate keeps growing.

To figure out where this sweet spot is, we can plot the loss versus the log-scale learning rate.
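Why a log scale? The schedule multiplies the learning rate by the same factor every epoch, so log10 of the learning rate grows by exactly 1/20 per epoch. That means the 100 points sit evenly spaced on a log axis. On a linear axis, the 80 epochs with learning rates below 1 would be crammed into the leftmost ~11% of a plot that runs up to ~8.9. A quick pure-Python check of that spacing, recomputing the same schedule:

```python
import math

# Same schedule as the callback: 1e-4 * 10**(epoch/20) for epochs 0-99
lrs = [1e-4 * 10 ** (epoch / 20) for epoch in range(100)]
log_lrs = [math.log10(lr) for lr in lrs]

# Gap between consecutive points on a log10 axis
steps = [b - a for a, b in zip(log_lrs, log_lrs[1:])]
print(f"{min(steps):.4f} {max(steps):.4f}")  # 0.0500 0.0500: evenly spaced
print(f"{lrs[-1] / lrs[0]:.0f}")             # 89125: almost 5 orders of magnitude covered
```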

In [49]:

# Plot the learning rate versus the loss
lrs = 1e-4 * (10 ** (np.arange(100)/20))
plt.figure(figsize=(10, 7))
plt.semilogx(lrs, history.history["loss"]) # we want the x-axis (learning rate) to be log scale
plt.xlabel("Learning Rate")
plt.ylabel("Loss")
plt.title("Learning rate vs. loss");

To figure out the ideal value of the learning rate (at least the ideal value to begin training our model), the rule of thumb is to take the learning rate value where the loss is still decreasing but not quite flattened out (usually about 10x smaller than the bottom of the curve).

In this case, our ideal learning rate ends up between 0.01 ($10^{-2}$) and 0.02.

finding the ideal learning rate by plotting learning rate vs. loss

The ideal learning rate at the start of model training is somewhere just before the loss curve bottoms out (a value where the loss is still decreasing).
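The rule of thumb can be roughly automated: find the learning rate at the lowest loss, then divide by 10. The pure-Python sketch below applies it to the losses logged for epochs 41-61 of the run above. That window contains the lowest loss of the whole 100-epoch run (0.0548 at epoch 60). `suggest_lr` is a hypothetical helper, and some people prefer other heuristics (e.g. the point where the loss falls most steeply).

```python
# Losses copied from the training log above for epochs 41-61 (0-indexed epochs 40-60)
losses = [0.5288, 0.5150, 0.5029, 0.4472, 0.3744, 0.2755, 0.1886, 0.1305,
          0.1161, 0.0870, 0.1532, 0.0809, 0.1048, 0.0923, 0.1136, 0.2233,
          0.1119, 0.2204, 0.0781, 0.0548, 0.0916]
# Learning rates the scheduler used for those same epochs
lrs = [1e-4 * 10 ** (epoch / 20) for epoch in range(40, 61)]

def suggest_lr(lrs, losses, factor=10):
    """Rule of thumb: take the learning rate at the lowest loss and divide it by `factor`."""
    best = min(range(len(losses)), key=losses.__getitem__)  # index of the lowest loss
    return lrs[best] / factor

print(f"{suggest_lr(lrs, losses):.4f}")  # 0.0089
```

The result (about 0.009) lands in the same ballpark as the 0.01-0.02 range read off the plot. The loss is noisy at these learning rates, so treat any such number as a starting point to experiment from, not an exact answer.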

In [50]:

# Example of other typical learning rate values
10**0, 10**-1, 10**-2, 10**-3, 1e-4

Out[50]:

(1, 0.1, 0.01, 0.001, 0.0001)

Now that we've estimated the ideal learning rate (we'll use 0.02) for our model, let's refit it.

In [51]:

# Set the random seed
tf.random.set_seed(42)

# Create the model
model_10 = tf.keras.Sequential([
  tf.keras.layers.Dense(4, activation="relu"),
  tf.keras.layers.Dense(4, activation="relu"),
  tf.keras.layers.Dense(1, activation="sigmoid")
])

# Compile the model with the ideal learning rate
model_10.compile(loss="binary_crossentropy",
                optimizer=tf.keras.optimizers.Adam(learning_rate=0.02), # to adjust the learning rate, you need to use tf.keras.optimizers.Adam (not "adam")
                metrics=["accuracy"])

# Fit the model for 20 epochs (5 fewer than before)
history = model_10.fit(X_train, y_train, epochs=20)

Epoch 1/20
25/25 [==============================] - 2s 3ms/step - loss: 0.6844 - accuracy: 0.5650
Epoch 2/20
25/25 [==============================] - 0s 3ms/step - loss: 0.6692 - accuracy: 0.6612
Epoch 3/20
25/25 [==============================] - 0s 3ms/step - loss: 0.6408 - accuracy: 0.7250
Epoch 4/20
25/25 [==============================] - 0s 3ms/step - loss: 0.5839 - accuracy: 0.7812
Epoch 5/20
25/25 [==============================] - 0s 3ms/step - loss: 0.5135 - accuracy: 0.8250
Epoch 6/20
25/25 [==============================] - 0s 3ms/step - loss: 0.4106 - accuracy: 0.9187
Epoch 7/20
25/25 [==============================] - 0s 3ms/step - loss: 0.3194 - accuracy: 0.9513
Epoch 8/20
25/25 [==============================] - 0s 3ms/step - loss: 0.2379 - accuracy: 0.9762
Epoch 9/20
25/25 [==============================] - 0s 3ms/step - loss: 0.1835 - accuracy: 0.9850
Epoch 10/20
25/25 [==============================] - 0s 3ms/step - loss: 0.1439 - accuracy: 0.9925
Epoch 11/20
25/25 [==============================] - 0s 3ms/step - loss: 0.1122 - accuracy: 0.9950
Epoch 12/20
25/25 [==============================] - 0s 3ms/step - loss: 0.0928 - accuracy: 0.9937
Epoch 13/20
25/25 [==============================] - 0s 3ms/step - loss: 0.0849 - accuracy: 0.9937
Epoch 14/20
25/25 [==============================] - 0s 3ms/step - loss: 0.0818 - accuracy: 0.9875
Epoch 15/20
25/25 [==============================] - 0s 3ms/step - loss: 0.0714 - accuracy: 0.9925
Epoch 16/20
25/25 [==============================] - 0s 3ms/step - loss: 0.0624 - accuracy: 0.9950
Epoch 17/20
25/25 [==============================] - 0s 3ms/step - loss: 0.0535 - accuracy: 0.9912
Epoch 18/20
25/25 [==============================] - 0s 3ms/step - loss: 0.0501 - accuracy: 0.9975
Epoch 19/20
25/25 [==============================] - 0s 3ms/step - loss: 0.0588 - accuracy: 0.9875
Epoch 20/20
25/25 [==============================] - 0s 3ms/step - loss: 0.0470 - accuracy: 0.9887

Nice! With a slightly higher learning rate (0.02 instead of 0.01) we reach a higher accuracy than model_8 in fewer epochs (20 instead of 25).

🛠 Practice: Now you've seen an example of what can happen when you change the learning rate, try changing the learning rate value in the TensorFlow Playground and see what happens. What happens if you increase it? What happens if you decrease it?

In [52]:

# Evaluate model on the test dataset
model_10.evaluate(X_test, y_test)

7/7 [==============================] - 0s 3ms/step - loss: 0.0425 - accuracy: 1.0000

Out[52]:

[0.042508091777563095, 1.0]

Let's see how the predictions look.

In [53]:

# Plot the decision boundaries for the training and test sets
plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
plt.title("Train")
plot_decision_boundary(model_10, X=X_train, y=y_train)
plt.subplot(1, 2, 2)
plt.title("Test")
plot_decision_boundary(model_10, X=X_test, y=y_test)
plt.show()

313/313 [==============================] - 0s 1ms/step
doing binary classifcation...
313/313 [==============================] - 0s 1ms/step
doing binary classifcation...

And as we can see, almost perfect again.

These are the kinds of experiments you'll be running often when building your own models.

Start with default settings and see how they perform on your data.

And if they don't perform as well as you'd like, improve them.

Let's look at a few more ways to evaluate our classification models.

More classification evaluation methods¶

Alongside the visualizations we've been making, there are a number of different evaluation metrics we can use to evaluate our classification models.

Metric name/Evaluation method	Definition	Code
Accuracy	Out of 100 predictions, how many does your model get correct? E.g. 95% accuracy means it gets 95/100 predictions correct.	`sklearn.metrics.accuracy_score()` or `tf.keras.metrics.Accuracy()`
Precision	Proportion of true positives over the total number of positive predictions (true positives + false positives). Higher precision leads to fewer false positives (model predicts 1 when it should've been 0).	`sklearn.metrics.precision_score()` or `tf.keras.metrics.Precision()`
Recall	Proportion of true positives over the total number of true positives and false negatives (model predicts 0 when it should've been 1). Higher recall leads to fewer false negatives.	`sklearn.metrics.recall_score()` or `tf.keras.metrics.Recall()`
F1-score	Combines precision and recall into one metric (their harmonic mean). 1 is best, 0 is worst.	`sklearn.metrics.f1_score()`
Confusion matrix	Compares the predicted values with the true values in a tabular way; if 100% correct, all values in the matrix will be on the diagonal from top left to bottom right.	Custom function, `sklearn.metrics.confusion_matrix()` or `sklearn.metrics.ConfusionMatrixDisplay`
Classification report	Collection of some of the main classification metrics such as precision, recall and f1-score.	`sklearn.metrics.classification_report()`

🔑 Note: Every classification problem will require different kinds of evaluation methods. But you should be familiar with at least the ones above.

Let's start with accuracy.

Because we passed ["accuracy"] to the metrics parameter when we compiled our model, calling evaluate() on it will return the loss as well as accuracy.

In [54]:

# Check the accuracy of our model
loss, accuracy = model_10.evaluate(X_test, y_test)
print(f"Model loss on test set: {loss}")
print(f"Model accuracy on test set: {(accuracy*100):.2f}%")

7/7 [==============================] - 0s 3ms/step - loss: 0.0425 - accuracy: 1.0000
Model loss on test set: 0.042508091777563095
Model accuracy on test set: 100.00%

How about a confusion matrix?

Anatomy of a confusion matrix (what we're going to be creating). Correct predictions appear down the diagonal (from top left to bottom right).

We can make a confusion matrix using Scikit-Learn's confusion_matrix function.

In [58]:

# Create a confusion matrix
from sklearn.metrics import confusion_matrix

# Make predictions
y_preds = model_10.predict(X_test)

# Create confusion matrix
confusion_matrix(y_test, y_preds)

7/7 [==============================] - 0s 2ms/step

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-58-f9843efd97f5> in <cell line: 8>()
      6 
      7 # Create confusion matrix
----> 8 confusion_matrix(y_test, y_preds)

/usr/local/lib/python3.10/dist-packages/sklearn/metrics/_classification.py in confusion_matrix(y_true, y_pred, labels, sample_weight, normalize)
    315     (0, 2, 1, 1)
    316     """
--> 317     y_type, y_true, y_pred = _check_targets(y_true, y_pred)
    318     if y_type not in ("binary", "multiclass"):
    319         raise ValueError("%s is not supported" % y_type)

/usr/local/lib/python3.10/dist-packages/sklearn/metrics/_classification.py in _check_targets(y_true, y_pred)
     93 
     94     if len(y_type) > 1:
---> 95         raise ValueError(
     96             "Classification metrics can't handle a mix of {0} and {1} targets".format(
     97                 type_true, type_pred

ValueError: Classification metrics can't handle a mix of binary and continuous targets

Ahh, it seems our predictions aren't in the format they need to be.

Let's check them out.

In [59]:

# View the first 10 predictions
y_preds[:10]

Out[59]:

array([[0.9740965 ],
       [0.9740965 ],
       [0.9740965 ],
       [0.9740965 ],
       [0.47072026],
       [0.00771922],
       [0.9740965 ],
       [0.00127994],
       [0.9740965 ],
       [0.00113649]], dtype=float32)

What about our test labels?

In [60]:

# View the first 10 test labels
y_test[:10]

Out[60]:

array([1, 1, 1, 1, 0, 0, 1, 0, 1, 0])

It looks like we need to get our predictions into the binary format (0 or 1).

But you might be wondering, what format are they currently in?

In their current format (e.g. 0.9740965), they're in a form called prediction probabilities.

You'll see this often with the outputs of neural networks. Rather than exact labels, they output a probability of how likely each sample is to belong to one class or the other.

So one of the steps you'll often see after making predictions with a neural network is converting the prediction probabilities into labels.

In our case, since our ground truth labels (y_test) are binary (0 or 1), we can convert the prediction probabilities to their binary form using tf.round().
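Rounding is effectively a decision threshold of 0.5. The sketch below, using NumPy and a few probabilities copied from the output above (plus a made-up value of exactly 0.5), shows the same conversion done with an explicit threshold. It also flattens the `(n, 1)` shape to `(n,)` so the result lines up with one-dimensional labels like `y_test`.

```python
import numpy as np

# Probabilities in the same (n_samples, 1) shape model.predict() returned above;
# the final 0.5 entry is a made-up edge case added for illustration
y_probs = np.array([[0.9740965], [0.47072026], [0.00771922], [0.5]], dtype=np.float32)

# Explicit threshold: probability >= 0.5 -> class 1, otherwise class 0.
# squeeze(axis=1) turns shape (n, 1) into (n,)
y_labels = (y_probs >= 0.5).astype(int).squeeze(axis=1)
print(y_labels)  # [1 0 0 1]

# Rounding gives the same labels except exactly at 0.5, because
# np.round (like tf.round) rounds halves to the nearest even number
print(np.round(y_probs).squeeze(axis=1))  # [1. 0. 0. 0.]
```

An explicit threshold also makes it easy to trade precision for recall later (e.g. using 0.3 or 0.7 instead of 0.5), which rounding can't do.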

In [61]:

# Convert prediction probabilities to binary format and view the first 10
tf.round(y_preds)[:10]

Out[61]:

<tf.Tensor: shape=(10, 1), dtype=float32, numpy=
array([[1.],
       [1.],
       [1.],
       [1.],
       [0.],
       [0.],
       [1.],
       [0.],
       [1.],
       [0.]], dtype=float32)>

Wonderful! Now we can use the confusion_matrix function.

In [62]:

# Create a confusion matrix
confusion_matrix(y_test, tf.round(y_preds))

Out[62]:

array([[101,   0],
       [  0,  99]])

Alright, we can see the highest numbers are down the diagonal (from top left to bottom right), so this is a good sign, but on its own the raw array doesn't tell us much.
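It can help to see how those counts are produced. Below is a minimal NumPy sketch of a confusion matrix built from scratch; `confusion_matrix_np` is a hypothetical helper written for illustration (not part of any library), and the labels are made up so that there are a couple of mistakes to look at. The row normalization at the end is the same idea as the `cm_norm` line in the plotting code that follows.

```python
import numpy as np

def confusion_matrix_np(y_true, y_pred, n_classes):
    """Hypothetical helper: rows are true labels, columns are predicted labels
    (the same layout Scikit-Learn's confusion_matrix uses)."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    # Add 1 at (true, pred) for every sample; np.add.at handles repeated index pairs
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

# Made-up labels and predictions (one false positive, one false negative)
y_true = np.array([1, 1, 1, 1, 0, 0, 1, 0, 1, 0])
y_pred = np.array([1, 1, 1, 1, 0, 0, 1, 0, 0, 1])

cm = confusion_matrix_np(y_true, y_pred, n_classes=2)
print(cm)
# [[3 1]
#  [1 5]]

# Row-normalize: each row shows what fraction of a true class went to each predicted label
cm_norm = cm / cm.sum(axis=1, keepdims=True)
print(cm_norm.round(3))
```

The top right cell (true 0, predicted 1) holds the false positive and the bottom left cell (true 1, predicted 0) holds the false negative.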

How about we write some code to make our confusion matrix a little more visual?

In [63]:

# Note: The following confusion matrix code is a remix of Scikit-Learn's
# plot_confusion_matrix function - https://scikit-learn.org/stable/modules/generated/sklearn.metrics.plot_confusion_matrix.html
# and Made with ML's introductory notebook - https://github.com/GokuMohandas/MadeWithML/blob/main/notebooks/08_Neural_Networks.ipynb
import itertools

figsize = (10, 10)

# Create the confusion matrix
cm = confusion_matrix(y_test, tf.round(y_preds))
cm_norm = cm.astype("float") / cm.sum(axis=1)[:, np.newaxis] # normalize it
n_classes = cm.shape[0]

# Let's prettify it
fig, ax = plt.subplots(figsize=figsize)
# Create a matrix plot
cax = ax.matshow(cm, cmap=plt.cm.Blues) # https://matplotlib.org/3.2.0/api/_as_gen/matplotlib.axes.Axes.matshow.html
fig.colorbar(cax)

# Create classes
classes = False

if classes:
  labels = classes
else:
  labels = np.arange(cm.shape[0])

# Label the axes
ax.set(title="Confusion Matrix",
       xlabel="Predicted label",
       ylabel="True label",
       xticks=np.arange(n_classes),
       yticks=np.arange(n_classes),
       xticklabels=labels,
       yticklabels=labels)

# Set x-axis labels to bottom
ax.xaxis.set_label_position("bottom")
ax.xaxis.tick_bottom()

# Adjust label size
ax.xaxis.label.set_size(20)
ax.yaxis.label.set_size(20)
ax.title.set_size(20)

# Set threshold for different colors
threshold = (cm.max() + cm.min()) / 2.

# Plot the text on each cell
for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
  plt.text(j, i, f"{cm[i, j]} ({cm_norm[i, j]*100:.1f}%)",
           horizontalalignment="center",
           color="white" if cm[i, j] > threshold else "black",
           size=15)

That looks much better. Our model made perfect predictions on the test set: all 200 samples sit on the diagonal, with no false positives (top right corner) or false negatives (bottom left corner).

In [64]:

# What does itertools.product do? It yields every (i, j) combination (the Cartesian product) of the two ranges
import itertools
for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
  print(i, j)

Working with a larger example (multiclass classification)¶

We've seen a binary classification example (predicting if a data point is part of a red circle or blue circle) but what if you had multiple different classes of things?

For example, say you were a fashion company and you wanted to build a neural network to predict whether a piece of clothing was a shoe, a shirt or a jacket (3 different options).

When you have more than two classes as an option, this is known as multiclass classification.

The good news is, the things we've learned so far (with a few tweaks) can be applied to multiclass classification problems as well.

Let's see it in action.

To start, we'll need some data. Luckily, TensorFlow has a multiclass classification dataset called Fashion MNIST built in, so we can get started straight away.

We can import it using the tf.keras.datasets module.

📖 Resource: The following multiclass classification problem has been adapted from the TensorFlow classification guide. Once you've gone through the following example, a good exercise would be to replicate the TensorFlow guide.

In [65]:

import tensorflow as tf
from tensorflow.keras.datasets import fashion_mnist

# The data has already been sorted into training and test sets for us
(train_data, train_labels), (test_data, test_labels) = fashion_mnist.load_data()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-labels-idx1-ubyte.gz
29515/29515 [==============================] - 0s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-images-idx3-ubyte.gz
26421880/26421880 [==============================] - 0s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-labels-idx1-ubyte.gz
5148/5148 [==============================] - 0s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-images-idx3-ubyte.gz
4422102/4422102 [==============================] - 0s 0us/step

Now let's check out an example.

In [66]:

# Show the first training example
print(f"Training sample:\n{train_data[0]}\n")
print(f"Training label: {train_labels[0]}")

Training sample:
[[  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   1   0   0  13  73   0
    0   1   4   0   0   0   0   1   1   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   3   0  36 136 127  62
   54   0   0   0   1   3   4   0   0   3]
 [  0   0   0   0   0   0   0   0   0   0   0   0   6   0 102 204 176 134
  144 123  23   0   0   0   0  12  10   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0 155 236 207 178
  107 156 161 109  64  23  77 130  72  15]
 [  0   0   0   0   0   0   0   0   0   0   0   1   0  69 207 223 218 216
  216 163 127 121 122 146 141  88 172  66]
 [  0   0   0   0   0   0   0   0   0   1   1   1   0 200 232 232 233 229
  223 223 215 213 164 127 123 196 229   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0 183 225 216 223 228
  235 227 224 222 224 221 223 245 173   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0 193 228 218 213 198
  180 212 210 211 213 223 220 243 202   0]
 [  0   0   0   0   0   0   0   0   0   1   3   0  12 219 220 212 218 192
  169 227 208 218 224 212 226 197 209  52]
 [  0   0   0   0   0   0   0   0   0   0   6   0  99 244 222 220 218 203
  198 221 215 213 222 220 245 119 167  56]
 [  0   0   0   0   0   0   0   0   0   4   0   0  55 236 228 230 228 240
  232 213 218 223 234 217 217 209  92   0]
 [  0   0   1   4   6   7   2   0   0   0   0   0 237 226 217 223 222 219
  222 221 216 223 229 215 218 255  77   0]
 [  0   3   0   0   0   0   0   0   0  62 145 204 228 207 213 221 218 208
  211 218 224 223 219 215 224 244 159   0]
 [  0   0   0   0  18  44  82 107 189 228 220 222 217 226 200 205 211 230
  224 234 176 188 250 248 233 238 215   0]
 [  0  57 187 208 224 221 224 208 204 214 208 209 200 159 245 193 206 223
  255 255 221 234 221 211 220 232 246   0]
 [  3 202 228 224 221 211 211 214 205 205 205 220 240  80 150 255 229 221
  188 154 191 210 204 209 222 228 225   0]
 [ 98 233 198 210 222 229 229 234 249 220 194 215 217 241  65  73 106 117
  168 219 221 215 217 223 223 224 229  29]
 [ 75 204 212 204 193 205 211 225 216 185 197 206 198 213 240 195 227 245
  239 223 218 212 209 222 220 221 230  67]
 [ 48 203 183 194 213 197 185 190 194 192 202 214 219 221 220 236 225 216
  199 206 186 181 177 172 181 205 206 115]
 [  0 122 219 193 179 171 183 196 204 210 213 207 211 210 200 196 194 191
  195 191 198 192 176 156 167 177 210  92]
 [  0   0  74 189 212 191 175 172 175 181 185 188 189 188 193 198 204 209
  210 210 211 188 188 194 192 216 170   0]
 [  2   0   0   0  66 200 222 237 239 242 246 243 244 221 220 193 191 179
  182 182 181 176 166 168  99  58   0   0]
 [  0   0   0   0   0   0   0  40  61  44  72  41  35   0   0   0   0   0
    0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0   0   0]]

Training label: 9

Woah, we get a large array of numbers (the data), followed by a single number (the class label).

What about the shapes?

In [67]:

# Check the shape of our data
train_data.shape, train_labels.shape, test_data.shape, test_labels.shape

Out[67]:

((60000, 28, 28), (60000,), (10000, 28, 28), (10000,))

In [68]:

# Check shape of a single example
train_data[0].shape, train_labels[0].shape

Out[68]:

((28, 28), ())

Okay, we've got 60,000 training examples, each of shape (28, 28) with a single integer label, as well as 10,000 test examples of the same shape.
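Each (28, 28) example is really just 28 * 28 = 784 pixel values laid out in a grid. As a quick sketch of what that means in array terms, the NumPy snippet below flattens a stand-in batch with the same layout; the pixel values are generated deterministically (not real Fashion MNIST data) so it runs without downloading anything.

```python
import numpy as np

# Stand-in batch with the Fashion MNIST layout: (num_images, 28, 28) uint8 pixels.
# Values are synthetic (0-255, repeating) so the example runs without the dataset.
images = (np.arange(5 * 28 * 28) % 256).astype(np.uint8).reshape(5, 28, 28)

# Flatten each 28x28 image into one vector of 784 pixel values
flat = images.reshape(len(images), -1)
print(images.shape, "->", flat.shape)  # (5, 28, 28) -> (5, 784)

# Row-major order: row r, column c of an image lands at position r * 28 + c
print(flat[0, 1 * 28 + 3] == images[0, 1, 3])  # True
```

This reshape is the same kind of per-example flattening a `tf.keras.layers.Flatten` layer performs, which matters once we start building a model for this data.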

But these are just numbers; let's visualize them.

In [69]:

# Plot a single example
import matplotlib.pyplot as plt
plt.imshow(train_data[7]);

Hmm, but what about its label?

In [70]:

# Check our sample's label
train_labels[7]

Out[70]:

It looks like our labels are in numerical form. And while this is fine for a neural network, you might want to have them in human-readable form.

Let's create a small list of the class names (we can find them on the dataset's GitHub page).

🔑 Note: Whilst this dataset has been prepared for us and is ready to go, it's important to remember many datasets won't be. Often you'll have to do a few preprocessing steps to get the data ready to use with a neural network (we'll see more of this when we work with our own data later).

In [71]:

class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

# How many classes are there (this'll be our output shape)?
len(class_names)

Out[71]:

Now that we have these, let's plot another example.

🤔 Question: Pay particular attention to what the data we're working with looks like. Is it only straight lines, or does it have non-straight lines as well? If we want to find patterns in the photos of clothes (which are actually collections of pixels), do you think our model will need non-linearities (non-straight lines)?

In [72]:

# Plot an example image and its label
plt.imshow(train_data[17], cmap=plt.cm.binary) # change the colours to black & white
plt.title(class_names[train_labels[17]]);

In [74]:

# Plot multiple random images of fashion MNIST
import random
plt.figure(figsize=(7, 7))
for i in range(4):
  ax = plt.subplot(2, 2, i + 1)
  rand_index = random.choice(range(len(train_data)))
  plt.imshow(train_data[rand_index], cmap=plt.cm.binary)
  plt.title(class_names[train_labels[rand_index]])
  plt.axis(False)

Alright, let's build a model to figure out the relationship between the pixel values and their labels.

Since this is a multiclass classification problem, we'll need to make a few changes to our architecture (in line with Table 1 above):

The input shape will have to deal with 28x28 tensors (the height and width of our images).
- We're actually going to squash the input into a tensor (vector) of shape (784,).
The output shape will have to be 10 because we need our model to predict for 10 different classes.
- We'll also change the activation parameter of our output layer to "softmax" instead of "sigmoid". As we'll see, the "softmax" activation function outputs a series of values between 0 and 1 (the same shape as the output shape) which together add up to 1. The index with the highest value is the class the model predicts as most likely.
We'll need to change our loss function from a binary loss function to a multiclass loss function.
- More specifically, since our labels are in integer form, we'll use tf.keras.losses.SparseCategoricalCrossentropy(). If our labels were one-hot encoded (e.g. they looked something like [0, 0, 1, 0, 0...]), we'd use tf.keras.losses.CategoricalCrossentropy().
We'll also use the validation_data parameter when calling the fit() function. This will give us an idea of how the model performs on the test set during training.
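To see what softmax and the two cross-entropy losses compute, here's a small NumPy sketch for a single sample over 3 made-up classes (Fashion MNIST would have 10). It's a simplified illustration: the Keras losses compute the same quantity, averaged over the batch and with extra numerical safeguards.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # subtracting the max avoids overflow in exp
    exps = np.exp(shifted)
    return exps / exps.sum(axis=-1, keepdims=True)

# Made-up raw outputs (logits) for one sample over 3 classes
logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
print(probs.round(3))                   # [0.659 0.242 0.099]
print(np.isclose(probs.sum(), 1.0))     # True
print(probs.argmax())                   # 0 -> index of the predicted class

# Cross-entropy with an integer label (the format SparseCategoricalCrossentropy expects)...
label = 0
sparse_ce = -np.log(probs[label])

# ...equals cross-entropy with the same label one-hot encoded (CategoricalCrossentropy)
one_hot = np.eye(len(probs))[label]  # [1., 0., 0.]
categorical_ce = -np.sum(one_hot * np.log(probs))
print(sparse_ce, categorical_ce)  # both roughly 0.417
```

The only difference between the two losses is the label format they accept; the value they compute for a given sample is the same.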

You ready? Let's go.

In [75]:

# Set random seed
tf.random.set_seed(42)

# Create the model
model_11 = tf.keras.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)), # input layer (we had to reshape 28x28 to 784, the Flatten layer does this for us)
  tf.keras.layers.Dense(4, activation="relu"),
  tf.keras.layers.Dense(4, activation="relu"),
  tf.keras.layers.Dense(10, activation="softmax") # output shape is 10, activation is softmax
])

# Compile the model
model_11.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(), # different loss function for multiclass classification
                 optimizer=tf.keras.optimizers.Adam(),
                 metrics=["accuracy"])

# Fit the model
non_norm_history = model_11.fit(train_data,
                                train_labels,
                                epochs=10,
                                validation_data=(test_data, test_labels)) # see how the model performs on the test set during training

Epoch 1/10
1875/1875 [==============================] - 8s 3ms/step - loss: 2.1829 - accuracy: 0.1931 - val_loss: 2.1994 - val_accuracy: 0.2037
Epoch 2/10
1875/1875 [==============================] - 6s 3ms/step - loss: 1.8995 - accuracy: 0.2442 - val_loss: 1.8438 - val_accuracy: 0.2579
Epoch 3/10
1875/1875 [==============================] - 6s 3ms/step - loss: 1.7651 - accuracy: 0.2615 - val_loss: 1.7387 - val_accuracy: 0.2779
Epoch 4/10
1875/1875 [==============================] - 6s 3ms/step - loss: 1.6032 - accuracy: 0.2936 - val_loss: 1.5459 - val_accuracy: 0.3104
Epoch 5/10
1875/1875 [==============================] - 6s 3ms/step - loss: 1.5402 - accuracy: 0.3038 - val_loss: 1.5040 - val_accuracy: 0.3089
Epoch 6/10
1875/1875 [==============================] - 5s 3ms/step - loss: 1.4902 - accuracy: 0.3207 - val_loss: 1.4725 - val_accuracy: 0.3131
Epoch 7/10
1875/1875 [==============================] - 5s 3ms/step - loss: 1.4667 - accuracy: 0.3323 - val_loss: 1.4497 - val_accuracy: 0.3634
Epoch 8/10
1875/1875 [==============================] - 6s 3ms/step - loss: 1.4550 - accuracy: 0.3469 - val_loss: 1.4640 - val_accuracy: 0.3559
Epoch 9/10
1875/1875 [==============================] - 5s 3ms/step - loss: 1.4280 - accuracy: 0.3549 - val_loss: 1.4428 - val_accuracy: 0.3422
Epoch 10/10
1875/1875 [==============================] - 6s 3ms/step - loss: 1.4292 - accuracy: 0.3542 - val_loss: 1.5142 - val_accuracy: 0.3405

In [76]:

# Check the shapes of our model
# Note: the "None" in (None, 784) is for batch_size, we'll cover this in a later module
model_11.summary()

Model: "sequential_11"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 flatten (Flatten)           (None, 784)               0         
                                                                 
 dense_28 (Dense)            (None, 4)                 3140      
                                                                 
 dense_29 (Dense)            (None, 4)                 20        
                                                                 
 dense_30 (Dense)            (None, 10)                50        
                                                                 
=================================================================
Total params: 3210 (12.54 KB)
Trainable params: 3210 (12.54 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
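Where do those parameter counts come from? A Dense layer has one weight per input for each unit, plus one bias per unit. The plain-Python arithmetic below reproduces the summary; `dense_params` is a hypothetical helper written for this illustration, and the memory figure assumes 4-byte float32 weights.

```python
# Parameters of a Dense layer = inputs * units (weights) + units (one bias per unit)
def dense_params(n_inputs, n_units):
    return n_inputs * n_units + n_units

flatten_params = 0                  # Flatten only reshapes, it has nothing to learn
layer_1 = dense_params(28 * 28, 4)  # 784 * 4 + 4 = 3140
layer_2 = dense_params(4, 4)        # 4 * 4 + 4 = 20
output_layer = dense_params(4, 10)  # 4 * 10 + 10 = 50

total = flatten_params + layer_1 + layer_2 + output_layer
print(total)                         # 3210
print(f"{total * 4 / 1024:.2f} KB")  # 12.54 KB, assuming 4 bytes (float32) per parameter
```

Almost all of the parameters live in the first Dense layer, because every one of its 4 units is connected to all 784 pixels.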

Alright, our model gets to about 35% accuracy after 10 epochs using a similar style of model to the one we used on our binary classification problem.

That's better than guessing (guessing with 10 classes would result in about 10% accuracy), but we can do better.

Do you remember when we talked about neural networks preferring numbers between 0 and 1? (if not, treat this as a reminder)

Well, right now the data we have isn't between 0 and 1; in other words, it's not normalized (hence why we used the non_norm_history variable when calling fit()). Its pixel values are between 0 and 255.

Let's see.

In [77]:

# Check the min and max values of the training data
train_data.min(), train_data.max()

Out[77]:

(0, 255)

We can get these values between 0 and 1 by dividing the entire array by the maximum value, 255.0 (dividing by a float also converts the integer array to floats).

Doing so will result in all of our data being between 0 and 1 (known as scaling or normalization).
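Dividing by 255.0 is a special case of min-max scaling, `(x - min) / (max - min)`, where the minimum is 0 and the maximum is 255. The NumPy sketch below uses a tiny made-up uint8 "image" to show both the dtype change and that equivalence; `min_max_scale` is a hypothetical helper for illustration.

```python
import numpy as np

# Tiny made-up uint8 "image" with the same 0-255 pixel range as Fashion MNIST
pixels = np.array([[0, 64], [128, 255]], dtype=np.uint8)

# Dividing by the float 255.0 scales into [0, 1] and converts uint8 -> float64
scaled = pixels / 255.0
print(scaled.dtype, scaled.min(), scaled.max())  # float64 0.0 1.0

# General min-max scaling; with x_min=0 and x_max=255 it reduces to x / 255.0
def min_max_scale(x, x_min, x_max):
    return (x - x_min) / (x_max - x_min)

print(np.allclose(min_max_scale(pixels.astype(np.float64), 0.0, 255.0), scaled))  # True
```

Using the same fixed constant (255.0) for both the training and test data, as the next cell does, keeps the two sets on exactly the same scale.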

In [78]:

# Divide train and test images by the maximum value (normalize it)
train_data = train_data / 255.0
test_data = test_data / 255.0

# Check the min and max values of the training data
train_data.min(), train_data.max()

Out[78]:

(0.0, 1.0)

Beautiful! Now our data is between 0 and 1. Let's see what happens when we model it.

We'll use the same architecture as before (model_11) in a fresh model (model_12), except this time the data will be normalized.

In [79]:

# Set random seed
tf.random.set_seed(42)

# Create the model
model_12 = tf.keras.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)), # input layer (we had to reshape 28x28 to 784)
  tf.keras.layers.Dense(4, activation="relu"),
  tf.keras.layers.Dense(4, activation="relu"),
  tf.keras.layers.Dense(10, activation="softmax") # output shape is 10, activation is softmax
])

# Compile the model
model_12.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(),
                 optimizer=tf.keras.optimizers.Adam(),
                 metrics=["accuracy"])

# Fit the model (to the normalized data)
norm_history = model_12.fit(train_data,
                            train_labels,
                            epochs=10,
                            validation_data=(test_data, test_labels))

Epoch 1/10
1875/1875 [==============================] - 7s 3ms/step - loss: 1.2368 - accuracy: 0.5151 - val_loss: 0.9158 - val_accuracy: 0.6197
Epoch 2/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.8054 - accuracy: 0.6939 - val_loss: 0.7300 - val_accuracy: 0.7337
Epoch 3/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.6805 - accuracy: 0.7524 - val_loss: 0.6827 - val_accuracy: 0.7570
Epoch 4/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.6427 - accuracy: 0.7661 - val_loss: 0.6599 - val_accuracy: 0.7663
Epoch 5/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.6258 - accuracy: 0.7735 - val_loss: 0.6568 - val_accuracy: 0.7681
Epoch 6/10
1875/1875 [==============================] - 6s 3ms/step - loss: 0.6138 - accuracy: 0.7784 - val_loss: 0.6378 - val_accuracy: 0.7772
Epoch 7/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.6056 - accuracy: 0.7817 - val_loss: 0.6611 - val_accuracy: 0.7562
Epoch 8/10
1875/1875 [==============================] - 6s 3ms/step - loss: 0.5993 - accuracy: 0.7842 - val_loss: 0.6351 - val_accuracy: 0.7798
Epoch 9/10
1875/1875 [==============================] - 6s 3ms/step - loss: 0.5905 - accuracy: 0.7881 - val_loss: 0.6232 - val_accuracy: 0.7782
Epoch 10/10
1875/1875 [==============================] - 6s 3ms/step - loss: 0.5867 - accuracy: 0.7911 - val_loss: 0.6203 - val_accuracy: 0.7818

Woah, we used the exact same model architecture as before, but with normalized data we're now seeing a much higher accuracy value (about 78% on the test set after 10 epochs versus about 34% before)!

Let's plot each model's history (their loss curves).

In [80]:

import pandas as pd
# Plot non-normalized data loss curves
pd.DataFrame(non_norm_history.history).plot(title="Non-normalized Data")
# Plot normalized data loss curves
pd.DataFrame(norm_history.history).plot(title="Normalized data");

Wow. From these two plots, we can see how much quicker our model with the normalized data (model_12) improved than the model with the non-normalized data (model_11).

🔑 Note: The same model with even slightly different data can produce dramatically different results. So when you're comparing models, it's important to make sure you're comparing them on the same criteria (e.g. same architecture but different data or same data but different architecture).

How about we find the ideal learning rate and see what happens?

We'll use the same architecture we've been using.
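The learning rate finder in the next cell uses the schedule `1e-3 * 10**(epoch/20)`: it starts at 0.001 and multiplies the learning rate by 10 every 20 epochs. Here's a plain-Python sketch of that same formula so you can see which values get tried. Keras passes a 0-based epoch index to the schedule, so "Epoch 1/40" in the training log corresponds to `epoch=0`.

```python
# Same formula as the lambda passed to tf.keras.callbacks.LearningRateScheduler
def lr_schedule(epoch):
    return 1e-3 * 10 ** (epoch / 20)

for epoch in [0, 1, 10, 20, 39]:
    print(f"epoch {epoch:2d}: lr = {lr_schedule(epoch):.5f}")
```

Because the increase is exponential, 40 epochs sweep the learning rate across roughly two orders of magnitude (from 0.001 to about 0.089), which is why plotting loss against learning rate on a log scale works well afterwards.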

In [81]:

# Set random seed
tf.random.set_seed(42)

# Create the model
model_13 = tf.keras.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)), # input layer (we had to reshape 28x28 to 784)
  tf.keras.layers.Dense(4, activation="relu"),
  tf.keras.layers.Dense(4, activation="relu"),
  tf.keras.layers.Dense(10, activation="softmax") # output shape is 10, activation is softmax
])

# Compile the model
model_13.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(),
                 optimizer=tf.keras.optimizers.Adam(),
                 metrics=["accuracy"])

# Create the learning rate callback
lr_scheduler = tf.keras.callbacks.LearningRateScheduler(lambda epoch: 1e-3 * 10**(epoch/20))

# Fit the model
find_lr_history = model_13.fit(train_data,
                               train_labels,
                               epochs=40, # model already doing pretty good with current LR, probably don't need 100 epochs
                               validation_data=(test_data, test_labels),
                               callbacks=[lr_scheduler])

Epoch 1/40
1875/1875 [==============================] - 7s 3ms/step - loss: 1.3489 - accuracy: 0.5091 - val_loss: 1.0140 - val_accuracy: 0.6485 - lr: 0.0010
Epoch 2/40
1875/1875 [==============================] - 6s 3ms/step - loss: 0.8974 - accuracy: 0.6739 - val_loss: 0.8554 - val_accuracy: 0.6812 - lr: 0.0011
Epoch 3/40
1875/1875 [==============================] - 6s 3ms/step - loss: 0.7930 - accuracy: 0.7102 - val_loss: 0.7868 - val_accuracy: 0.6940 - lr: 0.0013
Epoch 4/40
1875/1875 [==============================] - 6s 3ms/step - loss: 0.7509 - accuracy: 0.7236 - val_loss: 0.7557 - val_accuracy: 0.7129 - lr: 0.0014
Epoch 5/40
1875/1875 [==============================] - 6s 3ms/step - loss: 0.7246 - accuracy: 0.7306 - val_loss: 0.7407 - val_accuracy: 0.7340 - lr: 0.0016
Epoch 6/40
1875/1875 [==============================] - 6s 3ms/step - loss: 0.7055 - accuracy: 0.7378 - val_loss: 0.7294 - val_accuracy: 0.7424 - lr: 0.0018
Epoch 7/40
1875/1875 [==============================] - 5s 3ms/step - loss: 0.6916 - accuracy: 0.7442 - val_loss: 0.7072 - val_accuracy: 0.7379 - lr: 0.0020
Epoch 8/40
1875/1875 [==============================] - 6s 3ms/step - loss: 0.6764 - accuracy: 0.7509 - val_loss: 0.7037 - val_accuracy: 0.7459 - lr: 0.0022
Epoch 9/40
1875/1875 [==============================] - 5s 3ms/step - loss: 0.6649 - accuracy: 0.7562 - val_loss: 0.6898 - val_accuracy: 0.7578 - lr: 0.0025
Epoch 10/40
1875/1875 [==============================] - 5s 3ms/step - loss: 0.6596 - accuracy: 0.7585 - val_loss: 0.7213 - val_accuracy: 0.7497 - lr: 0.0028
Epoch 11/40
1875/1875 [==============================] - 5s 3ms/step - loss: 0.6526 - accuracy: 0.7628 - val_loss: 0.7046 - val_accuracy: 0.7515 - lr: 0.0032
Epoch 12/40
1875/1875 [==============================] - 5s 3ms/step - loss: 0.6484 - accuracy: 0.7644 - val_loss: 0.6996 - val_accuracy: 0.7602 - lr: 0.0035
Epoch 13/40
1875/1875 [==============================] - 6s 3ms/step - loss: 0.6383 - accuracy: 0.7674 - val_loss: 0.6949 - val_accuracy: 0.7657 - lr: 0.0040
Epoch 14/40
1875/1875 [==============================] - 5s 3ms/step - loss: 0.6361 - accuracy: 0.7695 - val_loss: 0.6696 - val_accuracy: 0.7616 - lr: 0.0045
Epoch 15/40
1875/1875 [==============================] - 5s 3ms/step - loss: 0.6343 - accuracy: 0.7717 - val_loss: 0.6948 - val_accuracy: 0.7563 - lr: 0.0050
Epoch 16/40
1875/1875 [==============================] - 5s 3ms/step - loss: 0.6297 - accuracy: 0.7751 - val_loss: 0.6591 - val_accuracy: 0.7772 - lr: 0.0056
Epoch 17/40
1875/1875 [==============================] - 6s 3ms/step - loss: 0.6207 - accuracy: 0.7827 - val_loss: 0.6500 - val_accuracy: 0.7791 - lr: 0.0063
Epoch 18/40
1875/1875 [==============================] - 6s 3ms/step - loss: 0.6184 - accuracy: 0.7862 - val_loss: 0.6448 - val_accuracy: 0.7823 - lr: 0.0071
Epoch 19/40
1875/1875 [==============================] - 6s 3ms/step - loss: 0.6161 - accuracy: 0.7865 - val_loss: 0.6276 - val_accuracy: 0.7931 - lr: 0.0079
Epoch 20/40
1875/1875 [==============================] - 5s 3ms/step - loss: 0.6152 - accuracy: 0.7868 - val_loss: 0.6336 - val_accuracy: 0.7889 - lr: 0.0089
Epoch 21/40
1875/1875 [==============================] - 5s 3ms/step - loss: 0.6146 - accuracy: 0.7876 - val_loss: 0.6349 - val_accuracy: 0.7811 - lr: 0.0100
Epoch 22/40
1875/1875 [==============================] - 5s 3ms/step - loss: 0.6176 - accuracy: 0.7863 - val_loss: 0.6349 - val_accuracy: 0.7908 - lr: 0.0112
Epoch 23/40
1875/1875 [==============================] - 6s 3ms/step - loss: 0.6217 - accuracy: 0.7857 - val_loss: 0.7139 - val_accuracy: 0.7555 - lr: 0.0126
Epoch 24/40
1875/1875 [==============================] - 5s 3ms/step - loss: 0.6278 - accuracy: 0.7825 - val_loss: 0.7166 - val_accuracy: 0.7612 - lr: 0.0141
Epoch 25/40
1875/1875 [==============================] - 6s 3ms/step - loss: 0.6356 - accuracy: 0.7805 - val_loss: 0.7001 - val_accuracy: 0.7508 - lr: 0.0158
Epoch 26/40
1875/1875 [==============================] - 5s 3ms/step - loss: 0.6377 - accuracy: 0.7787 - val_loss: 0.7146 - val_accuracy: 0.7597 - lr: 0.0178
Epoch 27/40
1875/1875 [==============================] - 6s 3ms/step - loss: 0.6791 - accuracy: 0.7617 - val_loss: 0.6618 - val_accuracy: 0.7748 - lr: 0.0200
Epoch 28/40
1875/1875 [==============================] - 5s 3ms/step - loss: 0.6538 - accuracy: 0.7726 - val_loss: 0.6899 - val_accuracy: 0.7610 - lr: 0.0224
Epoch 29/40
1875/1875 [==============================] - 5s 3ms/step - loss: 0.6791 - accuracy: 0.7612 - val_loss: 0.6711 - val_accuracy: 0.7719 - lr: 0.0251
Epoch 30/40
1875/1875 [==============================] - 6s 3ms/step - loss: 0.6992 - accuracy: 0.7521 - val_loss: 0.7585 - val_accuracy: 0.7172 - lr: 0.0282
Epoch 31/40
1875/1875 [==============================] - 6s 3ms/step - loss: 0.7305 - accuracy: 0.7406 - val_loss: 0.7314 - val_accuracy: 0.7392 - lr: 0.0316
Epoch 32/40
1875/1875 [==============================] - 6s 3ms/step - loss: 0.7689 - accuracy: 0.7179 - val_loss: 0.8037 - val_accuracy: 0.6799 - lr: 0.0355
Epoch 33/40
1875/1875 [==============================] - 5s 3ms/step - loss: 1.0027 - accuracy: 0.6074 - val_loss: 1.0677 - val_accuracy: 0.5509 - lr: 0.0398
Epoch 34/40
1875/1875 [==============================] - 6s 3ms/step - loss: 0.9718 - accuracy: 0.6132 - val_loss: 1.0379 - val_accuracy: 0.5977 - lr: 0.0447
Epoch 35/40
1875/1875 [==============================] - 5s 3ms/step - loss: 1.0091 - accuracy: 0.6153 - val_loss: 0.9773 - val_accuracy: 0.6447 - lr: 0.0501
Epoch 36/40
1875/1875 [==============================] - 5s 3ms/step - loss: 0.9577 - accuracy: 0.6454 - val_loss: 0.8809 - val_accuracy: 0.6806 - lr: 0.0562
Epoch 37/40
1875/1875 [==============================] - 5s 3ms/step - loss: 1.0135 - accuracy: 0.6252 - val_loss: 1.0258 - val_accuracy: 0.6005 - lr: 0.0631
Epoch 38/40
1875/1875 [==============================] - 6s 3ms/step - loss: 1.0135 - accuracy: 0.6163 - val_loss: 0.9384 - val_accuracy: 0.6446 - lr: 0.0708
Epoch 39/40
1875/1875 [==============================] - 5s 3ms/step - loss: 1.1192 - accuracy: 0.5747 - val_loss: 1.0548 - val_accuracy: 0.5785 - lr: 0.0794
Epoch 40/40
1875/1875 [==============================] - 6s 3ms/step - loss: 1.2500 - accuracy: 0.5285 - val_loss: 1.5707 - val_accuracy: 0.4179 - lr: 0.0891
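The `lr` column in the log above comes from the lambda passed to `LearningRateScheduler`: the learning rate starts at 1e-3 and is multiplied by 10 every 20 epochs. A minimal plain-Python sketch of the same formula (the function name `lr_schedule` is ours, for illustration) makes the growth easy to check against the log:

```python
def lr_schedule(epoch: int) -> float:
    """Same formula as the lambda given to LearningRateScheduler above.

    Keras passes a 0-based epoch index, so "Epoch 1/40" in the log uses epoch=0.
    """
    return 1e-3 * 10 ** (epoch / 20)


# Learning rate at the first epoch, halfway through and at the last epoch
for epoch in (0, 20, 39):
    print(f"Epoch {epoch + 1:>2}: lr = {lr_schedule(epoch):.4f}")
```

Because the exponent is `epoch / 20`, 40 epochs sweep the learning rate across nearly two orders of magnitude, from 0.001 to just under 0.09.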

In [82]:

# Plot the learning rate vs. loss curve (the learning rate increases each epoch)
import numpy as np
import matplotlib.pyplot as plt
lrs = 1e-3 * (10**(np.arange(40)/20))
plt.semilogx(lrs, find_lr_history.history["loss"]) # want the x-axis to be log-scale
plt.xlabel("Learning rate")
plt.ylabel("Loss")
plt.title("Finding the ideal learning rate");

In this case, it looks like somewhere close to the default learning rate of the Adam optimizer (0.001) is the ideal learning rate.
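One common rule of thumb for reading a learning-rate-finder plot (a heuristic, not something this notebook's code does) is to find the learning rate where the loss bottoms out and then pick a value roughly 10x smaller, where the loss is still falling steadily. The sketch below applies it to the training losses copied from the 40-epoch log above. Keep in mind those losses also fall simply because the model trains for more epochs, so treat the result as a rough guide.

```python
# Training losses copied from the "loss:" column of the 40-epoch log above
losses = [
    1.3489, 0.8974, 0.7930, 0.7509, 0.7246, 0.7055, 0.6916, 0.6764, 0.6649, 0.6596,
    0.6526, 0.6484, 0.6383, 0.6361, 0.6343, 0.6297, 0.6207, 0.6184, 0.6161, 0.6152,
    0.6146, 0.6176, 0.6217, 0.6278, 0.6356, 0.6377, 0.6791, 0.6538, 0.6791, 0.6992,
    0.7305, 0.7689, 1.0027, 0.9718, 1.0091, 0.9577, 1.0135, 1.0135, 1.1192, 1.2500,
]

# Learning rate used at each (0-based) epoch, same formula as the scheduler
lrs = [1e-3 * 10 ** (epoch / 20) for epoch in range(len(losses))]

# Index of the lowest loss and the learning rate that produced it
best_idx = min(range(len(losses)), key=losses.__getitem__)
lowest_loss_lr = lrs[best_idx]

# Rule of thumb (heuristic): step back by a factor of 10 from the lowest-loss point
suggested_lr = lowest_loss_lr / 10

print(f"Lowest loss {losses[best_idx]} at lr={lowest_loss_lr:.4f}")
print(f"Suggested learning rate: {suggested_lr:.4f}")
```

On these numbers the loss bottoms out around a learning rate of 0.01, so the heuristic lands on 0.001, in line with the choice above.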

Let's refit a model using the ideal learning rate.

In [83]:

# Set random seed
tf.random.set_seed(42)

# Create the model
model_14 = tf.keras.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)), # input layer (Flatten reshapes each 28x28 image into 784 values)
  tf.keras.layers.Dense(4, activation="relu"),
  tf.keras.layers.Dense(4, activation="relu"),
  tf.keras.layers.Dense(10, activation="softmax") # output shape is 10, activation is softmax
])

# Compile the model
model_14.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(),
                 optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), # ideal learning rate (same as default)
                 metrics=["accuracy"])

# Fit the model
history = model_14.fit(train_data,
                       train_labels,
                       epochs=20,
                       validation_data=(test_data, test_labels))

Epoch 1/20
1875/1875 [==============================] - 7s 3ms/step - loss: 1.1588 - accuracy: 0.6050 - val_loss: 0.7818 - val_accuracy: 0.7258
Epoch 2/20
1875/1875 [==============================] - 6s 3ms/step - loss: 0.7097 - accuracy: 0.7484 - val_loss: 0.7062 - val_accuracy: 0.7526
Epoch 3/20
1875/1875 [==============================] - 6s 3ms/step - loss: 0.6528 - accuracy: 0.7655 - val_loss: 0.6678 - val_accuracy: 0.7645
Epoch 4/20
1875/1875 [==============================] - 6s 3ms/step - loss: 0.6249 - accuracy: 0.7756 - val_loss: 0.6516 - val_accuracy: 0.7684
Epoch 5/20
1875/1875 [==============================] - 5s 3ms/step - loss: 0.6082 - accuracy: 0.7798 - val_loss: 0.6405 - val_accuracy: 0.7733
Epoch 6/20
1875/1875 [==============================] - 5s 3ms/step - loss: 0.5946 - accuracy: 0.7838 - val_loss: 0.6344 - val_accuracy: 0.7743
Epoch 7/20
1875/1875 [==============================] - 5s 3ms/step - loss: 0.5855 - accuracy: 0.7868 - val_loss: 0.6231 - val_accuracy: 0.7768
Epoch 8/20
1875/1875 [==============================] - 6s 3ms/step - loss: 0.5773 - accuracy: 0.7880 - val_loss: 0.6256 - val_accuracy: 0.7737
Epoch 9/20
1875/1875 [==============================] - 5s 3ms/step - loss: 0.5716 - accuracy: 0.7909 - val_loss: 0.6134 - val_accuracy: 0.7812
Epoch 10/20
1875/1875 [==============================] - 5s 3ms/step - loss: 0.5653 - accuracy: 0.7930 - val_loss: 0.6030 - val_accuracy: 0.7843
Epoch 11/20
1875/1875 [==============================] - 5s 3ms/step - loss: 0.5604 - accuracy: 0.7944 - val_loss: 0.5985 - val_accuracy: 0.7849
Epoch 12/20
1875/1875 [==============================] - 5s 3ms/step - loss: 0.5544 - accuracy: 0.7972 - val_loss: 0.5951 - val_accuracy: 0.7868
Epoch 13/20
1875/1875 [==============================] - 5s 3ms/step - loss: 0.5513 - accuracy: 0.7966 - val_loss: 0.5940 - val_accuracy: 0.7893
Epoch 14/20
1875/1875 [==============================] - 5s 3ms/step - loss: 0.5462 - accuracy: 0.7994 - val_loss: 0.5953 - val_accuracy: 0.7864
Epoch 15/20
1875/1875 [==============================] - 6s 3ms/step - loss: 0.5420 - accuracy: 0.8008 - val_loss: 0.5860 - val_accuracy: 0.7911
Epoch 16/20
1875/1875 [==============================] - 6s 3ms/step - loss: 0.5388 - accuracy: 0.8016 - val_loss: 0.5901 - val_accuracy: 0.7927
Epoch 17/20
1875/1875 [==============================] - 6s 3ms/step - loss: 0.5359 - accuracy: 0.8025 - val_loss: 0.5889 - val_accuracy: 0.7898
Epoch 18/20
1875/1875 [==============================] - 5s 3ms/step - loss: 0.5341 - accuracy: 0.8028 - val_loss: 0.5746 - val_accuracy: 0.7959
Epoch 19/20
1875/1875 [==============================] - 5s 3ms/step - loss: 0.5305 - accuracy: 0.8036 - val_loss: 0.5861 - val_accuracy: 0.7897
Epoch 20/20
1875/1875 [==============================] - 5s 3ms/step - loss: 0.5271 - accuracy: 0.8071 - val_loss: 0.5726 - val_accuracy: 0.7970

Now that we've got a model trained with a close-to-ideal learning rate and performing pretty well, we've got a few options.

We could:

Evaluate its performance using other classification metrics (such as a confusion matrix or classification report).
Assess some of its predictions (through visualizations).
Improve its accuracy (by training it for longer or changing the architecture).
Save and export it for use in an application.

Let's go through the first two options.

First, we'll create a confusion matrix to visualize its predictions across the different classes.
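Before building the plotting function, it helps to see what a confusion matrix actually counts. In the minimal pure-Python sketch below (the helper names `confusion_counts` and `row_normalize` are ours, not scikit-learn's), cell `[i][j]` counts how many examples with true class `i` were predicted as class `j`, so correct predictions land on the diagonal. Dividing each row by its total gives the percentages the plotting function displays (its `cm_norm`).

```python
def confusion_counts(y_true, y_pred, n_classes):
    """Rows = true labels, columns = predicted labels (same layout as sklearn)."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def row_normalize(cm):
    """Divide each row by its total: what fraction of each true class went where."""
    normalized = []
    for row in cm:
        total = sum(row)
        normalized.append([count / total if total else 0.0 for count in row])
    return normalized


# Toy example with 3 classes (made-up labels for illustration)
y_true = [0, 0, 1, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2, 2]

cm = confusion_counts(y_true, y_pred, n_classes=3)
print(cm)                    # [[1, 1, 0], [0, 2, 1], [0, 0, 1]]
print(row_normalize(cm)[0])  # [0.5, 0.5, 0.0]
```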

In [84]:

# Note: The following confusion matrix code is a remix of Scikit-Learn's
# plot_confusion_matrix function - https://scikit-learn.org/stable/modules/generated/sklearn.metrics.plot_confusion_matrix.html
# and Made with ML's introductory notebook - https://github.com/GokuMohandas/MadeWithML/blob/main/notebooks/08_Neural_Networks.ipynb
import itertools
from sklearn.metrics import confusion_matrix

# Our function needs a different name to sklearn's plot_confusion_matrix
def make_confusion_matrix(y_true, y_pred, classes=None, figsize=(10, 10), text_size=15):
  """Makes a labelled confusion matrix comparing predictions and ground truth labels.

  If classes is passed, confusion matrix will be labelled, if not, integer class values
  will be used.

  Args:
    y_true: Array of truth labels (must be same shape as y_pred).
    y_pred: Array of predicted labels (must be same shape as y_true).
    classes: Array of class labels (e.g. string form). If `None`, integer labels are used.
    figsize: Size of output figure (default=(10, 10)).
    text_size: Size of output figure text (default=15).

  Returns:
    A labelled confusion matrix plot comparing y_true and y_pred.

  Example usage:
    make_confusion_matrix(y_true=test_labels, # ground truth test labels
                          y_pred=y_preds, # predicted labels
                          classes=class_names, # array of class label names
                          figsize=(15, 15),
                          text_size=10)
  """
  # Create the confusion matrix
  cm = confusion_matrix(y_true, y_pred)
  cm_norm = cm.astype("float") / cm.sum(axis=1)[:, np.newaxis] # normalize it
  n_classes = cm.shape[0] # find the number of classes we're dealing with

  # Plot the figure and make it pretty
  fig, ax = plt.subplots(figsize=figsize)
  cax = ax.matshow(cm, cmap=plt.cm.Blues) # darker cells = more predictions (a dark diagonal means mostly correct predictions)
  fig.colorbar(cax)

  # Is there a list of classes? (`is not None` also works if classes is a NumPy array)
  if classes is not None:
    labels = classes
  else:
    labels = np.arange(cm.shape[0])

  # Label the axes
  ax.set(title="Confusion Matrix",
         xlabel="Predicted label",
         ylabel="True label",
         xticks=np.arange(n_classes), # create enough axis slots for each class
         yticks=np.arange(n_classes),
         xticklabels=labels, # axes will be labeled with class names (if they exist) or ints
         yticklabels=labels)

  # Make x-axis labels appear on bottom
  ax.xaxis.set_label_position("bottom")
  ax.xaxis.tick_bottom()

  # Set the threshold for different colors
  threshold = (cm.max() + cm.min()) / 2.

  # Plot the text on each cell
  for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
    ax.text(j, i, f"{cm[i, j]} ({cm_norm[i, j]*100:.1f}%)",
             horizontalalignment="center",
             color="white" if cm[i, j] > threshold else "black",
             size=text_size)

Since a confusion matrix compares the truth labels (test_labels) to the predicted labels, we have to make some predictions with our model.

In [85]:

# Make predictions with the most recent model
y_probs = model_14.predict(test_data) # "probs" is short for probabilities

# View the first 5 predictions
y_probs[:5]

313/313 [==============================] - 1s 1ms/step

Out[85]:

array([[7.1295396e-08, 0.0000000e+00, 3.3289107e-12, 5.1906879e-20,
        2.2590930e-15, 2.9898265e-01, 1.3648585e-07, 7.8765094e-02,
        8.4200781e-03, 6.1383194e-01],
       [2.3802532e-02, 4.3833437e-03, 7.4165797e-01, 9.3923630e-03,
        4.8104558e-02, 1.6303338e-03, 1.6198176e-01, 2.8633869e-07,
        9.0436097e-03, 3.2003202e-06],
       [1.6573335e-06, 9.9551845e-01, 9.1274740e-07, 4.4626528e-03,
        1.4068096e-05, 4.6464342e-16, 2.2328247e-06, 6.8920781e-22,
        2.8975861e-10, 3.0581186e-15],
       [2.8563186e-08, 9.9863142e-01, 4.3472301e-09, 1.3678216e-03,
        6.5454623e-07, 6.7064722e-21, 3.1206884e-08, 3.7914244e-28,
        6.2471518e-13, 5.6674063e-19],
       [2.8196907e-01, 5.4220832e-06, 4.3520968e-02, 3.0695686e-02,
        2.1239575e-02, 2.4910512e-05, 5.9459400e-01, 1.5045735e-09,
        2.7928762e-02, 2.1589773e-05]], dtype=float32)

Our model outputs a list of prediction probabilities for each image: one number per class (10 in total), representing how likely the model thinks that class is the correct label.

The higher the number in the prediction probabilities list, the more likely the model believes that is the right class.

To find the highest value we can use the argmax() method.
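Here's what `argmax()` does, sketched in plain Python using the first row of `y_probs` printed above (values copied from that output). The `class_names` list below is an assumption: the standard Fashion MNIST class order, which should match the `class_names` defined earlier in the notebook.

```python
# First prediction's probabilities, copied from the y_probs[:5] output above
probs = [7.1295396e-08, 0.0, 3.3289107e-12, 5.1906879e-20, 2.2590930e-15,
         2.9898265e-01, 1.3648585e-07, 7.8765094e-02, 8.4200781e-03, 6.1383194e-01]

# Assumed: standard Fashion MNIST class order (should match class_names earlier)
class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

# Softmax outputs are probabilities, so they should sum to (roughly) 1
print(f"Sum of probabilities: {sum(probs):.4f}")

# argmax = index of the largest value = the predicted class
pred_idx = max(range(len(probs)), key=probs.__getitem__)
print(pred_idx, class_names[pred_idx])
```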

In [86]:

# See the predicted class number and label for the first example
y_probs[0].argmax(), class_names[y_probs[0].argmax()]

Out[86]:

(9, 'Ankle boot')

Now let's do the same for all of the predictions.

In [87]:

# Convert all of the predictions from probabilities to labels
y_preds = y_probs.argmax(axis=1)

# View the first 10 prediction labels
y_preds[:10]

Out[87]:

array([9, 2, 1, 1, 6, 1, 4, 4, 5, 7])

Wonderful! Now that we've got our model's predictions in label form, let's create a confusion matrix to compare them with the truth labels.

In [88]:

# Check out the non-prettified confusion matrix
from sklearn.metrics import confusion_matrix
confusion_matrix(y_true=test_labels,
                 y_pred=y_preds)

Out[88]:

array([[759,   0,  30,  71,   6,   1, 124,   0,   8,   1],
       [  0, 939,  11,  40,   5,   0,   5,   0,   0,   0],
       [ 21,   2, 730,   9, 176,   0,  59,   0,   3,   0],
       [ 38,  11,  17, 839,  41,   0,  50,   0,   4,   0],
       [  0,   0, 125,  28, 805,   0,  38,   0,   4,   0],
       [  1,   0,   0,   0,   0, 901,   1,  55,  11,  31],
       [177,   2, 185,  42, 321,   0, 264,   0,   9,   0],
       [  0,   0,   0,   0,   0,  50,   0, 917,   0,  33],
       [  5,   0,   8,   3,  16,  17,  49,   4, 897,   1],
       [  0,   0,   0,   0,   0,  21,   0,  49,  11, 919]])

That confusion matrix is hard to comprehend, so let's make it prettier using the function we created before.

In [89]:

# Make a prettier confusion matrix
make_confusion_matrix(y_true=test_labels,
                      y_pred=y_preds,
                      classes=class_names,
                      figsize=(15, 15),
                      text_size=10)

That looks much better! (One of my favourite sights in the world is a confusion matrix with dark squares down the diagonal.)

Except the results aren't as good as they could be...

It looks like our model is getting confused between the Shirt and T-shirt/top classes (e.g. predicting Shirt when it's actually a T-shirt/top).
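One way to put numbers on this is to compute, for each true class, the fraction of examples the model gets right (the diagonal divided by the row total), and to find the largest off-diagonal cell. The plain-Python sketch below does that with the raw confusion matrix printed above (copied by hand), again assuming the standard Fashion MNIST class order for the names.

```python
# Raw confusion matrix copied from the Out[88] output above (rows = true, cols = predicted)
cm = [
    [759,   0,  30,  71,   6,   1, 124,   0,   8,   1],
    [  0, 939,  11,  40,   5,   0,   5,   0,   0,   0],
    [ 21,   2, 730,   9, 176,   0,  59,   0,   3,   0],
    [ 38,  11,  17, 839,  41,   0,  50,   0,   4,   0],
    [  0,   0, 125,  28, 805,   0,  38,   0,   4,   0],
    [  1,   0,   0,   0,   0, 901,   1,  55,  11,  31],
    [177,   2, 185,  42, 321,   0, 264,   0,   9,   0],
    [  0,   0,   0,   0,   0,  50,   0, 917,   0,  33],
    [  5,   0,   8,   3,  16,  17,  49,   4, 897,   1],
    [  0,   0,   0,   0,   0,  21,   0,  49,  11, 919],
]

# Assumed: standard Fashion MNIST class order (should match class_names earlier)
class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

# Per-class accuracy (recall): correct predictions / number of examples of that class
per_class = {class_names[i]: row[i] / sum(row) for i, row in enumerate(cm)}
for name, acc in sorted(per_class.items(), key=lambda kv: kv[1]):
    print(f"{name:>12}: {acc:.1%}")

# Largest off-diagonal cell = the single most common mistake
mistakes = [(cm[i][j], i, j) for i in range(10) for j in range(10) if i != j]
count, true_idx, pred_idx = max(mistakes)
print(f"Most common mistake: {class_names[true_idx]} predicted as "
      f"{class_names[pred_idx]} ({count} times)")

# Overall accuracy = diagonal total / all examples
overall = sum(cm[i][i] for i in range(10)) / sum(map(sum, cm))
print(f"Overall accuracy: {overall:.2%}")
```

On these numbers Shirt is by far the weakest class (264 of 1000 correct), and it is mistaken for Coat, Pullover and T-shirt/top more often than it is predicted correctly. The overall figure matches the final `val_accuracy` of 0.7970 in the training log.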

🤔 Question: Does it make sense that our model is getting confused between the Shirt and T-shirt/top classes? Why do you think this might be? What's one way you could investigate?

We've seen how our model's predictions line up with the truth labels using a confusion matrix, but how about we visualize some of them?

Let's create a function to plot a random image along with its prediction.

🔑 Note: Often when working with images and other forms of visual data, it's a good idea to visualize as much as possible to develop a further understanding of the data and the outputs of your model.

In [90]:

import random

# Create a function for plotting a random image along with its prediction
def plot_random_image(model, images, true_labels, classes):
  """Picks a random image, plots it and labels it with a predicted and truth label.

  Args:
    model: a trained model (trained on data similar to what's in images).
    images: a set of images (e.g. a NumPy array of shape (num_images, 28, 28)).
    true_labels: array of ground truth labels for images.
    classes: array of class names for images.

  Returns:
    A plot of a random image from `images` with a predicted class label from `model`
    as well as the truth class label from `true_labels`.
  """
  # Pick a random index (randint's upper bound is inclusive, so subtract 1)
  i = random.randint(0, len(images) - 1)

  # Create predictions and targets
  target_image = images[i]
  pred_probs = model.predict(target_image.reshape(1, 28, 28)) # add a batch dimension: the model expects shape (batch_size, 28, 28)
  pred_label = classes[pred_probs.argmax()]
  true_label = classes[true_labels[i]]

  # Plot the target image
  plt.imshow(target_image, cmap=plt.cm.binary)

  # Change the color of the label depending on whether the prediction is right or wrong
  if pred_label == true_label:
    color = "green"
  else:
    color = "red"

  # Add xlabel information (prediction/true label)
  plt.xlabel("Pred: {} {:2.0f}% (True: {})".format(pred_label,
                                                   100*tf.reduce_max(pred_probs),
                                                   true_label),
             color=color) # set the color to green or red

In [91]:

# Check out a random image as well as its prediction
plot_random_image(model=model_14,
                  images=test_data,
                  true_labels=test_labels,
                  classes=class_names)

1/1 [==============================] - 0s 22ms/step

After running the cell above a few times you'll start to get a visual understanding of the relationship between the model's predictions and the true labels.

Did you figure out which predictions the model gets confused on?

It seems to mix up classes which are similar, for example, Sneaker with Ankle boot.

Looking at the images, you can see how this might be the case.

The overall shape of a Sneaker and an Ankle boot is similar.

The overall shape might be one of the patterns the model has learned, so when two images have a similar shape, their predictions get mixed up.

What patterns is our model learning?¶

We've been talking a lot about how a neural network finds patterns in numbers, but what exactly do these patterns look like?

Let's crack open one of our models and find out.

First, we'll get a list of layers in our most recent model (model_14) using the layers attribute.

In [92]:

# Find the layers of our most recent model
model_14.layers

Out[92]:

[<keras.src.layers.reshaping.flatten.Flatten at 0x7a2407e038b0>,
 <keras.src.layers.core.dense.Dense at 0x7a2407e02260>,
 <keras.src.layers.core.dense.Dense at 0x7a2407e02f50>,
 <keras.src.layers.core.dense.Dense at 0x7a2407e03070>]

We can access a target layer using indexing.

In [93]:

# Extract a particular layer
model_14.layers[1]

Out[93]:

<keras.src.layers.core.dense.Dense at 0x7a2407e02260>

And we can find the patterns learned by a particular layer using the get_weights() method.

The get_weights() method returns the weights (also known as a weights matrix) and biases (also known as a bias vector) of a particular layer.

In [94]:

# Get the patterns of a layer in our network
weights, biases = model_14.layers[1].get_weights()

# Shape = (784, 4): one weight per input value (28x28 = 784) for each of the 4 neurons
weights, weights.shape

Out[94]:

(array([[ 0.41736275,  0.16016883,  0.32807097,  0.51932573],
        [ 0.38906115, -0.11863507, -0.98573697,  0.45491368],
        [ 0.18559982, -1.1637362 , -0.4268363 , -0.01036255],
        ...,
        [-0.11902154, -0.19244409, -0.49631563, -0.086859  ],
        [-0.18131663,  0.0480358 ,  0.13416374, -0.32421213],
        [ 0.37639815, -0.5836524 ,  0.3026454 ,  0.4631417 ]],
       dtype=float32),
 (784, 4))

The weights matrix has one row for every value in the input data, which in our case is 784 (28x28 pixels), and one column for every neuron in the selected layer (our selected layer has 4 neurons), giving it a shape of (784, 4).

Each value in the weights matrix corresponds to how strongly a particular input value (pixel) influences a particular neuron's output, and through it the network's decisions.

These values start out as random numbers (they're set by the kernel_initializer parameter when creating a layer; the default is "glorot_uniform") and are then updated by the neural network during training to values that better represent the patterns in the data.

Figure: Example workflow of how a supervised neural network starts with random weights and updates them to better represent the data by looking at examples of ideal outputs.
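To make "start out as random numbers" concrete: the Glorot (Xavier) uniform initializer samples each weight from a uniform distribution between `-limit` and `limit`, where `limit = sqrt(6 / (fan_in + fan_out))`. The stdlib-only sketch below (the function name `glorot_uniform_limit` and the seed are ours, and it does not reproduce Keras's own random number generator) computes that range for the first Dense layer, which has 784 inputs and 4 neurons, and draws a few example starting weights.

```python
import math
import random


def glorot_uniform_limit(fan_in: int, fan_out: int) -> float:
    """Half-width of the uniform range used by Glorot (Xavier) uniform init."""
    return math.sqrt(6 / (fan_in + fan_out))


fan_in, fan_out = 784, 4  # first Dense layer: 784 inputs, 4 neurons
limit = glorot_uniform_limit(fan_in, fan_out)
print(f"Initial weights drawn from U(-{limit:.4f}, {limit:.4f})")

# Draw a few example starting weights (illustrative only, not Keras's RNG)
rng = random.Random(42)
initial_weights = [rng.uniform(-limit, limit) for _ in range(5)]
print([round(w, 4) for w in initial_weights])
```

Several of the trained weights printed above (for example -0.98573697 and -1.1637362) sit far outside this roughly ±0.087 starting range, so training must have moved them well away from their initial values.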

Now let's check out the bias vector.

In [95]:

# Shape = 1 bias per neuron (we use 4 neurons in the first layer)
biases, biases.shape

Out[95]:

(array([1.2710664 , 1.8170385 , 0.24927875, 1.4434327 ], dtype=float32), (4,))

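Every neuron has one bias value, so the layer's bias vector has one entry per neuron (4 here). Each bias is paired with that neuron's column of the weights matrix.

The bias values get initialized as zeros by default (using the bias_initializer parameter).

A bias shifts its neuron's weighted sum up or down before the activation function is applied, which changes how readily the pattern captured by that neuron's weights gets passed on to the next layer.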
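Putting weights and biases together: a Dense layer computes `activation(inputs @ weights + biases)`. For the first layer here, that's a 784-value input times a (784, 4) weights matrix plus a (4,) bias vector, giving 4 outputs. The plain-Python sketch below shows the same arithmetic at a readable size; the helper name `dense_forward` and the tiny 3-input, 2-neuron numbers are made up for illustration.

```python
def relu(z: float) -> float:
    return max(0.0, z)


def dense_forward(inputs, weights, biases, activation=relu):
    """One Dense layer: activation(sum_i inputs[i] * weights[i][j] + biases[j]) per neuron j.

    weights has shape (n_inputs, n_neurons), the same layout as the (784, 4) matrix above.
    """
    outputs = []
    for j in range(len(biases)):
        z = sum(x * row[j] for x, row in zip(inputs, weights)) + biases[j]
        outputs.append(activation(z))
    return outputs


# Made-up example: 3 input values, 2 neurons
inputs = [1.0, 0.5, -1.0]
weights = [[0.2, -0.4],   # weights from input 0 to neurons 0 and 1
           [0.6,  0.1],   # weights from input 1
           [-0.3, 0.8]]   # weights from input 2
biases = [0.1, 0.5]

print(dense_forward(inputs, weights, biases))
```

Here the second neuron's weighted sum is -1.15, and even after adding its bias (-1.15 + 0.5 = -0.65) it stays negative, so ReLU outputs 0. A larger bias would have shifted it above zero and let that neuron pass a signal on to the next layer.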

In [96]:

# Count the parameters in our model (the weights and biases of each layer)
model_14.summary()

Model: "sequential_14"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 flatten_3 (Flatten)         (None, 784)               0         
                                                                 
 dense_37 (Dense)            (None, 4)                 3140      
                                                                 
 dense_38 (Dense)            (None, 4)                 20        
                                                                 
 dense_39 (Dense)            (None, 10)                50        
                                                                 
=================================================================
Total params: 3210 (12.54 KB)
Trainable params: 3210 (12.54 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
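The Param # column can be checked by hand. A Dense layer has one weight per input per neuron plus one bias per neuron, so its parameter count is n_inputs * n_units + n_units, while Flatten only reshapes its input and has no parameters. Here's a quick sketch (the helper name dense_params is hypothetical) using the layer sizes from the summary above:

```python
def dense_params(n_inputs, n_units):
    """Trainable parameters in a Dense layer: a weights matrix plus one bias per neuron."""
    return n_inputs * n_units + n_units

# model_14: Flatten gives 784 inputs, then Dense(4) -> Dense(4) -> Dense(10)
layer_sizes = [784, 4, 4, 10]
per_layer = [dense_params(n_in, n_out)
             for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

print(per_layer)                            # [3140, 20, 50]
print(sum(per_layer))                       # 3210
# Each parameter is a 4-byte float32, which gives the memory figure in the summary
print(round(sum(per_layer) * 4 / 1024, 2))  # 12.54
```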

Now that we've built a few deep learning models, it's a good time to point out that the concept of inputs and outputs applies not only to a model as a whole but also to every layer within it.

You might've already guessed this, but starting from the input layer, each subsequent layer's input is the output of the previous layer.

We can see this clearly using the plot_model() utility.

In [97]:

from tensorflow.keras.utils import plot_model

# See the inputs and outputs of each layer
plot_model(model_14, show_shapes=True)

Out[97]:
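The same layer-to-layer chaining can also be traced with array shapes alone. This NumPy sketch pushes a hypothetical batch of 32 random 28x28 "images" through layers sized like model_14 (random weights, zero biases) and records each output shape. Activations are omitted because they don't change shapes, and the batch size of 32 stands in for the None shown in the summary.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical batch of 32 images of size 28x28 (random values, illustration only)
x = rng.random((32, 28, 28), dtype=np.float32)

# Flatten: (32, 28, 28) -> (32, 784)
out = x.reshape(x.shape[0], -1)
shapes = [out.shape]

# Dense layers sized like model_14: each layer's input is the previous layer's output
for n_units in [4, 4, 10]:
    w = rng.uniform(-0.1, 0.1, size=(out.shape[1], n_units)).astype(np.float32)
    b = np.zeros(n_units, dtype=np.float32)
    out = out @ w + b  # activation omitted: it doesn't change the shape
    shapes.append(out.shape)

print(shapes)  # [(32, 784), (32, 4), (32, 4), (32, 10)]
```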

How a model learns (in brief)

Alright, we've trained a bunch of models, but we've never really discussed what's going on under the hood. So how exactly does a model learn?

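A model learns by updating and improving its weight matrices and bias values during training (in our case, when we call the fit() function). These updates don't happen just once per epoch: by default, fit() splits the data into batches of 32 samples and updates the weights after each batch.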

It does so by comparing the predictions it makes with its current patterns against the actual labels.

A loss function measures how far off those predictions are (higher loss means worse predictions), and the optimizer steers the model to update its patterns (weight matrices and bias values) in the direction that decreases the loss.

Working backwards from the loss (which uses the real labels as a reference) to calculate how much each weight and bias contributed to it is called backpropagation. These contributions, called gradients, tell the optimizer which way to adjust each value.

In other words, data passes through a model (forward pass), its predictions are compared with the labels, and it attempts to learn the relationship between the data and labels.

And if this learned relationship isn't close to the actual relationship or it could be improved, the model does so by going back through itself (backward pass) and tweaking its weights matrices and bias values to better represent the data.
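The forward pass, backward pass and update can be written out by hand for the simplest possible case. The sketch below is a hypothetical NumPy example, not how Keras works internally: a single sigmoid neuron (logistic regression) trained with full-batch gradient descent on binary crossentropy, using made-up data made of two Gaussian blobs. For sigmoid plus binary crossentropy, the gradient of the mean loss with respect to each neuron's pre-activation works out to (y_prob - y) / n, which is what the backward pass below uses.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical toy binary classification data: two Gaussian blobs
X = np.vstack([rng.normal(-1.0, 0.5, size=(50, 2)),
               rng.normal(1.0, 0.5, size=(50, 2))])
y = np.concatenate([np.zeros(50), np.ones(50)])

# Small random weights and a zero bias (Keras also starts biases at zero by default)
w = rng.normal(0.0, 0.01, size=2)
b = 0.0
learning_rate = 0.1
losses = []

for epoch in range(100):
    # Forward pass: weighted sum plus bias, then sigmoid to get probabilities
    z = X @ w + b
    y_prob = 1.0 / (1.0 + np.exp(-z))

    # Binary crossentropy loss (clipped to avoid log(0))
    p = np.clip(y_prob, 1e-7, 1 - 1e-7)
    losses.append(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

    # Backward pass: gradients of the loss with respect to w and b
    dz = (y_prob - y) / len(y)
    grad_w = X.T @ dz
    grad_b = dz.sum()

    # Update: step each parameter against its gradient
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

# Accuracy of the final forward pass
accuracy = np.mean((y_prob > 0.5) == y)
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}, accuracy: {accuracy:.2f}")
```

In TensorFlow, the gradients are computed automatically (with tf.GradientTape) and optimizers such as Adam decide how big each step should be, but the loop follows the same pattern: forward pass, loss, gradients, update.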

If all of this sounds confusing (and it's fine if it does, the above is a very succinct description), check out the resources in the extra-curriculum section for more.

Exercises 🛠

Play with neural networks in the TensorFlow Playground for 10 minutes. In particular, try different values of the learning rate: what happens when you decrease it? What happens when you increase it?
Replicate the model pictured in the TensorFlow Playground diagram below using TensorFlow code. Compile it using the Adam optimizer, binary crossentropy loss and accuracy metric. Once it's compiled, check a summary of the model. Try this network out for yourself on the TensorFlow Playground website. Hint: there are 5 hidden layers, but the output layer isn't pictured, so you'll have to decide what the output layer should be based on the input data.
Create a classification dataset using Scikit-Learn's make_moons() function, visualize it and then build a model that fits it with over 85% accuracy.
Create a function (or write code) to visualize multiple image predictions from the fashion MNIST dataset at the same time. Plot at least three different images and their prediction labels at the same time. Hint: see the classification tutorial in the TensorFlow documentation for ideas.
Recreate TensorFlow's softmax activation function in your own code. Make sure it can accept a tensor and return that tensor after having the softmax function applied to it.
Train a model to get 88%+ accuracy on the fashion MNIST test set. Plot a confusion matrix to see the results after.
Make a function to show an image of a certain class of the fashion MNIST dataset and make a prediction on it. For example, plot 3 images of the T-shirt class with their predictions.

Extra curriculum 📖

Watch 3Blue1Brown's neural networks video 2: Gradient descent, how neural networks learn. After you're done, write 100 words about what you've learned.
- If you haven't already, watch video 1: But what is a Neural Network?. Note the activation function they talk about at the end.
Watch MIT's introduction to deep learning lecture 1 (if you haven't already) to get an idea of the concepts behind using linear and non-linear functions.
Spend 1 hour reading Michael Nielsen's Neural Networks and Deep Learning book.
Read the ML-Glossary documentation on activation functions. Which one is your favourite?
- After you've read the ML-Glossary, see which activation functions are available in TensorFlow by searching "tensorflow activation functions".