View source code | Read notebook in online book format
Introduction to TensorFlow, Deep Learning and Transfer Learning (work in progress)¶
- Project: Dog Vision 🐶👁 - Using computer vision to classify dog photos into different breeds.
- Goals: Learn TensorFlow, deep learning and transfer learning, beat the original research paper results (22% accuracy).
- Domain: Computer vision.
- Data: Images of dogs from Stanford Dogs Dataset (120 dog breeds, 20,000+ images).
- Problem type: Multi-class classification (120 different classes).
- Runtime: This project is designed to run end-to-end in Google Colab (for free GPU access and easy setup). If you'd like to run it locally, it will require environment setup.
- Demo: See a demo of the trained model running on Hugging Face Spaces.
Welcome, welcome!
The focus of this notebook is to give a quick overview of deep learning with TensorFlow/Keras.
How?
We're going to go through the machine learning workflow steps and build a computer vision project to classify photos of dogs into their respective dog breed (a Predictive AI task, see below for more).
What we're going to build: Dog Vision 🐶👁️, a neural network capable of identifying different dog breeds in images. All the way from dataset preparation to model building, training and evaluation.
# Quick timestamp
import datetime
print(f"Last updated: {datetime.datetime.now()}")
Last updated: 2024-04-26 01:26:48.838163
What we're going to cover¶
In this project, we're going to be introduced to the power of deep learning and more specifically, transfer learning using TensorFlow/Keras.
We'll go through each of these in the context of the 6 step machine learning framework:
- Problem definition - Use computer vision to classify photos of dogs into different dog breeds.
- Data - 20,000+ images of dogs from 120 different dog breeds from the Stanford Dogs dataset.
- Evaluation - We'd like to beat the original paper's results (22% mean accuracy across all classes, tip: A good way to practice your skills is to find some results online and try to beat them).
- Features - Because we're using deep learning, our model will learn the features on its own.
- Modelling - We're going to use a pretrained convolutional neural network (CNN) and transfer learning.
- Experiments - We'll try different amounts of data with the same model to see the effects on our results.
Note: It's okay not to know these exact steps ahead of time. When starting a new project, it's often the case you'll figure it out as you go. These steps are only filled out because I've had practice working on several machine learning projects. You'll pick up these ideas over time.
Table of contents¶
- Getting Setup
- Getting Data (dog images and their breeds)
- Exploring the data (exploratory data analysis)
- Creating training and test splits
- Turning our datasets into TensorFlow Dataset(s)
- Creating a neural network with TensorFlow
- Model 0 - Train a model on 10% of the training data
- Putting it all together: create, compile, fit
- Model 1 - Train a model on 100% of the training data
- Make and evaluate predictions of the best model
- Save and load the best model
- Make predictions on custom images with the best model (bringing Dog Vision 🐶👁️ to life!)
- Key takeaways
- Extensions & exercises
Where can you get help?¶
All of the materials for this course are available on GitHub.
If you run into trouble, you can ask a question on the course GitHub Discussions page.
Quick definitions¶
Let's start by breaking down some of the most important topics we're going to go through.
What is TensorFlow/Keras?¶
TensorFlow is an open source machine learning and deep learning framework originally developed by Google. Inside TensorFlow, you can also use Keras which is another very helpful machine learning framework known for its ease of use.
Why use TensorFlow?¶
TensorFlow allows you to manipulate data and write deep learning algorithms using Python code.
It also has several built-in capabilities to leverage accelerated computing hardware (e.g. GPUs, Graphics Processing Units and TPUs, Tensor Processing Units).
Many of the world's largest companies power their machine learning workloads with TensorFlow.
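For example, here's a tiny, self-contained taste of TensorFlow code (a minimal sketch for illustration only, not part of the Dog Vision workflow yet): we create two tensors and matrix multiply them.
# A minimal example of TensorFlow in action: create two tensors and matrix multiply them
import tensorflow as tf

A = tf.constant([[1., 2.],
                 [3., 4.]])
B = tf.constant([[5., 6.],
                 [7., 8.]])

# Matrix multiplication is the core operation behind most neural networks
print(tf.matmul(A, B))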
What is deep learning?¶
Deep learning is a form of machine learning where data passes through a series of progressive layers which all contribute to learning an overall representation of that data.
Each layer performs a pre-defined operation.
The series of progressive layers combine to form what's referred to as a neural network.
For example, a photo may be turned into numbers (e.g. red, green and blue pixel values) and those numbers are then manipulated mathematically through each progressive layer to learn patterns in the photo.
The "deep" in deep learning comes from the number of layers used in the neural network.
So when someone says deep learning (or artificial neural networks), they're typically referring to the same thing.
Note: Artificial intelligence (AI), machine learning (ML) and deep learning are all broad terms. You can think of AI as the overall technology, machine learning as a type of AI, and deep learning as a type of machine learning. So if someone refers to AI, you can often assume they are talking about machine learning or deep learning.
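To make the idea of stacked layers concrete, here's a hedged sketch of a small Keras neural network (the layer sizes are arbitrary and chosen purely for illustration, this is not the model we'll build later):
import tensorflow as tf

# Each layer transforms the output of the layer before it
example_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(224, 224, 3)),       # an image as (height, width, colour channels)
    tf.keras.layers.Flatten(),                         # turn the image into one long vector of numbers
    tf.keras.layers.Dense(128, activation="relu"),     # layer 1 learns patterns in those numbers
    tf.keras.layers.Dense(64, activation="relu"),      # layer 2 learns patterns in layer 1's outputs
    tf.keras.layers.Dense(120, activation="softmax")   # output layer: one value per possible class
])

example_model.summary()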
What can deep learning be used for?¶
Deep learning is such a powerful technique that new use cases are being discovered everyday.
Most of the modern forms of artificial intelligence (AI) applications you see are powered by deep learning.
Two of the most useful types of AI are predictive and generative.
Predictive AI learns the relationship between data and labels, such as photos of dogs and their breeds (supervised learning), so that when it sees a new photo of a dog, it can predict its breed based on what it's learned.
Generative AI generates something new given an input such as creating new text given input text.
Some examples of Predictive AI problems include:
- Tesla's self-driving cars use deep learning-based object detection models to power their computer vision systems.
- Apple's Photos app uses deep learning to recognize faces in images and create Photo Memories.
- Siri and Google Assistant use deep learning to transcribe speech and understand voice commands.
- Nutrify (an app my brother and I built) uses predictive AI to recognize food in images.
- Magika uses deep learning to classify a file into what type it is (e.g. .jpeg, .py, .txt).
- Text classification models such as DeBERTa use deep learning to classify text into different categories such as "positive" and "negative" or "spam" and "not spam".
Some examples of Generative AI problems include:
- Stable Diffusion uses generative AI to generate images given a text prompt.
- ChatGPT and other large language models (LLMs) such as Llama, Claude, Gemini and Mistral use deep learning to process text and return a response.
- GitHub Copilot uses generative AI to generate code snippets given surrounding context.
All of these AI use cases are powered by deep learning.
And more often than not, whenever you get started on a deep learning problem, you'll start with transfer learning.
Examples of different everyday problems where AI/machine learning gets used.
What is transfer learning?¶
Transfer learning is one of the most powerful and useful techniques in modern AI and machine learning.
It involves taking what one model (or neural network) has learned in a similar domain and applying it to your own.
In our case, we're going to use transfer learning to take the patterns a neural network has learned from the 1 million+ images and over 1000 classes in ImageNet (a gold standard computer vision benchmark) and apply them to our own problem of recognizing dog breeds.
However, this concept can be applied to many different domains.
You could take a large language model (LLM) that has been pretrained on most of the text on the internet (and so has learned the patterns in natural language very well) and customize it for your own specific chat use case.
The biggest benefit of transfer learning is that it often allows you to get outstanding results with less data and time.
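To give you a rough idea of what this looks like in code, here's a hedged preview sketch using a pretrained model from tf.keras.applications (the exact model and setup we use later in this notebook may differ):
import tensorflow as tf

# Load a model pretrained on ImageNet, without its original output layer
base_model = tf.keras.applications.EfficientNetB0(include_top=False,
                                                  weights="imagenet",
                                                  pooling="avg")
base_model.trainable = False  # freeze the pretrained patterns so they aren't changed during training

# Add our own output layer for 120 dog breeds
inputs = tf.keras.Input(shape=(224, 224, 3))
x = base_model(inputs)
outputs = tf.keras.layers.Dense(120, activation="softmax")(x)
transfer_model = tf.keras.Model(inputs, outputs)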
A transfer learning workflow. Many publicly available models have been pretrained on large datasets such as ImageNet (1 million+ images). These models can then be applied to similar tasks downstream. For example, we can take a model pretrained on ImageNet and apply it to our Dog Vision 🐶👁️ problem. This same process can be repeated for many different styles of data and problem.
1. Getting setup¶
This notebook is designed to run in Google Colab, an online Jupyter Notebook that provides free access to GPUs (Graphics Processing Units, we'll hear more on these later).
For a quick rundown on how to use Google Colab, see their introductory guide (it's quite similar to a Jupyter Notebook with a few different options).
Google Colab also comes with many data science and machine learning libraries pre-installed, including TensorFlow/Keras.
Getting a GPU on Google Colab¶
Before running any code, we'll make sure our Google Colab instance is connected to a GPU.
You can do this via going to Runtime -> Change runtime type -> GPU (this may restart your existing runtime).
Why use a GPU?
Since neural networks perform a large amount of calculations behind the scenes (the main one being matrix multiplication), you need a computer chip that can perform these calculations quickly, otherwise you'll be waiting all day for a model to train.
And in short, GPUs are much faster at performing matrix multiplications than CPUs.
Why this is the case is beyond the scope of this project (you can search "why are GPUs faster than CPUs for machine learning?" for more).
The main thing to remember is: generally, in deep learning, GPUs = faster than CPUs.
Note: A good experiment would be to run the neural networks we're going to build later on with and without a GPU and see the difference in their training times.
Ok, enough talking, let's start by importing TensorFlow!
We'll do so using the common abbreviation tf.
import tensorflow as tf
tf.__version__
'2.15.0'
Nice!
Note: If you want to run TensorFlow locally, you can follow the TensorFlow installation guide.
Now let's check to see if TensorFlow has access to a GPU (this isn't 100% required to complete this project but will speed things up dramatically).
We can do so with the method tf.config.list_physical_devices().
# Do we have access to a GPU?
device_list = tf.config.list_physical_devices()
if "GPU" in [device.device_type for device in device_list]:
print(f"[INFO] TensorFlow has GPU available to use. Woohoo!! Computing will be sped up!")
print(f"[INFO] Accessible devices:\n{device_list}")
else:
print(f"[INFO] TensorFlow does not have GPU available to use. Models may take a while to train.")
print(f"[INFO] Accessible devices:\n{device_list}")
[INFO] TensorFlow has GPU available to use. Woohoo!! Computing will be sped up! [INFO] Accessible devices: [PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
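Following on from the note above, here's a small optional experiment you could run to see the CPU vs GPU speed difference for yourself (a hedged sketch, it assumes your runtime has a GPU attached):
import time
import tensorflow as tf

# Create two large random matrices
X = tf.random.normal(shape=(4000, 4000))
Y = tf.random.normal(shape=(4000, 4000))

# Time a matrix multiplication on the CPU
with tf.device("/CPU:0"):
    start = time.perf_counter()
    _ = tf.matmul(X, Y).numpy()  # .numpy() waits for the result, so the timing is fair
    print(f"CPU time: {time.perf_counter() - start:.4f} seconds")

# Time the same operation on the GPU
with tf.device("/GPU:0"):
    start = time.perf_counter()
    _ = tf.matmul(X, Y).numpy()
    print(f"GPU time: {time.perf_counter() - start:.4f} seconds")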
2. Getting Data¶
All machine learning (and deep learning) projects start with data.
If you have no data, you have no project.
If you have no project, you have no cool models to show your friends or improve your business.
Not to worry!
There are several options and locations to get data for a deep learning project.
Resource | Description |
---|---|
Kaggle Datasets | A collection of datasets across a wide range of topics. |
TensorFlow Datasets | A collection of ready-to-use machine learning datasets ready for use under the tf.data.Datasets API. You can see a list of all available datasets in the TensorFlow documentation. |
Hugging Face Datasets | A continually growing resource of datasets broken into several different kinds of topics. |
Google Dataset Search | A search engine by Google specifically focused on searching online datasets. |
Original sources | Datasets which are made available by researchers or companies with the release of a product or research paper (sources for these will vary, they could be a link on a website or a link to an application form). |
Custom datasets | These are datasets comprised of your own custom source of data. You may build these from scratch on your own or have access to them from an existing product or service. For example, your entire photos library could be your own custom dataset or your entire notes and documents folder or your company's customer order history. |
In our case, the dataset we're going to use is called the Stanford Dogs dataset (or ImageNet dogs, as the images are dogs separated from ImageNet).
Because the Stanford Dogs dataset has been around for a while (since 2011, which as of writing this in 2024 is like a lifetime in deep learning), it's available from several resources:
- The original project website via link download.
- Inside TensorFlow Datasets under stanford_dogs.
- On Kaggle as a downloadable dataset.
The point here is that when you're starting out with practicing deep learning projects, there's no shortage of datasets available.
However, when you start wanting to work on your own projects or within a company environment, you'll likely start to work on custom datasets (datasets you build yourself or aren't available publicly online).
The main difference between existing datasets and custom datasets is that existing datasets often come preformatted and ready to use.
Whereas custom datasets often require some preprocessing before they're ready to use within a machine learning project.
To practice formatting a dataset for a machine learning problem, we're going to download the Stanford Dogs dataset from the original website.
Before we do so, the following code is an example of how we'd get the Stanford Dogs dataset from TensorFlow Datasets.
# Download the dataset into train and test split using TensorFlow Datasets
# import tensorflow_datasets as tfds
# ds_train, ds_test = tfds.load('stanford_dogs', split=['train', 'test'])
Download data directly from Stanford Dogs website¶
Our overall project goal is to build a computer vision model which performs better than the original Stanford Dogs paper (average of 22% accuracy per class across 120 classes).
To do so, we need some data.
Let's download the original Stanford Dogs dataset from the project website.
The data comes in three main files:
- Images (757MB) - images.tar
- Annotations (21MB) - annotation.tar
- Lists with train/test splits (0.5MB) - lists.tar
Our goal is to get a file structure like this: a local dog_vision_data/ directory containing images.tar, annotation.tar and lists.tar.
Note: If you're using Google Colab for this project, remember that any data uploaded to the Google Colab session gets deleted if the session disconnects. So to save us redownloading the data every time, we're going to download it once and save it to Google Drive.
Resource: For a good guide on getting data in and out of Google Colab, see the Google Colab io.ipynb tutorial.
To make sure we don't have to keep redownloading the data every time we leave and come back to Google Colab, we're going to:
- Download the data if it doesn't already exist on Google Drive.
- Copy it to Google Drive (because Google Colab connects nicely with Google Drive) if it isn't already there.
- If the data already exists on Google Drive (we've been through steps 1 & 2), we'll import it instead.
There are two main options to connect Google Colab instances to Google Drive:
- Click "Mount Drive" in the "Files" menu on the left.
- Mount programmatically with from google.colab import drive -> drive.mount('/content/drive').
More specifically, we're going to follow the following steps:
- Mount Google Drive.
- Setup constants such as our base directory to save files to, the target files we'd like to download and target URL we'd like to download from.
- Setup our target local path to save to.
- Check if the target files all exist in Google Drive and if they do, copy them locally.
- If the target files don't exist in Google Drive, download them from the target URL with the !wget command.
- Create a directory on Google Drive to store the downloaded files.
- Copy the downloaded files to Google Drive for use later if needed.
A fair few steps, but nothing we can't handle!
Plus, this is all good practice for dealing with and manipulating data, a very important skill in the machine learning engineer's toolbox.
Note: The following data download section is designed to run in Google Colab. If you are running locally, feel free to modify the code to save to a local directory instead of Google Drive.
from pathlib import Path
from google.colab import drive
# 1. Mount Google Drive (this will bring up a pop-up to sign-in/authenticate)
# Note: This step is specifically for Google Colab, if you're working locally, you may need a different setup
drive.mount("/content/drive")
# 2. Setup constants
# Note: For constants like this, you'll often see them created as variables with all capitals
TARGET_DRIVE_PATH = Path("drive/MyDrive/tensorflow/dog_vision_data")
TARGET_FILES = ["images.tar", "annotation.tar", "lists.tar"]
TARGET_URL = "http://vision.stanford.edu/aditya86/ImageNetDogs"
# 3. Setup local path
local_dir = Path("dog_vision_data")
# 4. Check if the target files exist in Google Drive, if so, copy them to Google Colab
if all((TARGET_DRIVE_PATH / file).is_file() for file in TARGET_FILES):
print(f"[INFO] Copying Dog Vision files from Google Drive to local directory...")
print(f"[INFO] Source dir: {TARGET_DRIVE_PATH} -> Target dir: {local_dir}")
!cp -r {TARGET_DRIVE_PATH} .
print("[INFO] Good to go!")
else:
# 5. If the files don't exist in Google Drive, download them
print(f"[INFO] Target files not found in Google Drive.")
print(f"[INFO] Downloading the target files... this shouldn't take too long...")
for file in TARGET_FILES:
# wget is short for "world wide web get", as in "get a file from the web"
# -nc or --no-clobber = don't download files that already exist locally
# -P = save the target file to a specified prefix, in our case, local_dir
!wget -nc {TARGET_URL}/{file} -P {local_dir} # the "!" means to execute the command on the command line rather than in Python
print(f"[INFO] Saving the target files to Google Drive, so they can be loaded later...")
# 6. Ensure target directory in Google Drive exists
TARGET_DRIVE_PATH.mkdir(parents=True, exist_ok=True)
# 7. Copy downloaded files to Google Drive (so we can use them later and not have to re-download them)
!cp -r {local_dir}/* {TARGET_DRIVE_PATH}/
Mounted at /content/drive [INFO] Copying Dog Vision files from Google Drive to local directory... [INFO] Source dir: drive/MyDrive/tensorflow/dog_vision_data -> Target dir: dog_vision_data [INFO] Good to go!
Data downloaded!
Nice work! This may seem like a bit of work but it's an important step with any deep learning project. Getting data to work with.
Now if we get the contents of local_dir (dog_vision_data), what do we get?
We can first make sure it exists with Path.exists() and then we can iterate through its contents with Path.iterdir() and print out the .name attribute of each file.
if local_dir.exists():
print(str(local_dir) + "/")
for item in local_dir.iterdir():
print(" ", item.name)
dog_vision_data/ lists.tar images.tar annotation.tar
Excellent! That's exactly the format we wanted.
Now you might've noticed that each file ends in .tar.
What's this?
Searching "what is .tar?", I found:
In computing, tar is a computer software utility for collecting many files into one archive file, often referred to as a tarball, for distribution or backup purposes.
Source: Wikipedia tar page.
Exploring a bit more, I found that the .tar format is similar to .zip, however, .zip offers compression, whereas .tar mostly combines many files into one.
So how do we "untar" the files in images.tar, annotation.tar and lists.tar?
We can use the !tar command (or just tar from outside of a Jupyter cell)!
Doing this will expand all of the files within each of the .tar archives.
We'll also use a couple of flags to help us out:
- The -x flag tells tar to extract files from an archive.
- The -f flag specifies that the following argument is the name of the archive file.
- You can combine flags by putting them together, e.g. -xf.
Let's try it out!
# Untar images, annotations and lists, notes/tags:
# -x = extract files from the archive
# -f = tell tar which file to deal with
# (other handy flags: -v = verbose, -z = decompress gzip-compressed files)
!tar -xf dog_vision_data/images.tar
!tar -xf dog_vision_data/annotation.tar
!tar -xf dog_vision_data/lists.tar
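As an aside, if you'd prefer to stay in pure Python rather than use shell commands, the standard library's tarfile module achieves the same result (an equivalent sketch of the !tar commands above):
import tarfile

# Extract each archive into the current working directory (same outcome as the !tar commands above)
for archive_name in ["images.tar", "annotation.tar", "lists.tar"]:
    with tarfile.open(f"dog_vision_data/{archive_name}") as archive:
        archive.extractall(path=".")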
What new files did we get?
We can check in Google Colab by inspecting the "Files" tab on the left.
Or with Python by using os.listdir(".") where "." means "the current directory".
import os
os.listdir(".") # "." stands for "here" or "current directory"
['.config', 'dog_vision_data', 'file_list.mat', 'drive', 'train_list.mat', 'Images', 'Annotation', 'test_list.mat', 'sample_data']
Ooooh!
Looks like we've got some new files!
Specifically:
- train_list.mat - a list of all the training set images.
- test_list.mat - a list of all the testing set images.
- Images/ - a folder containing all of the images of dogs.
- Annotation/ - a folder containing all of the annotations for each image.
- file_list.mat - a list of all the files (training and test lists combined).
Our next step is to go through them and see what we've got.
3. Exploring the data¶
Once you've got a dataset, before building a model, it's wise to explore it for a bit to see what kind of data you're working with.
Exploring a dataset can mean many things.
But a few rules of thumb when exploring new data:
- View at least 100+ random samples for a "vibe check". For example, if you have a large dataset of images, randomly sample 10 images at a time and view them. Or if you have a large dataset of texts, what do some of them say? The same with audio. It will often be impossible to view all samples in your dataset, but you can start to get a good idea of what's inside by randomly inspecting samples.
- Visualize, visualize, visualize! This is the data explorer's motto. Use it often. As in, it's good to get statistics about your dataset but it's often even better to view 100s of samples with your own eyes (see the point above).
- Check the distributions and other various statistics. How many samples are there? If you're dealing with classification, how many classes are there and how many samples per class? Which classes don't you understand? If you don't have labels, investigate clustering methods to put similar samples close together.
As Abraham Lossfunction says...
A play on words of Abraham Lincoln's famous quote on sharpening an axe before cutting down a tree in theme of machine learning. Source: Daniel Bourke X/Twitter.
Our target data format¶
Since our goal is to build a computer vision model to classify dog breeds, we need a way to tell our model what breed of dog is in what image.
A common data format for a classification problem is to have samples stored in folders named after their class name.
For example:
In the case of dog images, we'd put all of the images labelled "chihuahua" in a folder called chihuahua/ (and so on for all the other classes and images).
We could split these folders so that training images go in train/chihuahua/ and testing images go in test/chihuahua/.
This is what we'll be working towards creating.
Note: This structure of folder format doesn't just work for images, it can work for text, audio and other kinds of classification data too.
Exploring the file lists¶
How about we check out the train_list.mat, test_list.mat and file_list.mat files?
Searching online, for "what is a .mat file?", I found that it's a MATLAB file. Before Python became the default language for machine learning and deep learning, many models and datasets were built in MATLAB.
Then I searched, "how to open a .mat file with Python?" and found an answer on Stack Overflow saying I could use the scipy library (a scientific computing library).
The good news is, Google Colab comes with scipy preinstalled.
We can use the scipy.io.loadmat() method to open a .mat file.
import scipy
# Open lists of train and test .mat
train_list = scipy.io.loadmat("train_list.mat")
test_list = scipy.io.loadmat("test_list.mat")
file_list = scipy.io.loadmat("file_list.mat")
# Let's inspect the output and type of the train_list
train_list, type(train_list)
({'__header__': b'MATLAB 5.0 MAT-file, Platform: GLNXA64, Created on: Sun Oct 9 08:36:13 2011', '__version__': '1.0', '__globals__': [], 'file_list': array([[array(['n02085620-Chihuahua/n02085620_5927.jpg'], dtype='<U38')], [array(['n02085620-Chihuahua/n02085620_4441.jpg'], dtype='<U38')], [array(['n02085620-Chihuahua/n02085620_1502.jpg'], dtype='<U38')], ..., [array(['n02116738-African_hunting_dog/n02116738_6754.jpg'], dtype='<U48')], [array(['n02116738-African_hunting_dog/n02116738_9333.jpg'], dtype='<U48')], [array(['n02116738-African_hunting_dog/n02116738_2503.jpg'], dtype='<U48')]], dtype=object), 'annotation_list': array([[array(['n02085620-Chihuahua/n02085620_5927'], dtype='<U34')], [array(['n02085620-Chihuahua/n02085620_4441'], dtype='<U34')], [array(['n02085620-Chihuahua/n02085620_1502'], dtype='<U34')], ..., [array(['n02116738-African_hunting_dog/n02116738_6754'], dtype='<U44')], [array(['n02116738-African_hunting_dog/n02116738_9333'], dtype='<U44')], [array(['n02116738-African_hunting_dog/n02116738_2503'], dtype='<U44')]], dtype=object), 'labels': array([[ 1], [ 1], [ 1], ..., [120], [120], [120]], dtype=uint8)}, dict)
Okay, looks like we get a dictionary with several fields we may be interested in.
Let's check out the keys of the dictionary.
train_list.keys()
dict_keys(['__header__', '__version__', '__globals__', 'file_list', 'annotation_list', 'labels'])
My guess is that the file_list key is what we're after, as this looks like a large array of image names (the files all end in .jpg).
How about we see how many files are in each file_list key?
# Check the length of the file_list key
print(f"Number of files in training list: {len(train_list['file_list'])}")
print(f"Number of files in testing list: {len(test_list['file_list'])}")
print(f"Number of files in full list: {len(file_list['file_list'])}")
Number of files in training list: 12000 Number of files in testing list: 8580 Number of files in full list: 20580
Beautiful! Looks like these lists contain our training and test splits and the full list has a list of all the files in the dataset.
Let's inspect the train_list['file_list'] further.
train_list['file_list']
array([[array(['n02085620-Chihuahua/n02085620_5927.jpg'], dtype='<U38')], [array(['n02085620-Chihuahua/n02085620_4441.jpg'], dtype='<U38')], [array(['n02085620-Chihuahua/n02085620_1502.jpg'], dtype='<U38')], ..., [array(['n02116738-African_hunting_dog/n02116738_6754.jpg'], dtype='<U48')], [array(['n02116738-African_hunting_dog/n02116738_9333.jpg'], dtype='<U48')], [array(['n02116738-African_hunting_dog/n02116738_2503.jpg'], dtype='<U48')]], dtype=object)
Looks like we've got an array of arrays.
How about we turn them into a Python list for easier handling?
We can do so by extracting each individual item via indexing and list comprehension.
Let's see what it's like to get a single file name.
# Get a single filename
train_list['file_list'][0][0][0]
'n02085620-Chihuahua/n02085620_5927.jpg'
Now let's get a Python list of all the individual file names (e.g. n02097130-giant_schnauzer/n02097130_2866.jpg) so we can use them later.
# Get a Python list of all file names for each list
train_file_list = list([item[0][0] for item in train_list["file_list"]])
test_file_list = list([item[0][0] for item in test_list["file_list"]])
full_file_list = list([item[0][0] for item in file_list["file_list"]])
len(train_file_list), len(test_file_list), len(full_file_list)
(12000, 8580, 20580)
Wonderful!
How about we view a random sample of the filenames we extracted?
Note: One of my favourite things to do whilst exploring data is to continually view random samples of it. Whether it be file names or images or text snippets. Why? You can always view the first X number of samples, however, I find that continually viewing random samples of the data gives you a better overview of the different kinds of data you're working with. It also gives you the small chance of stumbling upon a potential error.
We can view random samples of the data using Python's random.sample() method.
import random
random.sample(train_file_list, k=10)
['n02094258-Norwich_terrier/n02094258_439.jpg', 'n02113624-toy_poodle/n02113624_3624.jpg', 'n02102973-Irish_water_spaniel/n02102973_3635.jpg', 'n02102318-cocker_spaniel/n02102318_2048.jpg', 'n02098286-West_Highland_white_terrier/n02098286_1261.jpg', 'n02088238-basset/n02088238_10095.jpg', 'n02108915-French_bulldog/n02108915_9457.jpg', 'n02098286-West_Highland_white_terrier/n02098286_5979.jpg', 'n02109047-Great_Dane/n02109047_31274.jpg', 'n02095889-Sealyham_terrier/n02095889_760.jpg']
Now let's do a quick check to make sure none of the training image file names appear in the testing image file names list.
This is important because the number 1 rule in machine learning is: always keep the test set separate from the training set.
We can check that there are no overlaps by turning train_file_list into a Python set() and using the intersection() method.
# How many files in the training set intersect with the testing set?
len(set(train_file_list).intersection(test_file_list))
0
Excellent! Looks like there are no overlaps.
We could even put an assert check to raise an error if there are any overlaps (e.g. the length of the intersection is greater than 0).
assert works in the fashion: assert expression, message_if_expression_fails.
If the assert check doesn't output anything, we're good to go!
# Make an assertion statement to check there are no overlaps (try changing test_file_list to train_file_list to see how it works)
assert len(set(train_file_list).intersection(test_file_list)) == 0, "There are overlaps between the training and test set files, please check them."
Woohoo!
Looks like there are no overlaps, let's keep exploring the data.
Exploring the Annotation folder¶
How about we look at the Annotation folder next?
We can click the folder on the file explorer on the left to see what's inside.
But we can also explore the contents of the folder with Python.
Let's use os.listdir() to see what's inside.
os.listdir("Annotation")[:10]
['n02111129-Leonberg', 'n02102973-Irish_water_spaniel', 'n02110806-basenji', 'n02105251-briard', 'n02093991-Irish_terrier', 'n02099267-flat-coated_retriever', 'n02110627-affenpinscher', 'n02112137-chow', 'n02094114-Norfolk_terrier', 'n02095570-Lakeland_terrier']
Looks like there are folders, each named after a dog breed, with several numbered files inside.
Each of the files contains an XML-style annotation relating to an image.
For example, Annotation/n02085620-Chihuahua/n02085620_10074:
<annotation>
<folder>02085620</folder>
<filename>n02085620_10074</filename>
<source>
<database>ImageNet database</database>
</source>
<size>
<width>333</width>
<height>500</height>
<depth>3</depth>
</size>
<segment>0</segment>
<object>
<name>Chihuahua</name>
<pose>Unspecified</pose>
<truncated>0</truncated>
<difficult>0</difficult>
<bndbox>
<xmin>25</xmin>
<ymin>10</ymin>
<xmax>276</xmax>
<ymax>498</ymax>
</bndbox>
</object>
</annotation>
The fields include the name of the image, the size of the image, the label of the object and where it is (bounding box coordinates).
If we were performing object detection (finding the location of a thing in an image), we'd pay attention to the <bndbox> coordinates.
However, since we're focused on classification, our main consideration is the mapping of image name to class name.
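We won't need the bounding boxes for our classification problem, but as a side note, since each annotation file is plain XML, you could read one with Python's built-in xml.etree.ElementTree (a small sketch using the example file above):
import xml.etree.ElementTree as ET

# Parse the example annotation file (the annotation files are XML, just without a .xml extension)
tree = ET.parse("Annotation/n02085620-Chihuahua/n02085620_10074")
root = tree.getroot()

# Extract the class name and bounding box coordinates for each labelled object
for obj in root.findall("object"):
    name = obj.find("name").text
    bndbox = obj.find("bndbox")
    xmin, ymin = bndbox.find("xmin").text, bndbox.find("ymin").text
    xmax, ymax = bndbox.find("xmax").text, bndbox.find("ymax").text
    print(f"{name}: ({xmin}, {ymin}) to ({xmax}, {ymax})")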
Since we're dealing with 120 classes of dog breed, let's write a function to check the number of subfolders in the Annotation directory (there should be 120 subfolders, one for each breed of dog).
To do so, we can use Python's pathlib.Path class, along with Path.iterdir() to loop over the contents of Annotation and Path.is_dir() to check if the target item is a directory.
from pathlib import Path
def count_subfolders(directory_path: str) -> int:
"""
Count the number of subfolders in a given directory.
Args:
directory_path (str): The path to the directory in which to count subfolders.
Returns:
int: The number of subfolders in the specified directory.
Examples:
>>> count_subfolders('/path/to/directory')
3 # if there are 3 subfolders in the specified directory
"""
return len([name for name in Path(directory_path).iterdir() if name.is_dir()])
directory_path = "Annotation"
folder_count = count_subfolders(directory_path)
print(f"Number of subfolders in {directory_path} directory: {folder_count}")
Number of subfolders in Annotation directory: 120
Perfect!
There are 120 subfolders of annotations, one for each class of dog we'd like to identify.
But on further inspection of our file lists, it looks like the class name is already in the filepath.
# View a single training file pathname
train_file_list[0]
'n02085620-Chihuahua/n02085620_5927.jpg'
With this information, we know that image n02085620_5927.jpg should contain a Chihuahua.
Let's check.
I searched "how to display an image in Google Colab" and found another answer on Stack Overflow.
Turns out you can use IPython.display.Image(), as Google Colab comes with IPython (Interactive Python) built-in.
from IPython.display import Image
Image(Path("Images", train_file_list[0]))
Woah!
We get an image of a dog!
Exploring the Images folder¶
We've explored the Annotation folder, now let's check out our Images folder.
We know that the image file names come in the format class_name/image_name, for example, n02085620-Chihuahua/n02085620_5927.jpg.
To make things a little simpler, let's create the following:
- A mapping from folder name -> class name in dictionary form, for example, {'n02113712-miniature_poodle': 'miniature_poodle', 'n02092339-Weimaraner': 'weimaraner', 'n02093991-Irish_terrier': 'irish_terrier'...}. This will help us when visualizing our data from its original folder.
- A list of all unique dog class names with simple formatting, for example, ['affenpinscher', 'afghan_hound', 'african_hunting_dog', 'airedale', 'american_staffordshire_terrier'...].
Let's start by getting a list of all the folders in the Images directory with os.listdir().
# Get a list of all image folders
image_folders = os.listdir("Images")
image_folders[:10]
['n02111129-Leonberg', 'n02102973-Irish_water_spaniel', 'n02110806-basenji', 'n02105251-briard', 'n02093991-Irish_terrier', 'n02099267-flat-coated_retriever', 'n02110627-affenpinscher', 'n02112137-chow', 'n02094114-Norfolk_terrier', 'n02095570-Lakeland_terrier']
Excellent!
Now let's make a dictionary which maps from the folder name to a simplified version of the class name, for example:
{'n02085782-Japanese_spaniel': 'japanese_spaniel',
'n02106662-German_shepherd': 'german_shepherd',
'n02093256-Staffordshire_bullterrier': 'staffordshire_bullterrier',
...}
# Create folder name -> class name dict
folder_to_class_name_dict = {}
for folder_name in image_folders:
# Turn folder name into class_name
# E.g. "n02089078-black-and-tan_coonhound" -> "black_and_tan_coonhound"
# We'll split on the first "-" and join the rest of the string with "_" and then lower it
class_name = "_".join(folder_name.split("-")[1:]).lower()
folder_to_class_name_dict[folder_name] = class_name
# Make sure there are 120 entries in the dictionary
assert len(folder_to_class_name_dict) == 120
Folder name to class name mapping created, let's view the first 10.
list(folder_to_class_name_dict.items())[:10]
[('n02111129-Leonberg', 'leonberg'), ('n02102973-Irish_water_spaniel', 'irish_water_spaniel'), ('n02110806-basenji', 'basenji'), ('n02105251-briard', 'briard'), ('n02093991-Irish_terrier', 'irish_terrier'), ('n02099267-flat-coated_retriever', 'flat_coated_retriever'), ('n02110627-affenpinscher', 'affenpinscher'), ('n02112137-chow', 'chow'), ('n02094114-Norfolk_terrier', 'norfolk_terrier'), ('n02095570-Lakeland_terrier', 'lakeland_terrier')]
And we can get a list of unique dog names by getting the values() of the folder_to_class_name_dict and turning it into a list.
dog_names = sorted(list(folder_to_class_name_dict.values()))
dog_names[:10]
['affenpinscher', 'afghan_hound', 'african_hunting_dog', 'airedale', 'american_staffordshire_terrier', 'appenzeller', 'australian_terrier', 'basenji', 'basset', 'beagle']
Perfect!
Now we've got:
- folder_to_class_name_dict - a mapping from the folder name to the class name.
- dog_names - a list of all the unique dog breeds we're working with.
Visualize a group of random images¶
How about we follow the data explorer's motto of visualize, visualize, visualize and view some random images?
To help us visualize, let's create a function that takes in a list of image paths and then randomly selects 10 of those paths to display.
The function will:
- Take in a select list of image paths.
- Create a grid of matplotlib plots (e.g. 2x5 = 10 plots to plot on).
- Randomly sample 10 image paths from the input image path list (using random.sample()).
- Iterate through the flattened axes via axes.flat which is a reference to the attribute numpy.ndarray.flat.
- Extract the sample path from the list of samples.
- Get the sample title from the parent folder of the path using Path.parent.stem and then extract the formatted dog breed name by indexing folder_to_class_name_dict.
- Read the image with plt.imread() and show it on the target ax with ax.imshow().
- Set the title of the plot to the parent folder name with ax.set_title() and turn the axis marks off with ax.axis("off") (this makes for pretty plots).
- Show the plot with plt.show().
Woah!
A lot of steps! But nothing we can't handle, let's do it.
import random
from pathlib import Path
from typing import List
import matplotlib.pyplot as plt
# 1. Take in a select list of image paths
def plot_10_random_images_from_path_list(path_list: List[Path],
extract_title: bool=True) -> None:
# 2. Set up a grid of plots
fig, axes = plt.subplots(nrows=2, ncols=5, figsize=(20, 10))
# 3. Randomly sample 10 paths from the list
samples = random.sample(path_list, 10)
# 4. Iterate through the flattened axes and corresponding sample paths
for i, ax in enumerate(axes.flat):
# 5. Get the target sample path (e.g. "Images/n02087394-Rhodesian_ridgeback/n02087394_1161.jpg")
sample_path = samples[i]
# 6. Extract the parent directory name to use as the title (if necessary)
# (e.g. n02087394-Rhodesian_ridgeback/n02087394_1161.jpg -> n02087394-Rhodesian_ridgeback -> rhodesian_ridgeback)
if extract_title:
sample_title = folder_to_class_name_dict[sample_path.parent.stem]
else:
sample_title = sample_path.parent.stem
# 7. Read the image file and plot it on the corresponding axis
ax.imshow(plt.imread(sample_path))
# 8. Set the title of the axis and turn off the axis (for pretty plots)
ax.set_title(sample_title)
ax.axis("off")
# 9. Display the plot
plt.show()
plot_10_random_images_from_path_list(path_list=[Path("Images") / Path(file) for file in train_file_list])
Those are some nice looking dogs!
What I like to do here is rerun the random visualizations until I've seen 100+ samples so I've got an idea of the data we're working with.
Question: Here's something to think about, how would you code a system of rules to differentiate between all the different breeds of dogs? Perhaps you write an algorithm to look at the shapes or the colours? For example, if the dog had black fur, it's unlikely to be a golden retriever. You might be thinking "that would take quite a long time..." And you'd be right. Then how would we do it? With machine learning of course!
Exploring the distribution of our data¶
After visualization, another valuable way to explore the data is by checking the data distribution.
Distribution refers to the "spread" of data.
In our case, how many images of dogs do we have per breed?
A balanced distribution would mean having roughly the same number of images for each breed (e.g. 100 images per dog breed).
Note: There's a deeper level of distribution than just images per dog breed. Ideally, the images for each different breed are well distributed as well. For example, we wouldn't want to have 100 of the same image per dog breed. Not only would we like a similar number of images per breed, we'd like the images of each particular breed to be in different scenarios, different lighting, different angles. We want this because we want our model to be able to recognize the correct dog breed no matter what angle the photo is taken from.
To figure out how many images we have per class, let's write a function to count the number of images per subfolder in a given directory.
Specifically, we'll want the function to:
- Take in a target directory/folder.
- Create a list of all the subdirectories/subfolders in the target folder.
- Create an empty list, image_class_counts, to append subfolders and their counts to.
- Iterate through all of the subdirectories.
- Get the class name of the target folder as the name of the folder.
- Count the number of images in the target folder using the length of the list of image paths (we can get these with Path().rglob("*.jpg") where "*.jpg" means "all files with the extension .jpg").
- Append a dictionary of {"class_name": class_name, "image_count": image_count} to the image_class_counts list (we create a list of dictionaries so we can turn this into a pandas DataFrame).
- Return the image_class_counts list.
# Create a dictionary of image counts
from pathlib import Path
from typing import List, Dict
# 1. Take in a target directory
def count_images_in_subdirs(target_directory: str) -> List[Dict[str, int]]:
"""
Counts the number of JPEG images in each subdirectory of the given directory.
Each subdirectory is assumed to represent a class, and the function counts
the number of '.jpg' files within each one. The result is a list of
dictionaries with the class name and corresponding image count.
Args:
target_directory (str): The path to the directory containing subdirectories.
Returns:
List[Dict[str, int]]: A list of dictionaries with 'class_name' and 'image_count' for each subdirectory.
Examples:
>>> count_images_in_subdirs('/path/to/directory')
[{'class_name': 'beagle', 'image_count': 50}, {'class_name': 'poodle', 'image_count': 60}]
"""
# 2. Create a list of all the subdirectories in the target directory (these contain our images)
images_dir = Path(target_directory)
image_class_dirs = [directory for directory in images_dir.iterdir() if directory.is_dir()]
# 3. Create an empty list to append image counts to
image_class_counts = []
# 4. Iterate through all of the subdirectories
for image_class_dir in image_class_dirs:
# 5. Get the class name from image directory (e.g. "Images/n02116738-African_hunting_dog" -> "n02116738-African_hunting_dog")
class_name = image_class_dir.stem
# 6. Count the number of images in the target subdirectory
image_count = len(list(image_class_dir.rglob("*.jpg"))) # count all files with the .jpg file extension
# 7. Append a dictionary of class name and image count to count list
image_class_counts.append({"class_name": class_name,
"image_count": image_count})
# 8. Return the list
return image_class_counts
Ho ho, what a function!
Let's run it on our target directory Images and view the first few indexes.
image_class_counts = count_images_in_subdirs("Images")
image_class_counts[:3]
[{'class_name': 'n02111129-Leonberg', 'image_count': 210}, {'class_name': 'n02102973-Irish_water_spaniel', 'image_count': 150}, {'class_name': 'n02110806-basenji', 'image_count': 209}]
Nice!
Since our image_class_counts variable is in the form of a list of dictionaries, we can turn it into a pandas DataFrame.
Let's sort the DataFrame by "image_count" so the classes with the most images appear at the top, we can do so with DataFrame.sort_values().
# Create a DataFrame
import pandas as pd
image_counts_df = pd.DataFrame(image_class_counts).sort_values(by="image_count", ascending=False)
image_counts_df.head()
| class_name | image_count |
---|---|---|
116 | n02085936-Maltese_dog | 252 |
53 | n02088094-Afghan_hound | 239 |
111 | n02092002-Scottish_deerhound | 232 |
103 | n02112018-Pomeranian | 219 |
54 | n02107683-Bernese_mountain_dog | 218 |
And let's clean up the "class_name" column to be more readable by mapping the values to our folder_to_class_name_dict.
# Make class name column easier to read
image_counts_df["class_name"] = image_counts_df["class_name"].map(folder_to_class_name_dict)
image_counts_df.head()
| class_name | image_count |
---|---|---|
116 | maltese_dog | 252 |
53 | afghan_hound | 239 |
111 | scottish_deerhound | 232 |
103 | pomeranian | 219 |
54 | bernese_mountain_dog | 218 |
Now we've got a DataFrame of image counts per class, we can make them more visual by turning them into a plot.
We covered plotting data directly from pandas DataFrames in Section 3 of the Introduction to Matplotlib notebook: Plotting data directly with pandas.
To do so, we can use image_counts_df.plot(kind="bar", ...) along with some other customization.
# Turn the image counts DataFrame into a graph
import matplotlib.pyplot as plt
plt.figure(figsize=(14, 7))
image_counts_df.plot(kind="bar",
x="class_name",
y="image_count",
legend=False,
ax=plt.gca()) # plt.gca() = "get current axis", get the plt we setup above and put the data there
# Add customization
plt.ylabel("Image Count")
plt.title("Total Image Counts by Class")
plt.xticks(rotation=90, # Rotate the x labels for better visibility
fontsize=8) # Make the font size smaller for easier reading
plt.tight_layout() # Ensure things fit nicely
plt.show()
Beautiful! It looks like our classes are quite balanced. Each breed of dog has ~150 or more images.
We can find out some other quick stats about our data distribution with DataFrame.describe().
# Get various statistics about our data distribution
image_counts_df.describe()
| image_count |
---|---|
count | 120.000000 |
mean | 171.500000 |
std | 23.220898 |
min | 148.000000 |
25% | 152.750000 |
50% | 159.500000 |
75% | 186.250000 |
max | 252.000000 |
And the table shows a similar story to the plot. We can see the minimum number of images per class is 148, whereas the maximum number of images is 252.
If one class had 10x fewer images than another class, we might look into collecting more data to improve the balance.
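To put a quick number on that balance, we can compare the most and least represented classes using the image_counts_df we created above (a small optional check):
# Compare the most and least represented classes
max_count = image_counts_df["image_count"].max()
min_count = image_counts_df["image_count"].min()
print(f"Most images in a single class: {max_count}")
print(f"Fewest images in a single class: {min_count}")
print(f"Ratio of most to fewest: {max_count / min_count:.2f}x")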
The main takeaway(s):
- When working on a classification problem, ideally, all classes have a similar number of samples (however, in some problems this may be unattainable, such as fraud detection, where you may have 1000x more "not fraud" samples than "fraud" samples).
- If you wanted to add a new class of dog breed to the existing 120, ideally, you'd have at least ~150 images for it (though as we'll see with transfer learning, the number of required images could be less as long as they're high quality).
4. Creating training and test data split directories¶
After exploring the data, one of the next best things you can do is create experimental data splits.
This includes:
Set Name | Description | Typical Percentage of Data |
---|---|---|
Training Set | A dataset for the model to learn on | 70-80% |
Testing Set | A dataset for the model to be evaluated on | 20-30% |
(Optional) Validation Set | A dataset to tune the model on | 50% of the test data |
(Optional) Smaller Training Set | A smaller size dataset to run quick experiments on | 5-20% of the training set |
Our dog dataset already comes with specified training and test set splits.
So we'll stick with those.
But we'll also create a smaller training set (a random 10% of the training data) so we can stick to the machine learning engineer's motto of experiment, experiment, experiment! and run quicker experiments.
Note: One of the most important things in machine learning is being able to experiment quickly. As in, try a new model, try a new set of hyperparameters or try a new training setup. When you start out, you want the time between your experiments to be as small as possible so you can quickly figure out what doesn't work so you can spend more time on and run larger experiments with what does work.
As previously discussed, we're working towards a directory structure of:
images_split/
├── train/
│ ├── class_1/
│ │ ├── train_image1.jpg
│ │ ├── train_image2.jpg
│ │ └── ...
│ ├── class_2/
│ │ ├── train_image1.jpg
│ │ ├── train_image2.jpg
│ │ └── ...
└── test/
├── class_1/
│ ├── test_image1.jpg
│ ├── test_image2.jpg
│ └── ...
├── class_2/
│ ├── test_image1.jpg
│ ├── test_image2.jpg
...
So let's write some code to create:
- An images_split/train/ directory to hold all of the training images.
- An images_split/test/ directory to hold all of the testing images.
- A directory inside each of images_split/train/ and images_split/test/ for each of the dog breed classes.
We can make each of the directories we need using Path.mkdir().
For the dog breed directories, we'll loop through the list of dog_names and create a folder for each inside the images_split/train/ and images_split/test/ directories.
from pathlib import Path
# Define the target directory for image splits to go
images_split_dir = Path("images_split")
# Define the training and test directories
train_dir = images_split_dir / "train"
test_dir = images_split_dir / "test"
# Using Path.mkdir with exist_ok=True ensures the directory is created only if it doesn't exist
train_dir.mkdir(parents=True, exist_ok=True)
test_dir.mkdir(parents=True, exist_ok=True)
print(f"Directory {train_dir} is exists.")
print(f"Directory {test_dir} is exists.")
# Make a folder for each dog name
for dog_name in dog_names:
# Make training dir folder
train_class_dir = train_dir / dog_name
train_class_dir.mkdir(parents=True, exist_ok=True)
# print(f"Making directory: {train_class_dir}")
# Make testing dir folder
test_class_dir = test_dir / dog_name
test_class_dir.mkdir(parents=True, exist_ok=True)
# print(f"Making directory: {test_class_dir}")
# Make sure there are 120 subfolders in each
assert count_subfolders(train_dir) == len(dog_names)
assert count_subfolders(test_dir) == len(dog_names)
Directory images_split/train exists. Directory images_split/test exists.
Excellent!
We can check out the data split directories/folders we created by inspecting them in the files panel in Google Colab.
Alternatively, we can check the names of each by listing the subdirectories inside them.
# See the first 10 directories in the training split dir
sorted([str(dir_name) for dir_name in train_dir.iterdir() if dir_name.is_dir()])[:10]
['images_split/train/affenpinscher', 'images_split/train/afghan_hound', 'images_split/train/african_hunting_dog', 'images_split/train/airedale', 'images_split/train/american_staffordshire_terrier', 'images_split/train/appenzeller', 'images_split/train/australian_terrier', 'images_split/train/basenji', 'images_split/train/basset', 'images_split/train/beagle']
You might've noticed that all of our dog breed directories are empty.
Let's change that by getting some images in there.
To do so, we'll create a function called copy_files_to_target_dir() which will copy images from the Images directory into their respective directories inside images_split/train and images_split/test.
More specifically, it will:
- Take in a list of source files to copy (e.g. train_file_list) and a target directory to copy files to.
- Iterate through the list of source files to copy (we'll use tqdm, which comes installed with Google Colab, to create a progress bar of how many files have been copied).
- Convert the source file path to a Path object.
- Split the source file path and create a Path object for the destination folder (e.g. "n02112018-Pomeranian" -> "pomeranian").
- Get the target file name (e.g. "n02112018-Pomeranian/n02112018_6208.jpg" -> "n02112018_6208.jpg").
- Create a destination path for the source file to be copied to (e.g. images_split/train/pomeranian/n02112018_6208.jpg).
- Ensure the destination directory exists, similar to the step we took in the previous section (you can't copy files to a directory that doesn't exist).
- Print out the progress of copying (if necessary).
- Copy the source file to the destination using Python's shutil.copy2(src, dst).
from pathlib import Path
from shutil import copy2
from tqdm.auto import tqdm
# 1. Take in a list of source files to copy and a target directory
def copy_files_to_target_dir(file_list: list[str],
target_dir: str,
images_dir: str = "Images",
verbose: bool = False) -> None:
"""
Copies a list of files from the images directory to a target directory.
Parameters:
file_list (list[str]): A list of file paths to copy.
target_dir (str): The destination directory path where files will be copied.
images_dir (str, optional): The directory path where the images are currently stored. Defaults to 'Images'.
verbose (bool, optional): If set to True, the function will print out the file paths as they are being copied. Defaults to False.
Returns:
None
"""
# 2. Iterate through source files
for file in tqdm(file_list):
# 3. Convert file path to a Path object
source_file_path = Path(images_dir) / Path(file)
# 4. Split the file path and create a Path object for the destination folder
# e.g. "n02112018-Pomeranian" -> "pomeranian"
file_class_name = folder_to_class_name_dict[Path(file).parts[0]]
# 5. Get the name of the target image
file_image_name = Path(file).name
# 6. Create the destination path
destination_file_path = Path(target_dir) / file_class_name / file_image_name
# 7. Ensure the destination directory exists (this is a safety check, you can't copy an image to a directory that doesn't exist)
destination_file_path.parent.mkdir(parents=True, exist_ok=True)
# 8. Print out copy message if necessary
if verbose:
print(f"[INFO] Copying: {source_file_path} to {destination_file_path}")
# 9. Copy the original path to the destination path
copy2(src=source_file_path, dst=destination_file_path)
Copying function created!
Let's test it out by copying the files in the train_file_list to train_dir.
# Copy training images from Images to images_split/train/...
copy_files_to_target_dir(file_list=train_file_list,
target_dir=train_dir,
verbose=False) # set this to True to get an output of the copy process
# (warning: this will output a large amount of text)
0%| | 0/12000 [00:00<?, ?it/s]
Woohoo!
Looks like our copying function copied 12000 training images into their respective directories inside images_split/train/.
How about we do the same for test_file_list and test_dir?
copy_files_to_target_dir(file_list=test_file_list,
target_dir=test_dir,
verbose=False)
0%| | 0/8580 [00:00<?, ?it/s]
Nice! 8580 testing images copied from Images to images_split/test/.
Let's write some code to check that the number of files in the train_file_list is the same as the number of image files in train_dir (and the same for the test files).
# Get a list of all .jpg paths in train and test image directories
train_image_paths = list(train_dir.rglob("*.jpg"))
test_image_paths = list(test_dir.rglob("*.jpg"))
# Make sure the number of images in the training and test directories equals the number of files in their original lists
assert len(train_image_paths) == len(train_file_list)
assert len(test_image_paths) == len(test_file_list)
print(f"Number of images in {train_dir}: {len(train_image_paths)}")
print(f"Number of images in {test_dir}: {len(test_image_paths)}")
Number of images in images_split/train: 12000 Number of images in images_split/test: 8580
And adhering to the data explorer's motto of visualize, visualize, visualize!, let's plot some random images from the train_image_paths list.
# Plot 10 random images from the train_image_paths
plot_10_random_images_from_path_list(path_list=train_image_paths,
extract_title=False) # don't need to extract the title since the image directories are already named simply
Making a 10% training dataset split¶
We've already split the data into training and test sets, so why might we want to make another split?
Well, remember the machine learner's motto?
Experiment, experiment, experiment!
We're going to make another training split which contains a random 10% (approximately 1,200 images, since the original training set has 12,000 images) of the data from the original training split.
Why?
Because whilst machine learning models generally perform better with more data, having more data means longer computation times.
And longer computation times means the time between our experiments gets longer.
Which is not what we want in the beginning.
In the beginning of any new machine learning project, your focus should be to reduce the amount of time between experiments as much as possible.
Why?
Because running more experiments means you can figure out what doesn't work.
And if you figure out what doesn't work, you can start working closer towards what does.
Once you find something that does work, you can start to scale up your experiments (more data, bigger models, longer training times - we'll see these later on).
To make our 10% training dataset, let's copy a random 10% of the existing training set to a new folder called images_split/train_10_percent, so we've got the layout:
images_split/
├── train/
│ ├── class_1/
│ │ ├── train_image1.jpg
│ │ ├── train_image2.jpg
│ │ └── ...
│ ├── class_2/
│ │ ├── train_image1.jpg
│ │ ├── train_image2.jpg
│ │ └── ...
├── train_10_percent/ <--- NEW!
│ ├── class_1/
│ │ ├── random_train_image42.jpg
│ │ └── ...
│ ├── class_2/
│ │ ├── random_train_image106.jpg
│ │ └── ...
└── test/
├── class_1/
│ ├── test_image1.jpg
│ ├── test_image2.jpg
│ └── ...
├── class_2/
│ ├── test_image1.jpg
│ ├── test_image2.jpg
│ └── ...
Let's start by creating that folder.
# Create train_10_percent directory
train_10_percent_dir = images_split_dir / "train_10_percent"
train_10_percent_dir.mkdir(parents=True, exist_ok=True)
Now we should have 3 split folders inside images_split.
os.listdir(images_split_dir)
['test', 'train_10_percent', 'train']
Beautiful!
Now let's create a list of random training sample filepaths using Python's random.sample(). We'll want the total length of the list to equal 10% of the original training split.
To make things reproducible, we'll use a random seed (this is not 100% necessary, it just makes it so we get the same 10% of training image paths each time).
import random
# Set a random seed
random.seed(42)
# Get a 10% sample of the training image paths
train_image_paths_random_10_percent = random.sample(population=train_image_paths,
k=int(0.1*len(train_image_paths)))
# Check how many image paths we got
print(f"Original number of training image paths: {len(train_image_paths)}")
print(f"Number of 10% training image paths: {len(train_image_paths_random_10_percent)}")
print("First 5 random 10% training image paths:")
train_image_paths_random_10_percent[:5]
Original number of training image paths: 12000 Number of 10% training image paths: 1200 First 5 random 10% training image paths:
[PosixPath('images_split/train/miniature_pinscher/n02107312_2706.jpg'), PosixPath('images_split/train/irish_wolfhound/n02090721_272.jpg'), PosixPath('images_split/train/greater_swiss_mountain_dog/n02107574_3274.jpg'), PosixPath('images_split/train/italian_greyhound/n02091032_3763.jpg'), PosixPath('images_split/train/bloodhound/n02088466_7962.jpg')]
Random 10% training image paths acquired!
Let's copy them to the images_split/train_10_percent directory using similar code to our copy_files_to_target_dir() function.
# Copy training 10% split images from images_split/train/ to images_split/train_10_percent/...
for source_file_path in tqdm(train_image_paths_random_10_percent):
    # Create the destination file path
    destination_file_and_image_name = Path(*source_file_path.parts[-2:]) # "images_split/train/yorkshire_terrier/n02094433_2223.jpg" -> "yorkshire_terrier/n02094433_2223.jpg"
    destination_file_path = train_10_percent_dir / destination_file_and_image_name # "yorkshire_terrier/n02094433_2223.jpg" -> "images_split/train_10_percent/yorkshire_terrier/n02094433_2223.jpg"
    # If the target directory doesn't exist, make it
    target_class_dir = destination_file_path.parent
    if not target_class_dir.is_dir():
        # print(f"Making directory: {target_class_dir}")
        target_class_dir.mkdir(parents=True,
                               exist_ok=True)
    # print(f"Copying: {source_file_path} to {destination_file_path}")
    copy2(src=source_file_path,
          dst=destination_file_path)
0%| | 0/1200 [00:00<?, ?it/s]
1200 images copied!
Let's check our training 10% set distribution and make sure we've got some images for each class.
We can use our count_images_in_subdirs() function to count the images in each of the dog breed folders in the train_10_percent_dir.
# Count images in train_10_percent_dir
train_10_percent_image_class_counts = count_images_in_subdirs(train_10_percent_dir)
train_10_percent_image_class_counts_df = pd.DataFrame(train_10_percent_image_class_counts).sort_values("image_count", ascending=True)
train_10_percent_image_class_counts_df.head()
 | class_name | image_count |
---|---|---|
33 | labrador_retriever | 3 |
23 | welsh_springer_spaniel | 4 |
61 | great_dane | 4 |
64 | curly_coated_retriever | 4 |
100 | sussex_spaniel | 5 |
Okay, looks like a few classes have only a handful of images.
Let's make sure there are 120 subfolders by checking the length of the train_10_percent_image_class_counts_df.
# How many subfolders are there?
print(len(train_10_percent_image_class_counts_df))
120
Beautiful, our train 10% dataset split has a folder for each of the dog breed classes.
Note: Ideally our random 10% training set would have the same distribution per class as the original training set, however, for this example, we've taken a global random 10% rather than a random 10% per class. This is okay for now, however for more fine-grained tasks, you may want to make sure your smaller training set is better distributed.
For one last check, let's plot the distribution of our train 10% dataset.
# Plot distribution of train 10% dataset.
plt.figure(figsize=(14, 7))
train_10_percent_image_class_counts_df.plot(kind="bar",
x="class_name",
y="image_count",
legend=False,
ax=plt.gca()) # plt.gca() = "get current axis", get the plt we setup above and put the data there
# Add customization
plt.title("Train 10 Percent Image Counts by Class")
plt.ylabel("Image Count")
plt.xticks(rotation=90, # Rotate the x labels for better visibility
fontsize=8) # Make the font size smaller for easier reading
plt.tight_layout() # Ensure things fit nicely
plt.show()
Excellent! Our train 10% dataset distribution looks similar to the original training set distribution.
However, it could be better.
If we really wanted to, we could recreate the train 10% dataset with 10% of the images from each class rather than 10% of images globally.
Extension: How would you create the train_10_percent data split with 10% of the images from each class? For example, each folder would have at least 10 images of a particular dog breed.
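If you'd like a hint for this extension, here's one possible approach as a rough sketch (not the only way to do it). It reuses images_split_dir, train_dir, random and copy2 from earlier, and writes into a hypothetical new folder called images_split/train_10_percent_per_class so the split we just made isn't overwritten.
# One possible approach (sketch): sample ~10% of images per class instead of 10% globally
import random
from shutil import copy2

random.seed(42)
train_10_percent_per_class_dir = images_split_dir / "train_10_percent_per_class" # hypothetical folder name
for class_dir in sorted(train_dir.iterdir()):
    if not class_dir.is_dir():
        continue
    class_image_paths = list(class_dir.glob("*.jpg"))
    k = max(1, int(0.1 * len(class_image_paths))) # take 10% of each class (at least 1 image)
    for source_file_path in random.sample(population=class_image_paths, k=k):
        destination_file_path = train_10_percent_per_class_dir / class_dir.name / source_file_path.name
        destination_file_path.parent.mkdir(parents=True, exist_ok=True)
        copy2(src=source_file_path, dst=destination_file_path)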
5. Turning datasets into TensorFlow Dataset(s)¶
Alright, we've spent a bunch of time getting our dog images into different folders.
But how do we get the images from different folders into a machine learning model?
Well, like the other machine learning models we've built throughout the course, we need a way to turn our images into numbers.
Specifically, we're going to turn our images into tensors.
That's where the "Tensor" comes from in "TensorFlow".
A tensor is a way to numerically represent something (where something can be almost anything you can think of, text, images, audio, rows and columns).
There are several different ways to load data into TensorFlow.
But the formula is the same across data types: have data -> use TensorFlow to turn it into tensors.
The reason why we spent time getting our data into the standard image classification format (where the class name is the folder name) is because TensorFlow includes several utility functions to load data from this directory format.
Function | Description |
---|---|
tf.keras.utils.image_dataset_from_directory() | Creates a tf.data.Dataset from image files in a directory. |
tf.keras.utils.audio_dataset_from_directory() | Creates a tf.data.Dataset from audio files in a directory. |
tf.keras.utils.text_dataset_from_directory() | Creates a tf.data.Dataset from text files in a directory. |
tf.keras.utils.timeseries_dataset_from_array() | Creates a dataset of sliding windows over a timeseries provided as array. |
What is a tf.data.Dataset?
It's TensorFlow's efficient way to store a potentially large set of elements.
As machine learning datasets can get quite large, you need an efficient way to store and load them.
This is what the tf.data.Dataset API provides.
And it's what we'd like to turn our dog images into.
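To get a feel for what a tf.data.Dataset looks like before we load our dog images, here's a tiny toy example (the numbers are made up, it's not our dog data) using tf.data.Dataset.from_tensor_slices():
import numpy as np
import tensorflow as tf

# 8 fake "images" and 8 made-up labels turned into a batched tf.data.Dataset
fake_images = np.random.rand(8, 224, 224, 3).astype("float32")
fake_labels = np.array([0, 1, 0, 1, 0, 1, 0, 1])
toy_dataset = tf.data.Dataset.from_tensor_slices((fake_images, fake_labels)).batch(4)

for batch_images, batch_labels in toy_dataset:
    print(batch_images.shape, batch_labels.shape) # (4, 224, 224, 3) (4,)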
Since we're working with images, we can do so with tf.keras.utils.image_dataset_from_directory().
We'll pass in the following parameters:
- directory = the target directory we'd like to turn into a tf.data.Dataset.
- label_mode = the kind of labels we'd like to use, in our case it's "categorical" since we're dealing with a multi-class classification problem (we would use "binary" if we were working with a binary classification problem).
- batch_size = the number of images we'd like our model to see at a time (due to computation limitations, our model won't be able to look at every image at once, so we split them into small batches and the model looks at each batch individually). Generally 32 is a good value to start, this means our model will look at 32 images at a time (this number is flexible).
- image_size = the size we'd like to shape our images to before we feed them to our model (height x width).
- shuffle = whether we'd like our dataset to be shuffled to randomize the order.
- seed = if we're shuffling the order in a random fashion, do we want that to be reproducible?
Note: Values such as batch_size and image_size are known as hyperparameters, meaning they're values you can decide to set yourself. As for the best value for a given hyperparameter, that depends highly on the data you're working with, the problem space and the compute capabilities you've got available. Best to experiment!
With all this being said, let's see it in practice!
We'll make 3 tf.data.Datasets: train_10_percent_ds, train_ds and test_ds.
import tensorflow as tf
# Create constants
IMG_SIZE = (224, 224)
BATCH_SIZE = 32
SEED = 42
# Create train 10% dataset
train_10_percent_ds = tf.keras.utils.image_dataset_from_directory(
directory=train_10_percent_dir,
label_mode="categorical", # turns labels into one-hot representations (e.g. [0, 0, 1, ..., 0, 0])
batch_size=BATCH_SIZE,
image_size=IMG_SIZE,
shuffle=True, # shuffle training datasets to prevent learning of order
seed=SEED
)
# Create full train dataset
train_ds = tf.keras.utils.image_dataset_from_directory(
directory=train_dir,
label_mode="categorical",
batch_size=BATCH_SIZE,
image_size=IMG_SIZE,
shuffle=True,
seed=SEED
)
# Create test dataset
test_ds = tf.keras.utils.image_dataset_from_directory(
directory=test_dir,
label_mode="categorical",
batch_size=BATCH_SIZE,
image_size=IMG_SIZE,
shuffle=False, # don't need to shuffle the test dataset (this makes evaluations easier)
seed=SEED
)
Found 1200 files belonging to 120 classes. Found 12000 files belonging to 120 classes. Found 8580 files belonging to 120 classes.
Note: If you're working with similar styles of data (e.g. all dog photos), it's best practice to shuffle training datasets to prevent the model from learning any order in the data. There's no need to shuffle testing datasets (this makes for easier evaluation).
tf.data.Datasets created!
Let's check out one of them.
train_10_percent_ds
<_PrefetchDataset element_spec=(TensorSpec(shape=(None, 224, 224, 3), dtype=tf.float32, name=None), TensorSpec(shape=(None, 120), dtype=tf.float32, name=None))>
You'll notice a few things going on here.
Essentially, we've got a collection of tuples:
- The image tensor(s) - TensorSpec(shape=(None, 224, 224, 3), dtype=tf.float32, name=None) where (None, 224, 224, 3) is the shape of the image tensor (None is the batch size, (224, 224) is the IMG_SIZE we set and 3 is the number of colour channels, as in, red, green, blue or RGB since our images are in colour).
- The label tensor(s) - TensorSpec(shape=(None, 120), dtype=tf.float32, name=None) where None is the batch size and 120 is the number of labels we're using.
The batch size often appears as None since it's flexible and can change on the fly.
Each batch of images is associated with a batch of labels.
Instead of talking about it, let's check out what a single batch looks like.
We can do so by turning the tf.data.Dataset into an iterable with Python's built-in iter() and then getting the "next" batch with next().
# What does a single batch look like?
image_batch, label_batch = next(iter(train_ds))
image_batch.shape, label_batch.shape
(TensorShape([32, 224, 224, 3]), TensorShape([32, 120]))
Nice!
We get back a single batch of images and labels.
Looks like a single image_batch has a shape of [32, 224, 224, 3] ([batch_size, height, width, colour_channels]).
And our labels have a shape of [32, 120] ([batch_size, labels]).
These are numerical representations of our images and labels!
Note: The shape of a tensor does not necessarily reflect the values inside a tensor. The shape only reflects the dimensionality of a tensor. For example, [32, 224, 224, 3] is a 4-dimensional tensor. Values inside a tensor can be any number (positive, negative, 0, float, integer, etc) representing almost any kind of data.
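To make the note concrete, here's a tiny standalone example of two tensors that share the same shape but hold completely different values:
# Two tensors with the same shape but very different values inside
tensor_a = tf.constant([[1.0, -2.5, 0.0], [3.7, 42.0, -0.001]])
tensor_b = tf.zeros(shape=(2, 3))
print(tensor_a.shape, tensor_b.shape) # both (2, 3), the shape says nothing about the values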
We can further inspect our data by looking at a single sample.
# Get a single sample from a single batch
print(f"Single image tensor:\n{image_batch[0]}\n")
print(f"Single label tensor: {label_batch[0]}") # notice the 1 is the index of the target label (our labels are one-hot encoded)
print(f"Single sample class name: {dog_names[tf.argmax(label_batch[0])]}")
Single image tensor: [[[196.61607 174.61607 160.61607 ] [197.84822 175.84822 161.84822 ] [200. 178. 164. ] ... [ 60.095097 79.75804 45.769207] [ 61.83293 71.22575 63.288315] [ 77.65755 83.65755 81.65755 ]] [[196. 174. 160. ] [197.83876 175.83876 161.83876 ] [199.07945 177.07945 163.07945 ] ... [ 94.573715 110.55229 83.59694 ] [125.869865 135.26268 127.33472 ] [122.579605 128.5796 126.579605]] [[195.73691 173.73691 159.73691 ] [196.896 174.896 160.896 ] [199. 177. 163. ] ... [ 26.679413 38.759026 20.500835] [ 24.372307 31.440136 26.675896] [ 20.214453 26.214453 24.214453]] ... [[ 61.57369 70.18976 104.72547 ] [189.91965 199.61607 213.28572 ] [247.26637 255. 252.70387 ] ... [113.40158 83.40158 57.40158 ] [110.75214 78.75214 53.752136] [107.37048 75.37048 50.370483]] [[ 61.27007 69.88614 104.42185 ] [188.93079 198.62721 212.29686 ] [246.33257 255. 251.77007 ] ... [110.88623 80.88623 54.88623 ] [102.763245 70.763245 45.763245] [ 99.457634 67.457634 42.457638]] [[ 60.25893 68.875 103.41071 ] [188.58261 198.27904 211.94868 ] [245.93112 254.6097 251.36862 ] ... [105.02222 75.02222 49.022217] [109.11186 77.11186 52.111866] [106.56936 74.56936 49.56936 ]]] Single label tensor: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.] Single sample class name: schipperke
Woah!!
We've got a numerical representation of a dog image (in the form of red, green, blue pixel values)!
This is exactly the kind of format our model will want.
Can we do the reverse?
Instead of image -> numbers, can we go from numbers -> image?
You bet.
Visualizing images from our TensorFlow Dataset¶
Let's follow the data explorer's motto once again and visualize, visualize, visualize!
How about we turn our single sample from tensor format to image format?
We can do so by passing the single sample image tensor to matplotlib's plt.imshow() (we'll also need to convert its datatype from float32 to uint8 to avoid matplotlib colour range issues).
plt.imshow(image_batch[0].numpy().astype("uint8")) # convert tensor to uint8 to avoid matplotlib colour range issues
plt.title(dog_names[tf.argmax(label_batch[0])])
plt.axis("off");
How about we plot multiple images?
We can do so by first setting up a plot with multiple subplots.
And then we can iterate through our dataset with tf.data.Dataset.take(count=1) which will "take" 1 batch of data (in our case, one batch is 32 samples) which we can then index on for each subplot.
# Create multiple subplots
fig, axes = plt.subplots(nrows=2, ncols=5, figsize=(20, 10))
# Iterate through a single batch and plot images
for images, labels in train_ds.take(count=1): # note: because our training data is shuffled, each "take" will be different
    for i, ax in enumerate(axes.flat):
        ax.imshow(images[i].numpy().astype("uint8"))
        ax.set_title(dog_names[tf.argmax(labels[i])])
        ax.axis("off")
Aren't those good looking dogs!
Getting labels from our TensorFlow Dataset¶
Since our data is now in tf.data.Dataset format, there are a couple of important attributes we can pull from it if necessary.
The first is the collection of filepaths associated with a tf.data.Dataset.
These are accessible via the .file_paths attribute.
Note: You can often see a list of associated methods and attributes of a variable/class in Google Colab (or other IDEs) by pressing TAB afterwards (e.g. type variable_name. + TAB).
# Get the first 5 file paths of the training dataset
train_ds.file_paths[:5]
['images_split/train/boston_bull/n02096585_1753.jpg', 'images_split/train/kerry_blue_terrier/n02093859_855.jpg', 'images_split/train/border_terrier/n02093754_2281.jpg', 'images_split/train/rottweiler/n02106550_11823.jpg', 'images_split/train/airedale/n02096051_5884.jpg']
We can also get the class names associated with a dataset using .class_names (TensorFlow has read these from the names of our target folders in the images_split directory).
# Get the class names TensorFlow has read from the target directory
class_names = train_ds.class_names
class_names[:5]
['affenpinscher', 'afghan_hound', 'african_hunting_dog', 'airedale', 'american_staffordshire_terrier']
And we can make sure the class names are the same across our datasets by comparing them.
assert set(train_10_percent_ds.class_names) == set(train_ds.class_names) == set(test_ds.class_names)
Configuring our datasets for performance¶
There's one last step we're going to do before we build our first TensorFlow model.
And that's configure our datasets for performance.
More specifically, we're going to focus on following the TensorFlow guide for Better performance with the tf.data API.
Why?
Because data loading is one of the biggest bottlenecks in machine learning.
Modern GPUs can perform calculations (matrix multiplications) to find patterns in data quite quickly.
However, for the GPU to perform such calculations, the data needs to be there.
Good news for us is that if we follow the TensorFlow tf.data best practices, TensorFlow will take care of all these optimizations and hardware acceleration for us.
We're going to call three methods on our dataset to optimize it for performance:
- cache() - Cache the elements in the dataset in memory or in a target folder (speeds up loading).
- shuffle() - Shuffle a set number of samples in preparation for loading (this will mean our samples and batches of samples will be shuffled), for example, setting shuffle(buffer_size=1000) will prepare and shuffle 1000 elements of data at a time.
- prefetch() - Prefetch the next batch of data and prepare it for computation whilst the previous one is being computed on (can scale to multiple prefetches depending on hardware availability). TensorFlow can automatically configure how many elements/batches to prefetch by setting prefetch(buffer_size=tf.data.AUTOTUNE).
Resource: For more performance tips on loading datasets in TensorFlow, see the Datasets Performance tips guide.
In our case, let's start by calling cache() on our datasets to save the loaded samples to memory.
We'll then shuffle() the training splits with buffer_size=10*BATCH_SIZE for the training 10% split and buffer_size=100*BATCH_SIZE for the full training set.
Why these numbers?
That's how many I decided to use via experimentation, feel free to figure out a different number that may work better.
Ideally, if your dataset isn't too large, you would shuffle all possible samples (TensorFlow has a method of finding the number of samples in a dataset called tf.data.Dataset.cardinality()).
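As a quick aside, checking this on our own (already batched) datasets could look like the following, the counts returned are batches rather than individual images, so with a batch size of 32 we'd expect roughly the numbers in the comments:
# How many batches are in each dataset?
print(train_ds.cardinality()) # expect 375 (12,000 images / 32 images per batch)
print(test_ds.cardinality()) # expect 269 (8,580 images / 32 images per batch, rounded up)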
We won't call shuffle() on the testing dataset since it isn't required.
And we'll call prefetch(buffer_size=tf.data.AUTOTUNE) on each of our datasets to automatically load and prepare a number of data batches.
AUTOTUNE = tf.data.AUTOTUNE # let TensorFlow find the best values to use automatically
# Shuffle and optimize performance on training datasets
# Note: these methods can be chained together and will have the same effect as calling them individually
train_10_percent_ds = train_10_percent_ds.cache().shuffle(buffer_size=10*BATCH_SIZE).prefetch(buffer_size=AUTOTUNE)
train_ds = train_ds.cache().shuffle(buffer_size=100*BATCH_SIZE).prefetch(buffer_size=AUTOTUNE)
# Don't need to shuffle test datasets (for easier evaluation)
test_ds = test_ds.cache().prefetch(buffer_size=AUTOTUNE)
Dataset performance optimized!
We spent some extra time here because datasets are so important to machine learning and deep learning workflows. Wherever you can make them faster, you should.
Time to create our first neural network with TensorFlow!
6. Creating a neural network with TensorFlow¶
We've spent lots of time preparing the data.
This is because it's often the largest part of a machine learning problem, getting your data ready for a machine learning model.
Thanks to modern frameworks like TensorFlow, when you've got your data in order, building a deep learning model to find patterns in your data can be one of the easier steps of the process.
When you hear people talk about deep learning, they're often referring to neural networks.
Neural networks are one of the most flexible machine learning models there is.
You can create a neural network to fit almost any kind of data.
The "deep" in deep learning refers to the many layers that can be contained inside a neural network.
A neural network often follows the structure of:
Input layer -> Middle layer(s) -> Output layer.
General anatomy of a neural network. Neural networks are almost infinitely customisable. The main premise is that data goes in one end, gets manipulated by many small functions in an attempt to learn patterns/weights which represent the data to produce useful outputs. Note that "patterns" is an arbitrary term, you’ll often hear "embedding", "weights", "feature representation", "representation" all referring to similar things.
Where the input layer takes in the data, the middle layer(s) perform calculations on the data and (hopefully) learn patterns (also called weights/biases) to represent it, and the output layer performs a final transformation on the learned patterns to make them usable in human applications.
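To make the input -> middle -> output idea concrete, here's a deliberately naive toy example (purely illustrative, it is not the model we'll build for Dog Vision):
# A toy neural network following the input -> middle -> output structure
toy_model = tf.keras.Sequential([
    tf.keras.Input(shape=(224, 224, 3)), # input layer: takes in image tensors
    tf.keras.layers.Flatten(), # flatten each image into one long vector
    tf.keras.layers.Dense(128, activation="relu"), # middle (hidden) layer: learns patterns
    tf.keras.layers.Dense(120, activation="softmax") # output layer: one value per dog breed
])
# toy_model.summary() # uncomment to see the layer-by-layer breakdown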
What goes into the middle layer(s)?
That's an excellent question.
Because there are so many different options.
But two of the most popular modern kinds of neural network are Convolutional Neural Networks (CNNs) and Transformers (the Transformer is the "T" in GPT, Generative Pretrained Transformer).
Architecture | Description | Example Layers | Problem Examples |
---|---|---|---|
Transformer | A combination of fully connected layers as well as attention-based layers. | tf.keras.layers.Attention, tf.keras.layers.Dense | NLP, Machine Translation, Computer Vision |
Convolutional Neural Network (CNN) | A combination of fully connected layers as well as convolutional-based layers. | tf.keras.layers.Conv2D, tf.keras.layers.Dense | Computer Vision, Audio Processing |
Because our problem is in the computer vision space, we're going to use a CNN.
And instead of crafting our own CNN from scratch, we're going to take an existing CNN model and apply it to our own problem, harnessing the wonderful superpower of transfer learning.
Note: You can build and use working neural networks with TensorFlow without knowing the intricate details of what's going on behind the scenes (that's what we're focused on). For an idea of the mathematical operations that make neural networks work, I'd recommend going through 3Blue1Brown's YouTube series on Neural Networks.
The magic of transfer learning¶
Transfer learning is the process of getting an existing working model and adjusting it to your own problem.
This works particularly well for neural networks.
The main benefit of transfer learning is being able to get better results in less time with less data.
How?
An existing model may have the following features:
- Trained on lots of data (in the case of computer vision, existing models are often pretrained on ImageNet, a dataset of 1M+ images, this means they've already learned patterns across many different kinds of images).
- Crafted by expert researchers (large universities and companies such as Google and Meta often open-source their best models for others to try and use).
- Trained on lots of computing hardware (the larger the model and the larger the dataset, the more compute power you need, not everyone has access to 10s, 100s or 1000s of GPUs).
- Proven to perform well on a given task through several studies (this means it has a good chance of performing well on your task if it's similar).
You may be thinking, ok so, this all sounds incredible, where can I get pretrained models?
And the good news is, there are plenty of places to find pretrained models!
Resource | Description |
---|---|
tf.keras.applications | A module built into TensorFlow and Keras with a series of pretrained models ready to use. |
KerasNLP and KerasCV | Two dedicated libraries for NLP (natural language processing) and CV (computer vision), each of which includes many modality-specific APIs and is capable of running with TensorFlow, JAX or PyTorch. |
Hugging Face Models Hub | A large collection of pretrained models on a wide range of tasks, from computer vision to natural language processing to audio processing. |
Kaggle Models | A huge collection of different pretrained models for many different tasks. |
Different locations to find pretrained models. This list is constantly expanding as machine learning becomes more and more open-source.
Note: For most new machine learning problems, if you're looking to get good results quickly, you should generally look for a pretrained model similar to your problem and use transfer learning to adapt it to your own domain.
Since we're focused on TensorFlow/Keras, we're going to be using a pretrained model from tf.keras.applications.
More specifically, we're going to take the tf.keras.applications.efficientnet_v2.EfficientNetV2B0() model from the 2021 machine learning paper EfficientNetV2: Smaller Models and Faster Training from Google Research and apply it to our own problem.
This model has been trained on ImageNet1k (1M+ images across 1000 different diverse classes, there is a version called ImageNet22k with 14M+ images across 22,000 categories) so it has a good baseline understanding of patterns in images across a wide domain.
We'll see if we can adjust those patterns slightly to our dog images.
Let's create an instance of it and call it base_model (I'll explain why next).
# Create the input shape to our model
INPUT_SHAPE = (*IMG_SIZE, 3)
base_model = tf.keras.applications.efficientnet_v2.EfficientNetV2B0(
include_top=True, # do we want to include the top layer? (ImageNet has 1000 classes, so the top layer is formulated for this, we want to create our own top layer)
include_preprocessing=True, # do we want the network to preprocess our data into the right format for us? (yes)
weights="imagenet", # do we want the network to come with pretrained weights? (yes)
input_shape=INPUT_SHAPE # what is the input shape of our data we're going to pass to the network? (224, 224, 3) -> (height, width, colour_channels)
)
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/efficientnet_v2/efficientnetv2-b0.h5 29403144/29403144 [==============================] - 0s 0us/step
Base model created!
We can find out information about our base model by calling base_model.summary().
# Note: Uncomment to see full output
# base_model.summary()
Truncated output of base_model.summary():
Woah! Look at all those layers... this is what the "deep" in deep learning means! A deep number of layers.
How about we count the number of layers?
# Count the number of layers
print(f"Number of layers in base_model: {len(base_model.layers)}")
Number of layers in base_model: 273
273 layers!
Wow, there's a lot going on.
Rather than step through and explain what's happening in each layer, I'll leave that for the curious mind to research on their own.
Just know that when starting out in deep learning, you don't need to know what's happening in every layer of a model to be able to use it.
For now, let's pay attention to a few things:
- The input layer (the first layer) input shape, this will tell us the shape of the data the model expects as input.
- The output layer (the last layer) output shape, this will tell us the shape of the data the model will output.
- The number of parameters of the model, these are "learnable" numbers (also called weights) that a model will use to derive patterns out of and represent the data. Generally, the more parameters a model has, the more learning capacity it has.
- The number of layers a model has. Generally, the more layers a model has, the more learning capacity it has (each layer will learn progressively deeper patterns from the data). However, this caps out at a certain range.
Let's step through each of these.
Model input and output shapes¶
One of the most important practical steps in using a deep learning model is getting the input and output shapes right.
Two questions to ask:
- What is the shape of my input data?
- What is the ideal shape of my output data?
We ask about shapes because in all deep learning models input and output data comes in the form of tensors.
This goes for text, audio, images and more.
The raw data gets converted to a numerical representation first before being passed to a model.
In our case, our input data has the shape of [32, 224, 224, 3] or [batch_size, height, width, colour_channels].
And our ideal output shape will be [32, 120] or [batch_size, number_of_dog_classes].
Your input and output shapes will differ depending on the problem and data you're working with.
But as you get deeper into the world of machine learning (and deep learning), you'll find mismatched input and output shapes are one of the most common sources of errors.
We can check our model's input and output shapes with the .input_shape and .output_shape attributes.
# Check the input shape of our model
base_model.input_shape
(None, 224, 224, 3)
Nice! Looks like our model's input shape is where we want it (remember, None in this case is equivalent to a wildcard dimension, meaning it could be any value, but we've set our batch size to 32).
This is because the model we chose, tf.keras.applications.efficientnet_v2.EfficientNetV2B0, has been trained on images the same size as our images.
If our model had a different input shape, we'd have to make sure we processed our images to be the same shape.
Now let's check the output shape.
# Check the model's output shape
base_model.output_shape
(None, 1000)
Hmm, is this what we're after?
Since we have 120 dog classes, we'd like an output shape of (None, 120).
Why is it (None, 1000) by default?
This is because the model has been trained already on ImageNet, a dataset of 1,000,000+ images with 1000 classes (hence the 1000 in the output shape).
How can we change this?
Let's recreate a base_model instance, except this time we'll change the classes parameter to 120.
# Create a base model with 120 output classes
base_model = tf.keras.applications.efficientnet_v2.EfficientNetV2B0(
include_top=True,
include_preprocessing=True,
weights="imagenet",
input_shape=INPUT_SHAPE,
classes=len(dog_names)
)
base_model.output_shape
--------------------------------------------------------------------------- ValueError Traceback (most recent call last) <ipython-input-62-5e9b29e6f858> in <cell line: 2>() 1 # Create a base model with 120 output classes ----> 2 base_model = tf.keras.applications.efficientnet_v2.EfficientNetV2B0( 3 include_top=True, 4 include_preprocessing=True, 5 weights="imagenet", /usr/local/lib/python3.10/dist-packages/keras/src/applications/efficientnet_v2.py in EfficientNetV2B0(include_top, weights, input_tensor, input_shape, pooling, classes, classifier_activation, include_preprocessing) 1128 include_preprocessing=True, 1129 ): -> 1130 return EfficientNetV2( 1131 width_coefficient=1.0, 1132 depth_coefficient=1.0, /usr/local/lib/python3.10/dist-packages/keras/src/applications/efficientnet_v2.py in EfficientNetV2(width_coefficient, depth_coefficient, default_size, dropout_rate, drop_connect_rate, depth_divisor, min_depth, bn_momentum, activation, blocks_args, model_name, include_top, weights, input_tensor, input_shape, pooling, classes, classifier_activation, include_preprocessing) 932 933 if weights == "imagenet" and include_top and classes != 1000: --> 934 raise ValueError( 935 "If using `weights` as `'imagenet'` with `include_top`" 936 " as true, `classes` should be 1000" ValueError: If using `weights` as `'imagenet'` with `include_top` as true, `classes` should be 1000Received: classes=120
Oh damn!
We get an error:
ValueError: If using weights as 'imagenet' with include_top as true, classes should be 1000. Received: classes=120
What this is saying is that if we want to use the pretrained 'imagenet' weights (which we do, to leverage the visual patterns/features the model has already learned on ImageNet), we need to change the parameters we pass to the base_model.
What we're going to do is create our own top layers.
We can do this by setting include_top=False.
What this means is we'll use most of the model's existing layers to extract features and patterns out of our images and then customize the final few layers to our own problem.
This kind of transfer learning is often called feature extraction.
A setup where you use an existing model's pretrained weights to extract features (or patterns) from your own custom data.
You can then use those extracted features and further tailor them to your own use case.
Let's create an instance of base_model without a top layer.
# Create a base model with no top
base_model = tf.keras.applications.efficientnet_v2.EfficientNetV2B0(
include_top=False, # don't include the top layer (we want to make our own top layer)
include_preprocessing=True,
weights="imagenet",
input_shape=INPUT_SHAPE,
)
# Check the output shape
base_model.output_shape
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/efficientnet_v2/efficientnetv2-b0_notop.h5 24274472/24274472 [==============================] - 0s 0us/step
(None, 7, 7, 1280)
Hmm, what's this output shape?
This still isn't what we want (we're after (None, 120) for our number of dog classes).
How about we check the number of layers again?
# Count the number of layers
print(f"Number of layers in base_model: {len(base_model.layers)}")
Number of layers in base_model: 270
Looks like our new base_model has fewer layers than our previous one.
This is because we used include_top=False.
This means we've still got 270 base layers to extract features and patterns from our images, however, it also means we get to customize the output layers to our liking.
We'll come back to this shortly.
Model parameters¶
In traditional programming, you write a list of rules for inputs to go in, get manipulated in some predefined way and then outputs come out.
However, as we've discussed, machine learning switches the order.
Inputs and ideal outputs go in (for example, dog images and their corresponding labels) and rules come out.
A model's parameters are the learned rules.
And learned is the important point.
In an ideal setup, we never tell the model what parameters to learn, it learns them itself by connecting input data to labels in supervised learning and by grouping together similar samples in unsupervised learning.
Note: Parameters are values learned by a model whereas hyperparameters (e.g. batch size) are values set by a human.
Parameters also get referred to as "weights" or "patterns" or "learned features" or "learned representations".
Generally, the more parameters a model has, the more capacity it has to learn.
Each layer in a deep learning model will have a specific number of parameters (these vary depending on which layer you use).
The benefit of using a preconstructed model and transfer learning is that someone else has done the hard work in finding what combination of layers leads to a good set of parameters (a big thank you to these wonderful people).
We can count the number of parameters in a model/layer via the .count_params() method.
# Check the number of parameters in our model
base_model.count_params()
5919312
Holy smokes!
Our model has 5,919,312 parameters!
That means each time an image goes through our model, it will be influenced in some small way by 5,919,312 numbers.
Each one of these is a potential learning opportunity (except for parameters that are non-trainable but we'll get to that soon too).
Now, you may be thinking, 5 million+ parameters sounds like a lot.
And it is.
However, many modern large scale models, such as GPT-3 (175B) and GPT-4 (200B+? the actual number of parameters was never released) deal in the billions of parameters (note: this is written in 2024, so if you're reading this in future, parameter counts may be in the trillions).
Generally, more parameters leads to better models.
However, there are always tradeoffs.
More parameters means more compute power to run the models.
In practice, if you have limited compute power (e.g. a single GPU on Google Colab), it's best to start with smaller models and gradually increase the size when necessary.
We can get the trainable and non-trainable parameters from our model with the trainable_weights and non_trainable_weights attributes (remember, parameters are also referred to as weights).
Note: Trainable weights are parameters of the model which are updated by backpropagation during training (they are changed to better match the data) whereas non-trainable weights are parameters of the model which are not updated by backpropagation during training (they are fixed in place).
Let's write a function to count the trainable, non-trainable and total parameters of a model.
import numpy as np
def count_parameters(model, print_output=True):
    """
    Counts the number of trainable, non-trainable and total parameters of a given model.
    """
    trainable_parameters = np.sum([np.prod(layer.shape) for layer in model.trainable_weights])
    non_trainable_parameters = np.sum([np.prod(layer.shape) for layer in model.non_trainable_weights])
    total_parameters = trainable_parameters + non_trainable_parameters
    if print_output:
        print(f"Model {model.name} parameter counts:")
        print(f"Total parameters: {total_parameters}")
        print(f"Trainable parameters: {trainable_parameters}")
        print(f"Non-trainable parameters: {non_trainable_parameters}")
    else:
        return total_parameters, trainable_parameters, non_trainable_parameters
count_parameters(model=base_model, print_output=True)
Model efficientnetv2-b0 parameter counts: Total parameters: 5919312 Trainable parameters: 5858704 Non-trainable parameters: 60608
Nice! It looks like our function worked.
Most of our model's parameters are trainable.
This means they will be tweaked as they see more images of dogs.
However, a standard practice in transfer learning is to freeze the base layers of a model and only train the custom top layers to suit your problem.
Example of how we can take a pretrained model and customize it to our own use case. This kind of transfer learning workflow is often referred to as a feature extracting workflow as the base layers are frozen (not changed during training) and only the top layers are trained. Note: In this image the EfficientNetB0 architecture is being demonstrated, however we're going to be using the EfficientNetV2B0 architecture which is slightly different. I've used the older architecture image from the research paper as a newer one wasn't available.
In other words, keep the patterns an existing model has learned on a similar problem (if they're good) to form a base representation of an input sample and then manipulate that base representation to suit our needs.
Why do this?
It's faster.
The less trainable parameters, the faster your model training will be, the faster your experiments will be.
But how will we know this works?
We're going to run experiments to test it.
Okay, so how do we freeze the parameters of our base_model?
We can set its .trainable attribute to False.
# Freeze the base model
base_model.trainable = False
base_model.trainable
False
base_model frozen!
Now let's check the number of trainable and non-trainable parameters.
count_parameters(model=base_model, print_output=True)
Model efficientnetv2-b0 parameter counts: Total parameters: 5919312.0 Trainable parameters: 0.0 Non-trainable parameters: 5919312
Beautiful!
Looks like all of the parameters in our base_model are now non-trainable (frozen).
This means they won't be updated during training.
Passing data through our model¶
We've spoken a couple of times about how our base_model is a "feature extractor" or "pattern extractor".
But what does this mean?
It means that when a data sample goes through the base_model, its numbers get manipulated into a compressed set of features.
In other words, the layers of the model will each perform a calculation on the sample eventually leading to an output tensor with patterns the model has deemed most important.
This is often referred to as a compressed feature space.
That's one of the central ideas of deep learning.
Take a large input (e.g. an image tensor of shape [224, 224, 3]) and compress it into a smaller output (e.g. a feature vector of shape [1280]) that captures a useful representation of the input.
Example of how a model can take an input piece of data and compress its representation into a feature vector with much lower dimensionality than the original data.
Note: A feature vector is also referred to as an embedding, a compressed representation of a data sample that makes it useful. The concept of embeddings is not limited to images either, the concept of embeddings stretches across all data types (text, images, video, audio + more).
We can see this in action by passing a single image through our base_model.
# Extract features from a single image using our base model
feature_extraction = base_model(image_batch[0])
feature_extraction
--------------------------------------------------------------------------- ValueError Traceback (most recent call last) <ipython-input-69-957d897dc1dc> in <cell line: 2>() 1 # Extract features from a single image using our base model ----> 2 feature_extraction = base_model(image_batch[0]) 3 feature_extraction /usr/local/lib/python3.10/dist-packages/keras/src/utils/traceback_utils.py in error_handler(*args, **kwargs) 68 # To get the full stack trace, call: 69 # `tf.debugging.disable_traceback_filtering()` ---> 70 raise e.with_traceback(filtered_tb) from None 71 finally: 72 del filtered_tb /usr/local/lib/python3.10/dist-packages/keras/src/engine/input_spec.py in assert_input_compatibility(input_spec, inputs, layer_name) 296 if spec_dim is not None and dim is not None: 297 if spec_dim != dim: --> 298 raise ValueError( 299 f'Input {input_index} of layer "{layer_name}" is ' 300 "incompatible with the layer: " ValueError: Input 0 of layer "efficientnetv2-b0" is incompatible with the layer: expected shape=(None, 224, 224, 3), found shape=(224, 224, 3)
Oh no!
Another error...
ValueError: Input 0 of layer "efficientnetv2-b0" is incompatible with the layer: expected shape=(None, 224, 224, 3), found shape=(224, 224, 3)
We've stumbled upon one of the most common errors in machine learning, shape errors.
In our case, the shape of the data we're trying to put into the model doesn't match the input shape the model is expecting.
Our input data shape is (224, 224, 3), i.e. (height, width, colour_channels), however, our model is expecting (None, 224, 224, 3), i.e. (batch_size, height, width, colour_channels).
We can fix this error by adding a single batch_size dimension to our input, making it (1, 224, 224, 3) (a batch_size of 1 for a single sample).
To do so, we can use tf.expand_dims(input=target_sample, axis=0), where target_sample is our input tensor and axis=0 means we want to expand the first dimension.
# Current image shape
shape_of_image_without_batch = image_batch[0].shape
# Add a batch dimension to our single image
shape_of_image_with_batch = tf.expand_dims(input=image_batch[0], axis=0).shape
print(f"Shape of image without batch: {shape_of_image_without_batch}")
print(f"Shape of image with batch: {shape_of_image_with_batch}")
Shape of image without batch: (224, 224, 3) Shape of image with batch: (1, 224, 224, 3)
Perfect!
Now let's pass this image with a batch dimension to our base_model.
# Extract features from a single image using our base model
feature_extraction = base_model(tf.expand_dims(image_batch[0], axis=0))
feature_extraction
<tf.Tensor: shape=(1, 7, 7, 1280), dtype=float32, numpy= array([[[[-2.19177201e-01, -3.44185606e-02, -1.40321642e-01, ..., -1.44454449e-01, -2.73809850e-01, -7.41252452e-02], [-8.69670734e-02, -6.48750067e-02, -2.14546964e-01, ..., -4.57209721e-02, -2.77900100e-01, -8.20885971e-02], [-2.76872963e-01, -8.26781020e-02, -3.85153107e-02, ..., -2.72128999e-01, -2.52802134e-01, -2.28105962e-01], ..., [-1.01604000e-01, -3.55145968e-02, -2.23027021e-01, ..., -2.26227805e-01, -8.61771777e-02, -1.60450727e-01], [-5.87608740e-02, -4.65543661e-03, -1.06193267e-01, ..., -2.87548676e-02, -9.06914026e-02, -1.82624385e-01], [-6.27618432e-02, -1.38620799e-03, 1.52704502e-02, ..., -7.85450079e-03, -1.84584558e-01, -2.62404829e-01]], [[-2.17334151e-01, -1.10280879e-01, -2.74605274e-01, ..., -2.22405165e-01, -2.74738282e-01, -1.01998925e-01], [-1.40700653e-01, -1.66820198e-01, -2.77449101e-01, ..., 2.40375683e-01, -2.77627349e-01, -9.07808691e-02], [-2.40916476e-01, -2.00582087e-01, -2.38370374e-01, ..., -8.27576742e-02, -2.78428614e-01, -1.23056054e-01], ..., [-2.67296195e-01, -5.43131726e-03, -6.44061863e-02, ..., -3.34720500e-02, -1.55141622e-01, -3.23073938e-02], [-2.66513556e-01, -2.09966358e-02, -1.50375053e-01, ..., -6.29274473e-02, -2.69798309e-01, -2.74081439e-01], [-8.39830115e-02, -1.58605091e-02, -2.78447241e-01, ..., -1.43555822e-02, -2.77474761e-01, 1.37483165e-01]], [[-2.15840712e-01, 4.50323820e-01, -7.51058161e-02, ..., -2.43637279e-01, -2.75048614e-01, -6.00421876e-02], [-2.39066556e-01, -2.25066260e-01, -4.89832312e-02, ..., -2.77957618e-01, -1.14677951e-01, -2.69968715e-02], [-1.60943881e-01, -2.12972730e-01, -1.08622171e-01, ..., -2.78464079e-01, -1.95970193e-01, -2.92074662e-02], ..., [-2.67642140e-01, -7.13412274e-10, -2.47387841e-01, ..., -1.27752789e-03, 1.69062471e+00, -1.07747754e-02], [-2.69456387e-01, -3.02123808e-05, -2.19904676e-01, ..., -1.19841937e-02, 6.54936790e-01, 4.92877871e-01], [-1.83339473e-02, -9.84105989e-02, -2.77752399e-01, ..., -9.53171253e-02, -2.76987553e-01, -1.81873620e-01]], ..., [[-6.59235120e-02, -1.64803467e-03, -1.58951283e-01, ..., -1.34164095e-01, -6.30896613e-02, -7.77927637e-02], [-1.83377475e-01, -4.98497509e-04, -1.57654762e-01, ..., -4.48885784e-02, -1.06884383e-01, -2.78372377e-01], [-2.45749369e-01, -9.95399058e-03, -1.79216102e-01, ..., -1.02837617e-02, -1.84168354e-01, -1.70697242e-01], ..., [ 2.22050592e-01, -2.04384560e-04, -1.46467671e-01, ..., -2.65387502e-02, -1.85434178e-01, -9.71652716e-02], [ 1.52228832e+00, -3.39617883e-03, -3.22414264e-02, ..., -1.19287046e-02, -1.46435276e-01, -8.73169452e-02], [-1.89164400e-01, -5.49114570e-02, -2.05218419e-01, ..., -1.32163316e-01, -1.48950770e-01, -1.18042991e-01]], [[-2.16520607e-01, -7.84920622e-03, -1.43650264e-01, ..., -1.73660204e-01, -4.83706780e-02, -3.76228467e-02], [-2.78293848e-01, -6.24539470e-03, -2.28590608e-01, ..., -2.06465453e-01, -1.93291768e-01, -9.23046917e-02], [-2.40500003e-01, -2.73558766e-01, -1.58736348e-01, ..., -4.13209312e-02, -2.64240265e-01, -3.26484852e-02], ..., [-2.31358394e-01, -2.72292078e-01, -6.80670887e-02, ..., -2.16453914e-02, -2.71368980e-01, -3.88960652e-02], [-2.45319903e-01, -2.78179497e-01, -6.18890636e-02, ..., -1.86282583e-02, -2.23804727e-01, -2.72233319e-02], [-2.31111392e-01, -2.37449735e-01, -5.13911694e-02, ..., -4.55225781e-02, -2.74753064e-01, -3.51530202e-02]], [[-3.96142267e-02, -1.39998682e-02, -9.56050456e-02, ..., -2.33392462e-01, -1.83407709e-01, -4.99856956e-02], [-2.60713607e-01, -3.96164991e-02, -1.29626304e-01, ..., -2.78417081e-01, 
-2.78285533e-01, -7.70441368e-02], [-8.02241415e-02, -2.30456606e-01, -1.13508031e-01, ..., -5.45607917e-02, -2.71063268e-01, -2.75666509e-02], ..., [-9.41052362e-02, -2.42691532e-01, -5.48249595e-02, ..., -2.13044193e-02, -2.63691694e-01, -9.28506851e-02], [-9.08804908e-02, -2.40457997e-01, -7.88932368e-02, ..., -3.80579121e-02, -2.71065891e-01, -4.05692160e-02], [-1.26358300e-01, -2.17053503e-01, -7.44825602e-02, ..., -5.66985942e-02, -2.75216103e-01, -6.91162944e-02]]]], dtype=float32)>
Woah! Look at all those numbers!
After passing through ~270 layers, this is the numerical representation our model has created of our input image.
You might be thinking, okay, there's a lot here, how can I possibly understand all of them?
Well, with enough effort, you might.
However, these numbers are more for a model/computer to understand than for a human to understand.
Let's not stop there, let's check the shape of our feature_extraction.
# Check shape of feature extraction
feature_extraction.shape
TensorShape([1, 7, 7, 1280])
Ok, looks like our model has compressed our input image into a lower dimensional feature space.
Note: Feature space (or latent space or embedding space) is a numerical region where pieces of data are represented by tensors of various dimensions. Feature space is hard for humans to imagine because it could be 1000s of dimensions (humans are only good at imagining 3-4 dimensions at max). But you can think of feature space as an area where numerical representations of similar items will be close together. If feature space was a grocery store, one breed of dogs may be in one aisle (similar numbers) whereas another breed of dogs may be in the next aisle. You can see an example of a large embedding space representation of 8M Stack Overflow questions on Nomic Atlas.
Let's compare the new shape to the input shape.
num_input_features = 224*224*3
feature_extraction_features = 1*7*7*1280
# Calculate the compression ratio
num_input_features / feature_extraction_features
2.4
Looks like our model has compressed the numerical representation of our input image by 2.4x so far.
But you might've noticed our feature_extraction is still a tensor.
How about we take it further and turn it into a vector and compress the representation even further?
We can do so by taking our feature_extraction tensor and pooling together the inner dimensions.
By pooling, I mean taking the average or the maximum values.
Why?
Because a neural network often outputs a large amount of learned feature values but many of them can be insignificant compared to others.
So taking the average or the max across them helps us compress the representation further while still preserving the most important features.
This process is often referred to as:
- Average pooling - Take the average across given dimensions of a tensor, can be performed with tf.keras.layers.GlobalAveragePooling2D().
- Max pooling - Take the maximum value across given dimensions of a tensor, can be performed with tf.keras.layers.GlobalMaxPooling2D().
Let's try applying average pooling to our feature extraction and see what happens.
# Turn feature extraction into a feature vector
feature_vector = tf.keras.layers.GlobalAveragePooling2D()(feature_extraction) # pass feature_extraction to the pooling layer
feature_vector
<tf.Tensor: shape=(1, 1280), dtype=float32, numpy= array([[-0.11521906, -0.04476562, -0.12476546, ..., -0.09118073, -0.08420841, -0.07769417]], dtype=float32)>
Ho, ho!
Looks like we've compressed our feature_extraction tensor into a feature vector (notice the new shape of (1, 1280)).
Now if you're not sure what all these numbers mean, that's okay. I don't either.
A feature vector (also called an embedding) is supposed to be a numerical representation that's meaningful to computers.
We'll perform a few more transforms on it before it's recognizable to us.
Let's check out its shape.
# Check out the feature vector shape
feature_vector.shape
TensorShape([1, 1280])
We've reduced the shape of feature_extraction from (1, 7, 7, 1280) to (1, 1280) (we've gone from a tensor with multiple dimensions to a vector with one dimension of size 1280).
Our neural network has performed calculations on our image and it is now represented by 1280 numbers.
This is one of the main goals of deep learning, to reduce higher dimensional information into a lower dimensional but still representative space.
Let's calculate how much we've reduced the dimensionality of our single input image.
# Compare the reduction
num_input_features = 224*224*3
feature_extraction_features = 1*7*7*1280
feature_vector_features = 1*1280
print(f"Input -> feature extraction reduction factor: {num_input_features / feature_extraction_features}")
print(f"Feature extraction -> feature vector reduction factor: {feature_extraction_features / feature_vector_features}")
print(f"Input -> feature extraction -> feature vector reduction factor: {num_input_features / feature_vector_features}")
Input -> feature extraction reduction factor: 2.4 Feature extraction -> feature vector reduction factor: 49.0 Input -> feature extraction -> feature vector reduction factor: 117.6
A 117.6x reduction from our original image to its feature vector representation!
Why compress the representation like this?
Because representing our data in a compressed format but still with meaningful numbers (to a computer) means that less computation is required to reuse the patterns.
For example, imagine you have to relearn how to spell words every time you use them.
Would this be efficient?
Not at all.
Instead, you take a while to learn them at the start and then continually reuse this knowledge over time.
This is the same with a deep learning model.
It learns representative patterns in data, figures out the ideal connections between inputs and outputs and then reuses them over time in the form of numerical weights.
Going from image to feature vector (practice)¶
We've covered a fair bit in the past few sections.
So let's practice.
The important takeaway is that one of the main goals of deep learning is to create a model that is able to take some kind of high dimensional data (e.g. an image tensor, a text tensor, an audio tensor) and extract meaningful patterns in it whilst compressing it to a lower dimensional form (e.g. a feature vector or embedding).
We can then use this lower dimensional form for our specific use cases.
And one of the most powerful ways to do this is with transfer learning.
Taking an existing model from a similar domain to yours and applying it to your own problem.
To practice turning a data sample into a feature vector, let's start by recreating a base_model instance.
This time, we can add in a pooling layer automatically using pooling="avg" or pooling="max".
Note: I demonstrated the use of the tf.keras.layers.GlobalAveragePooling2D() layer because not all pretrained models come with a pooling layer built-in.
# Create a base model with no top and a pooling layer built-in
base_model = tf.keras.applications.efficientnet_v2.EfficientNetV2B0(
include_top=False,
weights="imagenet",
input_shape=INPUT_SHAPE,
pooling="avg", # can also use "max"
include_preprocessing=True,
)
# Check the summary (optional)
# base_model.summary()
# Check the output shape
base_model.output_shape
(None, 1280)
Boom!
We get the same output shape from the base_model as we did when using it with a separate pooling layer, thanks to setting pooling="avg".
Let's now freeze these base weights, so they're not trainable.
# Freeze the base weights
base_model.trainable = False
# Count the parameters
count_parameters(model=base_model, print_output=True)
Model efficientnetv2-b0 parameter counts: Total parameters: 5919312.0 Trainable parameters: 0.0 Non-trainable parameters: 5919312
And now we can pass an image through our base model and get a feature vector from it.
# Get a feature vector of a single image (don't forget to add a batch dimension)
feature_vector_2 = base_model(tf.expand_dims(image_batch[0], axis=0))
feature_vector_2
<tf.Tensor: shape=(1, 1280), dtype=float32, numpy= array([[-0.11521906, -0.04476562, -0.12476546, ..., -0.09118073, -0.08420841, -0.07769417]], dtype=float32)>
Wonderful!
Now, is this the same as our original feature_vector?
We can find out by comparing feature_vector and feature_vector_2 and seeing if all of the values are the same with np.all().
# Compare the two feature vectors
np.all(feature_vector == feature_vector_2)
True
Perfect!
Let's put it all together and create a full model for our dog vision problem.
Creating a custom model for our dog vision problem¶
The main steps when creating any kind of deep learning model from scratch are:
- Define the input layer(s).
- Define the middle layer(s).
- Define the output layer(s).
These sound broad because they are. Deep learning models are almost infinitely customizable.
Good news is, thanks to transfer learning, all of our middle layers are defined by base_model (you could argue the input layer is created too).
So now it's up to us to define our input and output layers.
TensorFlow/Keras have two main ways of connecting layers to form a model.
- The Sequential model (tf.keras.Sequential) - Useful for making simple models with one tensor in and one tensor out, not suited for complex models.
- The Functional API - Useful for making more complex and multi-step models but can also be used for simple models.
Let's start with the Sequential model.
It takes a list of layers and will pass data through them sequentially.
Our base_model will be the input and middle layers and we'll use a tf.keras.layers.Dense() layer as the output (we'll discuss this shortly).
Creating a model with the Sequential API¶
The Sequential API is the most straightforward way to create a model.
Your model comes in the form of a list of layers from input to middle layers to output.
Each layer is executed sequentially.
# Create a sequential model
tf.random.set_seed(42)
sequential_model = tf.keras.Sequential([base_model, # input and middle layers
tf.keras.layers.Dense(units=len(dog_names), # output layer
activation="softmax")])
sequential_model.summary()
Model: "sequential" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= efficientnetv2-b0 (Functio (None, 1280) 5919312 nal) dense (Dense) (None, 120) 153720 ================================================================= Total params: 6073032 (23.17 MB) Trainable params: 153720 (600.47 KB) Non-trainable params: 5919312 (22.58 MB) _________________________________________________________________
Wonderful!
We've now got a model with 6,073,032 parameters, however, only 153,720 of them (the ones in the dense layer) are trainable.
Our dense layer (also called a fully-connected layer or feed-forward layer) takes the outputs of the base_model and performs further calculations on them to map them to our required number of classes (120 for the number of dog breeds).
We use activation="softmax" (the Softmax function) to get prediction probabilities, values between 0 and 1 which represent how much our model "thinks" a specific image relates to a certain class.
There's another common activation function called Sigmoid. If we only had two classes, for example, "dog" or "cat", we'd lean towards using this function.
Confusing, yes, but you'll get used to different functions with practice.
The following table summarizes a few use cases.
Activation Function | Use Cases | Code |
---|---|---|
Sigmoid | - When you have two choices (like yes or no, true or false). - In binary classification, where you're deciding between one thing or another (like if an email is spam or not spam). - When you want the output to be a probability between 0 and 1. | tf.keras.activations.sigmoid or activation="sigmoid" |
Softmax | - When you have more than two choices. - In multi-class classification, like if you're trying to decide if a picture is of a dog, a cat, a horse, or a bird. - When you want to compare the probabilities across different options and pick the most likely one. | tf.keras.activations.softmax or activation="softmax" |
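To see the difference in action, here's a minimal sketch using some made-up raw model outputs (logits), assuming TensorFlow is imported as tf (as earlier in this notebook).
# Some made-up raw model outputs (logits) for a 3-class example
logits = tf.constant([2.0, 1.0, 0.1])
# Sigmoid squashes each value independently into the range 0-1 (values don't need to sum to 1)
sigmoid_outputs = tf.keras.activations.sigmoid(logits)
# Softmax turns the whole vector into a probability distribution (values sum to 1)
softmax_outputs = tf.keras.activations.softmax(tf.expand_dims(logits, axis=0)) # softmax expects at least 2 dimensions
print(f"Sigmoid outputs: {sigmoid_outputs.numpy()}")
print(f"Softmax outputs: {softmax_outputs.numpy()}")
print(f"Softmax outputs sum: {tf.reduce_sum(softmax_outputs).numpy()}") # ~1.0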
Now our model is built, let's check our input and output shapes.
# Check the input shape
sequential_model.input_shape
(None, 224, 224, 3)
# Check the output shape
sequential_model.output_shape
(None, 120)
Beautiful!
Our sequential model takes in an image tensor of size [None, 224, 224, 3] and outputs a vector of shape [None, 120], where None is the batch size we specify.
Let's try our sequential model out with a single image input.
# Get a single image with a batch size of 1
single_image_input = tf.expand_dims(image_batch[0], axis=0)
# Pass the image through our model
single_image_output_sequential = sequential_model(single_image_input)
# Check the output
single_image_output_sequential
<tf.Tensor: shape=(1, 120), dtype=float32, numpy= array([[0.00783153, 0.01119391, 0.00476165, 0.0072348 , 0.00766934, 0.00753752, 0.00522398, 0.02337082, 0.00579716, 0.00539333, 0.00549823, 0.01011768, 0.00610076, 0.0109506 , 0.00540159, 0.0079683 , 0.01227358, 0.01056393, 0.00507148, 0.00996652, 0.00604106, 0.00729022, 0.0155036 , 0.00745004, 0.00628229, 0.00796217, 0.00905823, 0.00712278, 0.01243507, 0.006427 , 0.00602891, 0.01276839, 0.00652441, 0.00842482, 0.01247454, 0.00749902, 0.01086363, 0.007803 , 0.0058652 , 0.00474356, 0.00902809, 0.00715358, 0.00981051, 0.00444271, 0.01031628, 0.00691859, 0.00699083, 0.0065892 , 0.00966169, 0.01177148, 0.00908043, 0.00729699, 0.00496712, 0.00509035, 0.00584058, 0.01068885, 0.00817651, 0.00602052, 0.00901201, 0.01008151, 0.00495409, 0.01285929, 0.00480146, 0.0108622 , 0.01421483, 0.00814719, 0.00910061, 0.00798947, 0.00789293, 0.00636969, 0.00656019, 0.01309155, 0.00754355, 0.00702062, 0.00485884, 0.00958675, 0.01086809, 0.00682202, 0.00923016, 0.00856321, 0.00482627, 0.01234931, 0.01140433, 0.00771413, 0.01140642, 0.00382939, 0.00891482, 0.00409833, 0.00771865, 0.00652135, 0.00668143, 0.00935989, 0.00784146, 0.00751913, 0.00785116, 0.00794632, 0.0079146 , 0.00798953, 0.01011222, 0.01318719, 0.00721227, 0.00736159, 0.01369175, 0.01087009, 0.00510072, 0.00843218, 0.00451756, 0.00966478, 0.01013771, 0.00715721, 0.00367131, 0.00825834, 0.00832634, 0.01225684, 0.00724481, 0.00670675, 0.00536995, 0.01070637, 0.00937007, 0.00998812]], dtype=float32)>
Nice!
Our model has output a tensor of prediction probabilities in shape [1, 120], one value for each of our dog classes.
Thanks to the softmax function, all of these values are between 0 and 1 and they should all add up to 1 (or close to it).
# Sum the output
np.sum(single_image_output_sequential)
1.0
Beautiful!
Now how do we figure out which of the values our model thinks is most likely?
We take the index of the highest value!
We can find the index of the highest value using tf.argmax() or np.argmax().
We'll get the highest value (not the index) alongside it.
Let's try.
# Find the index with the highest value
highest_value_index_sequential_model_output = np.argmax(single_image_output_sequential)
highest_value_sequential_model_output = np.max(single_image_output_sequential)
print(f"Highest value index: {highest_value_index_sequential_model_output} ({dog_names[highest_value_index_sequential_model_output]})")
print(f"Prediction probability: {highest_value_sequential_model_output}")
Highest value index: 7 (basenji) Prediction probability: 0.023370817303657532
Note: These values may change every run due to the model/data being randomly initialized; don't worry too much about yours being different, in machine learning randomness is a good thing.
This prediction probability value is quite low.
With the highest potential value being 1.0, it means the model isn't very confident in its prediction.
Let's check the original label value of our single image.
# Check the original label value
print(f"Predicted value: {highest_value_index_sequential_model_output}")
print(f"Actual value: {tf.argmax(label_batch[0]).numpy()}")
Predicted value: 7 Actual value: 95
Oh no! Looks like our model predicted the wrong label (or if it got it right, it was by pure chance).
This is to be expected.
Although our model comes with pretrained parameters from ImageNet, the dense layer we added on the end is initialized with random parameters.
So in essence, our model is randomly guessing what the label should be.
How do we fix this?
We can train the model to adjust its trainable parameters to better suit the data we're working with.
For completeness let's check out the text-based label our model predicted versus the original label.
# Index on class_names with our model's highest prediction probability
sequential_model_predicted_label = class_names[tf.argmax(sequential_model(tf.expand_dims(image_batch[0], axis=0)), axis=1).numpy()[0]]
# Get the truth label
single_image_ground_truth_label = class_names[tf.argmax(label_batch[0])]
# Print predicted and ground truth labels
print(f"Sequential model predicted label: {sequential_model_predicted_label}")
print(f"Ground truth label: {single_image_ground_truth_label}")
Sequential model predicted label: basenji Ground truth label: schipperke
Creating a model with the Functional API¶
As mentioned before, the Keras Functional API is a way/design pattern for creating more complex models.
It can include multiple different modelling steps.
But it can also be used for simple models.
And it's the way we'll construct our Dog Vision models going forward.
Let's recreate our sequential_model using the Functional API.
We'll follow the same process as mentioned before:
- Define the input layer(s).
- Define the middle/hidden layer(s).
- Define the output layer(s).
- Bonus: Connect the inputs and outputs within an instance of tf.keras.Model().
# 1. Create input layer
inputs = tf.keras.Input(shape=INPUT_SHAPE)
# 2. Create hidden layer
x = base_model(inputs, training=False)
# 3. Create the output layer
outputs = tf.keras.layers.Dense(units=len(class_names), # one output per class
activation="softmax",
name="output_layer")(x)
# 4. Connect the inputs and outputs together
functional_model = tf.keras.Model(inputs=inputs,
outputs=outputs,
name="functional_model")
# Get a model summary
functional_model.summary()
Model: "functional_model" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= input_4 (InputLayer) [(None, 224, 224, 3)] 0 efficientnetv2-b0 (Functio (None, 1280) 5919312 nal) output_layer (Dense) (None, 120) 153720 ================================================================= Total params: 6073032 (23.17 MB) Trainable params: 153720 (600.47 KB) Non-trainable params: 5919312 (22.58 MB) _________________________________________________________________
Functional model created!
Let's try it out.
It works in the same fashion as our sequential_model.
# Pass a single image through our functional_model
single_image_output_functional = functional_model(single_image_input)
# Find the index with the highest value
highest_value_index_functional_model_output = np.argmax(single_image_output_functional)
highest_value_functional_model_output = np.max(single_image_output_functional)
highest_value_index_functional_model_output, highest_value_functional_model_output
(69, 0.017855722)
Nice!
Looks like we got a slightly different value to our sequential_model (or they may be the same if randomness wasn't so random).
Why is this?
Because our functional_model was initialized with a random tf.keras.layers.Dense layer as well.
So the outputs of our functional_model are essentially random as well (neural networks start with random numbers and adjust them to better represent patterns in data).
Not to fear, we'll fix this soon when we train our model.
Right now we've created our model with a few scattered lines of code.
How about we functionize the model creation so we can repeat it later on?
Functionizing model creation¶
We've created two different kinds of models so far.
Each of which uses the same layers.
Except one was with the Keras Sequential API and the other was with the Keras Functional API.
However, it would be quite tedious to rewrite that modelling code every time we wanted to create a new model.
So let's create a function called create_model() to replicate the model creation step with the Functional API.
Note: We're focused on the Functional API since it takes a bit more practice than the Sequential API.
def create_model(include_top: bool = False,
num_classes: int = 1000,
input_shape: tuple[int, int, int] = (224, 224, 3),
include_preprocessing: bool = True,
trainable: bool = False,
dropout: float = 0.2,
model_name: str = "model") -> tf.keras.Model:
"""
Create an EfficientNetV2 B0 feature extractor model with a custom classifier layer.
Args:
include_top (bool, optional): Whether to include the top (classifier) layers of the model.
num_classes (int, optional): Number of output classes for the classifier layer.
input_shape (tuple[int, int, int], optional): Input shape for the model's images (height, width, channels).
include_preprocessing (bool, optional): Whether to include preprocessing layers for image normalization.
trainable (bool, optional): Whether to make the base model trainable.
dropout (float, optional): Dropout rate for the global average pooling layer.
model_name (str, optional): Name for the created model.
Returns:
tf.keras.Model: A TensorFlow Keras model with the specified configuration.
"""
# Create base model
base_model = tf.keras.applications.efficientnet_v2.EfficientNetV2B0(
include_top=include_top,
weights="imagenet",
input_shape=input_shape,
include_preprocessing=include_preprocessing,
pooling="avg" # Can use this instead of adding tf.keras.layers.GlobalPooling2D() to the model
# pooling="max" # Can use this instead of adding tf.keras.layers.MaxPooling2D() to the model
)
# Freeze the base model (if necessary)
base_model.trainable = trainable
# Create input layer
inputs = tf.keras.Input(shape=input_shape, name="input_layer")
# Create model backbone (middle/hidden layers)
x = base_model(inputs, training=trainable)
# x = tf.keras.layers.GlobalAveragePooling2D()(x) # note: you should include pooling here if not using `pooling="avg"`
# x = tf.keras.layers.Dropout(0.2)(x) # optional regularization layer (search "dropout" for more)
# Create output layer (also known as "classifier" layer)
outputs = tf.keras.layers.Dense(units=num_classes,
activation="softmax",
name="output_layer")(x)
# Connect input and output layer
model = tf.keras.Model(inputs=inputs,
outputs=outputs,
name=model_name)
return model
What a beautiful function!
Let's try it out.
# Create a model
model_0 = create_model(num_classes=len(class_names))
model_0.summary()
Model: "model" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= input_layer (InputLayer) [(None, 224, 224, 3)] 0 efficientnetv2-b0 (Functio (None, 1280) 5919312 nal) output_layer (Dense) (None, 120) 153720 ================================================================= Total params: 6073032 (23.17 MB) Trainable params: 153720 (600.47 KB) Non-trainable params: 5919312 (22.58 MB) _________________________________________________________________
Woohoo! Looks like it worked!
Now how about we inspect each of the layers and whether they're trainable?
for layer in model_0.layers:
print(layer.name, layer.trainable)
input_layer True efficientnetv2-b0 False output_layer True
Nice, looks like our base_model (efficientnetv2-b0) is frozen (it's not trainable).
And our output_layer is trainable.
This means we'll be reusing the patterns learned in the base_model to feed into our output_layer and then customizing those parameters to suit our own problem.
7. Model 0 - Train a model on 10% of the training data¶
We've seen our model make a couple of predictions on our data.
And so far it hasn't done so well.
This is expected though.
Our model is essentially predicting random class values given an image.
Let's change that.
How?
By training the final layer of our model to recognize the different dog breeds in our images.
We can do so via five steps:
- Creating the model - We've done this ✅.
- Compiling the model - Here's where we'll tell the model how to improve itself and how to measure its performance.
- Fitting the model - Here's where we'll show the model examples of what we'd like it to learn (e.g. batches of samples containing pairs of dog images and their breed).
- Evaluating the model - Once our model is trained on the training data, we can evaluate it on the testing data (data the model has never seen).
- Making a custom prediction - Finally, the best way to test a machine learning model is by seeing how it goes on custom data. This is where we'll try to make a prediction on our own custom images of dogs.
We'll work through each of these over the next few sections.
To begin, let's create a model.
To do so, we can use our create_model() function that we made earlier.
# 1. Create model
model_0 = create_model(num_classes=len(class_names),
model_name="model_0")
model_0.summary()
Model: "model_0" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= input_layer (InputLayer) [(None, 224, 224, 3)] 0 efficientnetv2-b0 (Functio (None, 1280) 5919312 nal) output_layer (Dense) (None, 120) 153720 ================================================================= Total params: 6073032 (23.17 MB) Trainable params: 153720 (600.47 KB) Non-trainable params: 5919312 (22.58 MB) _________________________________________________________________
Model created!
How about we compile it?
Compiling a model¶
After we've created a model, the next step is to compile it.
If creating a model is putting together learning blocks, compiling a model is getting those learning blocks ready to learn.
We can compile our model_0 using the tf.keras.Model.compile() method.
There are many options we can pass to the compile() method, however, the main ones we'll be focused on are:
- The optimizer - this tells the model how to improve based on the loss value.
- The loss function - this measures how wrong the model is (e.g. how far off are its predictions from the truth, an ideal loss value is 0, meaning the model is perfectly predicting the data).
- The metric(s) - this is a human-readable value that shows how your model is performing, for example, accuracy is often used as an evaluation metric.
These three settings work together to help improve a model.
Which optimizer should I use?¶
An optimizer tells a model how to improve its internal parameters (weights) to hopefully improve a loss value.
In most cases, improving the loss means to minimize it (a loss value is a measure of how wrong your model's predictions are, a perfect model will have a loss value of 0).
It does this through a process called gradient descent.
The gradients needed for gradient descent are calculated through backpropagation, a method that computes the gradient of the loss function with respect to each weight in the model.
Once the gradients have been calculated, the optimizer then tries to update the model weights so that they move in the opposite direction of the gradient (if you go down the gradient of a function, you reduce its value).
If you've never heard of the above processes, that's okay.
TensorFlow implements many of them behind the scenes.
For now, the main takeaway is that neural networks learn in the following fashion:
Start with random patterns/weights -> Look at data (forward pass) -> Try to predict data (with current weights) -> Measure performance of predictions (loss function, backpropagation calculates gradients of loss with respect to weights) -> Update patterns/weights (optimizer, gradient descent adjusts weights in the opposite direction of the gradients to minimize loss) -> Look at data (forward pass) -> Try to predict data (with updated weights) -> Measure performance (loss function) -> Update patterns/weights (optimizer) -> Repeat all of the above X times.
Example of how a neural network learns (in brief). Note the cyclical nature of the learning. You can think of it as a big game of guess and check, where the guesses (hopefully) get better over time.
I'll leave the intricacies of gradient descent and backpropagation to your own extra-curricula research.
We're going to focus on using the tools TensorFlow has to offer to implement this process.
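To make this learning loop more concrete, here's a minimal sketch of a single manual training step using tf.GradientTape. The tiny model and made-up data below are placeholders for illustration only; in practice, model.fit() runs this loop for us.
import tensorflow as tf
# A tiny placeholder model and some made-up data (illustration only, not our Dog Vision setup)
toy_model = tf.keras.Sequential([tf.keras.layers.Dense(units=3, activation="softmax")])
toy_data = tf.random.normal(shape=(8, 4)) # 8 samples, 4 features each
toy_labels = tf.one_hot(indices=[0, 1, 2, 0, 1, 2, 0, 1], depth=3) # one-hot labels for 3 classes
loss_fn = tf.keras.losses.CategoricalCrossentropy()
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
# One "guess and check" step: forward pass -> measure loss -> calculate gradients -> update weights
with tf.GradientTape() as tape:
    predictions = toy_model(toy_data, training=True) # forward pass (try to predict)
    loss_value = loss_fn(toy_labels, predictions) # measure how wrong the predictions are
gradients = tape.gradient(loss_value, toy_model.trainable_variables) # backpropagation
optimizer.apply_gradients(zip(gradients, toy_model.trainable_variables)) # gradient descent update
print(f"Loss after one step: {loss_value.numpy():.4f}")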
As for optimizer functions, there are two main options to get started:
Optimizer | Code |
---|---|
Stochastic Gradient Descent (SGD) | tf.keras.optimizers.SGD() or "sgd" for short. |
Adam | tf.keras.optimizers.Adam() or "adam" for short. |
Why these two?
Because they're the most often used in practice (you can see this via the number of machine learning papers referencing each one on paperswithcode.com).
There are many more optimizers available in the tf.keras.optimizers module too.
The good thing about using a premade optimizer from tf.keras.optimizers is that they usually come with good starting settings.
One of the main ones being the learning_rate value.
The learning_rate is one of the most important hyperparameters to set in a neural network training setup.
It determines how large a step the optimizer takes when adjusting your model's weights each iteration.
Too low and the model won't learn.
Too high and the model will try to take steps that are too big.
By default, TensorFlow sets the learning rate of the Adam optimizer to 0.001 (tf.keras.optimizers.Adam(learning_rate=0.001)), which is a good setting for many problems to get started with.
We can also set this default with the shortcut optimizer="adam".
For more on finding the optimal learning rate, try searching for "finding the optimal learning rate for neural networks".
# Create optimizer (short version)
optimizer = "adam"
# The above line is the same as below
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
optimizer
<keras.src.optimizers.adam.Adam at 0x7f3bb4107040>
Which loss function should I use?¶
A loss function measures how wrong your model's predictions are.
A model with poor predictions in comparison to the truth data will have a high loss value.
Whereas a model with perfect predictions (e.g. it gets every prediction correct) will have a loss value of 0.
Different problems have different loss functions.
Some of the most common ones include:
Loss Function | Problem Type | Code |
---|---|---|
Mean Absolute Error (MAE) | Regression (predicting a number) | tf.keras.losses.MeanAbsoluteError or "mae" for short |
Mean Squared Error (MSE) | Regression (predicting a number) | tf.keras.losses.MeanSquaredError |
Binary Cross Entropy (BCE) | Binary classification | tf.keras.losses.BinaryCrossentropy |
Categorical Cross Entropy | Multi-class classification | tf.keras.losses.CategoricalCrossentropy if your labels are one-hot encoded (e.g. [0, 0, 0, 0, 1, 0...] ) or tf.keras.losses.SparseCategoricalCrossentropy if your labels are integers (e.g. [[1], [23], [43], [16]...] ) |
In our case, since we're working with multi-class classification (multiple different dog breeds) and our labels are one-hot encoded, we'll be using tf.keras.losses.CategoricalCrossentropy.
We can leave all of the default parameters as they are as well.
However, if we didn't have activation="softmax" in the final layer of our model, we'd have to change from_logits=False to from_logits=True, since without the softmax activation our model would output raw logits rather than prediction probabilities (the softmax activation does this conversion for us).
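As a quick sketch (with made-up label and logit values) of what this means in practice:
# Made-up one-hot label and raw model outputs (logits) for a 3-class example
y_true = tf.constant([[0.0, 1.0, 0.0]])
logits = tf.constant([[1.0, 3.0, 0.5]])
# Option 1: no softmax in the final layer -> pass raw logits with from_logits=True
loss_from_logits = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
print(loss_from_logits(y_true, logits).numpy())
# Option 2: final layer uses activation="softmax" -> pass probabilities with from_logits=False (the default)
pred_probs = tf.keras.activations.softmax(logits)
loss_from_probs = tf.keras.losses.CategoricalCrossentropy(from_logits=False)
print(loss_from_probs(y_true, pred_probs).numpy()) # (roughly) the same loss value as above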
There are more loss functions than the ones we've discussed and you can see many of them on paperswithcode.com.
TensorFlow also has many more loss function implementations available in tf.keras.losses.
Let's check out a single sample of our labels to make sure they're one-hot encoded.
# Check that our labels are one-hot encoded
label_batch[0]
<tf.Tensor: shape=(120,), dtype=float32, numpy= array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32)>
Excellent! Looks like our labels are indeed one-hot encoded.
Now let's create our loss function as tf.keras.losses.CategoricalCrossentropy(from_logits=False) or "categorical_crossentropy" for short.
We set from_logits=False (this is the default) because our model uses activation="softmax" in the final layer, so it's outputting prediction probabilities rather than logits (without activation="softmax" the outputs of our model would be referred to as logits, I'll leave this for extra-curricula investigation).
# Create our loss function
loss = tf.keras.losses.CategoricalCrossentropy(from_logits=False) # use from_logits=False if using an activation function in final layer of model (default)
loss
<keras.src.losses.CategoricalCrossentropy at 0x7f3bb4107430>
Which metrics should I use?¶
The evaluation metric is a human-readable value which is used to see how well your model is performing.
A slightly confusing concept is that the evaluation metric and loss function can be the same equation.
However, the main difference between a loss function and an evaluation metric is that the loss function will typically be differentiable (there are some exceptions to the rule but in most cases, the loss function will be differentiable).
Whereas, the evaluation metric does not have to be differentiable.
In the case of regression (predicting a number), your loss function and evaluation metric could be mean squared error (MSE).
Whereas in the case of classification, your loss function will generally be binary crossentropy (for two classes) or categorical crossentropy (for multiple classes) and your evaluation metric(s) could be accuracy, F1-score, precision and/or recall.
TensorFlow provides many pre-built metrics in the tf.keras.metrics module.
Evaluation Metric | Problem Type | Code |
---|---|---|
Accuracy | Classification | tf.keras.metrics.Accuracy or "accuracy" for short |
Precision | Classification | tf.keras.metrics.Precision |
Recall | Classification | tf.keras.metrics.Recall |
F1 Score | Classification | tf.keras.metrics.F1Score |
Mean Squared Error (MSE) | Regression | tf.keras.metrics.MeanSquaredError or "mse" for short |
Mean Absolute Error (MAE) | Regression | tf.keras.metrics.MeanAbsoluteError or "mae" |
Area Under the ROC Curve (AUC-ROC) | Binary Classification | tf.keras.metrics.AUC with curve='ROC' |
The tf.keras.Model.compile() method expects the metrics parameter input as a list.
Since we're working with a classification problem, let's setup our evaluation metric as accuracy.
# Create list of evaluation metrics
metrics = ["accuracy"]
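If you'd like more control than the string shortcut, you can also use metric class instances directly. Here's a minimal sketch with tf.keras.metrics.CategoricalAccuracy (a sensible choice since our labels are one-hot encoded; the label and prediction values below are made-up).
# Metric classes track a running value via update_state() and report it via result()
categorical_accuracy = tf.keras.metrics.CategoricalAccuracy()
# Made-up one-hot labels and prediction probabilities for a 3-class example
y_true = tf.constant([[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]])
y_pred = tf.constant([[0.1, 0.8, 0.1], [0.2, 0.3, 0.5]]) # first prediction correct, second incorrect
categorical_accuracy.update_state(y_true, y_pred)
print(categorical_accuracy.result().numpy()) # 0.5 (1 out of 2 correct)
# The same instance could also be passed to compile, e.g. metrics=[tf.keras.metrics.CategoricalAccuracy()]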
Learn more on how a model learns¶
We've briefly touched on optimizers, loss functions, gradient descent and backpropagation, the backbone of neural network learning, however, for a more in-depth look at each of these, I'd check out the following:
- 3Blue1Brown's series on Neural Networks - a fantastic 4 part video series on how neural networks are built to how they learn through gradient descent and backpropagation.
- The Little Book of Deep Learning by François Fleuret - a free ~150 page booklet on the ins and outs of deep learning. The notation may be intimidating at first but with practice you will begin to understand it.
Putting it all together and compiling our model¶
Phew!
We've now been through all the main steps in compiling a model:
- Creating the optimizer.
- Creating the loss function.
- Creating the evaluation metrics.
Now let's put everything we've done together and compile our model_0.
First we'll do it with shortcuts (e.g. "accuracy") then we'll do it with specific classes.
# Compile model with shortcuts (faster to write code but less customizable)
model_0.compile(optimizer="adam",
loss="categorical_crossentropy",
metrics=["accuracy"])
# Compile model with classes (will do the same as above)
model_0.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
loss=tf.keras.losses.CategoricalCrossentropy(from_logits=False),
metrics=["accuracy"])
Fitting a model on the data¶
Model created and compiled!
Time to fit it to the data.
This means we're going to pass all of the data we have (dog images and their assigned labels) through our model and ask it to try and learn the relationship between the images and the labels.
Fitting the model is step 3 in our list:
- Creating the model - We've done this ✅.
- Compiling the model - We've done this ✅.
- Fitting the model - Here's where we'll show the model examples of what we'd like it to learn (e.g. the relationship between an image of a dog and its breed).
- Evaluating the model - Once our model is trained on the training data, we can evaluate it on the testing data (data the model has never seen).
- Making a custom prediction - Finally, the best way to test a machine learning model is by seeing how it goes on custom data. This is where we'll try to make a prediction on our own custom images of dogs.
We can fit our model_0 instance with the tf.keras.Model.fit() method.
The main parameters of the fit() method we'll be paying attention to are:
- x = What data do you want the model to train on?
- y = What labels do you want your model to learn to map your data to?
- batch_size = The number of samples your model will look at per gradient update (e.g. 32 samples at a time before updating its internal patterns).
- epochs = How many times do you want the model to go through all samples (e.g. epochs=5 means looking at all of the data 5 times)?
- validation_data = What data do you want to evaluate your model's learning on?
There are plenty more options in the TensorFlow/Keras documentation for the fit() method.
However, these options will be more than enough for us.
In our case, let's keep our experiments quick and set the following:
- x=train_10_percent_ds - Since we've crafted a tf.data.Dataset, our x and y values are combined into one. We'll also start by training on 10% of the data for quicker experimentation (if things work on a smaller subset of the data, we can always increase it).
- epochs=5 - The more epochs you do, the more opportunities your model has to learn patterns, however, it also prolongs training.
- validation_data=test_ds - We'll evaluate the model's learning on the test dataset (samples it's never seen before).
Let's do it!
Time to train our first neural network and bring Dog Vision 🐶👁️ to life!
Note: If you don't have a GPU here, training will likely take a considerably long time. You can activate a GPU in Google Colab by going to Runtime -> Change runtime type -> Hardware accelerator -> GPU. Note that changing a runtime type will mean you will have to restart your runtime and rerun all of the cells above.
# Fit model_0 for 5 epochs
epochs = 5
history_0 = model_0.fit(x=train_10_percent_ds,
epochs=epochs,
validation_data=test_ds)
Epoch 1/5 38/38 [==============================] - 27s 482ms/step - loss: 3.9758 - accuracy: 0.3000 - val_loss: 3.0500 - val_accuracy: 0.5415 Epoch 2/5 38/38 [==============================] - 14s 379ms/step - loss: 2.0531 - accuracy: 0.8008 - val_loss: 1.8650 - val_accuracy: 0.7041 Epoch 3/5 38/38 [==============================] - 14s 375ms/step - loss: 1.0491 - accuracy: 0.9025 - val_loss: 1.3060 - val_accuracy: 0.7548 Epoch 4/5 38/38 [==============================] - 14s 373ms/step - loss: 0.6138 - accuracy: 0.9483 - val_loss: 1.0317 - val_accuracy: 0.7910 Epoch 5/5 38/38 [==============================] - 14s 373ms/step - loss: 0.4157 - accuracy: 0.9683 - val_loss: 0.8927 - val_accuracy: 0.8044
Woah!!!
Looks like our model performed outstandingly well!
Achieving a validation accuracy of ~80% after just 5 epochs of training.
That's far better than the original Stanford Dogs paper results of 22% accuracy.
How?
That's the power of transfer learning (and a series of modern updates to neural network architectures, hardware and training regimes)!
But these are just numbers on a page.
We'll get more in-depth on evaluations shortly.
For now, let's do a recap on the 3 steps we've practiced: create, compile, fit.
8. Putting it all together: create, compile, fit¶
Let's practice what we've done so far to train our first neural network.
Specifically, we're going to:
- Create a model (using our create_model() function).
- Compile our model (selecting our optimizer, loss function and evaluation metric).
- Fit our model (get it to figure out the patterns between images and labels).
And later on, we'll get to the other steps of evaluation and making custom predictions.
# 1. Create a model
model_0 = create_model(num_classes=len(dog_names))
# 2. Compile the model
model_0.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
loss="categorical_crossentropy",
metrics=["accuracy"])
# 3. Fit the model
epochs = 5
history_0 = model_0.fit(x=train_10_percent_ds,
epochs=epochs,
validation_data=test_ds)
Epoch 1/5 38/38 [==============================] - 22s 418ms/step - loss: 3.9263 - accuracy: 0.3225 - val_loss: 2.9969 - val_accuracy: 0.5549 Epoch 2/5 38/38 [==============================] - 14s 379ms/step - loss: 1.9899 - accuracy: 0.7900 - val_loss: 1.8436 - val_accuracy: 0.7063 Epoch 3/5 38/38 [==============================] - 14s 380ms/step - loss: 1.0152 - accuracy: 0.9058 - val_loss: 1.2817 - val_accuracy: 0.7702 Epoch 4/5 38/38 [==============================] - 14s 376ms/step - loss: 0.5997 - accuracy: 0.9483 - val_loss: 1.0173 - val_accuracy: 0.7945 Epoch 5/5 38/38 [==============================] - 14s 374ms/step - loss: 0.4040 - accuracy: 0.9708 - val_loss: 0.8792 - val_accuracy: 0.8107
Nice! We just trained our second neural network!
We practice these steps because they will be part of many of your future machine learning workflows.
As an extension, you could create a function called create_and_compile() which does the first two steps in one hit.
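A minimal sketch of what that helper could look like (the name create_and_compile() and its defaults are just suggestions, reusing our create_model() function):
def create_and_compile(num_classes: int,
                       learning_rate: float = 0.001,
                       model_name: str = "model") -> tf.keras.Model:
    """Creates a model with create_model() and compiles it in one hit (sketch of the suggested extension)."""
    model = create_model(num_classes=num_classes,
                         model_name=model_name)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
# Example usage (hypothetical model name):
# model_2 = create_and_compile(num_classes=len(class_names), model_name="model_2")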
Now we've got a trained model, let's get to evaluating it.
Evaluate Model 0 on the test data¶
Alright, the next step in our journey is to evaluate our trained model.
In fact, evaluating a model is just as important as training a model.
There are several ways to evaluate a model:
- Look at the metrics (such as accuracy).
- Plot the loss curves.
- Make predictions on the test set and compare them to the truth labels.
- Make predictions on custom samples (not contained in the training or test sets).
We've done the first one, as these metrics were the outputs of our model training.
Now we're going to focus on the next two.
Plotting loss curves and making predictions on the test set.
We'll get to custom images later on.
So what are loss curves?
Loss curves are a visualization of how your model's loss value changes over time.
We say loss "curves" because you can have a loss curve for each dataset: training, validation and test.
An ideal loss curve will start high and move towards zero (a perfect model will have a loss value of zero).
How do we get a loss curve?
We could manually plot the loss values output from our model training.
Or we could programmatically get the values thanks to the History object.
This object is returned by the fit method of tf.keras.Model instances.
And we've already got one!
It's saved to history_0 (the model history for model_0).
The History.history attribute contains a record of the training loss values and evaluation metrics for each epoch.
Let's check it out.
# Inspect History.history attribute for model_0
history_0.history
{'loss': [3.926330089569092, 1.9898805618286133, 1.0152279138565063, 0.599678099155426, 0.4040333032608032], 'accuracy': [0.32249999046325684, 0.7900000214576721, 0.9058333039283752, 0.9483333230018616, 0.9708333611488342], 'val_loss': [2.996889591217041, 1.8436286449432373, 1.2817054986953735, 1.0173338651657104, 0.8792150616645813], 'val_accuracy': [0.5548951029777527, 0.7062937021255493, 0.7701631784439087, 0.7945221662521362, 0.8107225894927979]}
Wonderful!
We've got a history of our model training over time.
It looks like everything is moving in the right direction.
Loss is going down whilst accuracy is going up.
How about we adhere to the data explorer's motto and write a function to visualize, visualize, visualize!
We'll call the function plot_model_loss_curves() and it'll take a History object as input and then plot loss and accuracy curves using matplotlib.
def plot_model_loss_curves(history: tf.keras.callbacks.History) -> None:
"""Takes a History object and plots loss and accuracy curves."""
# Get the accuracy values
acc = history.history["accuracy"]
val_acc = history.history["val_accuracy"]
# Get the loss values
loss = history.history["loss"]
val_loss = history.history["val_loss"]
# Get the number of epochs
epochs_range = range(len(acc))
# Create accuracy curves plot
plt.figure(figsize=(14, 7))
plt.subplot(1, 2, 1)
plt.plot(epochs_range, acc, label="Training Accuracy")
plt.plot(epochs_range, val_acc, label="Validation Accuracy")
plt.legend(loc="lower right")
plt.title("Training and Validation Accuracy")
plt.xlabel("Epoch")
plt.ylabel("Accuracy")
# Create loss curves plot
plt.subplot(1, 2, 2)
plt.plot(epochs_range, loss, label="Training Loss")
plt.plot(epochs_range, val_loss, label="Validation Loss")
plt.legend(loc="upper right")
plt.title("Training and Validation Loss")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.show()
plot_model_loss_curves(history=history_0)
Woohoo! Now those are some nice looking curves.
Our model is doing exactly what we'd like it to do.
The accuracy is moving up while the loss is going down.
Overfitting and underfitting (when your model doesn't perform how you'd like)¶
You may be wondering why there's a gap between the training and validation loss curves.
Ideally, the two lines would closely follow each other.
In our case, the validation loss doesn't decrease as low as the training loss.
This is known as overfitting, a common problem in machine learning where a model learns the training data very well but doesn't generalize to other unseen data.
You can think of this as a university student memorizing the course materials but failing to apply that knowledge to problems that aren't in the course materials (real-world problems).
The reverse of overfitting is underfitting, which is when a model fails to learn anything useful. For example, it never manages to increase accuracy or decrease loss.
Good news is, our model isn't underfitting (it's performing at ~80% accuracy on unseen data).
I'll leave "ways to fix overfitting" as an extension.
But one of the best ways is to use more data.
And guess what?
We've got plenty more!
Reminder, these results were achieved using only 10% of the training data.
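If you do explore the "ways to fix overfitting" extension, here's a minimal sketch of two other common approaches, data augmentation and dropout. These are illustrative assumptions and aren't applied to the models in this notebook.
# Sketch only: common ways to help reduce overfitting (not applied in this notebook)
# 1. Data augmentation - randomly alter training images so the model sees more variety
data_augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.1),
    tf.keras.layers.RandomZoom(0.1),
], name="data_augmentation")
# 2. Dropout - randomly zero out a fraction of units during training
# e.g. inside create_model(), between the base model output and the Dense output layer:
# x = tf.keras.layers.Dropout(0.2)(x)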
Before we train a model with more data, there's another way to quickly evaluate our model on a given dataset.
And that's using the tf.keras.Model.evaluate() method.
How about we try it on our model_0?
We'll save the outputs to a model_0_results variable so we can use them later.
# Evaluate model_0, see: https://www.tensorflow.org/api_docs/python/tf/keras/Model#evaluate
model_0_results = model_0.evaluate(x=test_ds)
model_0_results
269/269 [==============================] - 13s 47ms/step - loss: 0.8792 - accuracy: 0.8107
[0.8792150616645813, 0.8107225894927979]
Beautiful!
Evaluating our model on the test data shows it's performing at ~80% accuracy despite only seeing 10% of the training data.
We can also get the metrics used by our model with the metrics_names attribute.
# Get our model's metrics names
model_0.metrics_names
['loss', 'accuracy']
9. Model 1 - Train a model on 100% of the training data¶
Time to step it up a notch!
We've trained a model on 10% of the training data (to see if it works and it did!), now let's train a model on 100% of the training data and see what happens.
But before we do...
What do you think will happen?
If our model was able to perform well on only 10% of the data, how do you think it will go on 100% of the data?
These types of questions are good to think about in the world of machine learning.
After all, that's why the machine learner's motto is experiment, experiment, experiment!
Let's follow our three steps from before:
- Create a model (using our create_model() function).
- Compile our model (selecting our optimizer, loss function and evaluation metric).
- Fit our model (this time on 100% of the data for 5 epochs).
Note: Fitting our model on such a large amount of data will take a long time without a GPU. If you're using Google Colab, you can access a GPU via Runtime -> Change runtime type -> Hardware accelerator -> GPU.
# 1. Create model_1 (the next iteration of model_0)
model_1 = create_model(num_classes=len(class_names),
model_name="model_1")
# 2. Compile model
model_1.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
loss="categorical_crossentropy",
metrics=["accuracy"])
# 3. Fit model
epochs=5
history_1 = model_1.fit(x=train_ds,
epochs=epochs,
validation_data=test_ds)
Epoch 1/5 375/375 [==============================] - 43s 84ms/step - loss: 1.2725 - accuracy: 0.7607 - val_loss: 0.4849 - val_accuracy: 0.8756 Epoch 2/5 375/375 [==============================] - 30s 80ms/step - loss: 0.3667 - accuracy: 0.9013 - val_loss: 0.4041 - val_accuracy: 0.8770 Epoch 3/5 375/375 [==============================] - 30s 79ms/step - loss: 0.2641 - accuracy: 0.9287 - val_loss: 0.3731 - val_accuracy: 0.8832 Epoch 4/5 375/375 [==============================] - 30s 80ms/step - loss: 0.2043 - accuracy: 0.9483 - val_loss: 0.3708 - val_accuracy: 0.8819 Epoch 5/5 375/375 [==============================] - 30s 80ms/step - loss: 0.1606 - accuracy: 0.9633 - val_loss: 0.3753 - val_accuracy: 0.8767
Woah!
Was your intuition correct?
Did what you thought would happen actually happen?
It looks like all that extra data helped our model quite a bit, it's now performing at close to ~88% accuracy on the test set!
Question: How many epochs should I fit for?
Generally with transfer learning you can get pretty good results quite quickly, however, you may want to look into training for longer (more epochs) as an experiment to see whether your model improves or not. What we've performed is a transfer learning technique called feature extraction, however, you may want to look further into fine-tuning (training the whole model on your own dataset) and using callbacks (functions that take place during model training) such as Early Stopping to prevent the model from training for so long that its performance begins to degrade.
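As a minimal sketch of what using an Early Stopping callback might look like (the monitored metric, patience value and epoch count below are example choices, not something we use in this notebook):
# Sketch only: stop training early if the validation loss stops improving
early_stopping = tf.keras.callbacks.EarlyStopping(monitor="val_loss",
                                                  patience=3, # epochs to wait for an improvement before stopping
                                                  restore_best_weights=True)
# history = model_1.fit(x=train_ds,
#                       epochs=25,
#                       validation_data=test_ds,
#                       callbacks=[early_stopping])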
Evaluate Model 1 on the test data¶
How about we evaluate our model_1?
Let's start by plotting loss curves with the data contained within history_1.
# Plot model_1 loss curves
plot_model_loss_curves(history=history_1)
Hmm, looks like our model performed well, however the validation accuracy and loss seemed to flatten out.
Whereas, the training accuracy and loss seemed to keep improving.
This is a sign of overfitting (model performing much better on the training set than the validation/test set).
However, since our model looks to be performing quite well I'll leave this overfitting problem as a research project for extra-curriculum.
For now, let's evaluate our model on the test dataset using the evaluate() method.
# Evaluate model_1
model_1_results = model_1.evaluate(test_ds)
269/269 [==============================] - 12s 46ms/step - loss: 0.3753 - accuracy: 0.8767
Nice!
Looks like that extra data boosted our model's performance from ~81% accuracy on the test set to ~88% (note: exact numbers here may vary due to the inherent randomness in machine learning models).
Extension: Putting it all together
As a potential extension, you may want to try practicing putting all of the steps we've been through so far together. As in, loading the data, creating the model, compiling the model, fitting the model and evaluating the model. That's what I've found is one of the best ways to learn ML problems, replicating a system end to end.
10. Make and evaluate predictions of the best model¶
Now we've trained a model, it's time to make predictions with it!
That's the whole goal of machine learning.
Train a model on existing data, to make predictions on new data.
Our test data is supposed to simulate new data, data our model has never seen before.
We can make predictions with the tf.keras.Model.predict() method, passing it our test_ds (short for test dataset) variable.
# Make predictions on the test dataset (this would output logits if the model didn't use a softmax activation in its final layer)
test_preds = model_1.predict(test_ds)
# Note: If not using activation="softmax" in last layer of model, may need to turn them into prediction probabilities (easier to understand)
# test_preds = tf.keras.activations.softmax(tf.constant(test_preds), axis=-1)
269/269 [==============================] - 13s 44ms/step
Let's inspect our test_preds by first checking its shape.
test_preds.shape
(8580, 120)
Okay, looks like our test_preds variable contains 8580 prediction arrays (one for each test sample), each with 120 elements (one value for each dog class).
Let's inspect a single test prediction and see what it looks like.
# Get a "random" variable between all of the test samples
random.seed(42)
random_test_index = random.randint(0, test_preds.shape[0] - 1)
print(f"[INFO] Random test index: {random_test_index}")
# Inspect a single test prediction sample
random_test_pred_sample = test_preds[random_test_index]
print(f"[INFO] Random test pred sample shape: {random_test_pred_sample.shape}")
print(f"[INFO] Random test pred sample argmax: {tf.argmax(random_test_pred_sample)}")
print(f"[INFO] Random test pred sample label: {dog_names[tf.argmax(random_test_pred_sample)]}")
print(f"[INFO] Random test pred sample max prediction probability: {tf.reduce_max(random_test_pred_sample)}")
print(f"[INFO] Random test pred sample prediction probability values:\n{random_test_pred_sample}")
[INFO] Random test index: 1824 [INFO] Random test pred sample shape: (120,) [INFO] Random test pred sample argmax: 24 [INFO] Random test pred sample label: brittany_spaniel [INFO] Random test pred sample max prediction probability: 0.9248308539390564 [INFO] Random test pred sample prediction probability values: [3.0155065e-06 4.2946940e-05 3.2878995e-06 3.1306336e-05 1.7298260e-06 1.3368123e-05 2.8498230e-06 6.8758955e-06 2.6828552e-06 4.6089318e-04 9.8374185e-06 1.9263330e-06 7.6487186e-07 6.1217276e-04 1.2198443e-06 5.9309714e-06 2.4797799e-05 2.5847612e-06 4.9912862e-05 3.1809162e-07 1.0326848e-06 2.7293386e-06 2.1035332e-06 5.2793930e-06 9.2483085e-01 2.6070888e-06 1.6410323e-06 1.4008251e-06 2.0515323e-05 2.1309786e-05 1.4602327e-06 3.8456672e-04 7.4974610e-05 4.4831428e-05 5.5091264e-06 2.1345174e-07 2.9732748e-06 5.5520386e-06 8.7954652e-07 1.6277906e-03 5.3978354e-02 9.6090174e-05 9.6672220e-06 4.4037843e-06 2.5557700e-05 6.3994042e-07 1.6738920e-06 4.6715216e-04 4.1448075e-06 6.4118845e-05 2.0398900e-06 3.6135450e-06 4.4963690e-05 2.8406910e-05 3.4689847e-07 6.2964758e-04 9.1336078e-05 5.2363583e-05 1.2731762e-06 2.4212743e-06 1.5872080e-06 6.3476455e-06 6.2880179e-07 6.6757898e-06 1.6635622e-06 4.3550008e-07 2.3698403e-05 1.4149221e-05 3.8156581e-05 1.0464001e-05 5.0107906e-06 1.7395665e-06 2.8848885e-07 4.2622072e-05 3.2712339e-07 1.8591476e-07 2.2874669e-05 7.9814470e-07 2.3121322e-05 1.6275973e-06 4.6186727e-07 7.6188849e-07 3.2468931e-06 3.1449999e-05 2.9600946e-05 3.8992380e-06 2.8564186e-06 4.1459539e-06 6.0877244e-07 2.5443229e-05 5.4467969e-06 5.4184858e-07 2.8361776e-04 9.0548929e-05 8.8840829e-07 9.1714105e-07 1.9990568e-07 1.7958368e-05 7.7042150e-06 2.4126435e-05 1.9759838e-05 8.2941342e-06 2.5857928e-05 6.1904398e-06 1.4601937e-06 1.5800337e-05 6.0928446e-06 5.0209674e-05 1.4067524e-05 2.3544631e-05 1.4134421e-06 9.8844721e-05 9.1535941e-05 2.4448002e-03 5.8540131e-06 1.2547853e-02 1.3779800e-05 8.0164841e-07 2.5093528e-05 3.7180773e-05]
Okay looks like each individual sample of our test predictions is a tensor of prediction probabilities.
In essence, each element is a value between 0 and 1 representing how confident our model is that the given sample belongs to that class.
A prediction probability of 1 means the model is 100% confident the given sample belongs to that class.
A prediction probability of 0 means the model isn't assigning any probability to that class at all.
And all the other values fall somewhere in between.
Note: Just because a model's prediction probability for a particular sample is closer to 1 on a certain class (e.g. 0.9999) doesn't mean it is correct. A prediction can have a high probability but still be incorrect. We'll see this later on in the "most wrong" section.
The maximum value of our prediction probabilities tensor is what the model considers the most likely class for the given sample.
We take the index of the maximum value (using tf.argmax) and index on the list of dog names to get the predicted class name.
Note: tf.argmax or "argmax" for short gets the index of where the maximum value occurs in a tensor along a specified dimension. We can use tf.reduce_max to get the maximum value itself.
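For example, on a small made-up tensor of prediction probabilities:
# A made-up tensor of prediction probabilities for a 4-class example
toy_pred_probs = tf.constant([0.1, 0.05, 0.7, 0.15])
print(tf.argmax(toy_pred_probs).numpy()) # 2 -> index of the highest value
print(tf.reduce_max(toy_pred_probs).numpy()) # 0.7 -> the highest value itself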
To make our predictions easier to compare to the test dataset, let's unbundle our test_ds object into two separate arrays called test_ds_images and test_ds_labels.
We can do this by looping through the samples in our test_ds object and appending each to a list (we'll do this with a list comprehension).
Then we can join those lists together into an array with np.concatenate.
import numpy as np
# Extract test images and labels from test_ds
test_ds_images = np.concatenate([images for images, labels in test_ds], axis=0)
test_ds_labels = np.concatenate([labels for images, labels in test_ds], axis=0)
# How many images and labels do we have?
len(test_ds_images), len(test_ds_labels)
(8580, 8580)
Perfect!
Now we've got a way to compare our predictions on a given image (in test_ds_images) to its appropriate label in test_ds_labels.
This is one of the main reasons we didn't shuffle the test dataset.
Because now our predictions tensor has the same indexes as our test_ds_images and test_ds_labels arrays.
Meaning if we chose to compare sample number 42, everything would line up.
In fact, let's try just that.
# Set target index
target_index = 42 # try changing this to another value and seeing how the model performs on other samples
# Get test image
test_image = test_ds_images[target_index]
# Get truth label (index of max in test label)
test_image_truth_label = class_names[tf.argmax(test_ds_labels[target_index])]
# Get prediction probabilities
test_image_pred_probs = test_preds[target_index]
# Get index of class with highest prediction probability
test_image_pred_class = class_names[tf.argmax(test_image_pred_probs)]
# Plot the image
plt.figure(figsize=(5, 4))
plt.imshow(test_image.astype("uint8"))
# Create sample title with prediction probability value
title = f"""True: {test_image_truth_label}
Pred: {test_image_pred_class}
Prob: {np.max(test_image_pred_probs):.2f}"""
# Colour the title based on correctness of pred
plt.title(title,
color="green" if test_image_truth_label == test_image_pred_class else "red")
plt.axis("off");
Woohoo!!! Look at that!
Looks like our model got the prediction right, according to the test data, sample number 42 is in fact an Affenpinscher.
Doing a quick search on Google for Affenpinscher seems to return similar looking dogs too.
Our model is working!
For sample 42 at least...
As an exercise you could try to change the target index above, perhaps to your favourite number and see how the model goes.
But we could also write some code to test a number of different samples at a time.
Visualizing predictions from our best trained model¶
We could sit there looking at single image predictions of dogs all day.
Or we could write code to look at multiple at a time...
Let's do the latter!
# Choose a random 10 indexes from the test data and compare the values
import random
random.seed(42) # try changing the random seed or commenting it out for different values
random_indexes = random.sample(range(len(test_ds_images)), 10)
# Create a plot with multiple subplots
fig, axes = plt.subplots(2, 5, figsize=(15, 7))
# Loop through the axes of the plot
for i, ax in enumerate(axes.flatten()):
target_index = random_indexes[i] # get a random index (this is another reason we didn't shuffle the test set)
# Get relevant target image, label, prediction and prediction probabilities
test_image = test_ds_images[target_index]
test_image_truth_label = class_names[tf.argmax(test_ds_labels[target_index])]
test_image_pred_probs = test_preds[target_index]
test_image_pred_class = class_names[tf.argmax(test_image_pred_probs)]
# Plot the image
ax.imshow(test_image.astype("uint8"))
# Create sample title
title = f"""True: {test_image_truth_label}
Pred: {test_image_pred_class}
Prob: {np.max(test_image_pred_probs):.2f}"""
# Colour the title based on correctness of pred
ax.set_title(title,
color="green" if test_image_truth_label == test_image_pred_class else "red")
ax.axis("off")
Woah, looks like our model does quite well!
Try commenting out the random.seed() line and inspecting a few more dog photos, you might notice that the model doesn't get too many wrong!
Finding the accuracy per class¶
Our model's overall accuracy is ~88%.
This is an outstanding result.
But what about the accuracy per class?
As in, how did the boxer class perform?
Or the australian_terrier?
You'll see on the original Stanford Dogs Dataset website that the authors reported the accuracy per class of each of the dog breeds. Their best performing class, african_hunting_dog, achieved close to 60% accuracy (about ~58% if I'm reading the graph correctly).
Results from the original Stanford Dogs Dataset paper (2011). Let's see if the model we trained performs better than it.
How about we try and replicate the same plot with our own results?
First, let's create a DataFrame with information about our test predictions and test samples.
We'll start by getting the argmax of the test predictions as well as the test labels.
Then we'll get the maximum prediction probabilities for each sample.
And then we'll put it all into a DataFrame!
# Get argmax labels of test predictions and test ground truth
test_preds_labels = test_preds.argmax(axis=-1)
test_ds_labels_argmax = test_ds_labels.argmax(axis=-1)
# Get highest prediction probability of test predictions
test_pred_probs_max = tf.reduce_max(test_preds, axis=-1).numpy() # extract NumPy since pandas doesn't handle TensorFlow Tensors
# Create DataFrame of test results
test_results_df = pd.DataFrame({"test_pred_label": test_preds_labels,
"test_pred_prob": test_pred_probs_max,
"test_pred_class_name": [class_names[test_pred_label] for test_pred_label in test_preds_labels],
"test_truth_label": test_ds_labels_argmax,
"test_truth_class_name": [class_names[test_truth_label] for test_truth_label in test_ds_labels_argmax]})
# Create a column whether or not the prediction matches the label
test_results_df["correct"] = test_results_df["test_pred_class_name"] == test_results_df["test_truth_class_name"]
test_results_df.head()
test_pred_label | test_pred_prob | test_pred_class_name | test_truth_label | test_truth_class_name | correct | |
---|---|---|---|---|---|---|
0 | 0 | 0.974350 | affenpinscher | 0 | affenpinscher | True |
1 | 0 | 0.694450 | affenpinscher | 0 | affenpinscher | True |
2 | 0 | 0.993829 | affenpinscher | 0 | affenpinscher | True |
3 | 44 | 0.691742 | flat_coated_retriever | 0 | affenpinscher | False |
4 | 0 | 0.989754 | affenpinscher | 0 | affenpinscher | True |
What a cool looking DataFrame!
Now we can perform some further analysis.
Such as getting the accuracy per class.
We can do so by grouping the test_results_df via the "test_truth_class_name" column and then taking the mean of the "correct" column.
We can then create a new DataFrame based on this view and sort the values by correctness (e.g. the classes with the highest performance should be up the top).
# Calculate accuracy per class
accuracy_per_class = test_results_df.groupby("test_truth_class_name")["correct"].mean()
# Create new DataFrame to sort classes by accuracy
accuracy_per_class_df = pd.DataFrame(accuracy_per_class).reset_index().sort_values("correct", ascending=False)
accuracy_per_class_df.head()
test_truth_class_name | correct | |
---|---|---|
10 | bedlington_terrier | 1.000000 |
62 | keeshond | 1.000000 |
30 | chow | 0.989583 |
92 | saint_bernard | 0.985714 |
2 | african_hunting_dog | 0.985507 |
Woah! Looks like we've got a fair few dog classes with close to (or exactly) 100% accuracy!
That's outstanding!
Now let's recreate the horizontal bar plot used on the original Stanford Dogs research paper page.
# Let's create a horizontal bar chart to replicate a similar plot to the original Stanford Dogs page
plt.figure(figsize=(10, 17))
plt.barh(y=accuracy_per_class_df["test_truth_class_name"],
width=accuracy_per_class_df["correct"])
plt.xlabel("Accuracy")
plt.ylabel("Class Name")
plt.title("Dog Vision Accuracy per Class")
plt.ylim(-0.5, len(accuracy_per_class_df["test_truth_class_name"]) - 0.5) # Adjust y-axis limits to reduce white space
plt.gca().invert_yaxis() # This will display the first class at the top
plt.tight_layout()
plt.show()
Goodness me!
Looks like our model performs incredibly well across the vast majority of classes.
Comparing it to the original Stanford Dogs horizontal bar graph we can see that their best performing class got close to 60% accuracy.
However, it's only when we take a look at our worst performing classes that we see a handful of classes dipping below ~70% accuracy.
# Inspecting our worst performing classes (note how only a couple of classes perform at ~55% accuracy or below)
accuracy_per_class_df.tail()
| | test_truth_class_name | correct |
| --- | --- | --- |
| 104 | staffordshire_bullterrier | 0.672727 |
| 76 | miniature_poodle | 0.654545 |
| 90 | rhodesian_ridgeback | 0.638889 |
| 71 | malamute | 0.615385 |
| 101 | siberian_husky | 0.271739 |
What an awesome result!
We've now replicated and even vastly improved upon the results of the original Stanford Dogs research paper.
You should be proud!
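If you'd like a single summary number to compare against the original paper's reported ~22% mean accuracy across classes, you can average the per-class accuracies (a quick sketch using the accuracy_per_class_df we created above):
# Average the per-class accuracies for a single figure comparable to the paper's mean accuracy across classes
mean_accuracy_across_classes = accuracy_per_class_df["correct"].mean()
print(f"Mean accuracy across all classes: {mean_accuracy_across_classes:.2%}")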
Now that we've seen how well our model performs, how about we check where it performed poorly?
Finding the most wrong examples¶
A great way to inspect your model's errors is to find the examples where the model predicted with a high probability but the prediction was wrong.
This is often called the "most wrong" samples.
As in the model was very confident but wrong.
Let's filter for the top 100 most wrong by sorting the incorrect predictions by the "test_pred_prob"
column.
# Get most wrong
top_100_most_wrong = test_results_df[test_results_df["correct"] == 0].sort_values("test_pred_prob", ascending=False)[:100]
top_100_most_wrong.head()
| | test_pred_label | test_pred_prob | test_pred_class_name | test_truth_label | test_truth_class_name | correct |
| --- | --- | --- | --- | --- | --- | --- |
| 2727 | 75 | 0.997043 | miniature_pinscher | 38 | doberman | False |
| 5480 | 44 | 0.995325 | flat_coated_retriever | 78 | newfoundland | False |
| 6884 | 54 | 0.994142 | groenendael | 95 | schipperke | False |
| 4155 | 55 | 0.987126 | ibizan_hound | 60 | italian_greyhound | False |
| 1715 | 85 | 0.984834 | pekinese | 22 | brabancon_griffon | False |
One way to inspect these most wrong predictions would be to go through the confused breeds one by one and see why the model might've mixed them up.
Such as comparing miniature_pinscher
to doberman
(two quite similar looking dog breeds).
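For example, here's a quick sketch of that first approach (using the top_100_most_wrong DataFrame from above) to filter for a single confused pair:
# Sketch: filter the "most wrong" predictions for a single confused pair of breeds
doberman_predicted_as_min_pin = top_100_most_wrong[
    (top_100_most_wrong["test_truth_class_name"] == "doberman") &
    (top_100_most_wrong["test_pred_class_name"] == "miniature_pinscher")
]
doberman_predicted_as_min_pin.head()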
Alternatively, we could get a random 10 samples and plot them to see what they look like.
Let's do the latter!
# Get 10 random indexes of "most wrong" predictions
top_100_most_wrong.sample(n=10).index
Index([2001, 1715, 8112, 1642, 5480, 6383, 7363, 4155, 7895, 4105], dtype='int64')
How about we plot these indexes?
# Choose a random 10 indexes from the test data and compare the values
import random
random_most_wrong_indexes = top_100_most_wrong.sample(n=10).index
# Iterate through test results and plot them
# Note: This is why we don't shuffle the test data, so that it's in original order when we evaluate it.
fig, axes = plt.subplots(2, 5, figsize=(15, 7))
for i, ax in enumerate(axes.flatten()):
target_index = random_most_wrong_indexes[i]
# Get relevant target image, label, prediction and prediction probabilities
test_image = test_ds_images[target_index]
test_image_truth_label = class_names[tf.argmax(test_ds_labels[target_index])]
test_image_pred_probs = test_preds[target_index]
test_image_pred_class = class_names[tf.argmax(test_image_pred_probs)]
# Plot the image
ax.imshow(test_image.astype("uint8"))
# Create sample title
title = f"""True: {test_image_truth_label}
Pred: {test_image_pred_class}
Prob: {np.max(test_image_pred_probs):.2f}"""
# Colour the title based on correctness of pred
ax.set_title(title,
color="green" if test_image_truth_label == test_image_pred_class else "red",
fontsize=10)
ax.axis("off")
Inspecting the "most wrong" examples, it's easy to see where the model got confused.
These samples can show us where we might want to collect more data or correct our data's labels.
Speaking of confused, how about we make a confusion matrix for further evaluation?
Create a confusion matrix¶
A confusion matrix helps to visualize which classes a model predicted compared to which classes it should've predicted (truth vs. predictions).
We can create one using Scikit-Learn's sklearn.metrics.confusion_matrix
and passing in our y_true
and y_pred
values.
And then we can display it using sklearn.metrics.ConfusionMatrixDisplay
.
Note: Since we have 120 different classes, running the code below to show the confusion matrix plot may take a minute or so to load (it's quite a big plot!).
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
# Create a confusion matrix
confusion_matrix_dog_preds = confusion_matrix(y_true=test_ds_labels_argmax, # requires all labels to be in same format (e.g. not one-hot)
y_pred=test_preds_labels)
# Create a confusion matrix plot
confusion_matrix_display = ConfusionMatrixDisplay(confusion_matrix=confusion_matrix_dog_preds,
display_labels=class_names)
fig, ax = plt.subplots(figsize=(25, 25))
ax.set_title("Dog Vision Confusion Matrix")
confusion_matrix_display.plot(xticks_rotation="vertical",
cmap="Blues",
colorbar=False,
ax=ax);
Now that's one big confusion matrix!
It looks like most of the darker blue boxes are down the middle diagonal (where we'd like them to be).
But there are a few instances where the model confuses classes such as scottish_deerhound
and irish_wolfhound
.
And looking up those two breeds we can see that they look visually similar.
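If you'd rather find these confusable pairs programmatically instead of eyeballing the plot, here's a rough sketch (using the confusion_matrix_dog_preds and class_names from above) which zeroes the diagonal and lists the largest remaining entries:
# Sketch: find the most commonly confused class pairs from the confusion matrix
import numpy as np
cm_off_diagonal = confusion_matrix_dog_preds.copy()
np.fill_diagonal(cm_off_diagonal, 0) # ignore correct predictions on the diagonal
top_confused_flat = np.argsort(cm_off_diagonal, axis=None)[::-1][:5] # indexes of the 5 largest off-diagonal counts
for flat_index in top_confused_flat:
    true_idx, pred_idx = np.unravel_index(flat_index, cm_off_diagonal.shape)
    print(f"True: {class_names[true_idx]} | Predicted: {class_names[pred_idx]} | Count: {cm_off_diagonal[true_idx, pred_idx]}")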
11. Save and load the best model¶
We've covered a lot of ground from loading data to training and evaluating a model.
But what if you wanted to use that model somewhere else?
Such as on a website or in an application?
The first step is saving it to file.
We can save our model using the tf.keras.Model.save()
method and specifying the filepath
as well as the save_format
parameters.
We'll use filepath="dog_vision_model.keras"
as well as save_format="keras"
to save our model to the new and versatile .keras
format.
Let's save our best performing model_1
.
Note: You may also see models being saved with the
SavedModel
format as well asHDF5
formats, however, it's recommended to use the newer.keras
format. See the TensorFlow documentation on saving and loading a model for more.
# Save the model to .keras
model_save_path = "dog_vision_model.keras"
model_1.save(filepath=model_save_path,
save_format="keras")
Model saved!
And we can load it back in using the tf.keras.models.load_model()
method.
# Load the model
loaded_model = tf.keras.models.load_model(filepath=model_save_path)
And now we can evaluate our loaded_model
to make sure it performs well on the test dataset.
# Evaluate the loaded model
loaded_model_results = loaded_model.evaluate(test_ds)
269/269 [==============================] - 15s 47ms/step - loss: 0.3753 - accuracy: 0.8767
How about we check if the loaded_model_results
are the same as the model_1_results
?
assert model_1_results == loaded_model_results
Our trained model and loaded model results are the same!
We could now use our dog_vision_model.keras
file in an application to predict a dog breed based on an image.
Note: If you're using Google Colab, remember that if your Google Colab instance gets disconnected after a period of time, it will delete all local files. So if you want to keep your
dog_vision_model.keras
be sure to download it or copy it to Google Drive.
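One way to keep it around is to mount Google Drive from within Colab and copy the file across (a rough sketch, Colab-only, uncomment to run):
# Uncomment to copy the saved model to Google Drive (Google Colab only)
# from google.colab import drive
# drive.mount("/content/drive")
# !cp dog_vision_model.keras /content/drive/MyDrive/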
12. Make predictions on custom images with the best model¶
Now what fun would it be if we only made predictions on the test dataset?
How about we see how our model goes on real world images?
That's the whole goal of machine learning right? To see how your model goes in the real world?
Well, let's make some predictions on custom images!
Specifically, let's try our best model on images of my dogs (Bella 🐶 and Seven 7️⃣, yes, Seven is her actual name) and an extra wildcard image.
We can download the photos from the course GitHub.
# Download a set of custom images from GitHub and unzip them
!wget -nc https://github.com/mrdbourke/zero-to-mastery-ml/raw/master/images/dog-photos.zip
!unzip dog-photos.zip
--2024-04-26 01:43:26-- https://github.com/mrdbourke/zero-to-mastery-ml/raw/master/images/dog-photos.zip Resolving github.com (github.com)... 140.82.113.4 Connecting to github.com (github.com)|140.82.113.4|:443... connected. HTTP request sent, awaiting response... 302 Found Location: https://raw.githubusercontent.com/mrdbourke/zero-to-mastery-ml/master/images/dog-photos.zip [following] --2024-04-26 01:43:26-- https://raw.githubusercontent.com/mrdbourke/zero-to-mastery-ml/master/images/dog-photos.zip Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ... Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected. HTTP request sent, awaiting response... 200 OK Length: 1091355 (1.0M) [application/zip] Saving to: ‘dog-photos.zip’ dog-photos.zip 100%[===================>] 1.04M --.-KB/s in 0.05s 2024-04-26 01:43:27 (21.6 MB/s) - ‘dog-photos.zip’ saved [1091355/1091355] Archive: dog-photos.zip inflating: dog-photo-4.jpeg inflating: dog-photo-1.jpeg inflating: dog-photo-2.jpeg inflating: dog-photo-3.jpeg
Wonderful! We can inspect our images in the file browser and see that they're under the name dog-photo-*.jpeg
.
How about we iterate through them and visualize each one?
# Create list of paths for custom dog images
custom_image_paths = ["dog-photo-1.jpeg",
"dog-photo-2.jpeg",
"dog-photo-3.jpeg",
"dog-photo-4.jpeg"]
# Iterate through list of dog images and plot each one
fig, axes = plt.subplots(1, 4, figsize=(15, 7))
for i, ax in enumerate(axes.flatten()):
ax.imshow(plt.imread(custom_image_paths[i]))
ax.axis("off")
ax.set_title(custom_image_paths[i])
What?
The first three photos look all well and good, but we can see dog-photo-4.jpeg
is a photo of me in a black hoodie pulling a blue steel face.
We'll see why this is later.
For now, let's use our loaded_model
to try and make a prediction on the first dog image (dog-photo-1.jpeg
)!
We can do so with the predict()
method.
# Try and make a prediction on the first dog image
loaded_model.predict("dog-photo-1.jpeg")
--------------------------------------------------------------------------- IndexError Traceback (most recent call last) <ipython-input-129-336b90293288> in <cell line: 2>() 1 # Try and make a prediction on the first dog image ----> 2 loaded_model.predict("dog-photo-1.jpeg") /usr/local/lib/python3.10/dist-packages/keras/src/utils/traceback_utils.py in error_handler(*args, **kwargs) 68 # To get the full stack trace, call: 69 # `tf.debugging.disable_traceback_filtering()` ---> 70 raise e.with_traceback(filtered_tb) from None 71 finally: 72 del filtered_tb /usr/local/lib/python3.10/dist-packages/tensorflow/python/framework/tensor_shape.py in __getitem__(self, key) 960 else: 961 if self._v2_behavior: --> 962 return self._dims[key] 963 else: 964 return self.dims[key] IndexError: tuple index out of range
Oh no!
We get an error:
IndexError: tuple index out of range
This is a little hard to understand. But we can see the code is trying to get the shape of our image.
However, we didn't pass an image to the predict()
method.
We only passed a filepath.
Our model expects inputs in the same format it was trained on.
So let's load our image and resize it.
We can do so with tf.keras.utils.load_img()
.
# Load the image (into PIL format)
custom_image = tf.keras.utils.load_img(
path="dog-photo-1.jpeg",
color_mode="rgb",
target_size=IMG_SIZE, # (224, 224) or (img_height, img_width)
)
type(custom_image), custom_image
(PIL.Image.Image, <PIL.Image.Image image mode=RGB size=224x224>)
Excellent, we've loaded our first custom image.
But now let's turn our image into a tensor (our model was trained on image tensors, so it expects image tensors as input).
We can convert our image from PIL format to array format with tf.keras.utils.img_to_array()
.
# Turn the image into a tensor
custom_image_tensor = tf.keras.utils.img_to_array(custom_image)
custom_image_tensor.shape
(224, 224, 3)
Nice! We've got an image tensor of shape (224, 224, 3)
.
How about we make a prediction on it?
# Make a prediction on our custom image tensor
loaded_model.predict(custom_image_tensor)
--------------------------------------------------------------------------- ValueError Traceback (most recent call last) <ipython-input-132-bd82d1e41fed> in <cell line: 2>() 1 # Make a prediction on our custom image tensor ----> 2 loaded_model.predict(custom_image_tensor) /usr/local/lib/python3.10/dist-packages/keras/src/utils/traceback_utils.py in error_handler(*args, **kwargs) 68 # To get the full stack trace, call: 69 # `tf.debugging.disable_traceback_filtering()` ---> 70 raise e.with_traceback(filtered_tb) from None 71 finally: 72 del filtered_tb /usr/local/lib/python3.10/dist-packages/keras/src/engine/training.py in tf__predict_function(iterator) 13 try: 14 do_return = True ---> 15 retval_ = ag__.converted_call(ag__.ld(step_function), (ag__.ld(self), ag__.ld(iterator)), None, fscope) 16 except: 17 do_return = False ValueError: in user code: File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/training.py", line 2440, in predict_function * return step_function(self, iterator) File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/training.py", line 2425, in step_function ** outputs = model.distribute_strategy.run(run_step, args=(data,)) File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/training.py", line 2413, in run_step ** outputs = model.predict_step(data) File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/training.py", line 2381, in predict_step return self(x, training=False) File "/usr/local/lib/python3.10/dist-packages/keras/src/utils/traceback_utils.py", line 70, in error_handler raise e.with_traceback(filtered_tb) from None File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/input_spec.py", line 298, in assert_input_compatibility raise ValueError( ValueError: Input 0 of layer "model_1" is incompatible with the layer: expected shape=(None, 224, 224, 3), found shape=(32, 224, 3)
What?!?
We get another error...
ValueError: Input 0 of layer "model_1" is incompatible with the layer: expected shape=(None, 224, 224, 3), found shape=(32, 224, 3)
Hmm.
Looks like our model is expecting a batch size dimension on our input tensor.
We can do this by either turning the input tensor into a single element array or by using tf.expand_dims(input, axis=0)
to expand the dimension of the tensor on the 0th axis.
# Option 1: Add batch dimension to custom_image_tensor
print(f"Shape of custom image tensor: {np.array([custom_image_tensor]).shape}")
print(f"Shape of custom image tensor: {tf.expand_dims(custom_image_tensor, axis=0).shape}")
Shape of custom image tensor: (1, 224, 224, 3) Shape of custom image tensor: (1, 224, 224, 3)
Wonderful! We've now got a custom image tensor of shape (1, 224, 224, 3)
((batch_size, img_height, img_width, colour_channels)
).
Let's try and predict!
# Get prediction probabilities from our model
pred_probs = loaded_model.predict(tf.expand_dims(custom_image_tensor, axis=0))
pred_probs
1/1 [==============================] - 2s 2s/step
array([[1.83611644e-06, 3.09535017e-06, 3.86047805e-06, 3.19048486e-05, 1.66974694e-03, 1.27542022e-04, 7.03033629e-06, 1.19856362e-04, 1.01050091e-05, 3.87266744e-04, 6.44192414e-06, 1.67636438e-06, 8.94749770e-04, 5.01931618e-06, 1.60283549e-03, 9.41093604e-05, 4.67637838e-05, 8.51367513e-05, 5.67736897e-05, 6.14693909e-06, 2.67342989e-06, 1.47549901e-04, 4.17501433e-05, 3.90995192e-05, 9.50478498e-05, 1.47656752e-02, 3.08718845e-05, 1.58209339e-04, 8.39364156e-03, 1.17800606e-03, 2.69454729e-04, 1.02170045e-04, 7.42143384e-05, 8.22680071e-04, 1.73064705e-04, 8.98789040e-06, 6.77722392e-06, 2.46034167e-03, 1.21447938e-05, 3.06540052e-04, 1.12927992e-04, 1.30907722e-06, 1.19819895e-04, 3.28008295e-03, 4.22435085e-04, 2.56334723e-04, 6.35078293e-04, 6.96951101e-05, 1.82968670e-05, 6.66733533e-02, 1.65604251e-06, 4.85742465e-04, 3.82422912e-03, 4.36909148e-04, 1.34899176e-06, 4.04351122e-05, 2.30197293e-05, 7.29483800e-05, 1.31009811e-05, 1.30437169e-04, 1.27625071e-05, 3.21804691e-06, 6.78410470e-06, 3.72191658e-03, 9.23305777e-07, 4.05427454e-06, 1.32554891e-02, 8.34832132e-01, 1.84010264e-06, 5.39118366e-04, 2.44915718e-05, 1.35658804e-04, 9.53144918e-04, 3.80869096e-05, 3.43683018e-06, 3.57066506e-06, 2.41459438e-05, 2.93612948e-06, 1.27533756e-04, 2.15716864e-05, 3.21038242e-05, 7.87725276e-06, 1.70349504e-05, 4.27997729e-05, 5.72475437e-06, 1.81680916e-05, 1.28094471e-04, 7.12008550e-05, 8.24760180e-04, 6.14038622e-03, 4.27179504e-03, 3.55221750e-03, 1.20739173e-03, 4.15856484e-04, 1.61429329e-04, 1.58363022e-04, 3.78229856e-06, 1.03004022e-05, 2.00551622e-05, 1.21213234e-04, 2.68000053e-06, 1.00253812e-04, 4.04065868e-05, 9.84299404e-05, 1.29673525e-03, 3.07669543e-05, 1.62672077e-05, 1.17529435e-05, 3.74953932e-04, 4.74653389e-05, 1.00191637e-05, 1.36496616e-04, 3.76833777e-05, 1.55215133e-02, 2.33796614e-04, 1.01105807e-05, 8.56942424e-05, 1.37508148e-04, 3.79100857e-06, 1.04301716e-05]], dtype=float32)
It worked!!!
Our model output a tensor of prediction probabilities.
We can find the predicted label by taking the argmax of the pred_probs
tensor.
And we get the predicted class name by indexing on the class_names
list using the predicted label.
# Get the predicted class label
pred_label = tf.argmax(pred_probs, axis=-1).numpy()[0]
# Get the predicted class name
pred_class_name = class_names[pred_label]
print(f"Predicted class label: {pred_label}")
print(f"Predicted class name: {pred_class_name}")
Predicted class label: 67 Predicted class name: labrador_retriever
Ho ho! That's looking good!
In summary, a model wants to make predictions on data in the same shape and format it was trained on.
So if you trained a model on image tensors with a certain shape and datatype, your model will want to make predictions on the same kind of image tensors with the same shape and datatype.
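If you're ever unsure about the shape a model expects, a quick sanity check is to inspect its input_shape attribute (a small sketch using our loaded_model):
# Check the input shape the loaded model expects
loaded_model.input_shape # e.g. (None, 224, 224, 3) -> (batch_size, img_height, img_width, colour_channels)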
How about we try making predictions on multiple images?
To do so, let's make a function which replicates the workflow from above.
def pred_on_custom_image(image_path: str, # Path to the image file
model, # Trained TensorFlow model for prediction
target_size: tuple[int, int] = (224, 224), # Desired size of the image for input to the model
class_names: list = None, # List of class names (optional for plotting)
plot: bool = True): # Whether to plot the image and predicted class
"""
Loads an image, preprocesses it, makes a prediction using a provided model,
and optionally plots the image with the predicted class.
Args:
image_path (str): Path to the image file.
model: Trained TensorFlow model for prediction.
target_size (tuple[int, int], optional): Desired size of the image for input to the model. Defaults to (224, 224).
class_names (list, optional): List of class names for plotting. Defaults to None.
plot (bool, optional): Whether to plot the image and predicted class. Defaults to True.
Returns:
tuple[str, np.ndarray]: The predicted class name and prediction probabilities (returned when plot=False); otherwise the image is plotted with its prediction.
"""
# Prepare and load image
custom_image = tf.keras.utils.load_img(
path=image_path,
color_mode="rgb",
target_size=target_size,
)
# Turn the image into a tensor
custom_image_tensor = tf.keras.utils.img_to_array(custom_image)
# Add a batch dimension to the target tensor (e.g. (224, 224, 3) -> (1, 224, 224, 3))
custom_image_tensor = tf.expand_dims(custom_image_tensor, axis=0)
# Make a prediction with the target model
pred_probs = model.predict(custom_image_tensor)
# pred_probs = tf.keras.activations.softmax(tf.constant(pred_probs))
pred_class = class_names[tf.argmax(pred_probs, axis=-1).numpy()[0]]
# Plot if we want
if not plot:
return pred_class, pred_probs
else:
plt.figure(figsize=(5, 3))
plt.imshow(plt.imread(image_path))
plt.title(f"pred: {pred_class}\nprob: {tf.reduce_max(pred_probs):.3f}")
plt.axis("off")
What a good looking function!
How about we try it out on dog-photo-2.jpeg
?
# Make prediction on custom dog photo 2
pred_on_custom_image(image_path="dog-photo-2.jpeg",
model=loaded_model,
class_names=class_names)
1/1 [==============================] - 0s 27ms/step
Woohoo!!! Our model got it right!
Let's repeat the process for our other custom images.
# Predict on multiple images
fig, axes = plt.subplots(1, 4, figsize=(15, 7))
for i, ax in enumerate(axes.flatten()):
image_path = custom_image_paths[i]
pred_class, pred_probs = pred_on_custom_image(image_path=image_path,
model=loaded_model,
class_names=class_names,
plot=False)
ax.imshow(plt.imread(image_path))
ax.set_title(f"pred: {pred_class}\nprob: {tf.reduce_max(pred_probs):.3f}")
ax.axis("off");
1/1 [==============================] - 0s 28ms/step 1/1 [==============================] - 0s 26ms/step 1/1 [==============================] - 0s 25ms/step 1/1 [==============================] - 0s 28ms/step
Epic!!
Our Dog Vision 🐶👁 model has come to life!
Looks like our model got it right for 3/4 of our custom dog photos (my dogs Bella and Seven are labrador retrievers, with a potential mix of something else).
But the model seemed to also think the photo of me was a soft_coated_wheaten_terrier
(note: due to the randomness of machine learning, your result may be different here, if so, please let me know, I'd love to see what other kinds of dogs the model thinks I am :D).
You might be wondering, why does it do this?
It's because our model has been strictly trained to always predict a dog breed no matter what image it receives.
So no matter what image we pass to our model, it will always predict a certain dog breed.
You can try this with your own images.
How would you fix this?
One way would be to train another model to predict whether the input image is of a dog or is not of a dog.
And then only let our Dog Vision 🐶👁 model predict on the images that are actually of dogs.
Example of combining multiple machine learning models to create a workflow. One model for detecting food (Food Not Food) and another model for identifying what food is in the image (FoodVision, similar to Dog Vision). If an app is designed to take photos of food, taking photos of objects that aren't food and having them identified as food can be a poor customer experience. Source: Nutrify.
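As a rough sketch of how that two-model workflow might look in code (note: the dog_detector_model here is hypothetical, we haven't trained a "dog / not dog" model in this notebook):
# Hypothetical sketch: chain a "dog / not dog" gate model in front of Dog Vision
def predict_breed_if_dog(image_tensor, dog_detector_model, breed_model, class_names, threshold=0.5):
    # First check whether the image likely contains a dog at all (dog_detector_model is an assumed binary classifier)
    dog_prob = float(dog_detector_model.predict(image_tensor)[0][0])
    if dog_prob < threshold:
        return "not a dog"
    # Only if the gate passes, ask the breed model (e.g. our loaded_model) which breed it is
    breed_probs = breed_model.predict(image_tensor)
    return class_names[int(tf.argmax(breed_probs, axis=-1)[0])]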
These are some of the workflows you'll have to think about when you eventually deploy your own machine learning models.
Machine learning models are often very powerful.
But they aren't perfect.
Implementing guidelines and checks around them is still a very active area of research.
13. Key Takeaways¶
- Data, data, data! In any machine learning problem, getting a dataset and preparing it so that it is in a usable format will likely be the first and often most important step (hence why we spent so much time getting the data ready). It will also be an ongoing process, as although we've worked with thousands of dog images, our models could still be improved. And as we saw going from training with 10% of the data to 100% of the data, one of the best ways to improve a model is with more data. Explore your data early and often.
- When starting out, use transfer learning where possible. For most new problems, you should generally look to see if a pretrained model exists and see if you can adapt it to your use case. Ask yourself: What format is my data in? What are my ideal inputs and outputs? Is there a pretrained model for my use case?
- TensorFlow and Keras provide building blocks for neural networks which are powerful machine learning models capable of learning patterns in a wide range of data from text to audio to images and more.
- Experiment, experiment, experiment! It's highly unlikely you'll ever get the best performing model on your first try. Machine learning is very experimental by nature. This includes experimenting on the data, the model, the training setup and the outputs (how does your model work in practice?). Always keep this front of mind in any machine learning project. Your results are never final and can almost always be improved.
Extensions & Exercises¶
The following are a series of exercises and extensions which build on what we've covered throughout this module.
I'd highly recommend going through each one and spending time practicing what you've learned.
This is where the real knowledge is built. Trying things out for yourself.
- Try a prediction with our trained model on your own images of dogs and see if the model is correct.
- Try training another model from tf.keras.applications (e.g. ConvNeXt) and see if it performs better than EfficientNetV2.
- Try training a model on your own images in different classes, for example, apple vs. banana vs. orange. You could download images from the internet and sort them into different folders and then load them how we've done in the data loading section. Or you could take photos of your own and build a model to differentiate between them.
- For more advanced model training, you may want to look into the concept of "Callbacks", which are functions that run during model training. TensorFlow and Keras have a series of built-in callbacks which can be helpful for training. Have a read of the tf.keras.callbacks.Callback documentation and see which ones may be useful to you (there's a short callback sketch after this list to get you started).
- We touched on the concept of overfitting when we trained our model. This is when a model performs far better on the training set than on the test set. The concept of trying to prevent overfitting is known as regularization. Spend 20 minutes researching "ways to prevent overfitting" and write a list of 2-3 techniques and how they might come into play with our model training. Tip: One of the most common regularization techniques in computer vision is data augmentation (also see the brief example below).
- One of the most important parts of machine learning is having good data. The next most important part is loading that data in a way that can be used to train models as fast and efficiently as possible. For more on this, I'd highly recommend reading more about the tf.data API (this API is TensorFlow focused, however, the concepts can be bridged to other data loading needs) as well as reviewing the tf.data best practices (better performance with the tf.data API).
- Right now our model works well, however, we have to write code to interact with it. You could turn it into a small machine learning app using Gradio so people can upload their own images of dogs and see what the model predicts. See the example for image classification with TensorFlow and Keras for an idea of what you could build. See an example of this below as well as a running demo of Dog Vision on Hugging Face.
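As mentioned in the callbacks exercise above, here's a minimal sketch of what using built-in Keras callbacks during training could look like (note: train_ds is an assumed name for the training dataset created earlier in the notebook, adjust it to your own variable names):
# A minimal sketch of using built-in Keras callbacks during training
callbacks = [
    # Stop training early if validation accuracy stops improving
    tf.keras.callbacks.EarlyStopping(monitor="val_accuracy", patience=3, restore_best_weights=True),
    # Save the best model seen during training to file
    tf.keras.callbacks.ModelCheckpoint(filepath="dog_vision_model_checkpoint.keras",
                                       monitor="val_accuracy",
                                       save_best_only=True)
]
# Uncomment to train with the callbacks (train_ds = your training dataset, test_ds = the test dataset from earlier)
# model_1.fit(train_ds,
#             validation_data=test_ds,
#             epochs=10,
#             callbacks=callbacks)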
In this project we've only really scratched the surface of what's possible with TensorFlow/Keras and deep learning.
For a more comprehensive overview of TensorFlow/Keras, see the following:
- 14-hour TensorFlow Tutorial on YouTube (this is the first 14 hours of the ZTM TensorFlow course).
- Zero to Mastery TensorFlow for Deep Learning course (a 50+ hour course diving into many applications of TensorFlow and deep learning).
Extension example: data augmentation¶
Data augmentation is a regularization technique to help prevent overfitting.
It's designed to alter training images to artificially increase the diversity of the training dataset and hopefully help the model generalize better to test images as well as real-life images.
For example, we want our models to be able to identify the same breed of dog in an image regardless if the dog is facing left or right.
So one simple data augmentation technique is to randomly flip the image horizontally so the model learns to recognize the same dog from different points of view.
You can repeat this for many different types of image modifications such as rotation, zoom, colour alterations and more.
The following code is a brief example of how to incorporate a data augmentation layer into a model (note that in practice data augmentation is only applied during training time and not during testing/prediction time; Keras data augmentation layers handle this automatically).
For more, see the TensorFlow guide on data augmentation.
from tensorflow.keras import layers
# Note: Could functionize all of this
# Setup hyperparameters
img_size = 224
num_classes = 120
# Create data augmentation layer
data_augmentation_layer = tf.keras.Sequential(
[
layers.RandomFlip("horizontal"), # randomly flip image across horizontal axis
layers.RandomRotation(factor=0.2), # randomly rotate image
layers.RandomZoom(height_factor=0.2, width_factor=0.2) # randomly zoom into image
# More augmentation can go here
],
name="data_augmentation"
)
# Setup base model
base_model = tf.keras.applications.efficientnet_v2.EfficientNetV2B0(
include_top=False,
weights='imagenet',
input_shape=(img_size, img_size, 3),
include_preprocessing=True
)
# Freeze the base model
base_model.trainable = False
# Create new model
inputs = tf.keras.Input(shape=(224, 224, 3))
# Create data augmentation
x = data_augmentation_layer(inputs)
# Craft model
x = base_model(x, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
x = tf.keras.layers.Dropout(0.2)(x)
outputs = tf.keras.layers.Dense(num_classes,
name="output_layer",
activation="softmax")(x) # Note: If you have "softmax" activation, use from_logits=False in loss function
model_2 = tf.keras.Model(inputs, outputs, name="model_2")
# Uncomment for full model summary with augmentation layers
# model_2.summary()
Extension Example: Gradio App Demo¶
This is a modified version of the Gradio Image Classification Tutorial with TensorFlow and Keras.
You can see a guide on Hugging Face for how to host it on Hugging Face Spaces (a place where you can host and share your machine learning apps).
First we'll install Gradio.
!pip install -q gradio
import gradio as gr
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.2/12.2 MB 34.0 MB/s eta 0:00:00 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 91.9/91.9 kB 12.9 MB/s eta 0:00:00 Preparing metadata (setup.py) ... done ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 314.4/314.4 kB 33.7 MB/s eta 0:00:00 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 75.6/75.6 kB 10.1 MB/s eta 0:00:00 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 141.1/141.1 kB 18.2 MB/s eta 0:00:00 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 8.8/8.8 MB 91.6 MB/s eta 0:00:00 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 47.2/47.2 kB 4.3 MB/s eta 0:00:00 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 60.8/60.8 kB 8.8 MB/s eta 0:00:00 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 129.9/129.9 kB 16.6 MB/s eta 0:00:00 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 77.9/77.9 kB 9.8 MB/s eta 0:00:00 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 58.3/58.3 kB 8.3 MB/s eta 0:00:00 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 71.9/71.9 kB 9.7 MB/s eta 0:00:00 Building wheel for ffmpy (setup.py) ... done ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. spacy 3.7.4 requires typer<0.10.0,>=0.3.0, but you have typer 0.12.3 which is incompatible. weasel 0.3.4 requires typer<0.10.0,>=0.3.0, but you have typer 0.12.3 which is incompatible.
Then we'll download the saved model (the same model we trained during the Dog Vision notebook) along with the associated labels.
I've stored my saved model as well as the Stanford Dogs class names on Hugging Face.
You can see my files at huggingface.co/spaces/mrdbourke/dog_vision
.
import tensorflow as tf
# Download saved model and labels from Hugging Face
!wget -q https://huggingface.co/spaces/mrdbourke/dog_vision/resolve/main/dog_vision_model_demo.keras
!wget -q https://huggingface.co/spaces/mrdbourke/dog_vision/resolve/main/stanford_dogs_class_names.txt
# Load model
model_save_path = "dog_vision_model_demo.keras"
loaded_model_for_demo = tf.keras.models.load_model(model_save_path)
# Load labels
with open("stanford_dogs_class_names.txt", "r") as f:
class_names = [line.strip() for line in f.readlines()]
The prediction function should take in an image and return a dictionary of classes and their prediction probabilities.
# Create prediction function
def pred_on_custom_image(image, # input image (preprocessed by Gradio's Image input to be numpy.array)
model: tf.keras.Model = loaded_model_for_demo, # Trained TensorFlow model for prediction
target_size: int = 224, # Desired size of the image for input to the model
class_names: list = class_names): # List of class names
"""
Loads an image, preprocesses it, makes a prediction using a provided model,
and returns a dictionary of prediction probabilities per class name.
Args:
image: Input image.
model: Trained TensorFlow model for prediction.
target_size (int, optional): Desired size of the image for input to the model. Defaults to 224.
class_names (list, optional): List of class names for plotting. Defaults to None.
Returns:
Dict[str: float]: A dictionary of string class names and their respective prediction probability.
"""
# Note: gradio.inputs.Image handles opening the image
# # Prepare and load image
# custom_image = tf.keras.utils.load_img(
# path=image_path,
# color_mode="rgb",
# target_size=target_size,
# )
# Create resizing layer to resize the image
resize = tf.keras.layers.Resizing(height=target_size,
width=target_size)
# Turn the image into a tensor and resize it
custom_image_tensor = resize(tf.keras.utils.img_to_array(image))
# Add a batch dimension to the target tensor (e.g. (224, 224, 3) -> (1, 224, 224, 3))
custom_image_tensor = tf.expand_dims(custom_image_tensor, axis=0)
# Make a prediction with the target model
pred_probs = model.predict(custom_image_tensor)[0]
# Predictions get returned as a dictionary of {label: pred_prob}
pred_probs_dict = {class_names[i]: float(pred_probs[i]) for i in range(len(class_names))}
return pred_probs_dict
interface_title = "Dog Vision 🐶👁️"
interface_description = "Identify different dogs in images with deep learning. Model trained with TensorFlow/Keras."
interface = gr.Interface(fn=pred_on_custom_image,
inputs=gr.Image(),
outputs=gr.Label(num_top_classes=3),
examples=["dog-photo-1.jpeg",
"dog-photo-2.jpeg",
"dog-photo-3.jpeg",
"dog-photo-4.jpeg"],
title=interface_title,
description=interface_description)
# Uncomment to launch the interface directly in a notebook
# interface.launch(debug=True)
Save the following code to an app.py
file for running on Hugging Face spaces.
Finally, you can see the running demo on Hugging Face.
Try it out with your own images of dogs and see Dog Vision 🐶👁️ come to life!
from IPython.display import HTML
# Embed the Hugging Face Space as an iframe
html_string = """
<iframe src="https://mrdbourke-dog-vision.hf.space" frameborder="0" width="850" height="850"></iframe>
"""
display(HTML(html_string))
The following will write the whole cell to a Python file called app.py
, which can be uploaded to Hugging Face and run as a Space, as long as all required files (e.g. the model file and class names file) are uploaded alongside it.
# %%writefile app.py
# import gradio as gr
# import tensorflow as tf
# # Load model
# model_save_path = "dog_vision_model_demo.keras"
# loaded_model_for_demo = tf.keras.models.load_model(model_save_path)
# # Load labels
# with open("stanford_dogs_class_names.txt", "r") as f:
# class_names = [line.strip() for line in f.readlines()]
# # Create prediction function
# def pred_on_custom_image(image, # input image (preprocessed by Gradio's Image input to be numpy.array)
# model: tf.keras.Model =loaded_model_for_demo, # Trained TensorFlow model for prediction
# target_size: int = 224, # Desired size of the image for input to the model
# class_names: list = class_names): # List of class names
# """
# Loads an image, preprocesses it, makes a prediction using a provided model,
# and returns a dictionary of prediction probabilities per class name.
# Args:
# image: Input image.
# model: Trained TensorFlow model for prediction.
# target_size (int, optional): Desired size of the image for input to the model. Defaults to 224.
# class_names (list, optional): List of class names for plotting. Defaults to None.
# Returns:
# Dict[str: float]: A dictionary of string class names and their respective prediction probability.
# """
# # Note: gradio.inputs.Image handles opening the image
# # # Prepare and load image
# # custom_image = tf.keras.utils.load_img(
# # path=image_path,
# # color_mode="rgb",
# # target_size=target_size,
# # )
# # Create resizing layer to resize the image
# resize = tf.keras.layers.Resizing(height=target_size,
# width=target_size)
# # Turn the image into a tensor and resize it
# custom_image_tensor = resize(tf.keras.utils.img_to_array(image))
# # Add a batch dimension to the target tensor (e.g. (224, 224, 3) -> (1, 224, 224, 3))
# custom_image_tensor = tf.expand_dims(custom_image_tensor, axis=0)
# # Make a prediction with the target model
# pred_probs = model.predict(custom_image_tensor)[0]
# # Predictions get returned as a dictionary of {label: pred_prob}
# pred_probs_dict = {class_names[i]: float(pred_probs[i]) for i in range(len(class_names))}
# return pred_probs_dict
# # Create Gradio interface
# interface_title = "Dog Vision 🐶👁️"
# interface_description = "Identify different dogs in images with deep learning. Model trained with TensorFlow/Keras."
# interface = gr.Interface(fn=pred_on_custom_image,
# inputs=gr.Image(),
# outputs=gr.Label(num_top_classes=3),
# examples=["dog-photo-1.jpeg",
# "dog-photo-2.jpeg",
# "dog-photo-3.jpeg",
# "dog-photo-4.jpeg"],
# title=interface_title,
# description=interface_description)
# interface.launch(debug=True)