07 Milestone Project 1: 🍔👁 Food Vision Big™¶
In the previous notebook (transfer learning part 3: scaling up) we built Food Vision mini: a transfer learning model which beat the original results of the Food101 paper with only 10% of the data.
But you might be wondering, what would happen if we used all the data?
Well, that's what we're going to find out in this notebook!
We're going to be building Food Vision Big™, using all of the data from the Food101 dataset.
Yep. All 75,750 training images and 25,250 testing images.
And guess what...
This time we've got the goal of beating DeepFood, a 2016 paper which used a Convolutional Neural Network trained for 2-3 days to achieve 77.4% top-1 accuracy.
🔑 Note: Top-1 accuracy means "accuracy for the top softmax activation value output by the model" (because softmax outputs a value for every class, but top-1 means only the highest one is evaluated). Top-5 accuracy means "accuracy for the top 5 softmax activation values output by the model", in other words, did the true label appear in the top 5 activation values? Top-5 accuracy scores are usually noticeably higher than top-1.
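To make the difference between top-1 and top-5 concrete, here is a minimal NumPy sketch. The probabilities and labels are made up for illustration (6 classes instead of 101), and top_k_accuracy() is a hypothetical helper, not something from this notebook or from TensorFlow.

```python
import numpy as np

def top_k_accuracy(probs, labels, k=1):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    # Indices of the k largest values per row (order within the k doesn't matter)
    top_k = np.argsort(probs, axis=1)[:, -k:]
    hits = (top_k == labels[:, None]).any(axis=1)
    return hits.mean()

# Made-up softmax outputs for 4 samples over 6 classes (each row sums to 1)
probs = np.array([
    [0.50, 0.20, 0.10, 0.10, 0.05, 0.05],  # true label 0 -> ranked 1st
    [0.30, 0.40, 0.10, 0.10, 0.05, 0.05],  # true label 0 -> ranked 2nd
    [0.05, 0.05, 0.10, 0.10, 0.30, 0.40],  # true label 5 -> ranked 1st
    [0.30, 0.25, 0.20, 0.15, 0.06, 0.04],  # true label 5 -> ranked last
])
labels = np.array([0, 0, 5, 5])

top1 = top_k_accuracy(probs, labels, k=1)
top5 = top_k_accuracy(probs, labels, k=5)
print(f"Top-1 accuracy: {top1:.2f}")
print(f"Top-5 accuracy: {top5:.2f}")
```

The second sample counts as a miss for top-1 but a hit for top-5, which is why top-5 scores can never be lower than top-1 scores on the same predictions.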
| | 🍔👁 Food Vision Big™ | 🍔👁 Food Vision mini |
|---|---|---|
| Dataset source | TensorFlow Datasets | Preprocessed download from Kaggle |
| Train data | 75,750 images | 7,575 images |
| Test data | 25,250 images | 25,250 images |
| Mixed precision | Yes | No |
| Data loading | Performant tf.data API | TensorFlow pre-built function |
| Target results | 77.4% top-1 accuracy (beat DeepFood paper) | 50.76% top-1 accuracy (beat Food101 paper) |

Table comparing the differences between Food Vision Big (this notebook) and Food Vision mini (previous notebook).
Alongside attempting to beat the DeepFood paper, we're going to learn about two methods to significantly improve the speed of our model training:
- Prefetching
- Mixed precision training
But more on these later.
What we're going to cover¶
- Using TensorFlow Datasets to download and explore data
- Creating preprocessing function for our data
- Batching & preparing datasets for modelling (making our datasets run fast)
- Creating modelling callbacks
- Setting up mixed precision training
- Building a feature extraction model (see transfer learning part 1: feature extraction)
- Fine-tuning the feature extraction model (see transfer learning part 2: fine-tuning)
- Viewing training results on TensorBoard
How you should approach this notebook¶
You can read through the descriptions and the code (it should all run, except for the cells which error on purpose), but there's a better option.
Write all of the code yourself.
Yes. I'm serious. Create a new notebook, and rewrite each line by yourself. Investigate it, see if you can break it, and if it breaks, work out why.
You don't have to write the text descriptions but writing the code yourself is a great way to get hands-on experience.
Don't worry if you make mistakes, we all do. The way to get better and make fewer mistakes is to write more code.
📖 Resources:
- See the full set of course materials on GitHub: https://github.com/mrdbourke/tensorflow-deep-learning
- See updates to this notebook on GitHub: https://github.com/mrdbourke/tensorflow-deep-learning/discussions/550
Check GPU¶
For this notebook, we're going to be doing something different.
We're going to be using mixed precision training.
Mixed precision training was introduced in TensorFlow 2.4.0 (a very new feature at the time of writing).
What does mixed precision training do?
Mixed precision training uses a combination of single precision (float32) and half-precision (float16) data types to speed up model training (up to 3x on modern GPUs).
We'll talk about this more later on but in the meantime you can read the TensorFlow documentation on mixed precision for more details.
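As a rough illustration of what the two data types trade off, the NumPy sketch below compares the memory footprint and numeric range of float32 and float16. It uses NumPy arrays as stand-ins for tensors and says nothing about actual training speed, which depends on the GPU.

```python
import numpy as np

# One image-sized array: 224 x 224 pixels, 3 colour channels
image_fp32 = np.ones((224, 224, 3), dtype=np.float32)
image_fp16 = image_fp32.astype(np.float16)

# float16 uses 2 bytes per value instead of 4
print(f"float32 size: {image_fp32.nbytes} bytes")
print(f"float16 size: {image_fp16.nbytes} bytes")

# The smaller footprint comes with a much smaller representable range
print(f"Largest float32: {np.finfo(np.float32).max}")
print(f"Largest float16: {np.finfo(np.float16).max}")
```

Halving the bytes per value means less memory to store and move around. The reduced range of float16 is one reason mixed precision keeps some values in float32 rather than using float16 everywhere.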
For now, before we can move forward, if we want to use mixed precision training, we need to make sure the GPU powering our Google Colab instance (if you're using Google Colab) is compatible.
For mixed precision training to work, you need access to a GPU with a compute capability score of 7.0+.
Google Colab offers several kinds of GPU.
However, some of them aren't compatible with mixed precision training.
Therefore to make sure you have access to mixed precision training in Google Colab, you can check your GPU compute capability score on Nvidia's developer website.
As of May 2023, the GPUs available on Google Colab which allow mixed precision training are:
- NVIDIA A100 (available with Google Colab Pro)
- NVIDIA Tesla T4
🔑 Note: You can run the cell below to check your GPU name and then compare it to the list of GPUs on NVIDIA's developer page to see if it's capable of using mixed precision training.
# Get GPU name
!nvidia-smi -L
GPU 0: NVIDIA A100-SXM4-40GB (UUID: GPU-269f6413-0643-12da-9e68-ef2cb8b4aad3)
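If you'd rather not look the score up by hand, the check can be sketched in plain Python. The lookup table below is a small, hand-written assumption covering only a few GPUs (NVIDIA's developer page remains the authoritative source), and supports_mixed_precision() is a hypothetical helper, not part of TensorFlow or nvidia-smi.

```python
# Compute capability scores for a few NVIDIA GPUs (not exhaustive,
# check NVIDIA's developer page for the authoritative values)
COMPUTE_CAPABILITY = {
    "K80": 3.7,
    "P100": 6.0,
    "V100": 7.0,
    "T4": 7.5,
    "A100": 8.0,
}

def supports_mixed_precision(gpu_description, minimum=7.0):
    """Return True/False if the GPU is in the table, None if it is unknown."""
    name = gpu_description.split("(UUID")[0]  # ignore the UUID part of the line
    for model, capability in COMPUTE_CAPABILITY.items():
        if model in name:
            return capability >= minimum
    return None

# Line copied from the nvidia-smi output above
smi_line = "GPU 0: NVIDIA A100-SXM4-40GB (UUID: GPU-269f6413-0643-12da-9e68-ef2cb8b4aad3)"
print(supports_mixed_precision(smi_line))

# Made-up line for an older GPU, for comparison
print(supports_mixed_precision("GPU 0: Tesla K80 (UUID: GPU-xxxx)"))
```

Returning None for unknown GPUs (rather than False) keeps "not in my table" separate from "known to be too old".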
Since mixed precision training was introduced in TensorFlow 2.4.0, make sure you've got TensorFlow 2.4.0 or later.
# Note: As of May 2023, there have been some issues with TensorFlow versions 2.9-2.12
# with the following code.
# However, these seem to have been fixed in version 2.13+.
# TensorFlow version 2.13 is available in tf-nightly as of May 2023 (will be default in Google Colab soon).
# Therefore, to prevent errors we'll install tf-nightly first.
# See more here: https://github.com/mrdbourke/tensorflow-deep-learning/discussions/550
# Install tf-nightly (required until 2.13.0+ is the default in Google Colab)
!pip install -U -q tf-nightly
# Check TensorFlow version (should be minimum 2.4.0+ but 2.13.0+ is better)
import tensorflow as tf
print(f"TensorFlow version: {tf.__version__}")
# Add timestamp
import datetime
print(f"Notebook last run (end-to-end): {datetime.datetime.now()}")
TensorFlow version: 2.14.0-dev20230518
Notebook last run (end-to-end): 2023-05-19 02:54:07.955201
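Checking the version by eye works, but if you want to do it in code, compare the version numbers as integers rather than as strings (as text, "2.14" sorts before "2.4"). A minimal sketch, where meets_minimum() is a hypothetical helper written for this example:

```python
def meets_minimum(version, minimum=(2, 4)):
    """Compare the major.minor part of a version string against a minimum."""
    major, minor = version.split(".")[:2]
    return (int(major), int(minor)) >= minimum

print(meets_minimum("2.14.0-dev20230518"))  # version string from the output above
print(meets_minimum("2.3.1"))               # an older version, for comparison
```

In the notebook you could pass tf.__version__ in place of the hard-coded strings.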
Get helper functions¶
We've created a series of helper functions throughout the previous notebooks in the course. Instead of rewriting them (tedious), we'll import the helper_functions.py file from the GitHub repo.
# Get helper functions file
import os

if not os.path.exists("helper_functions.py"):
    !wget https://raw.githubusercontent.com/mrdbourke/tensorflow-deep-learning/main/extras/helper_functions.py
else:
    print("[INFO] 'helper_functions.py' already exists, skipping download.")
--2023-05-19 02:13:56-- https://raw.githubusercontent.com/mrdbourke/tensorflow-deep-learning/main/extras/helper_functions.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.108.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 10246 (10K) [text/plain]
Saving to: ‘helper_functions.py’

helper_functions.py 100%[===================>] 10.01K --.-KB/s in 0s

2023-05-19 02:13:56 (100 MB/s) - ‘helper_functions.py’ saved [10246/10246]
# Import series of helper functions for the notebook (we've created/used these in previous notebooks)
from helper_functions import create_tensorboard_callback, plot_loss_curves, compare_historys
Use TensorFlow Datasets to Download Data¶
In previous notebooks, we've downloaded our food images (from the Food101 dataset) from Google Storage.
And this is a typical workflow you'd use if you're working on your own datasets.
However, there's another way to get datasets ready to use with TensorFlow.
For many of the most popular datasets in the machine learning world (often referred to and used as benchmarks), you can access them through TensorFlow Datasets (TFDS).
What is TensorFlow Datasets?
A place for prepared and ready-to-use machine learning datasets.
Why use TensorFlow Datasets?
- Load data already in Tensors
- Practice on well established datasets
- Experiment with different data loading techniques (like we're going to use in this notebook)
- Experiment with new TensorFlow features quickly (such as mixed precision training)
Why not use TensorFlow Datasets?
- The datasets are static (they don't change, like your real-world datasets would)
- Might not be suited for your particular problem (but great for experimenting)
To begin using TensorFlow Datasets we can import it under the alias tfds.
# Get TensorFlow Datasets
import tensorflow_datasets as tfds
To find all of the available datasets in TensorFlow Datasets, you can use the list_builders() method.
After doing so, we can check to see if the one we're after ("food101") is present.
# Get all available datasets in TFDS
datasets_list = tfds.list_builders()
# Set our target dataset and see if it exists
target_dataset = "food101"
print(f"'{target_dataset}' in TensorFlow Datasets: {target_dataset in datasets_list}")
'food101' in TensorFlow Datasets: True
Beautiful! It looks like the dataset we're after is available (note there are plenty more available but we're on Food101).
To get access to the Food101 dataset from TFDS, we can use the tfds.load() method.
In particular, we'll have to pass it a few parameters to let it know what we're after:
- name (str): the target dataset (e.g. "food101")
- split (list, optional): what splits of the dataset we're after (e.g. ["train", "validation"])
  - the split parameter is quite tricky. See the documentation for more.
- shuffle_files (bool): whether or not to shuffle the files on download, defaults to False
- as_supervised (bool): True to download data samples in tuple format ((data, label)) or False for dictionary format
- with_info (bool): True to download dataset metadata (labels, number of samples, etc)
🔑 Note: Calling the tfds.load() method will start to download a target dataset to disk if the download=True parameter is set (default). Some datasets are 100GB+, so make sure you have space.
# Load in the data (takes about 5-6 minutes in Google Colab)
(train_data, test_data), ds_info = tfds.load(name="food101", # target dataset to get from TFDS
                                             split=["train", "validation"], # what splits of data should we get? note: not all datasets have train, valid, test
                                             shuffle_files=True, # shuffle files on download?
                                             as_supervised=True, # download data in tuple format (sample, label), e.g. (image, label)
                                             with_info=True) # include dataset metadata? if so, tfds.load() returns tuple (data, ds_info)
Wonderful! After a few minutes of downloading, we've now got access to the entire Food101 dataset (in tensor format) ready for modelling.
Now let's get a little information from our dataset, starting with the class names.
Getting class names from a TensorFlow Datasets dataset requires downloading the "dataset_info" variable, which we've called ds_info (by using the with_info=True parameter in the tfds.load() method, note: this will only work for supervised datasets in TFDS).
We can access the class names of a particular dataset using the ds_info.features attribute and accessing the names attribute of the "label" key.
# Features of Food101 TFDS
ds_info.features
FeaturesDict({
    'image': Image(shape=(None, None, 3), dtype=uint8),
    'label': ClassLabel(shape=(), dtype=int64, num_classes=101),
})
# Get class names
class_names = ds_info.features["label"].names
class_names[:10]
['apple_pie', 'baby_back_ribs', 'baklava', 'beef_carpaccio', 'beef_tartare', 'beet_salad', 'beignets', 'bibimbap', 'bread_pudding', 'breakfast_burrito']
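The features output shows the label as a single int64 value, which suggests the labels are label-encoded (one integer per sample) rather than one-hot encoded. The sketch below shows how an integer label maps to a class name and what the one-hot equivalent would look like. It uses only the first ten class names from the output above, so the one-hot vector has length 10 here rather than 101, and the label value 2 is picked arbitrarily.

```python
import numpy as np

# First ten class names, copied from the output above
class_names = ['apple_pie', 'baby_back_ribs', 'baklava', 'beef_carpaccio', 'beef_tartare',
               'beet_salad', 'beignets', 'bibimbap', 'bread_pudding', 'breakfast_burrito']

label = 2  # a label-encoded target: a single integer
print(class_names[label])

# The same target in one-hot form: a vector with a 1 at the label's index
one_hot = np.zeros(len(class_names), dtype=np.int64)
one_hot[label] = 1
print(one_hot)

# Going back from one-hot to the integer label
print(int(np.argmax(one_hot)))
```

Which form your labels take matters later on, because it determines which loss function matches them (sparse categorical crossentropy for integers, categorical crossentropy for one-hot vectors).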
Exploring the Food101 data from TensorFlow Datasets¶
Now we've downloaded the Food101 dataset from TensorFlow Datasets, how about we do what any good data explorer should?
In other words, "visualize, visualize, visualize".
Let's find out a few details about our dataset:
- The shape of our input data (image tensors)
- The datatype of our input data
- What the labels of our input data look like (e.g. one-hot encoded versus label-encoded)
- Do the labels match up with the class names?
To do so, let's take one sample off the training data (using the .take() method) and explore it.
# Take one sample off the training data
train_one_sample = train_data.take(1) # samples are in format (image_tensor, label)
Because we used the as_supervised=True parameter in our tfds.load() method above, data samples come in the tuple format structure (data, label) or in our case (image_tensor, label).
# What does one sample of our training data look like?
train_one_sample
<_TakeDataset element_spec=(TensorSpec(shape=(None, None, 3), dtype=tf.uint8, name=None), TensorSpec(shape=(), dtype=tf.int64, name=None))>
Let's loop through our single training sample and get some info from the image_tensor and label.
# Output info about our training sample
for image, label in train_one_sample:
    print(f"""
    Image shape: {image.shape}
    Image dtype: {image.dtype}
    Target class from Food101 (tensor form): {label}
    Class name (str form): {class_names[label.numpy()]}
    """)
Image shape: (512, 512, 3)
Image dtype: <dtype: 'uint8'>
Target class from Food101 (tensor form): 90
Class name (str form): spaghetti_bolognese
Because we set the shuffle_files=True parameter in our tfds.load() method above, running the cell above a few times will give a different result each time.
Checking these you might notice some of the images have different shapes, for example (512, 342, 3) and (512, 512, 3) (height, width, color_channels).
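Differing image shapes matter because a batch is a single tensor, so every image in it needs the same shape. The NumPy sketch below uses blank stand-in arrays with the two shapes mentioned above. The crop to 224 x 224 is only there to make the shapes match for the demonstration and is not what this notebook does (it resizes instead).

```python
import numpy as np

# Two stand-in "images" with the shapes mentioned above (height, width, color_channels)
image_a = np.zeros((512, 342, 3), dtype=np.uint8)
image_b = np.zeros((512, 512, 3), dtype=np.uint8)

# Stacking arrays of different shapes into one batch fails
try:
    batch = np.stack([image_a, image_b])
except ValueError as error:
    print(f"Can't batch mixed shapes: {error}")
    batch = None

# Once the shapes match, the images stack into (batch_size, height, width, color_channels)
same_size = [img[:224, :224, :] for img in (image_a, image_b)]
batch_ok = np.stack(same_size)
print(batch_ok.shape)
```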
Let's see what one of the image tensors from TFDS's Food101 dataset looks like.
# What does an image tensor from TFDS's Food101 look like?
image
<tf.Tensor: shape=(512, 512, 3), dtype=uint8, numpy= array([[[ 12, 13, 7], [ 12, 13, 7], [ 13, 14, 8], ..., [ 21, 11, 0], [ 21, 11, 0], [ 21, 11, 0]], [[ 12, 13, 7], [ 11, 12, 6], [ 11, 12, 6], ..., [ 21, 11, 0], [ 21, 11, 0], [ 21, 11, 0]], [[ 7, 8, 2], [ 7, 8, 2], [ 7, 8, 2], ..., [ 22, 12, 2], [ 21, 11, 1], [ 20, 10, 0]], ..., [[188, 191, 184], [188, 191, 184], [188, 191, 184], ..., [243, 248, 244], [243, 248, 244], [242, 247, 243]], [[187, 190, 183], [189, 192, 185], [190, 193, 186], ..., [241, 245, 244], [241, 245, 244], [241, 245, 244]], [[186, 189, 182], [189, 192, 185], [191, 194, 187], ..., [238, 242, 241], [239, 243, 242], [239, 243, 242]]], dtype=uint8)>
# What are the min and max values?
tf.reduce_min(image), tf.reduce_max(image)
(<tf.Tensor: shape=(), dtype=uint8, numpy=0>, <tf.Tensor: shape=(), dtype=uint8, numpy=255>)
Alright, looks like our image tensors have values between 0 & 255 (standard red, green, blue colour values) and the values are of data type uint8.
We might have to preprocess these before passing them to a neural network. But we'll handle this later.
In the meantime, let's see if we can plot an image sample.
Plot an image from TensorFlow Datasets¶
We've seen our image tensors in tensor format, now let's really adhere to our motto.
"Visualize, visualize, visualize!"
Let's plot one of the image samples using matplotlib.pyplot.imshow() and set the title to the target class name.
# Plot an image tensor
import matplotlib.pyplot as plt
plt.imshow(image)
plt.title(class_names[label.numpy()]) # add title to image by indexing on class_names list
plt.axis(False);
Delicious!
Okay, looks like the Food101 data we've got from TFDS is similar to the datasets we've been using in previous notebooks.
Now let's preprocess it and get it ready for use with a neural network.
Create preprocessing functions for our data¶
In previous notebooks, when our images were in folder format we used the method tf.keras.utils.image_dataset_from_directory() to load them in.
Doing this meant our data was loaded into a format ready to be used with our models.
However, since we've downloaded the data from TensorFlow Datasets, there are a couple of preprocessing steps we have to take before it's ready to model.
More specifically, our data is currently:
- In uint8 data type
- Made up of tensors of different sizes (different sized images)
- Not scaled (the pixel values are between 0 & 255)

Whereas models like data to be:
- In float32 data type
- Made up of tensors that are all the same size (batches require all tensors have the same shape, e.g. (224, 224, 3))
- Scaled (values between 0 & 1), also called normalized
To take care of these, we'll create a preprocess_img() function which:
- Resizes an input image tensor to a specified size using tf.image.resize()
- Converts an input image tensor's current datatype to tf.float32 using tf.cast()
🔑 Note: Pretrained EfficientNetBX models in tf.keras.applications.efficientnet (what we're going to be using) have rescaling built-in. But for many other model architectures you'll want to rescale your data (e.g. get its values between 0 & 1). This could be incorporated inside your "preprocess_img()" function (like the one below) or within your model as a tf.keras.layers.Rescaling layer.
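For reference, the arithmetic behind rescaling is just a cast followed by a division. The NumPy sketch below shows it on a tiny made-up patch of pixel values. It illustrates the idea only and isn't used in this notebook, since the EfficientNet models handle rescaling themselves.

```python
import numpy as np

# Stand-in for a small patch of uint8 pixel values (0-255)
pixels = np.array([[0, 51, 102], [153, 204, 255]], dtype=np.uint8)

# Cast to float32 first, then scale into the 0-1 range
scaled = pixels.astype(np.float32) / 255.0

print(scaled.dtype)
print(scaled.min(), scaled.max())
```

Casting before dividing keeps the result in float32. Dividing the uint8 array directly would also give floats in NumPy, but being explicit about the target datatype mirrors the tf.cast() step in the function below.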
# Make a function for preprocessing images
def preprocess_img(image, label, img_shape=224):
    """
    Converts image datatype from 'uint8' -> 'float32' and resizes image to
    [img_shape, img_shape, color_channels]
    """
    image = tf.image.resize(image, [img_shape, img_shape]) # resize to img_shape
    return tf.cast(image, tf.float32), label # return (float32_image, label) tuple
Our preprocess_img() function above takes image and label as input (even though it does nothing to the label) because our dataset is currently in the tuple structure (image, label).
Let's try our function out on a target image.
# Preprocess a single sample image and check the outputs
preprocessed_img = preprocess_img(image, label)[0]
print(f"Image before preprocessing:\n {image[:2]}...,\nShape: {image.shape},\nDatatype: {image.dtype}\n")
print(f"Image after preprocessing:\n {preprocessed_img[:2]}...,\nShape: {preprocessed_img.shape},\nDatatype: {preprocessed_img.dtype}")
Image before preprocessing: [[[12 13 7] [12 13 7] [13 14 8] ... [21 11 0] [21 11 0] [21 11 0]] [[12 13 7] [11 12 6] [11 12 6] ... [21 11 0] [21 11 0] [21 11 0]]]..., Shape: (512, 512, 3), Datatype: <dtype: 'uint8'> Image after preprocessing: [[[11.586735 12.586735 6.586735 ] [11.714286 12.714286 6.714286 ] [ 8.857142 9.857142 4.8571424 ] ... [20.714308 11.142836 1.2857144 ] [20.668371 10.668372 0. ] [21. 11. 0. ]] [[ 2.3571415 3.3571415 0.1428566 ] [ 3.1530607 4.153061 0.07653028] [ 3.0561223 4.0561223 0. ] ... [26.071407 18.071407 7.0714073 ] [24.785702 14.785702 4.7857018 ] [22.499966 12.499966 2.4999657 ]]]..., Shape: (224, 224, 3), Datatype: <dtype: 'float32'>
Excellent! Looks like our preprocess_img() function is working as expected.
The input image gets converted from uint8 to float32 and gets resized from its current shape to (224, 224, 3).
How does it look?
# We can still plot our preprocessed image as long as we
# divide by 255 (for matplotlib compatibility)
plt.imshow(preprocessed_img/255.)
plt.title(class_names[label])
plt.axis(False);