Going Modular: Part 2 (script mode)

This notebook is part 2/2 of section 05. Going Modular.

For reference, the two parts are:

  1. 05. Going Modular: Part 1 (cell mode) - this notebook is run as a traditional Jupyter Notebook/Google Colab notebook and is a condensed version of notebook 04.

  2. 05. Going Modular: Part 2 (script mode) - this notebook is the same as number 1 but with added functionality to turn each of the major sections into Python scripts, such as data_setup.py and train.py.

Why two parts?

Because sometimes the best way to learn something is to see how it differs from something else.

If you run each notebook side-by-side you’ll see how they differ and that’s where the key learnings are.

What is script mode?

Script mode uses Jupyter Notebook cell magic (special commands) to turn specific cells into Python scripts.

For example, if you run the following code in a cell, you’ll create a Python file called hello_world.py:

%%writefile hello_world.py
print("hello world, machine learning is fun!")

You could then run this Python file on the command line with:

python hello_world.py

>>> hello world, machine learning is fun!

The main cell magic we’re interested in using is %%writefile.

Putting %%writefile filename at the top of a cell in Jupyter or Google Colab will write the contents of that cell to a specified filename.
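To make the idea concrete, here is a rough pure-Python imitation of what %%writefile does, plus a way to run the result as python hello_world.py would. The helpers writefile() and run_script() are hypothetical names, not IPython functions. The sketch assumes the magic simply writes the cell text to disk. It prints "Writing ..." for a new file and "Overwriting ..." for an existing one, which is the same kind of message you’ll see after each script cell in this notebook. Like the magic, it doesn’t create missing folders, which is why we create the going_modular folder before writing scripts into it.

```python
import subprocess
import sys
from pathlib import Path


def writefile(filename: str, cell: str) -> None:
    """Hypothetical stand-in for the %%writefile cell magic."""
    path = Path(filename)
    # Report whether the file is new or is being replaced
    print(f"{'Overwriting' if path.exists() else 'Writing'} {filename}")
    # Parent folders are NOT created: a missing folder raises FileNotFoundError
    path.write_text(cell, encoding="utf-8")


def run_script(filename: str) -> str:
    """Run a script in a fresh Python process (like `python filename`) and return its stdout."""
    result = subprocess.run(
        [sys.executable, filename], capture_output=True, text=True, check=True
    )
    return result.stdout
```

For example, calling writefile("hello_world.py", 'print("hello world, machine learning is fun!")\n') and then run_script("hello_world.py") should return that printed line.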

Question: Do I have to create Python files like this? Can’t I just start directly with a Python file and skip using a Google Colab notebook?

Answer: Yes. This is only one way of creating Python scripts. If you know the kind of script you’d like to write, you could start writing it straight away. But since using Jupyter/Google Colab notebooks is a popular way of starting off data science and machine learning projects, knowing about the %%writefile magic command is a handy tip.

What has script mode got to do with PyTorch?

If you’ve written some useful code in a Jupyter Notebook or Google Colab notebook, chances are you’ll want to use that code again.

And turning your useful cells into Python scripts (.py files) means you can use specific pieces of your code in other projects.

This practice is not PyTorch-specific.

But it’s how you’ll see many different online PyTorch repositories structured.

PyTorch in the wild

For example, if you find a PyTorch project on GitHub, it may be structured in the following way:

pytorch_project/
├── pytorch_project/
│   ├── data_setup.py
│   ├── engine.py
│   ├── model.py
│   ├── train.py
│   └── utils.py
├── models/
│   ├── model_1.pth
│   └── model_2.pth
└── data/
    ├── data_folder_1/
    └── data_folder_2/

Here, the top-level directory is called pytorch_project, but you could call it whatever you want.

Inside there’s another directory called pytorch_project, which contains several .py files. Their purposes might be:

  • data_setup.py - a file to prepare data (and download data if needed).

  • engine.py - a file containing various training functions.

  • model_builder.py or model.py - a file to create a PyTorch model.

  • train.py - a file to leverage all other files and train a target PyTorch model.

  • utils.py - a file dedicated to helpful utility functions.

The models and data directories could hold PyTorch models and data files respectively. Because model and data files are often large, it’s unlikely you’ll find full versions of them on GitHub; these directories appear above mainly for demonstration purposes.

Note: There are many different ways to structure a Python project and subsequently a PyTorch project. This isn’t a guide on how to structure your projects, only an example of how you might come across PyTorch projects in the wild. For more on structuring Python projects, see Real Python’s Python Application Layouts: A Reference guide.

What’s the difference between this notebook (Part 2) and the cell mode notebook (Part 1)?

This notebook, 05 Going Modular: Part 2 (script mode), creates Python scripts out of the cells created in part 1.

Running this notebook end-to-end will result in having a directory structure very similar to the pytorch_project structure above.

You’ll notice each section in Part 2 (script mode) has an extra subsection (e.g. 2.1, 3.1, 4.1) for turning cell code into script code.

What we’re going to cover

By the end of this notebook you should finish with a directory structure of:

├── going_modular/
│   ├── data_setup.py
│   ├── engine.py
│   ├── model_builder.py
│   ├── train.py
│   └── utils.py
├── models/
│   ├── 05_going_modular_cell_mode_tinyvgg_model.pth
│   └── 05_going_modular_script_mode_tinyvgg_model.pth
└── data/
    └── pizza_steak_sushi/
        ├── train/
        │   ├── pizza/
        │   │   ├── image01.jpeg
        │   │   └── ...
        │   ├── steak/
        │   └── sushi/
        └── test/
            ├── pizza/
            ├── steak/
            └── sushi/
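To check that you’ve ended up with this layout, a small helper can list any missing scripts. The function name missing_files() and the expected-paths list are assumptions taken from the going_modular/ part of the tree above. The .pth files are left out because they only appear once models have been trained and saved.

```python
from pathlib import Path
from typing import Iterable, List

# Expected script files, taken from the going_modular/ part of the tree above
EXPECTED_SCRIPTS = [
    "going_modular/data_setup.py",
    "going_modular/engine.py",
    "going_modular/model_builder.py",
    "going_modular/train.py",
    "going_modular/utils.py",
]


def missing_files(root: str, expected: Iterable[str] = EXPECTED_SCRIPTS) -> List[str]:
    """Return the expected relative paths that don't exist as files under root."""
    root_path = Path(root)
    return [rel for rel in expected if not (root_path / rel).is_file()]
```

Running print(missing_files(".")) from the project root should print an empty list once every script has been written.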

Using this directory structure, you should be able to train a model from within a notebook with the command:

!python going_modular/train.py

Or from the command line with:

python going_modular/train.py

In essence, we will have turned our helpful notebook code into reusable modular code.

Creating a folder for storing Python scripts

Since we’re going to be creating Python scripts out of our most useful code cells, let’s create a folder for storing those scripts.

We’ll call the folder going_modular and create it using Python’s os.makedirs() function.

import os

os.makedirs("going_modular", exist_ok=True)

Get data

We’re going to start by downloading the same data we used in notebook 04, the pizza_steak_sushi dataset with images of pizza, steak and sushi.

import os
import zipfile

from pathlib import Path

import requests

# Setup path to data folder
data_path = Path("data/")
image_path = data_path / "pizza_steak_sushi"

# If the image folder doesn't exist, create it
if image_path.is_dir():
    print(f"{image_path} directory exists.")
else:
    print(f"Did not find {image_path} directory, creating one...")
    image_path.mkdir(parents=True, exist_ok=True)

# Download pizza, steak, sushi data
with open(data_path / "pizza_steak_sushi.zip", "wb") as f:
    request = requests.get("https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi.zip")
    print("Downloading pizza, steak, sushi data...")
    f.write(request.content)

# Unzip pizza, steak, sushi data
with zipfile.ZipFile(data_path / "pizza_steak_sushi.zip", "r") as zip_ref:
    print("Unzipping pizza, steak, sushi data...") 
    zip_ref.extractall(image_path)

# Remove zip file
os.remove(data_path / "pizza_steak_sushi.zip")
data/pizza_steak_sushi directory exists.
Downloading pizza, steak, sushi data...
Unzipping pizza, steak, sushi data...
# Setup train and testing paths
train_dir = image_path / "train"
test_dir = image_path / "test"

train_dir, test_dir

Create Datasets and DataLoaders

Let’s turn our data into PyTorch Datasets and DataLoaders and find out a few useful attributes from them, such as the class names and the dataset lengths.

from torchvision import datasets, transforms

# Create simple transform
data_transform = transforms.Compose([ 
    transforms.Resize((64, 64)), # resize images to 64x64
    transforms.ToTensor(), # turn PIL images into torch tensors
])

# Use ImageFolder to create dataset(s)
train_data = datasets.ImageFolder(root=train_dir, # target folder of images
                                  transform=data_transform, # transforms to perform on data (images)
                                  target_transform=None) # transforms to perform on labels (if necessary)

test_data = datasets.ImageFolder(root=test_dir, 
                                 transform=data_transform)

print(f"Train data:\n{train_data}\nTest data:\n{test_data}")
Train data:
Dataset ImageFolder
    Number of datapoints: 225
    Root location: data/pizza_steak_sushi/train
Transform: Compose(
               Resize(size=(64, 64), interpolation=bilinear, max_size=None, antialias=None)
Test data:
Dataset ImageFolder
    Number of datapoints: 75
    Root location: data/pizza_steak_sushi/test
Transform: Compose(
               Resize(size=(64, 64), interpolation=bilinear, max_size=None, antialias=None)
# Get class names as a list
class_names = train_data.classes
class_names
['pizza', 'steak', 'sushi']
# Can also get class names as a dict
class_dict = train_data.class_to_idx
class_dict
{'pizza': 0, 'steak': 1, 'sushi': 2}
# Check the lengths
len(train_data), len(test_data)
(225, 75)
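ImageFolder works out the classes from the folder names. As far as we can tell, it sorts the sub-folder names alphabetically and numbers them from 0, which is why pizza, steak and sushi map to 0, 1 and 2. The standard-library sketch below imitates that mapping and counts the images per class. find_classes_like_imagefolder() and count_images() are hypothetical names, not torchvision functions. The extension list is an assumption, and the sketch only counts files sitting directly inside each class folder.

```python
import os
from typing import Dict, List, Tuple

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")  # assumption: a subset of common image types


def find_classes_like_imagefolder(directory: str) -> Tuple[List[str], Dict[str, int]]:
    """Sorted sub-folder names become the classes; their position becomes the label index."""
    classes = sorted(entry.name for entry in os.scandir(directory) if entry.is_dir())
    class_to_idx = {name: i for i, name in enumerate(classes)}
    return classes, class_to_idx


def count_images(directory: str) -> Dict[str, int]:
    """Count image files directly inside each class folder of a train/ or test/ directory."""
    classes, _ = find_classes_like_imagefolder(directory)
    counts = {}
    for name in classes:
        files = os.listdir(os.path.join(directory, name))
        counts[name] = sum(f.lower().endswith(IMAGE_EXTENSIONS) for f in files)
    return counts
```

Calling count_images(train_dir) gives a quick per-class breakdown of the training images.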
# Turn train and test Datasets into DataLoaders
from torch.utils.data import DataLoader

train_dataloader = DataLoader(dataset=train_data, 
                              batch_size=1, # how many samples per batch?
                              num_workers=1, # how many subprocesses to use for data loading? (higher = more)
                              shuffle=True) # shuffle the data?

test_dataloader = DataLoader(dataset=test_data, 
                             batch_size=1, 
                             num_workers=1, 
                             shuffle=False) # don't usually need to shuffle testing data

train_dataloader, test_dataloader
(<torch.utils.data.dataloader.DataLoader at 0x7fca2e344760>,
 <torch.utils.data.dataloader.DataLoader at 0x7fca2e3445b0>)
# Check out single image size/shape
img, label = next(iter(train_dataloader))

# Batch size will now be 1, try changing the batch_size parameter above and see what happens
print(f"Image shape: {img.shape} -> [batch_size, color_channels, height, width]")
print(f"Label shape: {label.shape}")
Image shape: torch.Size([1, 3, 64, 64]) -> [batch_size, color_channels, height, width]
Label shape: torch.Size([1])
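The leading 1 in the image shape is the batch dimension. With the default drop_last=False, a DataLoader with the default sampler yields ceil(N / batch_size) batches, and the last batch may be smaller than the rest. The arithmetic sketch below (hypothetical helper names) applies this to the 225 training and 75 test images:

```python
import math


def num_batches(num_samples: int, batch_size: int, drop_last: bool = False) -> int:
    """Number of batches a DataLoader yields for num_samples items (default sampler)."""
    if drop_last:
        return num_samples // batch_size  # the incomplete final batch is discarded
    return math.ceil(num_samples / batch_size)  # the final batch may be smaller


def last_batch_size(num_samples: int, batch_size: int) -> int:
    """Size of the final batch when drop_last=False."""
    remainder = num_samples % batch_size
    return remainder if remainder else batch_size


for bs in (1, 32):
    print(f"batch_size={bs}: {num_batches(225, bs)} train batches, "
          f"{num_batches(75, bs)} test batches, "
          f"last train batch holds {last_batch_size(225, bs)} image(s)")
```

With batch_size=32 this gives 8 training batches, the last holding a single image, and 3 test batches.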

Create Datasets and DataLoaders (script mode)

Rather than rewriting all of the code above every time we want to load data, we can turn it into a script called data_setup.py.

Let’s capture all of the above functionality into a function called create_dataloaders().

%%writefile going_modular/data_setup.py
"""
Contains functionality for creating PyTorch DataLoaders for 
image classification data.
"""
import os

from torch.utils.data import DataLoader
from torchvision import datasets, transforms

NUM_WORKERS = os.cpu_count()

def create_dataloaders(
    train_dir: str, 
    test_dir: str, 
    transform: transforms.Compose, 
    batch_size: int, 
    num_workers: int=NUM_WORKERS
):
  """Creates training and testing DataLoaders.

  Takes in a training directory and testing directory path and turns
  them into PyTorch Datasets and then into PyTorch DataLoaders.

  Args:
    train_dir: Path to training directory.
    test_dir: Path to testing directory.
    transform: torchvision transforms to perform on training and testing data.
    batch_size: Number of samples per batch in each of the DataLoaders.
    num_workers: An integer for number of workers per DataLoader.

  Returns:
    A tuple of (train_dataloader, test_dataloader, class_names).
    Where class_names is a list of the target classes.
    Example usage:
      train_dataloader, test_dataloader, class_names = \
        create_dataloaders(train_dir=path/to/train_dir,
                           test_dir=path/to/test_dir,
                           transform=some_transform,
                           batch_size=32,
                           num_workers=4)
  """
  # Use ImageFolder to create dataset(s)
  train_data = datasets.ImageFolder(train_dir, transform=transform)
  test_data = datasets.ImageFolder(test_dir, transform=transform)

  # Get class names
  class_names = train_data.classes

  # Turn images into data loaders
  train_dataloader = DataLoader(
      train_data,
      batch_size=batch_size,
      shuffle=True, # shuffle training data every epoch
      num_workers=num_workers,
      pin_memory=True,
  )
  test_dataloader = DataLoader(
      test_data,
      batch_size=batch_size,
      shuffle=False, # don't need to shuffle test data
      num_workers=num_workers,
      pin_memory=True,
  )

  return train_dataloader, test_dataloader, class_names
Overwriting going_modular/data_setup.py
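One caveat with NUM_WORKERS = os.cpu_count(): Python documents that os.cpu_count() can return None when the number of CPUs can’t be determined. Passing None as num_workers to a DataLoader would most likely raise an error. A defensive variant could look like the sketch below. It is a suggestion, not what data_setup.py does, and the helper name and optional cap are assumptions. It falls back to 0 workers, which means data is loaded in the main process.

```python
import os
from typing import Optional


def default_num_workers(max_workers: Optional[int] = None) -> int:
    """Pick a DataLoader worker count, falling back to 0 (main process) if unknown."""
    count = os.cpu_count() or 0  # os.cpu_count() may return None
    if max_workers is not None:
        count = min(count, max_workers)  # optionally cap the number of workers
    return count
```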

Making a model (TinyVGG)

We’re going to use the same model we used in notebook 04: TinyVGG from the CNN Explainer website.

The only change here from notebook 04 is that a docstring has been added using Google’s Style Guide for Python.

import torch

from torch import nn 

class TinyVGG(nn.Module):
    """Creates the TinyVGG architecture.

    Replicates the TinyVGG architecture from the CNN explainer website in PyTorch.
    See the original architecture here: https://poloclub.github.io/cnn-explainer/

    Args:
    input_shape: An integer indicating number of input channels.
    hidden_units: An integer indicating number of hidden units between layers.
    output_shape: An integer indicating number of output units.
    """
    def __init__(self, input_shape: int, hidden_units: int, output_shape: int) -> None:
        super().__init__()
        self.conv_block_1 = nn.Sequential(
          nn.Conv2d(input_shape, hidden_units, kernel_size=3, stride=1, padding=0),
          nn.ReLU(),
          nn.Conv2d(hidden_units, hidden_units, kernel_size=3, stride=1, padding=0),
          nn.ReLU(),
          nn.MaxPool2d(kernel_size=2, stride=2)
        )
        self.conv_block_2 = nn.Sequential(
          nn.Conv2d(hidden_units, hidden_units, kernel_size=3, padding=0),
          nn.ReLU(),
          nn.Conv2d(hidden_units, hidden_units, kernel_size=3, padding=0),
          nn.ReLU(),
          nn.MaxPool2d(2)
        )
        self.classifier = nn.Sequential(
          nn.Flatten(),
          # Where did this in_features shape come from? 
          # It's because each layer of our network compresses and changes the shape of our inputs data.
          # For 64x64 inputs the feature maps end up 13x13, hence hidden_units*13*13
          nn.Linear(in_features=hidden_units*13*13, out_features=output_shape)
        )

    def forward(self, x: torch.Tensor):
        x = self.conv_block_1(x)
        x = self.conv_block_2(x)
        x = self.classifier(x)
        return x
        # return self.classifier(self.conv_block_2(self.conv_block_1(x))) # <- equivalent one-line version

Now let’s create an instance of TinyVGG and put it on the target device.

Note: If you’re using Google Colab, and you’d like to use a GPU (recommended), you can turn one on via going to Runtime -> Change runtime type -> Hardware accelerator -> GPU.

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Instantiate an instance of the model
model_0 = TinyVGG(input_shape=3, # number of color channels (3 for RGB) 
                  hidden_units=10, 
                  output_shape=len(class_names)).to(device)
model_0
TinyVGG(
  (conv_block_1): Sequential(
    (0): Conv2d(3, 10, kernel_size=(3, 3), stride=(1, 1))
    (1): ReLU()
    (2): Conv2d(10, 10, kernel_size=(3, 3), stride=(1, 1))
    (3): ReLU()
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (conv_block_2): Sequential(
    (0): Conv2d(10, 10, kernel_size=(3, 3), stride=(1, 1))
    (1): ReLU()
    (2): Conv2d(10, 10, kernel_size=(3, 3), stride=(1, 1))
    (3): ReLU()
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (classifier): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=1690, out_features=3, bias=True)
  )
)
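The classifier’s in_features=1690 follows from how each layer shrinks the feature maps. For a convolution or max-pooling layer (dilation 1, default floor rounding), each spatial dimension becomes floor((size + 2 * padding - kernel_size) / stride) + 1. TinyVGG is configured here with 3x3 convolutions (stride 1, no padding) and 2x2 max pooling (stride 2). A 64x64 image therefore shrinks 64 → 62 → 60 → 30 → 28 → 26 → 13, and 10 hidden units * 13 * 13 = 1690. The plain-Python helpers below (hypothetical names) do the same arithmetic:

```python
def conv_output_size(size: int, kernel_size: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a Conv2d/MaxPool2d layer along one dimension (dilation=1)."""
    return (size + 2 * padding - kernel_size) // stride + 1


def tinyvgg_flatten_features(image_size: int = 64, hidden_units: int = 10) -> int:
    """Number of features reaching nn.Flatten() in TinyVGG for a square input image."""
    size = image_size
    for _ in range(2):  # two conv blocks with the same layout
        size = conv_output_size(size, kernel_size=3)            # Conv2d, padding=0
        size = conv_output_size(size, kernel_size=3)            # Conv2d, padding=0
        size = conv_output_size(size, kernel_size=2, stride=2)  # MaxPool2d(2)
    return hidden_units * size * size


print(tinyvgg_flatten_features())  # hidden_units * 13 * 13 for 64x64 inputs
```

If you change the image size in the Resize transform, this is one way to work out the new in_features value.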

Let’s check out our model by doing a dummy forward pass.

# 1. Get a batch of images and labels from the DataLoader
img_batch, label_batch = next(iter(train_dataloader))

# 2. Get a single image from the batch and unsqueeze the image so its shape fits the model
img_single, label_single = img_batch[0].unsqueeze(dim=0), label_batch[0]
print(f"Single image shape: {img_single.shape}\n")

# 3. Perform a forward pass on a single image
with torch.inference_mode():
    pred = model_0(img_single.to(device))
# 4. Print out what's happening and convert model logits -> pred probs -> pred label
print(f"Output logits:\n{pred}\n")
print(f"Output prediction probabilities:\n{torch.softmax(pred, dim=1)}\n")
print(f"Output prediction label:\n{torch.argmax(torch.softmax(pred, dim=1), dim=1)}\n")
print(f"Actual label:\n{label_single}")
Single image shape: torch.Size([1, 3, 64, 64])

Output logits:
tensor([[ 0.0208, -0.0019,  0.0095]], device='cuda:0')

Output prediction probabilities:
tensor([[0.3371, 0.3295, 0.3333]], device='cuda:0')

Output prediction label:
tensor([0], device='cuda:0')

Actual label:
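To see how the logits become the printed probabilities, here is the same softmax computed by hand in plain Python. It uses the rounded logits from the output, so the last digit can differ slightly from PyTorch’s result. Softmax preserves ordering, so argmax of the raw logits gives the same label as argmax of the probabilities. That is why test_step() can call argmax on the logits directly. softmax() and argmax() here are hypothetical helpers, not torch functions.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Convert raw logits into probabilities that sum to 1."""
    max_logit = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values: List[float]) -> int:
    """Index of the largest value (the first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


logits = [0.0208, -0.0019, 0.0095]  # rounded logits from the printed output
probs = softmax(logits)
print([round(p, 4) for p in probs])
print(argmax(probs), argmax(logits))
```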

Making a model (TinyVGG) (script mode)

Over the past few notebooks (notebook 03 and notebook 04), we’ve built the TinyVGG model a few times.

So it makes sense to put the model into its own file so we can reuse it again and again.

Let’s put our TinyVGG() model class into a script called model_builder.py with the line %%writefile going_modular/model_builder.py.

%%writefile going_modular/model_builder.py
"""
Contains PyTorch model code to instantiate a TinyVGG model.
"""
import torch

from torch import nn

class TinyVGG(nn.Module):
    """Creates the TinyVGG architecture.

    Replicates the TinyVGG architecture from the CNN explainer website in PyTorch.
    See the original architecture here: https://poloclub.github.io/cnn-explainer/

    Args:
    input_shape: An integer indicating number of input channels.
    hidden_units: An integer indicating number of hidden units between layers.
    output_shape: An integer indicating number of output units.
    """
    def __init__(self, input_shape: int, hidden_units: int, output_shape: int) -> None:
        super().__init__()
        self.conv_block_1 = nn.Sequential(
          nn.Conv2d(input_shape, hidden_units, kernel_size=3, stride=1, padding=0),
          nn.ReLU(),
          nn.Conv2d(hidden_units, hidden_units, kernel_size=3, stride=1, padding=0),
          nn.ReLU(),
          nn.MaxPool2d(kernel_size=2, stride=2)
        )
        self.conv_block_2 = nn.Sequential(
          nn.Conv2d(hidden_units, hidden_units, kernel_size=3, padding=0),
          nn.ReLU(),
          nn.Conv2d(hidden_units, hidden_units, kernel_size=3, padding=0),
          nn.ReLU(),
          nn.MaxPool2d(2)
        )
        self.classifier = nn.Sequential(
          nn.Flatten(),
          # Where did this in_features shape come from? 
          # It's because each layer of our network compresses and changes the shape of our inputs data.
          # For 64x64 inputs the feature maps end up 13x13, hence hidden_units*13*13
          nn.Linear(in_features=hidden_units*13*13, out_features=output_shape)
        )

    def forward(self, x: torch.Tensor):
        x = self.conv_block_1(x)
        x = self.conv_block_2(x)
        x = self.classifier(x)
        return x
        # return self.classifier(self.conv_block_2(self.conv_block_1(x))) # <- equivalent one-line version
Overwriting going_modular/model_builder.py

Create an instance of TinyVGG (from the script).

import torch

from going_modular import model_builder

device = "cuda" if torch.cuda.is_available() else "cpu"

# Instantiate an instance of the model from the "model_builder.py" script
model_1 = model_builder.TinyVGG(input_shape=3, # number of color channels (3 for RGB) 
                                hidden_units=10, 
                                output_shape=len(class_names)).to(device)
model_1
TinyVGG(
  (conv_block_1): Sequential(
    (0): Conv2d(3, 10, kernel_size=(3, 3), stride=(1, 1))
    (1): ReLU()
    (2): Conv2d(10, 10, kernel_size=(3, 3), stride=(1, 1))
    (3): ReLU()
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (conv_block_2): Sequential(
    (0): Conv2d(10, 10, kernel_size=(3, 3), stride=(1, 1))
    (1): ReLU()
    (2): Conv2d(10, 10, kernel_size=(3, 3), stride=(1, 1))
    (3): ReLU()
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (classifier): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=1690, out_features=3, bias=True)
  )
)
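The line from going_modular import model_builder works even though going_modular contains no __init__.py. Since Python 3.3 (PEP 420), a plain folder on the import path can be imported as a namespace package. One practical gotcha: if you rewrite a script with %%writefile after importing it, the notebook keeps using the old version until you reload the module or restart the kernel. The standard-library sketch below reproduces both behaviours in a temporary folder. my_scripts, demo_module.py and greet() are hypothetical stand-ins for going_modular and its scripts.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_from_folder(root: str, package: str, module: str):
    """Import package.module from root, where package is a plain folder (no __init__.py)."""
    if root not in sys.path:
        sys.path.insert(0, root)
    importlib.invalidate_caches()  # make sure newly written files are noticed
    return importlib.import_module(f"{package}.{module}")


root = tempfile.mkdtemp()
pkg = Path(root) / "my_scripts"  # hypothetical folder, playing the role of going_modular/
pkg.mkdir()
(pkg / "demo_module.py").write_text('def greet():\n    return "hi"\n')

demo = import_from_folder(root, "my_scripts", "demo_module")
print(demo.greet())  # uses the first version of the file

# Rewriting the file (as %%writefile would) does not change the already-imported module...
(pkg / "demo_module.py").write_text('def greet():\n    return "hello again"\n')
print(demo.greet())  # still the old function

# ...until the module is explicitly reloaded
demo = importlib.reload(demo)
print(demo.greet())  # now the new function
```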

Do a dummy forward pass on model_1.

# 1. Get a batch of images and labels from the DataLoader
img_batch, label_batch = next(iter(train_dataloader))

# 2. Get a single image from the batch and unsqueeze the image so its shape fits the model
img_single, label_single = img_batch[0].unsqueeze(dim=0), label_batch[0]
print(f"Single image shape: {img_single.shape}\n")

# 3. Perform a forward pass on a single image
with torch.inference_mode():
    pred = model_1(img_single.to(device))
# 4. Print out what's happening and convert model logits -> pred probs -> pred label
print(f"Output logits:\n{pred}\n")
print(f"Output prediction probabilities:\n{torch.softmax(pred, dim=1)}\n")
print(f"Output prediction label:\n{torch.argmax(torch.softmax(pred, dim=1), dim=1)}\n")
print(f"Actual label:\n{label_single}")
Single image shape: torch.Size([1, 3, 64, 64])

Output logits:
tensor([[ 0.0208, -0.0019,  0.0095]], device='cuda:0')

Output prediction probabilities:
tensor([[0.3371, 0.3295, 0.3333]], device='cuda:0')

Output prediction label:
tensor([0], device='cuda:0')

Actual label:

Creating train_step() and test_step() functions and train() to combine them

Rather than writing them again, we can reuse the train_step() and test_step() functions from notebook 04.

The same goes for the train() function we created.

The only difference here is that these functions have had docstrings added to them following Google’s Python Functions and Methods Style Guide.

Let’s start by making train_step().

from typing import Tuple

def train_step(model: torch.nn.Module, 
               dataloader: torch.utils.data.DataLoader, 
               loss_fn: torch.nn.Module, 
               optimizer: torch.optim.Optimizer,
               device: torch.device) -> Tuple[float, float]:
    """Trains a PyTorch model for a single epoch.

    Turns a target PyTorch model to training mode and then
    runs through all of the required training steps (forward
    pass, loss calculation, optimizer step).

    Args:
    model: A PyTorch model to be trained.
    dataloader: A DataLoader instance for the model to be trained on.
    loss_fn: A PyTorch loss function to minimize.
    optimizer: A PyTorch optimizer to help minimize the loss function.
    device: A target device to compute on (e.g. "cuda" or "cpu").

    Returns:
    A tuple of training loss and training accuracy metrics.
    In the form (train_loss, train_accuracy). For example:

    (0.1112, 0.8743)
    """
    # Put model in train mode
    model.train()

    # Setup train loss and train accuracy values
    train_loss, train_acc = 0, 0

    # Loop through data loader data batches
    for batch, (X, y) in enumerate(dataloader):
        # Send data to target device
        X, y = X.to(device), y.to(device)

        # 1. Forward pass
        y_pred = model(X)

        # 2. Calculate and accumulate loss
        loss = loss_fn(y_pred, y)
        train_loss += loss.item() 

        # 3. Optimizer zero grad
        optimizer.zero_grad()

        # 4. Loss backward
        loss.backward()

        # 5. Optimizer step
        optimizer.step()

        # Calculate and accumulate accuracy metric across all batches
        y_pred_class = torch.argmax(torch.softmax(y_pred, dim=1), dim=1)
        train_acc += (y_pred_class == y).sum().item()/len(y_pred)

    # Adjust metrics to get average loss and accuracy per batch 
    train_loss = train_loss / len(dataloader)
    train_acc = train_acc / len(dataloader)
    return train_loss, train_acc
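A detail about the "average per batch" step: train_step() averages each batch’s accuracy, so every batch counts equally regardless of its size. When the last batch is smaller (225 images in batches of 32 leaves one image in the final batch), this can differ from accuracy computed over all samples. The numbers below are made up purely to illustrate the difference, and per_batch_mean() and per_sample_accuracy() are hypothetical helpers.

```python
from typing import List, Tuple

# Made-up (correct, batch_size) pairs: seven full batches of 32, then a final batch of 1
batches: List[Tuple[int, int]] = [(16, 32)] * 7 + [(1, 1)]


def per_batch_mean(batches: List[Tuple[int, int]]) -> float:
    """Mean of per-batch accuracies (the approach used in train_step()/test_step())."""
    return sum(correct / size for correct, size in batches) / len(batches)


def per_sample_accuracy(batches: List[Tuple[int, int]]) -> float:
    """Total correct predictions divided by total number of samples."""
    return sum(c for c, _ in batches) / sum(s for _, s in batches)


print(f"mean of batch accuracies: {per_batch_mean(batches):.4f}")
print(f"accuracy over all samples: {per_sample_accuracy(batches):.4f}")
```

Here the single-image batch pulls the per-batch mean up to 0.5625, while the accuracy over all 225 samples is about 0.5022. With equal-sized batches the two numbers agree.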

Now we’ll do test_step().

def test_step(model: torch.nn.Module, 
              dataloader: torch.utils.data.DataLoader, 
              loss_fn: torch.nn.Module,
              device: torch.device) -> Tuple[float, float]:
    """Tests a PyTorch model for a single epoch.

    Turns a target PyTorch model to "eval" mode and then performs
    a forward pass on a testing dataset.

    Args:
    model: A PyTorch model to be tested.
    dataloader: A DataLoader instance for the model to be tested on.
    loss_fn: A PyTorch loss function to calculate loss on the test data.
    device: A target device to compute on (e.g. "cuda" or "cpu").

    Returns:
    A tuple of testing loss and testing accuracy metrics.
    In the form (test_loss, test_accuracy). For example:

    (0.0223, 0.8985)
    """
    # Put model in eval mode
    model.eval()

    # Setup test loss and test accuracy values
    test_loss, test_acc = 0, 0

    # Turn on inference context manager
    with torch.inference_mode():
        # Loop through DataLoader batches
        for batch, (X, y) in enumerate(dataloader):
            # Send data to target device
            X, y = X.to(device), y.to(device)

            # 1. Forward pass
            test_pred_logits = model(X)

            # 2. Calculate and accumulate loss
            loss = loss_fn(test_pred_logits, y)
            test_loss += loss.item()

            # Calculate and accumulate accuracy
            test_pred_labels = test_pred_logits.argmax(dim=1)
            test_acc += ((test_pred_labels == y).sum().item()/len(test_pred_labels))

    # Adjust metrics to get average loss and accuracy per batch 
    test_loss = test_loss / len(dataloader)
    test_acc = test_acc / len(dataloader)
    return test_loss, test_acc

And we’ll combine train_step() and test_step() into train().

from typing import Dict, List

from tqdm.auto import tqdm

def train(model: torch.nn.Module, 
          train_dataloader: torch.utils.data.DataLoader, 
          test_dataloader: torch.utils.data.DataLoader, 
          optimizer: torch.optim.Optimizer,
          loss_fn: torch.nn.Module,
          epochs: int,
          device: torch.device) -> Dict[str, List[float]]:
    """Trains and tests a PyTorch model.

    Passes a target PyTorch model through train_step() and test_step()
    functions for a number of epochs, training and testing the model
    in the same epoch loop.

    Calculates, prints and stores evaluation metrics throughout.

    Args:
    model: A PyTorch model to be trained and tested.
    train_dataloader: A DataLoader instance for the model to be trained on.
    test_dataloader: A DataLoader instance for the model to be tested on.
    optimizer: A PyTorch optimizer to help minimize the loss function.
    loss_fn: A PyTorch loss function to calculate loss on both datasets.
    epochs: An integer indicating how many epochs to train for.
    device: A target device to compute on (e.g. "cuda" or "cpu").

    Returns:
    A dictionary of training and testing loss as well as training and
    testing accuracy metrics. Each metric has a value in a list for 
    each epoch.
    In the form: {train_loss: [...],
                  train_acc: [...],
                  test_loss: [...],
                  test_acc: [...]} 
    For example if training for epochs=2: 
                 {train_loss: [2.0616, 1.0537],
                  train_acc: [0.3945, 0.3945],
                  test_loss: [1.2641, 1.5706],
                  test_acc: [0.3400, 0.2973]} 
    """
    # Create empty results dictionary
    results = {"train_loss": [],
      "train_acc": [],
      "test_loss": [],
      "test_acc": []
    }

    # Loop through training and testing steps for a number of epochs
    for epoch in tqdm(range(epochs)):
        train_loss, train_acc = train_step(model=model,
                                          dataloader=train_dataloader,
                                          loss_fn=loss_fn,
                                          optimizer=optimizer,
                                          device=device)
        test_loss, test_acc = test_step(model=model,
                                        dataloader=test_dataloader,
                                        loss_fn=loss_fn,
                                        device=device)

        # Print out what's happening
        print(
          f"Epoch: {epoch+1} | "
          f"train_loss: {train_loss:.4f} | "
          f"train_acc: {train_acc:.4f} | "
          f"test_loss: {test_loss:.4f} | "
          f"test_acc: {test_acc:.4f}"
        )

        # Update results dictionary
        results["train_loss"].append(train_loss)
        results["train_acc"].append(train_acc)
        results["test_loss"].append(test_loss)
        results["test_acc"].append(test_acc)

    # Return the filled results at the end of the epochs
    return results
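Because train() returns a plain dictionary of lists, it’s easy to post-process. As one example, the hypothetical helper below finds the best epoch for a given metric, using the example values from the docstring:

```python
from typing import Dict, List, Tuple


def best_epoch(results: Dict[str, List[float]], metric: str = "test_loss",
               lower_is_better: bool = True) -> Tuple[int, float]:
    """Return (1-based epoch number, value) of the best epoch for the given metric."""
    values = results[metric]
    pick = min if lower_is_better else max
    index = pick(range(len(values)), key=lambda i: values[i])
    return index + 1, values[index]


example_results = {"train_loss": [2.0616, 1.0537],
                   "train_acc": [0.3945, 0.3945],
                   "test_loss": [1.2641, 1.5706],
                   "test_acc": [0.3400, 0.2973]}
print(best_epoch(example_results))                                     # lowest test loss
print(best_epoch(example_results, "test_acc", lower_is_better=False))  # highest test accuracy
```

In this example both criteria point to epoch 1, since test loss rises and test accuracy falls in epoch 2.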

Creating train_step() and test_step() functions and train() to combine them (script mode)

To create a script for train_step(), test_step() and train(), we’ll combine all of their code into a single cell.

We’ll then write that cell to a file called engine.py because these functions will be the “engine” of our training pipeline.

We can do so with the magic line %%writefile going_modular/engine.py.

We’ll also make sure to put all the imports we need (torch, typing, and tqdm) at the top of the cell.
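The imports matter because a script has its own namespace. Whether it’s imported or run with python, it can’t see what has already been imported in the notebook, so code that works in a cell can fail as a script. The standard-library sketch below runs a file in a fresh interpreter and reports whether it succeeded. The helper runs_standalone() and the two example files are hypothetical, and the broken file uses Path at module level without importing it.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def runs_standalone(script: Path) -> bool:
    """Execute script in a fresh interpreter; return True if it exits without an error."""
    result = subprocess.run([sys.executable, str(script)], capture_output=True, text=True)
    if result.returncode != 0 and result.stderr.strip():
        # Show the last traceback line, e.g. "NameError: name 'Path' is not defined"
        print(result.stderr.strip().splitlines()[-1])
    return result.returncode == 0


tmp = Path(tempfile.mkdtemp())
broken = tmp / "broken_utils.py"
broken.write_text('MODELS_DIR = Path("models")\n')  # relies on an import that only the notebook had
fixed = tmp / "fixed_utils.py"
fixed.write_text('from pathlib import Path\nMODELS_DIR = Path("models")\n')

print(runs_standalone(broken), runs_standalone(fixed))
```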

%%writefile going_modular/engine.py
"""
Contains functions for training and testing a PyTorch model.
"""
from typing import Dict, List, Tuple

import torch

from tqdm.auto import tqdm

def train_step(model: torch.nn.Module, 
               dataloader: torch.utils.data.DataLoader, 
               loss_fn: torch.nn.Module, 
               optimizer: torch.optim.Optimizer,
               device: torch.device) -> Tuple[float, float]:
    """Trains a PyTorch model for a single epoch.

    Turns a target PyTorch model to training mode and then
    runs through all of the required training steps (forward
    pass, loss calculation, optimizer step).

    Args:
    model: A PyTorch model to be trained.
    dataloader: A DataLoader instance for the model to be trained on.
    loss_fn: A PyTorch loss function to minimize.
    optimizer: A PyTorch optimizer to help minimize the loss function.
    device: A target device to compute on (e.g. "cuda" or "cpu").

    Returns:
    A tuple of training loss and training accuracy metrics.
    In the form (train_loss, train_accuracy). For example:

    (0.1112, 0.8743)
    """
    # Put model in train mode
    model.train()

    # Setup train loss and train accuracy values
    train_loss, train_acc = 0, 0

    # Loop through data loader data batches
    for batch, (X, y) in enumerate(dataloader):
        # Send data to target device
        X, y = X.to(device), y.to(device)

        # 1. Forward pass
        y_pred = model(X)

        # 2. Calculate and accumulate loss
        loss = loss_fn(y_pred, y)
        train_loss += loss.item() 

        # 3. Optimizer zero grad
        optimizer.zero_grad()

        # 4. Loss backward
        loss.backward()

        # 5. Optimizer step
        optimizer.step()

        # Calculate and accumulate accuracy metric across all batches
        y_pred_class = torch.argmax(torch.softmax(y_pred, dim=1), dim=1)
        train_acc += (y_pred_class == y).sum().item()/len(y_pred)

    # Adjust metrics to get average loss and accuracy per batch 
    train_loss = train_loss / len(dataloader)
    train_acc = train_acc / len(dataloader)
    return train_loss, train_acc

def test_step(model: torch.nn.Module, 
              dataloader: torch.utils.data.DataLoader, 
              loss_fn: torch.nn.Module,
              device: torch.device) -> Tuple[float, float]:
    """Tests a PyTorch model for a single epoch.

    Turns a target PyTorch model to "eval" mode and then performs
    a forward pass on a testing dataset.

    Args:
    model: A PyTorch model to be tested.
    dataloader: A DataLoader instance for the model to be tested on.
    loss_fn: A PyTorch loss function to calculate loss on the test data.
    device: A target device to compute on (e.g. "cuda" or "cpu").

    Returns:
    A tuple of testing loss and testing accuracy metrics.
    In the form (test_loss, test_accuracy). For example:

    (0.0223, 0.8985)
    """
    # Put model in eval mode
    model.eval()

    # Setup test loss and test accuracy values
    test_loss, test_acc = 0, 0

    # Turn on inference context manager
    with torch.inference_mode():
        # Loop through DataLoader batches
        for batch, (X, y) in enumerate(dataloader):
            # Send data to target device
            X, y = X.to(device), y.to(device)

            # 1. Forward pass
            test_pred_logits = model(X)

            # 2. Calculate and accumulate loss
            loss = loss_fn(test_pred_logits, y)
            test_loss += loss.item()

            # Calculate and accumulate accuracy
            test_pred_labels = test_pred_logits.argmax(dim=1)
            test_acc += ((test_pred_labels == y).sum().item()/len(test_pred_labels))

    # Adjust metrics to get average loss and accuracy per batch 
    test_loss = test_loss / len(dataloader)
    test_acc = test_acc / len(dataloader)
    return test_loss, test_acc

def train(model: torch.nn.Module, 
          train_dataloader: torch.utils.data.DataLoader, 
          test_dataloader: torch.utils.data.DataLoader, 
          optimizer: torch.optim.Optimizer,
          loss_fn: torch.nn.Module,
          epochs: int,
          device: torch.device) -> Dict[str, List[float]]:
    """Trains and tests a PyTorch model.

    Passes a target PyTorch model through train_step() and test_step()
    functions for a number of epochs, training and testing the model
    in the same epoch loop.

    Calculates, prints and stores evaluation metrics throughout.

    Args:
    model: A PyTorch model to be trained and tested.
    train_dataloader: A DataLoader instance for the model to be trained on.
    test_dataloader: A DataLoader instance for the model to be tested on.
    optimizer: A PyTorch optimizer to help minimize the loss function.
    loss_fn: A PyTorch loss function to calculate loss on both datasets.
    epochs: An integer indicating how many epochs to train for.
    device: A target device to compute on (e.g. "cuda" or "cpu").

    Returns:
    A dictionary of training and testing loss as well as training and
    testing accuracy metrics. Each metric has a value in a list for 
    each epoch.
    In the form: {train_loss: [...],
              train_acc: [...],
              test_loss: [...],
              test_acc: [...]} 
    For example if training for epochs=2: 
             {train_loss: [2.0616, 1.0537],
              train_acc: [0.3945, 0.3945],
              test_loss: [1.2641, 1.5706],
              test_acc: [0.3400, 0.2973]} 
    """
    # Create empty results dictionary
    results = {"train_loss": [],
               "train_acc": [],
               "test_loss": [],
               "test_acc": []
    }

    # Loop through training and testing steps for a number of epochs
    for epoch in tqdm(range(epochs)):
        train_loss, train_acc = train_step(model=model,
                                          dataloader=train_dataloader,
                                          loss_fn=loss_fn,
                                          optimizer=optimizer,
                                          device=device)
        test_loss, test_acc = test_step(model=model,
                                        dataloader=test_dataloader,
                                        loss_fn=loss_fn,
                                        device=device)

        # Print out what's happening
        print(
          f"Epoch: {epoch+1} | "
          f"train_loss: {train_loss:.4f} | "
          f"train_acc: {train_acc:.4f} | "
          f"test_loss: {test_loss:.4f} | "
          f"test_acc: {test_acc:.4f}"
        )

        # Update results dictionary
        results["train_loss"].append(train_loss)
        results["train_acc"].append(train_acc)
        results["test_loss"].append(test_loss)
        results["test_acc"].append(test_acc)

    # Return the filled results at the end of the epochs
    return results
Overwriting going_modular/engine.py

Creating a function to save the model

Let’s set up a function to save our model to a directory.

from pathlib import Path

import torch

def save_model(model: torch.nn.Module,
               target_dir: str,
               model_name: str):
    """Saves a PyTorch model to a target directory.

    Args:
    model: A target PyTorch model to save.
    target_dir: A directory for saving the model to.
    model_name: A filename for the saved model. Should include
      either ".pth" or ".pt" as the file extension.

    Example usage:
    save_model(model=model_0,
               target_dir="models",
               model_name="05_going_modular_cell_mode_tinyvgg_model.pth")
    """
    # Create target directory
    target_dir_path = Path(target_dir)
    target_dir_path.mkdir(parents=True,
                          exist_ok=True)

    # Create model save path
    assert model_name.endswith(".pth") or model_name.endswith(".pt"), "model_name should end with '.pt' or '.pth'"
    model_save_path = target_dir_path / model_name

    # Save the model state_dict()
    print(f"[INFO] Saving model to: {model_save_path}")
    torch.save(obj=model.state_dict(),
               f=model_save_path)
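The path handling in save_model() can be checked on its own, without a model. The sketch below uses a hypothetical helper, make_model_save_path(), that repeats the directory creation and extension check with pathlib only, writing into a temporary directory. Saving and reloading the weights themselves is done with torch.save(model.state_dict(), path) and model.load_state_dict(torch.load(path)).

```python
import tempfile
from pathlib import Path


def make_model_save_path(target_dir: str, model_name: str) -> Path:
    """Hypothetical helper: the path logic of save_model() without torch.save()."""
    # Reject names without a PyTorch checkpoint extension before touching disk
    if not model_name.endswith((".pth", ".pt")):
        raise ValueError("model_name should end with '.pt' or '.pth'")
    target_dir_path = Path(target_dir)
    # parents=True builds missing intermediate folders; exist_ok=True allows reruns
    target_dir_path.mkdir(parents=True, exist_ok=True)
    return target_dir_path / model_name


with tempfile.TemporaryDirectory() as tmp:
    save_path = make_model_save_path(f"{tmp}/models", "tinyvgg_model.pth")
    dir_created = save_path.parent.is_dir()
    print(save_path.name, dir_created)  # tinyvgg_model.pth True

# A bad extension fails before any folder is created
try:
    make_model_save_path("models", "tinyvgg_model.txt")
except ValueError as err:
    error_message = str(err)
    print(error_message)  # model_name should end with '.pt' or '.pth'
```

This sketch raises ValueError instead of using assert. Python skips assert statements when it runs with the -O flag, so an explicit check is slightly more robust.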

Creating a function to save the model (script mode)

How about we add our save_model() function to a script called utils.py, which is short for “utilities”.

We can do so with the magic line %%writefile going_modular/utils.py.

%%writefile going_modular/utils.py
"""
Contains various utility functions for PyTorch model training and saving.
"""
from pathlib import Path

import torch

def save_model(model: torch.nn.Module,
               target_dir: str,
               model_name: str):
    """Saves a PyTorch model to a target directory.

    Args:
    model: A target PyTorch model to save.
    target_dir: A directory for saving the model to.
    model_name: A filename for the saved model. Should include
      either ".pth" or ".pt" as the file extension.

    Example usage:
    save_model(model=model_0,
               target_dir="models",
               model_name="05_going_modular_script_mode_tinyvgg_model.pth")
    """
    # Create target directory
    target_dir_path = Path(target_dir)
    target_dir_path.mkdir(parents=True,
                          exist_ok=True)

    # Create model save path
    assert model_name.endswith(".pth") or model_name.endswith(".pt"), "model_name should end with '.pt' or '.pth'"
    model_save_path = target_dir_path / model_name

    # Save the model state_dict()
    print(f"[INFO] Saving model to: {model_save_path}")
    torch.save(obj=model.state_dict(),
               f=model_save_path)
Overwriting going_modular/utils.py
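%%writefile only works inside IPython/Jupyter. It prints “Overwriting” when the target file already exists and “Writing” otherwise. In plain Python, a rough equivalent is to write the cell’s text to the target path yourself. The sketch below does that with pathlib in a temporary directory, then executes the new file with runpy to confirm it is valid Python. The module name utils_demo.py and its greet() function are illustrative only.

```python
import runpy
import tempfile
from pathlib import Path

# Text that would otherwise sit below "%%writefile going_modular/utils_demo.py"
cell_source = '''"""
Contains a hypothetical helper, written out the way %%writefile would.
"""
def greet(name: str) -> str:
    return f"hello {name}, machine learning is fun!"
'''

with tempfile.TemporaryDirectory() as tmp:
    script_path = Path(tmp) / "going_modular" / "utils_demo.py"
    # The parent folder must exist before the file can be written
    script_path.parent.mkdir(parents=True, exist_ok=True)
    # Like %%writefile without -a, this replaces any existing file contents
    script_path.write_text(cell_source)
    # Execute the new file and keep its globals to check it is runnable code
    module_globals = runpy.run_path(str(script_path))

print(module_globals["greet"]("PyTorch"))  # hello PyTorch, machine learning is fun!
```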

Train, evaluate and save the model

Let’s leverage the functions we’ve got above to train, test and save a model to file.

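# Set random seeds
torch.manual_seed(42)
torch.cuda.manual_seed(42)

# Set number of epochs
NUM_EPOCHS = 5

# Recreate an instance of TinyVGG
model_0 = TinyVGG(input_shape=3, # number of color channels (3 for RGB) 
                  hidden_units=10,
                  output_shape=len(class_names)).to(device)

# Setup loss function and optimizer
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(params=model_0.parameters(), lr=0.001)

# Start the timer
from timeit import default_timer as timer 
start_time = timer()

# Train model_0 
model_0_results = train(model=model_0,
                        train_dataloader=train_dataloader,
                        test_dataloader=test_dataloader,
                        optimizer=optimizer,
                        loss_fn=loss_fn,
                        epochs=NUM_EPOCHS,
                        device=device)

# End the timer and print out how long it took
end_time = timer()
print(f"[INFO] Total training time: {end_time-start_time:.3f} seconds")

# Save the model
save_model(model=model_0,
           target_dir="models",
           model_name="05_going_modular_cell_mode_tinyvgg_model.pth")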
Epoch: 1 | train_loss: 1.0956 | train_acc: 0.3867 | test_loss: 1.0630 | test_acc: 0.4133
Epoch: 2 | train_loss: 1.0141 | train_acc: 0.5111 | test_loss: 1.0210 | test_acc: 0.4533
Epoch: 3 | train_loss: 0.9591 | train_acc: 0.5644 | test_loss: 0.9961 | test_acc: 0.4400
Epoch: 4 | train_loss: 0.8994 | train_acc: 0.5778 | test_loss: 0.9986 | test_acc: 0.4533
Epoch: 5 | train_loss: 0.8652 | train_acc: 0.6267 | test_loss: 1.0010 | test_acc: 0.5467
[INFO] Total training time: 5.461 seconds
[INFO] Saving model to: models/05_going_modular_cell_mode_tinyvgg_model.pth
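train() returns the same metrics it prints, so you can inspect the results after training. In this sketch, the dictionary is typed in by hand from the five epochs printed above; in the notebook you would pass model_0_results directly. best_epoch() is a hypothetical helper that reports the 1-indexed epoch with the best value of a chosen metric.

```python
from typing import Dict, List

# Metrics copied from the printed cell-mode run above (rounded to 4 decimals)
results: Dict[str, List[float]] = {
    "train_loss": [1.0956, 1.0141, 0.9591, 0.8994, 0.8652],
    "train_acc": [0.3867, 0.5111, 0.5644, 0.5778, 0.6267],
    "test_loss": [1.0630, 1.0210, 0.9961, 0.9986, 1.0010],
    "test_acc": [0.4133, 0.4533, 0.4400, 0.4533, 0.5467],
}


def best_epoch(results: Dict[str, List[float]],
               metric: str = "test_loss",
               lower_is_better: bool = True) -> int:
    """Return the 1-indexed epoch with the best value of `metric`."""
    values = results[metric]
    pick = min if lower_is_better else max
    # Index of the best value; +1 matches the "Epoch: N" numbering in the logs
    return pick(range(len(values)), key=values.__getitem__) + 1


print(best_epoch(results))                                     # 3
print(best_epoch(results, "test_acc", lower_is_better=False))  # 5
```

In this run, test loss bottoms out at epoch 3 while test accuracy peaks at epoch 5. Which epoch counts as “best” therefore depends on the metric you pick.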

Train, evaluate and save the model (script mode)

Let’s combine all of our modular files into a single script train.py.

This will allow us to run all of the functions we’ve written with a single line of code on the command line:

python going_modular/train.py

Or if we’re running it in a notebook:

!python going_modular/train.py

We’ll go through the following steps:

  1. Import the various dependencies, namely torch, os, torchvision.transforms and all of the scripts from the going_modular directory: data_setup, engine, model_builder and utils.

     • Note: Since train.py will be inside the going_modular directory, we can import the other modules via import ... rather than from going_modular import ....

  2. Set up various hyperparameters such as batch size, number of epochs, learning rate and number of hidden units (these could be set in the future via Python’s argparse).

  3. Set up the training and test directories.

  4. Set up device-agnostic code.

  5. Create the necessary data transforms.

  6. Create the DataLoaders using data_setup.py.

  7. Create the model using model_builder.py.

  8. Set up the loss function and optimizer.

  9. Train the model using engine.py.

  10. Save the model using utils.py.
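The hyperparameter step above mentions that these values could later be set with Python’s argparse. Below is a minimal sketch of that idea. parse_hyperparameters() is a hypothetical function, and its flag names and default values are assumptions for illustration, not options train.py currently accepts.

```python
import argparse
from typing import List, Optional


def parse_hyperparameters(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Hypothetical command-line interface for train.py's hyperparameters."""
    parser = argparse.ArgumentParser(description="Train a TinyVGG model.")
    # Flag names and defaults are illustrative assumptions
    parser.add_argument("--num_epochs", type=int, default=5)
    parser.add_argument("--batch_size", type=int, default=32)
    parser.add_argument("--hidden_units", type=int, default=10)
    parser.add_argument("--learning_rate", type=float, default=0.001)
    # argv=None makes argparse read sys.argv[1:]; passing a list is handy for testing
    return parser.parse_args(argv)


args = parse_hyperparameters(["--num_epochs", "10", "--learning_rate", "0.003"])
print(args.num_epochs, args.batch_size, args.learning_rate)  # 10 32 0.003
```

If train.py called a function like this, a run could look like python going_modular/train.py --num_epochs 10 --learning_rate 0.003. Any flag you leave out falls back to its default.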

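%%writefile going_modular/train.py
"""
Trains a PyTorch image classification model using device-agnostic code.
"""

import os

import torch

from torchvision import transforms

import data_setup, engine, model_builder, utils

# Setup hyperparameters
NUM_EPOCHS = 5
BATCH_SIZE = 32
HIDDEN_UNITS = 10
LEARNING_RATE = 0.001

# Setup directories
train_dir = "data/pizza_steak_sushi/train"
test_dir = "data/pizza_steak_sushi/test"

# Setup target device
device = "cuda" if torch.cuda.is_available() else "cpu"

# Create transforms
data_transform = transforms.Compose([
  transforms.Resize((64, 64)),
  transforms.ToTensor()
])

# Create DataLoaders with help from data_setup.py
train_dataloader, test_dataloader, class_names = data_setup.create_dataloaders(
    train_dir=train_dir,
    test_dir=test_dir,
    transform=data_transform,
    batch_size=BATCH_SIZE
)

# Create model with help from model_builder.py
model = model_builder.TinyVGG(
    input_shape=3,
    hidden_units=HIDDEN_UNITS,
    output_shape=len(class_names)
).to(device)

# Set loss and optimizer
loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(),
                             lr=LEARNING_RATE)

# Start training with help from engine.py
engine.train(model=model,
             train_dataloader=train_dataloader,
             test_dataloader=test_dataloader,
             loss_fn=loss_fn,
             optimizer=optimizer,
             epochs=NUM_EPOCHS,
             device=device)

# Save the model with help from utils.py
utils.save_model(model=model,
                 target_dir="models",
                 model_name="05_going_modular_script_mode_tinyvgg_model.pth")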
Overwriting going_modular/train.py
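The line import data_setup, engine, model_builder, utils works because running a file as a script, as in python going_modular/train.py, puts that script’s own directory at the front of sys.path. The sketch below reproduces this in a temporary folder with two tiny hypothetical modules, helper.py and main.py. It runs main.py in a subprocess with the current interpreter.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    package_dir = Path(tmp) / "going_modular"
    package_dir.mkdir()
    # A hypothetical sibling module, standing in for engine.py or utils.py
    (package_dir / "helper.py").write_text("def message():\n    return 'imported helper'\n")
    # The script imports its sibling with a plain "import helper"
    (package_dir / "main.py").write_text("import helper\nprint(helper.message())\n")
    # Run from the temp folder, like "python going_modular/train.py" from the project root
    completed = subprocess.run(
        [sys.executable, "going_modular/main.py"],
        cwd=tmp,
        capture_output=True,
        text=True,
        check=True,
    )

print(completed.stdout.strip())  # imported helper
```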

Now our final directory structure looks like:


Now to put it all together!

Let’s run our train.py file from the command line with:

!python going_modular/train.py
  0%|                                             | 0/5 [00:00<?, ?it/s]Epoch: 1 | train_loss: 1.1131 | train_acc: 0.2852 | test_loss: 1.1138 | test_acc: 0.2604
 20%|█████████                                    | 1/5 [00:01<00:04,  1.06s/it]Epoch: 2 | train_loss: 1.0851 | train_acc: 0.4102 | test_loss: 1.1238 | test_acc: 0.1979
 40%|██████████████████                           | 2/5 [00:01<00:02,  1.20it/s]Epoch: 3 | train_loss: 1.0837 | train_acc: 0.4141 | test_loss: 1.1459 | test_acc: 0.1979
 60%|███████████████████████████                  | 3/5 [00:02<00:01,  1.33it/s]Epoch: 4 | train_loss: 1.1104 | train_acc: 0.2930 | test_loss: 1.1318 | test_acc: 0.1979
 80%|████████████████████████████████████         | 4/5 [00:03<00:00,  1.40it/s]Epoch: 5 | train_loss: 1.0833 | train_acc: 0.2930 | test_loss: 1.0883 | test_acc: 0.3712
100%|█████████████████████████████████████████████| 5/5 [00:03<00:00,  1.35it/s]
[INFO] Saving model to: models/05_going_modular_script_mode_tinyvgg_model.pth