Finite

173_Time Series Forecasting with LSTMs(2)

elif — Wed, 22 May 2024 22:21:56 +0900

Continuing from the previous post (172_Time Series Forecasting with LSTMs).

def train_model(
  model, 
  train_data, 
  train_labels, 
  test_data=None, 
  test_labels=None
):
  loss_fn = torch.nn.MSELoss(reduction='sum')

  optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
  num_epochs = 60

  train_hist = np.zeros(num_epochs)
  test_hist = np.zeros(num_epochs)

  for t in range(num_epochs):
    model.reset_hidden_state()

    # Use the function arguments rather than the global X_train / y_train
    y_pred = model(train_data)

    loss = loss_fn(y_pred.float(), train_labels)

    if test_data is not None:
      with torch.no_grad():
        y_test_pred = model(test_data)
        test_loss = loss_fn(y_test_pred.float(), test_labels)
      test_hist[t] = test_loss.item()

      if t % 10 == 0:  
        print(f'Epoch {t} train loss: {loss.item()} test loss: {test_loss.item()}')
    elif t % 10 == 0:
      print(f'Epoch {t} train loss: {loss.item()}')

    train_hist[t] = loss.item()
    
    optimiser.zero_grad()

    loss.backward()

    optimiser.step()
  
  return model.eval(), train_hist, test_hist

The code above defines a function that trains the given model. The loss function is MSELoss with reduction='sum', and the optimizer is Adam with a learning rate of 1e-3, run for 60 epochs. Each epoch resets the model's hidden state, computes predictions on the training data, and compares them with the actual values to calculate the loss. If test data is provided, the test loss is also calculated and stored, and both losses are printed every 10 epochs. The training loss is stored, and backpropagation computes the gradients used to update the weights.
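As a small aside on the loss choice, reduction='sum' adds up the squared errors over all samples instead of averaging them, so the reported value grows with the number of samples. The NumPy sketch below reproduces that arithmetic on made-up numbers (the arrays are illustrative and not taken from the dataset).

```python
import numpy as np

# Made-up predictions and targets, only to illustrate the arithmetic.
y_true = np.array([0.0, 0.5, 1.0, 0.5])
y_pred = np.array([0.5, 0.5, 0.5, 0.5])

squared_errors = (y_pred - y_true) ** 2

# What MSELoss(reduction='sum') computes: the total squared error.
sse = squared_errors.sum()
# What the default reduction='mean' computes: the average squared error.
mse = squared_errors.mean()

print(sse, mse)
```

Because the training set contains more sequences than the test set, summed losses are not on the same scale for the two sets; dividing each by its number of samples would make them easier to compare.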

model = CoronaVirusPredictor(
  n_features=1,
  n_hidden=512,
  seq_len=seq_length,
  n_layers=2
)
model, train_hist, test_hist = train_model(
  model,
  X_train,
  y_train,
  X_test,
  y_test
)

plt.plot(train_hist, label="Training loss")
plt.plot(test_hist, label="Test loss")
plt.ylim((0, 0.01))
plt.legend();

The result looks a bit unusual. Normally the training loss ends up lower than the test loss, but here the test loss, however small its value, does not decrease and seems to oscillate. This is probably because the time series has too few data points.

The current model predicts only one step ahead. To predict multiple future values, we can feed each predicted value back in as the input for the next day's prediction and repeat this process.

with torch.no_grad():
  test_seq = X_test[:1]
  preds = []
  for _ in range(len(X_test)):
    y_test_pred = model(test_seq)
    pred = torch.flatten(y_test_pred).item()
    preds.append(pred)
    new_seq = test_seq.numpy().flatten()
    new_seq = np.append(new_seq, [pred])
    new_seq = new_seq[1:]
    test_seq = torch.as_tensor(new_seq).view(1, seq_length, 1).float()


true_cases = scaler.inverse_transform(
    np.expand_dims(y_test.flatten().numpy(), axis=0)
).flatten()

predicted_cases = scaler.inverse_transform(
  np.expand_dims(preds, axis=0)
).flatten()

The code above performs predictions on the test data using the trained model and converts the results back to the original scale. Starting from the first test sequence, each iteration computes the model's prediction for the current sequence, appends the predicted value to the sequence, and drops the first value to keep the sequence length constant. The predictions are then inverse-transformed to the original scale so that predicted and actual values can be compared, which shows how well the model predicts future values of the time series.
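The sliding-window idea can be sketched without the LSTM at all. In the minimal example below, rolling_forecast and extrapolate are hypothetical names introduced only for illustration; extrapolate is a toy stand-in for the trained model, not the model from this post.

```python
import numpy as np

def rolling_forecast(predict_next, seed_window, n_steps):
    """Forecast n_steps values by feeding each prediction back as input.

    predict_next is a hypothetical stand-in for the trained model: any
    callable that maps a 1-D window to the next value.
    """
    window = np.asarray(seed_window, dtype=float)
    preds = []
    for _ in range(n_steps):
        pred = float(predict_next(window))
        preds.append(pred)
        # Drop the oldest value and append the prediction,
        # so the window length stays constant.
        window = np.append(window[1:], pred)
    return preds

# Toy predictor: continue the last observed difference (linear extrapolation).
def extrapolate(window):
    return window[-1] + (window[-1] - window[-2])

forecasts = rolling_forecast(extrapolate, seed_window=[1, 2, 3, 4, 5], n_steps=3)
print(forecasts)  # [6.0, 7.0, 8.0]
```

Since every step after the first depends on earlier predictions, any error in an early prediction is carried into the later ones.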

plt.plot(
  daily_cases.index[:len(train_data)], 
  scaler.inverse_transform(train_data).flatten(),
  label='Historical Daily Cases'
)

plt.plot(
  daily_cases.index[len(train_data):len(train_data) + len(true_cases)], 
  true_cases,
  label='Real Daily Cases'
)

plt.plot(
  daily_cases.index[len(train_data):len(train_data) + len(true_cases)], 
  predicted_cases, 
  label='Predicted Daily Cases'
)

plt.legend();

The performance of the predictions is not good. This is likely because the model has a very high dependency on the first predicted value, and the limited amount of data exacerbates this issue.

scaler = MinMaxScaler()

scaler = scaler.fit(np.expand_dims(daily_cases, axis=1))

all_data = scaler.transform(np.expand_dims(daily_cases, axis=1))

X_all, y_all = create_sequences(all_data, seq_length)

X_all = torch.from_numpy(X_all).float()
y_all = torch.from_numpy(y_all).float()

model = CoronaVirusPredictor(
  n_features=1, 
  n_hidden=512, 
  seq_len=seq_length, 
  n_layers=2
)
model, train_hist, _ = train_model(model, X_all, y_all)

DAYS_TO_PREDICT = 12

with torch.no_grad():
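  # Seed with the most recent seq_length values so that the forecast
  # continues from the end of the data rather than from its first window
  test_seq = torch.from_numpy(all_data[-seq_length:]).float().view(1, seq_length, 1)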
  preds = []
  for _ in range(DAYS_TO_PREDICT):
    y_test_pred = model(test_seq)
    pred = torch.flatten(y_test_pred).item()
    preds.append(pred)
    new_seq = test_seq.numpy().flatten()
    new_seq = np.append(new_seq, [pred])
    new_seq = new_seq[1:]
    test_seq = torch.as_tensor(new_seq).view(1, seq_length, 1).float()

predicted_cases = scaler.inverse_transform(np.expand_dims(preds, axis=0)).flatten()

predicted_index = pd.date_range(
  start=daily_cases.index[-1],
  periods=DAYS_TO_PREDICT + 1,
  # pandas >= 1.4 uses inclusive; older versions used closed='right'
  inclusive='right'
)

predicted_cases = pd.Series(
  data=predicted_cases,
  index=predicted_index
)

plt.plot(daily_cases, label='Historical Daily Cases')
plt.plot(predicted_cases, label='Predicted Daily Cases')
plt.legend();

We have explored an example of predicting future values using time series data. However, forecasting time series data can be quite challenging, and the accuracy may be low. For detailed explanations and code, please refer to the references below.

ref : Venelin Valkov, Get SH*T Done with PyTorch: Solve Real-world Machine Learning Problems with Deep Neural Networks in Python (2020)

172_Time Series Forecasting with LSTMs

elif — Tue, 21 May 2024 23:45:22 +0900

Time series data typically captures a series of data points recorded at consistent intervals. For such data, Long Short-Term Memory (LSTM) models have become a useful and widely adopted approach. LSTM, a type of recurrent neural network (RNN), is effective for processing data sequences. In this blog post, we will explore how to use an LSTM to predict future coronavirus cases based on actual data. The explanations and code in this example follow the book listed in the reference below.

import torch

import os
import numpy as np
import pandas as pd
from tqdm import tqdm
import seaborn as sns
from pylab import rcParams
import matplotlib.pyplot as plt
from matplotlib import rc
from sklearn.preprocessing import MinMaxScaler
from pandas.plotting import register_matplotlib_converters
from torch import nn, optim

The required libraries are listed above; since they were explained in previous posts, they are not described individually here. The data comes from the Johns Hopkins University Center for Systems Science and Engineering and contains daily reported cases by country. Here, we use only the time series of confirmed cases.

# !wget https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv
!gdown --id 1AsfdLrGESCQnRW5rbMz56A1KBc3Fe5aV
df = pd.read_csv('time_series_19-covid-Confirmed.csv')
df.head()

The data includes columns such as province/state, country, latitude, and longitude, but these are not needed. Additionally, since the case counts are cumulative, we will convert them to daily counts.

df = df.iloc[:,4:]
df.head()

daily_cases = df.sum(axis=0)
daily_cases.index = pd.to_datetime(daily_cases.index)
plt.plot(daily_cases)

It can be observed that the data is cumulative.

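# .iloc[0] selects the first value by position (plain [0] on a date-indexed Series is deprecated)
daily_cases = daily_cases.diff().fillna(daily_cases.iloc[0]).astype(np.int64)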
plt.plot(daily_cases)

Using diff, we can easily adjust the data to make it non-cumulative.
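To make the arithmetic concrete, here is a minimal NumPy sketch on made-up cumulative counts. It uses np.diff with prepend=0, which gives the same result as diff() followed by filling the first entry with the first cumulative value.

```python
import numpy as np

# Made-up cumulative counts, for illustration only.
cumulative = np.array([2, 5, 5, 9, 14])

# Difference between consecutive days; prepend=0 keeps the first value as-is,
# mirroring diff().fillna(first_value) in the pandas version.
daily = np.diff(cumulative, prepend=0)

print(daily)  # [2 3 0 4 5]
```

Summing the daily values again with cumsum recovers the original cumulative series, which is a quick way to check the conversion.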

test_data_size = 14
train_data = daily_cases[:-test_data_size]
test_data = daily_cases[-test_data_size:]

scaler = MinMaxScaler()
scaler = scaler.fit(np.expand_dims(train_data,axis = 1))
train_data = scaler.transform(np.expand_dims(train_data, axis = 1))
test_data = scaler.transform(np.expand_dims(test_data, axis = 1))

def create_sequences(data, seq_length):
  xs = []
  ys = []
  for i in range(len(data) - seq_length - 1):
    x = data[i:(i+seq_length)]
    y = data[i+seq_length]
    xs.append(x)
    ys.append(y)
  # Return after the loop has collected every window, not inside it
  return np.array(xs), np.array(ys)

seq_length = 5
X_train, y_train = create_sequences(train_data, seq_length)
X_test, y_test = create_sequences(test_data, seq_length)

X_train = torch.from_numpy(X_train).float()
y_train = torch.from_numpy(y_train).float()

X_test = torch.from_numpy(X_test).float()
y_test = torch.from_numpy(y_test).float()

The code splits the time series data into training and testing datasets, normalizes the data, and creates sequences to prepare it for model training. Finally, it converts the data into PyTorch tensors for use as model input.
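The shapes produced by the scaling and windowing steps are easier to see on a toy series. In the sketch below, make_windows is a hypothetical helper that mirrors the idea of create_sequences, and the manual min-max formula stands in for MinMaxScaler with its default range; the numbers are made up.

```python
import numpy as np

def make_windows(data, seq_length):
    """Slice a (n, 1) array into input windows and next-step targets."""
    xs, ys = [], []
    # Same loop bound as the post's create_sequences.
    for i in range(len(data) - seq_length - 1):
        xs.append(data[i:i + seq_length])
        ys.append(data[i + seq_length])
    return np.array(xs), np.array(ys)

# Toy series 0..9, shaped (10, 1) like the scaled data.
series = np.arange(10, dtype=float).reshape(-1, 1)

# Min-max scaling to [0, 1], the formula MinMaxScaler applies by default.
scaled = (series - series.min()) / (series.max() - series.min())

X, y = make_windows(scaled, seq_length=3)
print(X.shape, y.shape)  # (6, 3, 1) (6, 1)
```

Each row of X holds seq_length consecutive values, and the matching entry of y is the value that immediately follows that window.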

class CoronaVirusPredictor(nn.Module):

  def __init__(self, n_features, n_hidden, seq_len, n_layers=2):
    super(CoronaVirusPredictor, self).__init__()

    self.n_hidden = n_hidden
    self.seq_len = seq_len
    self.n_layers = n_layers

    self.lstm = nn.LSTM(
      input_size=n_features,
      hidden_size=n_hidden,
      num_layers=n_layers,
      dropout=0.5
    )

    self.linear = nn.Linear(in_features=n_hidden, out_features=1)

  def reset_hidden_state(self):
    self.hidden = (
        torch.zeros(self.n_layers, self.seq_len, self.n_hidden),
        torch.zeros(self.n_layers, self.seq_len, self.n_hidden)
    )

  def forward(self, sequences):
    lstm_out, self.hidden = self.lstm(
      sequences.view(len(sequences), self.seq_len, -1),
      self.hidden
    )
    last_time_step = \
      lstm_out.view(self.seq_len, len(sequences), self.n_hidden)[-1]
    y_pred = self.linear(last_time_step)
    return y_pred

The code above defines an LSTM-based neural network model. With nn.LSTM, the recurrent layers can be defined and initialized in a few lines. The reset_hidden_state function initializes the hidden state and cell state as tensors of zeros. The forward function passes the input through the LSTM layer to obtain the output and updates the hidden state. last_time_step holds the output from the last time step of the LSTM, which is passed through the final linear layer to compute the prediction. Continued in the next post.
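As a rough illustration of the tensor shapes involved, the sketch below runs a small nn.LSTM on zeros. The sizes are chosen only for the example and are not the ones used in the post.

```python
import torch
from torch import nn

# Small illustrative sizes, not the ones used in the post.
lstm = nn.LSTM(input_size=1, hidden_size=4, num_layers=2)

# By default (batch_first=False) nn.LSTM expects (seq_len, batch, input_size).
x = torch.zeros(7, 3, 1)

# Zero-initialised hidden and cell states: (num_layers, batch, hidden_size).
h0 = torch.zeros(2, 3, 4)
c0 = torch.zeros(2, 3, 4)

out, (hn, cn) = lstm(x, (h0, c0))

print(out.shape)      # torch.Size([7, 3, 4]): one output per time step
print(hn.shape)       # torch.Size([2, 3, 4]): final hidden state per layer
print(out[-1].shape)  # torch.Size([3, 4]): last time step, ready for a linear layer
```

Indexing the output with [-1] picks the last time step along the first dimension, which is the same move last_time_step makes before the linear layer.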

ref : Venelin Valkov, Get SH*T Done with PyTorch: Solve Real-world Machine Learning Problems with Deep Neural Networks in Python (2020)

171_Image Classification using Torchvision(4)

elif — Mon, 20 May 2024 21:51:43 +0900

Continuing from the previous post (170_Image Classification using Torchvision(3)).

def show_confusion_matrix(confusion_matrix, class_names):
    cm = confusion_matrix.copy()

    cell_counts = cm.flatten()

    cm_row_norm = cm / cm.sum(axis=1)[:, np.newaxis]

    row_percentages = ["{0:.2f}".format(value) for value in cm_row_norm.flatten()]

    cell_labels = [f"{cnt}\n{per}" for cnt, per in zip(cell_counts, row_percentages)]
    cell_labels = np.asarray(cell_labels).reshape(cm.shape[0], cm.shape[1])

    df_cm = pd.DataFrame(cm_row_norm, index=class_names, columns=class_names)

    hmap = sns.heatmap(df_cm, annot=cell_labels, fmt="", cmap="Blues")
    hmap.yaxis.set_ticklabels(hmap.yaxis.get_ticklabels(), rotation=0, ha='right')
    hmap.xaxis.set_ticklabels(hmap.xaxis.get_ticklabels(), rotation=30, ha='right')
    plt.ylabel('True Sign')
    plt.xlabel('Predicted Sign');


cm = confusion_matrix(y_test, y_pred)
show_confusion_matrix(cm, class_names)

The above code defines a function to visualize the confusion matrix and uses it to evaluate the model's prediction performance. It flattens the confusion matrix into a 1D array to get the values of each cell, then normalizes each row by dividing by the sum of that row to convert the counts to proportions. It formats each value to two decimal places, converts it to a 1D array, combines the actual count and proportion for each cell to create labels, and reshapes this label array back to the original shape of the confusion matrix. The normalized confusion matrix is converted to a Pandas DataFrame with class names set for rows and columns, and a heatmap is generated using Seaborn.
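The row normalization and label construction can be tried on a tiny matrix. The 2x2 counts below are made up for illustration.

```python
import numpy as np

# Made-up 2x2 confusion matrix: rows are true classes, columns are predictions.
cm = np.array([[8, 2],
               [1, 9]])

# Divide each row by its sum so every row adds up to 1.
cm_row_norm = cm / cm.sum(axis=1)[:, np.newaxis]

# Combine raw count and proportion into one label per cell.
labels = np.array([
    f"{count}\n{share:.2f}"
    for count, share in zip(cm.flatten(), cm_row_norm.flatten())
]).reshape(cm.shape)

print(cm_row_norm)
print(labels[0, 0])
```

After normalization, the diagonal of each row reads as the share of that true class that was predicted correctly.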

Now, we'll see how to classify new data that is not included in the dataset. The example image to be used is as follows.

def predict_proba(model, image_path):
  img = Image.open(image_path)
  img = img.convert('RGB')
  img = transforms['test'](img).unsqueeze(0)

  pred = model(img.to(device))
  pred = F.softmax(pred, dim=1)
  return pred.detach().cpu().numpy().flatten()


pred = predict_proba(base_model, 'stop-sign.jpg')
pred

Plotted for easier understanding, it looks as follows.

def show_prediction_confidence(prediction, class_names):
    pred_df = pd.DataFrame({
        'class_names': class_names,
        'values': prediction
    })
    sns.barplot(x='values', y='class_names', data=pred_df, orient='h')
    plt.xlim([0, 1]);

show_prediction_confidence(pred, class_names)

The model correctly predicted the stop sign. When performing predictions on new data not included in the dataset, the model showed good performance. Of course, more examples should be examined, but for the blog post, this should suffice. Detailed information and code can be found in the references below.

ref : Venelin Valkov, Get SH*T Done with PyTorch: Solve Real-world Machine Learning Problems with Deep Neural Networks in Python (2020)

170_Image Classification using Torchvision(3)

elif — Sun, 19 May 2024 23:06:49 +0900

Continuing from the previous post (169_Image Classification using Torchvision(2)).

%%time

base_model, history = train_model(base_model, data_loaders, dataset_sizes, device)

To make the results easier to interpret, add a visualization function.

def plot_training_history(history):
  fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(18, 6))

  ax1.plot(history['train_loss'], label='train loss')
  ax1.plot(history['val_loss'], label='validation loss')

  ax1.xaxis.set_major_locator(MaxNLocator(integer=True))
  ax1.set_ylim([-0.05, 1.05])
  ax1.legend()
  ax1.set_ylabel('Loss')
  ax1.set_xlabel('Epoch')

  ax2.plot(history['train_acc'], label='train accuracy')
  ax2.plot(history['val_acc'], label='validation accuracy')

  ax2.xaxis.set_major_locator(MaxNLocator(integer=True))
  ax2.set_ylim([-0.05, 1.05])
  ax2.legend()

  ax2.set_ylabel('Accuracy')
  ax2.set_xlabel('Epoch')

  fig.suptitle('Training history')
  
plot_training_history(history)

The original code is written as above, but running it produced the following error.

can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

The error occurred because train_acc and val_acc in history are stored as CUDA tensors. Therefore, I modified the code to move them to the CPU before plotting.

def to_numpy(tensor):
    if isinstance(tensor, torch.Tensor):
        return tensor.cpu().numpy()
    return tensor

def plot_training_history(history):
    train_loss = [to_numpy(x) for x in history['train_loss']]
    val_loss = [to_numpy(x) for x in history['val_loss']]
    train_acc = [to_numpy(x) for x in history['train_acc']]
    val_acc = [to_numpy(x) for x in history['val_acc']]
    
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(18, 6))

    ax1.plot(train_loss, label='train loss')
    ax1.plot(val_loss, label='validation loss')

    ax1.xaxis.set_major_locator(MaxNLocator(integer=True))
    ax1.set_ylim([-0.05, 1.05])
    ax1.legend()
    ax1.set_ylabel('Loss')
    ax1.set_xlabel('Epoch')

    ax2.plot(train_acc, label='train accuracy')
    ax2.plot(val_acc, label='validation accuracy')

    ax2.xaxis.set_major_locator(MaxNLocator(integer=True))
    ax2.set_ylim([-0.05, 1.05])
    ax2.legend()

    ax2.set_ylabel('Accuracy')
    ax2.set_xlabel('Epoch')

    fig.suptitle('Training history')

plot_training_history(history)

The pre-trained model showed high accuracy and low loss even after only 3 epochs.

def show_predictions(model, class_names, n_images=6):
  model = model.eval()
  images_handeled = 0
  plt.figure()

  with torch.no_grad():
    for i, (inputs, labels) in enumerate(data_loaders['test']):
      inputs = inputs.to(device)
      labels = labels.to(device)

      outputs = model(inputs)
      _, preds = torch.max(outputs, 1)

      for j in range(inputs.shape[0]):
        images_handeled += 1
        ax = plt.subplot(2, n_images//2, images_handeled)
        ax.set_title(f'predicted: {class_names[preds[j]]}')
        imshow(inputs.cpu().data[j])
        ax.axis('off')

        if images_handeled == n_images:
          return
      
show_predictions(base_model, class_names, n_images=8)

The above code defines a function that uses the given model to perform predictions on the test data and visualizes the predicted results. In the first for loop, it iterates through the data in batches using the test data loader, selecting the class index with the highest value in the output as the predicted value. In the second for loop, it iterates through each image within the batch and continues until the specified number of visualized images is reached.
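Picking the class with the highest output value can be illustrated with NumPy's argmax, which returns the same indices as the second value of torch.max(outputs, 1). The scores below are made up.

```python
import numpy as np

class_names = ['priority_road', 'give_way', 'stop', 'no_entry']

# Made-up model outputs (logits) for a batch of two images.
outputs = np.array([
    [0.1, 0.3, 2.5, -1.0],
    [1.7, 0.2, 0.0, 0.4],
])

# Index of the largest score in each row.
preds = outputs.argmax(axis=1)

print([class_names[i] for i in preds])  # ['stop', 'priority_road']
```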

It can be observed that the model predicts correctly even for very blurry or partially obscured images.

def get_predictions(model, data_loader):
  model = model.eval()
  predictions = []
  real_values = []
  with torch.no_grad():
    for inputs, labels in data_loader:
      inputs = inputs.to(device)
      labels = labels.to(device)

      outputs = model(inputs)
      _, preds = torch.max(outputs, 1)
      predictions.extend(preds)
      real_values.extend(labels)
  predictions = torch.as_tensor(predictions).cpu()
  real_values = torch.as_tensor(real_values).cpu()
  return predictions, real_values

y_pred, y_test = get_predictions(base_model, data_loaders['test'])
print(classification_report(y_test, y_pred, target_names=class_names))

The above code defines a function that runs the given model over a data loader and returns the predicted and actual values. It then uses this function to make predictions on the test data and prints a report evaluating the model's performance. The report shows that the model is surprisingly accurate. Continued in the next post.

ref : Venelin Valkov, Get SH*T Done with PyTorch: Solve Real-world Machine Learning Problems with Deep Neural Networks in Python (2020)

169_Image Classification using Torchvision(2)

elif — Sat, 18 May 2024 22:48:31 +0900

Continuing from the previous post (168_Image Classification using Torchvision).

image_datasets = {
  d: ImageFolder(f'{DATA_DIR}/{d}', transforms[d]) for d in DATASETS
}

data_loaders = {
  d: DataLoader(image_datasets[d], batch_size=4, shuffle=True, num_workers=4) 
  for d in DATASETS
}

Here, to facilitate easier training, PyTorch datasets and data loaders are created for each image dataset folder. Data loaders provide data in batches. The image_datasets dictionary loads the image data for each dataset and stores an ImageFolder object with the corresponding transformations applied. The data_loaders dictionary stores a DataLoader object for each dataset, which loads data in batches. This setup efficiently provides data during model training and evaluation.
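What "providing data in batches" means can be shown with a framework-free sketch. iterate_batches is a hypothetical helper written for illustration; it is not how DataLoader is implemented, and the real DataLoader additionally shuffles, stacks samples into tensors, and can load with worker processes.

```python
def iterate_batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Ten made-up sample ids split into batches of four.
batches = list(iterate_batches(list(range(10)), batch_size=4))
print(batches)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

The last batch is smaller when the dataset size is not a multiple of the batch size, which is also the default behaviour with DataLoader.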

def imshow(inp, title=None):
  inp = inp.numpy().transpose((1, 2, 0))
  mean = np.array([mean_nums])
  std = np.array([std_nums])
  inp = std * inp + mean
  inp = np.clip(inp, 0, 1)
  plt.imshow(inp)
  if title is not None:
    plt.title(title)
  plt.axis('off')

inputs, classes = next(iter(data_loaders['train']))
out = torchvision.utils.make_grid(inputs)

imshow(out, title=[class_names[x] for x in classes])

The code above displays examples of images with the transformations applied. To display the images correctly, the applied normalization needs to be reversed, and the color channels need to be moved to the last dimension. The imshow function takes the image tensor (inp) and a title (title) as input. It converts the PyTorch tensor to a numpy array and reorders the dimensions from (channel, height, width) to (height, width, channel). To de-normalize, it creates numpy arrays for the mean and standard deviation values and uses them to convert the normalized image back to its original values.
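A quick round trip shows that std * x + mean undoes the normalization. The image below is random data generated only for the check; the mean and std values are the ones used in these posts.

```python
import numpy as np

mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])

# Made-up image in channel-first layout (C, H, W) with values in [0, 1].
rng = np.random.default_rng(0)
image = rng.random((3, 4, 4))

# Per-channel normalization: (x - mean) / std.
normalized = (image - mean[:, None, None]) / std[:, None, None]

# Reverse it the way imshow does: move channels last, then std * x + mean.
restored = normalized.transpose((1, 2, 0)) * std + mean
restored = np.clip(restored, 0, 1)

print(np.allclose(restored, image.transpose((1, 2, 0))))  # True
```

Moving the channels to the last axis first lets the three-element mean and std arrays broadcast over the image without any reshaping.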

def create_model(n_classes):
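  # torchvision >= 0.13 selects weights explicitly; older versions used pretrained=True
  model = models.resnet34(weights=models.ResNet34_Weights.IMAGENET1K_V1)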

  n_features = model.fc.in_features
  model.fc = nn.Linear(n_features, n_classes)

  return model.to(device)

base_model = create_model(len(class_names))

Instead of building a model from scratch, we use the ResNet architecture together with weights pre-trained on the ImageNet dataset. The create_model function takes the number of classes as input and loads a pre-trained ResNet-34 model. It reads the number of input features of the last fully connected layer via model.fc.in_features and replaces that layer with a new nn.Linear whose output size matches the number of classes to be classified.

def train_epoch(
  model, 
  data_loader, 
  loss_fn, 
  optimizer, 
  device, 
  scheduler, 
  n_examples
):
  model = model.train()

  losses = []
  correct_predictions = 0
  
  for inputs, labels in data_loader:
    inputs = inputs.to(device)
    labels = labels.to(device)

    outputs = model(inputs)

    _, preds = torch.max(outputs, dim=1)
    loss = loss_fn(outputs, labels)

    correct_predictions += torch.sum(preds == labels)
    losses.append(loss.item())

    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

  scheduler.step()

  return correct_predictions.double() / n_examples, np.mean(losses)

train_epoch is a function that trains the given model for one epoch. It sets the model to training mode and initializes a list to store the loss values for each batch and a variable to count the number of correctly predicted samples. Then, it iterates over the data loader in batches. The input data is passed to the model to compute the output values, and the class index with the highest value is selected as the predicted value. The output values are compared with the actual labels to compute the loss, the number of correctly predicted samples is accumulated, and the current batch's loss value is added to the list. The loss is then backpropagated to compute the gradients, and the weights are updated using the optimization algorithm, followed by gradient reset. Finally, the learning rate scheduler is updated to adjust the learning rate, and the function returns the accuracy and average loss values.

def eval_model(model, data_loader, loss_fn, device, n_examples):
  model = model.eval()

  losses = []
  correct_predictions = 0

  with torch.no_grad():
    for inputs, labels in data_loader:
      inputs = inputs.to(device)
      labels = labels.to(device)

      outputs = model(inputs)

      _, preds = torch.max(outputs, dim=1)

      loss = loss_fn(outputs, labels)

      correct_predictions += torch.sum(preds == labels)
      losses.append(loss.item())

  return correct_predictions.double() / n_examples, np.mean(losses)

The eval_model function evaluates the given model. It switches the model to evaluation mode and iterates over the data loader in batches to perform model predictions, calculating loss and accuracy. torch.no_grad() is used to disable gradient calculation during evaluation, which reduces memory usage and speeds up the evaluation process.

def train_model(model, data_loaders, dataset_sizes, device, n_epochs=3):
  optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
  scheduler = lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.1)
  loss_fn = nn.CrossEntropyLoss().to(device)

  history = defaultdict(list)
  best_accuracy = 0

  for epoch in range(n_epochs):

    print(f'Epoch {epoch + 1}/{n_epochs}')
    print('-' * 10)

    train_acc, train_loss = train_epoch(
      model,
      data_loaders['train'],    
      loss_fn, 
      optimizer, 
      device, 
      scheduler, 
      dataset_sizes['train']
    )

    print(f'Train loss {train_loss} accuracy {train_acc}')

    val_acc, val_loss = eval_model(
      model,
      data_loaders['val'],
      loss_fn,
      device,
      dataset_sizes['val']
    )

    print(f'Val   loss {val_loss} accuracy {val_acc}')
    print()

    history['train_acc'].append(train_acc)
    history['train_loss'].append(train_loss)
    history['val_acc'].append(val_acc)
    history['val_loss'].append(val_loss)

    if val_acc > best_accuracy:
      torch.save(model.state_dict(), 'best_model_state.bin')
      best_accuracy = val_acc

  print(f'Best val accuracy: {best_accuracy}')
  
  model.load_state_dict(torch.load('best_model_state.bin'))

  return model, history

The train_model function trains the model for a specified number of epochs and evaluates the training and validation performance at each epoch. It saves the model with the best performance and ultimately returns the best-performing model and the training history. Here, the optimizer uses the Stochastic Gradient Descent (SGD) optimization algorithm, and the loss function is CrossEntropyLoss. In the loop, the model is trained and evaluated for each epoch, and the performance is printed. The accuracy and loss are added to history, and if the current validation accuracy is higher than the best validation accuracy so far, the model weights are saved to the file best_model_state.bin. Finally, the highest validation accuracy is printed, and the saved model state is loaded to retrieve the best model. Continued in the next post.
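The effect of the step scheduler can be worked out by hand. The sketch below uses the closed-form step decay (learning rate multiplied by gamma once every step_size epochs) with the values from train_model; step_lr is a hypothetical helper for illustration, not a PyTorch function.

```python
def step_lr(initial_lr, epoch, step_size, gamma):
    """Learning rate after `epoch` completed epochs under a step decay."""
    return initial_lr * gamma ** (epoch // step_size)

# Values from the post: lr=0.001, step_size=7, gamma=0.1.
schedule = [step_lr(0.001, epoch, step_size=7, gamma=0.1) for epoch in range(15)]

print(schedule[0], schedule[7], schedule[14])
```

With n_epochs=3 and step_size=7, the first decay would only happen after the seventh epoch, so the learning rate stays at 0.001 for the whole run shown here.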

ref : Venelin Valkov, Get SH*T Done with PyTorch: Solve Real-world Machine Learning Problems with Deep Neural Networks in Python (2020)

168_Image Classification using Torchvision

elif — Fri, 17 May 2024 23:53:45 +0900

I found an example of image classification using transfer learning, and I plan to follow the code to study it. The reference book is mentioned at the end of this post. The explanations are detailed, making it a good resource for studying. First, the required libraries are as follows.

import torch, torchvision

from pathlib import Path
import numpy as np
import cv2
import pandas as pd
from tqdm import tqdm
import PIL.Image as Image
import seaborn as sns
from pylab import rcParams
import matplotlib.pyplot as plt
from matplotlib import rc
from matplotlib.ticker import MaxNLocator
from torch.optim import lr_scheduler
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report
from glob import glob
import shutil
from collections import defaultdict

from torch import nn, optim

import torch.nn.functional as F
import torchvision.transforms as T
from torchvision.datasets import ImageFolder
from torch.utils.data import DataLoader
from torchvision import models

Among these imports, the less familiar ones are: pathlib.Path for easy file path and file system operations, cv2 for image and video processing, the Image module of the Python Imaging Library (PIL) to open, convert, and save images, seaborn for data visualization and graph generation, pylab.rcParams to adjust basic settings and styles of graphs, MaxNLocator from Matplotlib to automatically adjust tick positions for better readability, lr_scheduler in PyTorch to dynamically adjust the learning rate, glob to find file paths matching specific patterns, shutil for file and directory operations, collections.defaultdict to create dictionaries that supply a default value when a key is missing, torchvision.datasets.ImageFolder to easily load image datasets organized in a directory structure, and torchvision.models, which includes various deep learning model architectures, including pre-trained models.

The dataset consists of over 50000 annotated images of more than 40 traffic signs and can be downloaded from https://benchmark.ini.rub.de/?section=gtsrb&subsection=dataset.

train_folders = sorted(glob('GTSRB/Final_Training/Images/*'))

Use sorted and glob to store the file paths of the downloaded images.

def load_image(img_path, resize=True):
  img = cv2.cvtColor(cv2.imread(img_path), cv2.COLOR_BGR2RGB)

  if resize:
    img = cv2.resize(img, (64, 64), interpolation = cv2.INTER_AREA)

  return img

def show_image(img_path):
  img = load_image(img_path)
  plt.imshow(img)
  plt.axis('off')

def show_sign_grid(image_paths):
  images = [load_image(img) for img in image_paths]
  # Stack into a single array first; converting a list of arrays directly is slow
  images = torch.as_tensor(np.array(images))
  images = images.permute(0, 3, 1, 2)
  grid_img = torchvision.utils.make_grid(images, nrow=11)
  plt.figure(figsize=(24, 12))
  plt.imshow(grid_img.permute(1, 2, 0))
  plt.axis('off');

The load_image function takes the path of an image as input and returns the loaded image as an RGB array. The resize parameter determines whether to resize the image after loading it. cv2.imread reads the image from the specified path, and cv2.cvtColor converts the image from BGR format to RGB format. If resize is True, the image is resized to 64x64.

The show_image function simply loads and displays an image.

The show_sign_grid function uses a for loop to load each image file from the image paths into a list, converts this list to a tensor, and changes its dimensions using permute (batch size, channel, height, width). Then, it arranges the images into a grid with 11 images per row using make_grid and displays the images.
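The dimension reordering is easy to check on array shapes alone. The NumPy sketch below uses transpose, which takes the same axis order as permute on a tensor; the batch is made-up zeros.

```python
import numpy as np

# Made-up batch of 5 RGB images, 64x64, in channel-last layout (N, H, W, C).
images = np.zeros((5, 64, 64, 3), dtype=np.uint8)

# Same axis order as images.permute(0, 3, 1, 2) on a tensor: (N, C, H, W).
channel_first = images.transpose(0, 3, 1, 2)
print(channel_first.shape)  # (5, 3, 64, 64)

# A single image goes back to (H, W, C) for plt.imshow,
# like grid_img.permute(1, 2, 0).
channel_last = channel_first[0].transpose(1, 2, 0)
print(channel_last.shape)  # (64, 64, 3)
```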

The images can be visualized as follows.

sample_images = [np.random.choice(glob(f'{tf}/*ppm')) for tf in train_folders]
show_sign_grid(sample_images)

class_names = ['priority_road', 'give_way', 'stop', 'no_entry']
class_indices = [12, 13, 14, 17]

!rm -rf data

DATA_DIR = Path('data')

DATASETS = ['train', 'val', 'test']

for ds in DATASETS:
  for cls in class_names:
    (DATA_DIR / ds / cls).mkdir(parents=True, exist_ok=True)
    
for i, cls_index in enumerate(class_indices):
  image_paths = np.array(glob(f'{train_folders[cls_index]}/*.ppm'))
  class_name = class_names[i]
  print(f'{class_name}: {len(image_paths)}')
  np.random.shuffle(image_paths)

  ds_split = np.split(
    image_paths, 
    indices_or_sections=[int(.8*len(image_paths)), int(.9*len(image_paths))]
  )

  dataset_data = zip(DATASETS, ds_split)

  for ds, images in dataset_data:
    for img_path in images:
      shutil.copy(img_path, f'{DATA_DIR}/{ds}/{class_name}/')

Generate lists of class names and indices using class_names and class_indices. Set the base directory for saving the data, and divide the dataset into training, validation, and test sets. Then, create directories for each dataset type and each class using a for loop. After the loop, each dataset type will have a folder for each class.

In the final for loop, iterate over each class index to get the paths of .ppm image files in the specified directory, and retrieve the current class name from class_names. Split the image file paths into 80% for training, 10% for validation, and 10% for testing. Use zip to match dataset types with the split image file paths, and iterate through each dataset type and corresponding image file paths, copying them to the appropriate dataset directory. This code prepares the dataset for model training and evaluation.
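The 80/10/10 split with np.split can be tried on a small list. The file names below are made up and stand in for the real image paths.

```python
import numpy as np

# Twenty made-up file names standing in for image paths.
paths = np.array([f'img_{i:02d}.ppm' for i in range(20)])

rng = np.random.default_rng(42)
rng.shuffle(paths)

# Cut points at 80% and 90% give the train / val / test parts.
n = len(paths)
train, val, test = np.split(paths, [int(.8 * n), int(.9 * n)])

print(len(train), len(val), len(test))  # 16 2 2
```

Passing a list of cut points to np.split, rather than a number of sections, is what allows the three parts to have unequal sizes.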

mean_nums = [0.485, 0.456, 0.406]
std_nums = [0.229, 0.224, 0.225]

transforms = {'train': T.Compose([
  T.RandomResizedCrop(size=256),
  T.RandomRotation(degrees=15),
  T.RandomHorizontalFlip(),
  T.ToTensor(),
  T.Normalize(mean_nums, std_nums)
]), 'val': T.Compose([
  T.Resize(size=256),
  T.CenterCrop(size=224),
  T.ToTensor(),
  T.Normalize(mean_nums, std_nums)
]), 'test': T.Compose([
  T.Resize(size=256),
  T.CenterCrop(size=224),
  T.ToTensor(),
  T.Normalize(mean_nums, std_nums)
]),
}

The code uses torchvision.transforms for image preprocessing and data augmentation. First, mean_nums and std_nums represent the mean and standard deviation values for each image channel to perform normalization. The transforms dictionary defines the transformations to be applied to each dataset. The purpose of Compose is to apply multiple transformations sequentially.

RandomResizedCrop randomly crops the image and resizes it to 256x256, RandomRotation randomly rotates the image between -15 and 15 degrees, and RandomHorizontalFlip randomly flips the image horizontally. After augmentation, the image is converted to a PyTorch tensor and normalized using the given mean and standard deviation for each channel. Continued in the next post.

ref: Venelin Valkov, Get SH*T Done with PyTorch: Solve Real-world Machine Learning Problems with Deep Neural Networks in Python (2020)

167_Simple Regression Model using PyTorch(2)

elif — Thu, 16 May 2024 22:42:17 +0900

Continuing from the previous post, I will write the code and explain it.

def calculate_accuracy(y_true, y_pred):
    predicted = y_pred.ge(.5).view(-1)
    return (y_true == predicted).sum().float() / len(y_true)

def round_tensor(t, decimal_places = 3):
    return round(t.item(), decimal_places)

The calculate_accuracy function calculates and returns the model's prediction accuracy by determining if each element in y_pred is greater than or equal to 0.5 and converting these elements to True or False. The ge function checks if each element is greater than or equal to 0.5. The view(-1) function changes the tensor to a 1-dimensional tensor. The number of True values is then counted, converted to a float, and divided by the total number of values to calculate accuracy. The round_tensor function takes a tensor t to be rounded and sets the number of decimal places according to decimal_places. It uses t.item() to convert the tensor t's value to a Python number and rounds it. These two functions calculate the model's prediction accuracy and round the calculated value to the specified number of decimal places to present the results concisely.
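The thresholding logic can also be sketched without tensors. The function below is a plain-Python stand-in, not the code from the book, and the sample labels and probabilities are made up for illustration.

```python
def calculate_accuracy_plain(y_true, y_prob, threshold=0.5):
    """Fraction of labels matched after thresholding the probabilities."""
    # Same role as y_pred.ge(.5): True where the probability is >= threshold.
    predicted = [p >= threshold for p in y_prob]
    correct = sum(bool(t) == p for t, p in zip(y_true, predicted))
    return correct / len(y_true)

y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.5]   # 0.5 counts as a positive prediction
print(round(calculate_accuracy_plain(y_true, y_prob), 3))
```

Note that a probability of exactly 0.5 is treated as a positive prediction, because ge means "greater than or equal to".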

for epoch in range(1000):
    y_pred = net(X_train)
    y_pred = torch.squeeze(y_pred)
    train_loss = criterion(y_pred, y_train)
    
    if epoch % 100 == 0:
        train_acc = calculate_accuracy(y_train, y_pred)
        y_test_pred = net(X_test)
        y_test_pred = torch.squeeze(y_test_pred)
        test_loss = criterion(y_test_pred, y_test)
        test_acc = calculate_accuracy(y_test, y_test_pred)
        print(f'''epoch {epoch} Train set - loss: {round_tensor(train_loss)}, accuracy: {round_tensor(train_acc)}
        Test  set - loss: {round_tensor(test_loss)}, accuracy: {round_tensor(test_acc)}
        ''')
        
    # Update the weights on every epoch, not only when logging.
    optimizer.zero_grad()
    train_loss.backward()
    optimizer.step()

The training loop runs for 1000 epochs, using X_train as input to the network to calculate predictions. The squeeze function is used here to convert the output to a 1-dimensional tensor. The criterion compares the predicted values y_pred with the actual values y_train to calculate the loss. Every 100 epochs, the loss and accuracy on both the training and test sets are calculated and printed. train_acc is the accuracy on the training data, while y_test_pred holds the model's predictions for X_test and test_acc is the accuracy on the test data. optimizer.zero_grad() resets the gradients from the previous step, train_loss.backward() calculates the gradients of the loss function through backpropagation, and optimizer.step() updates the model parameters using the calculated gradients.
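For reference, the criterion in this series is nn.BCELoss, which with its default settings averages the binary cross-entropy over all examples. The sketch below writes out that formula in plain Python. The function name is made up, and it assumes every probability lies strictly between 0 and 1.

```python
import math

def binary_cross_entropy(y_true, y_prob):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all examples."""
    losses = [
        -(t * math.log(p) + (1 - t) * math.log(1 - p))
        for t, p in zip(y_true, y_prob)
    ]
    return sum(losses) / len(losses)

# Uninformative predictions of 0.5 cost log(2) per example.
unsure = binary_cross_entropy([1, 0], [0.5, 0.5])
# Confident and correct predictions give a much smaller loss.
confident = binary_cross_entropy([1, 0], [0.95, 0.05])
print(round(unsure, 4), round(confident, 4))
```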

This process allows the model to be trained iteratively while monitoring its performance.
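The roles of the three optimizer calls can be illustrated with a dependency-free sketch. It fits a single weight with plain gradient descent, which is simpler than the Adam optimizer used in the post, and the data points and learning rate are made up for illustration.

```python
# Fit w in y = w * x with plain gradient descent on illustrative data.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
w, lr = 0.0, 0.05

for epoch in range(200):
    grad = 0.0                                 # plays the role of optimizer.zero_grad()
    for x, y in zip(xs, ys):                   # plays the role of loss.backward()
        grad += 2 * (w * x - y) * x / len(xs)  # gradient of the mean squared error
    w -= lr * grad                             # plays the role of optimizer.step()

    if epoch % 50 == 0:                        # monitoring only, no effect on training
        loss = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
        print(f"epoch {epoch} loss {loss:.4f}")

print(round(w, 4))
```

The update runs on every pass through the loop, while the logging branch only reads the current state.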

classes = ['No rain', 'Raining']
y_pred = net(X_test)
y_pred = y_pred.ge(.5).view(-1).cpu()
y_test = y_test.cpu()
print(classification_report(y_test,y_pred,target_names=classes))

Set the names of the classified classes, here defined as 'No rain' and 'Raining'. Then, move the actual values and predicted values to the CPU and use classification_report to print the classification report.

In the case of 'No rain', the model works well, while for the 'Raining' class, the model does not perform well and produces unreliable predictions.

cm = confusion_matrix(y_test, y_pred)
df_cm = pd.DataFrame(cm, index=classes, columns=classes)
hmap = sns.heatmap(df_cm, annot=True, fmt = 'd')
hmap.yaxis.set_ticklabels(hmap.yaxis.get_ticklabels(), rotation=0, ha = 'right')
hmap.xaxis.set_ticklabels(hmap.xaxis.get_ticklabels(), rotation=30, ha = 'right')
plt.ylabel('True label')
plt.xlabel('Predicted label')

Use the confusion_matrix function to calculate the matrix comparing actual values and predicted values. Then, convert it into a DataFrame for easier visualization and use the heatmap function to visualize the confusion matrix as a heatmap. This allows for an intuitive understanding of the model's performance.
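To show what the matrix contains, here is a hand-rolled sketch that counts the same cells in plain Python. It follows scikit-learn's convention of true labels on the rows and predicted labels on the columns. The function name and the sample labels are made up for illustration.

```python
def confusion_counts(y_true, y_pred, n_classes=2):
    """Return a matrix whose rows are true labels and columns are predictions."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

y_true = [0, 0, 0, 1, 1, 0]
y_pred = [0, 0, 1, 1, 0, 0]
cm = confusion_counts(y_true, y_pred)
print(cm)
```

The diagonal holds the correct predictions, so accuracy is the diagonal sum divided by the total count.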

ref: Venelin Valkov, Get SH*T Done with PyTorch: Solve Real-world Machine Learning Problems with Deep Neural Networks in Python (2020)

166_Simple Regression Model using PyTorch

elif — Wed, 15 May 2024 19:47:52 +0900

Today I read a book that lends itself to learning by following along. It contains detailed code, so I plan to work through that code and analyze it. So far I have mostly dealt with classification problems, but now I am going to implement a simple regression model using PyTorch. The required libraries are as follows.

import torch
import numpy as np
import pandas as pd
import seaborn as sns
from pylab import rcParams
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report
from torch import nn, optim
import torch.nn.functional as F

Seaborn is a Python library convenient for visualizing statistical data, pylab is used alongside Matplotlib to easily adjust the style and settings of graphs, and the metrics module from sklearn provides various indicators for evaluating model performance. Here, I'll use it to generate a confusion matrix and a classification report to evaluate the model's performance. The required data is "Rain in Australia", and the .csv file can be downloaded from https://www.kaggle.com/datasets/jsphyg/weather-dataset-rattle-package.

df = pd.read_csv('weatherAUS.csv')
cols = ['Rainfall', 'Humidity3pm', 'Pressure9am', 'RainToday', 'RainTomorrow']
df = df[cols].copy()
# Map the Yes/No strings to 1/0; missing values stay NaN and are dropped below.
df['RainToday'] = df['RainToday'].map({'No': 0, 'Yes': 1})
df['RainTomorrow'] = df['RainTomorrow'].map({'No': 0, 'Yes': 1})
df = df.dropna(how='any')

Load the .csv file into a DataFrame. Then, store the necessary column names in a list and select only those columns from the DataFrame to reconstruct it. Use the map function to convert the values in the 'RainToday' and 'RainTomorrow' columns from 'No' to 0 and 'Yes' to 1, assigning the result back to each column. Assigning back is used instead of calling replace with inplace=True on a selected column, because recent pandas versions may not propagate such a chained in-place change to the DataFrame. Use the dropna function to remove all rows with missing values.

RANDOM_SEED = 42  # not defined in the post; any fixed integer makes the split reproducible
X = df[['Rainfall','Humidity3pm','RainToday','Pressure9am']]
y = df[['RainTomorrow']]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state=RANDOM_SEED)
X_train = torch.from_numpy(X_train.to_numpy()).float()
y_train = torch.squeeze(torch.from_numpy(y_train.to_numpy()).float())
X_test = torch.from_numpy(X_test.to_numpy()).float()
y_test = torch.squeeze(torch.from_numpy(y_test.to_numpy()).float())

Next, store the independent variables from the DataFrame in 'X' and the dependent variable in 'y'. Use train_test_split to divide the data into training and testing sets, with 20% of the total data used as the test set. Also, fix the random seed value to make the results reproducible. Then, convert the data to PyTorch tensors by first converting the Pandas DataFrame to a numpy array and then converting it to a float-type PyTorch tensor. The purpose of using squeeze is to reduce the dimensions of the tensor, changing it from a 2-dimensional tensor to a 1-dimensional tensor.
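The effect of squeeze on the label column can be sketched with NumPy, whose squeeze behaves analogously to torch.squeeze for this case. The three values are made up for illustration.

```python
import numpy as np

# Selecting a single column with double brackets keeps a trailing axis of size 1.
column = np.array([[0.0], [1.0], [0.0]])
print(column.shape)

# squeeze drops every axis of length 1, leaving a flat vector of labels.
flat = np.squeeze(column)
print(flat.shape)
```

A flat target vector lines up with the squeezed model output when the loss is computed.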

class NET(nn.Module):
    def __init__(self, n_feature):
        super(NET, self).__init__()
        self.fc1 = nn.Linear(n_feature, 5)
        self.fc2 = nn.Linear(5,3)
        self.fc3 = nn.Linear(3,1)
    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return torch.sigmoid(self.fc3(x))

net = NET(X_train.shape[1])
criterion = nn.BCELoss()
optimizer = optim.Adam(net.parameters(), lr=0.001)
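# device is not defined in the post; this assumes the usual "CUDA if available" setup.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
X_train = X_train.to(device)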
y_train = y_train.to(device)
X_test = X_test.to(device)
y_test = y_test.to(device)
net = net.to(device)
criterion = criterion.to(device)

Next, define the neural network. It is a simple fully connected network with three linear layers, using ReLU for the hidden layers and Sigmoid for the output. The loss function used here is Binary Cross Entropy Loss (BCELoss), and the optimization algorithm is Adam. Then, move both the data and the model to the chosen device (the GPU, if one is available) for computation. Continued in the next post.

165_Debugging with Learning and Validation Curves

elif — Tue, 14 May 2024 16:29:01 +0900

If a model is too complex relative to the given training dataset, it tends to overfit the training data and may not generalize well to unseen data. To reduce the degree of overfitting, collecting more training examples can be beneficial. However, in many real-world scenarios, collecting more data is often very challenging. By plotting the model's training and validation accuracy as a function of the training dataset size, it becomes easier to detect whether the model is experiencing high variance or high bias issues and to determine if collecting more data could help resolve these problems.

First, here is how to use scikit-learn's 'learning_curve' function to evaluate the model.

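import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler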

pipe_lr = make_pipeline(StandardScaler(),
                        LogisticRegression(penalty='l2',
                                           max_iter = 10000))
train_sizes, train_scores, test_scores = \
    learning_curve(estimator=pipe_lr,
                   X=X_train,
                   y=y_train,
                   train_sizes=np.linspace(
                       0.1, 1.0, 10),
                   cv = 10,
                   n_jobs=1)

train_mean = np.mean(train_scores, axis = 1)
train_std = np.std(train_scores, axis =1)
test_mean = np.mean(test_scores, axis = 1)
test_std = np.std(test_scores, axis=1)
plt.plot(train_sizes, train_mean, color = 'blue', marker = 'o', markersize = 5, label = 'Training accuracy')
plt.fill_between(train_sizes, train_mean+train_std, train_mean-train_std, alpha = 0.15, color = 'blue')
plt.plot(train_sizes, test_mean, color = 'green', linestyle='--', marker='s', markersize = 5, label = 'Validation accuracy')
plt.fill_between(train_sizes, test_mean+test_std, test_mean-test_std, alpha = 0.15, color = 'green')
plt.grid()
plt.xlabel('Number of training examples')
plt.ylabel('Accuracy')
plt.legend(loc='lower right')
plt.ylim([0.8,1.03])
plt.show()

The 'train_sizes' parameter of the 'learning_curve' function controls the number of training examples used to generate the learning curve. In this example, we used 10 evenly spaced intervals for the training dataset size. By default, the 'learning_curve' function uses stratified $k$-fold cross-validation to calculate the classifier's cross-validation accuracy, and we set the cv parameter to 10 to perform 10-fold stratified cross-validation.

Next, we compute the average accuracy from the cross-validated training and test scores for various training dataset sizes and plot these values. We also use the 'fill_between' function to add the standard deviation of the mean accuracy to the plot, representing the variance of the estimates.
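The score arrays returned by learning_curve have one row per training set size and one column per cross-validation fold, which is why the averages are taken along axis=1. The sketch below uses a tiny made-up array (two sizes, three folds) purely to show the shapes involved. The numbers are not results from the post.

```python
import numpy as np

# Illustrative scores only: 2 training set sizes x 3 folds.
scores = np.array([[0.90, 0.94, 0.92],
                   [0.95, 0.97, 0.96]])

mean = scores.mean(axis=1)   # one average per training set size
std = scores.std(axis=1)     # spread across folds for each size

print(mean.shape, std.shape)
print(mean)
```

The shaded band drawn by fill_between then spans mean - std to mean + std at each size.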

With more than 250 training examples, the model performs quite well on both the training and validation datasets. With fewer than 250 examples, the training accuracy increases while the gap between the validation accuracy and training accuracy widens, indicating an increasing degree of overfitting.

Validation curves are useful tools for improving model performance by addressing issues such as overfitting or underfitting. Validation curves are related to learning curves, but instead of plotting training and testing accuracy as a function of sample size, they plot accuracy as a function of changing model parameter values.

from sklearn.model_selection import validation_curve

param_range = [0.001, 0.01, 0.1, 1.0, 10.0, 100.0]
train_scores, test_scores = validation_curve(
    estimator = pipe_lr,
    X = X_train,
    y = y_train,
    param_name = 'logisticregression__C',
    param_range = param_range,
    cv = 10
)

train_mean = np.mean(train_scores, axis = 1)
train_std = np.std(train_scores, axis = 1)
test_mean = np.mean(test_scores, axis = 1)
test_std = np.std(test_scores, axis = 1)
plt.plot(param_range, train_mean,
         color = 'blue', marker = 'o',
         markersize = 5, label = 'Training accuracy')
plt.fill_between(param_range, train_mean + train_std,
                 train_mean - train_std, alpha = 0.15,
                 color = 'blue')
plt.plot(param_range, test_mean,
         color = 'green', linestyle='--',
         marker = 's', markersize = 5,
         label = 'Validation accuracy')
plt.fill_between(param_range,
                 test_mean+test_std,
                 test_mean-test_std,
                 alpha = 0.15, color = 'green')
plt.grid()
plt.xscale('log')
plt.legend(loc = 'lower right')
plt.xlabel('Parameter C')
plt.ylabel('Accuracy')
plt.ylim([0.8,1.0])
plt.show()

Similar to the 'learning_curve' function, the 'validation_curve' function uses stratified $k$-fold cross-validation by default to estimate the classifier's performance. Inside the 'validation_curve' call, we specify the parameter to evaluate through 'param_name'. Here it is 'logisticregression__C', which addresses the inverse regularization parameter C of the 'LogisticRegression' object inside the pipeline, and its range of values is set through the 'param_range' parameter. As in the previous code, we plot the mean training and cross-validation accuracy along with the respective standard deviations.

The differences in accuracy with changes in parameter values are subtle, but we can observe that as the regularization strength increases (small values of C), the model tends to slightly underfit the data. Conversely, when the C value is large, the regularization strength decreases, causing the model to slightly overfit.

164_K-fold Cross Validation

elif — Mon, 13 May 2024 23:04:51 +0900

Common cross-validation techniques include hold-out cross-validation and $k$-fold cross-validation. These techniques help to reliably estimate the generalization performance of a model, that is, how well the model performs on unseen data.

A classical and popular approach to estimate the generalization performance of machine learning models is the hold-out method. The initial dataset is divided into a training dataset and a test dataset, where the training dataset is used to train the model, and the test dataset is used to estimate the generalization performance. However, in typical machine learning processes, various parameter settings are adjusted and compared to improve prediction performance on unseen data. This process is called model selection, which means selecting the optimal hyperparameter values for a given problem. However, if the same test dataset is repeatedly used during the model selection process, there is a high possibility that the test dataset becomes part of the training data, leading to model overfitting.

Therefore, the data is divided into three parts: training, validation, and test datasets. The training dataset is used to fit various models, and the performance on the validation dataset is used for model selection. The advantage of having a test dataset that the model has not seen before during training and model selection is that it provides a less biased estimate of the model's generalization ability on new data. Once the hyperparameter tuning is complete, the generalization performance of the model is estimated using the test dataset.

A drawback of the hold-out method is that the performance estimate can be highly sensitive to how the training dataset is partitioned. Therefore, a more robust technique, $k$-fold cross-validation, is often used for performance estimation. In this method, the training data is divided into $k$ subsets, and the hold-out method is repeated $k$ times.

Specifically, in $k$-fold cross-validation, the training dataset is randomly divided into $k$ folds. Out of these, $k-1$ folds are used for model training, and the remaining fold is used for performance evaluation. This procedure is repeated $k$ times, resulting in $k$ models and performance estimates. Then, the average performance across the different independent test folds is calculated, providing a performance estimate that is less sensitive to the partitioning of the training data compared to the hold-out method. Consequently, $k$-fold cross-validation is generally used to find the optimal hyperparameter values that provide satisfactory generalization performance.
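The fold bookkeeping can be sketched in a few lines of plain Python. This is a simplified illustration and not scikit-learn's implementation: the function name is made up, and the folds are contiguous blocks without shuffling.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop

folds = list(k_fold_indices(10, 5))
for train, test in folds:
    print(len(train), test)
```

Each sample appears in exactly one test fold, so every data point is used for evaluation once and for training in the other rounds.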

In summary, $k$-fold cross-validation makes better use of the dataset compared to the hold-out method that uses a validation set. This is because every data point is used for evaluation in $k$-fold cross-validation. Increasing the value of $k$ means more training data is used in each iteration, which reduces the bias when averaging the individual model estimates to estimate generalization performance. However, as $k$ increases, the execution time of the cross-validation algorithm also increases, and the variance of the estimates can rise due to the training folds becoming more similar to each other.

A slightly improved approach to standard $k$-fold cross-validation is stratified $k$-fold cross-validation. This method can provide better bias and variance estimates, especially in cases of imbalanced class ratios. Using scikit-learn's 'StratifiedKFold' iterator, it can be implemented as follows:

import numpy as np
from sklearn.model_selection import StratifiedKFold
kfold = StratifiedKFold(n_splits=10).split(X_train,y_train)
scores = []
for k, (train,test) in enumerate(kfold):
    pipe_lr.fit(X_train[train], y_train[train])
    score = pipe_lr.score(X_train[test], y_train[test])
    scores.append(score)
    print(f'Fold: {k+1:02d}, '
          f'Class distr.: {np.bincount(y_train[train])}, '
          f'Acc.: {score:.3f}')

Here, the 'StratifiedKFold' class from the 'sklearn.model_selection' module is created with the number of folds given by the 'n_splits' parameter, and its split method is called with the training data and the 'y_train' class labels to produce the fold indices. When iterating through the $k$ folds using the kfold iterator, a logistic regression pipeline is fitted using the train indices. The pipeline ensures that the examples are appropriately scaled in each iteration, and the accuracy score of the model is calculated using the test indices. These scores are collected in the scores list so that the mean accuracy and the standard deviation of the estimates can be computed.
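What "stratified" means can be sketched with a simplified round-robin assignment. This is an illustration of the idea under made-up labels and not the algorithm scikit-learn uses internally. The function name is hypothetical.

```python
from collections import Counter, defaultdict

def stratified_folds(labels, k):
    """Assign sample indices to k folds, dealing out each class round-robin."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds

# Imbalanced toy labels: eight samples of class 0, four of class 1.
labels = [0] * 8 + [1] * 4
folds = stratified_folds(labels, 4)
for fold in folds:
    print(Counter(labels[i] for i in fold))
```

Every fold keeps the same two-to-one class ratio as the full label list, which is the property that makes the per-fold estimates more comparable on imbalanced data.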

scikit-learn also allows for a more concise way to evaluate the model with stratified $k$-fold cross-validation, using the 'cross_val_score' function.

from sklearn.model_selection import cross_val_score
scores = cross_val_score(estimator=pipe_lr,
                         X = X_train,
                         y = y_train,
                         cv = 10,
                         n_jobs=1)
print(f'CV accuracy scores: {scores}')
print(f'CV accuracy: {np.mean(scores):.3f} '
      f'+/- {np.std(scores):.3f}')

Here, the 'pipe_lr' is defined as follows.

from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

pipe_lr = make_pipeline(StandardScaler(),
                        PCA(n_components=2),
                        LogisticRegression())

ref: Raschka, Sebastian, et al. Machine Learning with PyTorch and Scikit-Learn: Develop machine learning and deep learning models with Python. Packt Publishing Ltd, 2022.