Deep Learning

169_Image Classification using Torchvision(2)

elif 2024. 5. 18. 22:48

Continuing from the previous post (168_Image Classification using Torchvision).

 

image_datasets = {
  d: ImageFolder(f'{DATA_DIR}/{d}', transforms[d]) for d in DATASETS
}

data_loaders = {
  d: DataLoader(image_datasets[d], batch_size=4, shuffle=True, num_workers=4) 
  for d in DATASETS
}

 

To simplify training, a PyTorch dataset and a data loader are created for each image dataset folder. The image_datasets dictionary maps each dataset name to an ImageFolder object, which reads the images from the corresponding folder and applies that dataset's transformations. The data_loaders dictionary maps each name to a DataLoader, which serves the data in shuffled batches of 4 using 4 worker processes. Training and evaluation then pull their batches from these loaders.
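The batching behaviour can be checked without any image files. The following is a minimal sketch, assuming a TensorDataset of random tensors as a stand-in for ImageFolder; the names fake_images, fake_labels, toy_dataset and toy_loader are hypothetical and not part of the original code.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for an image folder: 10 fake RGB "images" of size 8x8
# with alternating class labels 0 and 1.
fake_images = torch.randn(10, 3, 8, 8)
fake_labels = torch.tensor([0, 1] * 5)
toy_dataset = TensorDataset(fake_images, fake_labels)

# num_workers=0 keeps loading in the main process so the sketch runs anywhere.
toy_loader = DataLoader(toy_dataset, batch_size=4, shuffle=True, num_workers=0)

# Collect the shape of every batch the loader yields in one pass.
batch_shapes = [(tuple(x.shape), tuple(y.shape)) for x, y in toy_loader]
for image_shape, label_shape in batch_shapes:
    print(image_shape, label_shape)
```

With 10 samples and batch_size=4, the loader yields two full batches and one final batch of 2, since drop_last defaults to False.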

 

def imshow(inp, title=None):
  inp = inp.numpy().transpose((1, 2, 0))
  mean = np.array([mean_nums])
  std = np.array([std_nums])
  inp = std * inp + mean
  inp = np.clip(inp, 0, 1)
  plt.imshow(inp)
  if title is not None:
    plt.title(title)
  plt.axis('off')

inputs, classes = next(iter(data_loaders['train']))
out = torchvision.utils.make_grid(inputs)

imshow(out, title=[class_names[x] for x in classes])


The code above displays sample images after the transformations have been applied. To show them correctly, the normalization must be reversed and the channel dimension moved to the end. The imshow function takes an image tensor (inp) and an optional title (title). It converts the PyTorch tensor to a NumPy array and transposes it from (channels, height, width) to (height, width, channels), the layout that plt.imshow expects. To de-normalize, it builds NumPy arrays from the mean and standard deviation values, multiplies the image by the standard deviation and adds the mean, and then clips the result to the [0, 1] range.
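The de-normalization step can be verified as a round trip in NumPy alone. This is a sketch under the assumption that mean_nums and std_nums hold the usual ImageNet statistics; the actual values are defined in the previous post and may differ.

```python
import numpy as np

# Assumed values (common ImageNet statistics); the post's own mean_nums and
# std_nums come from the previous post.
mean_nums = [0.485, 0.456, 0.406]
std_nums = [0.229, 0.224, 0.225]

rng = np.random.default_rng(0)
original = rng.random((3, 4, 5))  # channels-first image (C, H, W) in [0, 1)

# Normalize per channel, as a Normalize transform would do.
mean_chw = np.array(mean_nums).reshape(3, 1, 1)
std_chw = np.array(std_nums).reshape(3, 1, 1)
normalized = (original - mean_chw) / std_chw

# Reverse it the way imshow does: move channels last, then std * x + mean.
hwc = normalized.transpose((1, 2, 0))
restored = np.clip(np.array([std_nums]) * hwc + np.array([mean_nums]), 0, 1)

print(restored.shape)
print(np.allclose(restored, original.transpose((1, 2, 0))))
```

The arrays built from [mean_nums] and [std_nums] have shape (1, 3), so they broadcast over the last (channel) axis of the (H, W, C) image.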

 

def create_model(n_classes):
  model = models.resnet34(pretrained=True)

  n_features = model.fc.in_features
  model.fc = nn.Linear(n_features, n_classes)

  return model.to(device)

base_model = create_model(len(class_names))

 

Instead of building a model from scratch, the ResNet architecture is reused together with weights pre-trained on the ImageNet dataset. The create_model function takes the number of classes to classify and loads a pre-trained ResNet-34 model. It reads the number of input features of the last fully connected layer from model.fc.in_features and replaces that layer with a new nn.Linear whose output size matches the number of classes. The model is then moved to the target device and returned.

 

def train_epoch(
  model, 
  data_loader, 
  loss_fn, 
  optimizer, 
  device, 
  scheduler, 
  n_examples
):
  model = model.train()

  losses = []
  correct_predictions = 0
  
  for inputs, labels in data_loader:
    inputs = inputs.to(device)
    labels = labels.to(device)

    outputs = model(inputs)

    _, preds = torch.max(outputs, dim=1)
    loss = loss_fn(outputs, labels)

    correct_predictions += torch.sum(preds == labels)
    losses.append(loss.item())

    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

  scheduler.step()

  return correct_predictions.double() / n_examples, np.mean(losses)

 

train_epoch trains the given model for one epoch. It puts the model in training mode, then initializes a list for the per-batch loss values and a counter for correctly predicted samples. For each batch from the data loader, the inputs and labels are moved to the device and the inputs are passed through the model. The class index with the highest output value is taken as the prediction, and the loss is computed from the model outputs and the actual labels. The predictions are compared with the labels to accumulate the number of correct samples, and the batch loss is appended to the list. The loss is then backpropagated to compute the gradients, the optimizer updates the weights, and the gradients are reset. After all batches have been processed, the learning rate scheduler is stepped once, and the function returns the accuracy (correct predictions divided by n_examples) and the mean loss.
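The same update order can be exercised on a toy problem. This is a minimal sketch, assuming a small linear classifier and random data in place of the ResNet and the image loader; toy_model, toy_batches and the StepLR settings (step_size=2) are hypothetical choices made so the learning rate change is visible within three epochs.

```python
import torch
from torch import nn, optim
from torch.optim import lr_scheduler

torch.manual_seed(0)

toy_model = nn.Linear(4, 3)  # hypothetical 3-class classifier
loss_fn = nn.CrossEntropyLoss()
optimizer = optim.SGD(toy_model.parameters(), lr=0.1)
scheduler = lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.1)

# Two fake batches of 8 samples each stand in for a data loader.
toy_batches = [(torch.randn(8, 4), torch.randint(0, 3, (8,))) for _ in range(2)]
n_examples = 16

weights_before = toy_model.weight.detach().clone()
lr_per_epoch = []

for epoch in range(3):
    toy_model.train()
    correct = 0
    for inputs, labels in toy_batches:
        outputs = toy_model(inputs)
        _, preds = torch.max(outputs, dim=1)   # index of the largest logit
        loss = loss_fn(outputs, labels)
        correct += torch.sum(preds == labels)

        loss.backward()        # compute gradients
        optimizer.step()       # update weights
        optimizer.zero_grad()  # reset gradients for the next batch

    lr_per_epoch.append(optimizer.param_groups[0]['lr'])
    scheduler.step()           # once per epoch, after the batch loop
    accuracy = correct.double() / n_examples

weights_changed = not torch.equal(weights_before, toy_model.weight.detach())
print(lr_per_epoch, accuracy.item(), weights_changed)
```

The scheduler is stepped per epoch rather than per batch, so step_size counts epochs; here the learning rate drops by a factor of 10 once two epochs have completed.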

 

def eval_model(model, data_loader, loss_fn, device, n_examples):
  model = model.eval()

  losses = []
  correct_predictions = 0

  with torch.no_grad():
    for inputs, labels in data_loader:
      inputs = inputs.to(device)
      labels = labels.to(device)

      outputs = model(inputs)

      _, preds = torch.max(outputs, dim=1)

      loss = loss_fn(outputs, labels)

      correct_predictions += torch.sum(preds == labels)
      losses.append(loss.item())

  return correct_predictions.double() / n_examples, np.mean(losses)

 

The eval_model function evaluates the given model. It switches the model to evaluation mode and iterates over the data loader in batches, computing predictions, loss, and accuracy without updating any weights. The loop runs inside torch.no_grad(), which disables gradient calculation during evaluation, reducing memory usage and speeding up the evaluation process.
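The two switches do different jobs: eval() changes layer behaviour (for example, it disables dropout), while torch.no_grad() stops gradient tracking. The sketch below illustrates both, assuming a hypothetical toy_model with a Dropout layer that is not part of the original code.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical model with dropout, so the effect of eval() is observable.
toy_model = nn.Sequential(nn.Linear(4, 8), nn.Dropout(p=0.5), nn.Linear(8, 3))
inputs = torch.randn(2, 4)

toy_model.eval()  # dropout becomes a no-op in evaluation mode

with torch.no_grad():
    eval_out_a = toy_model(inputs)
    eval_out_b = toy_model(inputs)

# Outside no_grad, the same forward pass is tracked for backpropagation.
tracked_out = toy_model(inputs)

print(eval_out_a.requires_grad, tracked_out.requires_grad)
print(torch.equal(eval_out_a, eval_out_b))
```

Outputs produced under no_grad carry no autograd graph, which is where the memory saving comes from; identical outputs on repeated calls confirm that dropout is inactive.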

 

def train_model(model, data_loaders, dataset_sizes, device, n_epochs=3):
  optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
  scheduler = lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.1)
  loss_fn = nn.CrossEntropyLoss().to(device)

  history = defaultdict(list)
  best_accuracy = 0

  for epoch in range(n_epochs):

    print(f'Epoch {epoch + 1}/{n_epochs}')
    print('-' * 10)

    train_acc, train_loss = train_epoch(
      model,
      data_loaders['train'],    
      loss_fn, 
      optimizer, 
      device, 
      scheduler, 
      dataset_sizes['train']
    )

    print(f'Train loss {train_loss} accuracy {train_acc}')

    val_acc, val_loss = eval_model(
      model,
      data_loaders['val'],
      loss_fn,
      device,
      dataset_sizes['val']
    )

    print(f'Val   loss {val_loss} accuracy {val_acc}')
    print()

    history['train_acc'].append(train_acc)
    history['train_loss'].append(train_loss)
    history['val_acc'].append(val_acc)
    history['val_loss'].append(val_loss)

    if val_acc > best_accuracy:
      torch.save(model.state_dict(), 'best_model_state.bin')
      best_accuracy = val_acc

  print(f'Best val accuracy: {best_accuracy}')
  
  model.load_state_dict(torch.load('best_model_state.bin'))

  return model, history

 

The train_model function trains the model for a specified number of epochs and evaluates training and validation performance at each epoch. It saves the best-performing weights and ultimately returns the best-performing model together with the training history. The optimizer is Stochastic Gradient Descent (SGD) with a learning rate of 0.001 and momentum of 0.9, and the loss function is CrossEntropyLoss. The StepLR scheduler multiplies the learning rate by 0.1 every 7 epochs, so with the default n_epochs=3 the learning rate stays at 0.001 throughout. In the loop, the model is trained and evaluated for each epoch, and the results are printed. The accuracy and loss are appended to history, and if the current validation accuracy is higher than the best one so far, the model weights are saved to the file best_model_state.bin. Finally, the highest validation accuracy is printed, and the saved state is loaded back so that the returned model is the best one rather than the last one. Continued in the next post.

 

ref: Venelin Valkov, Get SH*T Done with PyTorch: Solve Real-world Machine Learning Problems with Deep Neural Networks in Python (2020)