Deep Learning

166_Simple Regression Model using PyTorch

elif 2024. 5. 15. 19:47

Today I read a book that is good for studying by following along, and since it contains detailed code, I plan to follow the code and analyze it. So far I have mostly dealt with classification problems, but this time I am going to implement a simple regression model using PyTorch. The required libraries are as follows.

 

import torch
import numpy as np
import pandas as pd
import seaborn as sns
from pylab import rcParams
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report
from torch import nn, optim
import torch.nn.functional as F

 

Seaborn is a Python library that is convenient for visualizing statistical data, pylab is used alongside Matplotlib to easily adjust the style and settings of graphs, and the metrics module from sklearn provides various indicators for evaluating model performance; here it will be used to generate a confusion matrix and a classification report. The required data is "Rain in Australia", and the .csv file can be downloaded from https://www.kaggle.com/datasets/jsphyg/weather-dataset-rattle-package.

 

df = pd.read_csv('weatherAUS.csv')
cols = ['Rainfall', 'Humidity3pm', 'Pressure9am', 'RainToday', 'RainTomorrow']
df = df[cols]                                                   # keep only the columns we need
df['RainToday'].replace({'No': 0, 'Yes': 1}, inplace=True)      # encode Yes/No as 1/0
df['RainTomorrow'].replace({'No': 0, 'Yes': 1}, inplace=True)
df = df.dropna(how='any')                                       # drop every row with a missing value

 

Load the .csv file into a DataFrame. Then store the names of the required columns in a list and select only those columns to rebuild the DataFrame. The replace method converts the values in the 'RainToday' and 'RainTomorrow' columns from 'No' to 0 and 'Yes' to 1, and the inplace option applies the change directly to the original DataFrame. Finally, dropna removes every row that contains a missing value.
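
As a quick sanity check (a small sketch I'm adding, not from the book; the exact row count depends on the dataset version), the encoding and the row drop can be verified directly:

print(df.shape)                       # rows remaining after dropna
print(df['RainToday'].unique())       # expected: [0 1]
print(df['RainTomorrow'].unique())    # expected: [0 1]
print(df.isna().sum().sum())          # expected: 0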

 

RANDOM_SEED = 42   # any fixed value works; it only makes the split reproducible
X = df[['Rainfall', 'Humidity3pm', 'RainToday', 'Pressure9am']]
y = df[['RainTomorrow']]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=RANDOM_SEED)

# DataFrame -> NumPy array -> float32 tensor; squeeze turns y from (N, 1) into (N,)
X_train = torch.from_numpy(X_train.to_numpy()).float()
y_train = torch.squeeze(torch.from_numpy(y_train.to_numpy()).float())
X_test = torch.from_numpy(X_test.to_numpy()).float()
y_test = torch.squeeze(torch.from_numpy(y_test.to_numpy()).float())

 

Next, store the independent variables from the DataFrame in 'X' and the dependent variable in 'y'. train_test_split divides the data into training and test sets, holding out 20% of the data for testing; fixing the random seed (RANDOM_SEED) makes the split reproducible. The data is then converted to PyTorch tensors: each pandas DataFrame is first converted to a NumPy array and then to a float tensor. squeeze reduces the dimensionality of the target tensor, turning the 2-dimensional (N, 1) tensor into a 1-dimensional tensor of shape (N,).
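
To see what these conversions produce, printing the shapes and dtypes is a quick check (again a sketch of mine, not the book's code; N is however many rows survived dropna):

print(X_train.shape, y_train.shape)   # torch.Size([N, 4]) torch.Size([N]) -- squeeze removed the extra dimension
print(X_train.dtype, y_train.dtype)   # torch.float32 torch.float32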

 

class NET(nn.Module):
    def __init__(self, n_feature):
        super(NET, self).__init__()
        self.fc1 = nn.Linear(n_feature, 5)   # 4 input features -> 5 hidden units
        self.fc2 = nn.Linear(5, 3)
        self.fc3 = nn.Linear(3, 1)           # single output: P(rain tomorrow)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return torch.sigmoid(self.fc3(x))    # squash the output into (0, 1)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

net = NET(X_train.shape[1])
criterion = nn.BCELoss()                     # binary cross-entropy on sigmoid outputs
optimizer = optim.Adam(net.parameters(), lr=0.001)

X_train = X_train.to(device)
y_train = y_train.to(device)
X_test = X_test.to(device)
y_test = y_test.to(device)
net = net.to(device)
criterion = criterion.to(device)


Next, define the neural network. It is a simple fully connected network with three layers, using ReLU for the hidden activations and a sigmoid on the output so the model produces a probability. The loss function is binary cross-entropy (BCELoss), and the optimization algorithm is Adam. Finally, define the device and move both the data and the model onto it, so computation runs on the GPU when one is available.
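
Before training, a quick forward pass with the untrained network is a handy wiring check (my own sketch, not the book's code): the output should be probabilities of shape (5, 1), which squeeze flattens to match the shape of y_train.

with torch.no_grad():
    sample = X_train[:5]        # five rows, already moved to the device
    probs = net(sample)         # untrained, so the values are arbitrary probabilities in (0, 1)
    print(probs.squeeze())      # shape (5,)

Continued in the next post.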