Deep Learning

98_Simple ANN Example Using PyTorch(2)

elif 2024. 3. 7. 19:38

In the last post (97_Simple ANN Example Using PyTorch), we worked through an example of classifying MNIST data with a simple ANN. In this post, the goal is to achieve better loss values by modifying different parameters and experimenting with various hyperparameter adjustments.

 

 

Before proceeding, note that in the last post the 'manual_seed' value was not set, so the loss values differed with each training run. From here on, 'torch.manual_seed(42)' is set to ensure consistent results.
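As a reference, here is a minimal sketch of how the seed can be fixed at the top of the script; `torch.manual_seed` is the standard PyTorch call, and the CUDA line only matters when a GPU is used.

```python
import torch

# Fix the random seed so weight initialization and data shuffling
# give the same results on every run.
torch.manual_seed(42)

# Optional: also seed the CUDA generators when training on a GPU.
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(42)
```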

 

The previous results were a Training loss of 0.0344 and a Validation loss of 0.0908, with the total code execution time being 68 seconds.

 

Actually, randomly adjusting hyperparameters to find optimal values is highly inefficient and not recommended. Many research papers study the impact of hyperparameters on model performance, and it is advisable to refer to them. Even so, this post is written to explore which hyperparameters influence model performance and to see the effect of adjusting their values.

 

The first step is to adjust the learning rate. The original value is 0.01.

 

Lr = 0.01

Training Loss : 0.0344, Validation Loss : 0.0908, time : 68s

 

Lr = 0.05

Training Loss : 0.0106, Validation Loss : 0.0919, time : 68s

 

Lr = 0.1

Training Loss : 0.0297, Validation Loss : 0.1261, time : 69s

 

Lr = 0.005

Training Loss : 0.0747, Validation Loss : 0.1167, time : 67s

 

Lr = 0.001

Training Loss : 0.2479, Validation Loss : 0.2695, time : 66s

 

The learning rate (Lr) should be neither too large nor too small, and finding an appropriate value is crucial; this process requires a significant amount of work. It is recommended to adjust the value in small increments and check the performance at each step. The best performance was observed at Lr = 0.05.

 

https://www.jeremyjordan.me/nn-learning-rate/
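For reference, a minimal sketch of where the learning rate enters the setup; `model` stands in for the ANN defined in the previous post and is only a placeholder here.

```python
import torch.optim as optim

# The learning rate is passed when the optimizer is created.
# lr=0.05 is the value that gave the best result in the runs above.
optimizer = optim.SGD(model.parameters(), lr=0.05)
```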

 

Next is the Batch size. The default value is 64.

 

Batch size = 64

Training Loss : 0.0344, Validation Loss : 0.0908, time : 68s

 

Batch size = 16

Training Loss : 0.0103, Validation Loss : 0.0909, time : 86s

 

Batch size = 32

Training Loss : 0.0147, Validation Loss : 0.0845, time : 75s

 

Batch size = 128

Training Loss : 0.0746, Validation Loss : 0.1150, time : 66s

 

Batch size = 256

Training Loss : 0.1415, Validation Loss : 0.1730, time : 63s

 

As the batch size increased, training time decreased, but the loss values rose noticeably. Smaller batches gave somewhat lower losses at the cost of longer training, so keeping the batch size at the default of 64 offers a reasonable balance between speed and loss.
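For reference, a minimal sketch of where the batch size is set; `train_dataset` and `val_dataset` are placeholders for the MNIST datasets prepared in the previous post.

```python
from torch.utils.data import DataLoader

# The batch size is an argument of the DataLoader that feeds the
# training loop; the validation loader usually does not need shuffling.
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=64, shuffle=False)
```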

 

Next is dropout. The default value is 0.

Dropout = 0

Training Loss : 0.0344, Validation Loss : 0.0908, time : 68s

 

Dropout = 0.2

Training Loss : 0.0593, Validation Loss : 0.0814, time : 68s

 

Dropout = 0.5

Training Loss : 0.1245, Validation Loss : 0.1040, time : 69s

 

Dropout = 0.9

Training Loss : 1.2039, Validation Loss : 0.6379, time : 68s

 

Dropout is a method used to prevent overfitting by randomly deactivating a fraction of the neurons during training. In this problem, since overfitting did not occur, adding a dropout layer did not result in better performance. When dropout is set to 0.9, meaning 90% of the units are dropped at each training step, the loss values become significantly higher.
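As an illustration, here is a minimal sketch of an MNIST ANN with a dropout layer inserted after the hidden layer; the layer sizes are assumptions and not necessarily the exact model from the previous post.

```python
import torch.nn as nn

# Simple ANN for 28x28 MNIST images with one dropout layer.
# nn.Dropout(p=0.2) randomly zeroes 20% of the activations during
# training and does nothing in eval() mode.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 256),
    nn.ReLU(),
    nn.Dropout(p=0.2),
    nn.Linear(256, 10),
)
```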

 

Next is the optimizer. SGD has been used so far; let's see the results when using different optimizers.

 

Optimizer = SGD

Training Loss : 0.0344, Validation Loss : 0.0908, time : 68s

 

Optimizer = Adam

Training Loss : 0.0169, Validation Loss : 0.1086, time : 74s

 

Optimizer = RMSprop

Training Loss : 0.1080, Validation Loss : 0.2387, time : 72s

 

Optimizer = Adagrad

Training Loss : 0.0351, Validation Loss : 0.0836, time : 70s

 

Both Adam and Adagrad showed good performance: Adam achieved a lower training loss, while Adagrad gave the lowest validation loss.
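For reference, switching optimizers only requires changing the line that creates the optimizer; `model` is again a placeholder for the ANN from the previous post, and the commented alternatives rely on PyTorch's default learning rates.

```python
import torch.optim as optim

# Only this line changes when switching optimizers.
optimizer = optim.SGD(model.parameters(), lr=0.01)
# optimizer = optim.Adam(model.parameters())     # default lr = 0.001
# optimizer = optim.RMSprop(model.parameters())  # default lr = 0.01
# optimizer = optim.Adagrad(model.parameters())  # default lr = 0.01
```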

 

In conclusion, it was observed that simply changing the optimizer can reduce the loss. However, since hyperparameters are not independent and affect one another when their values change, tuning them is a challenging and generally time-consuming task.

 
