In the last post (97_Simple ANN Example Using PyTorch), we worked through an example of classifying MNIST data with a simple ANN. In this post, the goal is to achieve better loss values by modifying different parameters and experimenting with various hyperparameter adjustments.
Before proceeding, note that in the last post the manual seed was not set, so the loss values differed with each training run. Here, torch.manual_seed(42) is set to ensure consistent results.
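As a minimal sketch of where this fits (the surrounding training script is the one from the previous post and is not repeated here):

```python
import torch

# Fix the random seed before the model and DataLoaders are created, so
# weight initialization and data shuffling are reproducible across runs.
torch.manual_seed(42)
```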
The previous results were a training loss of 0.0344 and a validation loss of 0.0908, with a total execution time of 68 seconds.
In practice, randomly adjusting hyperparameters in search of optimal values is highly inefficient and not recommended; many research papers study the impact of hyperparameters on model performance, and it is advisable to consult them. Nevertheless, this post explores which hyperparameters influence model performance and what effect adjusting their values has.
The first step is to adjust the learning rate. The original value is 0.01.
| Lr | Training Loss | Validation Loss | Time |
|---|---|---|---|
| 0.01 | 0.0344 | 0.0908 | 68s |
| 0.05 | 0.0106 | 0.0919 | 68s |
| 0.1 | 0.0297 | 0.1261 | 69s |
| 0.005 | 0.0747 | 0.1167 | 67s |
| 0.001 | 0.2479 | 0.2695 | 66s |
The learning rate (lr) should be neither too large nor too small; finding an appropriate value is crucial, and the process requires a significant amount of work. It is recommended to adjust the value in small increments and check the performance. The best performance was observed at lr = 0.05.
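The learning rate itself is just the lr argument passed to the optimizer. A minimal sketch, assuming the model object from the previous post is named model:

```python
import torch.optim as optim

# Only lr changes between the runs above; 0.05 gave the lowest
# training loss in this experiment.
optimizer = optim.SGD(model.parameters(), lr=0.05)
```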
Next is the batch size. The default value is 64.
| Batch size | Training Loss | Validation Loss | Time |
|---|---|---|---|
| 64 | 0.0344 | 0.0908 | 68s |
| 16 | 0.0103 | 0.0909 | 86s |
| 32 | 0.0147 | 0.0845 | 75s |
| 128 | 0.0746 | 0.1150 | 66s |
| 256 | 0.1415 | 0.1730 | 63s |
As the batch size increased, the training time decreased, but the loss values increased significantly. It is concluded that keeping the default value provides the best results.
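The batch size is set when the DataLoaders are built. A minimal sketch, assuming the MNIST datasets from the previous post are named train_data and valid_data:

```python
from torch.utils.data import DataLoader

# batch_size controls how many samples are processed per weight update;
# shuffle only the training split.
train_loader = DataLoader(train_data, batch_size=64, shuffle=True)
valid_loader = DataLoader(valid_data, batch_size=64, shuffle=False)
```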
Next is dropout. The default value is 0.
| Dropout | Training Loss | Validation Loss | Time |
|---|---|---|---|
| 0 | 0.0344 | 0.0908 | 68s |
| 0.2 | 0.0593 | 0.0814 | 68s |
| 0.5 | 0.1245 | 0.1040 | 69s |
| 0.9 | 1.2039 | 0.6379 | 68s |
Dropout is a method used to prevent overfitting by randomly deactivating a fraction of the neurons during training. In this problem, since overfitting did not occur, adding a dropout layer did not lead to better performance. However, when dropout is set to 0.9, meaning 90% of the activations are zeroed out at each step, the loss values are significantly higher.
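As a sketch of where the dropout layers would go, here is an illustrative simple MNIST ANN with nn.Dropout between the hidden layers; the layer sizes are assumptions, not necessarily those of the previous post:

```python
import torch.nn as nn

# Dropout is inserted after each hidden activation; p is the probability
# of zeroing an activation during training (disabled in eval mode).
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 128),
    nn.ReLU(),
    nn.Dropout(p=0.2),
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Dropout(p=0.2),
    nn.Linear(64, 10),
)
```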
Next is the optimizer. SGD has been used so far; let's see the results with different optimizers.
| Optimizer | Training Loss | Validation Loss | Time |
|---|---|---|---|
| SGD | 0.0344 | 0.0908 | 68s |
| Adam | 0.0169 | 0.1086 | 74s |
| RMSprop | 0.1080 | 0.2387 | 72s |
| Adagrad | 0.0351 | 0.0836 | 70s |
Both Adam and Adagrad showed good performance.
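Swapping the optimizer is a one-line change. A minimal sketch, again assuming a model object named model; the SGD learning rate is the 0.01 used above, while the others fall back to PyTorch's defaults, which are not necessarily the values used in these runs:

```python
import torch.optim as optim

# Each optimizer is built from model.parameters(); uncomment one.
optimizer = optim.SGD(model.parameters(), lr=0.01)
# optimizer = optim.Adam(model.parameters())     # default lr = 1e-3
# optimizer = optim.RMSprop(model.parameters())  # default lr = 1e-2
# optimizer = optim.Adagrad(model.parameters())  # default lr = 1e-2
```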
In conclusion, simply changing the optimizer to Adam reduced the training loss. However, hyperparameters are not independent and affect one another when their values change, so tuning them is a challenging and generally time-consuming task.