Deep Learning

55_Likelihood Function

elif 2024. 1. 24. 22:09

Although this was covered in a previous post (52_Gaussian Distribution), some additional explanation seems necessary, so in this post I'll write about the likelihood function.

 

First, the likelihood function expresses how probable the observed data is under a given set of model parameters, which is why it plays an important role in deep learning.

Probability and likelihood are similar, but probability measures how likely an outcome is under fixed parameters or conditions, whereas likelihood measures how well already-observed data is explained by particular parameters or a model. The likelihood function is therefore used to evaluate the fit between a dataset and model parameters, and model training involves finding the parameters that maximize the likelihood of the given data. This process is known as Maximum Likelihood Estimation (MLE).
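As a toy illustration of MLE (my own example, not from the book): fix a small set of observations and scan candidate values for the mean of a Gaussian with known variance. The likelihood is then a function of the parameter, and the maximizing value recovers the sample mean, which is the analytical MLE for a Gaussian mean. The data values and grid below are invented for this sketch.

```python
import numpy as np

# Toy data (invented for this sketch), treated as fixed observations.
data = np.array([1.8, 2.1, 2.4, 1.9, 2.3])

def log_likelihood(mu, sigma, x):
    # Sum of log Gaussian densities over independent observations.
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                  - (x - mu) ** 2 / (2 * sigma**2))

# Likelihood treats the data as fixed and the parameter as variable:
# scan candidate means and keep the one that explains the data best.
candidates = np.linspace(0.0, 4.0, 401)
lls = [log_likelihood(mu, 1.0, data) for mu in candidates]
mu_mle = candidates[int(np.argmax(lls))]

# For a Gaussian, the MLE of the mean is the sample mean.
print(mu_mle, data.mean())
```

In practice the maximum is found analytically or by gradient-based optimization rather than a grid scan; the grid only makes the "parameter varies, data stays fixed" viewpoint concrete.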

 

Given a function $y(x, w)$ with additive Gaussian noise $\varepsilon$, the target $t$ can be represented as follows.

$$t = y(x, w) + \varepsilon$$

Here $\varepsilon$ is a Gaussian random variable with a mean of 0 and variance of ${\sigma ^2}$, which can be represented as follows.

$$p(\varepsilon) = \mathcal{N}(\varepsilon \mid 0, \sigma^2), \qquad \text{equivalently} \qquad p(t \mid x, w, \sigma^2) = \mathcal{N}\left(t \mid y(x, w), \sigma^2\right)$$

Assuming the input and target data points are drawn independently from this distribution, and writing $y(x, w) = w^{\mathrm{T}}\phi(x)$ for basis functions $\phi$, we obtain the following likelihood function for the adjustable parameters $w$ and ${\sigma^2}$.

$$p(\mathbf{t} \mid \mathbf{X}, w, \sigma^2) = \prod_{n=1}^{N} \mathcal{N}\left(t_n \mid w^{\mathrm{T}}\phi(x_n), \sigma^2\right)$$
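One practical aside (my own illustration, not from the book): because the likelihood is a product of many per-point densities, evaluating it directly underflows in floating point, which is one reason the log likelihood is used instead.

```python
import numpy as np

rng = np.random.default_rng(4)
residuals = rng.normal(0.0, 1.0, 1000)  # stand-ins for t_n - w^T phi(x_n)

def gauss_pdf(x, mu=0.0, sigma2=1.0):
    # Univariate Gaussian density.
    return np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

# Multiplying 1000 densities underflows float64 to exactly 0.0 ...
likelihood = np.prod(gauss_pdf(residuals))

# ... while the equivalent sum of log densities stays well-behaved.
log_likelihood = np.sum(np.log(gauss_pdf(residuals)))
print(likelihood, log_likelihood)
```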

Taking the log of the likelihood function and substituting the univariate Gaussian density, it can be expressed as follows.

$$\ln p(\mathbf{t} \mid \mathbf{X}, w, \sigma^2) = -\frac{N}{2}\ln\sigma^2 - \frac{N}{2}\ln(2\pi) - \frac{1}{\sigma^2}E_D(w), \qquad E_D(w) = \frac{1}{2}\sum_{n=1}^{N}\left(t_n - w^{\mathrm{T}}\phi(x_n)\right)^2$$

Maximizing the likelihood function under this Gaussian noise distribution is therefore equivalent to minimizing the sum-of-squares error function ${E_D}(w)$.
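This equivalence can be checked numerically. In the sketch below (data, candidate parameters, and noise level are all invented for illustration), the log likelihood of a toy linear model is a constant minus $E_D(w)/\sigma^2$, so whichever candidate setting has the smallest sum-of-squares error necessarily has the largest log likelihood.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy linear data: t = 2x + 1 + Gaussian noise (values invented).
x = rng.uniform(-1, 1, 50)
t = 2 * x + 1 + rng.normal(0.0, 0.3, 50)

def sse(w0, w1):
    # Sum-of-squares error E_D(w) for the line t ~ w0 + w1 * x.
    return 0.5 * np.sum((t - w0 - w1 * x) ** 2)

def log_lik(w0, w1, sigma2=0.09):
    # Gaussian log likelihood: a constant minus sse / sigma2.
    n = len(t)
    return -n / 2 * np.log(2 * np.pi * sigma2) - sse(w0, w1) / sigma2

# Whichever candidate minimizes the SSE also maximizes the likelihood.
cands = [(1.0, 2.0), (0.5, 1.5), (0.0, 0.0)]
sses = [sse(a, b) for a, b in cands]
lls = [log_lik(a, b) for a, b in cands]
print(np.argmin(sses) == np.argmax(lls))
```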

The maximum likelihood method can now be used to determine $w$ and ${\sigma^2}$. To maximize with respect to $w$, we compute the gradient of the log likelihood above with respect to $w$.

$$\nabla_w \ln p(\mathbf{t} \mid \mathbf{X}, w, \sigma^2) = \frac{1}{\sigma^2}\sum_{n=1}^{N}\left(t_n - w^{\mathrm{T}}\phi(x_n)\right)\phi(x_n)^{\mathrm{T}}$$

Setting this gradient to zero and solving for $w$, we obtain the following.

$$w_{\mathrm{ML}} = \left(\Phi^{\mathrm{T}}\Phi\right)^{-1}\Phi^{\mathrm{T}}\mathbf{t}$$

Here $\Phi$ is the $N \times M$ design matrix with elements $\Phi_{nj} = \phi_j(x_n)$.

Using the above equation, the previously defined error function can be rewritten with the bias parameter ${w_0}$ made explicit.

$$E_D(w) = \frac{1}{2}\sum_{n=1}^{N}\left(t_n - w_0 - \sum_{j=1}^{M-1} w_j\phi_j(x_n)\right)^2$$

Therefore, setting the derivative with respect to ${w_0}$ to zero and solving, we obtain

$$w_0 = \bar{t} - \sum_{j=1}^{M-1} w_j\overline{\phi_j}, \qquad \bar{t} = \frac{1}{N}\sum_{n=1}^{N} t_n, \quad \overline{\phi_j} = \frac{1}{N}\sum_{n=1}^{N}\phi_j(x_n)$$

The bias ${w_0}$ therefore compensates for the difference between the average of the target values and the weighted sum of the averages of the basis function values. Similarly, maximizing the log likelihood with respect to the variance ${\sigma^2}$ gives the following.

$$\sigma^2_{\mathrm{ML}} = \frac{1}{N}\sum_{n=1}^{N}\left(t_n - w_{\mathrm{ML}}^{\mathrm{T}}\phi(x_n)\right)^2$$
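A short numerical sketch (toy data with an assumed true noise level): fit $w_{\mathrm{ML}}$ in closed form, then estimate ${\sigma^2}$ as the mean squared residual around the fitted model; with enough samples it should approach the true noise variance.

```python
import numpy as np

rng = np.random.default_rng(3)
# Toy linear data with known noise std 0.5 (invented example).
x = rng.uniform(0, 1, 5000)
t = 2 * x + 1 + rng.normal(0.0, 0.5, 5000)

# Design matrix with bias column; closed-form ML weights.
Phi = np.column_stack([np.ones_like(x), x])
w_ml = np.linalg.solve(Phi.T @ Phi, Phi.T @ t)

# sigma^2_ML: mean squared residual around the ML fit.
# Should be close to the true noise variance 0.5**2 = 0.25.
sigma2_ml = np.mean((t - Phi @ w_ml) ** 2)
print(sigma2_ml)
```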

ref: Chris Bishop, "Deep Learning: Foundations and Concepts"
