Neural Network
May 06, 2024
Neural Network
Consider $f(x) = W_2\,\sigma(W_1 x)$, where $x \in \mathbb{R}^d$, $W_1 \in \mathbb{R}^{k \times d}$, $W_2 \in \mathbb{R}^{1 \times k}$, and $\sigma$ is some non-linear transformation applied element-wise. The hidden layer has size $k$, and the usual candidates for $\sigma$ in a single hidden layer network are sigmoid or tanh.
We want to have $f(x_i) \approx y_i$, so we want estimates for $W_1$ and $W_2$. Just like with least squares, we can minimize the squared loss $\sum_{i=1}^{n} \big(y_i - W_2\,\sigma(W_1 x_i)\big)^2$.
To get the actual estimates, we can minimize this loss with SGD (stochastic gradient descent).
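As a rough illustration, here is a minimal sketch of this setup in PyTorch (the library choice, the toy data, the hidden size $k = 32$, and the learning rate are all my own placeholders, not part of the original notes). For simplicity it takes full-batch gradient steps; true SGD would sample mini-batches each iteration.

```python
import torch
import torch.nn as nn

# Toy data: n samples with d features and scalar targets (illustrative only).
n, d, k = 256, 10, 32
X = torch.randn(n, d)
y = torch.randn(n, 1)

# f(x) = W2 * tanh(W1 x): one hidden layer of size k.
model = nn.Sequential(
    nn.Linear(d, k, bias=False),  # W1
    nn.Tanh(),                    # non-linear transformation sigma
    nn.Linear(k, 1, bias=False),  # W2
)

optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()  # squared loss

for epoch in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)   # mean of (y_i - f(x_i))^2
    loss.backward()
    optimizer.step()
```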
Binary Classifiers
If the problem is to classify each $y_i \in \{0, 1\}$ as true or false, let $\hat{y}_i = \operatorname{sigmoid}\!\big(W_2\,\sigma(W_1 x_i)\big)$ and minimize the logistic loss $-\sum_{i=1}^{n} \big[y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\big]$ with SGD.
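A sketch of the same network adapted to binary labels, again with placeholder data and hyperparameters; `BCEWithLogitsLoss` applies the sigmoid and the logistic loss in one numerically stable step.

```python
import torch
import torch.nn as nn

n, d, k = 256, 10, 32
X = torch.randn(n, d)
y = (torch.rand(n, 1) > 0.5).float()   # 0/1 labels (illustrative)

model = nn.Sequential(nn.Linear(d, k), nn.Tanh(), nn.Linear(k, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-1)
loss_fn = nn.BCEWithLogitsLoss()        # sigmoid + logistic loss

for epoch in range(100):
    optimizer.zero_grad()
    loss_fn(model(X), y).backward()
    optimizer.step()
```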
Multi-Label Classification
Instead of a vector $W_2 \in \mathbb{R}^{1 \times k}$ at the very last layer, have a matrix $W_2 \in \mathbb{R}^{m \times k}$ such that each row $j$ gives a sigmoid output for label $j$: $\hat{y}_{ij} = \operatorname{sigmoid}\!\big(W_2^{(j)}\,\sigma(W_1 x_i)\big)$.
To train the classification problem, minimize the following loss with SGD: $-\sum_{i=1}^{n}\sum_{j=1}^{m} \big[y_{ij}\log \hat{y}_{ij} + (1 - y_{ij})\log(1 - \hat{y}_{ij})\big]$.
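A sketch of the multi-label case: the last layer is now a $k$-to-$m$ linear map (so $W_2$ is an $m \times k$ matrix), and the logistic loss is averaged over the $m$ labels. The number of labels $m = 5$ is an arbitrary placeholder.

```python
import torch
import torch.nn as nn

n, d, k, m = 256, 10, 32, 5
X = torch.randn(n, d)
Y = (torch.rand(n, m) > 0.5).float()    # each sample has m independent 0/1 labels

model = nn.Sequential(nn.Linear(d, k), nn.Tanh(), nn.Linear(k, m))  # last layer: W2 is m x k
loss_fn = nn.BCEWithLogitsLoss()        # sigmoid + logistic loss per label, averaged
optimizer = torch.optim.SGD(model.parameters(), lr=1e-1)

for epoch in range(100):
    optimizer.zero_grad()
    loss_fn(model(X), Y).backward()
    optimizer.step()
```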
Tricks
Some tricks to make neural networks perform better:
- Add a bias term: e.g. $\sigma(W_1 x + b_1)$ in the hidden layer.
- Add a skip connection: e.g. $f(x) = W_2\,\sigma(W_1 x + b_1) + b_2 + W_3 x$, so the input also reaches the output directly (both tricks are sketched after this list).
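A minimal sketch of both tricks together, assuming the skip connection takes the form of an extra linear path $W_3 x$ from input to output (one common choice; the names below are illustrative):

```python
import torch
import torch.nn as nn

class SkipNet(nn.Module):
    """f(x) = W2 * tanh(W1 x + b1) + b2 + W3 x  (bias terms + skip connection)."""
    def __init__(self, d, k):
        super().__init__()
        self.hidden = nn.Linear(d, k, bias=True)   # W1 x + b1
        self.out = nn.Linear(k, 1, bias=True)      # W2 h + b2
        self.skip = nn.Linear(d, 1, bias=False)    # W3 x, the skip path

    def forward(self, x):
        return self.out(torch.tanh(self.hidden(x))) + self.skip(x)
```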
Deep Learning
Deep Learning is simply a neural network with more than one hidden layer. Note that the non-linear transformation does not have to be the same function across all layers.
However, more hidden layers mean more non-linear transformations composed together, which can induce the vanishing gradient problem. Some solutions for this are:
- ReLU function, $\operatorname{ReLU}(x) = \max(0, x)$, is inspired by the behavior of neurons in the brain, which only activate when the input exceeds a certain threshold. One drawback of the ReLU function is that it can lead to sparsity in the neural activations (the gradient is zero for negative inputs), which might not always be desirable. An alternative to address this issue is the Leaky ReLU, where $\operatorname{LeakyReLU}(x) = \max(\alpha x, x)$ for some small $\alpha > 0$. This variation allows a small, non-zero gradient when the input is negative, thereby mitigating the problem of inactive neurons.
- Skip connections, as in ResNet (a residual block sketch follows this list).
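A sketch of a ResNet-style residual block with a Leaky ReLU (the slope $\alpha = 0.01$ and the layer width are arbitrary placeholders); the identity skip gives gradients a route around the non-linearity, which is what mitigates vanishing gradients.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """h -> h + g(h): the identity path lets gradients bypass the non-linearity."""
    def __init__(self, width):
        super().__init__()
        self.fc1 = nn.Linear(width, width)
        self.fc2 = nn.Linear(width, width)
        self.act = nn.LeakyReLU(negative_slope=0.01)  # max(0.01 * x, x)

    def forward(self, h):
        return h + self.fc2(self.act(self.fc1(h)))    # skip connection
```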
Some other tricks to improve performance (sketched together after the list) include:
- Data standardization
- Initialize weight parameters close to $0$ (small random values)
- Batch Norm
- Adam Gradient Descent
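A sketch combining the four tricks (toy data, the initialization scale, and the learning rate are my own arbitrary choices): standardize the inputs, initialize weights near $0$, insert batch normalization after the hidden layer, and optimize with Adam instead of plain SGD.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

n, d, k = 256, 10, 32
X = torch.randn(n, d)
y = torch.randn(n, 1)

# Data standardization: zero mean, unit variance per feature.
X = (X - X.mean(dim=0)) / X.std(dim=0)

model = nn.Sequential(
    nn.Linear(d, k),
    nn.BatchNorm1d(k),   # Batch Norm on the hidden layer
    nn.ReLU(),
    nn.Linear(k, 1),
)

# Initialize weight parameters close to 0.
for layer in model:
    if isinstance(layer, nn.Linear):
        nn.init.normal_(layer.weight, std=0.01)
        nn.init.zeros_(layer.bias)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # Adam instead of plain SGD

for epoch in range(100):
    optimizer.zero_grad()
    F.mse_loss(model(X), y).backward()
    optimizer.step()
```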