This is a simple method for weight initialization in deep network learning. The method consists of two steps:

- First, pre-initialize the weights of each convolution or inner-product layer with orthonormal matrices.

- Second, proceed from the first to the final layer, normalizing the variance of the output of each layer to be equal to one.

Experiments with different activation functions (maxout, the ReLU family, tanh) show that the proposed initialization enables learning of very deep nets.

Pseudo-code of LSUV
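
Below is a minimal NumPy sketch of the two steps for a plain fully-connected ReLU network; the layer widths, batch size, tolerance, and iteration cap are illustrative choices, not the paper's settings:

```python
import numpy as np

def orthonormal(shape, rng):
    # Step 1: pre-initialize with a (semi-)orthonormal matrix obtained from
    # the QR decomposition of a Gaussian random matrix.
    rows, cols = shape
    a = rng.standard_normal((max(rows, cols), min(rows, cols)))
    q, r = np.linalg.qr(a)
    q *= np.sign(np.diag(r))  # resolve the sign ambiguity of QR
    return q.T if rows < cols else q

def lsuv_init(weights, x, tol=0.1, max_iter=10):
    # Step 2: proceed from the first to the final layer, rescaling each
    # weight matrix until the variance of that layer's output on batch x
    # is close to one.
    h = x
    for W in weights:
        for _ in range(max_iter):
            var = (h @ W).var()        # variance of the layer's output
            if abs(var - 1.0) < tol:
                break
            W /= np.sqrt(var)          # divide weights by the output std
        h = np.maximum(h @ W, 0.0)     # apply ReLU, feed the next layer
    return weights

rng = np.random.default_rng(0)
dims = [64, 128, 128, 10]              # illustrative layer widths
weights = [orthonormal((dims[i], dims[i + 1]), rng)
           for i in range(len(dims) - 1)]
batch = rng.standard_normal((256, dims[0]))  # a real data batch in practice
weights = lsuv_init(weights, batch)
```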

- **Note**: in most cases, batch normalization performs better when placed after the non-linearity (a sketch contrasting the two placements follows this list).

- An LSUV-initialized network performs as well as a batch-normalized one.

- The paper does not claim that batch normalization can always be replaced by proper initialization, especially on large datasets like ImageNet.
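
As an illustration of the placement point above, a minimal Keras sketch contrasting the two orderings; layer sizes and the input shape are arbitrary:

```python
from tensorflow.keras import layers, models

# Conv -> BN -> non-linearity: the placement from the original BN paper.
bn_before_act = models.Sequential([
    layers.Conv2D(32, 3, padding="same", input_shape=(32, 32, 3)),
    layers.BatchNormalization(),
    layers.Activation("relu"),
])

# Conv -> non-linearity -> BN: the placement the note above reports
# to perform better in most cases.
bn_after_act = models.Sequential([
    layers.Conv2D(32, 3, padding="same", input_shape=(32, 32, 3)),
    layers.Activation("relu"),
    layers.BatchNormalization(),
])
```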

LSUV-keras: https://github.com/ducha-aiki/LSUV-keras

