Deep Net Initialisation
- Initialise hidden layer biases to 0, and output (or reconstruction) biases to the value that would be optimal if all weights were 0 (e.g. the mean target, or the inverse sigmoid of the mean target for sigmoid outputs).
- Initialise weights to $\text{Uniform}(-r,r)$, where \[r=\sqrt{\frac{6}{\text{fan-in}+\text{fan-out}}}\] for $\tanh$ units, and with $r$ 4 times larger for sigmoid units (Glorot and Bengio, AISTATS 2010).
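
The two rules above can be sketched in plain Python as follows. This is a minimal illustration and not a reference implementation: the function names (`glorot_radius`, `init_weights`, `init_hidden_bias`, `init_output_bias`) are hypothetical, weights are stored as a list of lists, and the output bias is computed for a single scalar output unit.

```python
import math
import random


def glorot_radius(fan_in, fan_out, unit="tanh"):
    """Half-width r of the Uniform(-r, r) weight distribution."""
    r = math.sqrt(6.0 / (fan_in + fan_out))
    if unit == "sigmoid":
        r *= 4.0  # sigmoid units use a range 4 times larger
    elif unit != "tanh":
        raise ValueError("unit must be 'tanh' or 'sigmoid'")
    return r


def init_weights(fan_in, fan_out, unit="tanh", rng=None):
    """Return a fan_in x fan_out matrix of draws from Uniform(-r, r)."""
    if rng is None:
        rng = random.Random(0)  # fixed seed (an assumption, for repeatability)
    r = glorot_radius(fan_in, fan_out, unit)
    return [[rng.uniform(-r, r) for _ in range(fan_out)]
            for _ in range(fan_in)]


def init_hidden_bias(n_hidden):
    """Hidden layer biases start at 0."""
    return [0.0] * n_hidden


def init_output_bias(targets, output="linear"):
    """Bias that is optimal when all incoming weights are 0."""
    mean = sum(targets) / len(targets)
    if output == "linear":
        return mean  # mean target
    if output == "sigmoid":
        # Inverse sigmoid (logit) of the mean target; needs 0 < mean < 1.
        return math.log(mean / (1.0 - mean))
    raise ValueError("output must be 'linear' or 'sigmoid'")
```

For example, `init_weights(784, 256, unit="tanh")` would give the weights for a 784-to-256 $\tanh$ layer, and `init_output_bias(y_train, output="sigmoid")` the starting bias for a sigmoid output, where `y_train` is a placeholder for a list of targets in $[0,1]$ whose mean lies strictly between 0 and 1.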