### Effects of Sparse Initialization in Deep Belief Networks

#### Abstract

#### Keywords

#### Full Text:

PDF

#### References

Bengio Y.: Practical Recommendations for Gradient-Based Training of Deep Architectures. In: G. Montavon, G.B. Orr, K.R. Müller, eds, Neural Networks: Tricks of the Trade, Lecture Notes in Computer Science, vol. 7700, pp. 437–478. Springer, Berlin–Heidelberg, 2012.

Bergstra J., Bengio Y.: Random Search for Hyper-parameter Optimization. Journal of Machine Learning Research, vol. 13, pp. 281–305, 2012.

Bridle J.S.: Probabilistic Interpretation of Feedforward Classification Network Outputs, with Relationships to Statistical Pattern Recognition. In: F. Soulié, J. Hérault, eds, Neurocomputing, NATO ASI Series, vol. 68, pp. 227–236. Springer, Berlin–Heidelberg, 1990.

Glorot X., Bengio Y.: Understanding the difficulty of training deep feedforward neural networks. In: Y.W. Teh, M. Titterington, eds, Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (AISTATS) 2010, vol. 9, pp. 249–256. JMLR Workshop and Conference Proceedings, 2010.

Hinton G.E.: Training products of experts by minimizing contrastive divergence. Neural Computation, vol. 14(8), pp. 1771–1800, 2002.

Hinton G.E.: A Practical Guide to Training Restricted Boltzmann Machines. In: G. Montavon, G.B. Orr, K.R. Müller, eds, Neural Networks: Tricks of the Trade, Lecture Notes in Computer Science, vol. 7700, pp. 599–619. Springer, Berlin–Heidelberg, 2012.

Hinton G.E., Salakhutdinov R.R.: Reducing the dimensionality of data with neural networks. Science, vol. 313(5786), pp. 504–507, 2006.

LeCun Y., Bottou L., Bengio Y., Haffner P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE, vol. 86(11), pp. 2278–2324, 1998.

LeCun Y., Huang F.J., Bottou L.: Learning methods for generic object recognition with invariance to pose and lighting. In: Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'04), vol. 2, pp. II-97. IEEE, 2004.

Martens J.: Deep learning via Hessian-free optimization. In: J. Fürnkranz, T. Joachims, eds, Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 735–742. Omnipress, 2010.

Nair V., Hinton G.E.: Rectified Linear Units Improve Restricted Boltzmann Machines. In: J. Fürnkranz, T. Joachims, eds, Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 807–814. Omnipress, 2010.

Nesterov Y.: A method of solving a convex programming problem with convergence rate O(1/k²). Soviet Mathematics Doklady, vol. 27(2), pp. 372–376, 1983.

Polyak B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics, vol. 4(5), pp. 1–17, 1964.

Rumelhart D.E., Hinton G.E., Williams R.J.: Learning representations by back-propagating errors. Nature, vol. 323(6088), pp. 533–536, 1986.

Smolensky P.: Information Processing in Dynamical Systems: Foundations of Harmony Theory. In: D.E. Rumelhart, J.L. McClelland, the PDP Research Group, eds, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 1, pp. 194–281. MIT Press, 1986.

Srivastava N.: Improving neural networks with dropout. Master’s thesis, University of Toronto, 2013.

Srivastava N., Hinton G.E., Krizhevsky A., Sutskever I., Salakhutdinov R.: Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, vol. 15(1), pp. 1929–1958, 2014.

Sutskever I., Martens J., Dahl G., Hinton G.E.: On the importance of initialization and momentum in deep learning. In: S. Dasgupta, D. Mcallester, eds, Proceedings of the 30th International Conference on Machine Learning (ICML-13), vol. 28, pp. 1139–1147. JMLR Workshop and Conference Proceedings, 2013.

DOI: https://doi.org/10.7494/csci.2015.16.4.313
