- 深度学习之模型设计:核心算法与案例实践 (Deep Learning Model Design: Core Algorithms and Case Studies)
- 言有三
References
[1] Hinton G E, Salakhutdinov R R. Reducing the dimensionality of data with neural networks[J]. Science, 2006, 313(5786): 504-507.
[2] Viola P, Jones M. Robust real-time object detection[J]. International Journal of Computer Vision, 2001, 4(34-47): 4.
[3] Hinton G, Deng L, Yu D, et al. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups[J]. IEEE Signal Processing Magazine, 2012, 29(6): 82-97.
[4] Deng L, Yu D. Deep learning: Methods and applications[J]. Foundations and Trends® in Signal Processing, 2014, 7(3-4): 197-387.
[5] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks[C]. Advances in Neural Information Processing Systems, 2012: 1097-1105.
[6] Bengio Y, Ducharme R, Vincent P, et al. A neural probabilistic language model[J]. Journal of Machine Learning Research, 2003, 3: 1137-1155.
[7] Glorot X, Bordes A, Bengio Y. Deep sparse rectifier neural networks[C]. Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, 2011: 315-323.
[8] Ramachandran P, Zoph B, Le Q V. Searching for activation functions[J]. arXiv preprint arXiv:1710.05941, 2017.
[9] Nwankpa C, Ijomah W, Gachagan A, et al. Activation functions: Comparison of trends in practice and research for deep learning[J]. arXiv preprint arXiv:1811.03378, 2018.
[10] Glorot X, Bengio Y. Understanding the difficulty of training deep feedforward neural networks[C]. Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 2010: 249-256.
[11] Kumar S K. On weight initialization in deep neural networks[J]. arXiv preprint arXiv:1704.08863, 2017.
[12] Ioffe S, Szegedy C. Batch normalization: Accelerating deep network training by reducing internal covariate shift[C]. International Conference on Machine Learning, 2015.
[13] Ioffe S. Batch renormalization: Towards reducing minibatch dependence in batch-normalized models[J]. arXiv preprint arXiv:1702.03275, 2017.
[14] Wu Y, He K. Group normalization[J]. arXiv preprint arXiv:1803.08494, 2018.
[15] Luo P, Ren J, Peng Z, et al. Differentiable learning-to-normalize via switchable normalization[J]. arXiv preprint arXiv:1806.10779, 2018.
[16] Bjorck N, Gomes C P, Selman B, et al. Understanding batch normalization[C]. Advances in Neural Information Processing Systems, 2018: 7694-7705.
[17] Santurkar S, Tsipras D, Ilyas A, et al. How does batch normalization help optimization?[C]. Advances in Neural Information Processing Systems, 2018: 2483-2493.
[18] Yu D, Wang H, Chen P, et al. Mixed pooling for convolutional neural networks[C]. International Conference on Rough Sets and Knowledge Technology. Springer, Cham, 2014: 364-375.
[19] Zeiler M D, Fergus R. Stochastic pooling for regularization of deep convolutional neural networks[J]. arXiv preprint arXiv:1301.3557, 2013.
[20] Ruderman A, Rabinowitz N C, Morcos A S, et al. Pooling is neither necessary nor sufficient for appropriate deformation stability in CNNs[J]. arXiv preprint arXiv:1804.04438, 2018.
[21] Duchi J C, Hazan E, Singer Y. Adaptive subgradient methods for online learning and stochastic optimization[J]. Journal of Machine Learning Research, 2011, 12: 2121-2159.
[22] Zeiler M D. ADADELTA: An adaptive learning rate method[J]. arXiv preprint arXiv:1212.5701, 2012.
[23] Kingma D P, Ba J. Adam: A method for stochastic optimization[J]. arXiv preprint arXiv:1412.6980, 2014.
[24] Reddi S J, Kale S, Kumar S. On the convergence of Adam and beyond[J]. arXiv preprint arXiv:1904.09237, 2019.
[25] Shazeer N, Stern M. Adafactor: Adaptive learning rates with sublinear memory cost[J]. arXiv preprint arXiv:1804.04235, 2018.
[26] Luo L, Xiong Y, Liu Y, et al. Adaptive gradient methods with dynamic bound of learning rate[J]. arXiv preprint arXiv:1902.09843, 2019.
[27] Srivastava N, Hinton G E, Krizhevsky A, et al. Dropout: A simple way to prevent neural networks from overfitting[J]. Journal of Machine Learning Research, 2014, 15(1): 1929-1958.
[28] Goodfellow I, Warde-Farley D, Mirza M, et al. Maxout networks[C]. International Conference on Machine Learning, 2013.