
Bibliography

J. L. Ba, J. R. Kiros, and G. E. Hinton. Layer Normalization. CoRR, abs/1607.06450, 2016. [pdf].

R. Balestriero, M. Ibrahim, V. Sobal, et al. A Cookbook of Self-Supervised Learning. CoRR, abs/2304.12210, 2023. [pdf].

A. Baydin, B. Pearlmutter, A. Radul, and J. Siskind. Automatic differentiation in machine learning: a survey. CoRR, abs/1502.05767, 2015. [pdf].

M. Belkin, D. Hsu, S. Ma, and S. Mandal. Reconciling modern machine learning and the bias-variance trade-off. CoRR, abs/1812.11118, 2018. [pdf].

R. Bommasani, D. Hudson, E. Adeli, et al. On the Opportunities and Risks of Foundation Models. CoRR, abs/2108.07258, 2021. [pdf].

T. Brown, B. Mann, N. Ryder, et al. Language Models are Few-Shot Learners. CoRR, abs/2005.14165, 2020. [pdf].

S. Bubeck, V. Chandrasekaran, R. Eldan, et al. Sparks of Artificial General Intelligence: Early experiments with GPT-4. CoRR, abs/2303.12712, 2023. [pdf].

T. Chen, B. Xu, C. Zhang, and C. Guestrin. Training Deep Nets with Sublinear Memory Cost. CoRR, abs/1604.06174, 2016. [pdf].

K. Cho, B. van Merrienboer, Ç. Gülçehre, et al. Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. CoRR, abs/1406.1078, 2014. [pdf].

A. Chowdhery, S. Narang, J. Devlin, et al. PaLM: Scaling Language Modeling with Pathways. CoRR, abs/2204.02311, 2022. [pdf].

G. Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems, 2(4):303–314, December 1989. [pdf].

J. Devlin, M. Chang, K. Lee, and K. Toutanova. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. CoRR, abs/1810.04805, 2018. [pdf].

A. Dosovitskiy, L. Beyer, A. Kolesnikov, et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. CoRR, abs/2010.11929, 2020. [pdf].

K. Fukushima. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, 36(4):193–202, April 1980. [pdf].

Y. Gal and Z. Ghahramani. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning. CoRR, abs/1506.02142, 2015. [pdf].

X. Glorot and Y. Bengio. Understanding the difficulty of training deep feedforward neural networks. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2010. [pdf].

X. Glorot, A. Bordes, and Y. Bengio. Deep Sparse Rectifier Neural Networks. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2011. [pdf].

A. Gomez, M. Ren, R. Urtasun, and R. Grosse. The Reversible Residual Network: Backpropagation Without Storing Activations. CoRR, abs/1707.04585, 2017. [pdf].

I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, et al. Generative Adversarial Networks. CoRR, abs/1406.2661, 2014. [pdf].

K. He, X. Zhang, S. Ren, and J. Sun. Deep Residual Learning for Image Recognition. CoRR, abs/1512.03385, 2015. [pdf].

D. Hendrycks and K. Gimpel. Gaussian Error Linear Units (GELUs). CoRR, abs/1606.08415, 2016. [pdf].

D. Hendrycks, K. Zhao, S. Basart, et al. Natural Adversarial Examples. CoRR, abs/1907.07174, 2019. [pdf].

J. Ho, A. Jain, and P. Abbeel. Denoising Diffusion Probabilistic Models. CoRR, abs/2006.11239, 2020. [pdf].

S. Hochreiter and J. Schmidhuber. Long Short-Term Memory. Neural Computation, 9(8):1735–1780, 1997. [pdf].

S. Ioffe and C. Szegedy. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In International Conference on Machine Learning (ICML), 2015. [pdf].

J. Kaplan, S. McCandlish, T. Henighan, et al. Scaling Laws for Neural Language Models. CoRR, abs/2001.08361, 2020. [pdf].

D. Kingma and J. Ba. Adam: A Method for Stochastic Optimization. CoRR, abs/1412.6980, 2014. [pdf].

D. P. Kingma and M. Welling. Auto-Encoding Variational Bayes. CoRR, abs/1312.6114, 2013. [pdf].

A. Krizhevsky, I. Sutskever, and G. Hinton. ImageNet Classification with Deep Convolutional Neural Networks. In Neural Information Processing Systems (NIPS), 2012. [pdf].

Y. LeCun, B. Boser, J. S. Denker, et al. Backpropagation applied to handwritten zip code recognition. Neural Computation, 1(4):541–551, 1989. [pdf].

Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998. [pdf].

W. Liu, D. Anguelov, D. Erhan, et al. SSD: Single Shot MultiBox Detector. CoRR, abs/1512.02325, 2015. [pdf].

J. Long, E. Shelhamer, and T. Darrell. Fully Convolutional Networks for Semantic Segmentation. CoRR, abs/1411.4038, 2014. [pdf].

A. L. Maas, A. Y. Hannun, and A. Y. Ng. Rectifier nonlinearities improve neural network acoustic models. In Proceedings of the ICML Workshop on Deep Learning for Audio, Speech and Language Processing, 2013. [pdf].

V. Mnih, K. Kavukcuoglu, D. Silver, et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, February 2015. [pdf].

A. Nichol, P. Dhariwal, A. Ramesh, et al. GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models. CoRR, abs/2112.10741, 2021. [pdf].

A. Radford, J. Kim, C. Hallacy, et al. Learning Transferable Visual Models From Natural Language Supervision. CoRR, abs/2103.00020, 2021. [pdf].

A. Radford, J. Kim, T. Xu, et al. Robust Speech Recognition via Large-Scale Weak Supervision. CoRR, abs/2212.04356, 2022. [pdf].

A. Radford, K. Narasimhan, T. Salimans, and I. Sutskever. Improving Language Understanding by Generative Pre-Training, 2018. [pdf].

A. Radford, J. Wu, R. Child, et al. Language Models are Unsupervised Multitask Learners, 2019. [pdf].

O. Ronneberger, P. Fischer, and T. Brox. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Medical Image Computing and Computer-Assisted Intervention, 2015. [pdf].

F. Scarselli, M. Gori, A. C. Tsoi, et al. The Graph Neural Network Model. IEEE Transactions on Neural Networks (TNN), 20(1):61–80, 2009. [pdf].

R. Sennrich, B. Haddow, and A. Birch. Neural Machine Translation of Rare Words with Subword Units. CoRR, abs/1508.07909, 2015. [pdf].

J. Sevilla, L. Heim, A. Ho, et al. Compute Trends Across Three Eras of Machine Learning. CoRR, abs/2202.05924, 2022. [pdf].

J. Sevilla, P. Villalobos, J. F. Cerón, et al. Parameter, Compute and Data Trends in Machine Learning, May 2023. [web].

K. Simonyan and A. Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition. CoRR, abs/1409.1556, 2014. [pdf].

N. Srivastava, G. Hinton, A. Krizhevsky, et al. Dropout: A Simple Way to Prevent Neural Networks from Overfitting. Journal of Machine Learning Research (JMLR), 15:1929–1958, 2014. [pdf].

A. Vaswani, N. Shazeer, N. Parmar, et al. Attention Is All You Need. CoRR, abs/1706.03762, 2017. [pdf].

J. Zbontar, L. Jing, I. Misra, et al. Barlow Twins: Self-Supervised Learning via Redundancy Reduction. CoRR, abs/2103.03230, 2021. [pdf].

M. D. Zeiler and R. Fergus. Visualizing and Understanding Convolutional Networks. In European Conference on Computer Vision (ECCV), 2014. [pdf].

H. Zhao, J. Shi, X. Qi, et al. Pyramid Scene Parsing Network. CoRR, abs/1612.01105, 2016. [pdf].

Index