Foreword

The current period of progress in artificial intelligence was triggered when Krizhevsky et al. [2012] showed that an artificial neural network with a simple structure, which had been known for more than twenty years [LeCun et al., 1989], could beat complex state-of-the-art image recognition methods by a huge margin, simply by being a hundred times larger, and trained on a data set similarly scaled up.

This breakthrough was made possible by Graphics Processing Units (GPUs), mass-market, highly parallel computing devices developed for real-time image synthesis and repurposed for artificial neural networks.

Since then, under the umbrella term of “deep learning,” innovations in the structures of these networks, the strategies to train them, and dedicated hardware have allowed for an exponential increase in both their size and the quantity of training data they take advantage of [Sevilla et al., 2022]. This has resulted in a wave of successful applications across technical domains, from computer vision and robotics to speech and natural language processing.

Although the bulk of deep learning is not particularly difficult to understand, it combines diverse components, which makes it complicated to learn. It involves multiple branches of mathematics, such as calculus, probability, optimization, linear algebra, and signal processing, and it is also deeply anchored in computer science, programming, algorithms, and high-performance computing.

Instead of trying to be exhaustive, this little book is limited to the background and tools necessary to understand a few important models.

If you did not get this book from its official URL, https://fleuret.org/public/lbdl.pdf, please do so, so that I can estimate the number of readers.

François Fleuret, May 21, 2023