Deep Learning with TensorFlow 2 and Keras, Second Edition is a concise yet thorough introduction to modern neural networks, artificial intelligence, and deep learning technologies designed especially for software engineers and data scientists. The book is the natural follow-up to the books Deep Learning with Keras [1] and TensorFlow 1.x Deep Learning Cookbook [2] previously written by the same authors.


This book provides a very detailed panorama of the evolution of learning technologies during the past six years. The book presents dozens of working deep neural networks coded in Python using TensorFlow 2.0, a modular network library based on Keras-like [1] APIs.

You are introduced step-by-step to supervised learning algorithms such as simple linear regression, classical multilayer perceptrons, and more sophisticated deep convolutional networks and generative adversarial networks. In addition, the book covers unsupervised learning algorithms such as autoencoders and generative networks. Recurrent networks and Long Short-Term Memory (LSTM) networks are also explained in detail. The book also includes a comprehensive introduction to deep reinforcement learning and it covers deep learning accelerators (GPUs and TPUs), cloud development, and multi-environment deployment on your desktop, on the cloud, on mobile/IoT devices, and on your browser.

Practical applications include code for text classification into predefined categories, syntactic analysis, sentiment analysis, synthetic generation of text, and parts-of-speech tagging. Image processing is also explored, with recognition of handwritten digit images, classification of images into different categories, and advanced object recognition with related image annotations.

Sound analysis comprises the recognition of discrete speech from multiple speakers. Generation of images using autoencoders and GANs is also covered. Reinforcement learning is used to build a deep Q-learning network capable of learning autonomously. Experiments are the essence of the book. Each net is augmented by multiple variants that progressively improve the learning performance by changing the input parameters, the shape of the network, the loss functions, and the optimization algorithms. Several comparisons between training on CPUs, GPUs, and TPUs are also provided. The book introduces you to the new field of AutoML, where deep learning models are used to learn how to build deep learning models efficiently and automatically. One advanced chapter is devoted to the mathematical foundation behind machine learning.

Machine learning, artificial intelligence, and the deep learning Cambrian explosion

Artificial intelligence (AI) lays the ground for everything this book discusses. Machine learning (ML) is a branch of AI, and deep learning (DL) is in turn a subset of ML. This section will briefly discuss these three concepts, which you will regularly encounter throughout the rest of this book.

AI denotes any activity where machines mimic intelligent behaviors typically shown by humans. More formally, it is a research field in which machines aim to replicate cognitive capabilities such as learning behaviors, proactive interaction with the environment, inference and deduction, computer vision, speech recognition, problem solving, knowledge representation, and perception. AI builds on elements of computer science, mathematics, and statistics, as well as psychology and other sciences studying human behaviors. There are multiple strategies for building AI. During the 1970s and 1980s, 'expert' systems became extremely popular. The goal of these systems was to solve complex problems by representing the knowledge with a large number of manually defined if–then rules. This approach worked for small problems in very specific domains, but it could not scale up to larger problems and multiple domains. Later, AI focused more and more on statistical methods, which are part of ML.

ML is a subdiscipline of AI that focuses on teaching computers how to learn without the need to be programmed for specific tasks. The key idea behind ML is that it is possible to create algorithms that learn from, and make predictions on, data. There are three different broad categories of ML:

  • Supervised learning, in which the machine is presented with input data and a desired output, and the goal is to learn from those training examples in such a way that meaningful predictions can be made for data that the machine has never observed before.
  • Unsupervised learning, in which the machine is presented with input data only, and the machine has to subsequently find some meaningful structure by itself, with no external supervision or input.
  • Reinforcement learning, in which the machine acts as an agent, interacting with the environment. The machine is provided with "rewards" for behaving in a desired manner, and "penalties" for behaving in an undesired manner. The machine attempts to maximize rewards by learning to develop its behavior accordingly.
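The supervised case in the list above can be made concrete with a very small sketch. It is not taken from the book's code: the helper names `fit_line` and `predict` and the toy data are assumptions chosen for illustration, and the model is a plain least-squares line rather than a neural network. The point is only to show learning from input/output pairs and then predicting for an input the machine has never observed.

```python
# Toy supervised learning: learn y = slope * x + intercept from labeled
# examples, then predict for an input that was not in the training set.
# All names and data here are hypothetical, for illustration only.

def fit_line(xs, ys):
    """Ordinary least squares for a single input feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned line to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Training examples: inputs paired with the desired outputs (here y = 2x + 1).
train_x = [0.0, 1.0, 2.0, 3.0]
train_y = [1.0, 3.0, 5.0, 7.0]

model = fit_line(train_x, train_y)
unseen_prediction = predict(model, 10.0)  # 10.0 never appeared in training
print(unseen_prediction)
```

An unsupervised method would receive only `train_x` and would have to find structure in it without `train_y`.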

DL took the world by storm in 2012. During that year, the ImageNet 2012 challenge [3] was launched with the goal of predicting the content of photographs using a subset of a large hand-labeled dataset. A deep learning model named AlexNet [4] achieved a top-5 error rate of 15.3%, a significant improvement with respect to previous state-of-the-art results. According to The Economist [5], "Suddenly people started to pay attention, not just within the AI community but across the technology industry as a whole." Since 2012, we have seen constant progress [5] (see Figure 1), with several models classifying ImageNet photographs with an error rate of less than 2%, better than the estimated human error rate of 5.1%:

Figure 1: Top 5 accuracy achieved by different deep learning models on ImageNet 2012
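For readers unfamiliar with the metric, a top-5 error counts a prediction as wrong only when the true label is not among the model's five highest-scoring classes. The following is a minimal sketch of that computation; the function name `top_k_error` and the tiny score table are assumptions made for illustration, not the official ImageNet evaluation code.

```python
def top_k_error(scores_per_example, true_labels, k=5):
    """Fraction of examples whose true label is not among the k best-scored classes."""
    misses = 0
    for scores, label in zip(scores_per_example, true_labels):
        # Class indices sorted from highest score to lowest.
        ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
        if label not in ranked[:k]:
            misses += 1
    return misses / len(true_labels)


# Hypothetical scores for three examples over six classes.
scores = [
    [0.05, 0.08, 0.40, 0.20, 0.15, 0.12],  # true class 2 is ranked first
    [0.30, 0.25, 0.20, 0.15, 0.09, 0.01],  # true class 5 is ranked last
    [0.10, 0.50, 0.05, 0.20, 0.11, 0.04],  # true class 3 is ranked second
]
labels = [2, 5, 3]

top5 = top_k_error(scores, labels, k=5)  # only the second example is a miss
top1 = top_k_error(scores, labels, k=1)  # the second and third are misses
print(top5, top1)
```

The top-5 error is never larger than the top-1 error on the same predictions, which is why the two numbers should not be compared with each other.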

That was only the beginning. Today, DL techniques are successfully applied in heterogeneous domains including, but not limited to: healthcare, environment, green energy, computer vision, text analysis, multimedia, finance, retail, gaming, simulation, industry, robotics, and self-driving cars. In each of these domains, DL techniques can solve problems with a level of accuracy that was not possible using previous methods.

It is worth noting that interest in DL is also increasing. According to the State of Deep Learning H2 2018 Review [9], "Every 20 minutes, a new ML paper is born. The growth rate of machine learning papers has been around 3.5% a month [..] around a 50% growth rate annually." During the past three years, it has seemed as though we are living through a Cambrian explosion for DL, with the number of articles on arXiv growing faster than Moore's Law (see Figure 2). Still, according to the review, this "gives you a sense that people believe that this is where the future value in computing is going to come from":

Figure 2: ML papers on arXiv appear to be growing faster than Moore's Law (source:

arXiv is a repository of electronic preprints approved for posting after moderation, but not full peer review.
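The quoted monthly and annual growth rates are consistent with each other once compounding is taken into account, as the short calculation below shows. The comparison with Moore's Law rests on an assumption that is not stated in the review: Moore's Law is read here as a doubling every 24 months.

```python
# Compound a 3.5% monthly growth rate over 12 months.
monthly_growth = 0.035
annual_growth = (1 + monthly_growth) ** 12 - 1

# Assumption: Moore's Law interpreted as a doubling every 24 months,
# which corresponds to a factor of 2 ** (12 / 24) per year.
moore_annual_growth = 2 ** (12 / 24) - 1

print(f"ML papers, annual growth:  {annual_growth:.1%}")
print(f"Moore's Law, annual growth: {moore_annual_growth:.1%}")
```

Under these assumptions the paper count grows by roughly half each year, while a two-year doubling corresponds to a little over 40% per year.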

The complexity of deep learning models is also increasing. ResNet-50 is an image recognition model (see Chapters 4 and 5) with about 26 million parameters. Each parameter is a weight that is adjusted during training to tune the model. Transformers, GPT-1, BERT, and GPT-2 [7] are natural language processing models (see Chapter 8, Recurrent Neural Networks) able to perform a variety of tasks on text. These models progressively grew from 340 million to 1.5 billion parameters. Recently, Nvidia claimed that it has been able to train the largest-known model, with 8.3 billion parameters, in just 53 minutes. This training allowed Nvidia to build one of the most powerful models to process textual information.

Figure 3: Growth in number of parameters for various deep learning models
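To give a feel for where parameter counts come from, the sketch below counts the weights and biases of a small fully connected network. The helper `dense_parameter_count` and the layer sizes are assumptions chosen for illustration; the models named above use far more elaborate architectures, so this shows only the basic bookkeeping.

```python
def dense_parameter_count(layer_sizes):
    """Count weights and biases in a fully connected network.

    layer_sizes lists the number of units per layer, input first.
    Each pair of consecutive layers contributes n_in * n_out weights
    plus n_out biases.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out
    return total


# Hypothetical network: 784 inputs, two hidden layers of 128 units, 10 outputs.
count = dense_parameter_count([784, 128, 128, 10])
print(count)
```

Even this toy network has more than a hundred thousand parameters, which helps put the millions and billions quoted above in perspective.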

Besides that, computational capacity is significantly increasing. GPUs and TPUs (Chapter 16, Tensor Processing Unit) are deep learning accelerators that have made it possible to train large models in a very short amount of time. TPU3s, announced in May 2018, are about twice as powerful (360 teraflops) as the TPU2s announced in May 2017. A full TPU3 pod can deliver more than 100 petaflops of machine learning performance, while TPU2 pods can get to 11.5 petaflops of performance.

An improvement of about 10x per pod (see Figure 4) was achieved in only one year, which allows faster training:

Figure 4: TPU accelerators performance in petaflops

However, DL's growth is not only in terms of better accuracy, more research papers, larger models, and faster accelerators. There are additional trends that have been observed over the last four years.

First, the availability of flexible programming frameworks such as Keras [1], TensorFlow [2], and PyTorch [8]. These frameworks have proliferated within the ML and DL community and have provided some very impressive results, as we'll see throughout this book. According to the Kaggle State of the Machine Learning and Data Science Survey 2019, based on responses from 19,717 Kaggle members, Keras and TensorFlow are clearly the most popular choices (see Figure 5). TensorFlow 2.0 is the framework covered in this book. It aims to take the best of both worlds from the great features found in Keras and TensorFlow 1.x:

Figure 5: Adoption of deep learning frameworks

Second, the increasing possibility of using managed services in the cloud (see Chapter 12, TensorFlow and Cloud) with accelerators (Chapter 16, Tensor Processing Unit). This allows data scientists to focus on ML problems with no need to manage the infrastructural overhead.

Third, the increasing ability to deploy models on more heterogeneous systems: mobile devices, Internet of Things (IoT) devices, and even the browsers normally used on your desktop and laptop (see Chapter 13, TensorFlow for Mobile and IoT and TensorFlow.js).

Fourth, the increased understanding of how to use more and more sophisticated DL architectures such as Dense Networks (Chapter 1, Neural Network Foundations with TensorFlow 2.0), Convolutional Networks (Chapter 4, Convolutional Neural Networks, and Chapter 5, Advanced Convolutional Neural Networks), Generative Adversarial Networks (Chapter 6, Generative Adversarial Networks), Word Embeddings (Chapter 7, Word Embeddings), Recurrent Networks (Chapter 8, Recurrent Neural Networks), Autoencoders (Chapter 9, Autoencoders), and advanced techniques such as Reinforcement Learning (Chapter 11, Reinforcement Learning).

Fifth, the advent of new AutoML techniques (Chapter 14, An Introduction to AutoML) that can enable domain experts who are unfamiliar with ML technologies to use ML techniques easily and effectively. AutoML made it possible to reduce the burden of finding the right model for specific application domains, spending time on fine-tuning the models, and spending time in identifying – given an application problem – the right set of features to use as input to ML models.

The above five trends culminated in 2019 when Yoshua Bengio, Geoffrey Hinton, and Yann LeCun (three of the fathers of deep learning) won the Turing Award "for conceptual and engineering breakthroughs that have made deep neural networks a critical component of computing." The ACM A.M. Turing Award is an annual prize given to an individual selected for contributions "of lasting and major technical importance to the computer field." Both quotes are taken from the ACM website. Many consider this award to be the Nobel Prize of computer science.

Looking back at the previous eight years, it is fascinating and exciting to see the extent of the contributions that DL has made to science and industry. There is no reason to believe that the next eight years will see any less contribution; indeed, as the field of DL continues to advance, we anticipate that we'll see even more exciting and fascinating contributions provided by DL.

The intent of this book is to cover all of the above five trends, and to introduce you to the magic of deep learning. We will start with simple models and will progressively introduce increasingly sophisticated ones. The approach will always be hands-on, with a healthy dose of code to work with.

Who this book is for

If you are a data scientist with experience in ML or an AI programmer with some exposure to neural networks, you will find this book a useful entry point to DL with TensorFlow 2.0. If you are a software engineer with a growing interest in the DL tsunami, you will find this book a foundational platform to broaden your knowledge of the topic. A basic knowledge of Python is required for this book.

What this book covers

The intent of this book is to discuss the TensorFlow 2.0 features and libraries, to present an overview of Supervised and Unsupervised Machine learning models, and to provide a comprehensive analysis of Deep Learning and Machine Learning models. Practical usage examples for Cloud, Mobile, and large production environments are provided throughout.

Chapter 1, Neural Network Foundations with TensorFlow 2.0, this chapter will provide a step-by-step introduction to neural networks. You will learn how to use tf.keras layers in TensorFlow 2 to build simple neural network models. Perceptron, Multi-layer Perceptrons, Activation functions, and Dense Networks will be discussed. Finally, the chapter provides an intuitive introduction to backpropagation.

Chapter 2, TensorFlow 1.x and 2.x, this chapter will compare TensorFlow 1.x and TensorFlow 2.0 programming models. You will learn how to use TensorFlow 1.x lower-level computational graph APIs, and how to use tf.keras higher-level APIs. New functionalities such as eager computation, AutoGraph, tf.data.Dataset, and distributed training will be covered. Brief comparisons between tf.keras and Estimators and between tf.keras and Keras will be provided.

Chapter 3, Regression, this chapter will focus on the most popular ML technique: regression. You will learn how to use TensorFlow 2.0 estimators to build simple and multiple regression models. You will learn to use logistic regression to solve a multi-class classification problem.

Chapter 4, Convolutional Neural Networks, this chapter will introduce Convolutional Neural Networks (CNNs) and their applications to image processing. You will learn how to use TensorFlow 2.0 to build simple CNNs to recognize handwritten characters in the MNIST dataset, and how to classify CIFAR images. Finally, you will understand how to use pretrained networks such as VGG16 and Inception.

Chapter 5, Advanced Convolutional Neural Networks, this chapter discusses advanced applications of CNNs to image, video, audio, and text processing. Examples of image processing (Transfer Learning, DeepDream), audio processing (WaveNet), and text processing (Sentiment Analysis, Q&A) will be discussed in detail.

Chapter 6, Generative Adversarial Networks, this chapter will focus on the recently discovered Generative Adversarial Networks (GANs). We will start with the first proposed GAN model and use it to forge MNIST characters. The chapter will use deep convolutional GANs to create celebrity images. The chapter discusses the various GAN architectures like SRGAN, InfoGAN, and CycleGAN. The chapter covers a range of cool GAN applications. Finally, the chapter concludes with a TensorFlow 2.0 implementation of CycleGAN to convert winter-summer images.

Chapter 7, Word Embeddings, this chapter will describe what word embeddings are, with specific reference to two traditional popular embeddings: Word2vec and GloVe. It will cover the core ideas behind these two embeddings and how to generate them from your own corpus, as well as how to use them in your own networks for Natural Language Processing (NLP) applications. The chapter will then cover various extensions to the basic embedding approach, such as using character trigrams instead of words (fastText), retaining word context by replacing static embeddings with a neural network (ELMo, Google Universal Sentence Encoder), sentence embeddings (InferSent, SkipThoughts), and using pretrained language models for embeddings (ULMFiT, BERT).

Chapter 8, Recurrent Neural Networks, this chapter describes the basic architecture of Recurrent Neural Networks (RNNs), and how they are well suited for sequence learning tasks such as those found in NLP. It will cover various types of RNN, LSTM, Gated Recurrent Unit (GRU), Peephole LSTM, and bidirectional LSTM. It will go into more depth as to how an RNN can be used as a language model. It will then cover the seq2seq model, a type of RNN-based encoder-decoder architecture originally used in machine translation. It will then cover Attention mechanisms as a way of enhancing the performance of seq2seq architectures, and finally will cover the Transformer architecture (BERT, GPT-2), which is based on the "Attention Is All You Need" paper.

Chapter 9, Autoencoders, this chapter will describe autoencoders, a class of neural networks that attempt to recreate the input as its target. It will cover different varieties of autoencoders like sparse autoencoders, convolutional autoencoders, and denoising autoencoders. The chapter will train a denoising autoencoder to remove noise from input images. It will demonstrate how autoencoders can be used to create MNIST digits. Finally, it will also cover the steps involved in building an LSTM autoencoder to generate sentence vectors.

Chapter 10, Unsupervised Learning, this chapter delves into unsupervised learning models. It will cover techniques required for clustering and dimensionality reduction like PCA, k-means, and self-organizing maps. It will go into the details of Boltzmann Machines and their implementation using TensorFlow. The concepts covered will be extended to build Restricted Boltzmann Machines (RBMs).

Chapter 11, Reinforcement Learning, this chapter will focus on reinforcement learning. It will start with the Q-learning algorithm. Starting with the Bellman equation, the chapter will cover concepts like discounted rewards, exploration and exploitation, and discount factors. It will explain policy-based and model-based reinforcement learning. Finally, a Deep Q-learning Network (DQN) will be built to play an Atari game.

Chapter 12, TensorFlow and Cloud, this chapter discusses the cloud environment and how to utilize it for training and deploying your model. It will cover the steps needed to set up Amazon Web Services (AWS) for DL. The steps needed to set up Google Cloud Platform for DL applications will also be covered. It will also cover how to set up Microsoft Azure for DL applications. The chapter will include various cloud services that allow you to run the Jupyter Notebook directly on the cloud. Finally, the chapter will conclude with an introduction to TensorFlow Extended.

Chapter 13, TensorFlow for Mobile and IoT and TensorFlow.js, this chapter focuses on developing deep learning based applications for the web, mobile devices and IoT. The chapter discusses TensorFlow Lite and explores how it can be used to deploy models on Android devices. The chapter also discusses in detail Federated learning for distributed learning across thousands of mobile devices. The chapter finally introduces TensorFlow.js and how it can be used with vanilla JavaScript or Node.js to develop Web applications.

Chapter 14, An Introduction to AutoML, this chapter introduces you to the exciting field of AutoML. It talks about automatic data preparation, automatic feature engineering, and automatic model generation. The chapter also introduces AutoKeras and Google Cloud Platform AutoML with its multiple solutions for Table, Vision, Text, Translation, and for Video processing.

Chapter 15, The Math behind Deep Learning, this chapter, as the title implies, discusses the math behind deep learning. In the chapter, we'll get "under the hood" and see what's going on when we perform deep learning. The chapter begins with a brief history regarding the origins of deep learning programming and backpropagation. Next, it introduces some mathematical tools and derivations, which help us in understanding the concepts to be covered. The remainder of the chapter details backpropagation and some of its applications within CNNs and RNNs.

Chapter 16, Tensor Processing Unit, this chapter introduces the Tensor Processing Unit (TPU), a special chip developed at Google for ultra-fast execution of neural network mathematical operations. In this chapter we are going to compare CPUs and GPUs with the three generations of TPUs and with Edge TPUs. The chapter will include code examples of using TPUs.

What you need for this book

To be able to smoothly follow through the chapters, you will need the following pieces of software:

  • TensorFlow 2.0 or higher
  • Matplotlib 3.0 or higher
  • Scikit-learn 0.18.1 or higher
  • NumPy 1.15 or higher

The hardware specifications are as follows:

  • Either 32-bit or 64-bit architecture
  • 2+ GHz CPU
  • 4 GB RAM
  • At least 10 GB of hard disk space available

