Object detection using TensorFlow and Python

Dec 30, 2020

CNN: Convolutional Neural Network

A CNN is a deep learning algorithm that takes an image as input, assigns learnable weights and biases to various aspects of the image, and uses them to differentiate one image from another.

The architecture is loosely inspired by the connectivity pattern of neurons in the human brain.

Kernel

In a convolutional neural network, the kernel is a filter that is used to extract features from the images. The kernel is a small matrix that moves over the input data, computes the dot product with each sub-region of the input, and produces a matrix of those dot products as output. The kernel moves across the input by the stride value. If the stride value is 2, the kernel moves by 2 columns of pixels at a time in the input matrix. In short, the kernel is used to extract features from the image, starting with low-level features like edges.
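
To make the sliding dot product and the stride concrete, here is a minimal sketch in plain Python (no libraries). The function name conv2d, the tiny 4x4 image, and the hand-picked edge kernel are all made up for illustration; a real Conv2D layer learns its kernel values during training. Like most deep learning libraries, the sketch does not flip the kernel.

```python
def conv2d(image, kernel, stride=1):
    """Slide `kernel` over `image` (both lists of lists), with no padding."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = (len(image) - k_h) // stride + 1
    out_w = (len(image[0]) - k_w) // stride + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            top, left = i * stride, j * stride
            # Dot product of the kernel with the sub-region under it.
            total = sum(
                image[top + a][left + b] * kernel[a][b]
                for a in range(k_h)
                for b in range(k_w)
            )
            row.append(total)
        output.append(row)
    return output


# A 4x4 image: dark (0) on the left half, bright (1) on the right half.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# A hand-picked 2x2 kernel that responds to vertical edges.
kernel = [
    [-1, 1],
    [-1, 1],
]

edges_stride1 = conv2d(image, kernel, stride=1)
edges_stride2 = conv2d(image, kernel, stride=2)
print(edges_stride1)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
print(edges_stride2)  # [[0, 0], [0, 0]]
```

With a stride of 1, the 2s mark the column where dark meets bright. With a stride of 2, the kernel only lands on the two uniform halves and misses the edge, which shows how a larger stride trades detail for a smaller output.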

Conv2D

Conv2D is a 2D convolution layer. It creates a convolution kernel that is convolved with the layer's input to produce a tensor of outputs.

Max pooling

Max pooling is a sample-based discretization process. The objective is to down-sample an input representation (image, hidden-layer output matrix, etc.), reducing its dimensionality and allowing for assumptions to be made about features contained in the sub-regions binned.
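
As a rough illustration of the down-sampling, the sketch below applies 2x2 max pooling to a small matrix in plain Python. The function name max_pool2d and the sample values are assumptions made for this example; it assumes non-overlapping windows, which is how a (2,2) pooling layer behaves by default.

```python
def max_pool2d(feature_map, pool_size=2):
    """Non-overlapping max pooling: the window moves by pool_size each step."""
    out_h = len(feature_map) // pool_size
    out_w = len(feature_map[0]) // pool_size
    pooled = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Collect the values inside the current window and keep the largest.
            window = [
                feature_map[i * pool_size + a][j * pool_size + b]
                for a in range(pool_size)
                for b in range(pool_size)
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled


feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 5],
    [1, 1, 3, 7],
]

pooled = max_pool2d(feature_map)
print(pooled)  # [[6, 2], [2, 9]]
```

The 4x4 input becomes a 2x2 output: each value is the strongest response in its 2x2 sub-region, so the height and width are halved while the most prominent features are kept.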

Activation functions

Activation functions are mathematical functions that determine the output of each neuron in the network. An activation function is attached to each neuron and decides whether that neuron should fire or not.

We will be using ReLU (rectified linear unit). Mathematically, it is defined as

y = max(0, x)
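
The definition translates directly into code. This is only a small plain-Python sketch with made-up input values; in the model itself, Keras applies ReLU for us through activation="relu".

```python
def relu(x):
    """Rectified linear unit: pass positive values through, clamp the rest to 0."""
    return max(0.0, x)


inputs = [-2.0, -0.5, 0.0, 0.5, 2.0]
outputs = [relu(x) for x in inputs]
print(outputs)  # [0.0, 0.0, 0.0, 0.5, 2.0]
```

Negative inputs are clamped to zero (the neuron does not fire), while positive inputs pass through unchanged.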

Optimizer

Optimizers tie together the loss function and the model parameters: they update the weights and biases so that the loss is minimized. We will use the Adam optimizer.
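
To give a feel for what "updating the weights to minimize the loss" means, here is a minimal sketch of the Adam update rule in plain Python, applied to a single weight. The function name adam_step and the toy loss (w - 3)^2 are made up for illustration, and the default hyperparameters are assumed to match commonly used Adam defaults. In the tutorial itself, optimizer='adam' does all of this for us.

```python
import math


def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """One Adam update for a single weight `w` at step `t` (starting at 1)."""
    # Running averages of the gradient and of the squared gradient.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    # Bias correction: both averages start at 0 and are too small early on.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Move against the gradient, scaled by its recent magnitude.
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v


# Toy problem: minimize loss(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 501):
    grad = 2 * (w - 3.0)
    w, m, v = adam_step(w, grad, m, v, t, lr=0.1)

print(round(w, 2))
```

Starting from w = 0, the weight moves toward 3, the value that minimizes the toy loss. A real model does the same thing for every weight and bias at once, with gradients computed by backpropagation.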

Dataset

The CIFAR-10 dataset consists of 60000 32x32 color images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.

Imports

import tensorflow as tf

from tensorflow.keras import datasets, layers, models
import matplotlib.pyplot as plt

Loading the dataset

(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()

# Normalize pixel values to be between 0 and 1
train_images, test_images = train_images / 255.0, test_images / 255.0

We load the dataset, which comes already split into (train_images, train_labels) and (test_images, test_labels). We divide the images by 255 to normalize the pixel values to the range 0 to 1.

Then we initialize a list containing the class names.

class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

Plotting the images

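# Show the first 25 training images in a 5x5 grid with their class names.
plt.figure(figsize=(10,10))
for i in range(25):
  plt.subplot(5,5,i+1)
  plt.xticks([])
  plt.yticks([])
  plt.grid(False)
  plt.imshow(train_images[i],cmap=plt.cm.binary)
  # Each label is stored as a one-element array, hence the extra [0].
  plt.xlabel(class_names[train_labels[i][0]])
plt.show()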

Model

model = models.Sequential()
model.add(layers.Conv2D(32,(3,3),activation="relu",input_shape=(32,32,3)))
model.add(layers.MaxPool2D((2,2)))
model.add(layers.Conv2D(64,(3,3),activation="relu"))
model.add(layers.MaxPool2D((2,2)))
model.add(layers.Conv2D(64,(3,3),activation="relu"))
model.add(layers.Flatten())
model.add(layers.Dense(64,activation='relu'))
model.add(layers.Dense(10))

As input, a CNN takes tensors of shape (image_height, image_width, color_channels), ignoring the batch size. If you are new to these dimensions, color_channels refers to (R, G, B). In this example, you configure the CNN to process inputs of shape (32, 32, 3), which is the format of CIFAR images. You do this by passing the argument input_shape to the first layer.

If you call model.summary(), you can see that the output of every Conv2D and MaxPooling2D layer is a 3D tensor of shape (height, width, channels). The width and height dimensions tend to shrink as you go deeper into the network. The number of output channels for each Conv2D layer is controlled by the first argument (e.g., 32 or 64). Typically, as the width and height shrink, you can afford (computationally) to add more output channels in each Conv2D layer.
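
The shrinking height and width can be checked with a little arithmetic. The sketch below is plain Python, not Keras; the helper names and the layers_spec list are made up for illustration. It assumes the layer defaults used in the model: no padding and a stride of 1 for the 3x3 convolutions, and non-overlapping 2x2 pooling.

```python
def conv_output_size(size, kernel_size, stride=1):
    """Output length of a convolution with no padding."""
    return (size - kernel_size) // stride + 1


def pool_output_size(size, pool_size):
    """Output length of non-overlapping pooling."""
    return size // pool_size


# (layer kind, kernel or pool size, number of filters) for the convolutional base.
layers_spec = [
    ("conv", 3, 32),
    ("pool", 2, None),
    ("conv", 3, 64),
    ("pool", 2, None),
    ("conv", 3, 64),
]

size, channels = 32, 3  # CIFAR images are 32x32 with 3 color channels
shapes = []
for kind, k, filters in layers_spec:
    if kind == "conv":
        size = conv_output_size(size, k)
        channels = filters  # a conv layer outputs one channel per filter
    else:
        size = pool_output_size(size, k)
    shapes.append((size, size, channels))

for shape in shapes:
    print(shape)
# (30, 30, 32)
# (15, 15, 32)
# (13, 13, 64)
# (6, 6, 64)
# (4, 4, 64)
```

Each 3x3 convolution trims 2 pixels from the height and width, and each 2x2 pooling halves them (rounding down), so the 32x32 input ends up as a 4x4 map with 64 channels.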

To complete the model, you feed the last output tensor from the convolutional base (of shape (4, 4, 64)) into one or more Dense layers to perform classification. Dense layers take vectors as input (which are 1D), while the current output is a 3D tensor. First, you flatten the 3D output to 1D, then add one or more Dense layers on top. CIFAR-10 has 10 classes, so the final Dense layer has 10 outputs.
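
To see what flattening does to the numbers, this plain-Python sketch counts the values in the flattened vector and the trainable parameters of each layer. The helper names are made up for illustration; the formulas assume one bias per filter or unit, which is the default for these layers.

```python
def conv_params(kernel_h, kernel_w, in_channels, filters):
    """Weights per filter (kernel_h * kernel_w * in_channels) plus one bias each."""
    return (kernel_h * kernel_w * in_channels + 1) * filters


def dense_params(inputs, units):
    """One weight per input plus one bias, for every unit."""
    return (inputs + 1) * units


flattened = 4 * 4 * 64  # the (4, 4, 64) tensor becomes a vector of this length

param_counts = {
    "conv2d_1": conv_params(3, 3, 3, 32),
    "conv2d_2": conv_params(3, 3, 32, 64),
    "conv2d_3": conv_params(3, 3, 64, 64),
    "dense_1": dense_params(flattened, 64),
    "dense_2": dense_params(64, 10),
}
total_params = sum(param_counts.values())

print(flattened)     # 1024
print(param_counts)
print(total_params)  # 122570
```

More than half of the parameters sit in the first Dense layer, because every one of its 64 units is connected to all 1024 flattened values.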

Compile and Train

model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

history = model.fit(train_images, train_labels, epochs=10,
                    validation_data=(test_images, test_labels))
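
The last Dense layer has no activation, so the model outputs raw scores (logits) rather than probabilities, which is why the loss is created with from_logits=True. The plain-Python sketch below shows roughly what that loss computes for a single image: softmax turns the logits into probabilities, and the loss is the negative log of the probability given to the true class. The function names and the example logits are made up for illustration.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_categorical_crossentropy(logits, label):
    """Loss for one example; `label` is the integer index of the true class."""
    probs = softmax(logits)
    return -math.log(probs[label])


logits = [2.0, 1.0, 0.1]  # made-up scores for a 3-class example
probs = softmax(logits)

loss_correct = sparse_categorical_crossentropy(logits, 0)  # true class has the top score
loss_wrong = sparse_categorical_crossentropy(logits, 2)    # true class has the lowest score
print(loss_correct < loss_wrong)  # True
```

The loss is small when the model gives the true class a high score and large when it does not. "Sparse" means that the labels are plain integers such as 3 for 'cat', not one-hot vectors, which matches how the CIFAR-10 labels are stored.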

Plotting the accuracy and loss

# Training vs. validation accuracy per epoch
plt.plot(history.history['accuracy'], label='accuracy')
plt.plot(history.history['val_accuracy'], label='val_accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.ylim([0.5, 1])  # accuracy cannot exceed 1
plt.legend(loc='lower right')
plt.show()

# Training vs. validation loss per epoch
plt.plot(history.history['loss'], label='loss')
plt.plot(history.history['val_loss'], label='val_loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.ylim([0, 1.5])
plt.legend(loc='lower right')
plt.show()