Deep Learning with Python

by François Chollet

Computer Vision

Introduction to Computer Vision

Computer vision is a field of artificial intelligence that trains computers to interpret and understand the visual world. By applying deep learning models to digital images from cameras and videos, machines can accurately identify and classify objects.

Convolutional Neural Networks (CNNs)

CNNs are a class of deep neural networks most commonly applied to analyzing visual imagery. They are inspired by the visual cortex of animals.

Key Components of CNNs

  1. Convolutional Layers
    • Apply filters to input images
    • Detect features like edges, textures, shapes
    • Learn hierarchical representations
  2. Pooling Layers
    • Reduce spatial dimensions
    • Provide translation invariance
    • Reduce computational complexity
  3. Fully Connected Layers
    • Perform classification based on extracted features
    • Combine high-level features for decision making
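To make the pooling component concrete, here is a minimal NumPy sketch (the array values are illustrative) of 2×2 max pooling with stride 2, which halves each spatial dimension while keeping the strongest activation in each window:

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a square feature map."""
    h, w = feature_map.shape
    # Group the map into non-overlapping 2x2 windows, then take each window's max
    return feature_map.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap = np.array([
    [1, 3, 2, 0],
    [4, 6, 1, 2],
    [0, 2, 8, 5],
    [3, 1, 4, 7],
], dtype=float)

pooled = max_pool_2x2(fmap)
print(pooled)  # 4x4 input -> 2x2 output
```

Each output value is the maximum of one 2×2 window, which is why small translations of a feature within a window leave the output unchanged.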

Convolution Operations

Mathematical formulation:

  (I * K)(i, j) = Σ_m Σ_n I(i + m, j + n) · K(m, n)

Where:

  • I is the input image
  • K is the kernel/filter
  • * denotes convolution

(In practice, deep learning libraries implement this as cross-correlation, i.e. without flipping the kernel.)
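As a concrete check of the formulation, here is a small NumPy sketch of the 2D operation as CNN layers compute it (cross-correlation, kernel not flipped); the input and kernel values are illustrative:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid (no padding) 2D cross-correlation, as used in CNN layers."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # (I * K)(i, j) = sum over m, n of I(i+m, j+n) * K(m, n)
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.array([
    [1, 2, 0, 1],
    [0, 1, 3, 1],
    [2, 1, 0, 0],
    [1, 0, 2, 1],
], dtype=float)

# A vertical-edge-detecting kernel: responds to left-right intensity changes
kernel = np.array([
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
], dtype=float)

print(conv2d(image, kernel))  # 4x4 input, 3x3 kernel -> 2x2 output
```

Sliding a 3×3 kernel over a 4×4 image with no padding yields a 2×2 output, matching the shape shrinkage seen in the model summaries later in the chapter.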

Common CNN Architectures

LeNet-5 (1998)

  • One of the first successful CNNs
  • 7 layers, alternating convolutional and pooling layers followed by fully connected layers
  • Used for handwritten digit recognition

AlexNet (2012)

  • Won ImageNet competition 2012
  • 8 layers (5 conv + 3 fc)
  • Used ReLU activation and dropout
  • Sparked deep learning revolution

VGGNet (2014)

  • Simpler architecture using only 3×3 filters
  • 16-19 layers
  • Demonstrated network depth importance

GoogLeNet (2014)

  • Introduced inception modules
  • Multi-scale processing
  • Efficient use of parameters

ResNet (2015)

  • Introduced residual connections
  • Enabled training of very deep networks (152+ layers)
  • Solved vanishing gradient problem
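The core idea of a residual connection can be sketched in a few lines of NumPy. Here the learned transformation F is stood in for by a single weight multiply (in ResNet it would be a stack of convolution and batch-normalization layers); the block computes F(x) and adds the input back, so the identity path lets gradients flow even when F contributes little:

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def residual_block(x, weights):
    """y = relu(F(x) + x); F is a stand-in for ResNet's conv layers."""
    fx = relu(x @ weights)   # the learned residual F(x)
    return relu(fx + x)      # identity shortcut: add the input back

rng = np.random.default_rng(0)
x = rng.standard_normal(8)
w = np.zeros((8, 8))         # if F(x) = 0, the block reduces to relu(x)

y = residual_block(x, w)
print(np.allclose(y, relu(x)))  # the shortcut preserves the signal
```

Because the block only needs to learn the residual F(x) rather than the full mapping, stacking many such blocks stays trainable at depths of 152 layers and beyond.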

Building a CNN for Image Classification

Data Preparation

PYTHON

    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    # Data augmentation for training
    train_datagen = ImageDataGenerator(
        rescale=1./255,
        rotation_range=20,
        width_shift_range=0.2,
        height_shift_range=0.2,
        horizontal_flip=True,
        fill_mode='nearest'
    )

    # Only rescale for validation
    val_datagen = ImageDataGenerator(rescale=1./255)

    # Load data from directories
    train_generator = train_datagen.flow_from_directory(
        'data/train',
        target_size=(224, 224),
        batch_size=32,
        class_mode='categorical'
    )

    validation_generator = val_datagen.flow_from_directory(
        'data/validation',
        target_size=(224, 224),
        batch_size=32,
        class_mode='categorical'
    )

Model Architecture

PYTHON

    from tensorflow.keras import layers, models

    model = models.Sequential([
        # First convolutional block
        layers.Conv2D(32, (3, 3), activation='relu', input_shape=(224, 224, 3)),
        layers.MaxPooling2D((2, 2)),

        # Second convolutional block
        layers.Conv2D(64, (3, 3), activation='relu'),
        layers.MaxPooling2D((2, 2)),

        # Third convolutional block
        layers.Conv2D(128, (3, 3), activation='relu'),
        layers.MaxPooling2D((2, 2)),

        # Fourth convolutional block
        layers.Conv2D(128, (3, 3), activation='relu'),
        layers.MaxPooling2D((2, 2)),

        # Flatten and classify
        layers.Flatten(),
        layers.Dense(512, activation='relu'),
        layers.Dropout(0.5),
        layers.Dense(10, activation='softmax')
    ])

    model.compile(
        optimizer='adam',
        loss='categorical_crossentropy',
        metrics=['accuracy']
    )

Training the Model

PYTHON

    history = model.fit(
        train_generator,
        steps_per_epoch=len(train_generator),
        epochs=50,
        validation_data=validation_generator,
        validation_steps=len(validation_generator)
    )

Transfer Learning

Transfer learning leverages pre-trained models for new tasks:

Benefits:

  • Reduced training time
  • Better performance with less data
  • Access to state-of-the-art architectures

Popular Pre-trained Models:

  1. VGG16/19
    • Simple architecture
    • Good feature extractor
  2. ResNet50/101/152
    • Deep residual networks
    • Excellent performance
  3. InceptionV3
    • Efficient architecture
    • Good for mobile deployment
  4. EfficientNet
    • State-of-the-art efficiency
    • Best accuracy/parameter ratio

Example with Transfer Learning

PYTHON

    from tensorflow.keras.applications import VGG16
    from tensorflow.keras import Model, layers

    # Load pre-trained model
    base_model = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

    # Freeze convolutional layers
    base_model.trainable = False

    # Add custom classifier
    x = base_model.output
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dense(256, activation='relu')(x)
    predictions = layers.Dense(10, activation='softmax')(x)

    model = Model(inputs=base_model.input, outputs=predictions)

    # Train only the classifier
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    model.fit(train_generator, epochs=10)

Object Detection

Beyond classification, CNNs can detect and localize objects:

Popular Approaches:

  • R-CNN family
  • YOLO (You Only Look Once)
  • SSD (Single Shot Detector)
  • RetinaNet
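Detectors in all of these families score predicted boxes against ground truth with intersection-over-union (IoU); a minimal sketch, with boxes as illustrative (x1, y1, x2, y2) tuples:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Two partially overlapping 10x10 boxes share a 5x5 intersection
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175
```

IoU is used both at evaluation time (a detection typically counts as correct above a threshold such as 0.5) and inside detectors for matching anchors to ground-truth boxes and for non-maximum suppression.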

Semantic Segmentation

Semantic segmentation assigns a class label to every pixel in the image. Popular architectures:

  • FCN (Fully Convolutional Networks)
  • U-Net
  • DeepLab
  • Mask R-CNN
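Segmentation outputs from any of these architectures are commonly scored with overlap metrics such as the Dice coefficient between predicted and ground-truth masks; a minimal NumPy sketch with illustrative binary masks:

```python
import numpy as np

def dice(pred, target, eps=1e-7):
    """Dice coefficient between two binary masks: 2*|A and B| / (|A| + |B|)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return 2.0 * intersection / (pred.sum() + target.sum() + eps)

pred = np.array([[1, 1, 0],
                 [0, 1, 0],
                 [0, 0, 0]])
target = np.array([[1, 1, 0],
                   [0, 0, 0],
                   [0, 1, 0]])
print(dice(pred, target))  # 2 * 2 / (3 + 3)
```

A score of 1.0 means the masks match exactly and 0.0 means no overlap; the small eps guards against division by zero when both masks are empty.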

Practical Tips

  1. Data Augmentation: Essential for preventing overfitting
  2. Batch Normalization: Stabilizes training
  3. Learning Rate Scheduling: Improves convergence
  4. Early Stopping: Prevents overfitting
  5. Model Checkpoints: Save best models
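Tips 3 through 5 above map directly onto Keras callbacks; a sketch (the monitored metric and the checkpoint path are illustrative choices) of a list that could be passed to model.fit:

```python
from tensorflow.keras.callbacks import (
    EarlyStopping, ModelCheckpoint, ReduceLROnPlateau)

callbacks = [
    # Tip 3: lower the learning rate when validation loss plateaus
    ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=3),
    # Tip 4: stop once validation loss stops improving
    EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True),
    # Tip 5: keep only the best model seen so far (path is illustrative)
    ModelCheckpoint('best_model.keras', monitor='val_loss', save_best_only=True),
]

# Then: model.fit(train_generator, epochs=50, validation_data=validation_generator,
#                 callbacks=callbacks)
```

With restore_best_weights=True, early stopping also rewinds the model to its best validation epoch, so the saved checkpoint and the in-memory model agree.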

Common Challenges

  1. Small Dataset: Use transfer learning or data augmentation
  2. Class Imbalance: Use class weights or focal loss
  3. Overfitting: Add regularization, get more data
  4. Computational Cost: Use smaller models or transfer learning
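For challenge 2, per-class weights can be derived from inverse class frequencies and passed to model.fit via its class_weight argument; a minimal sketch (the label counts are illustrative):

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class by total / (num_classes * class_count)."""
    counts = Counter(labels)
    total = len(labels)
    return {c: total / (len(counts) * n) for c, n in counts.items()}

# Illustrative imbalanced labels: 8 samples of class 0, 2 of class 1
labels = [0] * 8 + [1] * 2
weights = inverse_frequency_weights(labels)
print(weights)  # the rare class gets the larger weight
```

This mirrors scikit-learn's 'balanced' weighting heuristic; the resulting dict scales each sample's loss contribution so the rare class is not drowned out by the common one.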

Applications

  • Autonomous vehicles
  • Medical imaging
  • Face recognition
  • Agricultural monitoring
  • Quality control in manufacturing
  • Augmented reality