Deep Learning with Python
by François Chollet
Computer Vision
Introduction to Computer Vision
Computer vision is a field of artificial intelligence that trains computers to interpret and understand the visual world. Using digital images from cameras and video together with deep learning models, machines can identify and classify objects with high accuracy.
Convolutional Neural Networks (CNNs)
CNNs are a class of deep neural networks most commonly applied to analyzing visual imagery. They are inspired by the visual cortex of animals.
Key Components of CNNs
- Convolutional Layers
- Apply filters to input images
- Detect features like edges, textures, shapes
- Learn hierarchical representations
- Pooling Layers
- Reduce spatial dimensions
- Provide translation invariance
- Reduce computational complexity
- Fully Connected Layers
- Perform classification based on extracted features
- Combine high-level features for decision making
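The pooling operation described above is simple enough to sketch directly. Below is a minimal NumPy illustration of 2×2 max pooling (the window size and the NumPy implementation are illustrative assumptions, not taken from the text):

```python
import numpy as np

def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling: keep the strongest activation per window."""
    h, w = feature_map.shape
    # Trim to a multiple of the window size, then group into windows
    trimmed = feature_map[:h // size * size, :w // size * size]
    windows = trimmed.reshape(h // size, size, w // size, size)
    return windows.max(axis=(1, 3))

fm = np.array([[1, 3, 2, 0],
               [4, 2, 1, 1],
               [0, 1, 5, 6],
               [2, 2, 7, 8]], dtype=float)
print(max_pool2d(fm))  # 4x4 map -> 2x2 map
```

Each output value is the maximum of one 2×2 window, so the spatial dimensions are halved while the strongest responses survive; small shifts in the input often leave the output unchanged, which is the translation invariance mentioned above.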
Convolution Operations
Mathematical formulation:

(I * K)(i, j) = Σ_m Σ_n I(i + m, j + n) · K(m, n)

Where:
- I is the input image
- K is the kernel/filter
- * denotes convolution (strictly speaking, deep learning libraries compute cross-correlation, i.e. without flipping the kernel, and call it convolution)
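The sliding-window sum can be made concrete in a few lines of NumPy. This is a minimal sketch in the cross-correlation convention used by deep learning frameworks; the toy image and edge-detecting kernel are illustrative:

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' cross-correlation: slide the kernel over the image,
    multiply elementwise, and sum each window."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy image with a vertical dark-to-bright edge
image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]], dtype=float)
# Vertical-edge filter (Sobel-like)
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=float)
print(conv2d(image, kernel))  # strong response where the edge is
```

A 4×4 image convolved with a 3×3 kernel yields a 2×2 output, and the nonzero responses show the filter firing on the vertical edge, exactly the kind of low-level feature a first convolutional layer learns.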
Common CNN Architectures
LeNet-5 (1998)
- One of the first successful CNNs
- 7 layers: two convolution + pooling stages followed by fully connected layers
- Used for handwritten digit recognition
AlexNet (2012)
- Won ImageNet competition 2012
- 8 layers (5 conv + 3 fc)
- Used ReLU activation and dropout
- Sparked deep learning revolution
VGGNet (2014)
- Simpler architecture using only 3×3 filters
- 16-19 layers
- Demonstrated network depth importance
GoogLeNet (2014)
- Introduced inception modules
- Multi-scale processing
- Efficient use of parameters
ResNet (2015)
- Introduced residual connections
- Enabled training of very deep networks (152+ layers)
- Mitigated the vanishing gradient problem via shortcut connections
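The residual idea can be sketched with the Keras functional API. The two-convolution structure and filter counts below are illustrative, not the exact ResNet configuration:

```python
from tensorflow.keras import layers

def residual_block(x, filters):
    """Compute y = F(x) + x: the identity shortcut lets gradients
    flow past the convolution stack unchanged."""
    shortcut = x
    y = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    y = layers.Conv2D(filters, 3, padding='same')(y)
    y = layers.Add()([y, shortcut])  # the residual connection
    return layers.Activation('relu')(y)
```

Because the shortcut passes x through unchanged, the block only has to learn the residual F(x) = y − x, which is easier to optimize in very deep stacks. (When the filter count changes between blocks, real ResNets project the shortcut with a 1×1 convolution; that case is omitted here for brevity.)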
Building a CNN for Image Classification
Data Preparation
PYTHON

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Data augmentation for training
train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest'
)

# Only rescale for validation
val_datagen = ImageDataGenerator(rescale=1./255)

# Load data from directories
train_generator = train_datagen.flow_from_directory(
    'data/train',
    target_size=(224, 224),
    batch_size=32,
    class_mode='categorical'
)

validation_generator = val_datagen.flow_from_directory(
    'data/validation',
    target_size=(224, 224),
    batch_size=32,
    class_mode='categorical'
)
Model Architecture
PYTHON

from tensorflow.keras import layers, models

model = models.Sequential([
    # First convolutional block
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(224, 224, 3)),
    layers.MaxPooling2D((2, 2)),

    # Second convolutional block
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),

    # Third convolutional block
    layers.Conv2D(128, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),

    # Fourth convolutional block
    layers.Conv2D(128, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),

    # Flatten and classify
    layers.Flatten(),
    layers.Dense(512, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(10, activation='softmax')
])

model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)
Training the Model
PYTHON

history = model.fit(
    train_generator,
    steps_per_epoch=len(train_generator),
    epochs=50,
    validation_data=validation_generator,
    validation_steps=len(validation_generator)
)
Transfer Learning
Transfer learning leverages pre-trained models for new tasks:
Benefits:
- Reduced training time
- Better performance with less data
- Access to state-of-the-art architectures
Popular Pre-trained Models
- VGG16/19
- Simple architecture
- Good feature extractor
- ResNet50/101/152
- Deep residual networks
- Excellent performance
- InceptionV3
- Efficient architecture
- Strong accuracy at moderate computational cost
- EfficientNet
- State-of-the-art efficiency
- Best accuracy/parameter ratio
Example with Transfer Learning
PYTHON

from tensorflow.keras.applications import VGG16
from tensorflow.keras import Model, layers

# Load pre-trained model
base_model = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

# Freeze convolutional layers
base_model.trainable = False

# Add custom classifier
x = base_model.output
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dense(256, activation='relu')(x)
predictions = layers.Dense(10, activation='softmax')(x)

model = Model(inputs=base_model.input, outputs=predictions)

# Train only the classifier
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(train_generator, epochs=10)
Object Detection
Beyond classification, CNNs can detect and localize objects:
Popular Approaches:
- R-CNN family
- YOLO (You Only Look Once)
- SSD (Single Shot Detector)
- RetinaNet
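All of these detectors are trained and evaluated with Intersection over Union (IoU), the overlap ratio between a predicted box and a ground-truth box. A minimal sketch, assuming boxes in (x1, y1, x2, y2) corner format:

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Corners of the intersection rectangle
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # partial overlap
```

A detection typically counts as correct when IoU with a ground-truth box exceeds a threshold such as 0.5; the same quantity drives non-maximum suppression, which discards lower-scoring boxes that overlap a kept box too much.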
Semantic Segmentation
Pixel-level classification:
- FCN (Fully Convolutional Networks)
- U-Net
- DeepLab
- Mask R-CNN (strictly instance segmentation: per-object masks rather than per-class pixel labels)
Practical Tips
- Data Augmentation: Essential for preventing overfitting
- Batch Normalization: Stabilizes training
- Learning Rate Scheduling: Improves convergence
- Early Stopping: Prevents overfitting
- Model Checkpoints: Save best models
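Several of these tips map directly onto standard Keras callbacks. A sketch of wiring them together (the monitored metric, patience values, and checkpoint filename are illustrative choices):

```python
from tensorflow.keras.callbacks import (EarlyStopping, ModelCheckpoint,
                                        ReduceLROnPlateau)

callbacks = [
    # Early stopping: halt when validation loss stops improving
    EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True),
    # Model checkpoints: keep only the best model seen so far
    ModelCheckpoint('best_model.keras', monitor='val_loss',
                    save_best_only=True),
    # Learning rate scheduling: lower the rate when progress stalls
    ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=3),
]

# Passed to training as: model.fit(..., callbacks=callbacks)
```

With `restore_best_weights=True`, the model ends training with the weights from its best validation epoch rather than its last one, so early stopping and checkpointing reinforce each other.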
Common Challenges
- Small Dataset: Use transfer learning or data augmentation
- Class Imbalance: Use class weights or focal loss
- Overfitting: Add regularization, get more data
- Computational Cost: Use smaller models or transfer learning
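For class imbalance, Keras's `model.fit` accepts a `class_weight` dictionary. A minimal sketch of the common inverse-frequency ("balanced") heuristic, where the example label counts are illustrative:

```python
import numpy as np

def class_weights(labels):
    """Inverse-frequency weights: rarer classes get proportionally
    larger weights, so each class contributes equally to the loss."""
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    total = len(labels)
    return {int(c): total / (len(classes) * n)
            for c, n in zip(classes, counts)}

# 90 samples of class 0, 10 of class 1
weights = class_weights([0] * 90 + [1] * 10)
print(weights)
# Passed to training as: model.fit(..., class_weight=weights)
```

Here the minority class is weighted 5.0 against 0.56 for the majority class, so each misclassified minority example costs roughly nine times as much, counteracting the 9:1 imbalance.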
Applications
- Autonomous vehicles
- Medical imaging
- Face recognition
- Agricultural monitoring
- Quality control in manufacturing
- Augmented reality