Deep Learning with Python

by François Chollet

Text and Sequences

Working with Text Data

Text data presents unique challenges for machine learning:

  • Variable-length sequences
  • Discrete tokens (words/characters)
  • Semantic meaning and context
  • Temporal dependencies

Text Preprocessing

Tokenization

Breaking text into smaller units:

PYTHON

    from tensorflow.keras.preprocessing.text import Tokenizer

    # Word-level tokenization
    tokenizer = Tokenizer(num_words=10000)
    tokenizer.fit_on_texts(texts)
    sequences = tokenizer.texts_to_sequences(texts)

    # Character-level tokenization
    char_tokenizer = Tokenizer(char_level=True)
    char_tokenizer.fit_on_texts(texts)
    char_sequences = char_tokenizer.texts_to_sequences(texts)

Padding Sequences

Making all sequences the same length:

PYTHON

    from tensorflow.keras.preprocessing.sequence import pad_sequences

    # Pad sequences (pre-padding by default) to a maximum length of 100
    padded_sequences = pad_sequences(sequences, maxlen=100)

    # Pad at the end instead (post-padding)
    padded_sequences = pad_sequences(sequences, maxlen=100, padding='post')

    # Truncate sequences longer than maxlen at the end
    padded_sequences = pad_sequences(sequences, maxlen=100, truncating='post')

Word Embeddings

Word embeddings are dense vector representations of words:

Benefits:

  • Capture semantic relationships
  • Reduce dimensionality
  • Enable better generalization

Types of Embeddings

  1. Learned Embeddings: Trained with the model
  2. Pre-trained Embeddings: GloVe, Word2Vec, fastText
  3. Contextual Embeddings: BERT, GPT

Implementing Embeddings in Keras

PYTHON

    from tensorflow import keras
    from tensorflow.keras import layers

    # Embedding layer
    embedding_layer = layers.Embedding(
        input_dim=vocab_size,       # Size of the vocabulary
        output_dim=embedding_dim,   # Dimension of each embedding vector
        input_length=max_length     # Length of input sequences
    )

    # In a model
    model = keras.Sequential([
        embedding_layer,
        layers.Flatten(),
        layers.Dense(1, activation='sigmoid')
    ])

Using Pre-trained Embeddings

PYTHON

    import numpy as np

    # Load GloVe embeddings (embedding_dim must match the file: 100 here)
    embeddings_index = {}
    with open('glove.6B.100d.txt', encoding='utf8') as f:
        for line in f:
            values = line.split()
            word = values[0]
            coefs = np.asarray(values[1:], dtype='float32')
            embeddings_index[word] = coefs

    # Create embedding matrix: row i holds the GloVe vector for word index i
    embedding_matrix = np.zeros((vocab_size, embedding_dim))
    for word, i in tokenizer.word_index.items():
        if i < vocab_size:
            embedding_vector = embeddings_index.get(word)
            if embedding_vector is not None:
                embedding_matrix[i] = embedding_vector

    # Use in embedding layer
    embedding_layer = layers.Embedding(
        vocab_size,
        embedding_dim,
        weights=[embedding_matrix],
        input_length=max_length,
        trainable=False  # Freeze the pre-trained embeddings
    )
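
To see the semantic relationships these vectors capture, you can compare them directly. A minimal sketch, assuming embeddings_index has been filled as above and that the example words appear in the GloVe vocabulary:

PYTHON

    # Cosine similarity between two GloVe vectors
    def cosine_similarity(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    # Related words score noticeably higher than unrelated ones
    print(cosine_similarity(embeddings_index['king'], embeddings_index['queen']))
    print(cosine_similarity(embeddings_index['king'], embeddings_index['banana']))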
    

Recurrent Neural Networks (RNNs)

RNNs are designed to process sequential data by maintaining an internal state or memory.

Simple RNN

PYTHON

    from tensorflow import keras
    from tensorflow.keras import layers

    model = keras.Sequential([
        layers.Embedding(vocab_size, 128),
        layers.SimpleRNN(64),
        layers.Dense(1, activation='sigmoid')
    ])

Long Short-Term Memory (LSTM)

LSTMs address the vanishing gradient problem in RNNs:

PYTHON

    from tensorflow import keras
    from tensorflow.keras import layers

    model = keras.Sequential([
        layers.Embedding(vocab_size, 128),
        layers.LSTM(64),
        layers.Dense(1, activation='sigmoid')
    ])

    # Stacked LSTM: intermediate layers must return the full sequence
    model = keras.Sequential([
        layers.Embedding(vocab_size, 128),
        layers.LSTM(64, return_sequences=True),
        layers.LSTM(32),
        layers.Dense(1, activation='sigmoid')
    ])

Gated Recurrent Unit (GRU)

The GRU is a simpler variant of the LSTM with comparable performance:

PYTHON

    from tensorflow import keras
    from tensorflow.keras import layers

    model = keras.Sequential([
        layers.Embedding(vocab_size, 128),
        layers.GRU(64),
        layers.Dense(1, activation='sigmoid')
    ])

Bidirectional RNNs

A bidirectional RNN processes the sequence in both directions (forward and reversed) and merges the results:

PYTHON

    from tensorflow import keras
    from tensorflow.keras import layers

    model = keras.Sequential([
        layers.Embedding(vocab_size, 128),
        layers.Bidirectional(layers.LSTM(64)),
        layers.Dense(1, activation='sigmoid')
    ])

Attention Mechanisms

Attention allows models to focus on relevant parts of the input:

PYTHON

    from tensorflow.keras import layers

    # 'inputs' is assumed to be an upstream sequence tensor,
    # e.g. the output of an Embedding layer: (batch, timesteps, features)
    query = layers.Dense(64)(inputs)
    value = layers.Dense(64)(inputs)
    key = layers.Dense(64)(inputs)

    # Dot-product attention over [query, value, key]
    attention_output = layers.Attention()([query, value, key])

Transformers

The Transformer architecture revolutionized NLP:

Key Components (combined in the sketch after this list):

  • Multi-head self-attention
  • Positional encoding
  • Feed-forward networks
  • Layer normalization
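
As a rough illustration of how these pieces fit together, here is a minimal encoder-block sketch built from Keras' MultiHeadAttention layer; the function name and layer sizes are illustrative, not taken from the book:

PYTHON

    from tensorflow import keras
    from tensorflow.keras import layers

    def transformer_encoder_block(inputs, num_heads=4, key_dim=64, ff_dim=256):
        # Multi-head self-attention (query = key = value = inputs)
        attn = layers.MultiHeadAttention(num_heads=num_heads, key_dim=key_dim)(inputs, inputs)
        x = layers.LayerNormalization()(layers.Add()([inputs, attn]))
        # Position-wise feed-forward network
        ff = layers.Dense(ff_dim, activation='relu')(x)
        ff = layers.Dense(inputs.shape[-1])(ff)
        return layers.LayerNormalization()(layers.Add()([x, ff]))

    # Positional information is normally added to the token embeddings
    # (e.g. an extra Embedding over positions) before this block is applied.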

Using Pre-trained Transformers

PYTHON

    from transformers import AutoTokenizer, TFAutoModel

    # Load pre-trained model and tokenizer
    model_name = 'bert-base-uncased'
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = TFAutoModel.from_pretrained(model_name)

    # Tokenize text
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors='tf')

    # Get contextual embeddings
    outputs = model(inputs)
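
The model returns one contextual vector per token. A common follow-up, not shown above, is to pool these into a single vector per text, for example the [CLS] token embedding:

PYTHON

    # One vector per text: take the embedding of the [CLS] token (position 0)
    sentence_embeddings = outputs.last_hidden_state[:, 0, :]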
    

Text Classification

Sentiment Analysis Example

PYTHON

    from tensorflow.keras import layers, models

    model = models.Sequential([
        layers.Embedding(vocab_size, 128),
        layers.Bidirectional(layers.LSTM(64)),
        layers.Dense(32, activation='relu'),
        layers.Dropout(0.5),
        layers.Dense(1, activation='sigmoid')
    ])

    model.compile(
        optimizer='adam',
        loss='binary_crossentropy',
        metrics=['accuracy']
    )
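
Training then follows the standard Keras workflow. A minimal sketch, assuming x_train is an array of padded integer sequences and y_train holds binary sentiment labels:

PYTHON

    # Hypothetical training data: padded sequences and 0/1 labels
    history = model.fit(
        x_train, y_train,
        epochs=10,
        batch_size=32,
        validation_split=0.2
    )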
    

Text Generation

Character-level text generation:

PYTHON

    model = models.Sequential([
        layers.Embedding(vocab_size, 256, input_length=seq_length),
        layers.LSTM(128, return_sequences=True),
        layers.LSTM(128),
        layers.Dense(vocab_size, activation='softmax')
    ])

    # Generate text one character at a time
    # (assumes a character-level tokenizer; index_to_char is the inverse
    # of its word_index, mapping indices back to characters)
    def generate_text(seed_text, num_chars):
        for _ in range(num_chars):
            # Convert the current seed to a padded sequence
            sequences = tokenizer.texts_to_sequences([seed_text])
            padded = pad_sequences(sequences, maxlen=seq_length)

            # Predict the next character (greedy argmax decoding)
            preds = model.predict(padded, verbose=0)
            pred_index = np.argmax(preds[0])

            # Convert back to a character and append it to the seed
            next_char = index_to_char[pred_index]
            seed_text += next_char

        return seed_text
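
Greedy argmax decoding tends to loop on the same characters. A common refinement is to sample the next character from a temperature-rescaled distribution instead, replacing the argmax above with pred_index = sample(preds[0], temperature=0.5); the sample helper below is a sketch of that idea:

PYTHON

    # Re-weight the predicted distribution and sample from it.
    # Lower temperature -> more conservative, higher -> more random.
    def sample(preds, temperature=1.0):
        preds = np.asarray(preds).astype('float64')
        preds = np.log(preds + 1e-8) / temperature
        exp_preds = np.exp(preds)
        preds = exp_preds / np.sum(exp_preds)
        return np.argmax(np.random.multinomial(1, preds, 1))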
    

Machine Translation

Sequence-to-sequence models:

PYTHON

    from tensorflow import keras
    from tensorflow.keras import layers

    # Encoder: read the source sequence and keep only its final LSTM state
    encoder_inputs = layers.Input(shape=(None,))
    encoder_embedding = layers.Embedding(vocab_size, 256)(encoder_inputs)
    encoder_lstm = layers.LSTM(512, return_state=True)
    _, state_h, state_c = encoder_lstm(encoder_embedding)
    encoder_states = [state_h, state_c]

    # Decoder: generate the target sequence, initialized with the encoder state
    decoder_inputs = layers.Input(shape=(None,))
    decoder_embedding = layers.Embedding(vocab_size, 256)(decoder_inputs)
    decoder_lstm = layers.LSTM(512, return_sequences=True, return_state=True)
    decoder_outputs, _, _ = decoder_lstm(decoder_embedding, initial_state=encoder_states)
    decoder_dense = layers.Dense(vocab_size, activation='softmax')
    decoder_outputs = decoder_dense(decoder_outputs)

    model = keras.Model([encoder_inputs, decoder_inputs], decoder_outputs)
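
Training this model typically uses teacher forcing (Practical Tips, point 5 below): the decoder is fed the ground-truth target shifted right and learns to predict the next token. A sketch with hypothetical array names:

PYTHON

    # encoder_input_data  : source sequences (integer-encoded)
    # decoder_input_data  : target sequences starting with a <start> token
    # decoder_target_data : the same targets shifted one step to the left
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
    model.fit(
        [encoder_input_data, decoder_input_data],
        decoder_target_data,
        batch_size=64,
        epochs=10,
        validation_split=0.2
    )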
    

Practical Tips

  1. Use Pre-trained Models: Better performance with less data
  2. Attention Mechanisms: Improve performance on long sequences
  3. Layer Normalization: Stabilize training
  4. Gradient Clipping: Prevent exploding gradients (see the sketch after this list)
  5. Teacher Forcing: Improve sequence generation
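
For instance, gradient clipping (point 4) is a single optimizer argument in Keras; the learning rate and clip value below are illustrative, and model stands in for any model defined earlier:

PYTHON

    from tensorflow import keras

    # clipnorm rescales each gradient so its norm never exceeds 1.0
    # (clipvalue is the element-wise alternative)
    optimizer = keras.optimizers.Adam(learning_rate=1e-3, clipnorm=1.0)
    model.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])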

Common Challenges

  1. Vanishing Gradients: Use LSTM/GRU or residual connections
  2. Overfitting: Use dropout and regularization (see the sketch after this list)
  3. Long Sequences: Use attention or transformers
  4. Computational Cost: Use smaller models or truncation
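
As an example of point 2, recurrent layers have built-in dropout arguments (using keras and layers imported as in the earlier examples; the rates are illustrative):

PYTHON

    # 'dropout' masks the layer's inputs; 'recurrent_dropout' masks the
    # recurrent state between timesteps
    model = keras.Sequential([
        layers.Embedding(vocab_size, 128),
        layers.LSTM(64, dropout=0.2, recurrent_dropout=0.2),
        layers.Dense(1, activation='sigmoid')
    ])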

Applications

  • Machine translation
  • Chatbots and dialogue systems
  • Text summarization
  • Sentiment analysis
  • Question answering
  • Document classification
  • Speech recognition