Deep Learning with Python
by François Chollet
Text and Sequences
Working with Text Data
Text data presents unique challenges for machine learning:
- Variable-length sequences
- Discrete tokens (words/characters)
- Semantic meaning and context
- Temporal dependencies
Text Preprocessing
Tokenization
Breaking text into smaller units:
PYTHON
from tensorflow.keras.preprocessing.text import Tokenizer

# Word-level tokenization (keep the 10,000 most frequent words)
tokenizer = Tokenizer(num_words=10000)
tokenizer.fit_on_texts(texts)
sequences = tokenizer.texts_to_sequences(texts)

# Character-level tokenization
char_tokenizer = Tokenizer(char_level=True)
char_tokenizer.fit_on_texts(texts)
char_sequences = char_tokenizer.texts_to_sequences(texts)
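To make the mapping concrete, here is what the tokenizer learns on a tiny hypothetical two-sentence corpus (indices are 1-based and ordered by word frequency):
PYTHON
texts = ['the cat sat', 'the dog sat']

tokenizer = Tokenizer(num_words=10000)
tokenizer.fit_on_texts(texts)

# word_index maps each word to an integer, most frequent words first
print(tokenizer.word_index)                 # {'the': 1, 'sat': 2, 'cat': 3, 'dog': 4}
print(tokenizer.texts_to_sequences(texts))  # [[1, 3, 2], [1, 4, 2]]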
Padding Sequences
Making all sequences the same length:
PYTHON
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Pad sequences to a fixed maximum length (pre-padding by default)
padded_sequences = pad_sequences(sequences, maxlen=100)

# Pad at the end instead of the beginning
padded_sequences = pad_sequences(sequences, maxlen=100, padding='post')

# Truncate overlong sequences at the end
padded_sequences = pad_sequences(sequences, maxlen=100, truncating='post')
Word Embeddings
Word embeddings are dense vector representations of words:
Benefits:
- Capture semantic relationships
- Reduce dimensionality
- Enable better generalization
Types of Embeddings
- Learned Embeddings: Trained with the model
- Pre-trained Embeddings: GloVe, Word2Vec, fastText
- Contextual Embeddings: BERT, GPT
Implementing Embeddings in Keras
PYTHON
from tensorflow import keras
from tensorflow.keras import layers

# Embedding layer
embedding_layer = layers.Embedding(
    input_dim=vocab_size,      # Size of the vocabulary
    output_dim=embedding_dim,  # Dimension of each embedding vector
    input_length=max_length    # Length of input sequences
)

# In a model
model = keras.Sequential([
    embedding_layer,
    layers.Flatten(),
    layers.Dense(1, activation='sigmoid')
])
Using Pre-trained Embeddings
PYTHON
import numpy as np

# Load GloVe embeddings into a dict of word -> vector
embeddings_index = {}
with open('glove.6B.100d.txt', encoding='utf8') as f:
    for line in f:
        values = line.split()
        word = values[0]
        coefs = np.asarray(values[1:], dtype='float32')
        embeddings_index[word] = coefs

# Create an embedding matrix aligned with the tokenizer's word index
embedding_matrix = np.zeros((vocab_size, embedding_dim))
for word, i in tokenizer.word_index.items():
    if i < vocab_size:
        embedding_vector = embeddings_index.get(word)
        if embedding_vector is not None:
            embedding_matrix[i] = embedding_vector

# Use in embedding layer
embedding_layer = layers.Embedding(
    vocab_size,
    embedding_dim,
    weights=[embedding_matrix],
    input_length=max_length,
    trainable=False  # Freeze the pre-trained embeddings
)
Recurrent Neural Networks (RNNs)
RNNs are designed to process sequential data by maintaining an internal state or memory.
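The recurrence itself is simple enough to sketch in NumPy: at each timestep the new state is a function of the current input and the previous state, state_t = tanh(W·input_t + U·state_{t-1} + b). A minimal sketch with illustrative shapes:
PYTHON
import numpy as np

timesteps, input_dim, state_dim = 10, 32, 64

inputs = np.random.random((timesteps, input_dim))  # one input vector per timestep
state = np.zeros((state_dim,))                     # initial state: all zeros

W = np.random.random((state_dim, input_dim))
U = np.random.random((state_dim, state_dim))
b = np.random.random((state_dim,))

outputs = []
for x_t in inputs:
    # New state combines the current input with the previous state
    state = np.tanh(np.dot(W, x_t) + np.dot(U, state) + b)
    outputs.append(state)

final_output = outputs[-1]  # what SimpleRNN returns by default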
Simple RNN
PYTHON
model = keras.Sequential([
    layers.Embedding(vocab_size, 128),
    layers.SimpleRNN(64),
    layers.Dense(1, activation='sigmoid')
])
Long Short-Term Memory (LSTM)
LSTMs address the vanishing gradient problem in RNNs:
PYTHON
model = keras.Sequential([
    layers.Embedding(vocab_size, 128),
    layers.LSTM(64),
    layers.Dense(1, activation='sigmoid')
])

# Stacked LSTM: intermediate layers must return the full sequence
model = keras.Sequential([
    layers.Embedding(vocab_size, 128),
    layers.LSTM(64, return_sequences=True),
    layers.LSTM(32),
    layers.Dense(1, activation='sigmoid')
])
Gated Recurrent Unit (GRU)
A simpler variant of the LSTM with often comparable performance:
PYTHON
model = keras.Sequential([
    layers.Embedding(vocab_size, 128),
    layers.GRU(64),
    layers.Dense(1, activation='sigmoid')
])
Bidirectional RNNs
Process sequences in both directions:
PYTHON
model = keras.Sequential([
    layers.Embedding(vocab_size, 128),
    layers.Bidirectional(layers.LSTM(64)),
    layers.Dense(1, activation='sigmoid')
])
Attention Mechanisms
Attention allows models to focus on relevant parts of the input:
PYTHON
# `inputs` is assumed to be a sequence of feature vectors,
# e.g. the output of an embedding layer
inputs = layers.Input(shape=(max_length, 128))

# Project the input into query, value, and key spaces
query = layers.Dense(64)(inputs)
value = layers.Dense(64)(inputs)
key = layers.Dense(64)(inputs)

# layers.Attention accepts [query, value] or [query, value, key]
attention_output = layers.Attention()([query, value, key])
Transformers
The Transformer architecture revolutionized NLP (a sketch of its core attention computation follows the list):
Key Components:
- Multi-head self-attention
- Positional encoding
- Feed-forward networks
- Layer normalization
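The heart of self-attention is the scaled dot-product: each position's output is a weighted average of all value vectors, with weights softmax(QKᵀ/√d_k). A minimal single-head NumPy sketch with illustrative shapes:
PYTHON
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # (seq_len, seq_len) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ V                               # weighted average of values

seq_len, d_k = 5, 64
Q = np.random.random((seq_len, d_k))
K = np.random.random((seq_len, d_k))
V = np.random.random((seq_len, d_k))
out = scaled_dot_product_attention(Q, K, V)          # shape (seq_len, d_k)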
Using Pre-trained Transformers
PYTHON
from transformers import AutoTokenizer, TFAutoModel

# Load a pre-trained model; tokenizers are framework-agnostic,
# so AutoTokenizer is used (there is no TFAutoTokenizer)
model_name = 'bert-base-uncased'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = TFAutoModel.from_pretrained(model_name)

# Tokenize text
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors='tf')

# Get contextual embeddings
outputs = model(inputs)
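For downstream classification, a common pattern is to take the hidden vector of the first ([CLS]) token as a sentence representation; continuing from the block above:
PYTHON
# last_hidden_state has shape (batch, seq_len, hidden_size);
# position 0 is BERT's [CLS] token, often used as a sentence summary
cls_embeddings = outputs.last_hidden_state[:, 0, :]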
Text Classification
Sentiment Analysis Example
PYTHON
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Embedding(vocab_size, 128),
    layers.Bidirectional(layers.LSTM(64)),
    layers.Dense(32, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(1, activation='sigmoid')
])

model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy']
)
Text Generation
Character-level text generation:
PYTHON
model = models.Sequential([
    layers.Embedding(vocab_size, 256, input_length=seq_length),
    layers.LSTM(128, return_sequences=True),
    layers.LSTM(128),
    layers.Dense(vocab_size, activation='softmax')
])

# Generate text greedily, one character at a time
def generate_text(seed_text, num_chars):
    for _ in range(num_chars):
        # Convert the current seed to a padded integer sequence
        sequences = tokenizer.texts_to_sequences([seed_text])
        padded = pad_sequences(sequences, maxlen=seq_length)

        # Predict the next character
        preds = model.predict(padded, verbose=0)
        pred_index = np.argmax(preds[0])

        # Map the index back to a character; index_to_char inverts the
        # tokenizer's word_index: {i: c for c, i in tokenizer.word_index.items()}
        next_char = index_to_char[pred_index]
        seed_text += next_char

    return seed_text
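Greedy argmax decoding tends to produce repetitive loops. A common refinement is to sample from the predicted distribution reweighted by a softmax temperature (lower = more conservative, higher = more surprising); a minimal sampling helper along those lines:
PYTHON
def sample(preds, temperature=1.0):
    # Reweight the predicted distribution, then draw one index from it
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds + 1e-8) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    return np.random.choice(len(preds), p=preds)

# Drop-in replacement for the argmax line above:
# pred_index = sample(preds[0], temperature=0.5)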
Machine Translation
Sequence-to-sequence models:
PYTHON
from tensorflow.keras.models import Model

# Encoder: read the source sequence and keep only its final state
encoder_inputs = layers.Input(shape=(None,))
encoder_embedding = layers.Embedding(vocab_size, 256)(encoder_inputs)
encoder_lstm = layers.LSTM(512, return_state=True)
_, state_h, state_c = encoder_lstm(encoder_embedding)
encoder_states = [state_h, state_c]

# Decoder: generate the target sequence, starting from the encoder's state
decoder_inputs = layers.Input(shape=(None,))
decoder_embedding = layers.Embedding(vocab_size, 256)(decoder_inputs)
decoder_lstm = layers.LSTM(512, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_embedding, initial_state=encoder_states)
decoder_dense = layers.Dense(vocab_size, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
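Training this model uses teacher forcing: the decoder is fed the target sequence shifted one step behind the outputs it must predict. A sketch of the training call, where the three arrays are hypothetical and would be prepared from a parallel corpus:
PYTHON
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')

# encoder_input_data:  (num_pairs, source_len)  source token ids
# decoder_input_data:  (num_pairs, target_len)  targets shifted right (start token first)
# decoder_target_data: (num_pairs, target_len)  tokens the model should predict
model.fit(
    [encoder_input_data, decoder_input_data],
    decoder_target_data,
    batch_size=64,
    epochs=10,
    validation_split=0.2
)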
Practical Tips
- Use Pre-trained Models: Better performance with less data
- Attention Mechanisms: Improve performance on long sequences
- Layer Normalization: Stabilize training
- Gradient Clipping: Prevent exploding gradients (see the snippet after this list)
- Teacher Forcing: Improve sequence generation
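Gradient clipping, for instance, is a one-line change in Keras: pass clipnorm (or clipvalue) to the optimizer:
PYTHON
from tensorflow import keras

optimizer = keras.optimizers.Adam(clipnorm=1.0)  # rescale gradients whose norm exceeds 1
model.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])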
Common Challenges
- Vanishing Gradients: Use LSTM/GRU or residual connections
- Overfitting: Use dropout, regularization
- Long Sequences: Use attention or transformers
- Computational Cost: Use smaller models or truncation
Applications
- Machine translation
- Chatbots and dialogue systems
- Text summarization
- Sentiment analysis
- Question answering
- Document classification
- Speech recognition