BERT Text Classification Deep Dive
Explore the power of BERT for sentiment analysis with an interactive demo and detailed code walkthrough.
BERT (Bidirectional Encoder Representations from Transformers) is a revolutionary NLP model introduced by Google in 2018. Unlike traditional models that process text sequentially, BERT is bidirectional, capturing context from both left and right directions simultaneously. This makes it exceptionally powerful for tasks like text classification, question answering, and more.
Key features of BERT:
- Bidirectional context: every token attends to both its left and right context in every layer.
- Pre-training objectives: masked language modeling (MLM) and next sentence prediction (NSP) on large unlabeled corpora.
- Transfer learning: the pre-trained encoder is fine-tuned on downstream tasks by adding a small task-specific head.
BERT’s ability to understand nuanced language patterns makes it ideal for tasks requiring deep contextual understanding, such as classifying movie reviews as positive or negative.
BERT is built on the Transformer’s encoder architecture, stacking multiple identical encoder layers. Each layer includes:
- Multi-head self-attention, which lets every token attend to every other token in the sequence.
- A position-wise feed-forward network applied to each token independently.
- Residual connections and layer normalization around both sub-layers.
The core operation, scaled dot-product attention, is defined as:
\[ \text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V \]
Where \( Q \) (queries), \( K \) (keys), and \( V \) (values) are matrices of token representations, and \( d_k \) is the dimension of the keys; dividing by \( \sqrt{d_k} \) keeps the dot products in a numerically stable range before the softmax.
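As a concrete illustration, here is a minimal TensorFlow sketch of scaled dot-product attention that mirrors the formula above (the function name and toy tensor shapes are illustrative, not taken from BERT’s actual implementation):

import tensorflow as tf

def scaled_dot_product_attention(Q, K, V):
    # Scores = QK^T / sqrt(d_k), softmax over the key axis, then a weighted sum of V.
    d_k = tf.cast(tf.shape(K)[-1], tf.float32)
    scores = tf.matmul(Q, K, transpose_b=True) / tf.sqrt(d_k)
    weights = tf.nn.softmax(scores, axis=-1)
    return tf.matmul(weights, V)

# Toy example: batch of 1, 4 tokens, 8-dimensional representations.
Q = tf.random.normal((1, 4, 8))
K = tf.random.normal((1, 4, 8))
V = tf.random.normal((1, 4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)  # (1, 4, 8)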
BERT’s input is tokenized text augmented with special tokens:
- [CLS] — prepended to every sequence; its final hidden state serves as the aggregate representation used for classification.
- [SEP] — appended to the end of each sentence and used to separate sentence pairs.
- [PAD] — fills shorter sequences up to a fixed length and is ignored via the attention mask.
Each token embedding is also summed with a position embedding and a segment embedding before entering the first encoder layer, as shown in the sketch below.
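For example, the bert-base-uncased tokenizer used later in this article adds the special tokens automatically (a quick sketch; the exact subword split depends on the tokenizer vocabulary):

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
encoding = tokenizer("This movie is great and I loved it!")
print(tokenizer.convert_ids_to_tokens(encoding['input_ids']))
# Expected to look something like:
# ['[CLS]', 'this', 'movie', 'is', 'great', 'and', 'i', 'loved', 'it', '!', '[SEP]']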
The bidirectional nature and pre-training allow BERT to capture deep semantic relationships, making it highly effective for downstream tasks.
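To get a feel for the masked language modeling objective, the transformers fill-mask pipeline can be pointed at the same bert-base-uncased checkpoint (a quick illustration, not part of the fine-tuning script; the exact predictions and scores will vary):

from transformers import pipeline

# BERT predicts the word hidden behind [MASK] using context from both sides.
unmasker = pipeline('fill-mask', model='bert-base-uncased')
for candidate in unmasker("The movie was absolutely [MASK].")[:3]:
    print(candidate['token_str'], round(candidate['score'], 3))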
BERT excels in text classification by leveraging its contextual embeddings. For sentiment analysis, we fine-tune BERT on a dataset of movie reviews labeled as positive or negative. Below is a detailed breakdown of the process:
The dataset consists of movie reviews with binary labels (1 for positive, 0 for negative). We tokenize the text using BERT’s tokenizer, which converts words into subword tokens and adds special tokens like [CLS] and [SEP].
texts = [
    "This movie is great and I loved it!",
    "Terrible film, very boring.",
    ...
]
labels = [1, 0, ...]

train_texts, val_texts, train_labels, val_labels = train_test_split(
    texts, labels, test_size=0.2, random_state=42
)
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
The tokenizer outputs input_ids (the token indices) and attention_mask (marking valid tokens vs. padding).
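For instance, padding a short review to a fixed length produces an attention_mask with 1s over real tokens and 0s over the padding (a small sketch reusing the tokenizer loaded above; max_length=12 is arbitrary):

enc = tokenizer(
    ["Terrible film, very boring."],
    max_length=12,
    padding='max_length',
    truncation=True,
    return_tensors='tf'
)
print(enc['input_ids'].shape)   # (1, 12)
print(enc['attention_mask'])    # 1s for real tokens, 0s for the padded positions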
We use the pre-trained bert-base-uncased model and add a classification head. The [CLS] token’s pooled output (pooler_output) is passed through a dropout layer and a dense layer with softmax activation for binary classification.
bert_model = TFBertModel.from_pretrained('bert-base-uncased')
input_ids = tf.keras.layers.Input(shape=(None,), dtype=tf.int32, name="input_ids")
attention_mask = tf.keras.layers.Input(shape=(None,), dtype=tf.int32, name="attention_mask")
bert_outputs = bert_model(input_ids, attention_mask=attention_mask)
pooled_output = bert_outputs.pooler_output
dropout = tf.keras.layers.Dropout(0.3)(pooled_output)
output = tf.keras.layers.Dense(2, activation='softmax')(dropout)
model = tf.keras.Model(inputs=[input_ids, attention_mask], outputs=output)
The softmax layer outputs probabilities for positive and negative classes.
The model is fine-tuned with a small learning rate (e.g., 2e-5) using the Adam optimizer and sparse categorical crossentropy loss. We train for 10 epochs on batched data.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5),
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

model.fit(
    train_dataset,
    validation_data=val_dataset,
    epochs=10,
    verbose=2
)
Small Learning Rate: BERT requires a small learning rate (e.g., 2e-5) to avoid catastrophic forgetting of the pre-trained weights; many recipes also decay it over training (see the sketch after this list).
Batch Size: Very small batches (e.g., 2 in this toy example) keep memory usage low; 16-32 is more common when GPU memory allows.
Epochs: 3-10 epochs are usually sufficient when fine-tuning on small datasets.
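A common refinement, not used in the script below, is to decay the learning rate linearly to zero over training. A minimal sketch using Keras’ built-in PolynomialDecay schedule (the variable names assume the training setup from this article):

import tensorflow as tf

num_epochs = 10
batch_size = 2
steps_per_epoch = max(1, len(train_texts) // batch_size)  # train_texts from the split above
total_steps = num_epochs * steps_per_epoch

# power=1.0 gives a linear decay from 2e-5 down to 0 over all training steps.
lr_schedule = tf.keras.optimizers.schedules.PolynomialDecay(
    initial_learning_rate=2e-5,
    decay_steps=total_steps,
    end_learning_rate=0.0,
    power=1.0
)
optimizer = tf.keras.optimizers.Adam(learning_rate=lr_schedule)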
For inference, we tokenize a new text, pass it through the model, and predict the sentiment based on the highest probability class.
def predict_sentiment(text, model, tokenizer, max_len=128):
    encodings = tokenizer(
        [text],
        max_length=max_len,
        padding='max_length',
        truncation=True,
        return_tensors='tf'
    )
    probs = model({
        'input_ids': encodings['input_ids'],
        'attention_mask': encodings['attention_mask']
    }).numpy()
    prediction = np.argmax(probs, axis=-1)[0]
    print(f"Probabilities: {probs}")
    return "Positive" if prediction == 1 else "Negative"
This section provides the complete Python script for fine-tuning BERT for sentiment analysis on movie reviews.
import tensorflow as tf
from transformers import TFBertModel, BertTokenizer
from sklearn.model_selection import train_test_split
import numpy as np
# Dataset
texts = [
    "This movie is great and I loved it!",
    "Terrible film, very boring.",
    "Amazing storyline and acting!",
    "I didn't enjoy this at all.",
    "Fantastic experience, highly recommend!",
    "Really bad, waste of time.",
    "One of the best movies I've seen this year!",
    "Completely disappointing and predictable.",
    "Brilliant direction and stunning visuals.",
    "I fell asleep halfway through, so dull.",
    "Heartwarming and beautifully shot.",
    "Poor acting and weak script.",
    "Absolutely loved the plot twists!",
    "Not worth the hype at all.",
    "Engaging from start to finish!",
    "The worst film I’ve ever watched.",
    "Incredible performances by the cast!",
    "Script was a mess and pacing was off."
]
labels = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
# Train/Val Split
train_texts, val_texts, train_labels, val_labels = train_test_split(
    texts, labels, test_size=0.2, random_state=42
)
# Load tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
# Tokenization function
def tokenize_texts(texts, max_len=128):
    encodings = tokenizer(
        texts,
        max_length=max_len,
        padding='max_length',
        truncation=True,
        return_tensors='tf'
    )
    return encodings['input_ids'], encodings['attention_mask']
# Tokenize data
train_input_ids, train_attention_mask = tokenize_texts(train_texts)
val_input_ids, val_attention_mask = tokenize_texts(val_texts)
# Convert labels to tensors
train_labels = tf.convert_to_tensor(train_labels, dtype=tf.int32)
val_labels = tf.convert_to_tensor(val_labels, dtype=tf.int32)
# Prepare datasets
train_dataset = tf.data.Dataset.from_tensor_slices((
    {'input_ids': train_input_ids, 'attention_mask': train_attention_mask},
    train_labels
)).batch(2).prefetch(tf.data.AUTOTUNE)

val_dataset = tf.data.Dataset.from_tensor_slices((
    {'input_ids': val_input_ids, 'attention_mask': val_attention_mask},
    val_labels
)).batch(2).prefetch(tf.data.AUTOTUNE)
# Load base BERT model
bert_model = TFBertModel.from_pretrained('bert-base-uncased')
# Input layers
input_ids = tf.keras.layers.Input(shape=(None,), dtype=tf.int32, name="input_ids")
attention_mask = tf.keras.layers.Input(shape=(None,), dtype=tf.int32, name="attention_mask")
# Get pooled output from BERT
bert_outputs = bert_model(input_ids, attention_mask=attention_mask)
pooled_output = bert_outputs.pooler_output
# Classification head
dropout = tf.keras.layers.Dropout(0.3)(pooled_output)
output = tf.keras.layers.Dense(2, activation='softmax')(dropout)
# Build model
model = tf.keras.Model(inputs=[input_ids, attention_mask], outputs=output)
# Compile model
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5),
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

# Train the model
model.fit(
    train_dataset,
    validation_data=val_dataset,
    epochs=10,
    verbose=2
)
# Inference function
def predict_sentiment(text, model, tokenizer, max_len=128):
    encodings = tokenizer(
        [text],
        max_length=max_len,
        padding='max_length',
        truncation=True,
        return_tensors='tf'
    )
    probs = model({
        'input_ids': encodings['input_ids'],
        'attention_mask': encodings['attention_mask']
    }).numpy()
    prediction = np.argmax(probs, axis=-1)[0]
    print(f"Probabilities: {probs}")
    return "Positive" if prediction == 1 else "Negative"
# Test inference
test_text = "This is an awesome movie!"
result = predict_sentiment(test_text, model, tokenizer)
print(f"\nText: {test_text}")
print(f"Predicted Sentiment: {result}")
Follow these steps to run the BERT model locally:
1. Install the dependencies: pip install tensorflow transformers scikit-learn numpy
2. Save the complete script above as bert_sentiment.py.
3. Run the script: python bert_sentiment.py
4. On the first run, the script downloads the pre-trained bert-base-uncased model for robust embeddings.