GPT Evolution & Alignment

1

Foundation

5

GPT + Encoder

6

Alignment

6

Years

12

Papers Total

512

d_model

8

Attention heads

6+6

Encoder + Decoder layers

No RNN

Fully parallel
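
To make the figures above concrete: with d_model = 512 split evenly across 8 heads, each head works in 512 / 8 = 64 dimensions. The following is a minimal pure-Python sketch of scaled dot-product attention for a single head. The function names (`head_dim`, `softmax`, `attention`) are invented for this illustration and do not come from any library.

```python
import math


def head_dim(d_model: int, num_heads: int) -> int:
    """Per-head width when d_model is split evenly across the heads."""
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    return d_model // num_heads


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention for one head.

    queries and keys are lists of vectors of width d_k;
    values is a list of vectors, one per key.
    """
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        width = len(values[0])
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(width)]
        )
    return outputs
```

Each query row is computed without reference to any other row's result, so a real implementation can evaluate all positions at once as matrix products. This is the sense in which the layer is parallel where an RNN is sequential.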

117M

Parameters

800M

Training words

9/12

New SOTA tasks

Decoder-only

Architecture

340M

Parameters (Large)

MLM

Masked language model

NSP

Next sentence prediction

NLU

Best for classification

1.5B

Max parameters

8B

Training tokens

Zero-Shot

No task-specific training

10×

Bigger than GPT-1

BERT

Encoder (bidirectional)

GPT

Decoder (autoregressive)

44.16

ROUGE-1 on CNN/DM

Denoise

Learn by reconstruction
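
The encoder (bidirectional) versus decoder (autoregressive) contrast above comes down to which positions each token is allowed to attend to. A small sketch of the two attention masks, using hypothetical helper names chosen for this example:

```python
def bidirectional_mask(n: int):
    """Encoder-style mask: every position may attend to every position."""
    return [[True] * n for _ in range(n)]


def causal_mask(n: int):
    """Decoder-style mask: position i may attend only to positions j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    for row in causal_mask(4):
        print("".join("x" if allowed else "." for allowed in row))
```

Run as a script, this prints a lower-triangular pattern for the causal mask. A sequence-to-sequence model such as BART uses the first kind of mask in its encoder and the second in its decoder.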

175B

Parameters

300B

Training tokens

96

Attention heads

No

Fine-tuning needed

3

Training phases

85%

Preferred over raw GPT-3

1.3B

Beats 175B unaligned

ChatGPT

Direct descendant

170K

Preference pairs

Helpful vs Harmless

Competing objectives

Claude

Anthropic's assistant

16

Written principles

0

Human labels for harmlessness

RLAIF

AI feedback replaces humans

Loop

Generate → Critique → Revise

71–73%

Win rate vs SFT baseline on summarisation (RLAIF 71%, RLHF 73%)

∞

Scalable labelling

$0

Marginal cost per label

Validates

Constitutional AI approach

1

Training phase (vs 3 in RLHF)

No RM

No reward model

No PPO

No RL required

LLaMA/Mistral

Many open-source models are aligned with DPO
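
The "no reward model, no PPO" point above can be seen in the shape of the DPO objective: a single classification-style loss over preference pairs. Below is a minimal sketch for one pair, assuming the summed log-probabilities of the chosen and rejected responses under the policy and under a frozen reference model have already been computed. The function name, argument names, and the default `beta=0.1` are assumptions for illustration, not values taken from this page.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value; beta scales the implicit reward
) -> float:
    """DPO loss for one preference pair: -log(sigmoid(beta * margin))."""
    # How much more (in log-probability) the policy favours each
    # response than the reference model does.
    chosen_shift = policy_chosen_logp - ref_chosen_logp
    rejected_shift = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_shift - rejected_shift)
    # Stable evaluation of -log(sigmoid(logits)) = log(1 + exp(-logits)).
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))
```

When the policy equals the reference model the margin is zero and the loss is log 2. Shifting probability towards the chosen response and away from the rejected one lowers it, with no sampling loop and no separately trained reward model.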

3 Roles

One model plays all

+20%

Average improvement

0

Additional training

Agents

Pattern behind AI agents
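
The three roles above (generate, give feedback, refine) can be sketched as a loop around a single model with no additional training. In this sketch `model` is a hypothetical callable that maps a prompt string to a completion string. The prompt wording, `max_rounds`, and the `stop_marker` convention are assumptions made for illustration, not the prompts used in the paper.

```python
from typing import Callable


def self_refine(
    task: str,
    model: Callable[[str], str],
    max_rounds: int = 3,
    stop_marker: str = "NO ISSUES",
) -> str:
    """Generate a draft, then alternate critique and rewrite with one model."""
    # Role 1: generator.
    draft = model(f"Task: {task}\nWrite an answer.")
    for _ in range(max_rounds):
        # Role 2: the same model critiques its own draft.
        feedback = model(
            f"Task: {task}\nAnswer: {draft}\n"
            f"Critique this answer. Reply '{stop_marker}' if it needs no changes."
        )
        if stop_marker in feedback:
            break
        # Role 3: the same model rewrites the draft using the critique.
        draft = model(
            f"Task: {task}\nAnswer: {draft}\nFeedback: {feedback}\n"
            "Rewrite the answer using the feedback."
        )
    return draft
```

Passing any function that wraps a language-model call as `model` is enough to run the loop. The cap on rounds keeps the cost bounded if the critique step never reports that the draft is finished.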

1

Architecture

3

GPT Generations

2

Encoder Models

6

Alignment Methods

ChatGPT + Claude

What you use today

Attention Is All You Need

GPT-1 — Generative Pre-Training

BERT — Bidirectional Transformers

GPT-2 — Language Models are Multitask Learners

BART — Denoising Seq2Seq

GPT-3 — Few-Shot Learners at Scale

InstructGPT — Learning to Follow Instructions

HH-RLHF — Helpful and Harmless Assistant

Constitutional AI — Harmlessness via AI Feedback

RLAIF — Scaling RLHF with AI Feedback

DPO — Direct Preference Optimisation

SELF-REFINE — Iterative Refinement

The Full Story in One Diagram