NanoGPT

An 11M-parameter GPT trained from scratch on Shakespeare and Hemingway

Built and trained a nano GPT language model (11M parameters) from scratch, exploring transformer architecture fundamentals.

Architecture:

  • Decoder-only transformer with 6 causal multi-head self-attention blocks (a minimal sketch of one block follows this list)
  • 2-layer linear language model head
  • Custom Byte Pair Encoding (BPE) tokenizer trained on Shakespeare’s works
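
A minimal sketch of one such causal attention block in PyTorch; the hyperparameters (`n_embd`, `n_head`, `block_size`) are illustrative placeholders, not the repo's actual values:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    """One causal multi-head self-attention layer (illustrative sketch)."""

    def __init__(self, n_embd=384, n_head=6, block_size=256):
        super().__init__()
        assert n_embd % n_head == 0
        self.n_head = n_head
        self.qkv = nn.Linear(n_embd, 3 * n_embd)  # joint projection to queries, keys, values
        self.proj = nn.Linear(n_embd, n_embd)     # output projection
        # lower-triangular mask so each position attends only to itself and the past
        self.register_buffer("mask", torch.tril(torch.ones(block_size, block_size)))

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.qkv(x).split(C, dim=2)
        # reshape each to (B, n_head, T, head_dim)
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        att = (q @ k.transpose(-2, -1)) / (k.size(-1) ** 0.5)  # scaled dot-product scores
        att = att.masked_fill(self.mask[:T, :T] == 0, float("-inf"))  # hide future positions
        att = F.softmax(att, dim=-1)
        y = att @ v  # weighted sum of values
        y = y.transpose(1, 2).contiguous().view(B, T, C)  # re-merge heads
        return self.proj(y)
```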

Training:

  • Tokenizer trained on the Shakespeare corpus (a toy merge loop is sketched after this list)
  • Language model pretrained on Shakespeare, then fine-tuned on Ernest Hemingway’s works
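
BPE training follows a standard recipe: start from raw bytes and repeatedly merge the most frequent adjacent token pair. A toy version of that loop (the file name `shakespeare.txt` and the merge count are hypothetical, not taken from the repo):

```python
from collections import Counter

def train_bpe(text, num_merges):
    """Toy byte-level BPE trainer: repeatedly merge the most frequent adjacent pair."""
    ids = list(text.encode("utf-8"))  # start from raw bytes (ids 0-255)
    merges = {}                       # maps (left_id, right_id) -> new token id
    for i in range(num_merges):
        pairs = Counter(zip(ids, ids[1:]))  # count adjacent token pairs
        if not pairs:
            break
        pair = pairs.most_common(1)[0][0]   # most frequent pair
        new_id = 256 + i
        merges[pair] = new_id
        # replace every occurrence of the pair with the new token
        out, j = [], 0
        while j < len(ids):
            if j < len(ids) - 1 and (ids[j], ids[j + 1]) == pair:
                out.append(new_id)
                j += 2
            else:
                out.append(ids[j])
                j += 1
        ids = out
    return merges

# Hypothetical usage; the corpus path and vocabulary size are placeholders.
merges = train_bpe(open("shakespeare.txt").read(), num_merges=500)
```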

This project deepened my understanding of transformer internals, from attention mechanics and positional embeddings to tokenization and autoregressive sampling.
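
Autoregressive sampling itself is a short loop: run the context through the model, sample the next token from the softmax over the last position's logits, append it, and repeat. A hedged sketch, assuming the model returns raw logits of shape `(B, T, vocab_size)` (the repo's actual interface may differ):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def generate(model, idx, max_new_tokens, block_size, temperature=1.0):
    """Sample tokens one at a time, feeding each prediction back as input."""
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -block_size:]          # crop context to the model's window
        logits = model(idx_cond)                 # (B, T, vocab_size), assumed output
        logits = logits[:, -1, :] / temperature  # keep only the last position
        probs = F.softmax(logits, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)  # sample one token per sequence
        idx = torch.cat((idx, next_id), dim=1)   # append and continue
    return idx
```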

GitHub: ruyi101/NanoGPT