High engagement (score: 1317): Train your own LLM from scratch!
A step-by-step repo that walks you through building and training a transformer model from scratch using PyTorch. From downloading training data all the way to generating text.
The architecture is built from the ground up following the original "Attention Is All You Need" paper. MLP, single-head attention, multi-head attention, transformer blocks, and the full transformer model - all coded and explained with detailed diagrams at each step.
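To make the single-head attention step concrete, here is a minimal sketch of scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, as defined in the paper. It is not the repo's code: the repo uses PyTorch, while this version uses plain Python lists so it runs with no dependencies. The function names and the optional `causal` flag are assumptions made for illustration.

```python
import math


def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def single_head_attention(q, k, v, causal=False):
    """Scaled dot-product attention for one head.

    q, k: lists of T vectors of size d_k; v: list of T vectors of size d_v.
    Returns (outputs, weights), where weights[i][j] is how much
    position i attends to position j.
    """
    d_k = len(k[0])
    d_v = len(v[0])
    outputs, weights = [], []
    for i, q_i in enumerate(q):
        scores = []
        for j, k_j in enumerate(k):
            if causal and j > i:
                # Hide future positions; exp(-inf) becomes 0 in the softmax.
                scores.append(float("-inf"))
            else:
                dot = sum(a * b for a, b in zip(q_i, k_j))
                scores.append(dot / math.sqrt(d_k))
        w = softmax(scores)
        weights.append(w)
        # Each output is the attention-weighted average of the value vectors.
        outputs.append(
            [sum(w[j] * v[j][c] for j in range(len(v))) for c in range(d_v)]
        )
    return outputs, weights
```

Multi-head attention runs several of these in parallel on different learned projections of the input and concatenates the results.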
Training data comes from The Pile - a diverse 825GB open-source dataset covering books, articles, code, websites, and more. The repo includes scripts to download it, preprocess and tokenize it using tiktoken, store it in HDF5 format, and feed it into training batches.
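As a rough sketch of the last step, feeding tokens into training batches, the usual next-token setup pairs each input window with the same window shifted one position to the right. The post does not show the repo's loader, so `sample_batch` and its parameters are hypothetical, and a plain list of token IDs stands in for the HDF5 dataset.

```python
import random


def sample_batch(tokens, context_length, batch_size, rng=None):
    """Sample random (input, target) windows from a flat token stream.

    Hypothetical helper: the target for each position is simply the
    next token, so targets are the inputs shifted by one.
    """
    rng = rng or random.Random()
    max_start = len(tokens) - context_length - 1
    if max_start < 0:
        raise ValueError("token stream is shorter than context_length + 1")
    inputs, targets = [], []
    for _ in range(batch_size):
        start = rng.randint(0, max_start)  # inclusive on both ends
        inputs.append(tokens[start:start + context_length])
        targets.append(tokens[start + 1:start + context_length + 1])
    return inputs, targets


# Usage: a toy "corpus" of 100 token IDs, 4 windows of 8 tokens each.
x, y = sample_batch(list(range(100)), context_length=8, batch_size=4,
                    rng=random.Random(0))
```

In a PyTorch training loop the two lists would typically be converted to integer tensors of shape (batch_size, context_length) before the forward pass.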
You can train a 13M-parameter model on a single Colab T4 GPU. At that size the model starts producing grammatical, coherent short sentences. For billion-parameter training you need at least an A100 or RTX 4090. The repo includes a full GPU compatibility table so you know exactly what's possible on your hardware.
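For a feel of where the parameters go at this scale, here is a back-of-the-envelope estimate for a transformer. It is a sketch under stated assumptions, not the repo's accounting: it ignores biases, layer norms and any learned positional embeddings, assumes a feed-forward width of 4 × d_model unless told otherwise, and the example configuration is hypothetical. The vocabulary size 50257 is that of tiktoken's r50k_base encoding.

```python
def estimate_params(vocab_size, d_model, n_layers, d_ff=None,
                    tie_embeddings=True):
    """Approximate weight count of a transformer (biases and norms ignored)."""
    if d_ff is None:
        d_ff = 4 * d_model                 # assumed feed-forward width
    embedding = vocab_size * d_model       # token embedding table
    attention = 4 * d_model * d_model      # Q, K, V and output projections
    mlp = 2 * d_model * d_ff               # up- and down-projection
    total = embedding + n_layers * (attention + mlp)
    if not tie_embeddings:
        total += embedding                 # separate output projection
    return total


# Hypothetical small configuration, chosen only for illustration.
tied = estimate_params(50257, d_model=128, n_layers=4)
untied = estimate_params(50257, d_model=128, n_layers=4, tie_embeddings=False)
print(f"tied: {tied:,}  untied: {untied:,}")
```

At small sizes the embedding table dominates: with a 50257-token vocabulary, most of the weights in a model like this sit in the embeddings, not in the transformer blocks.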
Includes a complete SFT and RLHF guide as a separate notebook for taking your trained model further.
Key capabilities:
• End-to-end pipeline: data download → preprocessing → training → text generation
• Full transformer implementation from scratch with PyTorch
• Trains models from 13M to 2B+ parameters on a single GPU
• Training data from The Pile (825GB, 22 diverse datasets)
• Tokenization via tiktoken (r50k_base)
• SFT and RLHF guide included
100% open source.
I've shared the link in the replies!
Top replies (3)
50+ AI Agent & RAG apps. 100% free and Open source.
https://t.co/bceBGzigAB










