Part 3
Training at Scale
Bridge to Fine-Tuning Math
Fine-tuning keeps many of the same scale constraints but usually changes which parameters are updated and how. The next section studies full fine-tuning, adapters, LoRA-style low-rank updates, prompt tuning, and preference-oriented objectives. The training-at-scale accounting here remains useful because every fine-tuning method still operates under memory, optimizer-state, throughput, and evaluation budgets.
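To see why those budgets differ across methods, here is a minimal back-of-envelope sketch comparing trainable parameters and optimizer-state memory for full fine-tuning versus a LoRA-style low-rank update. The model shape (32 layers, hidden size 4096), the LoRA rank, and the roughly 12-bytes-per-trainable-parameter estimate for Adam-style optimizer state are illustrative assumptions, not values from the lesson.

```python
# Back-of-envelope comparison: trainable parameters and optimizer-state
# memory for full fine-tuning vs. a LoRA-style low-rank update.
# All shapes below (layer count, hidden size, rank) are assumed examples.

def full_ft_params(n_layers: int, d_model: int) -> int:
    """Rough trainable-parameter count for the attention + MLP weights
    of a decoder stack when every weight is updated."""
    attn = 4 * d_model * d_model           # q, k, v, o projections
    mlp = 2 * d_model * (4 * d_model)      # up and down projections
    return n_layers * (attn + mlp)

def lora_params(n_layers: int, d_model: int, rank: int, n_adapted: int = 4) -> int:
    """A LoRA-style update adds a pair of low-rank factors
    (d_model x r and r x d_model) to each adapted projection;
    here only the four attention projections are adapted."""
    return n_layers * n_adapted * (2 * d_model * rank)

def adam_optimizer_bytes(trainable: int) -> int:
    """Adam keeps two fp32 moments per trainable parameter (8 bytes),
    plus fp32 master weights in a typical mixed-precision setup (4 bytes)."""
    return trainable * 12

if __name__ == "__main__":
    n_layers, d_model, rank = 32, 4096, 16     # assumed 7B-class shapes
    full = full_ft_params(n_layers, d_model)
    lora = lora_params(n_layers, d_model, rank)
    print(f"full fine-tuning: {full / 1e9:.2f}B trainable params, "
          f"{adam_optimizer_bytes(full) / 1e9:.1f} GB optimizer state")
    print(f"LoRA (r={rank}): {lora / 1e6:.1f}M trainable params, "
          f"{adam_optimizer_bytes(lora) / 1e9:.2f} GB optimizer state")
```

Under these assumptions the full update carries tens of gigabytes of optimizer state, while the low-rank update's optimizer state fits in a fraction of a gigabyte; the throughput and evaluation budgets, by contrast, shrink far less, which is why the same accounting still applies.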
References
- Jared Kaplan et al., "Scaling Laws for Neural Language Models", 2020: https://arxiv.org/abs/2001.08361
- Jordan Hoffmann et al., "Training Compute-Optimal Large Language Models", 2022: https://arxiv.org/abs/2203.15556
- Mohammad Shoeybi et al., "Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism", 2019: https://arxiv.org/abs/1909.08053
- Samyam Rajbhandari et al., "ZeRO: Memory Optimizations Toward Training Trillion Parameter Models", 2019: https://arxiv.org/abs/1910.02054
- Priya Goyal et al., "Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour", 2017: https://arxiv.org/abs/1706.02677
- Noam Shazeer et al., "Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer", 2017: https://arxiv.org/abs/1701.06538
- OpenAI, "AI and Compute", 2018: https://openai.com/research/ai-and-compute