Part 3
Training at Scale
Bridge to Fine-Tuning Math
Fine-tuning keeps many of the same scale constraints but usually changes which parameters are updated and how. The next section studies full fine-tuning, adapters, LoRA-style low-rank updates, prompt tuning, and preference-oriented objectives. The training-at-scale accounting here remains useful because every fine-tuning method still operates under memory, optimizer-state, throughput, and evaluation budgets.
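To see why those budgets differ across methods, here is a minimal back-of-envelope sketch comparing trainable parameters and optimizer-state memory for full fine-tuning versus a LoRA-style low-rank update. The model shape (32 layers, hidden size 4096), the LoRA rank, and the roughly 12-bytes-per-trainable-parameter estimate for Adam-style optimizer state are illustrative assumptions, not values from the lesson.

```python
# Back-of-envelope comparison: trainable parameters and optimizer-state
# memory for full fine-tuning vs. a LoRA-style low-rank update.
# All shapes below (layer count, hidden size, rank) are assumed examples.

def full_ft_params(n_layers: int, d_model: int) -> int:
    """Rough trainable-parameter count for the attention + MLP weights
    of a decoder stack when every weight is updated."""
    attn = 4 * d_model * d_model           # q, k, v, o projections
    mlp = 2 * d_model * (4 * d_model)      # up and down projections
    return n_layers * (attn + mlp)

def lora_params(n_layers: int, d_model: int, rank: int, n_adapted: int = 4) -> int:
    """A LoRA-style update adds a pair of low-rank factors
    (d_model x r and r x d_model) to each adapted projection;
    here only the four attention projections are adapted."""
    return n_layers * n_adapted * (2 * d_model * rank)

def adam_optimizer_bytes(trainable: int) -> int:
    """Adam keeps two fp32 moments per trainable parameter (8 bytes),
    plus fp32 master weights in a typical mixed-precision setup (4 bytes)."""
    return trainable * 12

if __name__ == "__main__":
    n_layers, d_model, rank = 32, 4096, 16     # assumed 7B-class shapes
    full = full_ft_params(n_layers, d_model)
    lora = lora_params(n_layers, d_model, rank)
    print(f"full fine-tuning: {full / 1e9:.2f}B trainable params, "
          f"{adam_optimizer_bytes(full) / 1e9:.1f} GB optimizer state")
    print(f"LoRA (r={rank}): {lora / 1e6:.1f}M trainable params, "
          f"{adam_optimizer_bytes(lora) / 1e9:.2f} GB optimizer state")
```

Under these assumptions the full update carries tens of gigabytes of optimizer state, while the low-rank update's optimizer state fits in a fraction of a gigabyte; the throughput and evaluation budgets, by contrast, shrink far less, which is why the same accounting still applies.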
References
- Jared Kaplan et al., "Scaling Laws for Neural Language Models", 2020: https://arxiv.org/abs/2001.08361
- Jordan Hoffmann et al., "Training Compute-Optimal Large Language Models", 2022: https://arxiv.org/abs/2203.15556
- Mohammad Shoeybi et al., "Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism", 2019: https://arxiv.org/abs/1909.08053
- Samyam Rajbhandari et al., "ZeRO: Memory Optimizations Toward Training Trillion Parameter Models", 2019: https://arxiv.org/abs/1910.02054
- Priya Goyal et al., "Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour", 2017: https://arxiv.org/abs/1706.02677
- Noam Shazeer et al., "Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer", 2017: https://arxiv.org/abs/1701.06538
- OpenAI, "AI and Compute", 2018: https://openai.com/research/ai-and-compute