A New MIT Study Shows Reinforcement Learning Minimizes Catastrophic Forgetting Compared to Supervised Fine-Tuning
Table of contents

- What is catastrophic forgetting in foundation models?
- Why does online reinforcement learning forget less than supervised fine-tuning?
- How can forgetting be measured?
- What do experiments on large language models reveal?
- How does RL compare to SFT in robotics tasks?
- What insights come from the ParityMNIST study?
- Why do on-policy updates matter?
- Are…
