Dynamic Fine-Tuning (DFT): Bridging the Generalization Gap in Supervised Fine-Tuning (SFT) for LLMs
Supervised Fine-Tuning (SFT) is a standard technique for adapting LLMs to new tasks by training them on expert demonstration datasets. It is valued for its simplicity and ability to develop expert-like behavior quickly, but often underperforms in generalization compared to reinforcement learning (RL). RL allows models to explore diverse strategies, which leads to stronger generalization….
