Prefix-RFT: A Unified Machine Learning Framework to Blend Supervised Fine-Tuning (SFT) and Reinforcement Fine-Tuning (RFT)
Large language models are typically refined after pretraining using either supervised fine-tuning (SFT) or reinforcement fine-tuning (RFT), each with distinct strengths and limitations. SFT is effective at teaching instruction-following through example-based learning, but it can lead to rigid behavior and poor generalization. RFT, on the other hand, optimizes models for task success using reward…
