Tag Archive

Tag: LLM-post-training

This page collects every post tagged "LLM-post-training" so you can quickly revisit them by topic.

LLM-post-training

1 post in total

Topic Archive · 2026-05-04

From OPD to OPSD / ExOPD: Reading the On-Policy Distillation Papers Discussed in the Group Chat

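A walkthrough of the Thinking Machines blog post on On-Policy Distillation and four papers (arXiv:2604.13016, 2603.25562, 2601.18734, and 2602.12125), explaining the technical logic behind OPD, SFT cold start, the teacher-supported region, OPSD, self-distillation, multi-expert distillation, and log-prob shift.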

OPD distillation reinforcement-learning LLM-post-training OPSD ExOPD