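Summary: This project uses reinforcement learning to train teacher models that help large language models learn how to reason, with the aim of better test-time scaling and performance.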
----------------------
SakanaAI/RLT
Training teachers with reinforcement learning able to make LLMs learn how to reason for test time scaling.
Language: Python
Stars: 212 Issues: 0 Forks: 34
https://github.com/SakanaAI/RLT
via GitHub repos - Telegram Channel