gpt-oss-120b & gpt-oss-20b Model Card

Introduction

We introduce gpt-oss-120b and gpt-oss-20b, two open-weight reasoning models available under the Apache 2.0 license and our gpt-oss usage policy. Developed with feedback from the open-source community, these text-only models are compatible with our Responses API and are designed to be used within agentic workflows with strong instruction following, tool use like web search and Python code execution, and reasoning capabilities—including the ability to adjust the reasoning effort for tasks that don’t require complex reasoning. The models are customizable, provide full chain-of-thought (CoT), and support Structured Outputs.
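As a concrete illustration, the snippet below queries one of these models through an OpenAI-compatible Responses endpoint and dials down the reasoning effort for a simple task. This is a minimal sketch, not part of the release itself: the base URL, API-key handling, and local server (e.g. vLLM or Ollama hosting the open weights) are assumptions.

```python
# Minimal sketch: calling gpt-oss-120b via an OpenAI-compatible
# Responses endpoint. The base_url and server are assumptions; point
# this at whatever host (vLLM, Ollama, etc.) serves the open weights.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # hypothetical local server
    api_key="EMPTY",                      # many local servers ignore this
)

response = client.responses.create(
    model="gpt-oss-120b",
    reasoning={"effort": "low"},  # lower effort for a simple task
    input="Summarize the Apache 2.0 license in two sentences.",
)
print(response.output_text)  # SDK convenience accessor for the final text
```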

Safety is foundational to our approach to open models. They present a different risk profile than proprietary models: Once they are released, determined attackers could fine-tune them to bypass safety refusals or directly optimize for harm without the possibility for OpenAI to implement additional mitigations or to revoke access.

In some contexts, developers and enterprises will need to implement extra safeguards in order to replicate the system-level protections built into models served through our API and products. We’re terming this document a model card, rather than a system card, because the gpt-oss models will be used as part of a wide range of systems, created and maintained by a wide range of stakeholders. While the models are designed to follow OpenAI’s safety policies by default, other stakeholders will also make and implement their own decisions about how to keep those systems safe.
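To make "extra safeguards" concrete, here is a hedged sketch of one possible system-level protection: a deployment-side policy check wrapped around generation. The `violates_policy` function is a hypothetical placeholder for whatever moderation classifier or rules engine a deployer operates; nothing like it ships with gpt-oss.

```python
# Hypothetical system-level safeguard around an open-weight model.
# violates_policy() is a placeholder for a deployer-chosen moderation
# classifier or rules engine; it is not part of the gpt-oss release.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

def violates_policy(text: str) -> bool:
    """Toy check; replace with a real moderation model or rules engine."""
    blocked_terms = ("how to build a bioweapon",)  # illustrative only
    return any(term in text.lower() for term in blocked_terms)

def guarded_generate(prompt: str) -> str:
    # Screen the request before it ever reaches the model.
    if violates_policy(prompt):
        return "Request declined by deployment policy."
    reply = client.responses.create(model="gpt-oss-120b", input=prompt)
    # Screen the output as well, since a fine-tuned or jailbroken model
    # may not refuse on its own.
    if violates_policy(reply.output_text):
        return "Response withheld by deployment policy."
    return reply.output_text
```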

We ran scalable capability evaluations on gpt-oss-120b, and confirmed that the default model does not reach our indicative thresholds for High capability in any of the three Tracked Categories of our Preparedness Framework (Biological and Chemical capability, Cyber capability, and AI Self-Improvement). We also investigated two additional questions:

1. Could adversarial actors fine-tune gpt-oss-120b to reach High capability in the Biological and Chemical or Cyber domains? Simulating the potential actions of an attacker, we adversarially fine-tuned the gpt-oss-120b model for these two categories. OpenAI’s Safety Advisory Group (“SAG”) reviewed this testing and concluded that, even with robust fine-tuning that leveraged OpenAI’s field-leading training stack, gpt-oss-120b did not reach High capability in Biological and Chemical Risk or Cyber Risk.
2. Would releasing gpt-oss-120b significantly advance the frontier of biological capabilities in open foundation models? We found that the answer is no: for most of the evaluations, the default performance of one or more existing open models comes near to matching the adversarially fine-tuned performance of gpt-oss-120b.

As part of this launch, OpenAI is reaffirming its commitment to advancing beneficial AI and raising safety standards across the ecosystem.
