GPT-5-Codex: OpenAI's agent that writes and reviews code

  • GPT-5-Codex is a variant of GPT-5 optimized for agent-based coding in Codex.
  • Dynamically adjusts thinking time from seconds to over seven hours depending on the task.
  • Improves code review and critical bug detection, with fewer erroneous comments.
  • Available in ChatGPT Plus, Pro, Business, Edu, and Enterprise; API access coming later.

OpenAI has presented GPT-5-Codex, a variant of its general-purpose model focused on agent-based programming tasks within Codex. The goal is to let teams switch between interactive sessions and long-running background work without losing context or quality of results.

The company highlights that the model adjusts how much time it spends reasoning according to task complexity: it responds in seconds to simple requests and can spend hours when a task demands it. This software-engineering-oriented approach includes advanced code review and critical bug detection capabilities.

What is GPT-5-Codex and what is it for?

Compared to the general-purpose GPT-5, this version has been trained on real development scenarios, including work with frameworks such as PyTorch, to create projects from scratch, add features and tests, debug, refactor, and monitor changes consistently.

According to OpenAI, the model adheres more closely to agent guidelines defined in AGENTS.md files, so it follows instructions better (similar to projects like OpenAssistant), and it produces higher-quality code from short prompts, without the need for lengthy instructions.

In addition to programming, GPT-5-Codex can assess correctness by running the code and its tests, and flag high-impact issues before they reach production, which is especially useful for teams with demanding review processes.

For interface work, the company considers it a reliable partner for front-end tasks and desktop application development, and reports improvements in generating mobile experiences based on internal human-preference evaluations.

All of this is integrated into the usual workflow: the terminal (CLI), IDE, web, GitHub, and the ChatGPT app, with context carried over between cloud and local environments.

Performance and adaptable "thinking time"

One of the key features of the launch is dynamic reasoning management: the model itself decides in real time how much effort to dedicate, and can extend execution when it detects that a task is growing in complexity.

OpenAI says it has observed autonomous sessions lasting more than seven hours on large-scale refactorings, in which the model iterated, fixed test failures, and validated results until the goal was met.

This contrasts with router-based strategies that allocate resources up front; here, the model re-evaluates the required effort as it progresses, combining quick back-and-forth dialogue with persistent execution.
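To make the contrast concrete, here is a toy Python model, not OpenAI's actual mechanism, in which new work can be discovered mid-task (for example, fresh test failures). The fixed-budget function picks its number of steps before starting, router-style, while the adaptive loop re-checks the remaining work after every step. The function names, the integer "units of work", and the `max_steps` safety cap are hypothetical assumptions for illustration.

```python
def run_with_fixed_budget(initial_work: int, discoveries: dict[int, int], budget: int) -> bool:
    """Router-style (hypothetical): the number of steps is fixed before execution starts."""
    remaining = initial_work
    for step in range(budget):
        if remaining == 0:
            break
        remaining -= 1                         # complete one unit of work
        remaining += discoveries.get(step, 0)  # extra work uncovered at this step
    return remaining == 0                      # True only if the task was finished


def run_adaptively(initial_work: int, discoveries: dict[int, int], max_steps: int = 1000) -> tuple[int, bool]:
    """Adaptive (hypothetical): keep working while work remains, up to a safety cap."""
    remaining = initial_work
    steps = 0
    while remaining > 0 and steps < max_steps:
        remaining -= 1                          # complete one unit of work
        remaining += discoveries.get(steps, 0)  # re-evaluate: add newly found work
        steps += 1
    return steps, remaining == 0                # steps spent, and whether it finished


# A task that looks small (3 units) but reveals 4 extra units at step 1.
print(run_with_fixed_budget(3, {1: 4}, budget=3))  # False
print(run_adaptively(3, {1: 4}))                   # (7, True)
```

The `max_steps` cap stands in for the kind of cost and execution limit an organization might place on long-running agent sessions.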

On a practical level, this translates into quick responses to specific requests and more time invested when the work involves orchestrating changes across multiple modules or resolving complex dependencies.

For software teams, the approach promises fewer irrelevant iterations and more focus on high-impact steps, especially when reviewing large repositories or addressing cross-cutting tasks.

Tools and integration: CLI, IDE, web, and GitHub

The Codex command-line interface (CLI) has been redesigned around agent-based workflows. Images can now be attached directly in the CLI to support design decisions or to spot visual inconsistencies.

The system can track progress with to-do lists and integrates tools such as web search and MCP (Model Context Protocol), an open standard for securely connecting LLMs to external data and tools.
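MCP messages are exchanged as JSON-RPC 2.0, and a client invokes a tool exposed by an MCP server with a `tools/call` request carrying the tool's `name` and its `arguments`. The minimal Python sketch below only builds such a request; the tool name `search_docs` and its `query` argument are hypothetical, and the transport (stdio or HTTP) and the initialization handshake are omitted.

```python
import json
from itertools import count

_request_ids = count(1)  # request ids should not be reused within a session


def make_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request)


# Hypothetical tool; real tool names come from the server's `tools/list` response.
msg = make_tool_call("search_docs", {"query": "AGENTS.md conventions"})
print(msg)
```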

The interface also improves how tool calls and diffs are displayed, making it easier to follow the agent's reasoning and review changes clearly.

In development environments, the IDE extension and GitHub integration let you move work between the local machine and the cloud without losing context, using the code open in the editor as context for more precise answers.

OpenAI indicates that the agent runs in controlled (sandboxed) environments by default and that permissions can be adjusted to limit potentially destructive actions on sensitive projects.
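The article does not describe how Codex's permissions are configured, so the following is only a hypothetical Python sketch of the general idea: a policy that lets read-only commands run, blocks a few clearly destructive ones, and asks a human about everything else. The `Decision` enum, the prefix lists, and `classify` are illustrative assumptions, not Codex's actual settings.

```python
import shlex
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"  # run without asking
    ASK = "ask"      # require human approval first
    DENY = "deny"    # never run


# Hypothetical policy lists, matched against the leading tokens of a command.
DENY_PREFIXES = [("rm", "-rf"), ("git", "push", "--force")]
ALLOW_PREFIXES = [("ls",), ("cat",), ("grep",), ("git", "status"), ("git", "diff"), ("pytest",)]


def _starts_with(tokens: list[str], prefix: tuple[str, ...]) -> bool:
    return tuple(tokens[: len(prefix)]) == prefix


def classify(command: str) -> Decision:
    """Decide whether a shell command may run under this illustrative policy."""
    tokens = shlex.split(command)
    if any(_starts_with(tokens, p) for p in DENY_PREFIXES):
        return Decision.DENY
    if tokens and any(_starts_with(tokens, p) for p in ALLOW_PREFIXES):
        return Decision.ALLOW
    return Decision.ASK  # default: anything unlisted needs approval


print(classify("git status"))     # Decision.ALLOW
print(classify("rm -rf build/"))  # Decision.DENY
print(classify("npm install"))    # Decision.ASK
```

Defaulting to `ASK` rather than `ALLOW` is the conservative choice: a destructive variant missing from the deny list (say, `rm -fr`) still requires approval instead of running silently.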

Availability and access

GPT-5-Codex is available in ChatGPT Plus, Pro, Business, Edu, and Enterprise, as well as in the Codex experiences for the terminal, web, IDE, and GitHub.

The company plans to make it available to API clients later, although at the moment it has not detailed a schedule or specific prices for that channel.

GPT-5-Codex Tests and Metrics

According to information shared by OpenAI and external reports, GPT-5-Codex outperforms GPT-5 in agent-oriented scenarios, such as the SWE-bench Verified benchmark.

In concrete figures, reports cite a score of up to 74.5% on SWE-bench Verified and a jump on refactoring tests from 33.9% with GPT-5 to 51.3% with GPT-5-Codex, suggesting progress in multi-file maintenance and editing.
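The refactoring figures can be read two ways: as an absolute gain in percentage points, or as a gain relative to GPT-5's baseline. The small Python helper below (the name `gains` is hypothetical) uses only the reported 33.9% and 51.3% to make that arithmetic explicit.

```python
def gains(baseline: float, new: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = new - baseline             # percentage points
    relative = absolute / baseline * 100  # percent of the baseline score
    return round(absolute, 1), round(relative, 1)


print(gains(33.9, 51.3))  # (17.4, 51.3)
```

In other words, the refactoring score rises by 17.4 percentage points, which is roughly a 51% relative improvement over GPT-5.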

The company also highlights that its review comments are less often incorrect or irrelevant, which keeps attention on critical issues and reduces noise in pull requests (PRs).

What GPT-5-Codex Means for Technical Teams

For developers, having an agent that combines rapid interaction with autonomous work opens the door to shorter development cycles and more effective prioritization of complex tasks.

In organizations, a model that can spend hours on a single task calls for an enterprise AI strategy, clear policies on cost and execution limits, and validation of its performance across multiple languages and in monorepos with extensive context.

Secure integration into existing workflows is also essential: permission controls, tracking of agent decisions, and readable diffs help maintain quality and traceability.

With a focus on software engineering, GPT-5-Codex aims to be a technical contributor capable of creating, reviewing, and maintaining complex projects, adjusting computational effort to the actual size of the problem and raising the bar for AI-powered coding tools.