GPT-5.2: the new model with which OpenAI hopes to regain lost ground

  • Accelerated launch of GPT-5.2 following Google's Gemini 3 advance and OpenAI's internal "code red" declaration.
  • Three main variants (Instant, Thinking and Pro) geared towards different levels of speed, reasoning and professional accuracy.
  • Notable improvements in reasoning, coding, handling long contexts, vision and tool use, with superior results in multiple benchmarks.
  • Staggered rollout to paying users and via the API, with higher prices than GPT-5.1 and a focus on intensive business use.


Tensions in the race to lead generative artificial intelligence have risen even further in recent weeks. After Google launched Gemini 3, OpenAI decided to move quickly and bring forward the arrival of GPT-5.2, a new iteration of its flagship model that aims to strengthen ChatGPT's performance on complex tasks, improve stability, and reduce errors in everyday use.

This launch isn't intended as a radical leap, but rather as a significant update within the GPT-5 series. However, the combination of an accelerated rollout, changes to the internal roadmap and a more aggressive focus on reasoning and workplace productivity places GPT-5.2 at the heart of OpenAI's strategy to avoid losing ground to Google, Anthropic and DeepSeek (with its DeepSeek-V3.2 model), rivals that have placed themselves at the top of the technical rankings.

Code red at OpenAI and an early release for GPT-5.2

The decision to bring GPT-5.2 forward comes amid maximum competitive pressure. The positive reception of Gemini 3 (especially in advanced reasoning and coding tests) prompted OpenAI to declare an internal "code red." Sam Altman, the company's CEO, sent a memo asking that resources be concentrated on improving ChatGPT and that secondary initiatives, such as certain monetization experiments and lower-priority platform features, be put on hold.

According to various leaks, the update was initially scheduled for the end of December, but management reportedly decided to bring its deployment forward by a few weeks to close the performance and public-perception gap created by Google's latest models. Although the exact date is always subject to last-minute technical adjustments, sources agree that the internal schedule has been compressed so that GPT-5.2 reaches paying users and developers as soon as possible.

This turn of events is reminiscent, albeit on a different scale, of the episode in 2022, when the launch of ChatGPT forced Google to accelerate its own product roadmap. Now the roles have been reversed, and it is OpenAI that is trying to reaffirm its position as the benchmark in a market where performance rankings and model comparison tools change almost daily.

GPT-5.2, an evolution within the 5 series focused on knowledge work

GPT-5.2 is presented as a direct continuation of GPT-5.1, not a completely new generation. Even so, the company insists that the update represents a significant advance for so-called knowledge work: programming, document analysis, financial modeling, scientific research, or the preparation of complex reports.

OpenAI claims that the model handles long contexts better, reduces reasoning errors, and improves its ability to coordinate sequences of actions and external tools. This combination is key for tasks that go beyond answering a simple question, such as multi-step projects, extensive document reviews, or partial automation of business workflows.

In practice, GPT-5.2 promises advances in the creation of detailed spreadsheets, structured presentations, operational diagrams, and technical documentation, with the aim of enabling companies to delegate more of the "hands-on" work to the model without wasting so much time correcting and rewriting.

Three variants: Instant, Thinking, and Pro

The new GPT-5.2 family is organized into three distinct layers of use, with the intention of adjusting the model to different needs and cost levels:

  • GPT-5.2 Instant: this version prioritizes speed and is designed for everyday queries, general writing, translation, information retrieval, and tasks where response time matters more than in-depth reasoning. It also offers more stable explanations and fewer errors than previous versions.
  • GPT-5.2 Thinking: the version geared towards multi-step reasoning and handling extensive documents. It specializes in complex programming, data analysis, advanced mathematical tasks, financial modeling, contract review, and long-term project planning. This is where OpenAI concentrates much of its improvement in consistency and in the use of integrated tools.
  • GPT-5.2 Pro: positioned at the high end for particularly demanding uses, focusing on the highest possible precision within current technological limits. It is aimed at those who prioritize reasoning quality over latency and are willing to accept a higher computational cost, such as R&D teams, specialized firms, or complex scientific projects.

This segmentation aims for more than just offering "a more powerful model": it seeks to tailor the catalog to different user profiles, from users who want quick answers in ChatGPT to European companies that deploy internal agents on their own data through the API.

GPT-5.2 benchmark performance: reasoning, code, and science

OpenAI accompanies the launch with a battery of data that places GPT-5.2 above GPT-5.1 in almost every category it has chosen to publish. In evaluations such as GDPval, which compares model output with the work of human professionals across 44 occupations, GPT-5.2 wins or ties in around 70.9% of cases, with significant improvements in tasks involving the creation of presentations, operational documents, and financial materials.

In specialized tests such as GPQA Diamond (graduate-level questions in physics, chemistry, and biology), GPT-5.2 Pro achieves nearly 93% accuracy. The Thinking variant follows closely, scoring slightly lower but also hovering around that threshold. In advanced mathematics, the model's score on FrontierMath (Tiers 1-3) rises to just over 40%. That figure is still far from perfect, but it suggests steady progress in the ability to follow long, structured logical chains.

Coding also shows a leap. On SWE-Bench Pro, which evaluates the resolution of real issues in software repositories while reducing the risk that the model has seen the data before, GPT-5.2 Thinking improves on its predecessor by several points and resolves around 55.6% of problems. On verified tasks, this figure rises to nearly 80%. In practice, that means less manual intervention when reviewing patches, refactorings, and entire components.

In more technical evaluations, such as ARC-AGI (abstract reasoning and pattern discovery) or specific science and programming sets, the model ranks above GPT-5.1. According to the graphs published by OpenAI, it is also ahead of Gemini 3, Grok 4 Fast and Claude Opus 4.5 in several complex reasoning tests. The representativeness of these metrics is always open to question, but they are one of the central arguments the company uses to convince investors and major clients that its rivals' technical leadership is, at the very least, debatable.

Impact on real-world tasks: finance, document analysis, and agents

Beyond the numbers, OpenAI insists that the improvements are noticeable in daily tasks. In internal simulations that emulate the work of financial analysts, such as building three-statement models or leveraged buyout models, GPT-5.2 Thinking reportedly went from an average score close to 59% to one above 68%. It also reduced calculation errors and the need for subsequent corrections.

Companies like Notion, Box, Shopify, Harvey, and Triple Whale, which already used the company's previous models, have reportedly seen tool-based agents become more stable. This translates into better coordination between multiple API calls, more consistent intermediate steps, and fewer blockages in long flows. In some cases, according to these testimonials, it has been possible to replace fragile multi-agent architectures with a single agent powered by GPT-5.2, with more than twenty connected tools and less need for constant monitoring.
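The single-agent pattern described in these testimonials can be sketched as one dispatcher that routes each tool call the model requests to a registered function. This is a minimal illustration under stated assumptions. The tool names (`lookup_order`, `refund_order`), the dict-shaped tool calls and the `dispatch` helper are all hypothetical. The model itself is not called; its requested tool calls are simulated as a list.

```python
from typing import Any, Callable, Dict, List

# Hypothetical registry: tool name -> Python function the agent may invoke.
TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function as a tool under its own name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def lookup_order(order_id: str) -> dict:
    # Stand-in for a real backend call (hypothetical data).
    return {"order_id": order_id, "status": "shipped"}


@tool
def refund_order(order_id: str, amount: float) -> dict:
    # Stand-in for a real payments call (hypothetical data).
    return {"order_id": order_id, "refunded": amount}


def dispatch(tool_calls: List[dict]) -> List[dict]:
    """Execute tool calls in order, returning one result record per call.

    Unknown tools produce an error record instead of raising, so a long
    flow is not blocked by a single bad call.
    """
    results = []
    for call in tool_calls:
        fn = TOOLS.get(call["name"])
        if fn is None:
            results.append({"name": call["name"], "error": "unknown tool"})
            continue
        results.append({"name": call["name"], "output": fn(**call["arguments"])})
    return results


# Simulated tool calls, as a model might request them in one turn.
calls = [
    {"name": "lookup_order", "arguments": {"order_id": "A17"}},
    {"name": "refund_order", "arguments": {"order_id": "A17", "amount": 25.0}},
    {"name": "send_fax", "arguments": {}},
]
for record in dispatch(calls):
    print(record)
```

Keeping every tool behind one registry is what lets a single agent grow to twenty or more tools. Adding a tool means registering one more function rather than wiring up another agent.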

For product, support, and development teams within European organizations, these changes make it possible to build internal assistants that process lengthy contracts, regulatory reports, or technical documentation without losing the thread after hundreds of pages or multiple related files. This is especially relevant in regulated sectors such as finance, healthcare or energy.

Vision, graphical interfaces and long-document understanding in GPT-5.2

The multimodal component also takes a step forward. On evaluations like CharXiv Reasoning, which focuses on scientific figures, GPT-5.2 halves interpretation errors compared to GPT-5.1. On ScreenSpot-Pro, a test that measures the ability to understand complex graphical interfaces, the model raises its accuracy to close to 86%. This is especially useful for reading control panels, dashboards or software diagrams.

Regarding context memory, GPT-5.2 approaches perfect performance on MRCRv2 variants spanning hundreds of thousands of tokens. In practical terms, this means it can handle large volumes of text (consulting reports, case files, technical audits or academic documentation) while maintaining internal references and consistency between sections. Many European organizations see this as an essential condition for entrusting sensitive processes to an AI model.

This combination of improved vision and greater contextual capacity opens the door to more ambitious uses, such as joint review of presentations, spreadsheets, and PDF documents within the same flow, or the inspection of web interfaces and internal tools to facilitate technical support and usability analysis.

Fewer errors, but with a need for human supervision

One of the company's most frequently repeated promises is fewer errors in responses. OpenAI states that GPT-5.2 Thinking generates around 30% fewer faulty responses than GPT-5.1. It adds that, overall, the rate of responses containing some inaccuracy drops from about 8.8% to around 6.2%.
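The two figures can be cross-checked with simple arithmetic. Assuming both error rates are measured on the same set of queries, the relative reduction is:

```latex
\text{relative reduction} = \frac{8.8\% - 6.2\%}{8.8\%} = \frac{2.6}{8.8} \approx 0.295 \approx 30\%
```

This is in line with the roughly 30% fewer faulty responses quoted for GPT-5.2 Thinking.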

Even so, the company emphasizes that the model remains probabilistic and that a single incorrect statement may require a manual review of the entire output, especially in sensitive or regulated contexts. That's why it insists that GPT-5.2 should be viewed as a reasoning support tool, not as a substitute for human judgment, particularly in areas such as health, finance, law, or academic research.

In sensitive areas, such as conversations about mental health or emotional distress, the company claims to have refined its controls to minimize inappropriate responses. It acknowledges, however, that there is still room for improvement. These considerations are especially relevant in Europe, where the new AI regulatory framework adds further obligations regarding transparency, security, and risk management.

Contribution to scientific and mathematical work

OpenAI also presents GPT-5.2 as a tool designed to advance scientific work. The company states that the GPT-5 series already had applications in mathematics, physics, biology, computer science, astronomy, and materials science, and that the new version makes these use cases more consistent.

On GPQA Diamond, one of the reference benchmarks for assessing advanced scientific understanding, GPT-5.2 Pro and Thinking exceed 92% accuracy. The company interprets this result as a sign that the model can help researchers explore ideas, review literature, or outline proofs. In one documented case, GPT-5.2 Pro reportedly contributed to addressing an open problem in statistical learning theory, although such results always remain subject to subsequent human verification.

OpenAI itself, however, clarifies that these systems should be understood as assistants for the exploratory phase of scientific work: useful for generating conjectures, reformulating hypotheses or suggesting intermediate steps, but without displacing the central role of experts when it comes to validating results, interpreting evidence and contextualizing conclusions.

Deployment in ChatGPT and access via API

GPT-5.2 is being rolled out in stages in ChatGPT to paying users on the Plus, Pro, Go, Business, and Enterprise plans. Not all subscribers will see the new model at the same time, because OpenAI prefers to enable access in phases to avoid capacity issues. In Europe, this may show up as a gradual rollout over several days.

For the next three months, GPT-5.1 will remain available as a legacy model within ChatGPT before its final retirement, so that organizations that rely on established workflows can plan the transition without abrupt interruptions. This temporary coexistence facilitates testing GPT-5.2 in parallel and adjusting prompts, internal controls, and validation processes.

In the API, the naming follows the usual correspondence. The Instant variant appears as gpt-5.2-chat-latest, the Thinking version is identified as gpt-5.2, and the Pro version as gpt-5.2-pro. Developers can adjust the reasoning level in the Pro option, including a new xhigh level designed for projects where the quality of the logical chain matters more than latency or cost.
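The variant-to-identifier mapping above can be captured in a small routing table. The sketch below is a minimal illustration, not OpenAI's official client code, and it rests on several assumptions:

- The `build_request` helper is hypothetical.
- The payload shape (`model`, `input`, `reasoning.effort`) assumes the Responses API convention.
- Treating Instant as having no configurable reasoning level is also an assumption.

The helper only builds the request dictionary. Sending it, for example with the official `openai` Python package, is left out so the example runs offline.

```python
from typing import Optional

# Variant -> API model identifier, as listed in the article.
MODEL_IDS = {
    "instant": "gpt-5.2-chat-latest",
    "thinking": "gpt-5.2",
    "pro": "gpt-5.2-pro",
}


def build_request(variant: str, prompt: str, effort: Optional[str] = None) -> dict:
    """Build a request payload for the chosen GPT-5.2 variant (hypothetical helper).

    Assumption: the reasoning level goes under reasoning.effort, following the
    Responses API shape; the Instant variant is treated here as having no
    configurable reasoning level.
    """
    if variant not in MODEL_IDS:
        raise ValueError(f"unknown variant: {variant!r}")
    payload: dict = {"model": MODEL_IDS[variant], "input": prompt}
    if effort is not None:
        if variant == "instant":
            raise ValueError("this sketch does not set a reasoning level for Instant")
        payload["reasoning"] = {"effort": effort}
    return payload


# Pro with the new xhigh level, for a task where reasoning quality matters most.
req = build_request("pro", "Review this proof sketch step by step.", effort="xhigh")
print(req)
```

Keeping the identifiers in one table means a later rename, or the eventual retirement of a variant, only touches a single line.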

Pricing, GPT-5.2 efficiency, and focus on enterprise customers

In economic terms, GPT-5.2 comes with higher fees per million tokens than GPT-5.1. OpenAI sets the base price at around $1.75 per million input tokens and $14 per million output tokens, with a 90% discount for cached inputs. The Pro variant raises the cost further, with figures climbing to several hundred dollars per million output tokens in its most demanding reasoning configurations.
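At these base rates, the cost of a single request is simple arithmetic. This is a minimal sketch that assumes the $1.75 and $14 figures quoted above plus a flat 90% discount on cached input tokens. It ignores the Pro variant's higher rates and any other billing details, and the `request_cost` helper is hypothetical.

```python
# Base GPT-5.2 rates quoted in the article, in USD per million tokens.
INPUT_PER_M = 1.75
OUTPUT_PER_M = 14.00
CACHED_DISCOUNT = 0.90  # 90% off for cached input tokens


def request_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate the USD cost of one request at the base (non-Pro) rates.

    cached_tokens is the portion of input_tokens served from the cache.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    cached_rate = INPUT_PER_M * (1 - CACHED_DISCOUNT)
    return (
        fresh_tokens * INPUT_PER_M
        + cached_tokens * cached_rate
        + output_tokens * OUTPUT_PER_M
    ) / 1_000_000


# Example: a 100,000-token prompt, 80,000 of it cached, with a 10,000-token answer.
cost = request_cost(100_000, 10_000, cached_tokens=80_000)
print(f"${cost:.4f}")
```

In this example the output tokens dominate the bill ($0.14 of roughly $0.19). For long-answer workloads, the output rate matters far more than the input rate.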

The company argues that the model's greater efficiency reduces the effective cost per task. This applies especially when GPT-5.2 reaches a valid answer faster, needs fewer retries, and makes fewer errors that require redoing the work. Even so, the pricing structure is clearly designed for enterprise and intensive development use rather than one-off experiments.
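The effective-cost argument can be made concrete with a simple model. Assume each attempt succeeds independently with probability p and failed attempts are simply retried. The expected number of attempts is then 1/p, so the expected cost per successful task is the cost per attempt divided by p. The helper name and the numbers in the example are illustrative, not OpenAI figures.

```python
def expected_cost_per_success(cost_per_attempt: float, success_rate: float) -> float:
    """Expected spend until one attempt succeeds.

    Assumes independent attempts with a fixed success probability, so the
    number of attempts is geometric with mean 1 / success_rate.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


# Hypothetical comparison: a cheaper model that succeeds 50% of the time
# versus a pricier one that succeeds 90% of the time.
cheap = expected_cost_per_success(0.10, 0.50)
pricey = expected_cost_per_success(0.15, 0.90)
print(f"cheaper model: ${cheap:.3f} per success, pricier model: ${pricey:.3f} per success")
```

Under these made-up numbers, the model with the higher per-call price ends up cheaper per successful task. The model deliberately leaves out the cost of human review of each answer, which in practice often outweighs the API bill.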

In ChatGPT, Plus and higher subscriptions keep their regular rates, shifting a significant portion of the incremental cost to API usage. Many European companies already integrate ChatGPT into intranets, productivity tools, or internal assistants. For them, this could mean recalibrating budgets and deciding which processes deserve to migrate to GPT-5.2 and which can continue to run on earlier, cheaper models.

Infrastructure, security and regulatory pressure

The deployment of GPT-5.2 relies, as in previous generations, on Microsoft Azure infrastructure and NVIDIA GPUs (including the H100, H200, and GB200 NVL72 families). OpenAI has committed multibillion-dollar investments in computing power to support these frontier models. It is a gamble that carries financial risks and requires the company to constantly seek new revenue streams. It is also exploring open-weight models such as gpt-oss.

In parallel, the company is introducing additional measures on safety and the protection of minors. One of the most striking steps is a system capable of estimating users' ages. Its aim is to adapt ChatGPT's responses for those under 18 and pave the way for a future "adult mode" with enhanced controls. These mechanisms align with regulatory requirements that are becoming increasingly established in both the European Union and the United States.

OpenAI acknowledges that its systems can sometimes be overly restrictive, rejecting requests that don't necessarily violate its policies, and says it is working to better balance safety and usefulness. The company also insists that any relevant changes to the availability of previous versions (such as GPT-5.1, GPT-5, or GPT-4.1 in the API) will be announced well in advance, a sign of continuity for customers who still rely on those models.

GPT-5.2 is presented as an incremental update that combines improved reasoning, speed, and stability with a strategy more focused on professional and enterprise use. The improvements in coding, science, document analysis, and handling of extensive contexts still have to hold up in daily practice. If they do, the model could become a relevant tool for European organizations seeking to automate part of their processes without giving up rigorous human control. It remains to be seen to what extent these promises will translate into real changes in productivity and in the way we work with artificial intelligence in the coming months.
