
OpenAI has made another bold statement with the launch of GPT-5.5, the model that places the latest generation of ChatGPT at a much higher level of autonomy and reasoning. Based on what we've seen so far, the company presents it as its most intuitive and capable system for handling complex tasks from start to finish, reducing the need for constant user supervision.
This move comes amid a race for leadership in generative artificial intelligence. With Anthropic and Google also accelerating their own models, GPT-5.5 is launching first on ChatGPT and Codex for paying users. It combines improvements in programming, office work, scientific research, and real-world computer use, at the cost of a price increase that OpenAI is trying to justify with remarkable efficiency in the use of tokens.
What is GPT-5.5 and what role does it play in OpenAI's strategy?
According to OpenAI, GPT-5.5 marks a new step in sustained reasoning and autonomous work with a computer. The model is designed to take on long, multi-step tasks: it can receive disordered instructions, break them down, plan what to do first, choose and handle tools, review its own results, and continue even if the task statement is ambiguous.
The company defines it as its most intuitive AI to date. Instead of simply answering isolated questions, GPT-5.5 acts as an agent: it maintains context during extensive processes, navigates information on the web, executes commands, manipulates documents and office applications, and returns structured results, ready for use in professional environments.
The release includes a GPT-5.5 Pro version, geared towards more complex tasks and towards users who need more rigorous answers in fields such as law, business, education, or advanced data analysis. ChatGPT also features a GPT-5.5 Thinking mode, designed for particularly difficult problems in programming, research, or information analysis.
Autonomy and real-world use of the computer: from chatbot to work agent
One of OpenAI's big bets is on GPT-5.5's ability to act as an autonomous agent on the computer. The model can search for information, decide what is relevant, combine sources, operate software and tools in sequence, and transform scattered materials into useful deliverables, without the user having to meticulously specify each step.
In day-to-day use, this translates into tasks such as generating complex documents, spreadsheets, or presentations, reviewing contracts, preparing reports, analyzing databases, or creating extensive summaries from multiple files. The company claims that GPT-5.5 understands the user's needs more quickly and handles a greater portion of the work itself than previous versions.
To illustrate this change, OpenAI cites internal examples: finance teams using Codex and GPT-5.5 to review tens of thousands of tax forms in much less time, communication departments automating scoring and risk frameworks on large volumes of requests, and marketing and product areas significantly reducing the time spent on periodic reports thanks to automated workflows.
Performance in agentic programming and software development
GPT-5.5 demonstrates improvements in programming and in what OpenAI calls "agentic coding": the use of AI as an agent that writes, debugs, and maintains code end to end. In development environments, the model can manage large repositories, propose complex refactors, identify the source of errors, and anticipate which parts of the system will be affected by a change.
In the Terminal-Bench 2.0 benchmark, which measures complex command-line workflows, GPT-5.5 achieves 82.7%, clearly outperforming GPT-5.4 while consuming fewer tokens. In SWE-Bench Pro, focused on resolving real-world GitHub issues, the model reaches 58.6%, and in the internal evaluation Expert-SWE, with tasks whose estimated human time is around 20 hours, it also scores above its predecessor.
Engineers who tested early versions point out that GPT-5.5 better understands the overall architecture of complex systems. Cases cited from internal tests include branch merges with hundreds of changes resolved in about twenty minutes, almost complete redesigns of subsystems (for example, a comment system in a collaborative editor), and early detection of bugs that previously required many more iterations.
In Codex, OpenAI's software engineering environment, GPT-5.5 has a context window of up to 400,000 tokens, allowing it to work with very large codebases. It also features a fast mode that generates tokens approximately 1.5 times faster, although at a higher cost per token, designed for those who prioritize speed of response.
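As a rough illustration of what a 400,000-token window means for a codebase, the sketch below estimates whether a set of source files would fit. It assumes a heuristic of about four characters per token, which is only an approximation and not the model's real tokenizer; the `reserve` parameter and the helper names are likewise hypothetical choices made for this example.

```python
from pathlib import Path

CONTEXT_WINDOW_TOKENS = 400_000  # Codex context window reported in the article
CHARS_PER_TOKEN = 4              # assumption: rough heuristic, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Very rough token estimate derived from the character count."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def fits_in_context(texts, window=CONTEXT_WINDOW_TOKENS, reserve=50_000):
    """Return (estimated_total, fits), keeping `reserve` tokens free for the reply."""
    total = sum(estimate_tokens(t) for t in texts)
    return total, total <= window - reserve


def read_sources(root: str, suffix: str = ".py"):
    """Read every file under `root` with the given suffix (hypothetical helper)."""
    return [p.read_text(errors="ignore") for p in Path(root).rglob(f"*{suffix}")]
```

For a real project, something like `fits_in_context(read_sources("path/to/repo"))` would give a first-order answer; an exact count would require the tokenizer actually used by the model.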
Knowledge work, business, and everyday office use
Beyond software development, GPT-5.5 is designed as a tool for professional work in office, consulting, or data analysis environments. OpenAI maintains that the same capabilities that improve programming now allow for more effective documentation and analysis tasks.
In the GDPval benchmark, which assesses the ability to produce well-specified expert work across 44 occupations, GPT-5.5 scores 84.9%, counting wins and ties. In OSWorld-Verified, a test designed to verify whether the model can handle real-world computing environments autonomously, it reaches 78.7%. In Tau2-bench Telecom, focused on customer service in the telecommunications sector, it reaches 98% without any prompt tuning, which indicates high performance in support scenarios.
For enterprise use, OpenAI highlights that over 85% of its staff use Codex weekly in areas such as engineering, finance, marketing, data, or product. Cases such as the automation of weekly reports, which saves between five and ten hours per person per week, illustrate the type of benefits the company attributes to the new model when it is integrated into business processes.
Scientific research, biology, and advanced mathematics
Scientific research is another central theme in the GPT-5.5 presentation. OpenAI is targeting workflows that require exploring hypotheses, gathering evidence, testing assumptions, interpreting results, and deciding on the next experiment, an environment in which sustained contextual reasoning is key.
In tests such as GeneBench, focused on biology and genetics tasks, GPT-5.5 improves upon the results of GPT-5.4, and the GPT-5.5 Pro variant obtains even higher scores. In BixBench, focused on bioinformatics and quantitative biology, the new model also achieves the best performance among systems with published data to date, according to information provided by the company.
OpenAI has even cited examples of use in advanced mathematics, where an internal version of GPT-5.5 collaborated in the search for a new proof related to off-diagonal Ramsey numbers, which was subsequently verified using the Lean proof assistant. The company presents this case as an example of how the model not only generates code or explanations, but can also contribute to mathematical arguments in complex areas.
In practical terms, OpenAI mentions testimonies from researchers who have used GPT-5.5 Pro to analyze gene expression datasets with tens of thousands of variables and a significant number of samples, obtaining detailed reports, new angles of analysis, and key questions in a timeframe that, according to their calculations, would be much longer if the work were done exclusively by humans.
Latency, reasoning time, and token efficiency
Behind the launch of GPT-5.5 lies a persistent message: it increases the model's intelligence without penalizing response speed. OpenAI claims that the new system matches the per-token latency of GPT-5.4 in real-world service despite being more capable, which is unusual for larger and more complex models.
One of the key points is the reasoning time required to complete complex tasks. Early users who have compared its behavior with previous versions report that processes that previously required between 20 and 40 minutes of work are now resolved in just three or four minutes, maintaining, and even improving, the quality of the responses.
This gain comes not only from raw speed, but also from better token management. GPT-5.5 requires fewer tokens to achieve comparable or better results than GPT-5.4, reducing both the total processing time and the cost associated with each workflow. In scenarios with high query volume or intensive automation, this difference can be crucial.
OpenAI explains that, in order to maintain latency, it has had to redesign inference as an integrated system. GPT-5.5 has been co-designed, trained, and deployed on infrastructure based on state-of-the-art NVIDIA hardware (GB200 and GB300 NVL72), and GPT-5.5 itself and Codex have been used to optimize load balancing and partitioning heuristics, yielding an increase of over 20% in token generation speed on its systems.
Prices, actual cost and comparison with GPT-5.4
Although GPT-5.5 sits in the high price range per token, OpenAI insists that, in practice, it can be more economical than its predecessor and some of the competition. The reason is the combination of greater token efficiency and less need for retries or corrections.
In the API, the reference prices reported for GPT-5.5 are $5 per million input tokens and $30 per million output tokens, with a context window that reaches up to one million tokens. For GPT-5.5 Pro, the rates rise to $30 per million input tokens and $180 per million output tokens, clearly targeting uses where the added value of the response outweighs the cost.
OpenAI also offers modes such as Batch and Flex, with rates at approximately half the standard price, and a Priority mode that multiplies the cost by 2.5 in exchange for higher queue priority and shorter response times. The company admits that GPT-5.5 is more expensive than GPT-5.4 in nominal terms, but argues that the reduction in tokens required per task and the shorter reasoning time can reduce the overall cost of complex projects compared to other models.
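To make the reported rates concrete, here is a minimal sketch that turns them into a per-request cost. The dictionary keys are illustrative labels rather than confirmed API model identifiers, and the 0.5 multiplier for Batch and Flex is an approximation of "approximately half"; only the dollar figures and the 2.5 Priority multiplier come from the prices reported above.

```python
# Reported reference prices in USD per million tokens: (input, output).
# Keys are illustrative labels, not confirmed API model identifiers.
PRICES_PER_MILLION = {
    "gpt-5.5": (5.00, 30.00),
    "gpt-5.5-pro": (30.00, 180.00),
}

# Multipliers relative to the standard rate; 0.5 approximates "about half".
MODE_MULTIPLIER = {"standard": 1.0, "batch": 0.5, "flex": 0.5, "priority": 2.5}


def request_cost(model, input_tokens, output_tokens, mode="standard"):
    """Estimated cost in USD of a single request under the reported rates."""
    in_price, out_price = PRICES_PER_MILLION[model]
    base = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
    return base * MODE_MULTIPLIER[mode]


if __name__ == "__main__":
    # Example: 100,000 input tokens and 10,000 output tokens.
    for mode in MODE_MULTIPLIER:
        print(mode, round(request_cost("gpt-5.5", 100_000, 10_000, mode), 2))
```

Under these figures, output tokens dominate the bill for long generations, which is why a reduction in tokens per task matters more than the nominal input price.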
In the market, this policy places GPT-5.5 above previous OpenAI models and below high-end alternatives that, according to estimates shared during the presentation, can be between five and ten times more expensive in practical terms when considering the combination of price, tokens consumed, and quality of the result.
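The argument that a pricier model can be cheaper "in practical terms" can be sketched as an expected cost per completed task. The model below is a deliberate simplification: it assumes independent retries until one attempt succeeds, counts only output tokens, and every number in the usage example (including the older model's price and both success rates) is hypothetical rather than taken from OpenAI.

```python
def expected_cost_per_task(price_per_million, tokens_per_attempt, success_rate):
    """Expected USD cost to finish one task, retrying until an attempt succeeds.

    Assumes independent attempts, so the expected number of attempts
    is 1 / success_rate (mean of a geometric distribution).
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    cost_per_attempt = price_per_million * tokens_per_attempt / 1_000_000
    return cost_per_attempt / success_rate


def break_even_token_ratio(price_new, price_old, success_new, success_old):
    """Largest fraction of the old model's tokens the new model may use
    per attempt while still costing no more per completed task."""
    return (price_old / price_new) * (success_new / success_old)


if __name__ == "__main__":
    # Hypothetical inputs: new model at $30/M, old model at $15/M,
    # success rates of 90% and 60% per attempt.
    print(break_even_token_ratio(30.0, 15.0, 0.9, 0.6))  # roughly 0.75
```

With these made-up inputs, the newer model would break even if it used no more than about three quarters of the older model's tokens per attempt; real comparisons depend on measured token counts and success rates.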
Long context and performance in reasoning benchmarks
Another visible improvement in GPT-5.5 is its ability to work with very extensive contexts without losing the thread. In tests such as Graphwalks BFS 1M, the model reaches 45.4% compared to 9.4% for GPT-5.4, and in OpenAI MRCR v2 with contexts between 512K and 1M tokens it rises to 74.0%, compared to 36.6% in the previous version.
In the area of abstract reasoning, GPT-5.5 records 95.0% in ARC-AGI-1 and 85.0% in ARC-AGI-2, with significant improvements over GPT-5.4. In advanced knowledge tests such as GPQA Diamond, focused on highly difficult questions, it obtains 93.6%, and in assessments such as Humanity's Last Exam it exceeds 50% when it is allowed to use external tools.
OpenAI emphasizes that many of these assessments have been carried out in research environments with reasoning configurations at very high levels. Therefore, the results may differ slightly from those perceived by ChatGPT users in production. Even so, the company wants to convey the idea that GPT-5.5 represents a practical leap in real-world tasks, not just an academic improvement in benchmark tables.
Security, cybersecurity and responsible use
The increase in capabilities comes with a reinforcement of the security safeguards. OpenAI states that GPT-5.5 is launching with its most advanced protection system to date, after undergoing internal and external evaluations, specific preparedness frameworks, and red teaming processes with cybersecurity and biology specialists.
Under its Preparedness Framework, the company classifies GPT-5.5's capabilities in biology, chemistry, and cybersecurity at the "High" level, without reaching the "Critical" level. Even so, it acknowledges that the model is more effective than GPT-5.4 at finding and exploiting vulnerabilities, and has therefore deployed stricter classifiers for sensitive requests and mechanisms against repeated risky uses, something that may be more restrictive for some technical users.
In parallel, OpenAI aims to expand access to more advanced capabilities for verified defensive uses through programs such as Trusted Access for Cyber. These programs are specifically aimed at organizations responsible for protecting critical infrastructure. The idea is to provide powerful defense tools without relaxing controls against potential offensive uses.
In the field of biological research, the company has launched initiatives such as reward programs for detecting biology-related failures in the model's behavior, with the aim of having the scientific community help identify flaws and improve safeguards before a wider deployment.
Availability of GPT-5.5 and its deployment in products
The deployment of GPT-5.5 has begun for Plus, Pro, Business, and Enterprise users of ChatGPT and Codex, in both personal and corporate environments. In Codex, the model is integrated into software development workflows with an expanded context window and rapid response modes.
The GPT-5.5 Pro version is being progressively activated for Pro, Business, and Enterprise users who need an extra level of detail and precision, especially in regulated or high-impact fields where errors can be costly. In ChatGPT, users are also starting to see specific options such as GPT-5.5 Thinking for complex research or analysis problems.
Regarding the API, OpenAI is working on incorporating GPT-5.5 and GPT-5.5 Pro into its Responses and Chat Completions endpoints, with context windows reaching one million tokens. The company indicates that access will be expanded as internal security and infrastructure capacity requirements are met, so that developers can integrate the model into their own applications once this phase is complete.
The arrival of GPT-5.5 consolidates a change of stage in the evolution of ChatGPT: the focus is shifting from simply generating text to the comprehensive automation of digital tasks, with more autonomous models capable of reasoning for longer periods and working on real systems, at the cost of higher prices and a growing debate around security and governance. In a European context where AI regulation is advancing and companies are seeking efficiency without losing control, the way in which organizations, developers, and administrations adopt, or limit, the use of GPT-5.5 may be as relevant as the benchmark figures that accompany this new model.
