
OpenAI has taken another turn in its artificial intelligence strategy with the launch of GPT-5.4. This model not only promises more computing power and better scores on synthetic benchmarks, but also represents a clear step toward the automation of real-world work. The company presents this system as its most capable and efficient model to date, focused on professional environments, lengthy tasks, and agents capable of operating as if they were a person sitting at a computer.
The announcement also comes at a delicate time for Sam Altman's company, which is mired in a reputational crisis linked to its agreements with the U.S. Department of Defense and boycott campaigns like #QuitGPT. The unveiling of GPT-5.4, just days after GPT-5.3 Instant, works as both a technical and political move: flexing technological muscle while trying to regain the trust of users and major clients.
Two variants for different profiles: GPT-5.4 Thinking and GPT-5.4 Pro
OpenAI has released GPT-5.4 in two main versions. On the one hand, there is GPT-5.4 Thinking, available in ChatGPT's paid plans (Plus, Team and Pro) and geared toward deep, multi-stage reasoning. This model shows the user a preview of how the task will be approached, allowing them to intervene mid-response to redirect instructions without having to start from scratch. This ability to “cut and redirect reasoning,” which OpenAI calls steerability, is designed for complex problems where the user wants more control over the process.
On the other hand, there is GPT-5.4 Pro. This variant is aimed at large-scale enterprises and deployments where sustained performance on intensive tasks and complex workflows is paramount, with a special focus on agents that work for extended periods, process numerous documents, and must make sequential decisions. In the API, both versions can be used as engines for custom products, also integrating with the code-oriented platform that succeeds Codex.
An AI that operates the computer like a human user
The feature that is generating the most headlines is that GPT-5.4 is OpenAI's first general-purpose model with native computer use capabilities. The company uses the term “computer use” to refer to a mode in which the system is not limited to generating text: it interprets what it sees on the screen (through high-resolution screen capture) and emits mouse and keyboard actions to complete tasks.
In practice, this makes it possible to consider scenarios that until recently sounded like science fiction: asking the AI to open emails, download invoices, extract key data and paste it into a spreadsheet, or navigate through different business applications to fill out forms, query databases, or generate reports. According to internal benchmarks, in the OSWorld-Verified test, which measures precisely that ability to handle a desktop environment, GPT-5.4 achieves a 75% success rate, above the 47.3% of GPT-5.2 and also above average human performance, set at 72.4%.
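The “computer use” loop described here can be sketched in a few lines: capture the screen, ask the model for the next mouse or keyboard action, execute it, and repeat until the task is done. The sketch below stubs out the model call with a scripted invoice-style sequence; the `Action` structure and every name in it are illustrative assumptions, not OpenAI's actual API.

```python
from dataclasses import dataclass

@dataclass
class Action:
    kind: str    # "click", "type", or "done"
    target: str  # UI element or text to act on

# Scripted stand-in for the model: a real agent would send a screenshot
# and receive the next action back.
SCRIPTED_STEPS = [
    Action("click", "inbox"),
    Action("click", "invoice_attachment"),
    Action("type", "=SUM(B2:B13)"),
    Action("done", ""),
]

def plan_next_action(step_index: int) -> Action:
    """Stub for the model call that maps a screenshot to the next action."""
    return SCRIPTED_STEPS[step_index]

def run_agent(max_steps: int = 10) -> list[Action]:
    """Loop: request the next action and dispatch it until "done"."""
    history: list[Action] = []
    for i in range(max_steps):
        action = plan_next_action(i)
        if action.kind == "done":
            break
        # A real agent would emit actual mouse/keyboard events here.
        history.append(action)
    return history

print([a.kind for a in run_agent()])  # ['click', 'click', 'type']
```

The substance of such an agent lives in the two pieces stubbed out here: the screenshot-to-action model call and the event dispatcher; the loop itself stays this simple.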
These types of skills fit perfectly with the trend toward agent-based AI. Tools like the OpenClaw agent, designed to "take control" of the user's computer to automate repetitive tasks, directly benefit from a model that comes pre-configured to interpret the screen and execute complete sequences of actions. For European companies testing agents in administration, finance, or technical support departments, the difference between a chatbot that merely responds and a model that actually acts is substantial.
Context window for up to one million tokens
Another major highlight of GPT-5.4 is its short-term memory capacity. In the API and its integration with Codex, the model supports context windows of up to one million tokens. This more than doubles the working memory associated with GPT-5.2, which was around 400,000 tokens, and represents a significant leap for those who work with massive amounts of information: contracts of hundreds of pages, voluminous code repositories, customer databases, or annual financial reports.
For European companies and law firms, accustomed to dealing with extensive regulation, from banking rules to compliance documentation such as the GDPR, this expanded context allows entire sets of documents to be processed without having to artificially fragment them. The direct consequence is that less context is lost, errors of omission are reduced, and coherence is better maintained in tasks that require following precise instructions through many steps.
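A quick back-of-the-envelope check shows why the larger window matters for unfragmented document sets. The ~500 tokens per page figure below is a rough heuristic for dense prose, not an OpenAI specification; the window sizes are the ones cited for each model.

```python
TOKENS_PER_PAGE = 500       # assumed average for dense legal/financial prose
GPT_5_2_WINDOW = 400_000    # context window cited for GPT-5.2
GPT_5_4_WINDOW = 1_000_000  # context window cited for GPT-5.4

def fits_in_window(pages: int, window: int) -> bool:
    """True if the whole document set fits in context without chunking."""
    return pages * TOKENS_PER_PAGE <= window

# A 1,500-page compliance dossier: roughly 750,000 tokens.
print(fits_in_window(1_500, GPT_5_2_WINDOW))  # False -> must be chunked
print(fits_in_window(1_500, GPT_5_4_WINDOW))  # True  -> fits whole
```

Anything that previously had to be split, summarized, and stitched back together can now, at this order of magnitude, be read in a single pass.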
In addition to memory, GPT-5.4 introduces what some sources have described as an “extreme reasoning” mode. This approach allows significantly more computing power to be dedicated to complex questions, executing processes that can last for hours instead of seconds. It is not just about responding quickly, but about being able to maintain prolonged analysis with greater depth and consistency. This is especially relevant for consultancies, auditors, or research teams operating in Europe with long-term projects.
Tool Search and more efficient tool use
For developers building on the API, one of the most practical new features is Tool Search. Until now, models needed to receive the definitions of all available tools within the context, which significantly increased token consumption in feature-rich systems. With Tool Search, GPT-5.4 can dynamically search for the tool it needs at any given moment, consulting only the essential information.
In tests with 250 MCP Atlas benchmark tasks, using 36 different tool servers, this form of dynamic access reduced total token consumption by around 47% while maintaining the same level of accuracy. For European companies designing agent platforms with dozens of microservices, from billing systems to internal CRMs and ERPs, this improvement translates into lower operating costs and faster response times without sacrificing workflow complexity.
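The idea behind Tool Search can be illustrated with a toy registry: instead of placing every tool definition in the prompt, only the definitions matching the current need enter the context. All names, signatures, and the token estimate below are hypothetical, for illustration only; they are not OpenAI's real schema format.

```python
# Hypothetical tool registry: name -> signature string. In a real system
# these would be full JSON schemas served by tool servers.
TOOL_REGISTRY = {
    "create_invoice": "create_invoice(customer_id: str, amount: float) -> str",
    "lookup_customer": "lookup_customer(name: str) -> dict",
    "update_crm_record": "update_crm_record(record_id: str, fields: dict) -> bool",
    "send_report": "send_report(recipient: str, body: str) -> None",
}

def search_tools(query: str) -> dict[str, str]:
    """Return only the definitions whose name matches the query."""
    return {name: sig for name, sig in TOOL_REGISTRY.items() if query in name}

def approx_tokens(definitions: dict[str, str]) -> int:
    """Crude estimate: roughly one token per four characters."""
    return sum(len(sig) // 4 for sig in definitions.values())

upfront = approx_tokens(TOOL_REGISTRY)              # every definition in context
on_demand = approx_tokens(search_tools("invoice"))  # only the matching one
print(on_demand < upfront)  # True
```

With four tools the saving is trivial; with dozens of servers exposing hundreds of definitions, this gap is where the reported ~47% reduction comes from.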
Professional performance: from the office to the spreadsheet
Beyond the technical headlines, GPT-5.4 is explicitly designed for professional knowledge tasks. In the GDPval test, which measures the ability of AI agents to produce real work in 44 different occupations, the new model matches or surpasses human professionals in 83% of comparisons. These tasks range from preparing business presentations to basic financial analysis or drafting legal documents.
OpenAI particularly highlights the improvements in working with spreadsheets and presentations. In an internal financial modeling benchmark, GPT-5.4 achieves a score of 87.3%, compared with 68.4% for GPT-5.2. For European banks, insurers, or fintech companies that handle complex models in Excel or equivalent tools, this difference can mark the leap between a support tool and an assistant capable of performing the tasks of a junior analyst with limited supervision.
In the area of presentations, human evaluators preferred the slides generated by GPT-5.4 over those of its predecessor around 68% of the time, citing significant improvements in both aesthetics and visual variety. These kinds of enhancements are perfectly suited to the daily work of sales, marketing, and consulting teams in Spain, where preparing a clear and well-structured presentation can consume many hours of work.
Fewer errors, more reliability in long answers
One of the common criticisms of previous models was their tendency to "hallucinate," that is, to fabricate data or mix sources unreliably. OpenAI claims that GPT-5.4 is 33% less likely to make false statements than GPT-5.2, and that its complete responses are 18% less likely to contain errors. These figures, although derived from internal tests, suggest that the AI is better suited for regulated sectors such as finance or healthcare, where any incorrect information can pose a serious problem.
The combination of a much broader context window, an extended reasoning mode, and the ability to interrupt the process mid-course to change direction contributes to this greater reliability. For a law firm in Madrid or a consultancy in Brussels, being able to review the model's "plan of attack" before it finishes drafting a complete report makes it possible to detect deviations or flawed approaches in time, without wasting resources on subsequent review.
Programming and performance in technical benchmarks
In the field of software development, GPT-5.4 inherits the capabilities of GPT-5.3-Codex and, according to OpenAI, matches or surpasses them in demanding tests such as SWE-Bench Pro with lower latency. The improvements in scores are not spectacular (we are talking about a moderate jump in the percentage of resolved incidents), but the combination of code, reasoning, and native computer use in a single model presents an interesting scenario: agents that read code repositories, modify files, and test changes in real-world environments, all within the same flow.
For European developers integrating GPT-5.4 via the API, perhaps the key is not so much the exact benchmark figure as the fact that the model solves similar tasks using fewer tokens. OpenAI insists that GPT-5.4 is its most token-efficient reasoning system to date, meaning it can reach the same conclusion with fewer "internal words." For companies that pay per token, that efficiency can more than offset the higher fee per million tokens.
Web browsing and complex searches
Another area where GPT-5.4 improves upon its predecessors is web interaction. In benchmarks such as BrowseComp, focused on online search and research tasks, the new model reaches around 82.7%, compared with 65.8% for GPT-5.2. OpenAI maintains that GPT-5.4 is especially good at identifying relevant information among large amounts of data, what it calls "needle in a haystack" queries.
For European journalists, market analysts, and researchers, this capability means being able to delegate some of the information screening work to AI, while maintaining a supervisory and final verification role. The model can track multiple sources, select those that appear most reliable, and offer a reasoned summary, reducing the time spent on repetitive searches.
Higher prices, but also greater efficiency
In terms of price, GPT-5.4 comes with an increase compared to GPT-5.2. The standard model costs $2.50 per million input tokens and $15 per million output tokens, compared to $1.75 and $14, respectively, for GPT-5.2. The GPT-5.4 Pro version is considerably more expensive: $30 per million input tokens and $180 per million output tokens, figures clearly geared toward high value-added business projects.
OpenAI defends these rates by pointing to greater token efficiency and error reduction. If a model requires significantly fewer tokens to perform the same task and also makes fewer errors requiring manual correction, the total cost per project can be lower even with a higher token price. For large accounts in Europe, from systemic banks to major industrial groups, the debate is no longer so much about the nominal price per million tokens as about the overall cost of automating processes with guaranteed results.
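That argument can be made concrete with simple arithmetic using the list prices quoted for both models. The per-task token counts below are assumptions for illustration; only the per-million-token prices come from the published rates.

```python
def task_cost(in_tokens: int, out_tokens: int,
              in_price: float, out_price: float) -> float:
    """Total cost in dollars; prices are per million tokens."""
    return (in_tokens * in_price + out_tokens * out_price) / 1_000_000

# List prices: GPT-5.2 at $1.75/$14, GPT-5.4 at $2.50/$15 (input/output).
# Assumed (hypothetical) workload: same 50k-token input, but GPT-5.4 needs
# ~40% fewer output tokens thanks to more token-efficient reasoning.
cost_gpt_5_2 = task_cost(50_000, 20_000, 1.75, 14.00)
cost_gpt_5_4 = task_cost(50_000, 12_000, 2.50, 15.00)

print(cost_gpt_5_2)  # $0.3675 per task
print(cost_gpt_5_4)  # $0.305 per task: cheaper despite higher unit prices
```

Under these assumed token counts the newer model is the cheaper option per task; with equal token counts the ordering flips, which is why the efficiency claim, not the sticker price, is the figure worth auditing.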
A launch amid controversy and fierce competition
GPT-5.4 doesn't appear out of nowhere. It arrives amid very close competition with Anthropic and Google, and amid the media frenzy surrounding OpenAI's agreements with the Pentagon. While Anthropic has gained ground in the enterprise segment with models like Claude Opus 4.6 and a more security-focused approach, Google competes with its Gemini family and advanced multimodal capabilities. In this context, GPT-5.4 aims to position itself as a benchmark model in agentic capability, computer use, and long context.
At the same time, the move comes after campaigns such as #CancelChatGPT and #QuitGPT. These actions have prompted hundreds of thousands of people to cancel their subscriptions or announce a boycott on social media. The perception that OpenAI accepted a military contract without sufficient safeguards, while Anthropic rejected it, has eroded some of the company's reputational capital. In Europe, where the debate on the ethical use of AI and its regulation is progressing with frameworks such as the upcoming AI Act, these agreements are being watched with particular attention.
Infrastructure costs and pressure for profitability
Behind each new version of GPT lies a less visible reality: the cost of operating increasingly large models with gigantic context windows. OpenAI is handling multimillion-dollar figures in infrastructure and computing spending, with projections of substantial losses in the coming years despite significant revenue growth. A model like GPT-5.4, capable of processing up to one million tokens and with reasoning modes that can extend for hours, demands considerable computing power per request.
To contain those costs, the company is betting on proprietary or specialized hardware and on agreements with major cloud providers. It is also segmenting its catalog into several tiers (Instant, Thinking, Pro, Codex) to adjust how much processing power it allocates to each type of request. The introduction of configurable modes in GPT-5.4, which allow users to choose between faster, cheaper responses or more in-depth analysis, aligns with this attempt to balance capacity and cost-effectiveness. In Europe, where data centers and electricity consumption are under regulatory scrutiny, this type of model also reignites the debate about the energy impact of AI.
Towards a new normal: agents, security and constant changes
Beyond the technical specifications, GPT-5.4 reinforces a trend that was already emerging: the transition from chatbots to autonomous agents. The combination of native computer use, long-context management, and dynamic tools points to systems capable of managing complete processes with occasional human intervention. Analysis firms predict that, by the end of 2026, a significant portion of large corporations will be using agent-based architectures from the GPT-5.x series for critical tasks, from customer service to internal document management.
That move comes with uncomfortable questions about security and control. If a model can operate for hours, consulting sensitive data and executing actions on internal systems, monitoring mechanisms and security barriers must be much more robust. Voices within and outside the industry, including the European research community, have long warned that the race to release increasingly powerful models cannot outpace the development of effective safeguards.
With GPT-5.4, OpenAI is trying to demonstrate that it can offer more power, greater autonomy, and increased efficiency without compromising reliability. The model improves in benchmarks, reduces errors, uses fewer tokens, and is capable of handling the computer smoothly, but it also arrives amidst ethical dilemmas, competitive pressure, and doubts about the economic sustainability of this pace of innovation. For companies and professionals in Spain and the rest of Europe, the question is no longer just whether the technology is impressive, but how to integrate it responsibly into their daily work, with clear benefits and manageable risks.
