Complete guide to installing and using GPT-OSS on Linux step by step

  • GPT-OSS allows you to run advanced artificial intelligence locally, ensuring complete data privacy and control.
  • The gpt-oss-20b model is the most affordable option for users with consumer hardware, while the gpt-oss-120b is reserved for professional equipment.
  • Tools like Ollama and LM Studio simplify the installation, management, and use of GPT-OSS on Linux, offering both command-line and graphical interfaces.


The arrival of open language models such as GPT-OSS has marked a turning point in the local use of artificial intelligence. More and more users want to take advantage of the power of these models without relying on the cloud or exposing their data to third parties. Installing GPT-OSS on Linux is one of the most interesting challenges and opportunities for those seeking technological autonomy and maximum privacy.

This comprehensive guide will walk you through the process of installing and using GPT-OSS on Linux. We'll cover everything you need to know: from requirements, model differences, hardware considerations, choosing and configuring tools like Ollama and LM Studio, to terminal integration, customization, and common troubleshooting. All with practical advice, real-world examples, and without omitting key details, so you can fully utilize the possibilities offered by GPT-OSS while working on your own computer.

What is GPT-OSS and what are the benefits of running it on Linux?

GPT-OSS is OpenAI's open source language model proposal. At launch, the company released two main versions: gpt-oss-20b and gpt-oss-120b. These variants are designed to run locally and allow any user to experiment, program, or work with advanced AI without relying on external servers or cloud connections.

Why is it worth using GPT-OSS locally instead of online services?

  • Full privacy: your data stays on your computer, and nothing is sent over the Internet.
  • No API costs: perfect for intensive or experimental development.
  • Personalization: you can control parameters, adapt behavior, and fine-tune the model for specific tasks.
  • Offline access: ideal for environments without connectivity or with security restrictions.

Linux, due to its flexibility and robustness, is the ideal environment to deploy and take advantage of the full potential of GPT-OSS, especially when command-line tools and advanced automation are required.

Key differences between GPT-OSS-20b and GPT-OSS-120b

Although both models share the same open source philosophy, their technical requirements are very different, something essential to weigh when choosing which one to install on your computer.

  • gpt-oss-20b: It is the most accessible model and can be run on consumer computers as long as they have at least 16 GB of memory (preferably VRAM). Its performance is very good for most tasks and can even run on powerful laptops or desktop PCs equipped with moderately modern GPUs.
  • gpt-oss-120b: this model requires a minimum of 60-80 GB of VRAM (graphics memory), something only available on professional workstations or data center hardware. Its performance and reasoning capabilities approach those of OpenAI's most advanced hosted models, but for most home users or individual developers, gpt-oss-20b is the logical choice.

In short, if you have a computer with adequate resources and are looking to experiment, always start with gpt-oss-20b. This way, you avoid performance issues and ensure a smooth experience without compromising the model's core functionality.

Important: if your GPU has less than 16 GB of VRAM, the model will spill over into conventional RAM. You'll need at least 16 GB of physical RAM to avoid extreme slowdowns and possible crashes.
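As a quick sanity check, the guidance above can be scripted. The following is a minimal sketch (Linux-only, since it reads /proc/meminfo; the 16 GB and 60 GB thresholds simply mirror the recommendations in this section and are not official cutoffs):

```python
def parse_meminfo(text: str) -> int:
    """Return MemTotal in GiB from the contents of /proc/meminfo."""
    for line in text.splitlines():
        if line.startswith("MemTotal:"):
            kib = int(line.split()[1])  # /proc/meminfo reports KiB
            return kib // (1024 * 1024)
    raise ValueError("MemTotal not found")


def recommend(ram_gib: int) -> str:
    """Map total memory to the model suggested in this guide."""
    if ram_gib >= 60:
        return "gpt-oss-120b may be feasible; gpt-oss-20b will be faster"
    if ram_gib >= 16:
        return "gpt-oss-20b"
    return "under 16 GiB: expect heavy swapping even with gpt-oss-20b"


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        print(recommend(parse_meminfo(f.read())))
```

Note that this only measures system RAM; checking dedicated VRAM would additionally require querying your GPU driver (for example with nvidia-smi).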

Preliminary considerations and technical requirements

Installing and running GPT-OSS on Linux involves certain minimum hardware and software requirements. Before moving forward, make sure you meet these guidelines so you don't run into unpleasant problems later.

  • Recommended hardware for gpt-oss-20b: minimum 16GB RAM (preferably dedicated VRAM on GPU), modern CPU and at least 20-50GB free disk space.
  • For gpt-oss-120b: You'll need a professional GPU of 80GB or more, a data center environment, and fast, high-capacity SSD storage.
  • Operating system: Linux is the easiest to set up for this type of application. macOS is supported, while Windows requires additional steps.
  • Auxiliary software: official drivers for your GPU, Ollama or LM Studio to facilitate model execution and management, and optionally Docker for advanced web interfaces or API testing.
  • Stable internet connection: only necessary to download the models and components the first time.

Dedicate as many resources as possible to the installation and launch process: close unnecessary applications and free up memory before launching GPT-OSS.

Installing Ollama on Linux: the first tool for managing GPT-OSS

Ollama has become the go-to platform for easily running language models locally. It is free, open source, and simplifies the download, management, and use of GPT-OSS and other LLMs (Large Language Models).

Installing it is very simple:

  1. Go to ollama.com to confirm the current install command for Linux.
  2. Open a terminal and run:
    curl -fsSL https://ollama.com/install.sh | sh
  3. Test the installation by running (it should return the installed version number):
    ollama --version
  4. Start the Ollama server:
    ollama serve

With these steps, Ollama is ready to download and manage your favorite models.

In addition to the CLI, Ollama can be used with web interfaces like Open WebUI or via APIs, making it a very versatile tool for both technical users and those who prefer a graphical environment.
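Before pulling any model, it's worth confirming that the server actually answers. Here is a small sketch that probes the root endpoint on the default port 11434 (Ollama replies with the text "Ollama is running" when the server is up):

```python
from urllib.error import URLError
from urllib.request import urlopen

OLLAMA_URL = "http://localhost:11434"


def interpret(status: int, body: str) -> str:
    """Turn the root endpoint's reply into a short verdict."""
    if status == 200 and "Ollama" in body:
        return "server up"
    return f"unexpected reply (HTTP {status})"


def check(url: str = OLLAMA_URL) -> str:
    """Probe the local Ollama server without raising on failure."""
    try:
        with urlopen(url, timeout=3) as resp:
            return interpret(resp.status, resp.read().decode())
    except (URLError, OSError):
        return "server down: start it with 'ollama serve'"


if __name__ == "__main__":
    print(check())
```

If the script reports the server is down, run ollama serve in another terminal and try again.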

Downloading and installing GPT-OSS models

The next critical step is to download the GPT-OSS model that best suits your equipment. Both models are published on Hugging Face and are also available directly from Ollama's model library, so they can be pulled with a single command.

  1. Choose the model you'll be using. The most common choice is gpt-oss-20b unless you have professional hardware.
  2. In the terminal, run the following; note that Ollama's registry tags the model as gpt-oss:20b (this downloads the version optimized for your environment):
    ollama pull gpt-oss:20b

The download may be large (from 12 to 50 GB) and will take time depending on your connection. Do not close the terminal or suspend your device during the process.

When you're done, you can list the available models with ollama list.
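The same listing is also available over the API: a GET request to /api/tags returns a JSON object whose "models" array describes every installed model. A minimal sketch:

```python
import json
from urllib.request import urlopen


def model_names(tags_json: str) -> list:
    """Extract model names from an /api/tags response body."""
    return [m["name"] for m in json.loads(tags_json).get("models", [])]


def main() -> None:
    # Requires a running `ollama serve` on the default port.
    with urlopen("http://localhost:11434/api/tags", timeout=5) as resp:
        for name in model_names(resp.read().decode()):
            print(name)
```

Calling main() with the server running should print the same names that ollama list shows.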

Running and using GPT-OSS from the terminal

Ollama provides several ways to interact with models: by command line, through API calls, or by integrating it into your own applications.

  • Interactive session: run ollama run gpt-oss:20b and start chatting directly from the terminal.
  • Direct queries: to get a quick answer without opening a session, you can launch:
    ollama run gpt-oss:20b "What is Linux and why is it important for AI?"
  • Adjust behavior: parameters such as temperature and top-p control the creativity and diversity of responses. The Ollama CLI does not accept them as command-line flags; set them inside an interactive session, for example:
    /set parameter temperature 0.2
  You can also define them permanently in a Modelfile or pass them per request through the API, as described below.

The model responds in real time, although speed depends on the power of your hardware. On computers without a GPU, performance can be much slower, especially with large models, so don't be alarmed if the first response takes several seconds or even minutes on low-resource machines.

Advanced Customization: Modelfiles in Ollama

One of Ollama's strengths is the ability to create custom models using so-called Modelfiles. This allows you to tailor GPT-OSS to specific tasks (e.g., acting as a Python-savvy assistant, writing journalistic texts, etc.).

  1. Create a file called Modelfile in an empty folder.
  2. Specify the base model and custom parameters, for example:
    FROM gpt-oss:20b
    SYSTEM "You are an expert data science assistant. Answer clearly and briefly."
    PARAMETER temperature 0.4
    PARAMETER num_ctx 4096
  3. In the same folder, run:
    ollama create assistant-data -f Modelfile
  4. Start the custom model with:
    ollama run assistant-data

This method allows you to quickly adapt the model's behavior without having to retrain or modify its internal parameters.
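The same keys used in PARAMETER lines can also be overridden per request through the API's "options" field, without touching the Modelfile at all. A sketch of building such a request:

```python
import json
from urllib.request import Request, urlopen


def build_payload(model: str, prompt: str, **options) -> bytes:
    """Serialize a non-streaming /api/generate request body."""
    return json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": options,  # same keys as PARAMETER lines, e.g. temperature
    }).encode()


def main() -> None:
    # Requires a running `ollama serve` with the model pulled.
    req = Request(
        "http://localhost:11434/api/generate",
        data=build_payload("gpt-oss:20b", "Summarize gradient descent.",
                           temperature=0.4, num_ctx=4096),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req) as resp:
        print(json.loads(resp.read())["response"])
```

Request-level options take effect only for that call, which makes them handy for quick experiments before baking values into a Modelfile.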

Integrating GPT-OSS into your applications: Using the Ollama API

Ollama exposes a local API, compatible with the OpenAI format, so you can integrate GPT-OSS into your applications or workflows.

  • The main endpoint is http://localhost:11434. You can make POST requests to /api/generate and /api/chat with JSON bodies similar to OpenAI's.
  • Example in terminal:
    curl http://localhost:11434/api/generate -H "Content-Type: application/json" -d '{"model": "gpt-oss:20b", "prompt": "Develop a Python function to sort numbers"}'
  • For use in Python you can use the openai library by pointing to the local endpoint:
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
response = client.chat.completions.create(
    model="gpt-oss:20b",
    messages=[{"role": "user", "content": "What is machine learning?"}],
)
print(response.choices[0].message.content)

This way, you can reuse scripts or integrations created for the OpenAI API without significant changes.
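When "stream" is left enabled (the API's default), /api/generate sends one JSON object per line, each carrying a fragment of the reply in its "response" field, ending with an object marked "done": true. A sketch of reassembling those fragments:

```python
import json
from urllib.request import Request, urlopen


def join_stream(lines) -> str:
    """Concatenate the "response" fragments of a streamed reply."""
    parts = []
    for line in lines:
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)


def main() -> None:
    # Requires a running `ollama serve` with the model pulled.
    body = json.dumps({"model": "gpt-oss:20b",
                       "prompt": "Name three Linux distributions."}).encode()
    req = Request("http://localhost:11434/api/generate", data=body,
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:  # HTTP responses iterate line by line
        print(join_stream(resp))
```

Processing the stream line by line is what lets chat front ends display tokens as they arrive instead of waiting for the full answer.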

Other tools to run GPT-OSS: LM Studio and Open WebUI

In addition to Ollama, other platforms let you manage and interact with GPT-OSS models locally. Among them, LM Studio stands out for its ease of use and visual approach.

Download LM Studio from its official website, install it, and open it. The application will guide you through a simple setup wizard, where you can choose the model most compatible with your hardware. If your system is limited, it will suggest lighter alternatives, although you can always force the installation of GPT-OSS 20b.

To install the model:

  • Open LM Studio and leave the app running.
  • In your browser, search for the GPT-OSS model on Hugging Face or the official website and select the “Use Model in LM Studio” option.
  • Confirm the opening from your browser and click "Download". The process may take a while due to the size of the model (approximately 12 GB for the small version alone).
  • Once the download is complete, the “Use in new chat” option will appear to begin interacting with the model from the LM Studio interface.

What if you have less than 16GB of RAM? You'll be able to run the model, but the experience will be much slower. The more resources you dedicate, the better the fluidity and speed.

Troubleshooting and optimization

Like all advanced software, complications can arise when running GPT-OSS locally. Here are the most common problems and how to solve them:

  • Out of memory failures: gpt-oss-120b won't load unless you have 80 GB or more of VRAM. Use gpt-oss-20b or adjust your system resources.
  • Model not downloaded: If Ollama gives an error, check with ollama list that you have downloaded the desired model.
  • The API doesn't seem to be working: make sure Ollama is running (command ollama serve) and that port 11434 is not busy.
  • Extreme slowness: This occurs when running large models without a GPU or with low RAM. Close applications, reduce the context size, and try shorter prompts.
  • Problems with drivers: Make sure your NVIDIA or AMD drivers are properly installed to take advantage of hardware acceleration.

If you have any serious questions, consult the official repository for the tool you're using or specialized forums like Hugging Face.

Debugging and advanced work with Apidog and Open WebUI

For those developing applications or experimenting with complex prompts, tools like Apidog can be very useful. They allow you to view streaming responses from the Ollama API, analyze the model's reasoning, and identify potential errors.

  • Install Apidog from its official website.
  • Create a request to the Ollama local API using the appropriate endpoint and enable the streaming option.
  • Apidog displays each token received in real time, making it easy to debug and compare parameters such as temperature or context size.

You can also use Open WebUI (via Docker) for an advanced web interface, including chat history and document uploads for contextual responses.

docker run -d -p 3000:8080 -v open-webui:/app/backend/data --name open-webui ghcr.io/open-webui/open-webui:main

Access in your browser to http://localhost:3000 and select the desired model to chat comfortably.

Advanced Terminal Integration: Python Example

If you want to take it a step further and integrate GPT-OSS into your own scripts, Linux makes it easy using Python and the OpenAI client library pointed at the Ollama backend.

  1. Make sure you have Python 3 and pip installed:
    python3 --version && pip3 --version
  2. Install the main dependencies:
    pip3 install openai requests
  3. Export a placeholder API key in your terminal (Ollama ignores its value, but OpenAI-compatible clients require one to be set):
    export OPENAI_API_KEY=ollama
  4. Create a script like the following:
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
prompt = input("Enter your question: ")
response = client.chat.completions.create(
    model="gpt-oss:20b",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)

This way, you can create a custom chatbot in your terminal and take advantage of GPT-OSS for any task you need on Linux.
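The script above answers a single question. Keeping the message history turns it into a multi-turn chatbot, since each request then carries the earlier turns as context. A sketch, using the same openai library and local endpoint as before:

```python
def append_turn(history: list, role: str, content: str) -> list:
    """Record one message in the running conversation."""
    history.append({"role": role, "content": content})
    return history


def main() -> None:
    from openai import OpenAI  # pip3 install openai

    client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
    history = []
    while True:
        prompt = input("You: ")
        if prompt.strip().lower() in {"exit", "quit"}:
            break
        append_turn(history, "user", prompt)
        reply = client.chat.completions.create(
            model="gpt-oss:20b", messages=history,
        ).choices[0].message.content
        append_turn(history, "assistant", reply)
        print("GPT-OSS:", reply)
```

Because the whole history is resent on every turn, long conversations consume more of the model's context window; trimming old turns is a simple way to keep memory use bounded.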

Opting for GPT-OSS and Linux as your local AI platform provides maximum customization, privacy, and cost savings. By installing the appropriate models, choosing the management tool that best suits your needs (Ollama, LM Studio, Open WebUI), and fine-tuning the configuration to your hardware, you can enjoy a data-center-level experience from the comfort of your desktop while maintaining complete control over your data and processes. If you want to experiment, develop, or simply learn how LLMs work on-premises, this is your best opportunity.