A Free, Drop-in OpenAI Replacement in Python

You can replace the OpenAI SDK with a free, drop-in alternative by changing one import: from freelm.compat import OpenAI. Your existing client.chat.completions.create(...) code runs unchanged, but requests go to six free-tier providers with automatic failover instead of a paid OpenAI account.

The one-line swap

freelm ships an OpenAI-compatible shim. Take any OpenAI SDK code and change the import line:

# before
# from openai import OpenAI

# after
from freelm.compat import OpenAI

client = OpenAI()
r = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "hi"}],
)
print(r.choices[0].message.content)

The response object has the same shape — choices[0].message.content — so downstream code that reads OpenAI responses keeps working.
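To make the "same shape" point concrete, here is a minimal sketch of downstream code that depends only on the choices[0].message.content path. The helper name extract_text and the fake_response object are hypothetical stand-ins made up for illustration. A real response would come from client.chat.completions.create(...).

```python
from types import SimpleNamespace


def extract_text(response):
    """Return the assistant text from an OpenAI-shaped chat response."""
    return response.choices[0].message.content


# Hypothetical stand-in for a response object. It only mimics the
# attribute path that downstream code reads.
fake_response = SimpleNamespace(
    choices=[
        SimpleNamespace(
            message=SimpleNamespace(role="assistant", content="hello there")
        )
    ]
)

print(extract_text(fake_response))
```

Code written this way does not care which client produced the response, only that the attribute path exists.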

Why this works

OpenAI's chat-completions format became the de facto standard, and every provider freelm supports — OpenRouter, Gemini's compat endpoint, Groq, Cerebras, NIM, Mistral — speaks it. So the shim only has to translate the client surface you call; the wire format is already shared. That's why a one-import swap is enough rather than a rewrite.
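As a rough illustration of the shared wire format, the sketch below builds the same chat-completions request for two providers. The base URLs are placeholders rather than real endpoints, and build_request is a hypothetical helper, not part of freelm. In practice the API key and model ID would also differ per provider.

```python
import json

# Placeholder base URLs for illustration only, not real provider endpoints.
BASE_URLS = {
    "provider_a": "https://provider-a.example/v1",
    "provider_b": "https://provider-b.example/v1",
}


def build_request(base_url, api_key, model, messages):
    """Build (url, headers, body) for a chat-completions style POST."""
    url = base_url.rstrip("/") + "/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"model": model, "messages": messages})
    return url, headers, body


messages = [{"role": "user", "content": "hi"}]
requests_by_provider = {
    name: build_request(base, "PLACEHOLDER_KEY", "some-model", messages)
    for name, base in BASE_URLS.items()
}

# Only the URL changes between providers; the JSON body has the same shape.
for name, (url, _headers, body) in requests_by_provider.items():
    print(name, url, body)
```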

What you get that the real SDK doesn't

The OpenAI SDK talks to one provider. The freelm shim is backed by the full FreeLLM router, so the same create() call gets key rotation, cross-provider failover, rate-limit pacing, and live model discovery at no extra effort. If one provider is down or throttled, the router moves on to another, so your call still returns as long as some provider is available, with no code change.

# point it at a configured router for full control
from freelm import FreeLLM
from freelm.compat import OpenAI

client = OpenAI(FreeLLM.from_env(strategy="quota_aware"))
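The failover and key-rotation behavior described in this section can be pictured with a small, self-contained sketch. This is not freelm's implementation. Provider, ProviderError, and complete_with_failover are hypothetical names, and the two provider callables are fakes that simulate a throttled and a healthy backend.

```python
from itertools import cycle


class ProviderError(Exception):
    """Raised when a provider call fails or is rate limited."""


class Provider:
    """Hypothetical provider wrapper with round-robin key rotation."""

    def __init__(self, name, keys, call):
        self.name = name
        self._keys = cycle(keys)  # each call takes the next key in turn
        self._call = call         # callable(api_key, messages) -> str

    def complete(self, messages):
        return self._call(next(self._keys), messages)


def complete_with_failover(providers, messages):
    """Try providers in order; return (name, text) from the first success."""
    errors = []
    for provider in providers:
        try:
            return provider.name, provider.complete(messages)
        except ProviderError as exc:
            errors.append(f"{provider.name}: {exc}")
    raise ProviderError("all providers failed: " + "; ".join(errors))


def throttled(api_key, messages):
    # Fake backend that is always rate limited.
    raise ProviderError("429 rate limited")


def healthy(api_key, messages):
    # Fake backend that echoes the last user message.
    return f"echo via {api_key}: {messages[-1]['content']}"


providers = [
    Provider("first", ["key-1"], throttled),
    Provider("second", ["key-a", "key-b"], healthy),
]

result = complete_with_failover(providers, [{"role": "user", "content": "hi"}])
print(result)
```

The caller sees one successful result even though the first provider failed. A real router would also need pacing and health tracking, which this sketch leaves out.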

Picking a model

With one provider you name an exact model. Across six, exact names differ, so use a virtual model and let freelm resolve it per provider: model="auto" for the best available free model, or "chat:fast" / "chat:large" to bias speed or capability. You can still pass a concrete model ID if you want a specific one.
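One way to picture virtual-model resolution is a per-provider lookup table. The sketch below is an assumption about the general idea, not freelm's actual logic. The catalog, the model IDs, and the choice to map "auto" to the large tier are all invented for illustration.

```python
# Hypothetical catalog: provider -> {tier: model ID}. All names are made up.
CATALOG = {
    "provider_a": {"fast": "a-small-8b", "large": "a-big-70b"},
    "provider_b": {"fast": "b-mini", "large": "b-max"},
}


def resolve_model(requested, provider, default_tier="large"):
    """Map a virtual model name to a concrete ID for one provider."""
    tiers = CATALOG[provider]
    if requested == "auto":
        # Assumption: "auto" picks the default tier for this provider.
        return tiers[default_tier]
    if requested.startswith("chat:"):
        tier = requested.split(":", 1)[1]
        if tier not in tiers:
            raise ValueError(f"unknown tier {tier!r} for {provider}")
        return tiers[tier]
    # Anything else is treated as a concrete model ID and passed through.
    return requested


for name in CATALOG:
    print(name, resolve_model("chat:fast", name), resolve_model("auto", name))
```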

When the shim is the right choice

Use the shim when you have existing OpenAI code and want to run it on free tiers with minimal changes. Use the native FreeLLM API when you're writing fresh code and want direct access to strategies, timeouts, streaming, and health(). They're the same engine underneath, so you can start with the shim and drop to the native API where you need control.
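The "same engine underneath" relationship is essentially an adapter: a thin object that exposes the client.chat.completions.create surface and forwards to a native engine. The sketch below shows that pattern under stated assumptions. Engine and ShimClient are hypothetical classes written for this example and do not reflect freelm's internals.

```python
from types import SimpleNamespace


class Engine:
    """Hypothetical native API: one method that returns plain text."""

    def chat(self, messages, model="auto"):
        return f"[{model}] reply to: {messages[-1]['content']}"


class _Completions:
    """Adapter that exposes create() and wraps the result in an OpenAI-like shape."""

    def __init__(self, engine):
        self._engine = engine

    def create(self, *, model, messages):
        text = self._engine.chat(messages, model=model)
        message = SimpleNamespace(role="assistant", content=text)
        return SimpleNamespace(choices=[SimpleNamespace(index=0, message=message)])


class ShimClient:
    """Hypothetical shim: OpenAI-style surface over the native engine."""

    def __init__(self, engine=None):
        self.engine = engine or Engine()
        self.chat = SimpleNamespace(completions=_Completions(self.engine))


client = ShimClient()
r = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "hi"}],
)
print(r.choices[0].message.content)

# Dropping to the native surface uses the very same engine object.
print(client.engine.chat([{"role": "user", "content": "hi"}]))
```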

Frequently Asked Questions

Is it really a drop-in replacement? For the common chat.completions.create path, yes: change the import and your code runs. Features outside that path, such as streaming through the shim, aren't all mirrored, but standard non-streaming chat usage is.

Is it free? freelm is MIT-licensed and free; it runs on providers' free tiers. There's no OpenAI key or subscription required — just the free provider keys you supply.

Does streaming work through the shim? Use the native FreeLLM.stream() / AsyncFreeLLM.astream() for streaming with failover. The shim covers the standard non-streaming create call.

Can I keep using OpenAI as one of the providers? freelm focuses on free providers, but you control the provider list. The shim's value is running existing OpenAI-shaped code on free tiers.

What Python versions are supported? Python 3.9 through 3.14, tested in CI.

Written by Shihab Shahriar Antor — AI Engineer & Founder of Shahriar Labs. The drop-in shim is part of freelm — pip install freelm, source on GitHub.