_SH Log's

Free LLM Models Keep Changing. I Automated It

Free LLM model IDs churn weekly. Instead of maintaining a list by hand, freelm discovers them live from each provider's /models endpoint, caches the result, and self-corrects.

#free-llm #llm #python #open-source #build-in-public

Free LLM model IDs change constantly — providers add and retire :free models almost weekly. Hardcoding them breaks within days, so in freelm I made the model list self-updating: it queries each provider's /models endpoint at runtime, tags and caches the result, and falls back to a built-in list only when offline.

How I learned this the hard way

My first release shipped a hardcoded list of OpenRouter free models. A day later, two of them returned 404 — they'd been renamed or pulled. Free models are the most volatile part of the whole ecosystem because providers rotate which ones are free to manage cost. A static list is wrong almost as soon as you publish it.

The fix: discover, don't hardcode

freelm now treats the hardcoded list as a fallback, not a source of truth. On first use it calls the provider's OpenAI-compatible /models endpoint, gets the current models, derives capability tags, and uses that list. The resolution order is live API → disk cache → hardcoded fallback, so whenever the provider is reachable, a default call lands on a model that exists right now.

from freelm import list_free_models

for m in list_free_models()[:5]:   # current free models, fetched live
    print(m.id, m.tags)
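
The resolution order can be sketched as a chain of sources tried in turn. This is a minimal illustration, not freelm's actual code: `resolve_models`, `FALLBACK_MODELS`, the two stand-in source functions, and the model IDs are all hypothetical.

```python
from typing import Callable, Iterable, List, Optional

# Hypothetical built-in list; a real one would hold known model IDs.
FALLBACK_MODELS = ["example/model-a:free", "example/model-b:free"]

Source = Callable[[], Optional[List[str]]]


def resolve_models(sources: Iterable[Source], fallback: List[str]) -> List[str]:
    """Return the first non-empty result from the sources, else the fallback."""
    for source in sources:
        try:
            models = source()
        except Exception:
            # Network error, bad JSON, unreadable cache: try the next source.
            continue
        if models:
            return models
    return list(fallback)


def live_api_down() -> List[str]:
    """Stand-in for a live /models call during an outage."""
    raise ConnectionError("provider unreachable")


def disk_cache_hit() -> List[str]:
    """Stand-in for a disk cache that still holds a list."""
    return ["cached/model:free"]


# Live API fails, so the cached list is used.
print(resolve_models([live_api_down, disk_cache_hit], FALLBACK_MODELS))
# Every source fails, so the built-in list is used.
print(resolve_models([live_api_down], FALLBACK_MODELS))
```

Passing the sources as plain callables keeps the ordering explicit and makes each layer easy to swap out in tests.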

Caching so it's not slow

Hitting /models on every call would be wasteful, so freelm caches the discovered list to disk with a TTL (default one hour, configurable). The first call fetches; subsequent calls read the cache until it expires. If the network is down, the cache or the hardcoded fallback keeps things working. The cache lives under ~/.cache/freelm with restrictive file permissions.
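
A TTL-based disk cache along these lines might look like the sketch below. The function names, the JSON layout, and the `fetched_at` field are assumptions for illustration rather than freelm's real format, and the demo writes to a temporary directory instead of ~/.cache/freelm.

```python
import json
import os
import tempfile
import time
from pathlib import Path
from typing import List, Optional


def write_cache(path: Path, models: List[str]) -> None:
    """Store the model list together with a fetch timestamp."""
    path.parent.mkdir(parents=True, exist_ok=True)
    payload = json.dumps({"fetched_at": time.time(), "models": models})
    # Mode 0o600 applies when the file is created (owner read/write on POSIX).
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(payload)


def read_cache(path: Path, ttl_seconds: float = 3600) -> Optional[List[str]]:
    """Return the cached list, or None if it is missing, corrupt, or expired."""
    try:
        data = json.loads(path.read_text())
    except (OSError, ValueError):
        return None
    if not isinstance(data, dict):
        return None
    if time.time() - data.get("fetched_at", 0) > ttl_seconds:
        return None  # older than the TTL: the caller should re-fetch
    return data.get("models")


# Demo in a temporary directory rather than the real cache location.
cache_file = Path(tempfile.mkdtemp()) / "freelm-demo" / "models.json"
write_cache(cache_file, ["example/model-a:free"])
print(read_cache(cache_file))                  # fresh entry is returned
print(read_cache(cache_file, ttl_seconds=-1))  # treated as expired
```

Returning None for every failure mode lets the caller treat "no cache", "corrupt cache", and "stale cache" the same way: fetch again or fall back.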

Tagging models from messy metadata

Provider /models responses are inconsistent: some include capability metadata, many just list IDs. So freelm derives tags from whatever it has: size from the parameter count in the name, plus tools, vision, and reasoning from metadata or name hints. That lets freelm's auto selection deprioritize slow, very large models and reasoning models and lead with a fast, plain instruct model, even for providers whose /models carries no metadata at all.
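
Name-based tagging can be sketched as follows. The hint table, the size thresholds, the tag names, and the `acme/...` model IDs are all hypothetical. The sketch covers only the name-derived part and leaves out provider metadata and the tools tag.

```python
import re
from typing import List, Set

# Hypothetical name hints; a real tagger would tune these per provider.
NAME_HINTS = {
    "vision": ("vision", "-vl"),
    "reasoning": ("reasoning", "thinking", "-r1"),
}

# Matches a parameter count such as "7b" or "1.5b" as a standalone token.
SIZE_RE = re.compile(r"(?<![a-z0-9.])(\d+(?:\.\d+)?)b(?![a-z0-9])")


def derive_tags(model_id: str) -> Set[str]:
    """Guess capability and size tags from the model ID alone."""
    name = model_id.lower()
    tags = {tag for tag, hints in NAME_HINTS.items()
            if any(hint in name for hint in hints)}
    match = SIZE_RE.search(name)
    if match:
        params_b = float(match.group(1))  # billions of parameters
        if params_b <= 10:
            tags.add("small")
        elif params_b <= 40:
            tags.add("medium")
        else:
            tags.add("large")
    return tags


def auto_order(model_ids: List[str]) -> List[str]:
    """Put plain, smaller models first; large and reasoning models last."""
    def penalty(model_id: str) -> int:
        tags = derive_tags(model_id)
        return ("large" in tags) + ("reasoning" in tags)
    return sorted(model_ids, key=penalty)  # stable sort keeps ties in order


ids = ["acme/thinker-r1:free", "acme/big-70b-instruct:free", "acme/chat-7b-instruct:free"]
for model_id in auto_order(ids):
    print(model_id, sorted(derive_tags(model_id)))
```

Substring hints are crude and will misfire on some names, which is one reason to prefer provider metadata whenever it is present.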

Filtering out the non-chat models

Discovery surfaced a subtle bug: some providers list audio, embedding, and image-generation models in the same /models response, with no modality flag. Early on, freelm offered a text-to-speech model as a chat model. I added a name-based filter that drops whisper, TTS, embedding, rerank, and image-gen entries, so only real chat models enter the pool.
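
A name-based filter of this kind could be as small as one deny-list pattern. The fragments below and the `acme/...` IDs are illustrative guesses, not freelm's actual list.

```python
import re
from typing import Iterable, List

# Hypothetical deny-list of name fragments for non-chat modalities.
NON_CHAT_RE = re.compile(
    r"whisper|tts|text-to-speech|embed|rerank|image-gen",
    re.IGNORECASE,
)


def chat_models_only(model_ids: Iterable[str]) -> List[str]:
    """Drop IDs whose names suggest audio, embedding, rerank, or image models."""
    return [m for m in model_ids if not NON_CHAT_RE.search(m)]


listed = [
    "acme/chat-7b-instruct",
    "acme/whisper-large",
    "acme/TTS-1",
    "acme/text-embedding-small",
    "acme/rerank-v2",
    "acme/image-gen-xl",
]
print(chat_models_only(listed))
```

Matching on substrings keeps the filter simple at the cost of occasional false positives, so the pattern is worth revisiting as new model names appear.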

Why this matters beyond freelm

Any multi-provider LLM tool faces the same churn. The lesson generalizes: don't encode volatile external state in your source; fetch it, cache it, and degrade gracefully. The model list is the obvious case, but the same pattern applies to rate limits and pricing — anything a provider can change without telling you.

Frequently Asked Questions

How often do free LLM models change? Frequently — new :free models appear and old ones get retired or renamed weekly on aggregators like OpenRouter. A static list goes stale fast.

Does live discovery slow down my calls? No — the list is fetched once and cached to disk with a TTL. Only the first call (or a cache refresh) hits the /models endpoint.

What if a provider's API is unreachable? freelm falls back to the disk cache, then to a built-in list, so calls still work offline or during a provider outage.

Can I force a refresh? Yes — llm.refresh_models() re-fetches on the next call, and list_free_models(refresh=True) pulls a fresh list immediately.


Written by Shihab Shahriar Antor — AI Engineer & Founder of Shahriar Labs. The self-updating model registry is part of freelm — source on GitHub.