Free LLM Models Keep Changing. I Automated It
Free LLM model IDs churn weekly. Instead of maintaining a list, freelm discovers them live from each provider's /models endpoint, caches the result, and self-corrects.
Free LLM model IDs change constantly — providers add and retire :free models almost weekly. Hardcoding them breaks within days, so in freelm I made the model list self-updating: it queries each provider's /models endpoint at runtime, tags and caches the result, and falls back to a built-in list only when offline.
How I learned this the hard way
My first release shipped a hardcoded list of OpenRouter free models. A day later, two of them returned 404: they'd been renamed or pulled. Free models are the most volatile part of the whole ecosystem because providers rotate which models they offer for free in order to manage cost. A static list is wrong almost as soon as you publish it.
The fix: discover, don't hardcode
freelm now treats the hardcoded list as a fallback, not a source of truth. On first use it calls the provider's OpenAI-compatible /models endpoint, gets the current models, derives capability tags, and uses that. The resolution order is live API → disk cache → hardcoded fallback, so a default call always lands on a model that exists right now.
from freelm import list_free_models

for m in list_free_models()[:5]:  # current free models, fetched live
    print(m.id, m.tags)
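The resolution order described above (live API, then disk cache, then hardcoded fallback) can be sketched as a chain of sources tried in turn. This is a minimal illustration and not freelm's actual code: `resolve_models`, `fetch_live`, `read_cache`, and `FALLBACK_MODELS` are hypothetical names, and the model IDs are placeholders.

```python
from typing import Callable, Optional

# Hypothetical built-in list; placeholder IDs, not real model names.
FALLBACK_MODELS = ["example/model-a:free", "example/model-b:free"]


def resolve_models(
    fetch_live: Callable[[], list[str]],
    read_cache: Callable[[], Optional[list[str]]],
) -> tuple[list[str], str]:
    """Return (models, source), trying live API, then cache, then fallback."""
    try:
        models = fetch_live()
        if models:
            return models, "live"
    except OSError:
        # Network failures (timeouts, refused connections) are OSError subclasses.
        pass

    cached = read_cache()
    if cached:
        return cached, "cache"

    return list(FALLBACK_MODELS), "fallback"


def offline() -> list[str]:
    raise ConnectionError("network unreachable")  # simulate being offline


print(resolve_models(lambda: ["live/model:free"], lambda: None))
print(resolve_models(offline, lambda: ["cached/model:free"]))
print(resolve_models(offline, lambda: None))
```

Returning the source alongside the list makes it easy to log when a call was served from the fallback, which is the case most likely to hit a retired model.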
Caching so it's not slow
Hitting /models on every call would be wasteful, so freelm caches the discovered list to disk with a TTL (default one hour, configurable). The first call fetches; subsequent calls read the cache until it expires. If the network is down, the cache or the hardcoded fallback keeps things working. The cache lives under ~/.cache/freelm with restrictive file permissions.
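A disk cache with a TTL and owner-only permissions might look roughly like the sketch below. The file layout (`fetched_at` plus `models` in one JSON file), the function names, and the `0o600` mode are assumptions made for illustration; freelm's real cache format may differ. The demo writes to a temporary directory rather than ~/.cache/freelm.

```python
import json
import os
import tempfile
import time
from pathlib import Path
from typing import Optional

TTL_SECONDS = 3600  # one hour, matching the default described above


def write_cache(path: Path, models: list[str]) -> None:
    """Store the model list with a timestamp, readable only by the owner."""
    path.parent.mkdir(parents=True, exist_ok=True)
    payload = json.dumps({"fetched_at": time.time(), "models": models})
    # 0o600 restricts a newly created file to the current user on POSIX systems.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(payload)


def read_cache(path: Path, ttl: float = TTL_SECONDS) -> Optional[list[str]]:
    """Return cached models if the file exists and is younger than ttl."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return None  # a missing or corrupt cache counts as a miss
    if time.time() - data.get("fetched_at", 0) > ttl:
        return None  # expired
    return data.get("models")


cache_file = Path(tempfile.mkdtemp()) / "models.json"  # temp dir for the demo
write_cache(cache_file, ["example/model-a:free"])
print(read_cache(cache_file))          # fresh entry is returned
print(read_cache(cache_file, ttl=-1))  # a negative TTL forces an expiry
```

This version treats an expired entry as a miss. To keep working offline, a fuller version would still hand back the stale list when the live fetch fails.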
Tagging models from messy metadata
Provider /models responses are inconsistent: some include capability metadata, many just list IDs. So freelm derives tags from whatever it has: size from the parameter count in the name, plus tools, vision, and reasoning from metadata or name hints. That lets auto deprioritize very large models and reasoning models, both of which tend to be slow, and lead with a fast plain instruct model, even for providers whose /models carries no metadata at all.
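To make the tagging idea concrete, here is a rough sketch of deriving tags from a model ID plus optional metadata. Everything in it is an assumption for illustration: the `derive_tags` function, the size thresholds, the name hints, and the metadata keys (`supported_parameters`, `input_modalities`) are stand-ins, not freelm's actual rules. The model IDs are only examples of common naming patterns.

```python
import re
from typing import Optional

# Matches a parameter count such as "8b", "70B", or "1.5b" set off by separators.
SIZE_RE = re.compile(r"(?<![a-z0-9.])(\d+(?:\.\d+)?)b(?![a-z0-9])", re.IGNORECASE)

# Hypothetical name hints; real providers use many more naming schemes.
NAME_HINTS = {
    "vision": ("vision", "-vl"),
    "reasoning": ("reasoning", "thinking", "-r1"),
}


def derive_tags(model_id: str, metadata: Optional[dict] = None) -> set[str]:
    """Derive capability tags from metadata when present, else from the name."""
    name = model_id.lower()
    meta = metadata or {}
    tags: set[str] = set()

    match = SIZE_RE.search(name)
    if match:
        billions = float(match.group(1))
        # Thresholds are arbitrary choices for this sketch.
        tags.add("small" if billions <= 10 else "medium" if billions <= 40 else "large")

    # Explicit metadata, if the provider supplies it (assumed field names).
    if "tools" in meta.get("supported_parameters", []):
        tags.add("tools")
    if "image" in meta.get("input_modalities", []):
        tags.add("vision")

    # Name hints cover providers that return bare IDs.
    for tag, hints in NAME_HINTS.items():
        if any(hint in name for hint in hints):
            tags.add(tag)
    return tags


print(sorted(derive_tags("meta-llama/llama-3.1-8b-instruct:free")))
print(sorted(derive_tags("qwen/qwen2.5-vl-72b-instruct:free")))
print(sorted(derive_tags("deepseek/deepseek-r1:free", {"supported_parameters": ["tools"]})))
```

The lookbehind in `SIZE_RE` keeps version numbers such as the "3.1" in "llama-3.1-8b" from being read as a size. Names like "8x7b" are not matched at all and would need their own rule.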
Filtering out the non-chat models
Discovery surfaced a subtle bug: some providers list audio, embedding, and image-generation models in the same /models response, with no modality flag. Early on, freelm offered a text-to-speech model as a chat model. I added a name-based filter that drops whisper, TTS, embedding, rerank, and image-gen entries, so only real chat models enter the pool.
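A name-based filter of this kind can be as small as one regular expression. The pattern list and the `is_chat_model` helper below are hypothetical and deliberately simple; the sample IDs are only illustrative, and a production filter would need a longer, provider-tested list.

```python
import re

# Hypothetical deny-list of substrings that usually mark non-chat models.
NON_CHAT_RE = re.compile(
    r"whisper|tts|embed|rerank|stable-diffusion|dall-e|image",
    re.IGNORECASE,
)


def is_chat_model(model_id: str) -> bool:
    """Return True unless the ID looks like an audio, embedding, rerank, or image model."""
    return NON_CHAT_RE.search(model_id) is None


listed = [
    "llama-3.1-8b-instant",
    "whisper-large-v3",
    "playai-tts",
    "text-embedding-3-small",
    "bge-reranker-v2-m3",
]
chat_only = [m for m in listed if is_chat_model(m)]
print(chat_only)
```

Substring matching errs on the side of dropping models: a chat model with "image" in its name would be excluded too, which is usually the safer failure for a default pool.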
Why this matters beyond freelm
Any multi-provider LLM tool faces the same churn. The lesson generalizes: don't encode volatile external state in your source; fetch it, cache it, and degrade gracefully. The model list is the obvious case, but the same pattern applies to rate limits and pricing — anything a provider can change without telling you.
Frequently Asked Questions
How often do free LLM models change?
Frequently — new :free models appear and old ones get retired or renamed weekly on aggregators like OpenRouter. A static list goes stale fast.
Does live discovery slow down my calls?
No — the list is fetched once and cached to disk with a TTL. Only the first call (or a cache refresh) hits the /models endpoint.
What if a provider's API is unreachable?
freelm falls back to the disk cache, then to a built-in list, so calls still work offline or during a provider outage.
Can I force a refresh?
Yes — llm.refresh_models() re-fetches on the next call, and list_free_models(refresh=True) pulls a fresh list immediately.
Written by Shihab Shahriar Antor — AI Engineer & Founder of Shahriar Labs. The self-updating model registry is part of freelm — source on GitHub.