My $0 AI Stack: Free LLMs for Every Project
The $0 AI stack I use across side projects: six free-tier LLM providers pooled behind freelm, so prototypes run on free capacity with automatic failover.
My $0 AI stack is six free-tier LLM providers — Gemini, Groq, Cerebras, OpenRouter, NVIDIA NIM, and Mistral — pooled behind one client so every side project gets LLM access without a bill. I route across all of them with freelm, which fails over automatically when any one rate-limits. Here's the exact setup.
The principle: spend $0 until something earns
I start a lot of projects. Most won't make money, so paying per token from day one is backwards. Free tiers are plenty for prototypes, internal tools, and low-traffic apps — the only problem is that any single free tier is small and unreliable. The fix isn't a bigger tier; it's more tiers, combined, with failover. That's the whole stack.
The six providers and what each is for
Each provider earns its slot by being good at something:
| Provider | Free tier | I use it for | |---|---|---| | Google AI Studio | ~1,500 req/day, 1M ctx | Long context, default | | Groq | 30 RPM, 14,400/day | Fast short responses | | Cerebras | ~1M tokens/day | High token volume | | OpenRouter | ~50–1,000/day | Model variety | | NVIDIA NIM | build credits | Models others lack | | Mistral | 1B tokens/month | Overflow lane |
Their limits are separate counters, so together they're far more than any one.
One config, every project
I keep provider keys in a .env file and never commit it. Every project then starts the same way:
import freelm
llm = freelm.FreeLLM.from_env(strategy="quota_aware")
print(llm.text("summarize this changelog: ..."))
quota_aware sends each call to whichever provider has the most headroom right now and skips any key that's cooling down. I don't think about which provider — I just call text() and freelm picks.
It has to survive rate limits
A free stack is useless if it dies the moment one tier throttles. freelm rotates keys on a 429, trips a circuit breaker on repeated failures, and fails over across providers — so a prototype demo doesn't break mid-sentence. For batch jobs I add wait=True so it pauses for a cooling key instead of erroring:
llm = freelm.FreeLLM.from_env(strategy="quota_aware", wait=True, max_wait=20)
Drop-in for code that already uses OpenAI
Half my older projects import the OpenAI SDK. I don't rewrite them — I swap one import and they run on the free stack:
from freelm.compat import OpenAI
client = OpenAI()
When I actually pay
I move a project to a paid model only once it has users or revenue, and even then I keep the free stack as the fallback layer. Paid primary for reliability, free pool for overflow and cost control. Until then, the whole AI bill is $0.
Frequently Asked Questions
Is a free AI stack enough for real apps? For prototypes, internal tools, and low-traffic features, yes. For anything high-volume, use free tiers as a fallback behind a paid primary.
How do I manage six sets of keys?
Keep them in a gitignored .env; freelm reads each provider's standard env var automatically. Supply only the ones you have.
What if I only have one or two keys? freelm works with a single provider too — you still get key rotation and retries. Add more providers later for failover.
Does this break the providers' terms? No — each is normal single-account free usage. Stacking just means routing across providers you legitimately signed up for.
Written by Shihab Shahriar Antor — AI Engineer & Founder of Shahriar Labs. The pooling client is freelm — pip install freelm, source on GitHub.