Local Inference
The tau2-bench function-calling score jumped from 6.6% (Gemma 3) to 86.4% (Gemma 4). That gap is why local agentic coding is now viable for the first time.
What you’ll find here
- Vaughan’s hands-on Codex CLI experiment — full setup recipes for Apple Silicon (llama.cpp) and NVIDIA Blackwell (Ollama), benchmark results, and the counterintuitive finding that first-pass reliability matters more than raw token speed
Featured in this category
Why it matters for regulated verticals
For Centurion (banking), public sector, and any Harris vertical where data-egress is a compliance question, Apache 2.0 Gemma 4 running locally removes an entire class of compliance questions from your AI story. The hybrid workflow (`codex –profile local` for iteration + privacy-sensitive work; default cloud for complex) is a credible deployment pattern.