Local Inference

The tau2-bench function-calling score jumped from 6.6% (Gemma 3) to 86.4% (Gemma 4). Agentic coding depends on reliable tool calls, so that jump of roughly 80 percentage points is what makes local agentic coding viable for the first time.

What you’ll find here

  1. I ran Gemma 4 as a local model in Codex CLI
     Vaughan’s hands-on Codex CLI experiment: full setup recipes for Apple Silicon (llama.cpp) and NVIDIA Blackwell (Ollama), benchmark results, and the counterintuitive finding that first-pass reliability matters more than raw token speed.

Why it matters for regulated verticals

For Centurion (banking), the public sector, and any Harris vertical where data egress is a compliance concern, running Apache 2.0-licensed Gemma 4 locally removes an entire class of compliance questions from your AI story. The hybrid workflow (`codex --profile local` for iteration and privacy-sensitive work; the default cloud model for complex tasks) is a credible deployment pattern.
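
The hybrid routing rule can be sketched as a small helper that builds the Codex command for each task. This is a minimal sketch, assuming Python 3.9+ and that a profile named `local` (pointing at your local Gemma 4 endpoint) is already defined in your Codex configuration; `codex_command`, `LOCAL_PROFILE`, and the two boolean flags are hypothetical names for illustration. It only constructs the argument list for non-interactive `codex exec` and does not run Codex; confirm the flags against `codex exec --help` for your installed version.

```python
import shlex

# Hypothetical profile name; it mirrors the `codex --profile local` invocation
# and must already exist in your Codex configuration.
LOCAL_PROFILE = "local"


def codex_command(prompt: str, privacy_sensitive: bool, complex_task: bool) -> list[str]:
    """Build a `codex exec` argv for one task under the hybrid workflow (hypothetical helper)."""
    cmd = ["codex", "exec"]
    # Privacy-sensitive work stays on the local profile even when it is complex;
    # routine iteration also runs locally. Only complex, non-sensitive tasks
    # fall through to the default (cloud) model.
    if privacy_sensitive or not complex_task:
        cmd += ["--profile", LOCAL_PROFILE]
    cmd.append(prompt)
    return cmd


if __name__ == "__main__":
    # Print a shell-safe rendering of the command instead of executing it.
    print(shlex.join(codex_command("Refactor the loan-ledger parser",
                                   privacy_sensitive=True, complex_task=True)))
```

The design choice worth noting: when a task is both complex and privacy-sensitive, the sketch keeps it local, on the assumption that the compliance constraint outranks model capability in regulated verticals.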



Built by Force Information Systems · Harris Computer · Constellation Software. Licensed under MIT.