Gemma Chat Review: Open Source Local AI Coding Agent for Mac
Gemma Chat is an open-source Electron app that runs Google's Gemma 4 model locally on Apple Silicon — no internet, no API keys, just vibe coding from anywhere.
I've been working on a fitness tracking app called FitTrack lately — lots of React components, real-time dashboards, and auth pages. While I use GitHub Copilot daily, there are times I wish I had an AI assistant that didn't need an internet connection. No cloud latency, no data leaving my machine, no subscription. That's why I got excited when I discovered Gemma Chat: an open-source, local-first coding agent built for Apple Silicon that runs Google's Gemma 4 model entirely offline. It's not just a chatbot — it can write multi-file projects, show live previews, and even take voice commands.
What is Gemma Chat?
Gemma Chat is an open-source Electron application that brings Google's Gemma 4 model family to your Mac, running natively via Apple's MLX framework. Unlike cloud-based assistants, everything happens on your machine: code generation, chat, file operations, and even speech-to-text via in-browser Whisper. After an initial model download (~3 GB for the recommended variant), you can code from an airplane, a remote cabin, or anywhere without internet.
The app offers two modes: Build Mode, where you describe a project and it writes files (HTML, CSS, JS, multi-file) with a live preview canvas that updates as the model streams output; and Chat Mode, a conversational AI with tool use (web search, URL fetch, calculator, bash). You can hot-swap between four Gemma variants: the fast 1.5 GB E2B, the balanced 3 GB E4B, the powerful 8 GB 27B MoE, and the max-quality 18 GB 31B.
Why I starred it
As a developer building FitTrack, I often prototype UI ideas quickly — sometimes offline during commutes or at coffee shops with spotty Wi-Fi. I needed an AI that could generate a component scaffold or a simple dashboard layout without reaching for the cloud. Gemma Chat fits that niche perfectly. It's also a testament to how far open models have come: Gemma 4 can produce coherent multi-file projects with no outside help. I starred it because it shows that local AI is not just a gimmick — it's a practical tool for real development work, especially when privacy or connectivity is a concern.
How it works
Architecturally, Gemma Chat is an Electron + Vite + React app that bootstraps a Python virtual environment on first launch to run MLX-LM, the model server. The main process manages an agent loop: it streams tokens from the local MLX server, parses XML actions (like <write_file> and <bash>) out of the model output, executes them in a sandboxed workspace, and feeds the results back, for up to 40 rounds per user prompt. The preview server serves files from the workspace, refreshing every ~450 ms to show updates in real time.
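To make that concrete, here is the rough shape of such a loop in TypeScript. This is a minimal sketch, not the app's actual code: the helpers streamCompletion and runAction (and the result-tag shape) are hypothetical stand-ins for the repo's internals.

// Sketch of an agent loop like the one described above.
// streamCompletion and runAction are hypothetical helpers.
declare function streamCompletion(context: string): Promise<string>;
declare function runAction(name: string, body: string): Promise<string>;

const MAX_ROUNDS = 40; // the app caps tool rounds per user prompt
// Matches tool calls such as <action name="write_file">...</action>
const ACTION_RE = /<action name="(\w+)">([\s\S]*?)<\/action>/g;

async function agentLoop(prompt: string): Promise<string> {
  let transcript = prompt;
  for (let round = 0; round < MAX_ROUNDS; round++) {
    // One model turn, streamed from the local MLX-LM server.
    const output = await streamCompletion(transcript);
    transcript += `\n${output}`;
    const actions = [...output.matchAll(ACTION_RE)];
    if (actions.length === 0) break; // plain answer; no tools requested
    for (const [, name, body] of actions) {
      // Execute in the sandboxed workspace, then feed the result
      // back as context for the next round (result tag is illustrative).
      const result = await runAction(name, body);
      transcript += `\n<result name="${name}">${result}</result>`;
    }
  }
  return transcript;
}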
Key concepts:
- Model Runtime: MLX-LM, auto-installed into a local venv; you choose from four Gemma variants.
- Agent Loop: An XML-based tool protocol, which tends to work more reliably with smaller models than JSON function calling.
- Workspace: A per-conversation sandboxed filesystem with a local HTTP server for previews.
- Voice Input: transformers.js with in-browser Whisper for local speech-to-text (see the sketch after this list).
- Tool Use: In Chat Mode, the model can fetch URLs, search the web, calculate, and run bash.
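The voice input path is pleasantly simple. I haven't checked which Whisper checkpoint Gemma Chat bundles, but with transformers.js a local transcriber is only a few lines; the model name and audio file below are assumptions for illustration:

import { pipeline } from '@huggingface/transformers';

// Local speech-to-text with transformers.js. The checkpoint is an
// assumption; Gemma Chat may ship a different Whisper variant.
const transcriber = await pipeline(
  'automatic-speech-recognition',
  'Xenova/whisper-tiny.en' // downloaded once, then cached locally
);

// Accepts an audio URL or a Float32Array of 16 kHz mono samples.
const result = await transcriber('recording.wav');
console.log(result); // { text: "..." }

Because this runs in the browser context, audio never leaves your machine either.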
Quick start
You'll need macOS on Apple Silicon (M1 or later), Python 3.10–3.13, and Node 20+. Install Python via Homebrew if needed: brew install python@3.13. Then clone and run:
git clone https://github.com/ammaarreshi/gemma-chat.git
cd gemma-chat
npm install
npm run dev
On first launch, the app creates a Python venv, installs MLX-LM, downloads the recommended Gemma 4 E4B model (~3 GB), and you're ready. To build a distributable .dmg, run npm run dist.
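For the curious, that first-launch bootstrap boils down to roughly the following. This is a sketch under assumptions: the paths, port, and model string are illustrative, while mlx_lm.server is MLX-LM's bundled OpenAI-compatible server, which fetches the model on first run.

import { execSync, spawn } from 'node:child_process';

// One-time setup: create a venv and install the model server.
// Paths are illustrative; the app manages its own venv location.
execSync('python3 -m venv .venv');
execSync('.venv/bin/pip install mlx-lm');

// Run the local inference server the app streams completions from.
// '<gemma-model-repo>' stands in for whichever variant you selected.
spawn('.venv/bin/python', [
  '-m', 'mlx_lm.server',
  '--model', '<gemma-model-repo>',
  '--port', '8080',
]);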
Real-world example
Let's build a simple personal homepage with a dark theme and a profile card. In Build Mode, type this prompt:
"Create a single-page personal website with a dark gradient background, a circular profile photo placeholder, your name, a short bio, and three social media icon links. Make it responsive and use smooth CSS transitions."
The model will stream back XML actions:
<action name="write_file">
  <path>index.html</path>
  <content>
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Personal Homepage</title>
  <style>
    * { margin: 0; padding: 0; box-sizing: border-box; }
    body {
      min-height: 100vh;
      background: linear-gradient(135deg, #0f0c29, #302b63, #24243e);
      font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
      display: flex;
      justify-content: center;
      align-items: center;
      color: #fff;
    }
    .card {
      background: rgba(255,255,255,0.1);
      backdrop-filter: blur(10px);
      border-radius: 20px;
      padding: 40px;
      text-align: center;
      max-width: 400px;
      width: 90%;
      transition: transform 0.3s, box-shadow 0.3s;
    }
    .card:hover {
      transform: translateY(-10px);
      box-shadow: 0 20px 40px rgba(0,0,0,0.4);
    }
    .avatar {
      width: 120px;
      height: 120px;
      border-radius: 50%;
      background: #ddd;
      margin: 0 auto 20px;
      display: flex;
      align-items: center;
      justify-content: center;
      font-size: 60px;
      color: #555;
    }
    h1 { margin-bottom: 10px; font-size: 2rem; }
    .bio { margin-bottom: 20px; opacity: 0.8; }
    .social a {
      display: inline-block;
      margin: 0 10px;
      color: #fff;
      text-decoration: none;
      font-size: 24px;
      transition: color 0.3s;
    }
    .social a:hover { color: #a0a0ff; }
  </style>
</head>
<body>
  <div class="card">
    <div class="avatar">👤</div>
    <h1>Your Name</h1>
    <p class="bio">Full-stack developer passionate about building cool things. Available for freelance projects.</p>
    <div class="social">
      <a href="#" title="GitHub">💻</a>
      <a href="#" title="Twitter">💬</a>
      <a href="#" title="LinkedIn">💼</a>
    </div>
  </div>
</body>
</html>
  </content>
</action>
The preview updates live as the file is written. You can then ask for changes, e.g. "Make the avatar a real image placeholder," and the model will edit the file accordingly.
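Presumably the edit comes back through the same protocol: another write_file action that rewrites index.html with the change in place. The interesting fragment might look like this (the image path is a hypothetical placeholder; exact output varies from run to run):

<action name="write_file">
  <path>index.html</path>
  <content>
<!-- ...the rest of the page as before... -->
<div class="avatar">
  <img src="avatar.jpg" alt="Profile photo" width="120" height="120" style="border-radius: 50%; object-fit: cover;">
</div>
<!-- ... -->
  </content>
</action>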
Pros and cons
| Pros | Cons |
|---|---|
| Fully offline after model download — great for privacy and travel | Only works on Apple Silicon (no Intel, no Windows/Linux) |
| Free and open-source (MIT) | Model download is ~3 GB; larger models need 16+ GB RAM |
| Build Mode with live preview speeds up prototyping | Small open models sometimes produce inconsistent code |
| Voice input via local Whisper | Setup requires Python and Node (though the app auto-provisions the Python side) |
| Multiple model sizes to fit your hardware | Still a proof of concept, with occasional bugs (11 open issues at the time of writing) |
Alternatives
- Ollama: Runs GGUF models locally behind a REST API and supports many hardware platforms. Great for general LLM use, but it has no built-in coding agent UI.
- Continue.dev: Open-source AI code assistant that integrates with VS Code and JetBrains. Supports local models via Ollama or LM Studio, but requires an editor extension.
- LM Studio: Desktop app to download and run local models (via llama.cpp). Offers a chat interface and API server, but no project-based coding agent or live preview.
My verdict — should you use it?
If you're on an Apple Silicon Mac and care about privacy or offline capabilities, Gemma Chat is a compelling tool for quick prototypes and learning. It's not yet ready to replace cloud assistants for complex production code, but it's a fantastic proof-of-concept that showcases the power of local AI. Skip it if you need cross-platform support or require the reliability of larger models. For offline vibe coding sessions, it's already a joy to use.