Gemma 4
Frontier-Level Intelligence. Open Weights.
Google DeepMind's most capable open model family. From edge devices to enterprise servers — trimodal reasoning, agentic tool use, and 256K context in a single architecture.
One Family. Every Scale.
From a Raspberry Pi to an H100 cluster — Gemma 4 variants cover the full spectrum of deployment needs.
E2B / E4B
2.3B – 4.5B effective parameters
Native text, image & audio on mobile and IoT. 1.5GB memory with 2-bit quantization. 3x faster than previous generation.
- Android / iOS via AICore
- Raspberry Pi 5 compatible
- 140-language ASR
- 60% less battery usage
26B A4B
3.8B active / 26B total (MoE)
128-expert MoE architecture with 8 active per token. Knowledge of a 26B model at the speed of a 4B. The efficiency paradox.
- 128 experts, 8 active/token
- 88.3% AIME accuracy
- Runs on RTX 4090 (24GB)
- 4B-speed, 26B-knowledge
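The routing scheme described above can be sketched in a few lines. This is an illustrative toy, not Gemma 4's actual gating network: the hidden size, gate weights, and scoring are placeholder assumptions; only the 128-expert / 8-active shape comes from the spec.

```python
import numpy as np

NUM_EXPERTS = 128  # total experts in the MoE layer (from the spec above)
TOP_K = 8          # experts activated per token (from the spec above)

def route(token_hidden, gate_weights, k=TOP_K):
    """Score every expert for one token, keep the top-k, softmax-normalize them."""
    logits = token_hidden @ gate_weights              # shape: (NUM_EXPERTS,)
    top = np.argsort(logits)[-k:]                     # indices of the k best experts
    scores = np.exp(logits[top] - logits[top].max())  # stable softmax over the k
    return top, scores / scores.sum()

rng = np.random.default_rng(0)
d_model = 64  # toy hidden size for illustration only
gate = rng.normal(size=(d_model, NUM_EXPERTS))
experts, weights = route(rng.normal(size=d_model), gate)
```

Only the 8 selected experts run their feed-forward pass, which is why per-token compute tracks the 3.8B active parameters rather than the 26B total.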
31B Dense
30.7B parameters
The flagship. Outperforms models 20x its size in agentic simulations. Step-by-step "Thinking Mode" for deterministic logic.
- 89.2% AIME 2026
- 84.3% GPQA Diamond
- Arena Elo ~1452
- 256K context window
A Generational Leap in Reasoning
Gemma 4 31B doesn't just improve on its predecessor; it redefines what's possible at this parameter count. The jump from 20.8% to 89.2% on AIME, a 68.4-point absolute gain, represents one of the largest single-generation improvements in AI history.
Gemma 4 31B vs Gemma 3 27B
How Gemma 4 Stacks Up
Head-to-head against the top open model families of 2026.
| Feature | Gemma 4 31B (Google DeepMind) | Qwen 3.5 27B (Alibaba) | Llama 4 Scout (Meta) |
|---|---|---|---|
| License | Apache 2.0 | Apache 2.0 | Custom Open |
| Reasoning (GPQA) | 84.3% | ~72.0% | 74.3% |
| Mathematics (AIME) | 89.2% | 48.7% | — |
| Coding (LCB v6) | 80.0% | 43.0% | — |
| Context Window | 256K | 262K | 10M |
| Native Modalities | Text, Image, Video | Text, Image | Text |
| Edge Variants | E2B, E4B | — | — |
| Agentic (τ2-bench) | 76.9% | — | — |
Data sourced from official publications and Arena AI leaderboard as of April 2026. "—" indicates data not publicly available.
Deploy Anywhere
Match your hardware to the right variant. From IoT to cloud — no compromise on intelligence.
Raspberry Pi 5
E2B (Quantized)
Best for IoT and robotics prototyping
Android / iOS
E2B / E4B
Production mobile AI, forward-compatible with Gemini Nano
MacBook M-series
E4B / 26B
Unified memory advantage for larger variants
PC (RTX 3060)
E4B
Entry-level desktop GPU deployment
Workstation (4090)
26B / 31B
4-bit 31B fits on a single consumer GPU
Cloud (H100)
31B (Full)
Enterprise serving with a 70% reduction in time-to-first-token (TTFT)
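The single-consumer-GPU claim for the 4-bit 31B variant is easy to sanity-check with back-of-the-envelope arithmetic. Note this counts weight memory only; activations and KV cache add overhead that this sketch deliberately ignores.

```python
def weight_memory_gb(params_billion, bits_per_weight):
    """Approximate weight-only memory: parameter count x bits per weight, in GB."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# 31B Dense (30.7B parameters) at 4-bit quantization
mem_gb = weight_memory_gb(30.7, 4)   # ~15.4 GB of weights
RTX_4090_VRAM_GB = 24                # leaves ~8.6 GB of headroom for KV cache
```

The same arithmetic at 16-bit gives ~61 GB, which is why full-precision serving lands on an H100 rather than a consumer card.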
Go Deeper
Everything you need to understand, evaluate, and deploy Gemma 4.
Architecture Deep Dive
Hybrid attention, Dual RoPE, Per-Layer Embeddings, and the MoE efficiency paradox. Understand every layer.
Performance & Benchmarks
Comprehensive analysis of MMLU Pro, AIME, GPQA, Arena AI ratings, and competitive coding results.
Deployment Guides
Step-by-step setup for Ollama, vLLM, LiteRT-LM. Hardware-specific guides from Raspberry Pi to H100.
Agentic Workflows
Function calling, ADK integration, autonomous coding agents, and the Claude Code proxy setup.
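For the local-serving path mentioned in the deployment guides, a minimal request against Ollama's `/api/generate` REST endpoint can be sketched as follows. The `gemma4` model tag is a placeholder assumption; check the published tag before use.

```python
import json
import urllib.request

def build_generate_request(model, prompt, host="http://localhost:11434"):
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = build_generate_request("gemma4", "Explain mixture-of-experts in one sentence.")
# urllib.request.urlopen(req)  # uncomment with a local Ollama server running
```

With `stream` set to `False`, Ollama returns the full completion in a single JSON response instead of newline-delimited chunks.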
Start Building with Gemma 4
400 million downloads. 100,000+ fine-tuned variants. The Gemmaverse is the fastest-growing open model ecosystem. Join it today.