Released April 2, 2026 · Apache 2.0 License

Gemma 4

Frontier-Level Intelligence. Open Weights.

Google DeepMind's most capable open model family. From edge devices to enterprise servers — trimodal reasoning, agentic tool use, and 256K context in a single architecture.

89.2% AIME Math Accuracy
Apache 2.0 — Truly Open
Text + Image + Audio Native
• 89.2% · AIME 2026 · Math Reasoning
• 84.3% · GPQA Diamond · Graduate Science
• 2150 · Codeforces ELO · Expert Coding
• 256K · Context Window · Token Capacity

One Family. Every Scale.

From a Raspberry Pi to an H100 cluster — Gemma 4 variants cover the full spectrum of deployment needs.

On-Device
Edge Tier

E2B / E4B

2.3B – 4.5B effective

Native text, image, and audio on mobile and IoT devices. Runs in 1.5GB of memory with 2-bit quantization, 3x faster than the previous generation.

  • Android / iOS via AICore
  • Raspberry Pi 5 compatible
  • 140-language ASR
  • 60% less battery usage

Best Value
Workstation Tier

26B A4B

3.8B active / 26B total (MoE)

A 128-expert MoE architecture with 8 experts active per token: the knowledge of a 26B model at the speed of a 4B.

  • 128 experts, 8 active/token
  • 88.3% AIME accuracy
  • Runs on RTX 4090 (24GB)
  • 4B-speed, 26B-knowledge
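
To make the "8 of 128 experts" idea concrete, here is a minimal top-k router sketch in plain Python. This is an illustrative toy, not Gemma 4's actual routing code: softmax the router's logits, keep the 8 highest-scoring experts, and renormalize their weights so only those 8 are evaluated for the token.

```python
import math

def top_k_route(router_logits, k=8):
    """Pick the k highest-scoring experts and renormalize their weights.

    Returns a list of (expert_index, weight) pairs; in an MoE layer,
    only these k experts would run for the current token.
    """
    m = max(router_logits)                        # subtract max for numerical stability
    exp = [math.exp(x - m) for x in router_logits]
    total = sum(exp)
    probs = [e / total for e in exp]              # softmax over all 128 experts
    top = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]    # selected weights sum to 1.0

# 128 router logits for one token; only 8 experts end up active.
logits = [math.sin(i * 0.7) for i in range(128)]
active = top_k_route(logits)
print(len(active))  # -> 8
```

The compute saving is exactly this selection step: each token pays for 8 expert feed-forward blocks out of 128, which is how a 26B-parameter model can decode at roughly the cost of its ~3.8B active parameters.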

Flagship
Frontier Tier

31B Dense

30.7B parameters

The flagship. Outperforms models 20x its size in agentic simulations. Step-by-step "Thinking Mode" for deterministic logic.

  • 89.2% AIME 2026
  • 84.3% GPQA Diamond
  • Arena ELO ~1452
  • 256K context window

A Generational Leap in Reasoning

Gemma 4 31B doesn't just improve on its predecessor; it redefines what's possible at this parameter count. The jump from 20.8% to 89.2% on AIME is one of the largest single-generation gains recorded for an open model family.

4.3x improvement in mathematical reasoning (AIME)
4.7x improvement in agentic tool use (Tau2)
2.7x improvement in live coding (LiveCodeBench)
Codeforces ELO 2150 — expert-level competitive programming
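
The multipliers above can be checked directly against the Gemma 3 / Gemma 4 scores in the comparison below; a quick sanity check in Python:

```python
# Ratio of the Gemma 4 score to the Gemma 3 score, rounded to one decimal.
# Scores are the benchmark figures quoted on this page.
def improvement(old: float, new: float) -> float:
    return round(new / old, 1)

print(improvement(20.8, 89.2))  # AIME 2026      -> 4.3
print(improvement(16.2, 76.9))  # Tau2 agentic   -> 4.7
print(improvement(29.1, 80.0))  # LiveCodeBench  -> 2.7
```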

Gemma 4 31B vs Gemma 3 27B

Benchmark        Gemma 3 27B   Gemma 4 31B
MMLU Pro         67.6%         85.2%
AIME 2026        20.8%         89.2%
LiveCodeBench    29.1%         80.0%
GPQA Diamond     42.4%         84.3%
Tau2 (Agent)     16.2%         76.9%

How Gemma 4 Stacks Up

Head-to-head against the top open model families of 2026.

Feature               Gemma 4 31B          Qwen 3.5 27B   Llama 4 Scout
                      (Google DeepMind)    (Alibaba)      (Meta)
License               Apache 2.0           Apache 2.0     Custom Open
Reasoning (GPQA)      84.3%                ~72.0%         74.3%
Mathematics (AIME)    89.2%                48.7%          N/A
Coding (LCB v6)       80.0%                43.0%          N/A
Context Window        256K                 262K           10M
Native Modalities     Text, Image, Audio   Text, Image    Text
Edge Variants         E2B, E4B             —              —
Agentic (τ2-bench)    76.9%                —              —

Data sourced from official publications and Arena AI leaderboard as of April 2026. "—" indicates data not publicly available.

Deploy Anywhere

Match your hardware to the right variant. From IoT to cloud — no compromise on intelligence.

Raspberry Pi 5

E2B (Quantized)

Framework: LiteRT-LM · Memory: CPU-only · Speed: ~10 t/s

Best for IoT and robotics prototyping

Android / iOS

E2B / E4B

Framework: AICore / ML Kit · Memory: NPU on flagship · Speed: 31 t/s (NPU)

Production mobile AI, forward-compatible with Gemini Nano

MacBook M-series

E4B / 26B

Framework: Ollama / Metal · Memory: Unified memory · Speed: 30-50 t/s

Unified memory advantage for larger variants

PC (RTX 3060)

E4B

Framework: Ollama · Memory: 8GB VRAM · Speed: 20-35 t/s

Entry-level desktop GPU deployment

Workstation (4090)

26B / 31B

Framework: vLLM / Ollama · Memory: 24GB VRAM · Speed: 40-60 t/s

4-bit 31B fits on a single consumer GPU

Cloud (H100)

31B (Full)

Framework: vLLM / Vertex AI · Memory: 80GB · Speed: 100+ t/s

Enterprise serving with 70% TTFT reduction
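
A back-of-envelope check on the hardware table above: quantized weight memory is roughly parameter count times bits per weight. This sketch ignores KV cache and activation overhead, which is why real memory budgets (such as the E2B's ~1.5GB) sit above the raw weight size.

```python
def weight_gb(params_billion: float, bits: int) -> float:
    """Approximate weight memory in GB: each parameter takes bits/8 bytes."""
    return params_billion * bits / 8

print(weight_gb(2.3, 2))    # E2B at 2-bit: about 0.58 GB of weights
print(weight_gb(30.7, 4))   # 31B at 4-bit: about 15.4 GB, under a 4090's 24GB VRAM
```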

Start Building with Gemma 4

400 million downloads. 100,000+ fine-tuned variants. The Gemmaverse is the fastest-growing open model ecosystem. Join it today.
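
If you already run models through Ollama, a typical starting point would be a Modelfile. The sketch below is hypothetical: the `gemma4:31b` tag is assumed for illustration and is not a confirmed registry name; `num_ctx 262144` corresponds to the full 256K context.

```
# Hypothetical Ollama Modelfile -- the gemma4:31b tag is an assumption, not a confirmed name
FROM gemma4:31b
PARAMETER num_ctx 262144
PARAMETER temperature 0.7
SYSTEM "You are a helpful coding assistant."
```

With a file like this saved as `Modelfile`, `ollama create my-gemma4 -f Modelfile` builds the variant and `ollama run my-gemma4` starts an interactive session.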

Apache 2.0 Licensed
400M+ Downloads
100K+ Fine-tuned Variants
Google DeepMind