Gemma 4 is the latest generation of __open source__ models from Google DeepMind, derived from Gemini 3 research. The family includes pre-trained and instruction-tuned variants, with a context window of up to __256K tokens__ and native support for over 140 languages. The models offer a configurable __thinking mode__, multimodal support for images, video and audio, and native function calling, which makes them well suited to AI agents.
What is Gemma 4?
Gemma 4 is a family of open source models published by Google DeepMind. It builds on advances from Gemini 3 research and distills them into open models, downloadable under the Apache 2.0 license. The family offers multiple sizes, from very compact models suited for edge and mobile deployments to more powerful models designed for servers. All models are available in both pre-trained and instruction-tuned versions, covering both R&D and operational applications. The presence of native function calling and a configurable thinking mode distinguishes Gemma 4 from most other open source families, clearly orienting it toward AI agents and complex workflows.
Key Features
Gemma 4 introduces several major advances. The architecture combines local sliding-window attention layers with global attention layers, so the model still covers the full context while keeping inference costs down. The context window reaches 128K tokens on the small versions and 256K tokens on the medium versions, allowing long documents or extended histories to be processed without truncation. The models natively handle text, images and video, with strong optical character recognition and good understanding of charts. The E2B and E4B versions add native audio input for speech recognition and understanding. The configurable thinking mode enables explicit reasoning chains when the task justifies it and lets the model answer directly in simple cases. Native function calling and system role support make Gemma 4 a strong foundation for AI agents. Performance on code and agentic benchmarks shows a clear improvement over Gemma 3.
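To make the difference between the two kinds of attention layer concrete, here is a minimal sketch in plain Python. It builds a causal mask for a global layer and for a sliding-window layer, then counts how many query/key pairs each one has to score. The sequence length and window size are illustrative assumptions, not Gemma 4's actual configuration, and the function names are hypothetical.

```python
def causal_mask(seq_len, window=None):
    """Return a seq_len x seq_len boolean mask.

    mask[i][j] is True when query position i may attend to key position j.
    window=None -> global causal mask (every earlier token is visible).
    window=w    -> sliding-window mask (only the last w tokens, including i).
    """
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            visible = j <= i  # causal: never look at future tokens
            if window is not None:
                visible = visible and (i - j) < window
            row.append(visible)
        mask.append(row)
    return mask


def attended_pairs(mask):
    """Count (query, key) pairs the layer scores; a rough proxy for cost."""
    return sum(sum(row) for row in mask)


# Illustrative sizes only (assumption, not the real model configuration).
local_mask = causal_mask(8, window=3)
global_mask = causal_mask(8)

print(attended_pairs(local_mask))   # 21
print(attended_pairs(global_mask))  # 36
```

The local layer's cost grows linearly with sequence length while the global layer's grows quadratically, which is why interleaving the two can reduce inference cost on long contexts while the global layers still see every token.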
Use Cases
Gemma 4 covers a wide range of scenarios. Developers targeting edge deployments use it in mobile applications, browser extensions or embedded devices, thanks to the compact 2B and 4B versions, which are compatible with LiteRT-LM or Cactus. AI teams build internal agents that can reason and execute tools by leveraging native function calling. Regulated enterprises deploy the larger versions locally to meet sovereignty and auditability requirements. Researchers use it as a basis for experiments on multilingual models, long-chain reasoning or hybrid architectures. Finally, SaaS vendors integrate it into their products to offer a cost-efficient alternative to proprietary models.
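As an illustration of the agent scenario, the sketch below shows the application-side half of function calling: taking a tool call emitted by the model and routing it to a Python function. The JSON shape (`name` plus `arguments`), the `get_weather` tool and the `dispatch_tool_call` helper are all assumptions made for the example; the actual format Gemma 4 emits depends on the chat template and the serving stack.

```python
import json


def get_weather(city: str) -> str:
    """Hypothetical tool; a real one would query a weather service."""
    return f"Sunny in {city}"


# Registry mapping tool names (as the model would reference them) to callables.
TOOLS = {"get_weather": get_weather}


def dispatch_tool_call(raw: str) -> str:
    """Parse a model-emitted tool call and run the matching function.

    Assumes the call arrives as JSON of the form
    {"name": "...", "arguments": {...}}; the real wire format may differ.
    """
    call = json.loads(raw)
    name = call["name"]
    if name not in TOOLS:
        return json.dumps({"error": f"unknown tool: {name}"})
    result = TOOLS[name](**call.get("arguments", {}))
    return json.dumps({"name": name, "result": result})


# Stand-in for text the model might produce (assumed format).
model_output = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
print(dispatch_tool_call(model_output))
# {"name": "get_weather", "result": "Sunny in Paris"}
```

In a full agent loop, the returned JSON would be appended to the conversation as a tool result so the model can use it in its next turn.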
Advantages
The main benefit of Gemma 4 is its combination of quality, openness and flexibility. Quality shows in benchmark results close to those of the best proprietary models. Openness, guaranteed by the Apache 2.0 license, permits fine-tuning, auditing and deployment in any environment, including the most regulated. Flexibility comes from the diversity of the family: the same technology base scales from a mobile device to a GPU cluster, which makes it easier to keep an organization's architecture consistent. Ecosystem support is broad, with day-one integrations with Hugging Face, Ollama, vLLM, llama.cpp, MLX, NVIDIA NIM and many others, making the models portable across most stacks.
Pricing
Gemma 4 is free to download under the Apache 2.0 license, which permits unrestricted commercial use. The only practical costs are for inference infrastructure: GPUs for on-premises deployments, or usage-based pricing through cloud providers such as Google Cloud, Hugging Face Inference, Baseten or Replicate. The absence of license fees is a significant economic advantage over proprietary models, particularly for high-volume usage.
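The high-volume argument comes down to simple arithmetic: self-hosting has a roughly fixed infrastructure cost, while a metered API scales with tokens. The sketch below computes the break-even volume. Every number in it is a placeholder chosen for illustration, not a real GPU or API price, and the function names are hypothetical.

```python
def self_hosted_cost(gpu_hourly_usd, gpus, hours):
    """Fixed infrastructure cost of running your own GPUs for a period."""
    return gpu_hourly_usd * gpus * hours


def api_cost(tokens, usd_per_million_tokens):
    """Usage-based cost of a metered API for a given token volume."""
    return tokens / 1_000_000 * usd_per_million_tokens


def break_even_tokens(fixed_cost_usd, usd_per_million_tokens):
    """Token volume above which the fixed-cost option becomes cheaper."""
    return fixed_cost_usd / usd_per_million_tokens * 1_000_000


# Placeholder figures (assumptions): one GPU at $2/hour for a 720-hour month,
# compared with a metered API at $4 per million tokens.
monthly_fixed = self_hosted_cost(gpu_hourly_usd=2.0, gpus=1, hours=720)
threshold = break_even_tokens(monthly_fixed, usd_per_million_tokens=4.0)

print(monthly_fixed)  # 1440.0
print(threshold)      # 360000000.0
```

With these placeholder figures, self-hosting wins once monthly volume exceeds 360 million tokens, provided the single GPU can actually serve that load; substitute your own prices and throughput before drawing conclusions.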
Conclusion
Gemma 4 illustrates the central place that open source now holds in Google DeepMind's strategy. The new family brings a rare combination of full openness, quality close to the best proprietary models and broad use case coverage. For AI teams building agents, assistants or advanced reasoning products, it is probably the most interesting open source foundation available in 2026.