Gemma 3: Google DeepMind’s New Multimodal Open Model

2025-03-12
ℹ️ Note on the source

This blog post was automatically generated (and translated). It is based on the following original, which I selected for publication on this blog:
Gemma 3 Technical Report.

Gemma 3: A Leap Forward in Open Language Models

Google DeepMind has unveiled Gemma 3, the latest iteration in its family of lightweight open models. This new version brings significant advancements in multimodality, language support, and context handling, pushing the boundaries of what's possible on standard consumer-grade hardware.

Key Features and Improvements

Gemma 3 introduces several enhancements, including:

  • Multimodality: Most Gemma 3 models incorporate a tailored version of the SigLIP vision encoder, enabling them to process and understand images (a minimal sketch follows this list).
  • Long Context: The models now support a context length of 128K tokens (32K for the 1B model), allowing them to stay coherent over much longer documents and conversations.
  • Multilingualism: Improvements to the data mixture have enhanced the models' multilingual capabilities.
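
To make the multimodality point more concrete, here is a minimal PyTorch sketch of one common way to wire a frozen vision encoder into a decoder-only language model: patch embeddings are condensed into a fixed budget of soft image tokens and projected into the LM's embedding space. This is not Gemma 3's actual code; the class name, dimensions, and the token budget are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ImageTokenAdapter(nn.Module):
    """Hypothetical adapter: condense patch embeddings from a frozen vision
    encoder into a fixed number of soft image tokens in the LM's embedding space."""

    def __init__(self, vision_dim: int, lm_dim: int, num_image_tokens: int = 256):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool1d(num_image_tokens)  # fixed image-token budget
        self.proj = nn.Linear(vision_dim, lm_dim)            # match the LM embedding width

    def forward(self, patch_embeds: torch.Tensor) -> torch.Tensor:
        # patch_embeds: (batch, num_patches, vision_dim) from the vision encoder
        pooled = self.pool(patch_embeds.transpose(1, 2)).transpose(1, 2)
        return self.proj(pooled)  # (batch, num_image_tokens, lm_dim)

# Prepend the image tokens to the text embeddings before running the decoder.
adapter = ImageTokenAdapter(vision_dim=1152, lm_dim=2048)   # dimensions are illustrative
patch_embeds = torch.randn(1, 4096, 1152)   # dummy encoder output
text_embeds = torch.randn(1, 32, 2048)      # dummy text-token embeddings
decoder_inputs = torch.cat([adapter(patch_embeds), text_embeds], dim=1)
```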

Architecture and Training

Gemma 3 maintains the decoder-only transformer architecture of its predecessors but incorporates key modifications:

  • Local/Global Layer Interleaving: To keep the KV-cache memory required for long contexts manageable, the architecture interleaves local sliding-window self-attention layers with global self-attention layers (see the first sketch after this list).
  • Knowledge Distillation: All Gemma 3 models are trained with knowledge distillation, learning from a larger teacher model's output distribution to improve performance (see the second sketch after this list).
  • Quantization Aware Training: Quantized versions of the models are provided in several formats (per-channel int4, per-block int4, and switched fp8) through Quantization Aware Training (QAT), optimizing them for efficient inference (see the third sketch after this list).
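
A minimal sketch of the local/global interleaving idea: most layers attend only within a sliding window (so their KV cache stays bounded), while an occasional global layer attends over the full context. The 5:1 ratio and the tiny window used here are illustrative defaults, not necessarily the shipped configuration.

```python
import torch

def layer_pattern(num_layers, local_per_global=5):
    """Illustrative interleaving schedule: a run of local layers followed by
    one global layer, repeated across the stack."""
    return ["global" if (i + 1) % (local_per_global + 1) == 0 else "local"
            for i in range(num_layers)]

def attention_mask(seq_len, window=None):
    """Boolean mask (True = may attend). Global layers use plain causal
    attention; local layers additionally restrict each token to the last
    `window` positions, which bounds their KV-cache size."""
    i = torch.arange(seq_len).unsqueeze(1)
    j = torch.arange(seq_len).unsqueeze(0)
    mask = j <= i                          # causal
    if window is not None:
        mask = mask & ((i - j) < window)   # sliding window
    return mask

print(layer_pattern(12))        # ['local', 'local', ..., 'global', 'local', ...]
local_mask = attention_mask(8, window=4)   # used by local layers
global_mask = attention_mask(8)            # used by global layers
```

Because only the occasional global layer needs to cache keys and values for the full 128K-token context, the memory cost of long inputs grows far more slowly than in a stack of all-global layers.

The post does not spell out the distillation setup, so the sketch below shows only the generic knowledge-distillation objective: the student is trained to match a (frozen) teacher's softened next-token distribution rather than just the one-hot target. Temperature, shapes, and vocabulary size are illustrative.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """Generic KD objective: KL divergence between the student's and the
    (frozen) teacher's softened next-token distributions."""
    t = temperature
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (t * t)

# Dummy logits with shape (batch * seq_len, vocab_size); vocab size is illustrative.
student = torch.randn(16, 32000)
teacher = torch.randn(16, 32000)
loss = distillation_loss(student, teacher)
```

Finally, a rough sketch of what "quantization aware" means for the per-channel int4 format: during training, weights are rounded to 4-bit levels (one scale per output channel) and immediately dequantized, so the model learns to tolerate the rounding error. This is not the exact QAT recipe used for Gemma 3; a real setup would also route gradients through the rounding step with a straight-through estimator.

```python
import torch

def fake_quant_int4_per_channel(w: torch.Tensor) -> torch.Tensor:
    """QAT-style 'fake quantization' for a weight matrix: round each output
    channel (row) to signed 4-bit levels using its own scale, then dequantize,
    so the forward pass sees the rounding error the model must learn to absorb."""
    qmax = 7                                             # signed int4 range is [-8, 7]
    scale = w.abs().amax(dim=1, keepdim=True) / qmax     # one scale per channel (row)
    scale = torch.clamp(scale, min=1e-8)                 # avoid division by zero
    q = torch.clamp(torch.round(w / scale), -8, qmax)
    return q * scale                                     # values back on a float grid

w = torch.randn(4, 8)
w_q = fake_quant_int4_per_channel(w)   # same shape, values on a 4-bit grid per row
```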

Instruction Tuning and Evaluation

Post-training efforts focused on enhancing mathematics, reasoning, and chat abilities, along with integrating long-context and image inputs. A novel post-training approach yields gains across various capabilities, making the instruction-tuned models powerful and versatile.

Evaluations on the LMSys Chatbot Arena place Gemma 3 27B IT among the top-ranked models, ahead of several much larger systems. Standard benchmark results show clear improvements over previous Gemma versions and competitive performance with Gemini 1.5 models.

Safety and Responsibility

Responsibility, safety, and security remain paramount in Gemma's development. Enhanced safety processes are integrated throughout the development workflow, with a focus on mitigating risks and conducting robust evaluations for the new image-to-text capabilities.

Conclusion

Gemma 3 represents a significant step forward in open language models, offering enhanced capabilities and performance while maintaining compatibility with standard hardware. As these models continue to evolve, questions arise about the implications of increasingly capable open models and the importance of responsible development practices. How will these advancements shape the future of AI and its accessibility to a wider audience?

