Google: Gemini 3 Flash Preview
Gemini 3 Flash Preview is a multimodal Google model that accepts text, images, files, audio, and video as input and returns text. Its context window is 1,048,576 tokens, enough for long documents, extended conversations, or large media files. The model supports tool use and reasoning, which suits agentic and multi-step workflows. The spec list marks structured output as supported, but developers who depend on strict JSON schema enforcement should still test it against their own schemas before committing. At $0.50 per million input tokens and $3.00 per million output tokens, it sits in the budget-to-mid price range for multimodal models. Its blended benchmark score of 44.7 comes from a single benchmark, which is too narrow a basis for firm conclusions about overall capability, so treat the figure as provisional. Teams that need broad input modality coverage at a relatively low input cost are the natural audience, and given the thin benchmark coverage, real-world testing should carry more weight than the score.
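To make the pricing concrete, here is a minimal sketch that turns the two listed rates into a per-request cost estimate. The function name `estimate_cost_usd` is hypothetical. The sketch assumes every input token is billed at the same rate regardless of modality and that any reasoning tokens are counted as output, neither of which this page confirms.

```python
# Rates quoted in the description above, in USD per million tokens.
INPUT_PRICE_PER_MTOK = 0.50
OUTPUT_PRICE_PER_MTOK = 3.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at the listed rates.

    Assumption: a flat rate per token for all input modalities, and
    reasoning tokens (if any) already included in output_tokens.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (input_tokens * INPUT_PRICE_PER_MTOK
             + output_tokens * OUTPUT_PRICE_PER_MTOK)
    return total / 1_000_000


# Example: a 200,000-token document summarized into 2,000 tokens.
cost = estimate_cost_usd(200_000, 2_000)
print(f"${cost:.4f}")  # prints "$0.1060"
```

Because output tokens cost six times as much as input tokens at these rates, workloads that generate long responses will see output dominate the bill even when prompts are large.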
- Model ID: google/gemini-3-flash-preview
- Vendor: Google
- Tokenizer: Gemini
- Input Modalities: text, image, file, audio, video
- Output Modalities: text
- Max Output: 65,535 tokens
- Tool Calling: ✓ supported
- Structured Output: ✓ supported
- Reasoning Mode: ✓ supported
- Vision: ✓ accepts images
- Audio: ✓ accepts audio
- Moderated: no
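The context window and max output figures can be combined into a simple pre-flight check before sending a request. The sketch below is illustrative only: `fits_limits` is a hypothetical helper, and whether output tokens count against the 1,048,576-token context window is an assumption exposed as a parameter, since this page does not say.

```python
CONTEXT_WINDOW = 1_048_576  # tokens, from the description
MAX_OUTPUT = 65_535         # tokens, from the spec list


def fits_limits(prompt_tokens: int,
                requested_output_tokens: int,
                output_counts_toward_context: bool = True) -> bool:
    """Return True if a request stays within the listed limits.

    Assumption: by default, prompt and output share the context window.
    Set output_counts_toward_context=False if the provider budgets them
    separately.
    """
    if prompt_tokens < 0 or requested_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if requested_output_tokens > MAX_OUTPUT:
        return False
    used = prompt_tokens
    if output_counts_toward_context:
        used += requested_output_tokens
    return used <= CONTEXT_WINDOW


# A 1,000,000-token prompt leaves room for 40,000 output tokens,
# but not for the full 65,535 if both share the window.
print(fits_limits(1_000_000, 40_000))  # True
print(fits_limits(1_000_000, 65_535))  # False
```

Token counts should come from the provider's own tokenizer; character-based estimates can be far off, especially for image, audio, and video inputs.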
Category rankings
Where Google: Gemini 3 Flash Preview places across the 6 categories it ranks in.
| Rank | Category | Group | Models ranked | Score |
|---|---|---|---|---|
| #5 | Transcription | Voice | 19 | 123 |
| #6 | Audio Summarization | Voice | 19 | 145 |
| #10 | TTS Replacement | Voice | 19 | 115 |
| #14 | Video Summarization | Video | 25 | 145 |
| #19 | Code Completion | Code | 25 | 132 |
| #19 | Image Captioning | Vision | 25 | 120 |