Google: Gemini 3 Flash Preview
Gemini 3 Flash Preview is Google's multimodal model. It accepts text, image, file, audio, and video inputs and produces text output. Its context window reaches 1,048,576 tokens, enough for long documents, extended transcripts, and multi-turn sessions without truncation. The model supports tool use and reasoning, which makes it applicable to agentic workflows and multi-step problem solving. Structured output is listed as supported in the specifications below, though developers who depend on strict JSON schemas should still test their own schemas before committing.

At $0.50 per million input tokens and $3.00 per million output tokens, it sits in the budget-to-mid range on input cost, with an output price six times the input rate. Its blended benchmark score of 44.7 comes from a single benchmark, so the performance picture is limited and should be treated with caution. Teams processing high-volume, mixed-media content on a tight budget have reason to shortlist it, but those who prioritize well-validated accuracy should wait for broader benchmark coverage before relying on it for critical applications.
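The listed prices translate into a per-request cost with simple arithmetic. The following is a minimal Python sketch, assuming the two prices apply uniformly to every token; this page does not mention tiering, caching discounts, or per-modality rates, so none are modeled. The function name `estimate_cost` is illustrative and not part of any API.

```python
# Prices as listed on this page, in US dollars per million tokens.
INPUT_PRICE_PER_MILLION = 0.50
OUTPUT_PRICE_PER_MILLION = 3.00


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in dollars for one request.

    Assumes flat per-token pricing (an assumption, not confirmed here).
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (input_tokens * INPUT_PRICE_PER_MILLION
             + output_tokens * OUTPUT_PRICE_PER_MILLION)
    return total / 1_000_000


# Example: a 200,000-token transcript summarized into 4,000 tokens.
print(f"${estimate_cost(200_000, 4_000):.3f}")  # $0.112
```

Because output tokens cost six times as much as input tokens, workloads that read a lot and write little (summarization, transcription review) benefit most from this pricing.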
- Model ID: google/gemini-3-flash-preview
- Vendor: Google
- Tokenizer: Gemini
- Input Modalities: text, image, file, audio, video
- Output Modalities: text
- Max Output: 65,535 tokens
- Tool Calling: ✓ supported
- Structured Output: ✓ supported
- Reasoning Mode: ✓ supported
- Vision: ✓ accepts images
- Audio: ✓ accepts audio
- Moderated: no
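The token limits on this page can be checked before a request is sent. The sketch below is a hypothetical pre-flight check in Python. It assumes that input and output tokens share the 1,048,576-token context window, which this page does not state, and the function name `check_request` is illustrative.

```python
from typing import List

# Limits as listed on this page.
CONTEXT_WINDOW_TOKENS = 1_048_576
MAX_OUTPUT_TOKENS = 65_535


def check_request(input_tokens: int, requested_output_tokens: int) -> List[str]:
    """Return a list of problems; an empty list means the request fits."""
    problems = []
    if requested_output_tokens > MAX_OUTPUT_TOKENS:
        problems.append(
            f"requested output {requested_output_tokens} exceeds "
            f"max output {MAX_OUTPUT_TOKENS}"
        )
    # Assumption: output tokens count against the same context window.
    total = input_tokens + requested_output_tokens
    if total > CONTEXT_WINDOW_TOKENS:
        problems.append(
            f"input plus output {total} exceeds "
            f"context window {CONTEXT_WINDOW_TOKENS}"
        )
    return problems


# A 1,040,000-token input leaves too little room for 10,000 output tokens.
for problem in check_request(1_040_000, 10_000):
    print(problem)
```

Token counts depend on the tokenizer, so the inputs to a check like this should come from the provider's own token counter rather than a character-based estimate.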
Category rankings
Where Google: Gemini 3 Flash Preview places across the 6 categories it ranks in.
| Rank | Category | Group | Models ranked | Score |
|---|---|---|---|---|
| #5 | Transcription | Voice | 19 | 123 |
| #6 | Audio Summarization | Voice | 19 | 145 |
| #10 | TTS Replacement | Voice | 19 | 115 |
| #14 | Video Summarization | Video | 25 | 145 |
| #18 | Code Completion | Code | 25 | 132 |
| #19 | Image Captioning | Vision | 25 | 120 |