Google: Gemini 3 Flash Preview

Gemini 3 Flash Preview is a multimodal model from Google that accepts text, image, file, audio, and video input and produces text output. Its context window is 1,048,576 tokens, enough for long documents, extended transcripts, and multi-turn sessions without truncation. The model supports tool use and reasoning, which suits agentic workflows and multi-step problem solving. The specifications below mark structured output as supported, but developers who depend on guaranteed JSON schemas should still verify this before committing. At $0.50 per million input tokens and $3.00 per million output tokens, it sits in the budget-to-mid range on input cost, while output costs six times as much per token as input. Its blended benchmark score of 44.7 comes from a single benchmark, so the performance picture is limited and should be treated with caution. Teams processing high-volume, mixed-media content on a cost-conscious budget have reason to shortlist it, but those prioritizing well-validated accuracy should wait for broader benchmark coverage before relying on it for critical applications.

Quality Score
100/100
price + capability + benchmarks
Input Price
$0.50
per 1M tokens
Output Price
$3.00
per 1M tokens
Context Window
1,048,576
tokens
Model ID
google/gemini-3-flash-preview
Vendor
google
Tokenizer
Gemini
Input Modalities
text, image, file, audio, video
Output Modalities
text
Max Output
65,535 tokens
Tool Calling
✓ supported
Structured Output
✓ supported
Reasoning Mode
✓ supported
Vision
✓ accepts images
Audio
✓ accepts audio
Moderated
no
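
The per-token prices and limits listed above can be turned into a rough per-request cost estimate. The following is a minimal sketch, not an official calculator: the function name `estimate_cost` and the constant names are hypothetical, the figures are copied from this listing, and the two limits are checked independently because the listing does not say whether output tokens count against the context window.

```python
from decimal import Decimal

# Figures copied from the listing above; constant names are illustrative.
INPUT_PRICE_PER_M = Decimal("0.50")   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = Decimal("3.00")  # USD per 1M output tokens
CONTEXT_WINDOW = 1_048_576            # tokens
MAX_OUTPUT = 65_535                   # tokens


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Return an estimated USD cost for one request (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Assumption: input is checked against the context window on its own.
    if input_tokens > CONTEXT_WINDOW:
        raise ValueError("input exceeds the listed context window")
    if output_tokens > MAX_OUTPUT:
        raise ValueError("output exceeds the listed max output")
    million = Decimal(1_000_000)
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / million


if __name__ == "__main__":
    # A 200,000-token document summarized into 4,000 tokens.
    print(estimate_cost(200_000, 4_000))
```

Using `Decimal` rather than floats keeps the arithmetic exact for currency. Actual billing may differ, for example if reasoning tokens or non-text inputs are metered separately.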

Category rankings

Where Google: Gemini 3 Flash Preview places across the 6 categories it ranks in.

#     Category              Group    Ranked   Score
#5    Transcription         Voice    of 19    123
#6    Audio Summarization   Voice    of 19    145
#10   TTS Replacement       Voice    of 19    115
#14   Video Summarization   Video    of 25    145
#18   Code Completion       Code     of 25    132
#19   Image Captioning      Vision   of 25    120
