Google: Nano Banana 2 (Gemini 3.1 Flash Image)

Nano Banana 2 is a multimodal model from Google that accepts text and image inputs and returns up to 32,768 tokens of output within a 131,072-token context window. It supports reasoning but not tool use; the specifications below list structured output as supported. The lack of tool calling makes it a narrower fit than general-purpose alternatives for agentic or pipeline-heavy workflows. At $0.50 per million input tokens and $3.00 per million output tokens, its pricing is competitive for image-capable models. However, there is currently no independent benchmark coverage, so no external data validates its reasoning quality or accuracy against peers. Buyers comfortable piloting an unproven model, particularly those with image-plus-text workloads and moderate output volume, may find the input price attractive; teams that need benchmark evidence before committing should wait for independent evaluations.
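To make the pricing concrete, the following is a minimal sketch of a cost estimate based on the listed rates. The function name `estimate_cost_usd` is hypothetical, and the sketch assumes all usage is billed purely at the per-token prices shown here; any separate billing for image input or output is not covered by this listing and is not modeled.

```python
# Listed rates in USD per 1M tokens (taken from this page).
INPUT_PRICE_PER_M = 0.50
OUTPUT_PRICE_PER_M = 3.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate cost in USD, assuming simple per-token billing (an assumption)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # 2M input tokens and 500k output tokens: 1.00 + 1.50 USD
    print(estimate_cost_usd(2_000_000, 500_000))
```

Because output tokens cost six times as much as input tokens, workloads with long generated outputs will see the output side dominate the bill.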

Quality Score: 84/100 (price + capability + benchmarks)
Input Price: $0.50 per 1M tokens
Output Price: $3.00 per 1M tokens
Context Window: 131,072 tokens
Model ID: google/gemini-3.1-flash-image
Vendor: google
Tokenizer: Gemini
Input Modalities: image, text
Output Modalities: image, text
Max Output: 32,768 tokens
Tool Calling: not supported
Structured Output: supported
Reasoning Mode: supported
Vision: accepts images
Audio: no
Moderated: no
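As a rough illustration of how the listed limits interact, the sketch below checks whether a request fits within them. The helper `fits_limits` is hypothetical, and it assumes that prompt tokens and requested output tokens share the 131,072-token context window; the listing does not state how the window is accounted for, so treat that as an assumption.

```python
# Limits taken from the specifications listed above.
CONTEXT_WINDOW = 131_072
MAX_OUTPUT = 32_768


def fits_limits(prompt_tokens: int, requested_output_tokens: int) -> bool:
    """Return True if the request respects both listed limits.

    Assumes prompt and output tokens count against the same context
    window (an assumption, not confirmed by the listing).
    """
    if prompt_tokens < 0 or requested_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if requested_output_tokens > MAX_OUTPUT:
        return False
    return prompt_tokens + requested_output_tokens <= CONTEXT_WINDOW


if __name__ == "__main__":
    print(fits_limits(98_304, 32_768))   # exactly fills the window
    print(fits_limits(98_305, 32_768))   # one token over
```

Under this assumption, a request that asks for the full 32,768 output tokens leaves 98,304 tokens for the prompt.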

Category rankings

Where Google: Nano Banana 2 (Gemini 3.1 Flash Image) places in the one category it ranks in.

Rank: #5 of 8 ranked
Category: Image Generation (Vision)
Score: 99

Similar models