google

Google: Gemma 3 4B

Gemma 3 4B is a Google model that accepts text and image input and produces text, with a 131,072-token context window and a 16,384-token output ceiling. It does not support tool calling or reasoning modes, so workflows that depend on function calling will need a different option; structured output is listed as supported. At $0.05 per million input tokens and $0.10 per million output tokens, it sits at the budget end of the market. Its blended benchmark score of 20.3 rests on only two benchmarks, which is too thin to treat as a reliable signal of general capability. Buyers who need a low-cost multimodal model for straightforward text and image tasks, and who can tolerate limited third-party validation, may find it worth testing. Teams with stricter performance requirements or tool-dependent pipelines should compare it against better-documented alternatives before committing.
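
The per-token pricing above translates into a per-request cost with simple arithmetic. The following is a minimal sketch, using only the two prices quoted on this page; the function name `estimate_cost` and the example token counts are illustrative and not part of any vendor API.

```python
# Prices quoted on this page, in USD per 1M tokens.
INPUT_PRICE_PER_M = 0.05
OUTPUT_PRICE_PER_M = 0.10


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Example: a 100,000-token prompt with a 2,000-token reply.
print(f"${estimate_cost(100_000, 2_000):.4f}")  # prints "$0.0052"
```

Actual billing may differ if a provider counts image input or applies its own rates, so treat the result as a rough estimate.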

Quality Score: 81/100 (price + capability + benchmarks)
Input Price: $0.05 per 1M tokens
Output Price: $0.10 per 1M tokens
Context Window: 131,072 tokens

Model ID: google/gemma-3-4b-it
Vendor: google
Tokenizer: Gemini
Input Modalities: text, image
Output Modalities: text
Max Output: 16,384 tokens
Tool Calling: not supported
Structured Output: ✓ supported
Reasoning Mode: not supported
Vision: ✓ accepts images
Audio: no
Moderated: no
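
The context-window and max-output limits listed above can be checked before a request is sent. This is a rough sketch, assuming the prompt and the generated output share the 131,072-token window (the page does not state this explicitly); `fits_limits` is a hypothetical helper, and token counts must come from whatever tokenizer the provider uses.

```python
CONTEXT_WINDOW = 131_072  # tokens, from the spec list above
MAX_OUTPUT = 16_384       # tokens, from the spec list above


def fits_limits(prompt_tokens: int, requested_output_tokens: int) -> bool:
    """Return True if a request stays within both listed limits.

    Assumption: prompt and output tokens count against the same window.
    """
    if prompt_tokens < 0 or requested_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if requested_output_tokens > MAX_OUTPUT:
        return False
    return prompt_tokens + requested_output_tokens <= CONTEXT_WINDOW


print(fits_limits(100_000, 16_384))  # within both limits
print(fits_limits(120_000, 16_384))  # prompt plus output exceeds the window
```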

Similar models

Google: Nano Banana Pro (Gemini 3 Pro Image)

Google: Nano Banana 2 (Gemini 3.1 Flash Image)

Google: Nano Banana 2 (Gemini 3.1 Flash Image Preview)

Google: Lyria 3 Pro Preview

Google: Lyria 3 Clip Preview

Google: Gemma 3 12B