
Google: Gemini 2.5 Pro

Gemini 2.5 Pro is Google's multimodal model: it accepts text, images, files, audio, and video as input and produces text output. Its context window reaches 1,048,576 tokens, making it practical for tasks that require processing large documents or long conversation histories in a single pass, and a single response can run up to 65,536 tokens. The model supports tool calling, structured output, and reasoning, which extends its usefulness to agentic workflows and multi-step problem solving. At $1.25 per million input tokens and $10.00 per million output tokens, it sits at a mid-range price for frontier models. Because output tokens cost eight times as much as input tokens, output volume drives spend on high-volume generation tasks. Its blended benchmark score of 94.2 is based on a single independent benchmark, so treat that figure as directional rather than comprehensive. Teams that need broad modality coverage, a very large context window, and reasoning support will find it worth evaluating, particularly if those capabilities offset the output cost for their use case.
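To make the input/output price gap concrete, here is a minimal Python sketch that estimates per-request cost from token counts. It assumes the flat listed rates apply at every prompt length and that you already have token counts from the Gemini tokenizer; the function name estimate_cost_usd is illustrative and not part of any SDK.

```python
# Listed rates for google/gemini-2.5-pro, in USD per 1M tokens.
INPUT_PRICE_PER_M = 1.25
OUTPUT_PRICE_PER_M = 10.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at the flat listed rates.

    Assumes the same per-token rate applies regardless of prompt length;
    check the provider's current pricing before relying on the result.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_M
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M
    return input_cost + output_cost


if __name__ == "__main__":
    # Long-document summary: large prompt, short answer.
    print(f"${estimate_cost_usd(200_000, 2_000):.4f}")
    # Generation-heavy job: short prompt, long answer.
    print(f"${estimate_cost_usd(2_000, 50_000):.4f}")
```

At these rates the first call comes to about $0.27 and the second to about $0.50: a prompt 100 times smaller still costs nearly twice as much, because the response is long.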

Quality Score: 100/100 (price + capability + benchmarks)
Input Price: $1.25 per 1M tokens
Output Price: $10.00 per 1M tokens
Context Window: 1,048,576 tokens
Model ID: google/gemini-2.5-pro
Vendor: google
Tokenizer: Gemini
Input Modalities: text, image, file, audio, video
Output Modalities: text
Max Output: 65,536 tokens
Tool Calling: ✓ supported
Structured Output: ✓ supported
Reasoning Mode: ✓ supported
Vision: ✓ accepts images
Audio: ✓ accepts audio
Moderated: no
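Because the context window and the per-response cap are separate limits, it can help to check both before sending a long prompt. The sketch below assumes that prompt and completion tokens share the 1,048,576-token window and that the 65,536-token Max Output cap applies on top of that; confirm both against the provider's documentation. Token counts should come from the Gemini tokenizer, since counts from another tokenizer will differ. The helper names max_completion_tokens and fits are hypothetical.

```python
# Limits from the listing for google/gemini-2.5-pro.
CONTEXT_WINDOW = 1_048_576  # assumed to be shared by prompt + completion
MAX_OUTPUT = 65_536         # per-response completion cap


def max_completion_tokens(prompt_tokens: int) -> int:
    """Return the largest completion that fits alongside a prompt of this size.

    Assumes prompt and completion share CONTEXT_WINDOW; returns 0 when the
    prompt alone fills or exceeds the window.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    remaining = CONTEXT_WINDOW - prompt_tokens
    return max(0, min(MAX_OUTPUT, remaining))


def fits(prompt_tokens: int, requested_output: int) -> bool:
    """True if a request for requested_output (> 0) tokens fits both limits."""
    return 0 < requested_output <= max_completion_tokens(prompt_tokens)


if __name__ == "__main__":
    for prompt in (100_000, 1_000_000, 1_100_000):
        print(prompt, max_completion_tokens(prompt))
```

Under these assumptions, a 100,000-token prompt leaves the full 65,536-token cap available, a 1,000,000-token prompt leaves only 48,576, and a 1,100,000-token prompt does not fit at all.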


Category rankings

Where Google: Gemini 2.5 Pro places across the 4 categories it ranks in.

Rank        Category              Group   Score
#5 of 19    Audio Summarization   Voice   147
#11 of 25   Video Summarization   Video   147
#14 of 19   Transcription         Voice   115
#14 of 19   TTS Replacement       Voice   115
