Google: Nano Banana 2 (Gemini 3.1 Flash Image)

Nano Banana 2 (Gemini 3.1 Flash Image) is a multimodal model from Google that accepts text and image inputs and returns text and image outputs. It offers a 131,072-token context window and up to 65,536 completion tokens, and it supports reasoning and structured output. Tool calling is not supported, which narrows its fit for agentic pipelines. At $0.50 per million input tokens and $3.00 per million output tokens, it sits in the budget-to-mid tier for input costs, but its output rate is six times the input rate, which matters for verbose workloads. There is currently no independent benchmark coverage, so performance relative to competitors is unverified. Teams that need image-plus-text reasoning at moderate cost and can work without tool calling may want to shortlist it; those who rely on benchmark data to justify model selection should treat it as unproven until third-party evaluations are available.
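To make the input/output price gap concrete, here is a minimal cost-estimate sketch in Python. It assumes every token is billed at the flat per-token rates quoted in this listing (image inputs or outputs may be metered differently in practice), and the function name `estimate_cost` and the example token counts are hypothetical, chosen only for illustration.

```python
# Rates and limits copied from this listing; the flat-rate billing model
# is an assumption, not a confirmed description of how the vendor bills.
INPUT_PRICE_PER_M = 0.50    # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 3.00   # USD per 1M output tokens
CONTEXT_WINDOW = 131_072    # tokens
MAX_OUTPUT = 65_536         # tokens


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return an estimated request cost in USD (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if input_tokens > CONTEXT_WINDOW:
        raise ValueError("input exceeds the listed context window")
    if output_tokens > MAX_OUTPUT:
        raise ValueError("output exceeds the listed max output")
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_M
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M
    return input_cost + output_cost


# Example: a 10,000-token prompt that produces a 2,000-token reply.
print(f"${estimate_cost(10_000, 2_000):.4f}")  # prints $0.0110
```

In this example the 2,000 output tokens cost more ($0.006) than the 10,000 input tokens ($0.005), which is why the output rate dominates for verbose workloads.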

Quality Score: 84/100 (price + capability + benchmarks)
Input Price: $0.50 per 1M tokens
Output Price: $3.00 per 1M tokens
Context Window: 131,072 tokens
Model ID: google/gemini-3.1-flash-image
Vendor: google
Tokenizer: Gemini
Input Modalities: image, text
Output Modalities: image, text
Max Output: 65,536 tokens
Tool Calling: not supported
Structured Output: supported
Reasoning Mode: supported
Vision: accepts images
Audio: no
Moderated: no

Category rankings

Where Google: Nano Banana 2 (Gemini 3.1 Flash Image) places in the one category it ranks in.

Rank · Category · Score
#4 of 8 ranked · Image Generation (Vision) · 99

Similar models