Google: Nano Banana 2 (Gemini 3.1 Flash Image)
Nano Banana 2 (Gemini 3.1 Flash Image) is a multimodal model from Google that accepts text and image inputs and returns text and image outputs. It offers a 131,072-token context window and up to 65,536 completion tokens, and it supports reasoning. Tool calling is not supported, which narrows its fit for agentic pipelines; structured output is listed as supported in the specifications below. At $0.50 per million input tokens and $3.00 per million output tokens, its input rate sits in the budget-to-mid tier, but its output rate is six times higher, which matters for verbose workloads. There is currently no independent benchmark coverage, so performance relative to competitors is unverified. Teams that need image-plus-text reasoning at moderate cost and can work without tool calling may want to shortlist it, but those who rely on benchmark data to justify model selection should treat it as unproven until third-party evaluations are available.
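To make the pricing concrete, here is a minimal sketch of the per-request arithmetic at the two token rates quoted above. The function name and the example token counts are hypothetical. The estimate covers token charges only, since the description does not mention any other billing component.

```python
from decimal import Decimal

# Rates quoted in the description above, in USD per one million tokens.
INPUT_USD_PER_MILLION = Decimal("0.50")
OUTPUT_USD_PER_MILLION = Decimal("3.00")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD token cost of one request at the listed rates.

    Hypothetical helper for illustration; it is not part of any vendor SDK.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = Decimal(input_tokens) * INPUT_USD_PER_MILLION / TOKENS_PER_MILLION
    output_cost = Decimal(output_tokens) * OUTPUT_USD_PER_MILLION / TOKENS_PER_MILLION
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical workload: 10,000 input tokens and 2,000 output tokens.
    print(estimate_cost_usd(10_000, 2_000))
```

Because output tokens cost six times as much as input tokens, the 2,000 output tokens in this example account for more of the total than the 10,000 input tokens do.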
- Model ID: google/gemini-3.1-flash-image
- Vendor: Google
- Tokenizer: Gemini
- Input Modalities: image, text
- Output Modalities: image, text
- Max Output: 65,536 tokens
- Tool Calling: not supported
- Structured Output: ✓ supported
- Reasoning Mode: ✓ supported
- Vision: ✓ accepts images
- Audio: no
- Moderated: no
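The limits listed above can be encoded as a small pre-flight check that flags a request this model cannot serve. This is a sketch under stated assumptions: `ModelSpec`, `NANO_BANANA_2`, and `check_request` are hypothetical names, not part of any provider SDK, and only the values come from the listing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    """Hypothetical container for the capability fields listed above."""
    model_id: str
    max_output_tokens: int
    input_modalities: frozenset
    supports_tools: bool


# Values copied from the specification list; the structure is an assumption.
NANO_BANANA_2 = ModelSpec(
    model_id="google/gemini-3.1-flash-image",
    max_output_tokens=65_536,
    input_modalities=frozenset({"image", "text"}),
    supports_tools=False,
)


def check_request(spec, modalities, max_tokens, uses_tools=False):
    """Return a list of reasons the request does not fit the spec (empty if it fits)."""
    problems = []
    unsupported = set(modalities) - spec.input_modalities
    if unsupported:
        problems.append(f"unsupported input modalities: {sorted(unsupported)}")
    if max_tokens > spec.max_output_tokens:
        problems.append(
            f"max_tokens {max_tokens} exceeds limit {spec.max_output_tokens}"
        )
    if uses_tools and not spec.supports_tools:
        problems.append("tool calling is not supported")
    return problems


if __name__ == "__main__":
    # A text-plus-image request within the output limit passes with no problems.
    print(check_request(NANO_BANANA_2, {"text", "image"}, max_tokens=4_096))
    # An audio request that also uses tools is flagged on both counts.
    print(check_request(NANO_BANANA_2, {"audio"}, max_tokens=4_096, uses_tools=True))
```

Returning a list of problems rather than raising on the first one lets a caller report every mismatch at once, which is useful when comparing several candidate models.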
Category rankings
Where Google: Nano Banana 2 (Gemini 3.1 Flash Image) places in the one category where it is ranked.
| Rank | Category | Score |
|---|---|---|
| #4 of 8 | Image Generation (Vision) | 99 |