Powered by Google
Nano Banana 2 (Gemini 3.1 Flash Image Preview)
- Text Generation
Nano Banana 2 (Gemini 3.1 Flash Image Preview) is Google DeepMind’s image generation and editing model built on the Gemini 3.1 Flash architecture, optimized for fast, cost‑efficient, high‑quality visuals. It balances strong multimodal understanding with 4K-capable output and low latency for both text-to-image and image-edit tasks.
About the model
What is Nano Banana 2 (Gemini 3.1 Flash Image Preview)?
Nano Banana 2 (Gemini 3.1 Flash Image Preview) is a Google DeepMind model for high-speed, high-quality image understanding, generation, and editing built on the Gemini 3.1 Flash family. It is mainly used for text-to-image generation, enabling rapid creation of detailed images with controllable aspect ratios and resolutions up to 4K. It is also used for image editing workflows, where users supply reference or input images for transformations, variations, and iterative refinements within apps and APIs. As part of the Gemini 3.1 model line, it succeeds earlier Gemini image capabilities and sits alongside the higher-end Nano Banana Pro image models.
Model capabilities
5 Core Capabilities
-
Conversational Chat
Supports interactive, multi-turn dialogue, following instructions and maintaining context for tasks like Q&A, drafting, and brainstorming.
-
Image Understanding
Interprets images to identify objects, scenes, and visual details, enabling visual question answering and description tasks.
-
Optical Character Recognition
Reads and extracts text from images, including screenshots and documents, enabling search, analysis, and transformation of visual text content.
-
Code and Tools
Helps write and reason about code, and orchestrate external tools or APIs by interpreting structured instructions and outputs.
-
Language Translation
Translates between multiple natural languages while preserving meaning and tone, supporting cross-lingual understanding and communication tasks.
Use cases
6 Most Valuable Use Cases
- Marketing Visual Creation
- Product Mockup Design
- UI Layout Ideation
- Grounded Image Generation
- Image Editing & Variants
- High-Volume Asset Production
Transparent pricing
Cost Comparison
LLM API offers the lowest effective cost and latency for Nano Banana 2–class vision models.
| Provider | Region | Latency | Throughput | Uptime | Input ($/1M) | Output ($/1M) | Context |
|---|---|---|---|---|---|---|---|
| LLM API BEST | Global | 80ms | 120 img/min | 99.99% | $0.30/1K img | $0.00 | 16 images / 128K tokens equivalent |
| Global | ~200ms | ~60 img/min | 99.9% | ~$0.60/1K img | $0.00 | ~16 images / 128K tokens equivalent | |
| Vertex AI | US East | ~220ms | ~48 img/min | 99.9% | ~$0.65/1K img | $0.00 | ~16 images / 128K tokens equivalent |
| AWS Bedrock (equivalent vision model) | US East | ~240ms | ~45 img/min | 99.9% | ~$0.80/1K img | $0.00 | ~8 images / 128K tokens equivalent |
| Azure OpenAI (equivalent vision model) | Global | ~230ms | ~50 img/min | 99.9% | ~$0.75/1K img | $0.00 | ~8 images / 128K tokens equivalent |
Performance benchmarks
Technical Specifications
| Metric | Nano Banana 2 (Gemini 3.1 Flash Image Preview) | GPT-4o mini (Image Preview) | Claude 3.5 Haiku (Vision) |
|---|---|---|---|
| Latency per Image | ~250ms | ~220ms | ~260ms |
| Context Window | 128K | 128K | 200K |
| Max Resolution | 2K | 2K | 2K |
| Price per Image | $0.002 | $0.002 | $0.003 |
| Supported Formats | PNG, JPG, WEBP | PNG, JPG, WEBP | PNG, JPG, WEBP |
| Throughput | 80 img/s | 90 img/s | 70 img/s |
| Uptime | 99.9% | 99.9% | 99.9% |
30-day usage via LLM API
- 3.8B
- Prompt tokens processed (last 30 days)
- 240M
- Completion tokens generated (last 30 days)
- 9.5M
- API requests served (last 30 days)
- 99.8%
- Avg uptime (last 30 days)
Architecture & Integration
Why Build on LLM.API?
One unified API. Every major model. Built-in reliability, cost control, and observability.
-
Intelligent AI Routing
Automatically route each request to the optimal model and provider based on latency, cost, and quality — without changing your integration or redeploying.
One endpoint, every model -
Cost-Aware Orchestration
Dynamically choose cheaper equivalents, downgrade for non-critical paths, and enforce budgets with per-route policies so you never get surprised by your AI bill.
Lower spend, same output -
Resilient Fallback Flows
Define automatic cross-provider fallbacks and retries so traffic fails over seamlessly during outages, rate limits, or model errors — no manual playbooks required.
Zero-downtime AI calls -
Full-Stack Observability
Get unified traces, logs, metrics, and payload sampling across all providers to debug latency, failures, and regressions from a single, model-agnostic view.
See every token, everywhere -
Task-Level Abstractions
Describe tasks like chat, embedding, or tool-calling once and let LLM.API handle provider-specific quirks, parameters, and response formats for you.
Code to tasks, not vendors -
High-Throughput Batching
Batch thousands of requests per task across providers with built-in retries, rate-limit smoothing, and streaming results to maximize throughput and minimize cost.
Scale to millions of calls
Decision guide
When to Use — When NOT to Use
Use it if...
- You need fast, low-cost image understanding for previews, tagging, and lightweight classification.
- You need to quickly analyze UI mockups or app screenshots for basic structure.
- You need visual content checks, like detecting obvious unsafe or off-brand imagery.
- Your use case involves generating short descriptions or captions based on images.
- Your use case involves prototyping visual features before upgrading to heavier multimodal models.
- You need to enrich images with simple metadata, labels, or alt-text at scale.
Avoid if...
- You need state-of-the-art reasoning over complex diagrams, technical schematics, or dense charts.
- Your workload requires very long multimodal context, like many related images plus documents.
- You need highly accurate OCR on small text, dense tables, or multilingual documents.
- Your workload requires creative image generation rather than understanding or preview analysis.
- You need robust domain-specific visual reasoning, such as medical, legal, or industrial inspection.
- Your workload requires consistent, production-grade decisions on safety-critical or regulated visual content.
FAQ
Frequently Asked Questions
-
What is Nano Banana 2 (Gemini 3.1 Flash Image Preview)?
Nano Banana 2 (Gemini 3.1 Flash Image Preview) is a Google multimodal model optimized for fast, low-cost text and image understanding via LLM.API.
-
What is Nano Banana 2 (Gemini 3.1 Flash Image Preview) best suited for?
It is best for latency-sensitive applications needing quick image interpretation, lightweight vision-language reasoning, and inexpensive high-volume text processing.
-
What context window does Nano Banana 2 (Gemini 3.1 Flash Image Preview) support via LLM.API?
Nano Banana 2 (Gemini 3.1 Flash Image Preview) supports a 32K token context window through LLM.API.
-
How fast is Nano Banana 2 (Gemini 3.1 Flash Image Preview) in terms of latency?
It is tuned for very low end-to-end latency, making it suitable for real-time or interactive user experiences.
-
What input and output modalities does Nano Banana 2 (Gemini 3.1 Flash Image Preview) support?
The model accepts text and image inputs and returns text outputs, enabling vision-language workflows.
-
How is Nano Banana 2 (Gemini 3.1 Flash Image Preview) priced on LLM.API?
Pricing is usage-based per token and image processed, with exact rates available in the LLM.API Google models pricing table.
-
How do I call Nano Banana 2 (Gemini 3.1 Flash Image Preview) through LLM.API?
Use the LLM.API chat or completions endpoint specifying the Nano Banana 2 model name, sending text and optional image inputs in the request body.
-
How does Nano Banana 2 (Gemini 3.1 Flash Image Preview) compare to larger Gemini models?
Compared to larger Gemini models, it trades some reasoning depth and creativity for significantly lower latency and cost.
-
What are the main limitations of Nano Banana 2 (Gemini 3.1 Flash Image Preview)?
It may struggle with highly complex reasoning, long multi-step problem solving, and domain-expert tasks compared to larger frontier models.
-
Can Nano Banana 2 (Gemini 3.1 Flash Image Preview) be used for structured tool-calling or function calling?
Yes, you can use LLM.API tool-calling interfaces, but the model is mainly optimized for lightweight text and image understanding.
