Powered by Google
Gemini 3.1 Pro Preview
- Text Generation
Gemini 3.1 Pro Preview is a preview large language model from Google’s Gemini family, offering advanced reasoning and multimodal capabilities for early experimentation and feedback. As a preview model, its behavior and performance may change as Google continues development before general availability.
About the model
What is Gemini 3.1 Pro Preview?
Gemini 3.1 Pro Preview is a Google Gemini family large language model released in preview as a step forward in core reasoning and complex problem-solving. It is designed for demanding use cases like autonomous software engineering, multi-step agentic workflows, and other advanced reasoning tasks across domains such as science, research, and engineering. It is also used as the high-end “Pro” model behind Google products like the Gemini app, Vertex AI, and Google AI Studio during its preview phase. As part of the Gemini 3.x Pro line, it succeeds Gemini 3 Pro and precedes the general-availability Gemini 3.1 Pro and later 3.5 Pro models within Google’s Gemini model family.
Model capabilities
5 Core Capabilities
-
Advanced Reasoning
Performs complex logical reasoning and problem solving, excelling on benchmarks like ARC-AGI-2 and SWE-Bench for difficult tasks.
-
Multimodal Input
Understands text, code, images, audio, video, and PDFs within a very long context window for rich cross-modal analysis.
-
Document Comprehension
Processes and synthesizes information from large documents and datasets, supporting enterprise knowledge tasks and technical analysis.
-
Coding Assistance
Supports code understanding and generation, autonomous software engineering tasks, and tool-assisted code execution workflows.
-
Multilingual Skills
Handles multiple languages for reading and generation, enabling cross-language understanding and globally-deployed conversational applications.
Use cases
6 Most Valuable Use Cases
- Multimodal Content Generation
- Code Assistance and Debugging
- Data and Document Analysis
- Customer Support Automation
- Search and Knowledge Retrieval
- Monitoring and Alerting Workflows
Transparent pricing
Cost Comparison
LLM API offers the lowest effective cost and latency for Gemini 3.1 Pro–class models.
| Provider | Region | Latency | Throughput | Uptime | Input ($/1M) | Output ($/1M) | Context |
|---|---|---|---|---|---|---|---|
| LLM API BEST | Global | 120ms | 80 tps | 99.99% | $0.30 | $0.60 | 128K |
| Global | ~220ms | ~40 tps | 99.9% | ~$0.50 | ~$1.50 | 128K | |
| Vertex AI (Google Cloud) | US East | ~260ms | ~35 tps | 99.9% | ~$0.55 | ~$1.60 | 128K |
| Fireworks AI | US West | ~200ms | ~50 tps | 99.9% | ~$0.45 | ~$1.40 | 64K |
Performance benchmarks
Technical Specifications
| Metric | Gemini 3.1 Pro Preview | GPT-4.1 | Claude 3.5 Sonnet |
|---|---|---|---|
| Avg Latency | ~250ms | ~300ms | ~280ms |
| Context Window | 128K | 128K | 200K |
| Input Price ($/1M) | $0.40 | $5.00 | $3.00 |
| Output Price ($/1M) | $1.20 | $15.00 | $15.00 |
| Max Output Tokens | 8K | 4K | 4K |
| Throughput | ~50 tps | ~40 tps | ~35 tps |
| Uptime | 99.9% | 99.9% | 99.9% |
30-day usage via LLM API
- 62B
- Prompt tokens processed (last 30 days)
- 410M
- Completion tokens generated (last 30 days)
- 7.8M
- API requests served (last 30 days)
- 99.8%
- Average API uptime
Architecture & Integration
Why Build on LLM.API?
One unified API. Every major model. Built-in reliability, cost control, and observability.
-
Unified AI Routing
Dynamically route each request to the optimal model across providers based on latency, cost, or quality—without changing your integration or redeploying.
One endpoint, every model -
Cost-Aware Orchestration
Control spend with per-route cost caps, dynamic model downgrades, and usage insights so you ship rich AI features without surprise bills.
Cut spend, keep quality -
Automatic Provider Fallback
If a provider throttles or fails, LLM.API seamlessly retries on backup models, keeping your AI workflows online without custom failover logic.
Resilience by default -
End-to-End Observability
Trace every request across models and providers with rich logs, metrics, and timelines to debug prompts, tune routing, and prove reliability in production.
See every token -
Task-Centric Abstractions
Call higher-level tasks—chat, RAG, tools, moderation—instead of raw models, letting LLM.API manage prompts, memory, and orchestration under a stable interface.
Code to tasks, not models -
High-Throughput Batch APIs
Process thousands of prompts in parallel with batch operations, reducing overhead, smoothing rate limits, and maximizing throughput for large-scale workloads.
Scale from day one
Decision guide
When to Use — When NOT to Use
Use it if...
- You need a general-purpose LLM from Google with strong text generation and understanding.
- You need good integration with other Google Cloud services and existing GCP infrastructure.
- Your use case involves multimodal inputs like combining text with images or screenshots.
- You need a model suitable for chatbots, assistants, and interactive web or mobile experiences.
- Your use case involves summarizing, classifying, or extracting information from medium-length documents.
- You need a widely supported, mainstream model with extensive community examples and tooling.
Avoid if...
- You need guaranteed mature, production-hardened behavior rather than a preview-stage model.
- Your workload requires strict, contractually defined SLAs and long-term backward compatibility guarantees.
- You need highly specialized domain performance better served by fine-tuned or niche expert models.
- Your workload requires running fully on-premise or outside Google Cloud’s managed environments.
- You need fully transparent, fine-grained control over training data, weights, and model internals.
- Your workload requires ultra-low-latency inference for high-frequency trading or hard real-time systems.
FAQ
Frequently Asked Questions
-
What is Gemini 3.1 Pro Preview?
Gemini 3.1 Pro Preview is a Google frontier language model optimized for high‑quality reasoning, coding, and general-purpose chat use cases.
-
What modalities does Gemini 3.1 Pro Preview support via LLM.API?
Through LLM.API, Gemini 3.1 Pro Preview currently supports text input and output, with image and other modalities exposed as the provider enables them.
-
How is Gemini 3.1 Pro Preview typically priced on LLM.API?
Gemini 3.1 Pro Preview is billed on a pay-as-you-go per-token basis, with separate input and output token rates defined by LLM.API.
-
What context window does Gemini 3.1 Pro Preview support?
Gemini 3.1 Pro Preview supports a large context window suitable for multi-thousand token prompts and long conversations, as configured by LLM.API.
-
How fast is Gemini 3.1 Pro Preview in terms of latency?
Typical latency is comparable to other large frontier models, with first-token times dependent on prompt size and current LLM.API and Google load.
-
What is Gemini 3.1 Pro Preview best suited for?
It excels at multi-step reasoning, complex code generation, data analysis, and high-quality natural language generation across many domains.
-
How do I call Gemini 3.1 Pro Preview through LLM.API?
You select the model name "google/gemini-3.1-pro-preview" (or similar identifier) in LLM.API and send standard chat or completion-style requests.
-
How does Gemini 3.1 Pro Preview compare to GPT-4.1 or Claude 3.5?
Gemini 3.1 Pro Preview targets similar advanced reasoning and coding capabilities, but performance, cost, and latency vary by task and provider configuration.
-
Does Gemini 3.1 Pro Preview support streaming responses on LLM.API?
Yes, when enabled in your LLM.API request, Gemini 3.1 Pro Preview can return tokens incrementally for lower perceived latency.
-
What are the main limitations of Gemini 3.1 Pro Preview?
It can hallucinate, may contain training-data biases, and should not be relied on for authoritative legal, medical, or safety-critical decisions.
