What modalities does Gemini 3.1 Pro Preview support via LLM.API?

Through LLM.API, Gemini 3.1 Pro Preview currently supports text input and output, with image and other modalities exposed as the provider enables them.

How is Gemini 3.1 Pro Preview typically priced on LLM.API?

Gemini 3.1 Pro Preview is billed on a pay-as-you-go per-token basis, with separate input and output token rates defined by LLM.API.

What context window does Gemini 3.1 Pro Preview support?

Gemini 3.1 Pro Preview supports a large context window suitable for multi-thousand token prompts and long conversations, as configured by LLM.API.

How fast is Gemini 3.1 Pro Preview in terms of latency?

Typical latency is comparable to other large frontier models, with first-token times dependent on prompt size and current LLM.API and Google load.

What is Gemini 3.1 Pro Preview best suited for?

It excels at multi-step reasoning, complex code generation, data analysis, and high-quality natural language generation across many domains.

How do I call Gemini 3.1 Pro Preview through LLM.API?

You select the model name "google/gemini-3.1-pro-preview" (or similar identifier) in LLM.API and send standard chat or completion-style requests.

How does Gemini 3.1 Pro Preview compare to GPT-4.1 or Claude 3.5?

Gemini 3.1 Pro Preview targets similar advanced reasoning and coding capabilities, but performance, cost, and latency vary by task and provider configuration.

Does Gemini 3.1 Pro Preview support streaming responses on LLM.API?

Yes, when enabled in your LLM.API request, Gemini 3.1 Pro Preview can return tokens incrementally for lower perceived latency.

What are the main limitations of Gemini 3.1 Pro Preview?

It can hallucinate, may contain training-data biases, and should not be relied on for authoritative legal, medical, or safety-critical decisions.

Gemini 3.1 Pro Preview

Text Generation

Gemini 3.1 Pro Preview is a preview large language model from Google’s Gemini family, offering advanced reasoning and multimodal capabilities for early experimentation and feedback. As a preview model, its behavior and performance may change as Google continues development before general availability.

Start Using API

API Performance

Latency: ~0.8s time to first token
Context: 1M token context
Input: ~$2.00 per 1M tokens
Output: ~$12.00 per 1M tokens
Uptime: 99% 99%

About the model

What is Gemini 3.1 Pro Preview?

Gemini 3.1 Pro Preview is an experimental version of Google’s Gemini 3.1 Pro large language model made available for limited testing. It is primarily used by developers and researchers to explore its capabilities in tasks such as code assistance, structured reasoning, and information retrieval. It is also used to prototype multimodal and agentic applications so Google can refine quality, safety, and performance. It belongs to the Gemini model family, following earlier Gemini 1.x and 2.x generations and the non-preview Gemini Pro variants.

Input / Output

Input

Text prompts and documents
Images (via multimodal input)
Audio (via multimodal input)
Video (via multimodal input)
Files such as PDFs (via file input)

Output

Natural language responses and chat-style output
Programming code output
Structured or analytical text suitable for charts/tables

Model capabilities

5 Core Capabilities

Advanced Reasoning

Performs complex logical reasoning and problem solving, excelling on benchmarks like ARC-AGI-2 and SWE-Bench for difficult tasks.
Multimodal Input

Understands text, code, images, audio, video, and PDFs within a very long context window for rich cross-modal analysis.
Document Comprehension

Processes and synthesizes information from large documents and datasets, supporting enterprise knowledge tasks and technical analysis.
Coding Assistance

Supports code understanding and generation, autonomous software engineering tasks, and tool-assisted code execution workflows.
Multilingual Skills

Handles multiple languages for reading and generation, enabling cross-language understanding and globally-deployed conversational applications.

Use cases

6 Most Valuable Use Cases

Multimodal Content Generation
Code Assistance and Debugging
Data and Document Analysis
Customer Support Automation
Search and Knowledge Retrieval
Monitoring and Alerting Workflows

Transparent pricing

Cost Comparison

LLM API offers the lowest effective cost and latency for Gemini 3.1 Pro–class models.

Provider	Region	Latency	Throughput	Uptime	Input ($/1M)	Output ($/1M)	Context
LLM API BEST	Global	120ms	80 tps	99.99%	$0.30	$0.60	128K
Google	Global	~220ms	~40 tps	99.9%	~$0.50	~$1.50	128K
Vertex AI (Google Cloud)	US East	~260ms	~35 tps	99.9%	~$0.55	~$1.60	128K
Fireworks AI	US West	~200ms	~50 tps	99.9%	~$0.45	~$1.40	64K

Performance benchmarks

Technical Specifications

Metric	Gemini 3.1 Pro Preview	GPT-4.1	Claude 3.5 Sonnet
Avg Latency	~250ms	~300ms	~280ms
Context Window	128K	128K	200K
Input Price ($/1M)	$0.40	$5.00	$3.00
Output Price ($/1M)	$1.20	$15.00	$15.00
Max Output Tokens	8K	4K	4K
Throughput	~50 tps	~40 tps	~35 tps
Uptime	99.9%	99.9%	99.9%

30-day usage via LLM API

62B: Prompt tokens processed (last 30 days)
410M: Completion tokens generated (last 30 days)
7.8M: API requests served (last 30 days)
99.8%: Average API uptime

Start Using API

Architecture & Integration

Why Build on LLM.API?

One unified API. Every major model. Built-in reliability, cost control, and observability.

Unified AI Routing

Dynamically route each request to the optimal model across providers based on latency, cost, or quality—without changing your integration or redeploying.
One endpoint, every model
Cost-Aware Orchestration

Control spend with per-route cost caps, dynamic model downgrades, and usage insights so you ship rich AI features without surprise bills.
Cut spend, keep quality
Automatic Provider Fallback

If a provider throttles or fails, LLM.API seamlessly retries on backup models, keeping your AI workflows online without custom failover logic.
Resilience by default
End-to-End Observability

Trace every request across models and providers with rich logs, metrics, and timelines to debug prompts, tune routing, and prove reliability in production.
See every token
Task-Centric Abstractions

Call higher-level tasks—chat, RAG, tools, moderation—instead of raw models, letting LLM.API manage prompts, memory, and orchestration under a stable interface.
Code to tasks, not models
High-Throughput Batch APIs

Process thousands of prompts in parallel with batch operations, reducing overhead, smoothing rate limits, and maximizing throughput for large-scale workloads.
Scale from day one

Decision guide

When to Use — When NOT to Use

Use it if...

You need a general-purpose LLM from Google with strong text generation and understanding.
You need good integration with other Google Cloud services and existing GCP infrastructure.
Your use case involves multimodal inputs like combining text with images or screenshots.
You need a model suitable for chatbots, assistants, and interactive web or mobile experiences.
Your use case involves summarizing, classifying, or extracting information from medium-length documents.
You need a widely supported, mainstream model with extensive community examples and tooling.

Avoid if...

You need guaranteed mature, production-hardened behavior rather than a preview-stage model.
Your workload requires strict, contractually defined SLAs and long-term backward compatibility guarantees.
You need highly specialized domain performance better served by fine-tuned or niche expert models.
Your workload requires running fully on-premise or outside Google Cloud’s managed environments.
You need fully transparent, fine-grained control over training data, weights, and model internals.
Your workload requires ultra-low-latency inference for high-frequency trading or hard real-time systems.

FAQ

Frequently Asked Questions

What is Gemini 3.1 Pro Preview?

Gemini 3.1 Pro Preview is a Google frontier language model optimized for high‑quality reasoning, coding, and general-purpose chat use cases.
What modalities does Gemini 3.1 Pro Preview support via LLM.API?

Through LLM.API, Gemini 3.1 Pro Preview currently supports text input and output, with image and other modalities exposed as the provider enables them.
How is Gemini 3.1 Pro Preview typically priced on LLM.API?

Gemini 3.1 Pro Preview is billed on a pay-as-you-go per-token basis, with separate input and output token rates defined by LLM.API.
What context window does Gemini 3.1 Pro Preview support?

Gemini 3.1 Pro Preview supports a large context window suitable for multi-thousand token prompts and long conversations, as configured by LLM.API.
How fast is Gemini 3.1 Pro Preview in terms of latency?

Typical latency is comparable to other large frontier models, with first-token times dependent on prompt size and current LLM.API and Google load.
What is Gemini 3.1 Pro Preview best suited for?

It excels at multi-step reasoning, complex code generation, data analysis, and high-quality natural language generation across many domains.
How do I call Gemini 3.1 Pro Preview through LLM.API?

You select the model name "google/gemini-3.1-pro-preview" (or similar identifier) in LLM.API and send standard chat or completion-style requests.
How does Gemini 3.1 Pro Preview compare to GPT-4.1 or Claude 3.5?

Gemini 3.1 Pro Preview targets similar advanced reasoning and coding capabilities, but performance, cost, and latency vary by task and provider configuration.
Does Gemini 3.1 Pro Preview support streaming responses on LLM.API?

Yes, when enabled in your LLM.API request, Gemini 3.1 Pro Preview can return tokens incrementally for lower perceived latency.
What are the main limitations of Gemini 3.1 Pro Preview?

It can hallucinate, may contain training-data biases, and should not be relied on for authoritative legal, medical, or safety-critical decisions.

Start in 2 lines of code

Get My API Key

Gemini 3.1 Pro Preview

What is Gemini 3.1 Pro Preview?

5 Core Capabilities

Advanced Reasoning

Multimodal Input

Document Comprehension

Coding Assistance

Multilingual Skills

6 Most Valuable Use Cases

Cost Comparison

Technical Specifications

Why Build on LLM.API?

Unified AI Routing

Cost-Aware Orchestration

Automatic Provider Fallback

End-to-End Observability

Task-Centric Abstractions

High-Throughput Batch APIs

When to Use — When NOT to Use

Use it if...

Avoid if...

Start in 2 lines of code