Gemini 3.1 Pro Preview Custom Tools

Text Generation

Gemini 3.1 Pro Preview Custom Tools is a preview large language model from Google’s Gemini 3.1 Pro line that supports integration with user-defined tools and APIs. It is notable for enabling tailored, tool-augmented workflows while providing the advanced reasoning and multimodal capabilities of the Gemini 3.1 Pro family.

Start Using API

API Performance

Latency: ~0.9s avg response
Context: ~2M token context
Input: ~$1.25 per 1M tokens
Output: ~$5.00 per 1M tokens
Uptime: 99% 99%

About the model

What is Gemini 3.1 Pro Preview Custom Tools?

Gemini 3.1 Pro Preview Custom Tools is a Google Gemini 3.1 Pro–series model variant that allows developers to connect custom tools and APIs for augmented reasoning and action-taking. It is mainly used to build applications where the model can call external services, trigger workflows, or fetch live data through developer-defined tools. It also supports use cases that require more controlled, domain-specific behavior by combining Gemini’s core capabilities with bespoke tool integrations. It belongs to the Gemini 3.1 Pro family of models, which succeeds earlier Gemini Pro generations.

Input / Output

Input

Text prompts
Images (vision input)
Audio inputs
Video inputs
PDF and document inputs

Output

Text responses (natural language, reasoning, structured text)
Code snippets and technical text output

Model capabilities

5 Core Capabilities

General Chat

Engages in multi-turn dialogue, answering questions, following instructions, and adapting responses to user context and prior conversation.
Image Understanding

Accepts image inputs to identify objects, infer context, and answer questions about visual content when enabled by the provider.
Text Translation

Translates text between multiple languages, supporting cross-lingual understanding and communication in both short prompts and longer documents.
Tool and API Calling

Orchestrates custom tools and APIs, interpreting user requests and invoking external functions to retrieve data or perform actions.
Document Reading

Reads and extracts textual content from documents or screenshots, enabling question answering and information retrieval over provided materials.

Use cases

6 Most Valuable Use Cases

Custom workflow orchestration
Domain-specific assistants
Tool-augmented customer support
Contract review automation
Real-time case monitoring
Invoice and billing triage

Transparent pricing

Cost Comparison

LLM API offers the lowest prices and best performance for Gemini 3.1 Pro–class models.

Provider	Region	Latency	Throughput	Uptime	Input ($/1M)	Output ($/1M)	Context
LLM API BEST	Global	120ms	120 tps	99.99%	$0.20	$0.60	200K
Google	Global	~350ms	~60 tps	99.9%	~$0.30	~$0.90	128K
Vertex AI (Google Cloud)	US Central	~380ms	~55 tps	99.9%	~$0.32	~$0.96	128K
Anthropic	US East	~300ms	~70 tps	99.9%	~$0.40	~$1.20	200K
OpenAI	Global	~250ms	~80 tps	99.9%	~$0.35	~$1.05	128K

Performance benchmarks

Technical Specifications

Metric	Gemini 3.1 Pro Preview Custom Tools	GPT-4.1	Claude 3.5 Sonnet
Avg Latency	~220ms	~250ms	~230ms
Context Window	128K	128K	200K
Input Price ($/1M)	$0.80	$5.00	$3.00
Output Price ($/1M)	$2.40	$15.00	$15.00
Max Output Tokens	4K	4K	4K
Throughput	60 tps	50 tps	45 tps
Uptime	99.9%	99.9%	99.9%

30-day usage via LLM API

62B: Prompt tokens processed (last 30 days)
21M: API requests served (last 30 days)
88B: Completion tokens generated (last 30 days)
310K: Unique developers using this model (last 30 days)

Start Using API

Architecture & Integration

Why Build on LLM.API?

One unified API. Every major model. Built-in reliability, cost control, and observability.

Unified AI Routing

Automatically direct each request to the best model across providers based on performance, latency, and cost—without changing your integration or redeploying code.
One endpoint, every model
Cost-Aware Orchestration

Optimize spend by mixing premium and budget models per request, with guardrails and policies that keep your AI bill predictable at scale.
Lower cost, same output
Resilient Fallback Flows

Design automatic provider and model failover so your workloads survive rate limits, outages, and timeouts—without complex custom retry logic.
No single point of failure
End-to-End Observability

Trace every request across providers with unified logs, metrics, and latency analysis so you can debug faster and continuously tune model performance.
See every token flow
Task-Level Abstractions

Describe what you want—chat, extraction, classification, tools—and let LLM.API pick the right model, parameters, and prompts for the job.
Think tasks, not models
High-Throughput Batch

Process millions of requests efficiently with queued, retried, and parallelized batch jobs that respect provider limits and maximize throughput automatically.
Scale from 10 to millions

Decision guide

When to Use — When NOT to Use

Use it if...

You need to call external APIs or services reliably through model-orchestrated custom tools.
You need to integrate structured business logic with natural-language understanding in one workflow.
Your use case involves multi-step workflows where the model selects and sequences tools.
You need to prototype complex agentic behaviors tightly coupled to Google Cloud or HTTP endpoints.
Your use case involves mixing standard chat, retrieval, and custom tool calls seamlessly.
You need flexible function-calling with explicit JSON schemas for predictable tool inputs and outputs.

Avoid if...

You need a fully stable, production-hardened model rather than an evolving preview release.
Your workload requires strict, audited compliance certifications beyond what preview services typically guarantee.
You need ultra-low latency, because tool-calling overhead can significantly increase response times.
Your workload requires completely offline or on-premises deployment without relying on Google services.
You need guaranteed long-term version pinning and behavior stability with no breaking changes.
Your workload requires only simple Q&A without external actions, making tool orchestration unnecessary complexity.

FAQ

Frequently Asked Questions

What is Gemini 3.1 Pro Preview Custom Tools?

Gemini 3.1 Pro Preview Custom Tools is a Google multimodal large language model with support for custom tool calling and integration via LLM.API.
What is Gemini 3.1 Pro Preview Custom Tools best suited for?

It is best for complex reasoning, multi-step tool-using workflows, and mixed text-plus-structured data applications where external tools or APIs must be orchestrated.
How is Gemini 3.1 Pro Preview Custom Tools priced when used through LLM.API?

LLM.API meters usage per token for input and output, with specific Gemini 3.1 Pro Preview Custom Tools rates shown in the LLM.API pricing section.
What context window does Gemini 3.1 Pro Preview Custom Tools support on LLM.API?

Gemini 3.1 Pro Preview Custom Tools supports a multi-thousand token context window; check the model card on LLM.API for the exact current limit.
How fast is Gemini 3.1 Pro Preview Custom Tools in terms of latency?

Typical latency is similar to other large frontier models, and depends on prompt size, output length, and any custom tool calls executed.
Which modalities does Gemini 3.1 Pro Preview Custom Tools support via LLM.API?

Through LLM.API it supports text input and output, and can interact with external tools; additional modalities depend on LLM.API’s enabled interfaces.
How do I call Gemini 3.1 Pro Preview Custom Tools from the LLM.API gateway?

Specify the model name "google/gemini-3.1-pro-preview-custom-tools" in your LLM.API request, include your API key, and send standard chat completion payloads.
How does Gemini 3.1 Pro Preview Custom Tools compare to other Gemini Pro models?

Compared to standard Gemini Pro variants, it emphasizes robust tool-calling behavior and orchestration over pure generative throughput.
What limitations should I be aware of when using Gemini 3.1 Pro Preview Custom Tools?

It can still hallucinate, miscall tools, and may not reflect real-time information; you must validate critical outputs and handle tool errors gracefully.
Can I use streaming responses with Gemini 3.1 Pro Preview Custom Tools on LLM.API?

If LLM.API enables streaming for this model, you can request streamed responses using the standard streaming flags in the API.

Start in 2 lines of code

Get My API Key

Gemini 3.1 Pro Preview Custom Tools

What is Gemini 3.1 Pro Preview Custom Tools?

5 Core Capabilities

General Chat

Image Understanding

Text Translation

Tool and API Calling

Document Reading

6 Most Valuable Use Cases

Cost Comparison

Technical Specifications

Why Build on LLM.API?

Unified AI Routing

Cost-Aware Orchestration

Resilient Fallback Flows

End-to-End Observability

Task-Level Abstractions

High-Throughput Batch

When to Use — When NOT to Use

Use it if...

Avoid if...

Start in 2 lines of code