Gemini Embedding 2 Preview

Text Embeddings

Gemini Embedding 2 Preview is Google’s first natively multimodal embedding model, mapping text, images, video, audio, and documents into a shared vector space. It is offered in public preview via the Gemini API and Google Cloud/Vertex AI for advanced retrieval and analytics workloads.

API Performance

Latency: ~0.4s avg response
Context: 8K token context
Input: $0.20 per 1M tokens
Output: $0.00 per 1M tokens
Uptime: 99%

About the model

What is Gemini Embedding 2 Preview?

Gemini Embedding 2 Preview is an embedding generation model from Google that produces unified vector representations for multiple modalities including text, images, video, audio, and documents. It is mainly used for multimodal retrieval, search, and recommendation systems that need to compare or rank heterogeneous content in a common embedding space. It is also used for tasks such as semantic similarity, clustering, classification, and analytics over large, mixed-media corpora. It belongs to the Gemini model family as the second-generation embedding model and the first natively multimodal variant built on the Gemini architecture.

Input / Output

Input

Text (natural language strings)
Images (e.g. JPEG, PNG)
Video files
Audio files
Documents (PDF)

Output

Vector embeddings (numeric representations of inputs)

Model capabilities

5 Core Capabilities

Text Embedding

Generates dense vector representations of text inputs for tasks like semantic search, classification, and retrieval-augmented generation.

Multilingual Support

Produces embeddings for many languages, enabling cross-lingual semantic search and understanding across diverse international text corpora.

Document Retrieval

Creates embeddings usable in vector databases to power fast, relevant document and passage retrieval for downstream applications.

Code Representation

Embeds source code snippets, enabling semantic code search, code clustering, and mapping between natural language queries and code.

Content Clustering

Supports grouping similar texts by embedding proximity, enabling topic clustering, recommendation, and deduplication in large datasets.
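
As a rough illustration of grouping by embedding proximity, the sketch below flags near-duplicate items by cosine similarity. The toy 3-dimensional vectors, the function names, and the 0.95 threshold are assumptions made for the example; real embeddings would come from the model and the threshold would need tuning on your own data.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_near_duplicates(vectors, threshold=0.95):
    """Return index pairs (i, j) with i < j whose cosine similarity is >= threshold."""
    pairs = []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine_similarity(vectors[i], vectors[j]) >= threshold:
                pairs.append((i, j))
    return pairs


# Toy stand-ins for embeddings (hypothetical values, not model output).
toy_vectors = [
    [1.0, 0.0, 0.0],
    [0.99, 0.05, 0.0],
    [0.0, 1.0, 0.0],
]
duplicate_pairs = find_near_duplicates(toy_vectors)
```

The pairwise loop is quadratic in the number of items, so it is only meant for small sets; larger corpora would typically use a vector index instead.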

Use cases

6 Most Valuable Use Cases

Semantic Text Search
Question Answer Retrieval
Product Recommendation Matching
Document Similarity Clustering
User Intent Tagging
Multilingual Text Embedding

Transparent pricing

Cost Comparison

LLM API offers the lowest embedding costs and latency among major Gemini Embedding 2–class providers.

Provider	Region	Latency	Throughput	Uptime	Input ($/1M tokens)	Output ($/1M tokens)	Context
LLM API	Global	80ms	120k tokens/s	99.99%	$0.05	$0.00	~1M tokens
Google	Global	~150ms	~80k tokens/s	99.9%	$0.13	$0.00	~1M tokens
OpenAI	Global	~180ms	~70k tokens/s	99.9%	$0.10	$0.00	~100K tokens
Azure OpenAI	US East	~190ms	~65k tokens/s	99.9%	~$0.11	$0.00	~100K tokens
Anthropic	US West	~200ms	~60k tokens/s	99.9%	~$0.12	$0.00	~200K tokens
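
To make the per-million-token rates easier to compare, the sketch below converts them into a monthly bill. The input prices are copied from the table above; the 500M-token monthly workload and the function name are hypothetical, chosen only to show the arithmetic.

```python
def embedding_cost_usd(total_tokens, price_per_million_usd):
    """Cost in USD for total_tokens input tokens at a given price per 1M tokens."""
    return total_tokens / 1_000_000 * price_per_million_usd


# Input prices ($ per 1M tokens) as listed in the comparison table.
listed_input_prices = {
    "LLM API": 0.05,
    "Google": 0.13,
    "OpenAI": 0.10,
    "Azure OpenAI": 0.11,
    "Anthropic": 0.12,
}

monthly_tokens = 500_000_000  # hypothetical workload, not a figure from this page

monthly_costs = {
    provider: round(embedding_cost_usd(monthly_tokens, price), 2)
    for provider, price in listed_input_prices.items()
}
```

Because output tokens are listed at $0.00 for every provider, only the input rate enters the calculation.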

Performance benchmarks

Technical Specifications

Metric	Gemini Embedding 2 Preview	text-embedding-3-large (OpenAI)	text-embedding-004 (Google Vertex)
Dimensions	3072	3072	768
Max Input Tokens	8K	8K	8K
Price per 1M Tokens	$0.05	$0.13	$0.05
Avg Latency	~120ms	~150ms	~130ms
Throughput	~800 tps	~700 tps	~750 tps
Uptime	99.9%	99.9%	99.9%
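
The Dimensions row has a direct effect on index size. The sketch below estimates raw vector storage, assuming each value is stored as a 4-byte float32 and ignoring any index overhead or compression; the one-million-vector corpus is a hypothetical figure.

```python
def index_size_bytes(num_vectors, dimensions, bytes_per_value=4):
    """Raw storage for num_vectors embeddings; 4 bytes per value assumes float32."""
    return num_vectors * dimensions * bytes_per_value


num_vectors = 1_000_000  # hypothetical corpus size

size_3072 = index_size_bytes(num_vectors, 3072)  # 3072-dimension column values
size_768 = index_size_bytes(num_vectors, 768)    # 768-dimension column value

size_ratio = size_3072 / size_768
```

Under these assumptions a 3072-dimension index needs four times the raw storage of a 768-dimension one, which is worth weighing against any quality difference.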

30-day usage via LLM API

12.4B: Embedding tokens processed (30 days)
7.8M: API requests served (30 days)
145K: Active developer accounts (30 days)
99.9%: Avg API uptime (30 days)

Architecture & Integration

Why Build on LLM.API?

One unified API. Every major model. Built-in reliability, cost control, and observability.

Unified AI Routing

Dynamically route each request to the optimal model across providers using performance, price, or custom rules, without changing your integration or redeploying code.
One endpoint, every model

Cost-Aware Orchestration

Control spend with tiered routing, usage limits, and per-project policies so teams can experiment with premium models while keeping budgets predictable and enforceable.
Max performance, controlled spend

Resilient Fallbacks

Configure automatic provider and model fallbacks so production traffic keeps flowing through alternative backends when primary models rate limit, degrade, or go offline.
Never drop a request
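
The fallback idea can be pictured with a small client-side sketch. This is not how LLM.API implements fallbacks, which the page does not describe; the function names and the stub backends are hypothetical and simply stand in for real provider calls.

```python
def embed_with_fallback(texts, backends):
    """Try each (name, callable) backend in order; return (name, embeddings) from the first success."""
    errors = []
    for name, embed_fn in backends:
        try:
            return name, embed_fn(texts)
        except Exception as exc:  # a real client would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all backends failed: " + "; ".join(errors))


def primary_stub(texts):
    """Hypothetical primary backend that simulates a rate-limit failure."""
    raise TimeoutError("simulated rate limit")


def secondary_stub(texts):
    """Hypothetical secondary backend returning placeholder vectors."""
    return [[0.0, 0.0] for _ in texts]


used_backend, vectors = embed_with_fallback(
    ["hello"],
    [("primary", primary_stub), ("secondary", secondary_stub)],
)
```

One caveat with embeddings specifically: vectors from different models are generally not comparable, so a fallback to a different embedding model would not be a drop-in substitute for an existing index.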

End-to-End Observability

Inspect logs, traces, tokens, and latency per request across all providers in one place, enabling fast debugging, regression detection, and performance tuning.
See every token, everywhere

Task-Level Abstractions

Define high-level tasks like chat, embed, classify, or generate and let LLM.API pick the right model and parameters so your code stays clean and portable.
Code to tasks, not models

High-Throughput Batch

Run massive batch jobs for embeddings, generations, or classifications with automatic sharding, retries, and concurrency control, dramatically cutting run times and operational overhead.
Millions of calls, one job
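
For readers who want a feel for what sharding and retries involve, here is a minimal sequential sketch. It is an illustration under stated assumptions, not LLM.API's batch implementation: the helper names, shard size, retry count, and the flaky stub backend are all hypothetical, and concurrency control is left out for brevity.

```python
import time


def shard(items, shard_size):
    """Split items into consecutive lists of at most shard_size elements."""
    if shard_size <= 0:
        raise ValueError("shard_size must be positive")
    return [items[i:i + shard_size] for i in range(0, len(items), shard_size)]


def run_batch(items, embed_fn, shard_size=100, max_retries=3, backoff_seconds=0.0):
    """Embed items shard by shard, retrying a failed shard with exponential backoff."""
    results = []
    for chunk in shard(items, shard_size):
        for attempt in range(max_retries + 1):
            try:
                results.extend(embed_fn(chunk))
                break
            except Exception:
                if attempt == max_retries:
                    raise  # give up on this shard after the last retry
                time.sleep(backoff_seconds * (2 ** attempt))
    return results


calls = {"count": 0}


def flaky_stub(chunk):
    """Hypothetical backend: fails on its first call, then returns one-value vectors."""
    calls["count"] += 1
    if calls["count"] == 1:
        raise ConnectionError("simulated transient failure")
    return [[float(len(text))] for text in chunk]


batch_vectors = run_batch(["a", "bb", "ccc"], flaky_stub, shard_size=2)
```

Results are appended in shard order, so the output list lines up index-for-index with the input list.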

Decision guide

When to Use — When NOT to Use

Use it if...

You need general-purpose text embeddings for semantic search, clustering, or retrieval-augmented generation.
You need a preview of Gemini-family embeddings to prototype next-generation semantic search.
Your use case involves building vector search over short to medium-length English documents.
Your use case involves multi-model experimentation where you will later standardize on Google.
You need to benchmark Google’s latest embedding model against existing embeddings in your stack.
Your use case involves low-latency semantic similarity queries backed by Google Cloud infrastructure.

Avoid if...

You need a fully production-hardened, non-preview embedding model with strict stability guarantees.
You need embeddings that are backward-compatible with previous Google production embedding releases.
Your workload requires guaranteed long-term model availability without potential breaking preview changes.
You need domain-specific embeddings already fine-tuned for specialized fields like legal or medical.
Your workload requires an embedding model extensively documented and supported as generally available.
You need fully validated multilingual performance benchmarks beyond what preview documentation currently provides.

FAQ

Frequently Asked Questions

What is Gemini Embedding 2 Preview?

Gemini Embedding 2 Preview is a Google embedding model designed to generate vector representations of text for search, retrieval, recommendation, and clustering.

What modalities does Gemini Embedding 2 Preview support?

Gemini Embedding 2 Preview currently supports text input only when accessed via LLM.API.

How do I access Gemini Embedding 2 Preview through LLM.API?

You call the unified embeddings endpoint on LLM.API and set the model parameter to "google/gemini-embedding-2-preview".
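
A request along those lines might be assembled as in the sketch below. Only the model identifier comes from this page. The base URL is a placeholder, and the `/embeddings` path, the `input` field, and the bearer-token header are assumptions modeled on common embeddings APIs, so check the LLM.API documentation for the actual endpoint and schema. The code builds the request without sending it.

```python
import json
import urllib.request

MODEL_ID = "google/gemini-embedding-2-preview"  # model parameter named in the answer above


def build_embedding_request(texts, api_key, base_url):
    """Build (but do not send) a POST request for embeddings.

    The "/embeddings" path, "input" field, and bearer auth are assumed conventions,
    not confirmed details of LLM.API.
    """
    payload = {"model": MODEL_ID, "input": texts}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_embedding_request(
    ["first text", "second text"],
    api_key="YOUR_API_KEY",
    base_url="https://api.example.invalid/v1",  # placeholder, not a real endpoint
)

# To send it against a real endpoint:
#     with urllib.request.urlopen(request) as response:
#         body = json.load(response)
```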

What is Gemini Embedding 2 Preview best suited for?

It is best for semantic search, document retrieval, RAG knowledge bases, deduplication, and measuring similarity between user queries and content.

What is the context window of Gemini Embedding 2 Preview?

Per the specifications listed on this page, Gemini Embedding 2 Preview accepts inputs of up to about 8K tokens; longer documents should be chunked client-side before embedding.
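
Client-side chunking can be as simple as the sliding-window sketch below. It counts whitespace-separated words as a stand-in for tokens, which is only an approximation of the model's tokenizer, and the function name and default sizes are hypothetical choices for illustration.

```python
def chunk_words(text, max_words=200, overlap=20):
    """Split text into chunks of at most max_words words, sharing `overlap` words between neighbors."""
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
chunks = chunk_words(sample, max_words=4, overlap=1)
```

The overlap keeps a little shared context across chunk boundaries, so a sentence split between two chunks is still partly represented in both.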

How fast is Gemini Embedding 2 Preview via LLM.API?

Latency is generally low enough for real-time semantic search, with most requests completing in tens to hundreds of milliseconds depending on batch size.

How is Gemini Embedding 2 Preview priced on LLM.API?

Pricing is usage-based per input token or character, with the exact rate displayed in the Gemini Embedding 2 Preview section of LLM.API pricing.

Can I batch multiple texts in a single Gemini Embedding 2 Preview request?

Yes, you can send an array of input strings in one request to efficiently compute multiple embeddings.

How does Gemini Embedding 2 Preview compare to other embedding models on LLM.API?

It offers strong semantic quality and compatibility with Google’s Gemini ecosystem, while other models may prioritize lower cost or smaller embedding dimensions.

Does Gemini Embedding 2 Preview support multilingual text?

Gemini Embedding 2 Preview supports many languages, but quality and coverage can vary by language, so task-specific evaluation is recommended.

What are the main limitations of Gemini Embedding 2 Preview?

It does not generate text, can struggle with very long or highly structured documents, and embedding quality may degrade outside supported languages or domains.

Can I use Gemini Embedding 2 Preview embeddings for production recommendation systems?

Yes, embeddings are suitable for production retrieval and recommendation workloads when combined with a vector database or approximate nearest neighbor index.
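
To show what a vector database or nearest neighbor index computes, here is an exact brute-force version of the lookup. It is a sketch for small collections only; approximate indexes exist precisely because this linear scan does not scale. The vectors are toy values and the function names are hypothetical.

```python
import math


def normalize(vector):
    """Scale a vector to unit length (all-zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


def top_k(query, corpus, k=2):
    """Return indices of the k corpus vectors most similar to query by cosine similarity."""
    q = normalize(query)
    scored = []
    for index, vector in enumerate(corpus):
        d = normalize(vector)
        score = sum(a * b for a, b in zip(q, d))  # dot product of unit vectors
        scored.append((score, index))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [index for _, index in scored[:k]]


# Toy 2-dimensional stand-ins for item embeddings (not model output).
corpus = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
nearest = top_k([0.9, 0.1], corpus, k=2)
```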

