Powered by Google
Gemini Embedding 2 Preview
Gemini Embedding 2 Preview is Google’s first natively multimodal embedding model, mapping text, images, video, audio, and documents into a shared vector space. It is offered in public preview via the Gemini API and Google Cloud/Vertex AI for advanced retrieval and analytics workloads.
About the model
What is Gemini Embedding 2 Preview?
Gemini Embedding 2 Preview is an embedding generation model from Google that produces unified vector representations for multiple modalities including text, images, video, audio, and documents. It is mainly used for multimodal retrieval, search, and recommendation systems that need to compare or rank heterogeneous content in a common embedding space. It is also used for tasks such as semantic similarity, clustering, classification, and analytics over large, mixed-media corpora. It belongs to the Gemini model family as the second-generation embedding model and the first natively multimodal variant built on the Gemini architecture.
Model capabilities
5 Core Capabilities
-
Text Embedding
Generates dense vector representations of text inputs for tasks like semantic search, classification, and retrieval-augmented generation.
-
Multilingual Support
Produces embeddings for many languages, enabling cross-lingual semantic search and understanding across diverse international text corpora.
-
Document Retrieval
Creates embeddings usable in vector databases to power fast, relevant document and passage retrieval for downstream applications.
-
Code Representation
Embeds source code snippets, enabling semantic code search, code clustering, and mapping between natural language queries and code.
-
Content Clustering
Supports grouping similar texts by embedding proximity, enabling topic clustering, recommendation, and deduplication in large datasets.
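Retrieval and clustering by "embedding proximity" usually come down to comparing vectors with a similarity measure such as cosine similarity. The sketch below is a minimal illustration of that idea, not LLM.API or Google code: the three-dimensional vectors are toy stand-ins for real embeddings, and the helper names and the 0.95 duplicate threshold are assumptions chosen for the example.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)

def rank_by_similarity(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

def near_duplicates(doc_vecs, threshold=0.95):
    """Pairs of document IDs whose vectors are at least `threshold` similar."""
    ids = sorted(doc_vecs)
    return [(a, b)
            for i, a in enumerate(ids)
            for b in ids[i + 1:]
            if cosine_similarity(doc_vecs[a], doc_vecs[b]) >= threshold]

# Toy vectors standing in for real embeddings (hypothetical data).
docs = {
    "refund-policy": [0.9, 0.1, 0.0],
    "refund-policy-copy": [0.89, 0.12, 0.01],
    "shipping-times": [0.1, 0.9, 0.2],
}
query = [1.0, 0.0, 0.0]

ranking = rank_by_similarity(query, docs)   # most similar document first
duplicates = near_duplicates(docs)          # candidate pairs for deduplication
```

For corpora beyond a few thousand vectors, the pairwise loop in `near_duplicates` becomes too slow, and a vector database or approximate nearest neighbor index is the usual replacement.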
Use cases
6 Most Valuable Use Cases
- Semantic Text Search
- Question Answer Retrieval
- Product Recommendation Matching
- Document Similarity Clustering
- User Intent Tagging
- Multilingual Text Embedding
Transparent pricing
Cost Comparison
Among the providers compared below, LLM.API lists the lowest embedding price and latency for Gemini Embedding 2-class models.
| Provider | Region | Latency | Throughput | Uptime | Input ($ per 1M tokens) | Output ($ per 1M tokens) | Context |
|---|---|---|---|---|---|---|---|
| LLM.API (best) | Global | 80ms | 120k tokens/s | 99.99% | $0.05 | $0.00 | ~1M tokens |
| | Global | ~150ms | ~80k tokens/s | 99.9% | $0.13 | $0.00 | ~1M tokens |
| OpenAI | Global | ~180ms | ~70k tokens/s | 99.9% | $0.10 | $0.00 | ~100K tokens |
| Azure OpenAI | US East | ~190ms | ~65k tokens/s | 99.9% | ~$0.11 | $0.00 | ~100K tokens |
| Anthropic | US West | ~200ms | ~60k tokens/s | 99.9% | ~$0.12 | $0.00 | ~200K tokens |
Performance benchmarks
Technical Specifications
| Metric | Gemini Embedding 2 Preview | text-embedding-3-large (OpenAI) | text-embedding-004 (Google Vertex) |
|---|---|---|---|
| Dimensions | 3072 | 3072 | 768 |
| Max Input Tokens | 8K | 8K | 8K |
| Price per 1M Tokens | $0.05 | $0.13 | $0.05 |
| Avg Latency | ~120ms | ~150ms | ~130ms |
| Throughput | ~800 tps | ~700 tps | ~750 tps |
| Uptime | 99.9% | 99.9% | 99.9% |
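To make the price and dimension rows concrete, the following sketch works through the arithmetic for a one-off indexing job. The prices and the 3072-dimension figure come from the table above; the corpus size, the vector count, and the choice of 4-byte float32 storage are assumptions made only for illustration.

```python
def embedding_cost_usd(total_tokens, price_per_million_usd):
    """Cost of embedding `total_tokens` at a given price per 1M tokens."""
    return total_tokens / 1_000_000 * price_per_million_usd

def vector_storage_bytes(num_vectors, dimensions, bytes_per_value=4):
    """Raw storage for dense vectors (4 bytes per value assumes float32)."""
    return num_vectors * dimensions * bytes_per_value

corpus_tokens = 250_000_000   # hypothetical corpus size
num_vectors = 1_000_000       # hypothetical number of chunks to store

# Prices per 1M tokens as listed in the Technical Specifications table.
gemini_cost = embedding_cost_usd(corpus_tokens, 0.05)
openai_cost = embedding_cost_usd(corpus_tokens, 0.13)

# 3072 dimensions per vector, as listed in the table.
storage_gb = vector_storage_bytes(num_vectors, 3072) / 1e9

print(f"Gemini Embedding 2 Preview: ${gemini_cost:,.2f}")
print(f"text-embedding-3-large:     ${openai_cost:,.2f}")
print(f"Raw vector storage:         {storage_gb:.2f} GB")
```

Under these assumptions the embedding bill is small compared with the storage footprint of 3072-dimension vectors, which is often the larger planning concern for big indexes.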
30-day usage via LLM.API
- 12.4B embedding tokens processed (30 days)
- 7.8M API requests served (30 days)
- 145K active developer accounts (30 days)
- 99.9% average API uptime (30 days)
Architecture & Integration
Why Build on LLM.API?
One unified API. Every major model. Built-in reliability, cost control, and observability.
-
Unified AI Routing
Dynamically route each request to the optimal model across providers using performance, price, or custom rules—without changing your integration or redeploying code.
One endpoint, every model
-
Cost-Aware Orchestration
Control spend with tiered routing, usage limits, and per-project policies so teams can experiment with premium models while keeping budgets predictable and enforceable.
Max performance, controlled spend
-
Resilient Fallbacks
Configure automatic provider and model fallbacks so production traffic keeps flowing through alternative backends when primary models rate limit, degrade, or go offline.
Never drop a request
-
End-to-End Observability
Inspect logs, traces, tokens, and latency per request across all providers in one place, enabling fast debugging, regression detection, and performance tuning.
See every token, everywhere
-
Task-Level Abstractions
Define high-level tasks like chat, embed, classify, or generate and let LLM.API pick the right model and parameters so your code stays clean and portable.
Code to tasks, not models
-
High-Throughput Batch
Run massive batch jobs for embeddings, generations, or classifications with automatic sharding, retries, and concurrency control, dramatically cutting run times and operational overhead.
Millions of calls, one job
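The fallback behavior described above happens on the LLM.API side, and its implementation is not documented here. As a rough client-side illustration of the same pattern, the sketch below tries a list of backends in order and returns the first successful result. Every name in it (`embed_with_fallback`, `AllBackendsFailed`, the two stand-in backends) is hypothetical and not part of any LLM.API SDK.

```python
class AllBackendsFailed(Exception):
    """Raised when every configured backend has failed."""

def embed_with_fallback(texts, backends):
    """Try each (name, embed_fn) pair in order; return the first success.

    Returns a (backend_name, vectors) tuple so callers can log which
    backend actually served the request.
    """
    errors = []
    for name, embed_fn in backends:
        try:
            return name, embed_fn(texts)
        except Exception as exc:  # real code should catch narrower error types
            errors.append(f"{name}: {exc}")
    raise AllBackendsFailed("; ".join(errors))

# Hypothetical stand-ins for real embedding backends.
def primary_backend(texts):
    raise TimeoutError("rate limited")

def secondary_backend(texts):
    # Fake one-dimensional "embeddings" so the example runs offline.
    return [[float(len(t))] for t in texts]

used, vectors = embed_with_fallback(
    ["hello", "hi"],
    [("primary", primary_backend), ("secondary", secondary_backend)],
)
print(used, vectors)
```

One caveat with embeddings specifically: vectors from different models are not comparable, so a fallback should only switch to a backend serving the same model, or the index will mix incompatible vector spaces.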
Decision guide
When to Use — When NOT to Use
Use it if...
- You need general-purpose text embeddings for semantic search, clustering, or retrieval-augmented generation.
- You need a preview of Gemini-family embeddings to prototype next-generation semantic search.
- Your use case involves building vector search over short to medium-length English documents.
- Your use case involves multi-model experimentation where you will later standardize on Google.
- You need to benchmark Google’s latest embedding model against existing embeddings in your stack.
- Your use case involves low-latency semantic similarity queries backed by Google Cloud infrastructure.
Avoid if...
- You need a fully production-hardened, non-preview embedding model with strict stability guarantees.
- You need embeddings that are backward-compatible with previous Google production embedding releases.
- Your workload requires guaranteed long-term model availability without potential breaking preview changes.
- You need domain-specific embeddings already fine-tuned for specialized fields like legal or medical.
- Your workload requires an embedding model extensively documented and supported as generally available.
- You need fully validated multilingual performance benchmarks beyond what preview documentation currently provides.
FAQ
Frequently Asked Questions
-
What is Gemini Embedding 2 Preview?
Gemini Embedding 2 Preview is Google's natively multimodal embedding model. Through LLM.API it generates vector representations of text for search, retrieval, recommendation, and clustering.
-
What modalities does Gemini Embedding 2 Preview support?
When accessed via LLM.API, Gemini Embedding 2 Preview currently accepts text input only, although the model itself is natively multimodal.
-
How do I access Gemini Embedding 2 Preview through LLM.API?
You call the unified embeddings endpoint on LLM.API and set the model parameter to "google/gemini-embedding-2-preview".
-
What is Gemini Embedding 2 Preview best suited for?
It is best for semantic search, document retrieval, RAG knowledge bases, deduplication, and measuring similarity between user queries and content.
-
What is the context window of Gemini Embedding 2 Preview?
The Technical Specifications table lists a maximum input of 8K tokens for Gemini Embedding 2 Preview; longer documents should be chunked client-side before embedding.
-
How fast is Gemini Embedding 2 Preview via LLM.API?
Latency is generally low enough for real-time semantic search, with most requests completing in tens to hundreds of milliseconds depending on batch size.
-
How is Gemini Embedding 2 Preview priced on LLM.API?
Pricing is usage-based and charged on input; the cost comparison on this page lists $0.05 per 1M input tokens with no output charge. The current rate is displayed in the Gemini Embedding 2 Preview section of LLM.API pricing.
-
Can I batch multiple texts in a single Gemini Embedding 2 Preview request?
Yes, you can send an array of input strings in one request to efficiently compute multiple embeddings.
-
How does Gemini Embedding 2 Preview compare to other embedding models on LLM.API?
It offers strong semantic quality and compatibility with Google’s Gemini ecosystem, while other models may prioritize lower cost or smaller embedding dimensions.
-
Does Gemini Embedding 2 Preview support multilingual text?
Gemini Embedding 2 Preview supports many languages, but quality and coverage can vary by language, so task-specific evaluation is recommended.
-
What are the main limitations of Gemini Embedding 2 Preview?
It does not generate text, can struggle with very long or highly structured documents, and embedding quality may degrade outside supported languages or domains.
-
Can I use Gemini Embedding 2 Preview embeddings for production recommendation systems?
Yes, embeddings are suitable for production retrieval and recommendation workloads when combined with a vector database or approximate nearest neighbor index.
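Several of the answers above (setting the model parameter, batching an array of inputs, chunking long documents client-side) can be sketched together. The example below is a minimal illustration under stated assumptions: the model ID comes from this FAQ, but the endpoint URL is a placeholder, the `{"model", "input"}` JSON body and Bearer header are assumed rather than taken from LLM.API documentation, and word counts are used as a rough stand-in for tokens. The request is built but not sent.

```python
import json
import urllib.request

MODEL_ID = "google/gemini-embedding-2-preview"     # model ID given in the FAQ
API_URL = "https://api.example.com/v1/embeddings"  # placeholder, not the real endpoint

def chunk_words(text, max_words=200, overlap=20):
    """Split text into overlapping word windows (words approximate tokens)."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks

def build_embedding_request(texts, api_key):
    """Build (but do not send) a POST request carrying a batch of inputs.

    The body shape and auth header are assumptions for this sketch.
    """
    body = json.dumps({"model": MODEL_ID, "input": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# A synthetic 450-word document, chunked and batched into one request.
document = " ".join(f"word{i}" for i in range(450))
chunks = chunk_words(document, max_words=200, overlap=20)
request = build_embedding_request(chunks, api_key="YOUR_API_KEY")
payload = json.loads(request.data)

# Sending would be: urllib.request.urlopen(request), once API_URL is real.
```

The overlap between adjacent chunks keeps sentences that straddle a boundary retrievable from either side; the right window and overlap sizes depend on the corpus and should be tuned against the model's actual token limit.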
