Gemini Embedding 2

Text Embeddings

Gemini Embedding 2 is Google's natively multimodal embedding model that maps text, images, video, audio, and documents into a single semantic vector space. It is notable for unifying many media types in one model to power cross-modal search and retrieval.

API Performance

Latency: ~0.6s avg response
Input: $0.02 per 1K characters
Uptime: 99%

About the model

What is Gemini Embedding 2?

Gemini Embedding 2 is a proprietary multimodal embedding model from Google that produces numerical vector representations for text, images, audio, video, and documents in a unified space. Its main use cases include powering retrieval-augmented generation, semantic search, recommendation, and classification across mixed media, and enabling cross-modal applications like using a text query to retrieve relevant images or video clips. It is part of Google’s Gemini Embedding family and succeeds earlier text-focused Gemini embedding models.

Input / Output

Input

Text (natural language strings)
Images (e.g. JPEG, PNG)
Audio files
Video files
Documents (e.g. PDF and similar file types)

Output

Dense embedding vectors (numeric representations)

Model capabilities

5 Core Capabilities

Text Embeddings

Generates dense vector representations of text inputs optimized for semantic similarity, clustering, search, and other retrieval-augmented applications.

Multilingual Support

Creates embeddings for text in many languages, enabling cross-lingual search, retrieval, and clustering across diverse multilingual content.

Long Context Handling

Encodes relatively long documents into embeddings, supporting use cases like document search, recommendation, and large-scale corpus analysis.

Code Representation

Produces embeddings for source code snippets, improving code search, code recommendation, and semantic understanding across programming languages.

Text-Image Alignment

Supports joint embedding space for text and images, enabling multimodal retrieval like image search based on natural language queries.
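
Because text and images share one vector space, a text query can be ranked against image vectors with an ordinary similarity measure. The sketch below illustrates the idea with cosine similarity; the three-dimensional vectors and file names are made-up stand-ins for real embeddings, which would come from the model.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, index, k=2):
    """Return the k (item_id, score) pairs most similar to the query."""
    scored = [(item_id, cosine(query_vec, vec)) for item_id, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Hypothetical image embeddings (toy values, not real model output).
image_index = {
    "beach.jpg": [0.9, 0.1, 0.0],
    "forest.png": [0.1, 0.9, 0.1],
    "city.jpg": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of the text query "sunny coastline".
query = [0.8, 0.2, 0.1]

results = top_k(query, image_index, k=2)
print([item_id for item_id, _ in results])  # ['beach.jpg', 'forest.png']
```

In a real system the dictionary would be replaced by a vector database, but the ranking step is the same.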

Use cases

6 Most Valuable Use Cases

Semantic Text Search
Document Clustering
Legal Case Retrieval
Regulation Change Monitoring
Product Recommendation Ranking
Multilingual Text Similarity

Transparent pricing

Cost Comparison

In the comparison below, LLM.API lists the lowest input price and latency for Gemini-class embeddings.

Provider	Region	Latency	Throughput	Uptime	Input ($/1M)	Output ($/1M)	Context
LLM.API (best)	Global	80ms	120K tokens/s	99.99%	$0.05	$0.00	512K tokens
Google	Global	~150ms	~60K tokens/s	99.9%	~$0.13	$0.00	~307K tokens
Vertex AI (Google Cloud)	US East	~160ms	~55K tokens/s	99.9%	~$0.15	$0.00	~307K tokens
AWS Bedrock (equivalent embedding model)	US East	~180ms	~50K tokens/s	99.9%	~$0.12	$0.00	~100K tokens
Azure AI (equivalent embedding model)	EU West	~190ms	~45K tokens/s	99.9%	~$0.11	$0.00	~100K tokens
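
To turn the per-million-token prices into a monthly figure, multiply the token volume by the listed rate. The sketch below uses the input prices from the table (the non-LLM.API values are approximate, as marked there); the 500M-token workload is a hypothetical example.

```python
# Input prices in USD per 1M tokens, copied from the table above.
PRICE_PER_MILLION = {
    "LLM.API": 0.05,
    "Google": 0.13,
    "Vertex AI": 0.15,
    "AWS Bedrock": 0.12,
    "Azure AI": 0.11,
}

def embedding_cost(tokens, price_per_million):
    """Cost in USD of embedding `tokens` input tokens."""
    return tokens / 1_000_000 * price_per_million

monthly_tokens = 500_000_000  # hypothetical workload

for provider, price in PRICE_PER_MILLION.items():
    print(f"{provider}: ${embedding_cost(monthly_tokens, price):,.2f}")
```

Output tokens are priced at $0.00 in every row, so input volume is the only term in the calculation.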

Performance benchmarks

Technical Specifications

Metric	Gemini Embedding 2	OpenAI text-embedding-3-large	Cohere Embed v3 English	AWS Titan Text Embeddings V2
Embedding Dimensions	3072	3072	1024	1024
Max Input Tokens	8,192	—	—	8,000
Price per 1M Tokens (Input)	$0.02	$0.13	$0.10	$0.12
Price per 1M Tokens (Output)	—	—	—	—
Modalities Supported	Text, Image, Audio, Video, Documents	Text	Text	Text
Throughput	—	—	—	—
Avg Latency	—	—	—	—
Service Uptime (SLA)	—	—	—	—
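
Embedding dimensions drive index size. As a rough sizing sketch, assuming vectors are stored as uncompressed 32-bit floats (an assumption, not something this page states), a 3072-dimensional vector takes 12,288 bytes:

```python
DIMENSIONS = 3072        # embedding dimensions from the table above
BYTES_PER_FLOAT32 = 4    # assumption: uncompressed 32-bit float storage

def index_size_bytes(num_vectors, dimensions=DIMENSIONS,
                     bytes_per_value=BYTES_PER_FLOAT32):
    """Raw storage needed for num_vectors vectors, excluding index overhead."""
    return num_vectors * dimensions * bytes_per_value

per_vector = index_size_bytes(1)
one_million = index_size_bytes(1_000_000)

print(per_vector)                      # 12288
print(f"{one_million / 1e9:.3f} GB")   # 12.288 GB
```

A 1024-dimensional model needs a third of that, which is worth weighing against retrieval quality when comparing the columns.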

30-day usage via LLM API

3.8B: Text chunks embedded (30 days)
520M: API requests (30 days)
45K: Active developer accounts (30 days)
99.97%: Avg API uptime (30 days)

Architecture & Integration

Why Build on LLM.API?

One unified API. Every major model. Built-in reliability, cost control, and observability.

Unified AI Routing

Automatically route each request to the optimal model across providers based on latency, capability, and policies. No client changes, just better defaults.
One endpoint, every model

Cost-Aware Controls

Define per-project or per-tenant budgets, choose cost ceilings, and let LLM.API pick the cheapest model that still meets your quality and latency targets.
Lower spend, same output
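
The selection rule described here, "cheapest option that still meets the targets", can be sketched in a few lines. This is an illustration of the idea only, not LLM.API's routing logic; the `Option` type and `cheapest_within` function are hypothetical, and the figures are the upstream rows of the Cost Comparison table on this page.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Option:
    provider: str
    latency_ms: float
    price_per_million: float

# Approximate figures from the Cost Comparison table.
OPTIONS = [
    Option("Google", 150, 0.13),
    Option("Vertex AI", 160, 0.15),
    Option("AWS Bedrock", 180, 0.12),
    Option("Azure AI", 190, 0.11),
]

def cheapest_within(options, max_latency_ms, max_price_per_million):
    """Return the cheapest option meeting both ceilings, or None if none do."""
    eligible = [
        o for o in options
        if o.latency_ms <= max_latency_ms
        and o.price_per_million <= max_price_per_million
    ]
    if not eligible:
        return None
    return min(eligible, key=lambda o: o.price_per_million)

choice = cheapest_within(OPTIONS, max_latency_ms=170, max_price_per_million=0.20)
print(choice.provider)  # Google
```

Relaxing the latency ceiling to 200 ms changes the answer to the cheaper Azure AI row, which is the trade-off a budget policy encodes.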

Resilient Fallback Logic

Eliminate single-vendor outages with built-in failover across providers, automatic retries, and policy-based degradation that keeps your product responsive.
Never ship 500s again
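
For intuition, the retry-then-failover pattern looks roughly like the sketch below. It is a client-side illustration under stated assumptions, not LLM.API's implementation: `ProviderError`, `embed_with_fallback`, and the stub providers are hypothetical, and the providers are assumed to serve the same model so the returned vectors stay comparable.

```python
import time

class ProviderError(Exception):
    """Raised by a provider callable when a request fails."""

def embed_with_fallback(text, providers, retries_per_provider=2, backoff_s=0.0):
    """Try each provider in order, retrying before moving to the next one."""
    errors = []
    for name, call in providers:
        for attempt in range(retries_per_provider):
            try:
                return name, call(text)
            except ProviderError as exc:
                errors.append(f"{name} attempt {attempt + 1}: {exc}")
                time.sleep(backoff_s * (2 ** attempt))  # exponential backoff
    raise ProviderError("all providers failed: " + "; ".join(errors))

# Stub providers standing in for real network calls.
def flaky_primary(text):
    raise ProviderError("503 from upstream")

def healthy_secondary(text):
    return [float(len(text)), 0.0, 1.0]

name, vector = embed_with_fallback(
    "hello",
    [("primary", flaky_primary), ("secondary", healthy_secondary)],
)
print(name, vector)  # secondary [5.0, 0.0, 1.0]
```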

End-to-End Observability

Get unified logs, traces, and metrics for every provider (latency, errors, token usage, and prompts), all correlated to requests and tenants in one place.
See every token flow

Task-Level Orchestration

Describe tasks, constraints, and tools once; LLM.API handles model selection, tool calling, and execution flow so you focus on product logic, not glue code.
Ship workflows, not wiring

High-Throughput Batch APIs

Process millions of inferences efficiently with bulk submission, concurrency control, and automatic chunking tuned for each provider’s limits and quotas.
Scale from 10 to millions

Decision guide

When to Use — When NOT to Use

Use it if...

You need general-purpose text embeddings for semantic search, clustering, or retrieval applications.
You need multilingual embeddings that handle many languages consistently within a single vector space.
Your use case involves building recommendation systems based on textual similarity or user profiles.
You need embeddings optimized for low latency and reasonable cost on Google Cloud.
Your use case involves hybrid search, combining Gemini Embedding 2 with keyword or metadata filters.
You need embeddings well-integrated with other Google Vertex AI or Gemini-based workflows.
Your use case involves encoding short queries and longer documents into the same embedding space.
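
The hybrid search case mentioned above can be sketched as a metadata filter followed by a blended score. Everything in this example is illustrative: the documents, the two-dimensional unit vectors, and the `alpha` weighting are assumptions chosen to keep the arithmetic visible, and the vectors are assumed to be normalized so a dot product equals cosine similarity.

```python
def dot(a, b):
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))

# Hypothetical documents with toy unit-length embeddings.
DOCS = [
    {"id": "d1", "lang": "en", "text": "refund policy for online orders", "vec": [0.8, 0.6]},
    {"id": "d2", "lang": "fr", "text": "politique de remboursement", "vec": [0.6, 0.8]},
    {"id": "d3", "lang": "en", "text": "shipping times by region", "vec": [0.0, 1.0]},
]

def hybrid_search(query_vec, keyword, docs, lang=None, alpha=0.7):
    """Filter by metadata, then blend vector similarity with a keyword match."""
    results = []
    for doc in docs:
        if lang is not None and doc["lang"] != lang:
            continue  # metadata filter
        keyword_score = 1.0 if keyword.lower() in doc["text"].lower() else 0.0
        score = alpha * dot(query_vec, doc["vec"]) + (1 - alpha) * keyword_score
        results.append((doc["id"], round(score, 3)))
    return sorted(results, key=lambda r: r[1], reverse=True)

hits = hybrid_search([1.0, 0.0], "refund", DOCS, lang="en")
print([doc_id for doc_id, _ in hits])  # ['d1', 'd3']
```

Production systems usually replace the substring check with a proper keyword ranker such as BM25, but the filter-then-blend structure is the same.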

Avoid if...

You need to embed media types beyond the text, image, audio, video, and document inputs listed above.
Your workload requires full generative capabilities like conversation, code synthesis, or content creation.
You need ultra-long-context document understanding beyond the maximum token limits of embeddings.
Your workload requires highly domain-specific vectors trained on proprietary data you fully control.
You need on-premise deployment without relying on Google-managed cloud infrastructure or APIs.
Your workload requires strict vendor neutrality, avoiding lock-in to any specific cloud provider.
You need binary or sparse representations instead of dense floating-point embeddings for storage efficiency.

FAQ

Frequently Asked Questions

What is Gemini Embedding 2?

Gemini Embedding 2 is Google’s latest multimodal embedding model, designed to generate dense vector representations of text, code, images, audio, video, and documents for search, retrieval, and semantic similarity.

What input modalities does Gemini Embedding 2 support?

Gemini Embedding 2 accepts text (including source code), images, audio, video, and documents such as PDFs, and maps them into a single vector space.

How do I access Gemini Embedding 2 through LLM.API?

You call the unified LLM.API embeddings endpoint with the provider set to Google and model set to Gemini Embedding 2.
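
As a rough sketch of what such a call could look like, the example below builds (but does not send) a JSON POST request using only the Python standard library. The endpoint URL, the model identifier, and the payload field names are all hypothetical placeholders; take the real values from the LLM.API documentation.

```python
import json
import urllib.request

# Hypothetical placeholders, not confirmed LLM.API values.
ENDPOINT = "https://api.example.com/v1/embeddings"
MODEL_ID = "gemini-embedding-2"

def build_embedding_request(texts, api_key, provider="google", model=MODEL_ID):
    """Build, without sending, a JSON POST request for a list of texts."""
    payload = {"provider": provider, "model": model, "input": list(texts)}
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

request = build_embedding_request(["hello world"], api_key="YOUR_API_KEY")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To send it, pass `request` to urllib.request.urlopen() and parse the
# JSON response body; the response shape depends on the actual API.
```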

What is the context window of Gemini Embedding 2?

Gemini Embedding 2 supports input sequences up to 8,192 tokens, after which inputs must be truncated or chunked.
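
A minimal chunking sketch for inputs over the limit is shown below. It splits a token list into windows of at most 8,192 tokens with an optional overlap. Whitespace splitting is used only as a crude stand-in for the model's real tokenizer, and the 192-token overlap is an arbitrary illustrative choice.

```python
MAX_TOKENS = 8192  # input limit stated above

def chunk_tokens(tokens, max_tokens=MAX_TOKENS, overlap=0):
    """Split a token list into windows of at most max_tokens tokens."""
    if max_tokens <= 0 or not 0 <= overlap < max_tokens:
        raise ValueError("need max_tokens > 0 and 0 <= overlap < max_tokens")
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break  # the last window already reaches the end
    return chunks

# Whitespace tokens approximate, but do not match, real tokenizer counts.
tokens = ("word " * 20000).split()
chunks = chunk_tokens(tokens, overlap=192)
print([len(c) for c in chunks])  # [8192, 8192, 4000]
```

Each chunk is then embedded separately; overlapping windows reduce the chance that a relevant passage is cut in half at a boundary.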

How fast is Gemini Embedding 2 for generating embeddings via LLM.API?

Embedding requests typically return in tens of milliseconds to low hundreds of milliseconds per batch, depending on batch size and network latency.

How is pricing for Gemini Embedding 2 handled on LLM.API?

LLM.API charges per 1,000 input tokens for Gemini Embedding 2, with the exact rate shown in your LLM.API pricing and usage dashboard.

How does Gemini Embedding 2 compare to other embedding models on LLM.API?

Gemini Embedding 2 offers strong multilingual and code understanding, often outperforming many older open-source embedding models in retrieval and semantic similarity benchmarks.

What are the main limitations of Gemini Embedding 2?

Gemini Embedding 2 cannot generate text, has a fixed maximum context length, and may encode provider-specific biases present in its training data.

Can I use Gemini Embedding 2 for multilingual applications?

Yes, Gemini Embedding 2 supports many languages and produces a shared embedding space suitable for cross-lingual retrieval and semantic search.

Does Gemini Embedding 2 support batching through LLM.API?

Yes, you can send an array of input texts in a single embeddings request to Gemini Embedding 2 to reduce per-item latency and cost.
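
On the client side, batching usually means slicing the corpus into fixed-size groups and issuing one request per group. The sketch below shows that loop with a stub in place of the real API call; the batch size of 100 is a hypothetical value, since the actual per-request limit is not stated here.

```python
def batched(items, batch_size):
    """Yield consecutive slices of items with at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

def embed_all(texts, embed_batch, batch_size=100):
    """Call embed_batch once per batch and return vectors in input order."""
    vectors = []
    for batch in batched(texts, batch_size):
        vectors.extend(embed_batch(batch))
    return vectors

# Stub standing in for a real embeddings request; records request sizes.
calls = []

def fake_embed_batch(batch):
    calls.append(len(batch))
    return [[float(len(text))] for text in batch]

texts = [f"document {i}" for i in range(250)]
vectors = embed_all(texts, fake_embed_batch, batch_size=100)
print(len(vectors), calls)  # 250 [100, 100, 50]
```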
