
Gemini Embedding 2 Preview

  • Text Embeddings

Gemini Embedding 2 Preview is Google’s first natively multimodal embedding model, mapping text, images, video, audio, and documents into a shared vector space. It is offered in public preview via the Gemini API and Google Cloud/Vertex AI for advanced retrieval and analytics workloads.


What is Gemini Embedding 2 Preview?

Gemini Embedding 2 Preview is an embedding generation model from Google that produces unified vector representations for multiple modalities including text, images, video, audio, and documents. It is mainly used for multimodal retrieval, search, and recommendation systems that need to compare or rank heterogeneous content in a common embedding space. It is also used for tasks such as semantic similarity, clustering, classification, and analytics over large, mixed-media corpora. It belongs to the Gemini model family as the second-generation embedding model and the first natively multimodal variant built on the Gemini architecture.

5 Core Capabilities

  • Text Embedding

    Generates dense vector representations of text inputs for tasks like semantic search, classification, and retrieval-augmented generation.

  • Multilingual Support

    Produces embeddings for many languages, enabling cross-lingual semantic search and understanding across diverse international text corpora.

  • Document Retrieval

    Creates embeddings usable in vector databases to power fast, relevant document and passage retrieval for downstream applications.

  • Code Representation

    Embeds source code snippets, enabling semantic code search, code clustering, and mapping between natural language queries and code.

  • Content Clustering

    Supports grouping similar texts by embedding proximity, enabling topic clustering, recommendation, and deduplication in large datasets.
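
The retrieval and clustering capabilities above all come down to comparing vectors by proximity. A minimal sketch in Python, assuming the embeddings have already been computed and using cosine similarity (a common choice, not one this page prescribes). The toy 3-dimensional vectors and the `rank_by_similarity` helper are made up for illustration:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real embeddings (real ones have far more dimensions).
docs = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.1],
    "api-reference": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]
ranking = rank_by_similarity(query, docs)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

At scale, the exhaustive loop would typically be replaced by a vector database or an approximate nearest neighbor index, but the scoring idea stays the same.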

6 Most Valuable Use Cases

  • Semantic Text Search
  • Question Answer Retrieval
  • Product Recommendation Matching
  • Document Similarity Clustering
  • User Intent Tagging
  • Multilingual Text Embedding

Cost Comparison

Of the providers compared below, LLM.API lists the lowest input price and the lowest latency for embedding workloads.

Provider | Region | Latency | Throughput | Uptime | Input ($/1M tokens) | Output ($/1M tokens) | Context
LLM.API (best) | Global | 80ms | 120k tokens/s | 99.99% | $0.05 | $0.00 | ~1M tokens
Google | Global | ~150ms | ~80k tokens/s | 99.9% | $0.13 | $0.00 | ~1M tokens
OpenAI | Global | ~180ms | ~70k tokens/s | 99.9% | $0.10 | $0.00 | ~100K tokens
Azure OpenAI | US East | ~190ms | ~65k tokens/s | 99.9% | ~$0.11 | $0.00 | ~100K tokens
Anthropic | US West | ~200ms | ~60k tokens/s | 99.9% | ~$0.12 | $0.00 | ~200K tokens
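
To turn the per-million-token rates into a budget figure, multiply the token count by the rate and divide by one million. A small sketch using the input prices from the table above. The 250M-token monthly workload is a made-up figure, and the approximate (~) prices are treated as exact:

```python
from decimal import Decimal

# Input prices in USD per 1M tokens, copied from the comparison table above.
INPUT_PRICE_PER_MILLION = {
    "LLM.API": Decimal("0.05"),
    "Google": Decimal("0.13"),
    "OpenAI": Decimal("0.10"),
    "Azure OpenAI": Decimal("0.11"),  # listed as approximate
    "Anthropic": Decimal("0.12"),     # listed as approximate
}


def embedding_cost(tokens, price_per_million):
    """Cost in USD of embedding `tokens` input tokens at the given rate."""
    return Decimal(tokens) * price_per_million / Decimal(1_000_000)


monthly_tokens = 250_000_000  # hypothetical workload, not a figure from this page
costs = {
    name: embedding_cost(monthly_tokens, price)
    for name, price in INPUT_PRICE_PER_MILLION.items()
}
for name, cost in costs.items():
    print(f"{name}: ${cost:.2f}")
```

`Decimal` is used instead of floats so that currency arithmetic does not pick up binary rounding noise.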

Technical Specifications

Metric | Gemini Embedding 2 Preview | text-embedding-3-large (OpenAI) | text-embedding-004 (Google Vertex)
Dimensions | 3072 | 3072 | 768
Max Input Tokens | 8K | 8K | 8K
Price per 1M Tokens | $0.05 | $0.13 | $0.05
Avg Latency | ~120ms | ~150ms | ~130ms
Throughput | ~800 tps | ~700 tps | ~750 tps
Uptime | 99.9% | 99.9% | 99.9%
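
The dimension counts in the table translate directly into index size. A rough sketch, assuming vectors are stored as uncompressed 4-byte float32 values with no index overhead (an assumption; real vector databases vary) and a hypothetical corpus of one million vectors:

```python
BYTES_PER_FLOAT32 = 4  # assumption: uncompressed float32 storage


def raw_index_bytes(num_vectors, dimensions, bytes_per_value=BYTES_PER_FLOAT32):
    """Raw storage for the vectors alone, excluding metadata and index structures."""
    return num_vectors * dimensions * bytes_per_value


num_vectors = 1_000_000  # hypothetical corpus size
models = [
    ("Gemini Embedding 2 Preview", 3072),
    ("text-embedding-3-large", 3072),
    ("text-embedding-004", 768),
]
for model, dims in models:
    gib = raw_index_bytes(num_vectors, dims) / 2**30
    print(f"{model}: {gib:.2f} GiB")
```

Under these assumptions a 3072-dimension model needs four times the raw storage of a 768-dimension one, which is worth weighing against any quality difference.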

30-day usage via LLM.API

  • 12.4B embedding tokens processed
  • 7.8M API requests served
  • 145K active developer accounts
  • 99.9% average API uptime

Why Build on LLM.API?

One unified API. Every major model. Built-in reliability, cost control, and observability.

  • Unified AI Routing

    Dynamically route each request to the optimal model across providers using performance, price, or custom rules—without changing your integration or redeploying code.

    One endpoint, every model
  • Cost-Aware Orchestration

    Control spend with tiered routing, usage limits, and per-project policies so teams can experiment with premium models while keeping budgets predictable and enforceable.

    Max performance, controlled spend
  • Resilient Fallbacks

    Configure automatic provider and model fallbacks so production traffic keeps flowing through alternative backends when primary models rate limit, degrade, or go offline.

    Never drop a request
  • End-to-End Observability

    Inspect logs, traces, tokens, and latency per request across all providers in one place, enabling fast debugging, regression detection, and performance tuning.

    See every token, everywhere
  • Task-Level Abstractions

    Define high-level tasks like chat, embed, classify, or generate and let LLM.API pick the right model and parameters so your code stays clean and portable.

    Code to tasks, not models
  • High-Throughput Batch

    Run massive batch jobs for embeddings, generations, or classifications with automatic sharding, retries, and concurrency control, dramatically cutting run times and operational overhead.

    Millions of calls, one job
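
The fallback behaviour described above can be pictured as trying backends in priority order until one succeeds. The sketch below is a generic client-side illustration of that idea, not LLM.API's implementation. The `BackendError` type and the stand-in backend functions are hypothetical:

```python
class BackendError(Exception):
    """Hypothetical error raised when a backend is rate limited or unavailable."""


def embed_with_fallback(text, backends):
    """Try each (name, embed_fn) pair in order; return (name, vector) on first success."""
    errors = []
    for name, embed_fn in backends:
        try:
            return name, embed_fn(text)
        except BackendError as exc:
            errors.append(f"{name}: {exc}")
    raise BackendError("all backends failed: " + "; ".join(errors))


# Stand-in backends for demonstration only; neither calls a real service.
def primary(text):
    raise BackendError("rate limited")


def secondary(text):
    return [float(len(text)), 0.0]


used, vector = embed_with_fallback(
    "hello", [("primary", primary), ("secondary", secondary)]
)
print(used, vector)
```

One caveat for embeddings specifically: vectors from different models are not comparable, so a fallback to a different embedding model should not write into the same index.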

When to Use — When NOT to Use

Use it if...

  • You need general-purpose text embeddings for semantic search, clustering, or retrieval-augmented generation.
  • You need a preview of Gemini-family embeddings to prototype next-generation semantic search.
  • Your use case involves building vector search over short to medium-length English documents.
  • Your use case involves multi-model experimentation where you will later standardize on Google.
  • You need to benchmark Google’s latest embedding model against existing embeddings in your stack.
  • Your use case involves low-latency semantic similarity queries backed by Google Cloud infrastructure.

Avoid if...

  • You need a fully production-hardened, non-preview embedding model with strict stability guarantees.
  • You need embeddings that are backward-compatible with previous Google production embedding releases.
  • Your workload requires guaranteed long-term model availability without potential breaking preview changes.
  • You need domain-specific embeddings already fine-tuned for specialized fields like legal or medical.
  • Your workload requires an embedding model extensively documented and supported as generally available.
  • You need fully validated multilingual performance benchmarks beyond what preview documentation currently provides.

Frequently Asked Questions

  • What is Gemini Embedding 2 Preview?

    Gemini Embedding 2 Preview is a Google embedding model designed to generate vector representations of text for search, retrieval, recommendation, and clustering.

  • What modalities does Gemini Embedding 2 Preview support?

    The model itself is multimodal, but when accessed via LLM.API, Gemini Embedding 2 Preview currently accepts text input only.

  • How do I access Gemini Embedding 2 Preview through LLM.API?

    You call the unified embeddings endpoint on LLM.API and set the model parameter to "google/gemini-embedding-2-preview".

  • What is Gemini Embedding 2 Preview best suited for?

    It is best for semantic search, document retrieval, RAG knowledge bases, deduplication, and measuring similarity between user queries and content.

  • What is the context window of Gemini Embedding 2 Preview?

    Gemini Embedding 2 Preview accepts inputs of up to roughly 8K tokens (the Max Input Tokens figure under Technical Specifications); longer documents should be chunked client-side before embedding.

  • How fast is Gemini Embedding 2 Preview via LLM.API?

    Latency is generally low enough for real-time semantic search, with most requests completing in tens to hundreds of milliseconds depending on batch size.

  • How is Gemini Embedding 2 Preview priced on LLM.API?

    Pricing is usage-based per input token or character, with the exact rate displayed in the Gemini Embedding 2 Preview section of LLM.API pricing.

  • Can I batch multiple texts in a single Gemini Embedding 2 Preview request?

    Yes, you can send an array of input strings in one request to efficiently compute multiple embeddings.

  • How does Gemini Embedding 2 Preview compare to other embedding models on LLM.API?

    It offers strong semantic quality and compatibility with Google’s Gemini ecosystem, while other models may prioritize lower cost or smaller embedding dimensions.

  • Does Gemini Embedding 2 Preview support multilingual text?

    Gemini Embedding 2 Preview supports many languages, but quality and coverage can vary by language, so task-specific evaluation is recommended.

  • What are the main limitations of Gemini Embedding 2 Preview?

    It does not generate text, can struggle with very long or highly structured documents, and embedding quality may degrade outside supported languages or domains.

  • Can I use Gemini Embedding 2 Preview embeddings for production recommendation systems?

    Yes, embeddings are suitable for production retrieval and recommendation workloads when combined with a vector database or approximate nearest neighbor index.
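
Two of the answers above, chunking long documents client-side and sending several inputs per request, can be combined. A minimal sketch, assuming whitespace-separated words as a crude stand-in for tokens (a real tokenizer will count differently). The chunk size, overlap, and batch size are hypothetical values chosen for illustration:

```python
def chunk_words(text, max_words=200, overlap=20):
    """Split text into overlapping windows of at most max_words words."""
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the current window already reaches the end of the text
    return chunks


def batches(items, batch_size=16):
    """Yield consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# A synthetic 450-word document split into chunks, then grouped two per request.
document = " ".join(f"word{i}" for i in range(450))
chunks = chunk_words(document, max_words=200, overlap=20)
request_inputs = list(batches(chunks, batch_size=2))
print(len(chunks), [len(batch) for batch in request_inputs])
```

Each inner list in `request_inputs` would become the array of input strings for one embeddings request. The overlap keeps sentences that straddle a chunk boundary retrievable from either side.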

Start in 2 lines of code
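
A minimal sketch of a request in Python using only the standard library. The model identifier is the one given in the FAQ. The endpoint URL, the `LLM_API_KEY` environment variable, the bearer-token header, and the `input` / `data[].embedding` JSON shapes are assumptions modeled on common embeddings APIs, so check the LLM.API documentation for the actual values:

```python
import json
import os
import urllib.request

# Assumption: placeholder URL; replace with the endpoint from the LLM.API docs.
EMBEDDINGS_URL = "https://example.invalid/v1/embeddings"
MODEL = "google/gemini-embedding-2-preview"  # model identifier given in the FAQ


def build_request(texts, api_key, url=EMBEDDINGS_URL):
    """Build (but do not send) a POST request carrying a batch of input strings."""
    payload = {"model": MODEL, "input": list(texts)}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


def embed(texts, api_key=None):
    """Send the request and return one vector per input (assumed response shape)."""
    api_key = api_key or os.environ["LLM_API_KEY"]  # hypothetical variable name
    request = build_request(texts, api_key)
    with urllib.request.urlopen(request, timeout=30) as resp:
        body = json.load(resp)
    return [item["embedding"] for item in body["data"]]
```

With a real endpoint and key in place, `embed(["first text", "second text"])` would be expected to return two vectors, one per input string.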
