Powered by Intfloat

E5-Base-v2

  • Text Embeddings

E5-Base-v2 is an English sentence and paragraph embedding model from Intfloat that encodes text into a 768-dimensional dense vector space, optimized for high-quality semantic similarity tasks.


What is E5-Base-v2?

E5-Base-v2 is a transformer-based text embedding model that maps English sentences and paragraphs to 768-dimensional vectors for semantic representation. It is mainly used for semantic search, document and passage retrieval, similarity scoring, and clustering in information retrieval systems. It also serves as a backbone encoder for downstream tasks such as reranking, retrieval-augmented generation, and domain-specific search applications. E5-Base-v2 belongs to Intfloat’s E5 family of embedding models, which includes earlier E5 variants and larger v2 models like E5-Large-v2.

5 Core Capabilities

  • Text Embedding

    Encodes English sentences and paragraphs into 768-dimensional dense vectors for downstream machine learning and NLP applications.

  • Semantic Search

    Generates embeddings optimized for semantic search, enabling retrieval of relevant documents based on meaning rather than keywords.

  • Sentence Similarity

    Produces high-quality embeddings suitable for computing semantic similarity scores between sentences, queries, and documents.

  • Text Clustering

    Supports grouping related texts by embedding them in a shared vector space, facilitating unsupervised clustering and topic exploration.

  • Vector Retrieval

    Integrates into retrieval pipelines as a dense retriever model, powering vector databases and hybrid search systems.
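Semantic search, similarity scoring, and vector retrieval all come down to comparing embedding vectors. A minimal sketch of that comparison, assuming cosine similarity as the scoring function and using toy 3-dimensional vectors in place of real 768-dimensional embeddings (the function names and document IDs are illustrative, not part of any API):

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as having no similarity
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real embeddings (illustrative values only).
query = [1.0, 0.0, 0.0]
docs = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [0.5, 0.5, 0.0],
}
ranking = rank_by_similarity(query, docs)
print(ranking[0][0])  # doc_a
```

In a production system a vector database would typically replace the linear scan in `rank_by_similarity`, but the scoring idea is the same.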

6 Most Valuable Use Cases

  • Semantic Text Search
  • Document Retrieval for RAG
  • Text Similarity Scoring
  • Content Clustering
  • Topic Classification
  • Change / Trend Monitoring

Cost Comparison

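In the comparison below, LLM.API has the lowest listed latency and the lowest listed price for E5-Base-v2 itself; the other providers' own embedding models are listed at lower per-token prices.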

Provider | Region | Latency | Throughput | Uptime | Input ($/1M tokens) | Output ($/1M tokens) | Context
LLM.API | Global | 120ms | 6000 tps | 99.99% | $0.05 | $0.00 | 4096 tokens
Intfloat | Global | ~220ms | ~2500 tps | ~99.9% | ~$0.10 | ~$0.00 | 4096 tokens
OpenAI (text-embedding-3-small) | Global | ~250ms | ~3000 tps | 99.9% | $0.02 | $0.00 | 8192 tokens
Azure OpenAI (embedding) | US East | ~260ms | ~2800 tps | 99.9% | ~$0.025 | $0.00 | 8192 tokens
AWS Bedrock (Cohere Embed) | US West | ~300ms | ~2200 tps | 99.9% | ~$0.03 | $0.00 | ~4096 tokens
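
To turn the per-million-token prices above into a budget figure, multiply the token count by the listed rate. A small sketch, assuming the input prices from the table (several are marked approximate there) and a hypothetical corpus of 250 million tokens:

```python
# Input prices in USD per 1M tokens, copied from the comparison table.
# Values marked "~" in the table are approximate.
PRICE_PER_MILLION_USD = {
    "LLM.API": 0.05,
    "Intfloat": 0.10,
    "OpenAI (text-embedding-3-small)": 0.02,
}


def embedding_cost_usd(num_tokens, price_per_million_usd):
    """Cost of embedding `num_tokens` input tokens at a per-1M-token price."""
    if num_tokens < 0:
        raise ValueError("num_tokens must be non-negative")
    return num_tokens / 1_000_000 * price_per_million_usd


corpus_tokens = 250_000_000  # hypothetical corpus size, for illustration
for provider, price in PRICE_PER_MILLION_USD.items():
    cost = embedding_cost_usd(corpus_tokens, price)
    print(f"{provider}: ${cost:.2f}")
```

Output tokens are listed at $0.00 for every provider, so only input tokens enter the estimate.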

Technical Specifications

Metric | E5-Base-v2 (Intfloat) | text-embedding-3-small (OpenAI) | all-MiniLM-L6-v2 (SentenceTransformers)
Dimensions | 768 | 1536 | 384
Max Input Tokens | ~512 | 8K | ~256
Price per 1M Tokens | ~$0.10 | $0.02 | ~$0.05
Throughput | ~1.5K tps | ~5K tps | ~2K tps
Avg Latency | ~80ms | ~50ms | ~60ms
Uptime | ~99.5% | ~99.9% | ~99.0%
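
The Dimensions row also determines how much storage a vector index needs. A rough sketch of that arithmetic, assuming vectors are stored as uncompressed 32-bit floats and ignoring any index overhead (both are assumptions; real vector databases may quantize or add metadata):

```python
BYTES_PER_FLOAT32 = 4  # assumption: uncompressed float32 storage

# Dimensions copied from the specifications table.
DIMENSIONS = {
    "E5-Base-v2": 768,
    "text-embedding-3-small": 1536,
    "all-MiniLM-L6-v2": 384,
}


def index_size_bytes(num_vectors, dimensions, bytes_per_value=BYTES_PER_FLOAT32):
    """Raw storage for `num_vectors` dense vectors, excluding index overhead."""
    return num_vectors * dimensions * bytes_per_value


num_vectors = 1_000_000  # hypothetical collection size
for model, dims in DIMENSIONS.items():
    gib = index_size_bytes(num_vectors, dims) / 2**30
    print(f"{model}: {gib:.2f} GiB for {num_vectors:,} vectors")
```

Under these assumptions a single E5-Base-v2 vector takes 768 × 4 = 3,072 bytes, half the raw size of a 1536-dimensional vector and twice that of a 384-dimensional one.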

30-day usage via LLM.API

  • 6.8B prompt tokens processed
  • 12.5M API requests served
  • 310K unique developers & teams
  • 99.8% average API uptime

Why Build on LLM.API?

One unified API. Every major model. Built-in reliability, cost control, and observability.

  • Unified AI Routing

    Dynamically route each request to the best model across providers based on latency, cost, or quality. One integration, continuously optimized decisions.

    One endpoint, all models.
  • Cost-Aware Orchestration

    Control and minimize spend with per-model pricing visibility, routing policies, and automatic fallbacks to cheaper equivalents without code changes.

    Reduce AI cost at scale.
  • Resilient Fallback Logic

    Keep your app online with built-in retries and cross-provider failover when a model or region degrades, with no custom reliability code required.

    Never ship single-vendor risk.
  • End-to-End Observability

    Trace every call with latency, errors, tokens, and provider breakdowns in one place. Debug, optimize, and compare models with production-grade telemetry.

    See every token, everywhere.
  • Task-Level Abstractions

    Describe what you need (chat, generation, extraction, tools) and let LLM.API select and tune the right models and prompts for each task.

    Think tasks, not providers.
  • High-Throughput Batch

    Submit thousands of requests in a single batch API call with smart chunking, parallelism, and retries to saturate provider capacity safely.

    Batch at production scale.

When to Use — When NOT to Use

Use it if...

  • You need a strong general-purpose text embedding model for semantic search applications.
  • You need English sentence and paragraph embeddings from a single compact model.
  • Your use case involves building retrieval-augmented generation systems needing high-quality dense retrieval.
  • You need cost-efficient embeddings for large-scale document indexing and similarity search.
  • Your use case involves clustering or topic modeling on short to medium-length text passages.
  • You need to power recommendation or matching systems based on semantic text similarity.
  • Your use case involves reranking candidate documents using cosine similarity between embeddings.

Avoid if...

  • You need a generative language model for text completion, dialogue, or content creation.
  • Your workload requires reasoning over very long documents beyond typical embedding input limits.
  • You need domain-specialized embeddings, like code or biomedical text, with state-of-the-art performance.
  • Your workload requires real-time token-by-token streaming or interactive conversational behavior.
  • You need structured outputs such as JSON, SQL queries, or function-call arguments directly.
  • Your workload requires fine-grained token-level tasks like sequence labeling or tagging.
  • You need cross-modal embeddings aligning text with images, audio, or video representations.
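
For documents that exceed the model's input limit (listed as roughly 512 tokens under Technical Specifications), a common workaround is to split the text into overlapping chunks and embed each chunk separately. A minimal sketch, assuming whitespace-separated words as a rough stand-in for tokens; the model's real tokenizer usually produces more tokens than words, so the default budget of 300 words is a conservative, hypothetical choice:

```python
def chunk_text(text, max_words=300, overlap=50):
    """Split text into overlapping word windows.

    Words are only an approximation of model tokens; pick `max_words`
    well below the model's token limit to leave headroom.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in [0, max_words)")

    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
print(chunk_text(sample, max_words=4, overlap=1))
```

The overlap keeps sentences that straddle a chunk boundary represented in both neighbouring chunks, at the cost of embedding some words twice.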

Frequently Asked Questions

  • What is E5-Base-v2?

    E5-Base-v2 is an Intfloat text-embedding model designed for high-quality semantic search, retrieval, and clustering tasks.

  • What tasks is E5-Base-v2 best suited for?

    E5-Base-v2 is best for dense retrieval, semantic similarity, reranking, and building vector search over documents, queries, and short passages.

  • How is E5-Base-v2 priced when used through LLM.API?

    E5-Base-v2 usage on LLM.API is billed per input token, following LLM.API’s unified metered pricing for embedding models; the cost comparison lists $0.05 per 1M input tokens and no output charge.

  • What is the context window of E5-Base-v2?

    E5-Base-v2 accepts inputs of roughly 512 tokens, as listed under Technical Specifications; longer texts should be chunked before embedding.

  • How fast is E5-Base-v2 in terms of latency?

    E5-Base-v2 generally provides low-latency embedding generation suitable for real-time or near-real-time search applications, depending on request size and concurrency.

  • What modalities does E5-Base-v2 support?

    E5-Base-v2 is a text-only model that converts text inputs into dense vector embeddings.

  • How do I call E5-Base-v2 via the LLM.API gateway?

    You can select the E5-Base-v2 model name in LLM.API’s embeddings endpoint, passing your text inputs and receiving embedding vectors in the response.

  • How does E5-Base-v2 compare to larger embedding models?

    E5-Base-v2 generally offers a strong quality–performance tradeoff, with smaller size and lower cost than many larger embedding models while maintaining competitive retrieval quality.

  • What are the main limitations of E5-Base-v2?

    E5-Base-v2 may struggle with very long documents, highly specialized domains, and tasks requiring generative output or multimodal understanding.

  • Can I use E5-Base-v2 for multilingual embeddings?

    E5-Base-v2 is primarily optimized for English, so performance on other languages may be less reliable compared with dedicated multilingual embedding models.

Start in 2 lines of code
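
The page does not show the request itself, so the following is only a sketch of what a call might look like, assuming an OpenAI-style embeddings endpoint. The URL, the model identifier, and the request and response field names are hypothetical placeholders; check the LLM.API documentation for the real values. The "query: " and "passage: " prefixes follow the convention recommended for E5 models, and whether the gateway adds them automatically is not stated here.

```python
import json
import urllib.request

API_URL = "https://api.example.com/v1/embeddings"  # hypothetical placeholder URL
MODEL_ID = "e5-base-v2"  # hypothetical model identifier


def build_embedding_request(texts, api_key, kind="passage"):
    """Build (but do not send) a POST request for a batch of texts.

    `kind` selects the E5 input prefix: "query" for search queries,
    "passage" for documents being indexed.
    """
    if kind not in ("query", "passage"):
        raise ValueError("kind must be 'query' or 'passage'")
    payload = {
        "model": MODEL_ID,
        "input": [f"{kind}: {text}" for text in texts],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def embed(texts, api_key, kind="passage"):
    """Send the request and return one embedding vector per input text."""
    request = build_embedding_request(texts, api_key, kind)
    with urllib.request.urlopen(request, timeout=30) as response:
        body = json.load(response)
    # Assumes an OpenAI-style response: {"data": [{"embedding": [...]}, ...]}
    return [item["embedding"] for item in body["data"]]
```

With a valid key and the real endpoint filled in, usage would reduce to something like `vectors = embed(["how do I reset my password?"], api_key, kind="query")`.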
