E5-Base-v2

Text Embeddings

E5-Base-v2 is an English sentence and paragraph embedding model from Intfloat that encodes text into a 768-dimensional dense vector space, optimized for high-quality semantic similarity tasks.

Start Using API

API Performance

Latency: ~0.35s avg embedding time per 1K tokens on A100
Context: 512 max tokens per input text
Input: Free per 1M tokens (self-hosted, open-source)
Output: Free per 1M tokens (embeddings only)
Uptime: 99%

About the model

What is E5-Base-v2?

E5-Base-v2 is a transformer-based text embedding model that maps English sentences and paragraphs to 768-dimensional vectors for semantic representation. It is mainly used for semantic search, document and passage retrieval, similarity scoring, and clustering in information retrieval systems. It also serves as a backbone encoder for downstream tasks such as reranking, retrieval-augmented generation, and domain-specific search applications. E5-Base-v2 belongs to Intfloat’s E5 family of embedding models, which includes earlier E5 variants and larger v2 models like E5-Large-v2.

Input / Output

Input

Text inputs (English sentences or paragraphs, typically prefixed with 'query:' or 'passage:')
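The prefix convention above can be sketched as a small helper. This is a minimal illustration, not part of any official client: the function name `add_e5_prefix` and the `Role` alias are hypothetical, and the helper only prepares strings before they are sent to whichever embedding client you use.

```python
from typing import Literal

Role = Literal["query", "passage"]


def add_e5_prefix(text: str, role: Role) -> str:
    """Return the text with a 'query: ' or 'passage: ' prefix, without doubling it."""
    if role not in ("query", "passage"):
        raise ValueError(f"unknown role: {role!r}")
    prefix = f"{role}: "
    stripped = text.strip()
    # Leave the text alone if the caller already added the prefix.
    if stripped.startswith(prefix):
        return stripped
    return prefix + stripped


# Search queries get 'query: ', indexed documents get 'passage: '.
queries = [add_e5_prefix("how do dense retrievers work", "query")]
passages = [add_e5_prefix("Dense retrievers embed queries and documents.", "passage")]
print(queries[0])
print(passages[0])
```

Applying the prefix in one place keeps indexing and query code consistent, which matters because the same text embedded with different prefixes produces different vectors.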

Output

Text embeddings (768-dimensional numeric vectors)

Model capabilities

5 Core Capabilities

Text Embedding

Encodes English sentences and paragraphs into 768-dimensional dense vectors for downstream machine learning and NLP applications.

Semantic Search

Generates embeddings optimized for semantic search, enabling retrieval of relevant documents based on meaning rather than keywords.

Sentence Similarity

Produces high-quality embeddings suitable for computing semantic similarity scores between sentences, queries, and documents.

Text Clustering

Supports grouping related texts by embedding them in a shared vector space, facilitating unsupervised clustering and topic exploration.

Vector Retrieval

Integrates into retrieval pipelines as a dense retriever model, powering vector databases and hybrid search systems.
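To make the similarity and retrieval capabilities above concrete, here is a minimal sketch of cosine-similarity ranking over embedding vectors. It uses toy 3-dimensional vectors in place of real 768-dimensional embeddings, and the function names `cosine_similarity` and `top_k` are illustrative rather than part of any library. A production system would normally delegate this to a vector database or an optimized numeric library.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float],
          passage_vecs: Sequence[Sequence[float]],
          k: int = 3) -> List[Tuple[int, float]]:
    """Return (passage index, score) pairs for the k most similar passages."""
    scored = [(i, cosine_similarity(query_vec, vec)) for i, vec in enumerate(passage_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for real embeddings.
query = [1.0, 0.0, 0.0]
passages = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
ranking = top_k(query, passages, k=2)
print(ranking)
```

The same scoring function covers similarity scoring, reranking a candidate list, and nearest-neighbor lookup; only the set of vectors being compared changes.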

Use cases

6 Most Valuable Use Cases

Semantic Text Search
Document Retrieval for RAG
Text Similarity Scoring
Content Clustering
Topic Classification
Change / Trend Monitoring

Transparent pricing

Cost Comparison

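Among the providers compared below, LLM API lists the lowest latency and the highest throughput for E5-Base-v2-class models; OpenAI's text-embedding-3-small lists a lower input price.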

Provider	Region	Latency	Throughput	Uptime	Input ($/1M)	Output ($/1M)	Context
LLM API	Global	120ms	6000 tps	99.99%	$0.05	$0.00	512 tokens
Intfloat	Global	~220ms	~2500 tps	~99.9%	~$0.10	~$0.00	512 tokens
OpenAI (text-embedding-3-small)	Global	~250ms	~3000 tps	99.9%	$0.02	$0.00	8192 tokens
Azure OpenAI (embedding)	US East	~260ms	~2800 tps	99.9%	~$0.025	$0.00	8192 tokens
AWS Bedrock (Cohere Embed)	US West	~300ms	~2200 tps	99.9%	~$0.03	$0.00	~4096 tokens
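The per-token prices in the table translate into a workload cost with simple arithmetic: tokens divided by one million, times the listed price. The sketch below uses two of the listed input prices; the corpus size (2 million passages of about 250 tokens each) is a hypothetical workload chosen only for illustration.

```python
def embedding_cost_usd(total_tokens: int, price_per_million_usd: float) -> float:
    """Cost of embedding total_tokens at a given price per 1M input tokens."""
    return total_tokens / 1_000_000 * price_per_million_usd


# Input prices as listed in the table above ($ per 1M tokens).
LISTED_INPUT_PRICES = {
    "LLM API": 0.05,
    "OpenAI (text-embedding-3-small)": 0.02,
}

# Hypothetical workload: 2M passages of roughly 250 tokens each.
corpus_tokens = 2_000_000 * 250

for provider, price in LISTED_INPUT_PRICES.items():
    cost = embedding_cost_usd(corpus_tokens, price)
    print(f"{provider}: ${cost:.2f} to embed {corpus_tokens:,} tokens")
```

Output pricing is listed as $0.00 for every provider, so input tokens are the only term in the estimate.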

Performance benchmarks

Technical Specifications

Metric	E5-Base-v2 (Intfloat)	text-embedding-3-small (OpenAI)	all-MiniLM-L6-v2 (SentenceTransformers)
Dimensions	768	1536	384
Max Input Tokens	~512	8K	~256
Price per 1M Tokens	~$0.10	$0.02	~$0.05
Throughput	~1.5K tps	~5K tps	~2K tps
Avg Latency	~80ms	~50ms	~60ms
Uptime	~99.5%	~99.9%	~99.0%
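The Dimensions row has a direct effect on index size. As a rough sketch, assuming vectors are stored as uncompressed 32-bit floats with no index overhead (an assumption; real vector stores add metadata and may quantize), storage grows linearly with dimension count:

```python
BYTES_PER_FLOAT32 = 4


def index_size_gib(num_vectors: int, dimensions: int) -> float:
    """Raw storage in GiB for float32 vectors, ignoring index overhead."""
    return num_vectors * dimensions * BYTES_PER_FLOAT32 / 2**30


# Dimension counts from the table above.
DIMENSIONS = {
    "E5-Base-v2": 768,
    "text-embedding-3-small": 1536,
    "all-MiniLM-L6-v2": 384,
}

for model, dims in DIMENSIONS.items():
    size = index_size_gib(1_000_000, dims)
    print(f"{model}: {size:.2f} GiB per 1M vectors")
```

Under these assumptions, one million E5-Base-v2 vectors take about 2.86 GiB, half the footprint of a 1536-dimensional model and twice that of a 384-dimensional one.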

30-day usage via LLM API

6.8B: Prompt tokens processed (30 days)
12.5M: API requests served (30 days)
310K: Unique developers & teams (30 days)
99.8%: Avg API uptime (30 days)

Architecture & Integration

Why Build on LLM.API?

One unified API. Every major model. Built-in reliability, cost control, and observability.

Unified AI Routing

Dynamically route each request to the best model across providers based on latency, cost, or quality. One integration, continuously optimized decisions.
One endpoint, all models.

Cost-Aware Orchestration

Control and minimize spend with per-model pricing visibility, routing policies, and automatic fallbacks to cheaper equivalents without code changes.
Reduce AI cost at scale.

Resilient Fallback Logic

Keep your app online with built-in retries and cross-provider failover when a model or region degrades, no custom reliability code required.
Never ship single-vendor risk.

End-to-End Observability

Trace every call with latency, errors, tokens, and provider breakdowns in one place. Debug, optimize, and compare models with production-grade telemetry.
See every token, everywhere.

Task-Level Abstractions

Describe what you need (chat, generation, extraction, tools) and let LLM.API select and tune the right models and prompts for each task.
Think tasks, not providers.

High-Throughput Batch

Submit thousands of requests in a single batch API call with smart chunking, parallelism, and retries to saturate provider capacity safely.
Batch at production scale.
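For readers unfamiliar with the retry-and-failover pattern described above, the following is a generic client-side sketch of the idea. It does not describe how LLM.API implements failover internally; the names `embed_with_failover`, `flaky_provider`, and `stub_provider` are hypothetical, and the providers are stand-in functions rather than real network clients.

```python
import time
from typing import Callable, List, Optional, Sequence

# An embedder takes a list of texts and returns one vector per text.
Embedder = Callable[[Sequence[str]], List[List[float]]]


def embed_with_failover(texts: Sequence[str],
                        providers: Sequence[Embedder],
                        retries_per_provider: int = 2,
                        backoff_seconds: float = 0.0) -> List[List[float]]:
    """Try each provider in order, retrying with exponential backoff before moving on."""
    last_error: Optional[Exception] = None
    for provider in providers:
        for attempt in range(retries_per_provider):
            try:
                return provider(texts)
            except Exception as exc:  # a real client would catch narrower error types
                last_error = exc
                time.sleep(backoff_seconds * (2 ** attempt))
    raise RuntimeError("all providers failed") from last_error


def flaky_provider(texts: Sequence[str]) -> List[List[float]]:
    """Stand-in for a provider that is currently degraded."""
    raise ConnectionError("simulated outage")


def stub_provider(texts: Sequence[str]) -> List[List[float]]:
    """Stand-in for a healthy provider; returns zero vectors of the right shape."""
    return [[0.0] * 768 for _ in texts]


vectors = embed_with_failover(["query: hello"], [flaky_provider, stub_provider])
print(len(vectors), len(vectors[0]))
```

One caveat with embeddings specifically: vectors from different models are not comparable, so a fallback only makes sense between providers serving the same model.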

Decision guide

When to Use — When NOT to Use

Use it if...

You need a strong general-purpose text embedding model for semantic search applications.
You need English sentence embeddings from a single compact model.
Your use case involves building retrieval-augmented generation systems needing high-quality dense retrieval.
You need cost-efficient embeddings for large-scale document indexing and similarity search.
Your use case involves clustering or topic modeling on short to medium-length text passages.
You need to power recommendation or matching systems based on semantic text similarity.
Your use case involves reranking candidate documents using cosine similarity between embeddings.

Avoid if...

You need a generative language model for text completion, dialogue, or content creation.
Your workload requires reasoning over very long documents beyond typical embedding input limits.
You need domain-specialized embeddings, like code or biomedical text, with state-of-the-art performance.
Your workload requires real-time token-by-token streaming or interactive conversational behavior.
You need structured outputs such as JSON, SQL queries, or function-call arguments directly.
Your workload requires fine-grained token-level tasks like sequence labeling or tagging.
You need cross-modal embeddings aligning text with images, audio, or video representations.

FAQ

Frequently Asked Questions

What is E5-Base-v2?

E5-Base-v2 is an Intfloat text-embedding model designed for high-quality semantic search, retrieval, and clustering tasks.

What tasks is E5-Base-v2 best suited for?

E5-Base-v2 is best for dense retrieval, semantic similarity, reranking, and building vector search over documents, queries, and short passages.

How is E5-Base-v2 priced when used through LLM.API?

E5-Base-v2 usage on LLM.API is billed per input token or character, following LLM.API’s unified metered pricing for embedding models.

What is the context window of E5-Base-v2?

E5-Base-v2 accepts up to 512 tokens per input text; longer inputs should be split into chunks before embedding.

How fast is E5-Base-v2 in terms of latency?

E5-Base-v2 generally provides low-latency embedding generation suitable for real-time or near-real-time search applications, depending on request size and concurrency.

What modalities does E5-Base-v2 support?

E5-Base-v2 is a text-only model that converts text inputs into dense vector embeddings.

How do I call E5-Base-v2 via the LLM.API gateway?

You can select the E5-Base-v2 model name in LLM.API’s embeddings endpoint, passing your text inputs and receiving embedding vectors in the response.

How does E5-Base-v2 compare to larger embedding models?

E5-Base-v2 generally offers a strong quality–performance tradeoff, with smaller size and lower cost than many larger embedding models while maintaining competitive retrieval quality.

What are the main limitations of E5-Base-v2?

E5-Base-v2 may struggle with very long documents, highly specialized domains, and tasks requiring generative output or multimodal understanding.

Can I use E5-Base-v2 for multilingual embeddings?

E5-Base-v2 is primarily optimized for English, so performance on other languages may be less reliable compared with dedicated multilingual embedding models.
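Because inputs longer than the 512-token context must be chunked before embedding, here is a minimal sketch of overlapping chunking. It splits on whitespace and counts words, which is only a rough proxy for the model's subword tokens; the 300-word default is a conservative guess, not a documented value, and the function name `chunk_words` is illustrative. For exact limits, count tokens with the model's own tokenizer.

```python
from typing import List

MAX_TOKENS = 512  # per-input limit stated on this page


def chunk_words(text: str, max_words: int = 300, overlap: int = 30) -> List[str]:
    """Split text into word windows of max_words, each sharing `overlap` words with the previous one."""
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    chunks: List[str] = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once a window has reached the end of the text.
        if start + max_words >= len(words):
            break
    return chunks


document = " ".join(f"w{i}" for i in range(700))
chunks = chunk_words(document)
print(len(chunks), [len(c.split()) for c in chunks])
```

The overlap keeps sentences that straddle a boundary represented in both neighboring chunks. Remember that a 'passage: ' prefix added to each chunk also consumes part of the token budget.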

Start in 2 lines of code
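As a rough sketch of what an embeddings request might look like, the example below builds (but does not send) an HTTP POST using only the Python standard library. Everything specific here is an assumption: the URL `https://api.example.com/v1/embeddings` is a placeholder, the model identifier `e5-base-v2` is a guess, and the JSON body follows a common `model` plus `input` convention that this page does not confirm. Check the provider's API reference for the real endpoint, model name, and schema.

```python
import json
import urllib.request
from typing import List

# Hypothetical values; replace with those from the provider's documentation.
API_URL = "https://api.example.com/v1/embeddings"
MODEL_NAME = "e5-base-v2"


def build_embedding_request(texts: List[str], api_key: str) -> urllib.request.Request:
    """Build a POST request carrying the texts to embed (assumed JSON schema)."""
    body = json.dumps({"model": MODEL_NAME, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_embedding_request(["query: what is dense retrieval?"], api_key="YOUR_API_KEY")
print(request.get_method(), request.full_url)

# Sending is left commented out because the endpoint above is a placeholder:
# with urllib.request.urlopen(request) as response:
#     payload = json.load(response)
```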
