E5-Large-v2

Text Embeddings

E5-Large-v2 by Intfloat is a 335M-parameter English text-embedding transformer that maps text into 1024-dimensional vectors for high-accuracy semantic search and similarity tasks.

Start Using API

API Performance

Latency: ~0.15s avg encoding time per 1K tokens on GPU
Context: 512 max input tokens
Input: Free per 1M tokens (open-source embedding model)
Output: $0.00 per 1M tokens (no generative output)
Uptime: 99%

About the model

What is E5-Large-v2?

E5-Large-v2 is a large English text embedding model trained with weakly supervised contrastive pre-training to produce 1024-dimensional sentence and document embeddings. It is mainly used for semantic search and retrieval, where queries and passages are embedded and compared to find relevant results. It is also widely applied to clustering, reranking, and classification, all of which rely on dense semantic representations. E5-Large-v2 is the largest model in the E5 v2 family of text embedding models, alongside the smaller e5-base-v2 and e5-small-v2 variants.

Input / Output

Input

English text sequences of up to 512 tokens, each prefixed with its role (for example 'query: ...' or 'passage: ...')

Output

1024-dimensional text embeddings (dense numeric vectors)
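A minimal sketch of the input convention described above, assuming each text is tagged with a 'query: ' or 'passage: ' prefix before it is embedded. The helper name `with_prefix` is hypothetical and is not part of any library.

```python
from typing import List

QUERY_PREFIX = "query: "
PASSAGE_PREFIX = "passage: "


def with_prefix(texts: List[str], prefix: str) -> List[str]:
    """Prepend the role prefix unless the text already carries it."""
    return [t if t.startswith(prefix) else prefix + t.strip() for t in texts]


queries = with_prefix(["how much protein should a female eat"], QUERY_PREFIX)
passages = with_prefix(
    ["  Protein needs vary with age and activity level.", "passage: Already prefixed."],
    PASSAGE_PREFIX,
)
print(queries)
print(passages)
```

Applying the prefix in one place keeps queries and passages consistent across indexing and search, which matters because both sides must be encoded the same way they were at training time.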

Model capabilities

5 Core Capabilities

Text Embeddings

Generates 1024-dimensional dense vector embeddings for English text, suitable for downstream machine learning and representation learning applications.

Semantic Search

Supports high-quality semantic search by encoding queries and documents for vector similarity retrieval across large text corpora.

Sentence Similarity

Computes meaningful similarity between sentences or passages by comparing their embeddings, enabling clustering and paraphrase detection.

Information Retrieval

Optimized for passage retrieval tasks, including ad-hoc document ranking and open-domain question answering pipelines using dense vectors.

Benchmark Evaluation

Provides strong performance on benchmarks like BEIR and MTEB for diverse retrieval, classification, and semantic similarity tasks.
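The search and similarity capabilities above all reduce to comparing embedding vectors. The sketch below ranks passages by cosine similarity to a query; the 3-dimensional vectors are toy stand-ins for real 1024-dimensional embeddings, and the function names are illustrative only.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank(query_vec: Sequence[float],
         passage_vecs: Sequence[Sequence[float]]) -> List[Tuple[int, float]]:
    """Return (passage index, score) pairs, most similar first."""
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(passage_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors, not real model output.
query = [1.0, 0.0, 0.0]
passages = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [0.5, 0.5, 0.0]]
ranking = rank(query, passages)
print([index for index, _ in ranking])
```

At production scale the same comparison is usually delegated to a vector database or an approximate nearest-neighbor index rather than a linear scan.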

Use cases

6 Most Valuable Use Cases

Semantic Text Search
Question Answer Retrieval
Duplicate Issue Detection
Product Recommendation Ranking
Document Clustering
English Text Embedding

Transparent pricing

Cost Comparison

Among the providers compared below, LLM API lists the lowest per-token embedding price and the lowest latency for E5-Large-v2-class models.

Provider	Region	Latency	Throughput	Uptime	Input ($/1M tokens)	Output ($/1M tokens)	Context (tokens)
LLM API (best)	Global	~120ms	~8K tps	99.99%	$0.02	$0.02	64K
Intfloat	Global	~220ms	~3K tps	99.9%	~$0.10	~$0.10	32K
OpenAI (text-embedding-3-large)	Global	~250ms	~4K tps	99.9%	$0.13	$0.13	100K
Cohere (embed-multilingual-light-v3)	Global	~260ms	~2.5K tps	99.9%	~$0.20	~$0.20	4K
Azure OpenAI (embedding equivalent)	US East	~240ms	~3.5K tps	99.9%	~$0.16	~$0.16	16K
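To turn the per-1M-token rates in the table into a monthly figure, multiply the token count by the rate and divide by one million. The sketch below does this with `Decimal` to avoid floating-point rounding; the 50M-token workload is a hypothetical example, and the two rates are taken from the table above.

```python
from decimal import Decimal


def embedding_cost(tokens: int, usd_per_million: str) -> Decimal:
    """Cost in USD for `tokens` input tokens at a per-1M-token rate."""
    return Decimal(tokens) * Decimal(usd_per_million) / Decimal(1_000_000)


monthly_tokens = 50_000_000  # hypothetical workload, not a measured figure
rates = [("LLM API", "0.02"), ("OpenAI text-embedding-3-large", "0.13")]
for provider, rate in rates:
    print(provider, embedding_cost(monthly_tokens, rate))
```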

Performance benchmarks

Technical Specifications

Metric	E5-Large-v2 (Intfloat)	text-embedding-3-large (OpenAI)	bge-large-en-v1.5 (BAAI)
Dimensions	1024	3072	1024
Max Input Tokens	512	8K	512
Price per 1M Tokens	$0.10	$0.13	$0.05
Throughput	~2K tps	~4K tps	~2.5K tps
Avg Latency	~120ms	~100ms	~130ms
Uptime	99.5%	99.9%	99.0%

30-day usage via LLM API

620M: Embedding tokens processed (30 days)
3.1M: API requests served (30 days)
41K: Unique developer accounts (30 days)
99.95%: Avg API uptime (30 days)

Architecture & Integration

Why Build on LLM.API?

One unified API. Every major model. Built-in reliability, cost control, and observability.

Unified AI Routing

Automatically route each request to the optimal model across providers based on latency, cost, and performance, without changing your integration or redeploying.
One endpoint, every model

Smarter Cost Control

Mix premium and budget models with dynamic routing, hard spend guards, and usage insights so you can scale AI without unpredictable cloud bills.
Optimize quality per dollar

Resilient Fallback Logic

Define provider-agnostic failover rules so that if a model or region degrades, traffic is transparently retried on backups, with no downtime and no manual switches.
Stay online, automatically

Full-Stack Observability

Get end-to-end traces, latency and error metrics, cost breakdowns, and structured logs for every call so you can debug and tune AI traffic in minutes.
See every token and trace

Task-Native Orchestration

Describe tasks at a higher level (chat, tools, evals, workflows) and let LLM.API select models, parameters, and prompts consistently across providers.
Tasks, not model glue

High-Throughput Batch

Submit large batches of prompts to run asynchronously across multiple models and regions with built-in retries, partial failure handling, and cost reporting.
Millions of calls, one job
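The fallback behavior described under Resilient Fallback Logic can be illustrated with a small client-side sketch. This is not how LLM.API implements failover internally; it only shows the general pattern of trying backends in order, and every name in it (`embed_with_fallback`, the two backends) is hypothetical.

```python
from typing import Callable, List, Sequence

EmbedFn = Callable[[Sequence[str]], List[List[float]]]


def embed_with_fallback(texts: Sequence[str],
                        backends: Sequence[EmbedFn]) -> List[List[float]]:
    """Try each backend in order and return the first successful result."""
    errors = []
    for backend in backends:
        try:
            return backend(texts)
        except Exception as exc:  # a real client would catch narrower errors
            errors.append(exc)
    raise RuntimeError(f"all {len(backends)} backends failed: {errors}")


# Hypothetical backends: the first simulates an outage, the second succeeds.
def degraded_backend(texts: Sequence[str]) -> List[List[float]]:
    raise TimeoutError("primary region unavailable")


def backup_backend(texts: Sequence[str]) -> List[List[float]]:
    return [[0.0] * 4 for _ in texts]  # placeholder vectors, not real embeddings


vectors = embed_with_fallback(["passage: hello"], [degraded_backend, backup_backend])
print(len(vectors), len(vectors[0]))
```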

Decision guide

When to Use — When NOT to Use

Use it if...

You need strong general-purpose text embeddings for semantic search across diverse domains.
Your text is primarily English; for coverage of many languages, a dedicated multilingual model such as multilingual-e5-large is a better fit.
Your use case involves dense retrieval or reranking for question-answering over documents.
Your use case involves clustering or deduplicating large text corpora via vector similarity.
You need an open-source embedding model compatible with common vector databases and libraries.
Your use case involves building recommendation systems based on semantic similarity between texts.

Avoid if...

You need an autoregressive language model for text generation, editing, or conversation.
Your workload requires processing images, audio, or multimodal inputs beyond plain text.
You need token-level tasks like sequence tagging, NER, or structured information extraction.
Your workload requires extremely long-context understanding far beyond typical sentence or paragraph length.
You need state-of-the-art performance on domain-specific tasks better served by specialized models.
Your workload requires on-device inference with very tight memory or latency constraints.

FAQ

Frequently Asked Questions

What is E5-Large-v2?

E5-Large-v2 is a text embedding model by Intfloat optimized for high-quality semantic search, retrieval, and clustering tasks.

What is E5-Large-v2 best suited for?

E5-Large-v2 is best for generating dense vector representations for semantic search, question answering, duplicate detection, and recommendation systems.

What is the context window or maximum input length for E5-Large-v2?

E5-Large-v2 supports input sequences of up to 512 tokens; longer text is truncated before embedding.

What modalities does E5-Large-v2 support?

E5-Large-v2 is a text-only model that accepts natural language or short text strings and returns numeric embedding vectors.

How is E5-Large-v2 priced when accessed through LLM.API?

LLM.API exposes E5-Large-v2 with token-based pricing, where you pay per input token embedded; check the LLM.API pricing page for exact rates.

What latency should I expect when using E5-Large-v2 via LLM.API?

For typical short texts, E5-Large-v2 usually responds in tens to a few hundred milliseconds, depending on load and batch size.

How do I access E5-Large-v2 through the LLM.API gateway?

You call the LLM.API embeddings endpoint, specifying the E5-Large-v2 model name and passing your input texts in the request body.

How does E5-Large-v2 compare to similar embedding models?

Compared with the smaller E5 variants, E5-Large-v2 generally delivers higher retrieval accuracy at the cost of more compute and higher latency.

What are the main limitations of E5-Large-v2?

E5-Large-v2 cannot generate text or handle images or audio, and its performance may degrade on very long, noisy, or domain-specific inputs.

Can I use E5-Large-v2 for multilingual tasks through LLM.API?

E5-Large-v2 primarily targets English, and performance on other languages may be weaker compared with dedicated multilingual embedding models.
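Because input beyond the 512-token limit is truncated, longer documents are commonly split into overlapping chunks that are embedded separately. The sketch below is one way to do that, under a loud assumption: it counts whitespace-separated words as a rough proxy for model tokens. A real tokenizer usually produces more tokens than words, so the word budget should sit well below 512. The function name and default values are illustrative.

```python
from typing import List

MAX_TOKENS = 512  # model limit; the word budget used below should be lower


def chunk_words(text: str, max_tokens: int = 400, overlap: int = 50) -> List[str]:
    """Split text into overlapping windows of at most `max_tokens` words."""
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be in [0, max_tokens)")
    words = text.split()
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_tokens]))
        if start + max_tokens >= len(words):
            break  # the last window already reached the end of the text
    return chunks


document = " ".join(f"w{i}" for i in range(1000))
chunks = chunk_words(document, max_tokens=400, overlap=50)
print(len(chunks), [len(c.split()) for c in chunks])
```

The overlap keeps sentences that straddle a boundary represented in at least one chunk, at the cost of embedding some words twice.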

Start in 2 lines of code
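A minimal sketch of an embeddings request, using only the Python standard library. The endpoint URL, the model ID string, and the `model`/`input` payload shape are assumptions made for illustration; check the LLM.API documentation for the actual values.

```python
import json
import urllib.request
from typing import List

# Hypothetical values: replace with the endpoint, model ID, and key from your account.
API_URL = "https://api.example.com/v1/embeddings"
MODEL_ID = "e5-large-v2"


def build_request(texts: List[str], api_key: str) -> urllib.request.Request:
    """Build a POST request carrying the texts to embed as a JSON body."""
    body = json.dumps({"model": MODEL_ID, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def embed(texts: List[str], api_key: str) -> dict:
    """Send the request and return the parsed JSON response (needs network access)."""
    with urllib.request.urlopen(build_request(texts, api_key), timeout=30) as resp:
        return json.load(resp)


request = build_request(["query: what is dense retrieval?"], api_key="YOUR_API_KEY")
print(request.get_method(), request.full_url)
```

Only `build_request` runs here; calling `embed` would contact the placeholder URL, so it is left for use with real credentials.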

Get My API Key
