bge-large-en-v1.5

Text Embeddings

bge-large-en-v1.5 is a large English text embedding model from BAAI’s BGE (BAAI General Embedding) family that maps text into 1,024-dimensional dense vectors, optimized for semantic search and retrieval.

Start Using API

API Performance

Latency: ~0.15s avg embedding time for 1K tokens on GPU
Context: 512 max input tokens
Input: Free per 1M tokens (open-source model)
Output: Free per 1M embedding vectors
Uptime: 99%

About the model

What is bge-large-en-v1.5?

bge-large-en-v1.5 is a 335M-parameter English sentence embedding model from BAAI that converts text into 1,024-dimensional vectors for similarity-based applications. It is mainly used for dense retrieval in retrieval-augmented generation (RAG) systems, semantic search, and document or passage ranking. The model is also applied to clustering, recommendation, and other tasks that rely on high-quality text similarity representations. It belongs to the BGE (BAAI General Embedding) series, a family that includes earlier English and Chinese variants and later multilingual successors such as bge-m3.

Input / Output

Input

English text to be embedded

Output

Vector embeddings

Model capabilities

5 Core Capabilities

Text Embeddings

Generates high-quality 1,024-dimensional English text embeddings for sentences, paragraphs, and documents using an encoder-only architecture.

Semantic Search

Supports high-precision semantic search and retrieval by mapping related English texts to nearby vectors in embedding space.

Document Retrieval

Enables retrieval-augmented generation and knowledge base lookup by encoding English documents, split into chunks that fit the 512-token input limit, into dense representations.

Similarity Matching

Performs sentence and document similarity scoring, clustering, and reranking based on distances between embedding vectors.

English Only

Specialized for English language inputs, providing optimized performance for monolingual English NLP embedding tasks.
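
As a rough illustration of how similarity matching over embedding vectors works, the sketch below ranks candidate texts by cosine similarity to a query vector. The 4-dimensional vectors are made-up stand-ins for real 1,024-dimensional embeddings, and `cosine_similarity` and `rank_by_similarity` are hypothetical helper names, not part of any BGE or LLM.API library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real embeddings (illustrative values only).
query = [0.9, 0.1, 0.0, 0.2]
docs = {
    "refund-policy": [0.8, 0.2, 0.1, 0.3],
    "shipping-times": [0.1, 0.9, 0.3, 0.0],
    "api-changelog": [0.0, 0.1, 0.9, 0.4],
}
ranking = rank_by_similarity(query, docs)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

With these toy values, "refund-policy" ranks first because its vector points in nearly the same direction as the query. In a real system the vectors would come from the model and the ranking would usually be delegated to a vector database.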

Use cases

6 Most Valuable Use Cases

Semantic Text Search
RAG Knowledge Retrieval
Legal Case Retrieval
Monitoring Similar Documents
Product Recommendation Matching
Clustering Text Embeddings

Transparent pricing

Cost Comparison

LLM API offers the lowest embedding prices and best SLAs for bge-large-en-v1.5–class models.

Provider	Region	Latency	Throughput	Uptime	Input ($/1M)	Output ($/1M)	Context
LLM API (best)	Global	80ms	6000 tps	99.99%	$0.02	$0.00	8K tokens
BAAI	Global	~150ms	~2000 tps	~99.9%	$0.04	$0.00	8K tokens
OpenAI	Global	~180ms	~3000 tps	99.9%	$0.10	$0.00	8K tokens
Azure AI	US East	~200ms	~2500 tps	99.9%	$0.09	$0.00	8K tokens
AWS Bedrock	US West	~190ms	~2200 tps	99.9%	$0.08	$0.00	8K tokens
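
To make the per-million-token pricing concrete, here is a small sketch of the arithmetic using the input prices listed in the table above. The `embedding_cost` helper and the 50M-token monthly volume are assumptions for illustration only; actual billing may include minimums, tiers, or rounding not modeled here.

```python
def embedding_cost(tokens, price_per_million_usd):
    """Cost in USD for embedding `tokens` input tokens at a per-1M-token price."""
    if tokens < 0 or price_per_million_usd < 0:
        raise ValueError("tokens and price must be non-negative")
    return tokens / 1_000_000 * price_per_million_usd


# Input prices ($ per 1M tokens) copied from the comparison table.
INPUT_PRICES = {
    "LLM API": 0.02,
    "BAAI": 0.04,
    "OpenAI": 0.10,
    "Azure AI": 0.09,
    "AWS Bedrock": 0.08,
}

monthly_tokens = 50_000_000  # hypothetical workload
for provider, price in INPUT_PRICES.items():
    print(f"{provider}: ${embedding_cost(monthly_tokens, price):.2f}")
```

At this hypothetical volume, a $0.02 per 1M rate works out to $1.00 per month and a $0.10 per 1M rate to $5.00.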

Performance benchmarks

Technical Specifications

Metric	bge-large-en-v1.5 (BAAI)	text-embedding-3-large (OpenAI)	e5-large-v2 (intfloat)
Dimensions	1024	3072	1024
Max Input Tokens	512	8K	512
Price per 1M Tokens	~$0.10	$0.13	~$0.10
Avg Latency	~120ms	~180ms	~140ms
Throughput	~1.2K tps	~1K tps	~900 tps
Uptime	~99.5%	~99.9%	~99.0%

30-day usage via LLM API

2.8B: Prompt tokens processed (30 days)
9.4M: API requests served (30 days)
210K: Monthly active developers
99.95%: Avg API uptime (30 days)

Architecture & Integration

Why Build on LLM.API?

One unified API. Every major model. Built-in reliability, cost control, and observability.

Intelligent Model Routing

Define policies once and let LLM.API route each request to the optimal model across providers based on latency, cost, and quality, with no client changes required.
One policy, many models

Cost-Aware Execution

Control spend with per-project price caps, smart model selection, and detailed usage insights so you can scale traffic without surprise bills or manual tuning.
Optimize spend by default

Resilient Fallback Flows

Automatically fail over to backup models and regions on errors or timeouts, preserving SLAs and user experience without adding complex retry logic in your code.
Stay online, automatically

End-to-End Observability

Trace every request across models, providers, and regions with structured logs, metrics, and latency breakdowns to debug issues and tune performance in production.
See every token hop

Task-Level Abstractions

Describe tasks like chat, tools, RAG, or scoring once and let LLM.API normalize prompts, parameters, and outputs across incompatible providers and model formats.
Task-first, not model-first

High-Throughput Batch Jobs

Run massive batch workloads through a single API with automatic chunking, concurrency limits, retries, and progress tracking, without maintaining custom pipelines.
Ship at batch scale

Decision guide

When to Use — When NOT to Use

Use it if...

You need strong English sentence embeddings for semantic search and retrieval-augmented generation.
You need dense vector representations to power similarity search over large text corpora.
Your use case involves clustering or deduplicating English documents based on semantic similarity.
Your use case involves reranking candidate search results using high-quality embedding similarity scores.
You need a well-known open-source English embedding model compatible with many vector databases.
Your use case involves intent matching between user queries and short English text descriptions.
You need embeddings for question-answer retrieval across FAQ pages or knowledge-base articles.
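
One way the deduplication case in the list above might look in practice is a greedy pass that keeps a document only if it is not too similar to any document already kept. This is a minimal sketch under stated assumptions: the vectors are toy values, the 0.95 threshold is arbitrary, and `deduplicate` is a hypothetical helper. At scale, an approximate nearest-neighbor index would usually replace the pairwise loop.

```python
import math


def _cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def deduplicate(embeddings, threshold=0.95):
    """Greedy near-duplicate removal.

    Walks the documents in input order and keeps one only if its similarity
    to every previously kept document is below `threshold`.
    Returns the ids that were kept.
    """
    kept = []
    for doc_id, vec in embeddings.items():
        if all(_cosine(vec, embeddings[k]) < threshold for k in kept):
            kept.append(doc_id)
    return kept


# Toy vectors standing in for real embeddings (illustrative values only).
toy = {
    "a": [1.0, 0.0, 0.0],
    "a-copy": [0.99, 0.05, 0.0],
    "b": [0.0, 1.0, 0.0],
}
unique_ids = deduplicate(toy)
print(unique_ids)
```

The threshold is the main tuning knob: a lower value merges more aggressively, so it is worth calibrating on a labeled sample of your own documents.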

Avoid if...

You need a generative model capable of producing text, code, or structured outputs directly.
Your workload requires multilingual or non-English embeddings with strong performance across many languages.
You need ultra-long context understanding for very large documents in a single pass.
Your workload requires strict on-device or mobile deployment with very limited memory footprint.
You need task-specific fine-tuned embeddings for domains like code, biology, or legal text.
You need real-time personalization where embeddings must frequently update during a single session.
Your workload requires built-in safety classification or content filtering instead of separate moderation models.

FAQ

Frequently Asked Questions

What is bge-large-en-v1.5?

bge-large-en-v1.5 is an English sentence-embedding model by BAAI optimized for high-quality semantic similarity, retrieval, and reranking tasks.

What is bge-large-en-v1.5 best used for?

It is best for dense retrieval, semantic search, question-answer retrieval, and clustering English text by meaning rather than exact keywords.

What is the embedding dimension of bge-large-en-v1.5?

bge-large-en-v1.5 outputs 1,024-dimensional embeddings for each input text chunk.

What context length does bge-large-en-v1.5 effectively support?

The model accepts up to 512 input tokens per text. It is typically used on short to medium English texts, and longer documents should be chunked before embedding for best results.
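
A minimal sketch of chunking a long document before embedding. It assumes whitespace-separated words as a rough proxy for tokens; the model's real tokenizer splits text into subword pieces, so a word budget well below the 512-token limit is used here to leave headroom. `chunk_text`, `max_words`, and `overlap` are hypothetical names chosen for this example.

```python
def chunk_text(text, max_words=300, overlap=50):
    """Split text into overlapping word windows.

    Words only approximate model tokens, so max_words is kept
    conservatively below the 512-token input limit.
    """
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    chunks = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reached the end of the text
    return chunks


# Synthetic 700-word document for demonstration.
document = " ".join(f"word{i}" for i in range(700))
chunks = chunk_text(document)
print(len(chunks), [len(c.split()) for c in chunks])
```

The overlap keeps sentences that straddle a boundary visible in both neighboring chunks. For production use, counting tokens with the model's own tokenizer would be more reliable than counting words.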

How fast is bge-large-en-v1.5 in terms of latency?

Latency depends on hardware and request batch size, but as a large embedding model it is slower than small embedding models per request.

What modalities does bge-large-en-v1.5 support?

bge-large-en-v1.5 is a text-only model that converts English text into dense vector embeddings.

How do I access bge-large-en-v1.5 through LLM.API?

Use the LLM.API embeddings endpoint, specifying provider "BAAI" and model "bge-large-en-v1.5" in your request parameters.
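
The request described above might be assembled along these lines. Everything beyond the provider and model values is an assumption: the field names (`provider`, `model`, `input`), the URL, and the bearer-token header are hypothetical placeholders, so check the LLM.API reference for the actual schema. The sketch only builds the request and does not send it.

```python
import json
import urllib.request

# Hypothetical endpoint: a placeholder, not a documented LLM.API URL.
EMBEDDINGS_URL = "https://api.example.com/v1/embeddings"


def build_embedding_request(texts, api_key):
    """Build (but do not send) a POST request for an embeddings call.

    The body layout is an assumed schema used for illustration.
    """
    body = {
        "provider": "BAAI",
        "model": "bge-large-en-v1.5",
        "input": list(texts),
    }
    return urllib.request.Request(
        EMBEDDINGS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_embedding_request(["How do I reset my password?"], api_key="YOUR_API_KEY")
payload = json.loads(request.data)
print(request.get_method(), request.full_url)
print(payload)
```

Sending it would be a matter of passing `request` to `urllib.request.urlopen` once the real endpoint and schema are known.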

How is pricing for bge-large-en-v1.5 handled on LLM.API?

Pricing is metered per token or character for embedding requests, and the exact rate is defined by LLM.API’s BAAI pricing schedule.

How does bge-large-en-v1.5 compare to smaller embedding models?

It generally offers higher retrieval accuracy and semantic quality than smaller embedding models at the cost of higher compute and latency.

Can bge-large-en-v1.5 handle multilingual input?

It is primarily optimized for English; embeddings for other languages may be lower quality and are not the main target use case.

What are the main limitations of bge-large-en-v1.5?

It does not generate text, only embeddings, and may underperform on very long documents or non-English content without careful preprocessing.

Can I use bge-large-en-v1.5 for real-time applications?

Yes, but you should benchmark latency on your infrastructure and consider batching or caching to meet strict real-time requirements.
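
A sketch of the batching and caching idea mentioned above, assuming a caller-supplied `embed_batch` function that takes a list of strings and returns one vector per string (a stand-in for whatever client call is actually used). `CachedEmbedder` and `batch_size` are hypothetical names, and the fake backend exists only to make the example runnable.

```python
class CachedEmbedder:
    """Wraps a batch embedding function with an in-memory cache."""

    def __init__(self, embed_batch, batch_size=32):
        self._embed_batch = embed_batch  # callable: list[str] -> list[vector]
        self._batch_size = batch_size
        self._cache = {}

    def embed(self, texts):
        # Collect texts not yet cached, preserving order and skipping repeats.
        missing = list(dict.fromkeys(t for t in texts if t not in self._cache))
        # Send the uncached texts to the backend in fixed-size batches.
        for start in range(0, len(missing), self._batch_size):
            batch = missing[start:start + self._batch_size]
            for text, vector in zip(batch, self._embed_batch(batch)):
                self._cache[text] = vector
        return [self._cache[t] for t in texts]


# Fake backend for demonstration: records calls and returns dummy vectors.
calls = []


def fake_embed_batch(batch):
    calls.append(list(batch))
    return [[float(len(text))] for text in batch]


embedder = CachedEmbedder(fake_embed_batch, batch_size=2)
first = embedder.embed(["a", "bb", "ccc", "a"])
second = embedder.embed(["bb", "dddd"])
print(calls)
```

In the second call only the unseen text reaches the backend, which is the effect that helps with repeated queries. An unbounded dictionary is used for brevity; a real deployment would likely want an eviction policy or an external cache.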

Start in 2 lines of code

Get My API Key
