Powered by BAAI

bge-large-en-v1.5

  • Text Embeddings

bge-large-en-v1.5 is a large English text embedding model from BAAI’s BGE (BAAI General Embedding) family that maps text into 1,024-dimensional dense vectors, optimized for semantic search and retrieval.

What is bge-large-en-v1.5?

bge-large-en-v1.5 is a 335M-parameter English sentence embedding model from BAAI that converts text into 1,024-dimensional vectors for similarity-based applications. It is mainly used for dense retrieval in retrieval-augmented generation (RAG) systems, semantic search, and document or passage ranking. The model is also applied to clustering, recommendation, and other tasks that rely on high-quality text similarity representations. It belongs to the BGE (BAAI General Embedding) series, a family that includes earlier English and Chinese variants and later multilingual successors such as bge-m3.

5 Core Capabilities

  • Text Embeddings

    Generates high-quality 1,024-dimensional English text embeddings for sentences, paragraphs, and documents using an encoder-only architecture.

  • Semantic Search

    Supports high-precision semantic search and retrieval by mapping related English texts to nearby vectors in embedding space.

  • Document Retrieval

    Enables retrieval-augmented generation and knowledge base lookup by encoding English documents, split into chunks where they exceed the model's input limit, into dense representations.

  • Similarity Matching

    Performs sentence and document similarity scoring, clustering, and reranking based on distances between embedding vectors.

  • English Only

    Specialized for English language inputs, providing optimized performance for monolingual English NLP embedding tasks.
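
The semantic search and similarity matching capabilities above come down to comparing vectors. A minimal sketch, assuming embeddings have already been obtained and using toy 3-dimensional vectors in place of real 1,024-dimensional ones; the function names and document IDs are illustrative, not part of any BAAI or LLM.API interface:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def top_k(query_vec, corpus, k=3):
    """Return the k (doc_id, score) pairs most similar to query_vec, best first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy stand-ins for embeddings (hypothetical values, 3 dimensions instead of 1,024).
corpus = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.1],
    "return-an-item": [0.8, 0.3, 0.1],
}
query = [1.0, 0.2, 0.0]
results = top_k(query, corpus, k=2)
```

A brute-force scan like this is only practical for small corpora; larger collections are usually served from a vector database with an approximate nearest-neighbor index.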

6 Most Valuable Use Cases

  • Semantic Text Search
  • RAG Knowledge Retrieval
  • Legal Case Retrieval
  • Monitoring Similar Documents
  • Product Recommendation Matching
  • Clustering Text Embeddings

Cost Comparison

Among the providers listed below, LLM.API has the lowest listed input price and the highest listed uptime for bge-large-en-v1.5-class models.

Provider | Region | Latency | Throughput | Uptime | Input ($/1M) | Output ($/1M) | Context
LLM API (BEST) | Global | 80ms | 6000 tps | 99.99% | $0.02 | $0.00 | 8K tokens
BAAI | Global | ~150ms | ~2000 tps | ~99.9% | $0.04 | $0.00 | 8K tokens
OpenAI | Global | ~180ms | ~3000 tps | 99.9% | $0.10 | $0.00 | 8K tokens
Azure AI | US East | ~200ms | ~2500 tps | 99.9% | $0.09 | $0.00 | 8K tokens
AWS Bedrock | US West | ~190ms | ~2200 tps | 99.9% | $0.08 | $0.00 | 8K tokens
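
To turn the per-million-token prices in the table into a budget figure, the arithmetic can be sketched as follows. The prices are copied from the table above; the 500M-token monthly workload is a hypothetical figure chosen only for illustration. Output is priced at $0.00 for embeddings, so only input tokens are counted:

```python
# Input prices in USD per 1M tokens, copied from the cost comparison table.
PRICE_PER_MILLION = {
    "LLM API": 0.02,
    "BAAI": 0.04,
    "OpenAI": 0.10,
    "Azure AI": 0.09,
    "AWS Bedrock": 0.08,
}


def embedding_cost(tokens, price_per_million):
    """Cost in USD of embedding `tokens` input tokens at the given rate."""
    return tokens / 1_000_000 * price_per_million


monthly_tokens = 500_000_000  # hypothetical workload, not a figure from this page

# Round to cents for display.
costs = {
    provider: round(embedding_cost(monthly_tokens, price), 2)
    for provider, price in PRICE_PER_MILLION.items()
}
```

At this assumed volume the listed rates work out to $10 per month on LLM API versus $50 at the highest listed rate.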

Technical Specifications

Metric | bge-large-en-v1.5 (BAAI) | text-embedding-3-large (OpenAI) | e5-large-v2 (intfloat)
Dimensions | 1024 | 3072 | 1024
Max Input Tokens | 512 | 8K | 512
Price per 1M Tokens | ~$0.10 | $0.13 | ~$0.10
Avg Latency | ~120ms | ~180ms | ~140ms
Throughput | ~1.2K tps | ~1K tps | ~900 tps
Uptime | ~99.5% | ~99.9% | ~99.0%

30-day usage via LLM API

  • 2.8B prompt tokens processed (30 days)
  • 9.4M API requests served (30 days)
  • 210K monthly active developers
  • 99.95% average API uptime (30 days)

Why Build on LLM.API?

One unified API. Every major model. Built-in reliability, cost control, and observability.

  • Intelligent Model Routing

    Define policies once and let LLM.API route each request to the optimal model across providers based on latency, cost, and quality—no client changes required.

    One policy, many models

  • Cost-Aware Execution

    Control spend with per-project price caps, smart model selection, and detailed usage insights so you can scale traffic without surprise bills or manual tuning.

    Optimize spend by default

  • Resilient Fallback Flows

    Automatically fail over to backup models and regions on errors or timeouts, preserving SLAs and user experience without adding complex retry logic in your code.

    Stay online, automatically

  • End-to-End Observability

    Trace every request across models, providers, and regions with structured logs, metrics, and latency breakdowns to debug issues and tune performance in production.

    See every token hop

  • Task-Level Abstractions

    Describe tasks like chat, tools, RAG, or scoring once and let LLM.API normalize prompts, parameters, and outputs across incompatible providers and model formats.

    Task-first, not model-first

  • High-Throughput Batch Jobs

    Run massive batch workloads through a single API with automatic chunking, concurrency limits, retries, and progress tracking, without maintaining custom pipelines.

    Ship at batch scale

When to Use — When NOT to Use

Use it if...

  • You need strong English sentence embeddings for semantic search and retrieval-augmented generation.
  • You need dense vector representations to power similarity search over large text corpora.
  • Your use case involves clustering or deduplicating English documents based on semantic similarity.
  • Your use case involves reranking candidate search results using high-quality embedding similarity scores.
  • You need a well-known open-source English embedding model compatible with many vector databases.
  • Your use case involves intent matching between user queries and short English text descriptions.
  • You need embeddings for question-answer retrieval across FAQ pages or knowledge-base articles.
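
For the deduplication use case in the list above, one simple approach is a greedy filter over unit-normalized vectors, where the dot product equals cosine similarity. This is a sketch under stated assumptions: the 0.95 threshold is an arbitrary starting point to tune on your own data, the vectors are toy 3-dimensional stand-ins, and the function names are illustrative:

```python
import math


def normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def deduplicate(docs, threshold=0.95):
    """Keep a document only if it is below `threshold` similarity to every kept one.

    `docs` is a list of (doc_id, embedding) pairs; earlier documents win ties.
    """
    kept_ids, kept_vecs = [], []
    for doc_id, vec in docs:
        unit = normalize(vec)
        is_duplicate = any(
            sum(x * y for x, y in zip(unit, kept)) >= threshold
            for kept in kept_vecs
        )
        if not is_duplicate:
            kept_ids.append(doc_id)
            kept_vecs.append(unit)
    return kept_ids


# Hypothetical embeddings: "a-copy" points almost the same way as "a".
docs = [
    ("a", [1.0, 0.0, 0.0]),
    ("a-copy", [0.99, 0.05, 0.0]),
    ("b", [0.0, 1.0, 0.0]),
]
unique_ids = deduplicate(docs)
```

The pairwise comparison is quadratic in the number of kept documents, so large corpora generally need an index or a clustering step instead.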

Avoid if...

  • You need a generative model capable of producing text, code, or structured outputs directly.
  • Your workload requires multilingual or non-English embeddings with strong performance across many languages.
  • You need ultra-long context understanding for very large documents in a single pass.
  • Your workload requires strict on-device or mobile deployment with very limited memory footprint.
  • You need task-specific fine-tuned embeddings for domains like code, biology, or legal text.
  • You need real-time personalization where embeddings must frequently update during a single session.
  • Your workload requires built-in safety classification or content filtering instead of separate moderation models.
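
The long-document limitation above is commonly worked around by splitting text into overlapping chunks and embedding each chunk separately. The sketch below counts whitespace-separated words as a rough stand-in for tokens, which is an assumption: bge-large-en-v1.5 is a BERT-style encoder with a 512-token maximum sequence length, and its tokenizer usually produces more tokens than words, so the 300-word default is a conservative guess rather than a guarantee. The function name and defaults are illustrative:

```python
def chunk_words(text, max_words=300, overlap=50):
    """Split text into chunks of at most `max_words` words.

    Consecutive chunks share `overlap` words so that sentences cut at a
    boundary still appear whole in at least one chunk.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in [0, max_words)")

    words = text.split()
    if not words:
        return []

    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once this chunk has reached the end of the text.
        if start + max_words >= len(words):
            break
    return chunks


# Synthetic 700-word document for demonstration.
document = " ".join(f"w{i}" for i in range(700))
chunks = chunk_words(document)
```

For exact limits, count tokens with the model's own tokenizer instead of words, and prefer splitting on sentence or paragraph boundaries where possible.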

Frequently Asked Questions

  • What is bge-large-en-v1.5?

    bge-large-en-v1.5 is an English sentence-embedding model by BAAI optimized for high-quality semantic similarity, retrieval, and reranking tasks.

  • What is bge-large-en-v1.5 best used for?

    It is best for dense retrieval, semantic search, question-answer retrieval, and clustering English text by meaning rather than exact keywords.

  • What is the embedding dimension of bge-large-en-v1.5?

    bge-large-en-v1.5 outputs 1,024-dimensional embeddings for each input text chunk.

  • What context length does bge-large-en-v1.5 effectively support?

    The model accepts at most 512 tokens per input, so it is typically used on short to medium English texts; longer documents should be chunked before embedding for best performance.

  • How fast is bge-large-en-v1.5 in terms of latency?

    Latency depends on hardware and request batch size, but as a large embedding model it is slower than small embedding models per request.

  • What modalities does bge-large-en-v1.5 support?

    bge-large-en-v1.5 is a text-only model that converts English text into dense vector embeddings.

  • How do I access bge-large-en-v1.5 through LLM.API?

    Use the LLM.API embeddings endpoint, specifying provider "BAAI" and model "bge-large-en-v1.5" in your request parameters.

  • How is pricing for bge-large-en-v1.5 handled on LLM.API?

    Pricing is metered per token or character for embedding requests, and the exact rate is defined by LLM.API’s BAAI pricing schedule.

  • How does bge-large-en-v1.5 compare to smaller embedding models?

    It generally offers higher retrieval accuracy and semantic quality than smaller embedding models at the cost of higher compute and latency.

  • Can bge-large-en-v1.5 handle multilingual input?

    It is primarily optimized for English; embeddings for other languages may be lower quality and are not the main target use case.

  • What are the main limitations of bge-large-en-v1.5?

    It does not generate text, only embeddings, and may underperform on very long documents or non-English content without careful preprocessing.

  • Can I use bge-large-en-v1.5 for real-time applications?

    Yes, but you should benchmark latency on your infrastructure and consider batching or caching to meet strict real-time requirements.
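
The access and batching answers above can be combined into a small request-building sketch. Only the provider and model values come from this page. The endpoint URL, the "input" field name, and the batch size of 32 are assumptions made for illustration, so check the LLM.API reference for the actual request schema. The code builds JSON bodies and does not send them:

```python
import json

# Hypothetical placeholder: the real endpoint URL is not given on this page.
EMBEDDINGS_URL = "https://api.example.com/v1/embeddings"


def build_embedding_requests(texts, batch_size=32):
    """Split `texts` into batches and build one JSON request body per batch."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")

    bodies = []
    for start in range(0, len(texts), batch_size):
        body = {
            "provider": "BAAI",            # value stated in the FAQ
            "model": "bge-large-en-v1.5",  # value stated in the FAQ
            "input": texts[start:start + batch_size],  # field name is an assumption
        }
        bodies.append(json.dumps(body))
    return bodies


# 70 short texts become three request bodies of 32, 32, and 6 inputs.
texts = [f"example sentence {i}" for i in range(70)]
bodies = build_embedding_requests(texts)
```

Each body would then be sent as an authenticated POST to the embeddings endpoint; batching several texts per request generally reduces per-request overhead, at the cost of higher latency for any single text.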
