Powered by BAAI

bge-base-en-v1.5

  • Text Embeddings

bge-base-en-v1.5 is a base-sized English text embedding model from BAAI’s BGE (BAAI General Embedding) series, optimized for semantic similarity and retrieval. It generates 768-dimensional embeddings for tasks like search, clustering, and reranking.

Start Using API

What is bge-base-en-v1.5?

bge-base-en-v1.5 is an English language embedding model developed by BAAI as part of its BGE general embedding series, transforming text into 768-dimensional vectors optimized for semantic similarity. It is mainly used for information retrieval and semantic search, where both queries and documents are embedded into a shared vector space for relevance ranking. It is also applied in downstream tasks such as clustering, reranking, and recommendation systems that rely on dense text representations. It belongs to the FlagEmbedding/BGE family alongside related variants like bge-small-en-v1.5 and bge-large-en-v1.5.
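
The relevance ranking described above comes down to comparing vectors. Here is a minimal sketch in Python, assuming the embeddings have already been computed. The 4-dimensional vectors are made-up placeholders standing in for real 768-dimensional outputs, and `cosine_similarity` and `rank_documents` are illustrative helper names, not part of any BGE or LLM.API library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Placeholder vectors; real bge-base-en-v1.5 embeddings have 768 dimensions.
query = [1.0, 0.0, 1.0, 0.0]
docs = {
    "refund-policy": [0.9, 0.1, 0.8, 0.0],
    "shipping-times": [0.0, 1.0, 0.0, 1.0],
    "returns-faq": [0.5, 0.5, 0.5, 0.5],
}
ranking = rank_documents(query, docs)
for doc_id, score in ranking:
    print(f"{doc_id:<15} {score:.3f}")
```

In a production system the same comparison would normally be delegated to a vector database or approximate nearest-neighbor index rather than a Python loop.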

5 Core Capabilities

  • Text Embeddings

    Converts English sentences and passages into 768-dimensional dense vectors capturing semantic meaning for downstream similarity-based applications.

  • Semantic Search

    Supports semantic search by embedding queries and documents into a shared space, enabling retrieval by meaning rather than exact keywords.

  • Sentence Similarity

    Measures similarity between English texts by comparing their embeddings, useful for clustering, deduplication, and paraphrase detection pipelines.

  • Document Retrieval

    Optimized for text retrieval tasks, ranking relevant passages or documents for a given query using vector similarity scores.

  • RAG Integration

    Acts as the embedding backbone in retrieval-augmented generation systems, efficiently indexing and retrieving knowledge for larger language models.

6 Most Valuable Use Cases

  • Semantic Document Search
  • Question Answer Retrieval
  • Text Clustering Analysis
  • RAG Knowledge Base
  • Recommendation Matching
  • Duplicate Ticket Detection
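
As one illustration of these use cases, duplicate ticket detection can be sketched as a pairwise similarity check against a threshold. This assumes the ticket embeddings already exist. The 3-dimensional vectors, the ticket IDs, and the 0.9 threshold are all hypothetical values chosen for the example; a real threshold would need tuning on labeled data.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def find_duplicate_pairs(embeddings, threshold=0.9):
    """Return (id_a, id_b, score) for every pair scoring at or above threshold."""
    pairs = []
    for (id_a, vec_a), (id_b, vec_b) in combinations(embeddings.items(), 2):
        score = cosine(vec_a, vec_b)
        if score >= threshold:
            pairs.append((id_a, id_b, score))
    return pairs


# Hypothetical ticket embeddings (real ones would be 768-dimensional).
tickets = {
    "T-1": [1.0, 0.0, 0.2],
    "T-2": [0.98, 0.05, 0.21],
    "T-3": [0.0, 1.0, 0.0],
}
duplicates = find_duplicate_pairs(tickets)
```

The pairwise loop is quadratic in the number of tickets, so it only suits small batches; larger collections would typically use a nearest-neighbor index to find candidates first.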

Cost Comparison

Among the providers compared below, LLM.API lists the lowest price per 1M tokens, the lowest latency, and the highest throughput for bge-base-en-v1.5-class models.

Provider        Region   Latency  Throughput  Uptime  Input ($/1M)  Output ($/1M)  Context
LLM.API (best)  Global   ~80ms    ~8,000 tps  99.99%  $0.0100       $0.0100        8K tokens
BAAI            Global   ~140ms   ~4,000 tps  ~99.9%  ~$0.0130      ~$0.0130       8K tokens
OpenAI          Global   ~160ms   ~3,000 tps  99.9%   ~$0.0200      ~$0.0200       8K tokens
Azure AI        US East  ~170ms   ~2,500 tps  99.9%   ~$0.0220      ~$0.0220       8K tokens
Replicate       Global   ~190ms   ~2,000 tps  ~99.5%  ~$0.0250      ~$0.0250       8K tokens
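
To turn the per-1M-token prices above into a monthly bill, multiply the token volume by the price and divide by one million. The sketch below uses the input prices from the table (several of which are marked approximate there) and a hypothetical workload of 500M tokens per month; `embedding_cost` is an illustrative helper, not a billing API.

```python
from decimal import Decimal

# Input prices in USD per 1M tokens, copied from the comparison table above.
# Most of these are listed as approximate in the table.
PRICE_PER_MILLION = {
    "LLM.API": Decimal("0.0100"),
    "BAAI": Decimal("0.0130"),
    "OpenAI": Decimal("0.0200"),
    "Azure AI": Decimal("0.0220"),
    "Replicate": Decimal("0.0250"),
}


def embedding_cost(tokens, price_per_million):
    """Cost in USD for embedding `tokens` tokens at the given per-1M price."""
    return Decimal(tokens) * price_per_million / Decimal(1_000_000)


monthly_tokens = 500_000_000  # hypothetical workload, not a figure from this page
costs = {
    provider: embedding_cost(monthly_tokens, price)
    for provider, price in PRICE_PER_MILLION.items()
}
for provider, cost in costs.items():
    print(f"{provider:<10} ${cost:.2f}")
```

`Decimal` is used instead of floats so that small per-token prices do not pick up rounding noise when summed over many requests.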

Technical Specifications

Metric                     bge-base-en-v1.5 (BAAI)  all-MiniLM-L6-v2 (SBERT)  text-embedding-3-small (OpenAI)
Dimensions                 768                      384                       1536
Max Input Tokens           ~512                     ~256                      8K
Price per 1M Tokens        ~$0.05                   ~$0.00                    ~$0.02
Avg Latency per 1K Tokens  ~80ms                    ~60ms                     ~90ms
Throughput                 ~2.5K tps                ~3K tps                   ~2K tps
Uptime                     ~99.5%                   ~99.0%                    ~99.9%
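
The Dimensions row has a direct effect on storage. As a rough sketch, assuming each vector is stored as uncompressed 32-bit floats (4 bytes per value) and ignoring any index overhead, the raw size of a collection can be estimated as below. The corpus size of one million vectors is a hypothetical figure.

```python
BYTES_PER_FLOAT32 = 4

# Embedding dimensions taken from the specification table above.
DIMENSIONS = {
    "bge-base-en-v1.5": 768,
    "all-MiniLM-L6-v2": 384,
    "text-embedding-3-small": 1536,
}


def raw_vector_bytes(num_vectors, dimensions, bytes_per_value=BYTES_PER_FLOAT32):
    """Bytes needed to store the vectors alone, without index structures."""
    return num_vectors * dimensions * bytes_per_value


num_vectors = 1_000_000  # hypothetical corpus size
sizes_gb = {
    model: raw_vector_bytes(num_vectors, dims) / 1e9
    for model, dims in DIMENSIONS.items()
}
for model, size in sizes_gb.items():
    print(f"{model:<24} {size:.3f} GB")
```

At 768 dimensions each vector takes 3,072 bytes, so one million vectors need about 3.07 GB before any quantization or index overhead.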

30-day usage via LLM.API

  • 620M embedding tokens processed
  • 5.4M API requests
  • 41.5K active developer accounts
  • 99.95% average API uptime

Why Build on LLM.API?

One unified API. Every major model. Built-in reliability, cost control, and observability.

  • Unified AI Routing

    Automatically route each request to the optimal provider and model based on latency, cost, or performance policies—without changing your application code.

    One endpoint, every model

  • Cost-Aware Orchestration

    Control spend with per-route budgets, transparent usage metrics, and intelligent downshifting to cheaper models when quality thresholds are safely met.

    Optimize spend by default

  • Resilient Fallback Flows

    Define multi-provider fallbacks that auto-trigger on errors, timeouts, or degraded responses so your critical AI paths keep working in production.

    No single point of failure

  • End-to-End Observability

    Trace every request across models and providers with logs, metrics, and structured events to debug failures, tune prompts, and prove SLAs.

    See every token hop

  • Task-Level Abstractions

    Codify tasks like chat, generation, ranking, and tools once, then swap models or providers behind the scenes without touching business logic.

    Code to tasks, not models

  • High-Throughput Batch APIs

    Ship massive workloads through a single batch call with automatic chunking, retries, and concurrency control tuned for throughput and reliability.

    Batch at production scale
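
To make the fallback idea concrete, here is a generic sketch of trying providers in order until one succeeds. It is not LLM.API's implementation and does not use its API. `ProviderError`, `embed_with_fallback`, and the two stand-in provider functions are hypothetical names invented for this example.

```python
class ProviderError(Exception):
    """Raised by a provider call that failed or timed out (hypothetical)."""


def embed_with_fallback(text, providers):
    """Try each (name, embed_fn) pair in order; return the first success."""
    errors = []
    for name, embed_fn in providers:
        try:
            return name, embed_fn(text)
        except ProviderError as exc:
            errors.append(f"{name}: {exc}")  # remember why this provider failed
    raise ProviderError("all providers failed: " + "; ".join(errors))


# Stand-in providers: the first always fails, the second returns a dummy vector.
def flaky_primary(text):
    raise ProviderError("timeout")


def stable_secondary(text):
    return [0.0] * 768  # placeholder for a 768-dimensional embedding


used, vector = embed_with_fallback(
    "reset my password",
    [("primary", flaky_primary), ("secondary", stable_secondary)],
)
```

When mixing providers like this, every fallback must serve the same embedding model, because vectors produced by different models are not comparable in one index.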

When to Use — When NOT to Use

Use it if...

  • You need a strong English sentence-embedding model for general semantic similarity tasks.
  • You need inexpensive, fast vectorization for large-scale retrieval or RAG pipelines.
  • Your use case involves clustering or deduplicating many short English texts or titles.
  • Your use case involves building semantic search over FAQs, documentation, or support tickets.
  • You need a widely adopted open-source baseline embedding model with good community benchmarks.
  • Your use case involves re-ranking small candidate sets using cosine similarity of embeddings.

Avoid if...

  • You need multilingual embeddings beyond English, covering many languages with consistent performance.
  • Your workload requires domain-specialized embeddings, like biomedical or legal text understanding.
  • You need cross-modal embeddings aligning text with images, audio, or other modalities.
  • You need extremely high-dimensional, state-of-the-art embeddings for nuanced reasoning-heavy tasks.
  • Your workload requires very long-context document representation beyond what base models handle well.
  • You need supervised task-specific models, such as direct question answering or classification.

Frequently Asked Questions

  • What is bge-base-en-v1.5?

    bge-base-en-v1.5 is a 768-dimensional English text embedding model from BAAI optimized for retrieval, semantic search, and text similarity tasks.

  • What is bge-base-en-v1.5 best suited for when used via LLM.API?

    It is best suited for building vector search, dense retrieval, reranking pipelines, semantic clustering, and recommendation systems on English text.

  • What context window should I assume when using bge-base-en-v1.5 for embeddings?

    bge-base-en-v1.5 accepts inputs of up to 512 tokens, so split longer documents into chunks before embedding them.

  • What modalities does bge-base-en-v1.5 support?

    bge-base-en-v1.5 accepts text only and returns embedding vectors; it does not process images or audio.

  • How is bge-base-en-v1.5 priced on LLM.API?

    Pricing is usage-based per embedded token and may differ from BAAI’s own deployment, so check the LLM.API pricing page for current rates.

  • What latency should I expect from bge-base-en-v1.5 on LLM.API?

    You can generally expect low, sub-second latency for short texts, depending on request batch size and your network conditions.

  • How do I call bge-base-en-v1.5 through LLM.API?

    Specify the model name "bge-base-en-v1.5" in the embeddings endpoint of LLM.API and pass your English text as input.

  • How does bge-base-en-v1.5 compare to larger BGE models?

    Compared to bge-large-en-v1.5, it produces smaller embeddings (768 versus 1024 dimensions) and runs faster, at the cost of slightly lower retrieval accuracy.

  • Can I use bge-base-en-v1.5 for multilingual text?

    It is primarily trained for English, so performance on non-English text will generally be weaker than on English inputs.

  • What limitations should I be aware of when using bge-base-en-v1.5?

    It does not generate text, may lose information on very long inputs, and its embeddings can reflect biases present in training data.
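
Two of the answers above, chunking long documents and naming the model in an embeddings request, can be sketched together. Everything specific here is an assumption: the URL is a placeholder, the `model`/`input` payload and Bearer header follow a shape common to embeddings APIs rather than documented LLM.API behavior, and whitespace-separated words stand in for real tokens. Because a tokenizer usually produces more tokens than words, the word budget is kept well under 512.

```python
import json
import urllib.request

API_URL = "https://api.example.com/v1/embeddings"  # hypothetical placeholder URL
MODEL = "bge-base-en-v1.5"


def chunk_words(text, max_words=200, overlap=20):
    """Split text into overlapping windows of whitespace-separated words."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in [0, max_words)")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reached the end of the text
    return chunks


def build_embedding_request(texts, api_key):
    """Build (but do not send) a POST request; the payload shape is assumed."""
    payload = {"model": MODEL, "input": texts}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# A synthetic 450-word document stands in for real content.
document = " ".join(f"word{i}" for i in range(450))
chunks = chunk_words(document)
request = build_embedding_request(chunks, api_key="YOUR_API_KEY")
# Sending is left out on purpose; urllib.request.urlopen(request) would make the call.
```

For accurate limits, count tokens with the model's own tokenizer instead of words, and check the provider's documentation for the actual endpoint and request schema.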
