Powered by Thenlper

GTE-Base

  • Text Embedding

GTE-Base by Thenlper is an English text embedding model that encodes sentences and paragraphs into 768-dimensional vectors for efficient semantic similarity and retrieval tasks. It is part of the General Text Embeddings (GTE) family trained with multi-stage contrastive learning.

Start Using API

What is GTE-Base?

GTE-Base is a BERT-based General Text Embeddings (GTE) model that maps English text into 768-dimensional dense vectors optimized for semantic representations. It is mainly used for semantic search and information retrieval, where it provides high-quality embeddings for matching queries with relevant documents. It is also widely applied to tasks such as clustering, reranking, and semantic textual similarity across diverse domains. GTE-Base belongs to the GTE model family (alongside GTE-Small and GTE-Large) introduced in the “Towards General Text Embeddings with Multi-stage Contrastive Learning” work.

5 Core Capabilities

  • Text Embedding

    Encodes English sentences and paragraphs into 768-dimensional dense vectors optimized for general-purpose semantic representation and downstream tasks.

  • Semantic Search

    Supports efficient semantic search by embedding queries and documents into a shared space to retrieve meaningfully related results.

  • Sentence Similarity

    Computes similarity between sentences or paragraphs using cosine similarity in the embedding space for clustering and comparison.

  • Text Reranking

    Improves ranking of candidate texts, leveraging relevance-focused embeddings to reorder search or retrieval results more accurately.

  • Classification Support

    Provides embeddings suitable as input features for various text classification tasks across diverse domains and benchmarks.
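As a rough illustration of the similarity and search capabilities above, the sketch below scores documents against a query with cosine similarity and sorts them by score. The 3-dimensional vectors are toy stand-ins for real 768-dimensional GTE-Base embeddings, and the function names are illustrative, not part of any API.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)

def rank_documents(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy vectors (assumption): real embeddings would have 768 dimensions.
query = [1.0, 0.0, 0.0]
docs = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [0.5, 0.5, 0.0],
}

ranking = rank_documents(query, docs)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

The same ranking step can serve as a simple reranker: embed the candidates returned by a first-stage search and reorder them by score.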

6 Most Valuable Use Cases

  • Semantic Text Search
  • Document Clustering
  • Text Reranking
  • Duplicate Detection
  • Recommendation Matching
  • Sentence Similarity Scoring
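Duplicate detection, for example, can be sketched as a pairwise comparison of embeddings against a similarity threshold. The vectors below are toy values, and the 0.95 threshold is an assumption that would need tuning on real data.

```python
import math
from itertools import combinations

def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)

def find_duplicates(embeddings, threshold=0.95):
    """Return (id_a, id_b, score) for every pair at or above the threshold."""
    unit = {key: normalize(vec) for key, vec in embeddings.items()}
    pairs = []
    for (id_a, a), (id_b, b) in combinations(unit.items(), 2):
        score = sum(x * y for x, y in zip(a, b))
        if score >= threshold:
            pairs.append((id_a, id_b, score))
    return pairs

# Toy embeddings (assumption): ticket_1 and ticket_2 are near-identical.
toy_embeddings = {
    "ticket_1": [0.80, 0.60, 0.00],
    "ticket_2": [0.79, 0.61, 0.02],
    "ticket_3": [0.00, 0.10, 0.99],
}

duplicates = find_duplicates(toy_embeddings)
print(duplicates)
```

Comparing every pair costs O(n²) comparisons, so larger collections would typically use an approximate nearest-neighbor index instead.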

Cost Comparison

In the comparison below, LLM.API lists the lowest per-token embedding price and latency and the highest throughput among the listed providers of GTE-Base or comparable embedding models.

| Provider | Region | Latency | Throughput | Uptime | Input ($/1M tokens) | Output ($/1M tokens) | Context |
|---|---|---|---|---|---|---|---|
| LLM.API (BEST) | Global | 120ms | 1,200 tps | 99.99% | $0.03 | $0.03 | 8K tokens |
| Thenlper (Original Hosting) | Global | ~250ms | ~300 tps | ~99.5% | ~$0.10 | ~$0.10 | ~4K tokens |
| OpenAI (text-embedding-3-small Equivalent) | Global | ~220ms | ~500 tps | 99.9% | ~$0.10 | ~$0.10 | 8192 tokens |
| Cohere (embed-english-light-v3) | US East | ~200ms | ~400 tps | 99.9% | ~$0.08 | ~$0.08 | 4096 tokens |
| Azure OpenAI (text-embedding-3-small) | Global | ~190ms | ~450 tps | 99.9% | ~$0.11 | ~$0.11 | 8192 tokens |
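To turn the per-million-token prices above into a budget figure, multiply the token volume by the listed input price. The sketch below does this for a hypothetical workload of 250M tokens per month; the workload size is an assumption, and the prices are simply the values listed in the table.

```python
def embedding_cost_usd(tokens: int, price_per_million_usd: float) -> float:
    """Cost of embedding `tokens` input tokens at a per-1M-token price."""
    return tokens / 1_000_000 * price_per_million_usd

# Input prices as listed in the comparison table (USD per 1M tokens).
listed_input_prices = {
    "LLM.API": 0.03,
    "Thenlper (Original Hosting)": 0.10,
    "OpenAI": 0.10,
    "Cohere": 0.08,
    "Azure OpenAI": 0.11,
}

monthly_tokens = 250_000_000  # hypothetical workload, not a figure from this page

for provider, price in listed_input_prices.items():
    cost = embedding_cost_usd(monthly_tokens, price)
    print(f"{provider}: ${cost:.2f} per month")
```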

Technical Specifications

| Metric | GTE-Base (Thenlper) | text-embedding-3-small (OpenAI) | bge-base-en-v1.5 (BAAI) |
|---|---|---|---|
| Dimensions | 768 | 1,536 | 768 |
| Max Input Tokens | 512 | ~8K | 512 |
| Price per 1M Tokens | ~$0.10 | $0.02 | ~$0.05 |
| Avg Latency | ~120ms | ~90ms | ~130ms |
| Throughput | ~1,500 tps | ~2,000 tps | ~1,200 tps |
| Uptime | ~99.5% | ~99.9% | ~99.0% |
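The Dimensions row also determines how much storage a vector index needs. As a back-of-the-envelope sketch, assuming each value is stored as a 4-byte float32 and ignoring any index overhead:

```python
BYTES_PER_FLOAT32 = 4  # assumption: vectors are stored uncompressed as float32

def index_size_bytes(num_vectors: int, dimensions: int,
                     bytes_per_value: int = BYTES_PER_FLOAT32) -> int:
    """Raw storage for `num_vectors` dense vectors, excluding index overhead."""
    return num_vectors * dimensions * bytes_per_value

num_vectors = 1_000_000  # hypothetical corpus size

for name, dims in [("768-dim (GTE-Base)", 768), ("1,536-dim", 1536)]:
    gigabytes = index_size_bytes(num_vectors, dims) / 1e9
    print(f"{name}: {gigabytes:.3f} GB for {num_vectors:,} vectors")
```

Under these assumptions, a 768-dimensional index needs half the raw storage of a 1,536-dimensional one for the same number of vectors.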

30-day usage via LLM.API

  • 3.1B embedding tokens processed
  • 9.4M API requests served
  • 410K unique developer accounts
  • 99.8% average API uptime

Why Build on LLM.API?

One unified API. Every major model. Built-in reliability, cost control, and observability.

  • Unified AI Routing

    Automatically route each request to the best model across providers based on cost, latency, and quality—without changing your integration or redeploying code.

    One endpoint. Any model.

  • Cost-Aware Controls

    Enforce per-model and per-project budgets with smart price-aware routing and guardrails so you never blow through spend while still meeting performance targets.

    Predictable spend at scale.

  • Resilient Fallbacks

    Recover from provider outages or rate limits automatically with configurable fallback chains that keep your application online, responsive, and consistent under failure.

    Never go dark again.

  • Deep Observability

    Get full visibility into every call—latency, cost, provider, model, and errors—with searchable traces and metrics that plug into your existing monitoring stack.

    Trace every token.

  • Task-Aware Orchestration

    Define high-level tasks—chat, retrieval, tools, vision—and let LLM.API pick and orchestrate the right models and parameters for each use case.

    Describe intent, not models.

  • High-Throughput Batch

    Process millions of requests efficiently with server-side batching, automatic chunking, and retry logic that slashes costs and squeezes maximum throughput from every provider.

    Scale to millions safely.
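To make the fallback idea above concrete, here is a minimal client-side sketch of a fallback chain: try each provider in order and return the first successful result. It illustrates the general pattern only and is not LLM.API's implementation; the provider functions and the `AllProvidersFailed` exception are hypothetical stand-ins.

```python
from typing import Callable, List, Sequence, Tuple

EmbedFn = Callable[[str], List[float]]

class AllProvidersFailed(Exception):
    """Raised when every provider in the chain raised an error."""

def embed_with_fallback(text: str,
                        providers: Sequence[Tuple[str, EmbedFn]]):
    """Try providers in order; return (provider_name, vector) from the first success."""
    errors = []
    for name, embed_fn in providers:
        try:
            return name, embed_fn(text)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))

# Hypothetical providers used only to exercise the chain.
def flaky_primary(text: str) -> List[float]:
    raise TimeoutError("simulated outage")

def stable_backup(text: str) -> List[float]:
    return [float(len(text)), 0.0]  # dummy vector, not a real embedding

used, vector = embed_with_fallback(
    "hello", [("primary", flaky_primary), ("backup", stable_backup)]
)
print(used, vector)
```

Note that vectors from different embedding models are not comparable, so a fallback chain for embeddings should serve the same model from each provider.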

When to Use — When NOT to Use

Use it if...

  • You need a lightweight general-purpose text embedding model for semantic similarity tasks.
  • You need to power semantic search over short to medium-length English text snippets.
  • You need efficient embedding generation for clustering or topic modeling at scale.
  • Your use case involves intent matching or FAQ retrieval with modest accuracy requirements.
  • Your use case involves building recommendation features based on textual content similarity.
  • You need a small, open-source embedding model that is easy to self-host.

Avoid if...

  • You need state-of-the-art embedding performance across many languages and specialized domains.
  • You need robust embeddings for very long documents or multi-page contexts.
  • Your workload requires task-specific embeddings tuned for code or mathematical reasoning.
  • You need production-grade support, SLAs, and monitoring from a major cloud provider.
  • Your workload requires real-time personalization using extremely high-precision semantic representations.
  • You need multimodal embeddings that jointly represent text with images, audio, or video.

Frequently Asked Questions

  • What is GTE-Base?

    GTE-Base is a sentence embedding model by Thenlper, optimized for generating dense text embeddings for retrieval, clustering, and semantic similarity tasks.

  • What is GTE-Base best suited for?

    GTE-Base is best for semantic search, question answering over documents, duplicate detection, and other tasks requiring high-quality sentence or passage embeddings.

  • What modalities does GTE-Base support via LLM.API?

    GTE-Base is text-only and supports embedding text inputs; it does not process images, audio, or video.

  • How does pricing for GTE-Base work on LLM.API?

    On LLM.API, GTE-Base is billed per input token processed for embeddings; check your LLM.API pricing dashboard for the current rate.

  • What is the context window of GTE-Base on LLM.API?

    The open-source thenlper/gte-base checkpoint truncates inputs longer than 512 tokens, so very long documents should be chunked before embedding. Check the LLM.API model page for the limit applied to the hosted endpoint.

  • How is the latency and speed of GTE-Base through LLM.API?

    GTE-Base is lightweight and generally returns embeddings with low latency, suitable for real-time or interactive semantic search applications.

  • How do I call GTE-Base through the LLM.API platform?

    Use the LLM.API embeddings endpoint, specifying the provider as Thenlper and the model name as GTE-Base in your request parameters.

  • How does GTE-Base compare to larger embedding models?

    GTE-Base is smaller and faster than many large embedding models, often with slightly lower accuracy but significantly lower cost and latency.

  • Can I use GTE-Base for code, tables, or other non-natural-language content?

    GTE-Base is primarily trained for natural language text, so embeddings for code or highly structured data may be less accurate.

  • What are the main limitations of GTE-Base?

    GTE-Base may underperform on very specialized domains, extremely long documents, or tasks requiring deep logical reasoning beyond surface semantic similarity.
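Since several answers above recommend chunking long documents before embedding, here is one minimal way to do it: split the text into overlapping windows of words. Word counts are only a rough proxy for model tokens, and the `max_words` and `overlap` defaults are assumptions to adjust for the input limit you are working with.

```python
def chunk_words(text: str, max_words: int = 200, overlap: int = 20):
    """Split text into overlapping word windows (a rough proxy for token limits)."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in [0, max_words)")

    words = text.split()
    step = max_words - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks

sample = " ".join(f"w{i}" for i in range(10))
chunks = chunk_words(sample, max_words=4, overlap=1)
print(chunks)
```

Each chunk is then embedded separately, and search results can be mapped back to the source document through the chunk's position.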
