Powered by Thenlper

GTE-Large

  • Text Embedding

GTE-Large is a general-purpose English text embedding model from Thenlper and the largest member of the General Text Embeddings (GTE) family. It produces 1,024-dimensional sentence embeddings optimized for semantic similarity and retrieval tasks.

Start Using API

What is GTE-Large?

GTE-Large is a BERT-based General Text Embeddings model released by Thenlper that generates 1,024-dimensional sentence and document embeddings for English text. It is mainly used for information retrieval and semantic search, where dense vector representations are required to match queries with relevant passages or documents. It is also applied to tasks such as semantic textual similarity, clustering, reranking, and various downstream applications evaluated on the MTEB benchmark. GTE-Large belongs to the GTE family of models introduced in the paper “Towards General Text Embeddings with Multi-stage Contrastive Learning,” alongside smaller variants like GTE-Base and GTE-Small.

5 Core Capabilities

  • Text Embedding

    Encodes English sentences, paragraphs, and moderate-length documents into dense 1024-dimensional vectors for downstream semantic tasks.

  • Semantic Similarity

    Generates embeddings enabling accurate semantic textual similarity comparisons between sentence or document pairs using vector distance metrics.

  • Information Retrieval

    Produces high-quality embeddings optimized for retrieval pipelines, improving search ranking and relevance over traditional lexical approaches.

  • Reranking Support

    Provides rich semantic embeddings that can rerank candidate search or recommendation results for better ordering and relevance.

  • Clustering Usage

    Offers consistent vector representations suitable for clustering texts into semantically coherent groups in analytics or discovery workflows.
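
The similarity and reranking capabilities above both come down to comparing vectors. A minimal sketch, assuming cosine similarity as the distance metric and using toy 3-dimensional vectors in place of real 1,024-dimensional embeddings; the function names are illustrative and not part of any GTE or LLM API library:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Return (index, score) pairs sorted from most to least similar."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy stand-ins for embeddings returned by the model (assumed values).
query = [1.0, 0.0, 0.0]
docs = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [0.5, 0.5, 0.0]]

ranking = rank_by_similarity(query, docs)
print([i for i, _ in ranking])  # → [1, 2, 0]
```

The same ranking function can serve as a simple reranker: embed the query and each candidate, then reorder the candidates by score.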

6 Most Valuable Use Cases

  • Semantic Search
  • Information Retrieval
  • Semantic Reranking
  • Text Clustering
  • Text Similarity Scoring
  • General Embedding Tasks

Cost Comparison

In the comparison below, LLM API lists the lowest price, the lowest latency, and the highest throughput among providers of GTE-Large–class embedding models.

Provider | Region | Latency | Throughput | Uptime | Input ($/1M tokens) | Output ($/1M tokens) | Context
LLM API (Best) | Global | 80ms | ~15K tokens/s | 99.99% | $0.03 | $0.03 | ~8K tokens
Thenlper (Direct) | Global | ~220ms | ~5K tokens/s | ~99.5% | ~$0.10 | ~$0.10 | ~8K tokens
OpenAI (text-embedding-3-large equivalent) | Global | ~200ms | ~10K tokens/s | 99.9% | $0.13 | $0.13 | ~8K tokens
AWS Bedrock (similar embedding model) | US East | ~250ms | ~8K tokens/s | 99.9% | ~$0.20 | ~$0.20 | ~8K tokens

Technical Specifications

Metric | GTE-Large (Thenlper) | text-embedding-3-large (OpenAI) | bge-large-en-v1.5 (BAAI)
Dimensions | 1024 | 3072 | 1024
Max Input Tokens | 512 | 8K | 512
Price per 1M Tokens | ~$0.02 | $0.13 | ~$0.01
Avg Latency | ~120ms | ~200ms | ~150ms
Throughput | ~1,500 tps | ~800 tps | ~1,200 tps
Uptime | ~99.5% | 99.9% | ~99.0%

30-day usage via LLM API

  • 1.9B prompt tokens processed (30 days)
  • 11.4M embedding API requests (30 days)
  • 420K unique applications using GTE-Large (30 days)
  • 99.95% average API uptime (30 days)

Why Build on LLM.API?

One unified API. Every major model. Built-in reliability, cost control, and observability.

  • Unified AI Routing

    Automatically route each request to the optimal provider and model based on latency, cost, or quality—without changing your integration or redeploying code.

    One endpoint, every model
  • Cost-Aware Orchestration

    Control spend with smart tiering, per-route budgets, and provider mix policies that automatically balance price versus performance across all your AI workloads.

    Cut cost, keep quality
  • Resilient Fallbacks

    Define multi-provider fallback chains so requests transparently fail over on errors, rate limits, or outages—keeping your AI features reliable in production.

    Stay online, automatically
  • End-to-End Observability

    Get unified logs, metrics, and traces across providers with request-level insights into tokens, latency, errors, and model behavior in one place.

    See every token
  • Task-Level Abstractions

    Call high-level tasks—chat, generation, embeddings, tools—instead of vendor-specific APIs, so you can swap models or providers without rewriting business logic.

    Code to tasks, not vendors
  • High-Throughput Batch

    Run large-scale batch jobs with automatic chunking, retries, and concurrency control to fully utilize provider limits while keeping throughput predictable.

    Scale jobs, not code
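
To illustrate the idea behind a fallback chain, here is a minimal client-side sketch. It is not LLM.API's implementation or SDK: `ProviderError`, `call_with_fallback`, and the two provider functions are hypothetical stand-ins, and a real platform would apply this logic on the server side.

```python
class ProviderError(Exception):
    """Raised by a provider callable when a request fails (hypothetical)."""


def call_with_fallback(providers, text):
    """Try each (name, embed_fn) in order and return (name, vector) from the first success."""
    errors = []
    for name, embed_fn in providers:
        try:
            return name, embed_fn(text)
        except ProviderError as exc:
            errors.append(f"{name}: {exc}")  # remember why this provider was skipped
    raise ProviderError("all providers failed: " + "; ".join(errors))


# Hypothetical providers: the first always fails, the second returns a toy vector.
def flaky_primary(text):
    raise ProviderError("rate limited")


def healthy_secondary(text):
    return [0.0, 0.0, 0.0, 0.0]


used, vector = call_with_fallback(
    [("primary", flaky_primary), ("secondary", healthy_secondary)],
    "hello",
)
print(used)  # → secondary
```

The caller sees a single successful result and does not need to know which provider served it, which is the property the fallback feature describes.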

When to Use — When NOT to Use

Use it if...

  • You need a general-purpose text embedding model for semantic search or retrieval.
  • You need sentence and document embeddings for English-language text.
  • You need relatively lightweight embeddings that are cheaper than very large transformers.
  • Your use case involves clustering or deduplicating large text corpora by semantic similarity.
  • Your use case involves reranking search results using cosine similarity between query and documents.
  • You need an open-source, locally deployable embedding model without proprietary dependencies.
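
Deduplication by semantic similarity, one of the use cases listed above, can be sketched as a greedy filter over embedding vectors. The 0.95 threshold and the toy 2-dimensional vectors are assumptions for illustration; a suitable threshold depends on the corpus and should be tuned.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def deduplicate(vectors, threshold=0.95):
    """Keep an item only if it is below the threshold against every item already kept."""
    kept = []
    for i, vec in enumerate(vectors):
        if all(cosine_similarity(vec, vectors[j]) < threshold for j in kept):
            kept.append(i)
    return kept


# Toy embeddings: the second vector is nearly identical to the first.
vectors = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]]
print(deduplicate(vectors))  # → [0, 2]
```

This pairwise loop is quadratic in the number of kept items, so large corpora usually need an approximate nearest-neighbor index instead.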

Avoid if...

  • You need state-of-the-art retrieval quality matching the newest large proprietary embedding models.
  • Your workload requires embeddings for very long documents far beyond typical context limits.
  • You need cross-modal embeddings that jointly handle text and images in one space.
  • You need domain-specialized embeddings for code, biology, or legal texts out-of-the-box.
  • Your workload requires strict latency guarantees on edge devices with extremely limited compute.
  • You need embeddings that are continuously updated and versioned as a managed cloud service.

Frequently Asked Questions

  • What is GTE-Large?

    GTE-Large is a sentence embedding model by Thenlper optimized for high-quality text similarity, retrieval, and semantic search tasks.

  • What is GTE-Large best suited for?

    GTE-Large is best for generating dense vector embeddings for search, clustering, recommendation, and RAG retrieval over large text corpora.

  • What modalities does GTE-Large support?

    GTE-Large is a text-only model that accepts natural language input and outputs fixed-size vector embeddings.

  • How do I access GTE-Large through LLM.API?

    You call the LLM.API embeddings endpoint with the GTE-Large model name, passing your text inputs and receiving embedding vectors in the response.

  • How does GTE-Large compare to similar embedding models?

    GTE-Large typically offers strong semantic retrieval quality comparable to other large general-purpose embedding models, with competitive performance on common benchmark datasets.

  • What is the context window of GTE-Large?

    GTE-Large accepts up to 512 tokens per input and truncates anything longer, so it is best suited to short and medium-length texts. Chunk very long documents before embedding.

  • How fast is GTE-Large and what latency should I expect via LLM.API?

    Latency depends on LLM.API infrastructure and batch size. The cost comparison above lists about 80ms for LLM API, which is fast enough for real-time or near-real-time embedding workloads.

  • What does GTE-Large cost to use on LLM.API?

    Pricing for GTE-Large is set by LLM.API and billed by the volume of text embedded. The cost comparison above lists $0.03 per 1M tokens.

  • Does GTE-Large support batch embedding through LLM.API?

    Yes, you can send multiple input texts in a single embeddings request to LLM.API to get batched GTE-Large embeddings.

  • What are the main limitations of GTE-Large?

    GTE-Large handles English text only, cannot process images, may underperform on highly domain-specific jargon, and does not perform generative text completion.
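
The context-window answer above recommends chunking very long documents before embedding. A minimal sketch of overlapping chunks follows. It counts whitespace-separated words as a rough proxy for model tokens, which is an assumption: the model's tokenizer usually produces more tokens than words, so the limits below are illustrative and deliberately conservative.

```python
def chunk_words(text, max_words=200, overlap=20):
    """Split text into word windows of at most max_words, each sharing `overlap` words with the previous one."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reached the end of the text
    return chunks


# A synthetic 450-word document: w0 w1 ... w449.
text = " ".join(f"w{i}" for i in range(450))
chunks = chunk_words(text)
print(len(chunks))  # → 3
```

Each chunk is then embedded separately, and search results can be mapped back to the parent document.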

Start in 2 lines of code
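
A minimal sketch of what an embeddings request might look like, assuming an OpenAI-style JSON payload with a bearer token. The endpoint URL, the model identifier `gte-large`, and the field names are placeholders, not confirmed LLM.API values, so check the API reference for the real ones. The snippet builds the request without sending it.

```python
import json
import urllib.request

# Assumptions: placeholder endpoint and model id, not confirmed LLM.API values.
API_URL = "https://api.example.com/v1/embeddings"
MODEL_ID = "gte-large"


def build_embedding_request(texts, api_key):
    """Build a POST request that asks for embeddings of several texts in one batch."""
    payload = {"model": MODEL_ID, "input": list(texts)}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_embedding_request(
    ["first passage", "second passage"], api_key="YOUR_API_KEY"
)
print(request.get_method(), request.full_url)

# To send it against a real endpoint:
# with urllib.request.urlopen(request) as response:
#     body = json.load(response)
```

Passing a list under `input` reflects the batch embedding behavior described in the FAQ, where several texts are sent in a single request.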

Get My API Key